Vol. XXXII Issue 1 Article 4 - Journal of Basic & Applied Genetics

Vol. XXXII Issue 1
Article 4

DOI: 10.35407/bag.2021.32.01.04

ARTÍCULOS ORIGINALES

Analysis of mixed data to select bananas clones (Musa SPP.) To be included in a germplasm bank

Análisis de datos mixtos para seleccionar clones de banana (Musa SPP.) A ser incluidos en un banco de germoplasma

Del Medico A. P. ¹

Tenaglia G.²

Lavalle A. L.³

Vitelleschi M. S.⁴

Pratta G. R.^{1,5 *}

¹ Instituto de Investigaciones en Ciencias Agrarias de Rosario, Consejo Nacional de Investigaciones Científicas y Técnicas.

² Centro de Investigación y Desarrollo Tecnológico para la Pequeña Agricultura Familiar, Instituto Nacional de Tecnología Agropecuaria.

³ Departamento de Estadística, Universidad Nacional del Comahue.

⁴ Consejo de Investigaciones de la Universidad Nacional de Rosario, Instituto de Investigaciones Teóricas y Aplicadas de la Escuela de Estadística, Facultad de Ciencias Económicas y Estadística, Universidad Nacional de Rosario.

⁵ Facultad de Ciencias Agrarias, Universidad Nacional de Rosario.

* Corresponding author: Guillermo R. Pratta gpratta@unr.edu.ar
ORCID 0000-0002-3682-0946

ABSTRACT

In an asexually reproducing hybrid such as banana (Musa spp.), the assessment of clones in the short term is limited because replications are frequently unavailable in the proper number. The aim of this work is to propose the Multiple Factor Analysis of Mixed Data (MFAmix) as a tool for establishing objective criteria to identify banana clones that preserve variability for qualitative and quantitative variables. In the long term, the aim is the development of a banana germplasm bank. MFAmix was applied on a population composed of 124 banana clones collected from different farmers’ fields and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other, composed of three dichotomous qualitative variables. A Selection Index (SI) was built from the MFAmix coordinates in order to rank the clones and select a subset that allows to preserve the existing genetic variability. The first two axes of MFAmix explained a 49.47% of the total data variability. The set of the banana clones was successfully characterized based on quantitative and qualitative variables. In the long term, the creation of a banana germplasm bank should consider the height and diameter of the plant, the rachis bunch weight and the hands weight, and the qualitative variable plant leafiness.

Key words: Asexual hybrid, Collection of germplasm, Multivariate analysis, Musaceae

RESUMEN

En un híbrido de reproducción asexual como banana (Musa spp.), la evaluación de los clones en el corto plazo es limitada debido a que generalmente no se cuenta con el número adecuado de repeticiones. El objetivo de este trabajo es aplicar la técnica de Análisis Factorial Múltiple de Datos Mixtos (AFMmix) como una herramienta para establecer criterios objetivos de manera de identificar clones de banana que preserven la variabilidad de los caracteres cualitativos y cuantitativos. A largo plazo, el objetivo es desarrollar un banco de germoplasma de banana. Se aplicó el AFMmix a una población de 124 clones de banana recolectados de diferentes campos de productores y cuatro testigos comerciales. Se evaluaron dos grupos de variables relacionadas con la aptitud agronómica de los clones, uno compuesto por nueve caracteres cuantitativos, y el otro, por tres caracteres cualitativos dicotómicos. Se construyó un Índice de Selección (IS) a partir de las coordenadas del AFMmix de manera de ordenar a los clones de banana para seleccionar un subconjunto de ellos que permita conservar la variabilidad genética existente. Los dos primeros ejes del AFMmix explicaron un 49,47% de la variabilidad total de los datos. Se caracterizó satisfactoriamente al conjunto de clones de banana a través de las variables cuantitativas y cualitativas. A largo plazo, en la creación de un banco de germoplasma de banana se debe considerar a la altura y diámetro de la planta, al peso del raquis y peso de las manos, y al carácter cualitativo frondosidad de la planta.

Palabras clave: Híbrido asexual, Colección de germoplasma, Análisis multivariado, Musaceae

Received: 12/19/2020
Revised version received: 05/03/2021
Accepted: 06/17/2021

INTRODUCTION

Banana (Musa spp.) is a crop of fundamental importance for the economies of many developing countries. In terms of gross production value, it is the fourth most important food crop in the world, after rice, wheat and maize (Arias et al., 2004). In northern Argentina, a subtropical area, the banana crop suffers from suboptimal climate conditions, affecting diversity. Thus, there are genotypes which are adapted to environments which are less favorable for traditional production. Therefore, in Argentina there is a crop diversity which is unique in the world (Ermini et al., 2013, 2016). The banana is an asexual reproduction hybrid, the selection of clones in the short term, is limited due to the lack of the appropriate number of repetitions. A germplasm bank is a collection of live plant material which aims to preserve the genetic variability existing in one or more species of interest. Germplasm banks are the main means to protect the plant diversity of the different crop species, and identify accessions for breeding programs, basic researches, and production. Agronomy problems derived from the excessive uniformity of the crops can be solved by introducing local varieties. Hence both the conservation and use of the genetic variability available in germplasm banks are presently revaluated (Defacio, 2009). Some authors reported the use of Principal Components Analysis (PCA) (Defacio, 2009) and Generalized Procrustes Analysis (GPA) (Bramardi, 2005) to identify varieties to conserve in a germplasm bank. The disadvantages of using these methodologies are as follows. Through PCA, it is not possible to analyze qualitative variables as active variables; i.e. they can only be introduced as supplementary variables, not intervening in calculating the coordinates of individuals and variables. Through GPA, it is possible to work with the synthetic variables resulting from applying PCA on the quantitative variables and Principal Coordinate Analysis (PCoA) on the qualitative variables. On this occasion, it is not possible to determine which variables are the ones which contribute the most to the formation of the axes or factors, making it difficult to characterize the cultivars through the evaluated variables. There is a relatively new methodology, the Multiple Factor Analysis of Mixed Data (MFAmix) (Pagès, 2002, 2014) that allows the characterization of the cultivars according to the quantitative and qualitative variables simultaneously and determines which variables mostly contribute to the total variability. MFAmix provides equations that are linear combinations of the original variables and form axes of highest variation that allow to differentiate cultivars. Therefore, the aim of this report was to propose the MFAmix as a tool for establishing objective criteria to identify banana clones that preserve variability for qualitative and quantitative variables. In the long term, the objective is to form a banana germplasm bank representing most of the plant diversity available in the northeastern region of Argentina.

MATERIALS AND METHODS

Plant material

The study population was composed of 124 banana clones collected from different producers’ fields in the province of Formosa, Argentina (Ermini et al., 2018) and four commercial varieties extensively used in the world production that were the experimental controls: Williams (Control 1), Jaffa (Control 2), Gal Azul (Control 3) and Gran Enanao (Control 4) (Figure 1).

Figure 1. Trees and bunches of banana fruit that correspond to clones collected from different farmers’ fields.

An augmented block design (Cotes and Ñústez, 2001) was carried out with 15 blocks of 14 plants each, where only the controls have repetitions. It was accomplished at the experimental field of INTA Formosa which is located in northern Argentina (26°11’31.8”S, 58°12’22.4”W), during the 2016-2017 crop season. Two groups of phenotypic variables related to the agronomic aptitude of the clones were evaluated, one composed of nine continuous quantitative variables, and the other, composed of three dichotomous qualitative variables. The quantitative variables were plant height (m), plant diameter (cm), rachis bunch weight (kg), hands weight (kg), second hand diameter (cm), last hand diameter (cm), second hand length (cm), last hand length (cm) and peel thickness (mm). In a previous communication (Del Medico et al., 2018a), the existence of genetic variability for these quantitative variables was verified by a method originally developed to take into account the lack of genotypic replications for clones due to the experimental design used. The qualitative variables were plant leafiness (low or high), bunch size (small or big) and prolificity of tier bunch (low or high).

Statistical methodology

The MFAmix (Pagès, 2002, 2014) was applied. This methodology allows the analysis of data tables in which the same group of individuals is described through a group of variables, evaluated in different conditions, moments or places. Variables can be quantitative or qualitative, with the only restriction that the nature of these variables must be the same within each group. MFAmix provides a similar weighting to both kinds of variables. In general terms, the MFAmix algorithm consists of two stages. The preliminary stage (separate analysis), in which each group of variables is analyzed separately, in the case of quantitative variables set, through a PCA and for qualitative variables set, through a Multiple Correspondence Analysis (MCA). The first eigenvalue from each of these analyses will be used in the subsequent step. The main stage (global analysis) consists in performing a PCA on the whole data resulting from the juxtaposition of the configurations obtained in the separate analysis, which are weighted by the inverse of the corresponding first eigenvalue. This weighting maintains the structure of each matrix and manages to balance the influence of the different groups of variables. The objective of the MFAmix technique is to highlight the main variability of individuals, the latter being balanced by the various groups of variables. A global measurement of the relation between the configurations of both groups of variables defined for the same individuals could be calculated through the RV coefficient (Abdi, 2007). The RV coefficient takes values between zero (the configurations are orthogonal) and one (the configurations are homothetic). A Selection Index (SI) was built from the coordinates of the quantitative variables obtained through MFAmix. SI is a linear combination of the standardized quantitative variables whose weights are their coordinates obtained through MFAmix, multiplied by the inertia explained in that factor. The quantitative variables involved in its construction were those whose contribution to each factor exceeded 2/3 of the corresponding maximum coefficient in absolute value. The construction of this SI was based on Del Medico et al. (2018b). The FactoMineR package of R statistical software was used to accomplish this analysis.

RESULTS

Multiple Factor Analysis of Mixed Data

The first PCA factor was moderately correlated with the first MCA factor (-0.61.) The rest of the correlations between the factors of the separate analyzes were low (Table 1). These correlation coefficients indicated that the intensity of the relationship between the two analyses was slight, being only linked on the first factor. Two factors were retained from the MFAmix, which explain 49.47% of the total data inertia (Table 2). No rotation method was used. The first two factors of the MFAmix were quite close to the factors of the same rank in the separate analyses, except the second factor of the quantitative group. Therefore, the use of MFAmix properly balanced the contribution of these two types of variables (Figure 2). The quantitative attributes which most contributed to the formation of the first factor were hands weight, rachis bunch weight, and diameter, width and height of the plant. In the second factor, no considerable contributions from the quantitative values were observed (Table 3). The qualitative variables mostly contributing to the first factor were bunch size and prolificity of tier bunch. On the second factor, the largest contribution was made by the qualitative variable plant leafiness, followed by bunch size (Table 3).

Table 1. Correlations between factors obtained in the preliminary stage (separate analysis) of Multiple Factorial Analysis of Mixed Data. The study population was composed of 124 banana clones and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other, composed of three dichotomous qualitative variables. Each group of variables was analyzed separately through a Principal Components Analysis (PCA) or Multiple Correspondence Analysis (MCA) as appropriate.

Table 2. Decomposition of total inertia by factor obtained through Multiple Factor Analysis of Mixed Data. The study population was composed of 124 banana clones and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other, composed of three dichotomous qualitative variables.

Figure 2. Factors of the separate analyzes on the first two axes of Multiple Factorial Analysis of Mixed Data (MFAmix). The study population was composed of 124 banana clones and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other composed of three dichotomous qualitative variables. Dim 1 and Dim 2 correspond to the Factor 1 and 2, respectively, obtained in the MFAmix.

Figure 3. Representation of the groups on the first two factors of Multiple Factorial Analysis of Mixed Data (MFAmix). The study population was composed of 124 banana clones and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other composed of three dichotomous qualitative variables. Dim 1 and Dim 2 correspond to the Factor 1 and 2, respectively, obtained in the MFAmix.

Table 4. Selection Index (SI) of banana clones classified according to plant leafiness (high or low). SI was built from the coordinates of the quantitative variables obtained through Multiple Factorial Analysis of Mixed Data (MFAmix). SI is a linear combination of the standardized quantitative variables whose weights are their coordinates obtained through MFAmix, multiplied by the inertia explained in that factor.

Descripción: a4tab4

Descripción: a4tab4b

In the representation of both groups of variables on the first factor plane, there were no observable differences on the first derived factor. However, both groups showed differences on the second factor (Figure 3).

The RV calculated between both groups of variables was equal to 0.30, indicating that the relationship between the configurations corresponding to both groups of variables under study was low, i.e., its information regarding total variability was complimentary. Considering that the RV obtained was low, that the discrepancies between the groups of variables appeared on the second factor, and that the evaluated variable which mostly contributes to the construction of such a factor was the qualitative variable plant leafiness, the banana clones were classified according to the aforementioned variable. Hence, two groups of clones were formed, one composed of Control 3, and four clones corresponding to plants with low plant leafiness, and the other one composed of Control 1, Control 2 and Control 4, and 78 clones corresponding to plants with high plant leafiness. For this reason, individuals were represented in the first principal plane of MFAmix, according to the qualitative variable plant leafiness. The first factor orders the individuals according to the quantitative variables. The second factor separates the individuals according to the qualitative variable plant leafiness. The clones corresponding to plants with high plant leafiness were found in the superior part (in red) and plants with low plant leafiness in the inferior part (in green) (Figure 4).

Figure 4. Representation of banana clones on the first two factors of Multiple Factorial Analysis of Mixed Data (MFAmix), according to the qualitative variable plant leafiness. The study population was composed of 124 banana clones and four controls. Two groups of variables related to the agronomic aptitude of the clones were evaluated, one composed of nine quantitative variables, and the other composed of three dichotomous qualitative variables. Dim 1 and Dim 2 correspond to the Factor 1 and 2, respectively, obtained in the MFAmix

Selection Index

Only the first factor was included in the construction of the SI, given that the quantitative variables did not present considerable contributions to the second MFAmix factor (Table 4) Based on data presented in Table 2 and Table 3, the SI constructed is:

SI = 1.64 (0.62 plant height + 0.77 plant diameter + + 0.82 rachis bunch weigh+ 0.83 hands weight)

The banana clones were arranged according to this SI in each of the two groups previously determined according to the plant leafiness (high or low) (Table 4). Highlighted numbers in Table 4 identify the clones selected for the construction of the germplasm bank. 40 clones were selected to create the germplasm bank, which represent approximately 30% of the total banana clones studied in this research. The selected number of clones in each group was proportional to their size. It is recommended, in order to preserve the existing genetic variability, to select clones with high, moderate and low SI inside each group.

DISCUSSION

Adequate classification and conservation of the variability present in the crops and their relatives are essential for the conformation of germplasm banks, which results critical for future breeding programs (Fundora Mayor et al., 2004). The abundance of material to evaluate, the handling limitations and the fact that, in general, many variables are studied jointly, make the conformation of a germplasm bank more difficult.
The use of quantitative and qualitative variables allows the characterization of crops in a different and complementary manner. For this reason, it is important to use an analysis technique which gets a consensus between both types of variables (Defacio, 2009). For example, Bramardi et al. (2005) evaluated cucumber cultivars for agronomic variables of qualitative and quantitative classes, using the GPA technique for the joint analysis, and Defacio (2016) evaluated local maize populations by GPA technique with the aim of simultaneously analyzing the quantitative and qualitative variables.
This methodology is used in order to deal jointly with both kinds of variables. In those cases, more numerous groups of cultivars were obtained using each kind of variable separately. However, through GPA, it is not possible to determine which variables are the most contributing to the formation of the axes or factors, which makes it difficult to characterize the cultivars through the evaluated variables. The benefit of MFAmix over other existent methodologies is that it assigns equal importance to both groups of variables. Additionally, it allows the characterization of the individuals according to the quantitative and qualitative variables, and thus to form a subset which presents the greater diversity. The MFAmix is a technique that allows deciding a selection criterion that involves variables of different nature. Therefore, this methodology is an appropriate tool for establishing objective criteria through the construction of a SI for identifying banana clones that represent the plant diversity available in the Argentinian Northeast.

CONCLUSION

In the present study, through the MFAmix technique, associations between both groups of variables were detected, and the characterization of banana clones according to quantitative and qualitative variables was successful. In the long term, the creation of a banana germplasm bank should consider the quantitative variables plant height and plant diameter, rachis bunch weight and hands weight, as well as the qualitative variable plant leafiness.

REFERENCES

Abdi H. (2007) RV coefficient and congruence coefficient. In: Salkind N.J. (Eds.) Encyclopedia of measurement and statistics. SAGE Publications Inc., Thousand Oaks, California, pp. 849-853.

Arias P., Dankers C., Liu P., Pilkauskas P. (2004) La economía mundial del banano 1985-2002. Food & Agriculture Organization.

Bramardi S.J., Bernet G.P., Asíns M.J., Carbonell E.A. (2005) Simultaneous agronomic and molecular characterization of genotypes via the generalised procrustes analysis: an application to cucumber. Crop Science. 45: 1603-1609.

Cotes J.M., Ñústez C.E. (2001) Propuesta para el análisis de diseños aumentados en fitomejoramiento: Un caso en papa. Revista Latinoamericana de la Papa. 12: 15-34.

Defacio R.A. (2009) Caracterización y evaluación de la variabilidad genética en poblaciones nativas de maíz (Zea mays L.) de la provincia de Buenos Aires en base a descriptores morfológicos y agronómicos. Master thesis, Universidad Nacional de Rosario, Argentina.

Defacio R.A. (2016) Evaluación comparativa de distintas estrategias de análisis de datos para la caracterización y ordenamiento de la variabilidad genética de poblaciones locales de maíz (Zea mays L.). Doctoral thesis, Universidad Nacional de Rosario, Argentina.

Del Medico A.P., Ermini J.L., Tenaglia G.C., Vitelleschi M.S., Lavalle A.L., Pratta G.R. (2018a) Propuesta de un enfoque estadístico para seleccionar clones de banana a partir de un diseño aumentado teniendo en cuenta la variabilidad genética. Journal of Basic and Applied Genetics. 14: 51-57.

Del Medico A.P., Ermini J.L., Tenaglia G.C., Vitelleschi M.S., Lavalle A.L., Pratta G.R. (2018b) Índice de selección para múltiples caracteres en una población de banana (Musa spp.). XXIII Reunión Científica del Grupo Argentino de Biometría. II Encuentro Argentino - Chileno de Biometría. Octubre 2018, Neuquén, Argentina, pp. 53

Ermini J.L., Pantuso F.S., Tenaglia G.C., Pratta G.R. (2013) Marcadores de AFLP en el cultivo de banana: selección de combinaciones de cebadores y caracterización de la biodiversidad. Revista de la Facultad de Ciencias Exactas, Químicas y Naturales, Universidad de Morón. 11: 83-110.

Ermini, J.L., Tenaglia, G.C., Pratta, G.R. (2016) Genetic diversity, ancestry relationships and consensus among phenotype and genotype in banana (Musa acuminata) clones from Formosa (Argentina) farmers. Plant Cell Biotechnology and Molecular Biology. 17: 267-278.

Ermini, J.L., Tenaglia, G.C., Pratta, G.R. (2018) Molecular diversity in selected banana clones (Musa AAA “Cavendish”): adapted to the subtropical environment of Formosa Province (Argentina). American Journal of Plant Science. 9: 2504-2513.

Fundora Mayor Z., Hernandez M., López R., Fernández L., Sánchez A., López J., Ravelo E.L. (2004) Analysis of the variability in collected peanut (Arachis hypogaea L.) cultivars for the establishment of core collections. Plant Genetic Resources Newsletter. 137: 9-13.

Pagès J. Analyse factorielle de données mixtes. (2002). Revue de Statistique appliquée. 52: 93-111.

Pagès J. (2014). Multiple factor analysis by example using R. CRC Press, Boca Raton.