Assessment of the association between yield parameters and polymorphic sites of the Ghd7 locus in a core-set of rice cultivars grown in Sri Lanka

Yield improvement is a major objective of rice breeding programs. Ghd7 is a pleiotropic gene that regulates yield, plant height, and heading date in rice. Although Ghd7 has been cloned and sequenced in several rice cultivars, no such study has been conducted on Sri Lankan rice germplasm. Therefore, in this study, we assessed the yield performance of 12 rice cultivars, the genetic polymorphism of the Ghd7 locus, and the associations between DNA markers and yield traits. Breeder seeds of the selected cultivars were obtained from RRDI, Bathalagoda, and established under greenhouse conditions at Peradeniya, Sri Lanka, in the Maha and Yala seasons of 2017. The cultivars were subjected to morphometric analysis, genotyped for seven DNA markers and sequenced at the Seq7-8 locus. Morphometric data were subjected to single marker analysis using the General Linear Model (GLM) procedure in SAS 9.4. Here we report six marker haplotypes based on the arrangement of 13 DNA marker alleles at Ghd7. Moreover, twenty-three SNP/INDEL variations at the Seq7-8 locus revealed close genetic relationships between the rice cultivars Bg 90-2 and Bg 352, and between At 307 and Bg 310. Four polymorphic markers (Seq7-8, Seq1-2, RM5436 and RM5346) were significantly associated with rice yield traits and could therefore be used in marker assisted selection. The SNPs/INDELs of Ghd7 were significantly associated with all the yield traits except 100 seed weight and 100 endosperm weight. Thus, the present study demonstrates the possibility of employing marker assisted breeding to improve rice yield using the polymorphic genomic information at the Ghd7 locus.


INTRODUCTION
Rice yield is a complex agronomic trait governed by both the genetic constitution of the plant and the environment (Yan et al., 2011). Moreover, rice yield is a combination of three individual traits, namely grain yield, plant height and heading date (Xue et al., 2008). The grain yield in turn depends on the number of panicles per plant, the number of grains per panicle and the grain weight (Huang et al., 2009). Plant height regulates the plant architecture, while the heading date facilitates the environmental adaptation of rice. Also, plant height and heading date collectively make a greater contribution to rice yield (Xue et al., 2008).
Genetically, rice yield is governed by quantitative trait loci (QTLs) and shows continuous variation in segregating populations. Thus, individual QTLs have minor effects on the overall variation in yield (Yano and Sasaki, 1997). Ghd7 has been identified as one such important QTL, which plays a key role in improving rice yield by simultaneously regulating plant height, grain number and heading date (Xue et al., 2008). Higher expression of Ghd7 results in an increased number of tillers, taller plants and delayed heading dates. Ghd7 is a 3,917 bp long gene and its cDNA is 1,014 bp long. The two exons of Ghd7 encode a CCT domain protein of 257 amino acids (Xue et al., 2008). This protein has been identified in both indica rice varieties such as Minghui 63 and japonica rice varieties such as Nipponbare (Xue et al., 2008; Lu et al., 2012).
Previous studies have revealed the features of Ghd7 in many rice germplasms belonging to indica and japonica varieties. Lu et al. (2012) studied the nucleotide diversity of the whole Ghd7 gene among a set of Oryza sativa cultivars. Studies have also shown that Ghd7 mediated flowering can be observed not only in rice, but also in other grasses such as maize, sorghum and purple false brome (Brachypodium distachyon) (Yang et al., 2012). Xing et al. (2014) successfully cloned and mapped Ghd7 to a position of 0.4 cM in Chromosome 7 in rice using the Minghui 63 and Zhenshan 97 cultivars. The haplotypes based on SNPs and the protein diversity of Ghd7 have also been identified (Xue et al., 2008; Lu et al., 2012). Moreover, the regulation of Ghd7 expression by various environmental signals (Weng et al., 2014) and its effects on other genes in the flowering pathway of rice have been studied in detail (Matsubara et al., 2011). Furthermore, the involvement of Ghd7 in regulating various other aspects of plant development, such as plant architecture, branching of tillers, hormone metabolism and plant responses to various environmental stresses, has been reported (Weng et al., 2014).
However, the previous studies on Ghd7 have not considered the Sri Lankan rice germplasm. Therefore, the present study focused on 10 Sri Lankan rice varieties and two landraces (hereafter referred to as cultivars) in order to assess the potential of using the Ghd7 polymorphism in marker assisted breeding (MAB). The selected set of cultivars represents three yield classes (high, moderate and low). We aimed to detect the genetic polymorphism of the Ghd7 locus among rice cultivars using DNA markers linked to the Ghd7 locus, and then to assess the associations of the yield traits with the DNA markers and with the SNP/INDEL variations present within the Seq7-8 locus, which is located within Ghd7.

Plant material
A total of 12 rice cultivars under three main yield categories (high, moderate and low) were selected, and breeder seeds were obtained for planting from the Rice Research and Development Institute (RRDI), Bathalagoda, Sri Lanka (Table 1).

Planting and sample collection for DNA extraction
Rice seeds were soaked in water for 24 hours and wrapped in wet tissue paper until germination. About three quarters of each pot was filled with soil, and water (~2.5 cm) was added over the soil surface. The plants were managed in a greenhouse at the University of Peradeniya, Sri Lanka, according to the standard agronomic practices recommended by the Department of Agriculture (DOA), Sri Lanka. The pots were laid out in a Completely Randomized Design (CRD). Ten seeds were placed in each pot and four pots were maintained per cultivar. In each pot, four plants were selected after two weeks based on their health and growth, and the rest of the plants were discarded. The trial was repeated in both the Maha and Yala seasons of 2017. Leaf samples were collected from each cultivar at the age of three weeks. The collected samples were crushed in liquid nitrogen and stored at -80 ºC.

Morphological data for marker-trait association analyses
The vegetative growth parameters, namely plant height (PIH), culm length (CL), number of tillers (NT), leaf blade length (LBL) and leaf blade width (LBW), were measured in 16 plants per cultivar at the ages of 3, 6, 9, 12 and 15 weeks in both the Maha and Yala seasons. Then, the reproductive parameters, namely heading date (HD) (days for the first panicle to emerge), flag leaf length (FLL) and flag leaf width (FLW), were recorded at blooming. Finally, the harvesting parameters were measured: the days to harvest from the initial establishment, overall yield per plant, seed number per plant (SN), seed length (SL), seed width, seed weight, 100 seed weight (HSW), 100 endosperm weight (HEW), endosperm length (EL) and endosperm width (EW). All the parameters were assessed with reference to previous reports on morphometric measurements in rice (Xue et al., 2008; Zhang et al., 2015; Liu et al., 2013).

DNA extraction, PCR and sequencing
The genomic DNA was extracted from the crushed immature leaf samples using the CTAB method (Nageswara-Rao et al., 2013). Then, the DNA was amplified using seven DNA markers (Table 2). The PCR mixture had a total volume of 10 µl, comprising 5 µl of 2× GoTaq Green Master Mix (Promega, Madison, WI, USA), 0.5 µl each of forward and reverse primers (10 µM), 3 µl of nuclease-free water and 1 µl of the respective template DNA (50-80 ng/µl). The amplification was performed in a thermal cycler (TP600: Takara, Otsu, Shiga, Japan) under the following conditions: initial denaturation at 94 ºC for 5 min, followed by 35 cycles of 30 s denaturation at 94 ºC, 1 min annealing at the relevant primer annealing temperature (Table 2) and 2 min extension at 72 ºC, followed by a final extension at 72 ºC for 10 min. The amplified PCR products were resolved using 1.5% agarose gel electrophoresis. The Seq7-8 PCR products were purified using the QIAquick PCR Purification Kit (Catalog No: 28104, Qiagen, Hilden, Germany) and then Sanger-sequenced using the Genetic Analyzer 3500 (Catalog No: 622-0010, Applied Biosystems). All the generated sequences are available under the GenBank accession numbers MH779562-MH779573.
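As a quick check of the reaction set-up described above, the following minimal sketch sums the stated per-reaction volumes and scales the shared components into a master mix. The helper name `master_mix` and the 10% pipetting overage are assumptions made for illustration; they are not part of the reported protocol.

```python
# Per-reaction volumes in microlitres, as stated in the text.
PER_REACTION_UL = {
    "2x GoTaq Green Master Mix": 5.0,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "nuclease-free water": 3.0,
    "template DNA": 1.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for n reactions.

    Hypothetical helper: the 10% overage is an assumption, not a value
    from the paper. Template DNA is left out because it differs per tube.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "template DNA"
    }


# The components should add up to the stated 10 ul reaction volume.
total_per_reaction = sum(PER_REACTION_UL.values())

# Shared mix for one marker across the 12 cultivars.
mix_12 = master_mix(12)

print(total_per_reaction)
print(mix_12)
```

Template DNA is excluded from the shared mix because each cultivar contributes its own 1 µl of template.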

DNA marker length polymorphism of Ghd7 and association with yield traits
All the collected morphometric data were subjected to a normality test. Then the marker-trait association was evaluated with DNA marker and SNP/INDEL analyses using the General Linear Model (GLM) and LS means/pdiff mean separation procedures in the statistical package SAS 9.4 (SAS Institute, Cary, NC, USA). DNA markers showing polymorphic bands were subjected to allele scoring. A score of '1' was assigned for the presence of the band and a score of '0' was assigned for the absence of the band. An unrooted UPGMA tree was constructed based on the binary data of the polymorphic marker alleles using Nei-Gobi method in PHYLIP software package (version 3.6) (Felsenstein, 2005). The polymorphic band data were subjected to single marker analysis separately with the yield traits (overall yield, SN, seed weight, seed width, SL, EL, EW, HSW and HEW) by employing the GLM procedure in SAS. The association between the yield class and the marker alleles was analyzed using cross tabulation and the chi-square test in Minitab 16.0 (Minitab Inc., USA).

Table 2: DNA markers used in the present study.
Column headings: Location related to Ghd7; Forward primer (5'→3'); Reverse primer (5'→3'); Ta* (ºC); Reference
*Ta: primer annealing temperature. #: monomorphic marker to detect the quality of DNA and positivity of PCR.
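The presence/absence scoring described in this subsection can be sketched as follows, together with a pairwise Jaccard dissimilarity of the kind named in the caption of Figure 1. This is an illustrative sketch only: the allele labels and band calls are hypothetical, and the code does not reproduce the tree-building software used in the study.

```python
def score_bands(observed_alleles, all_alleles):
    """Return a 0/1 vector: 1 if the allele band is present, otherwise 0."""
    return [1 if allele in observed_alleles else 0 for allele in all_alleles]


def jaccard_dissimilarity(x, y):
    """1 - (bands shared by both profiles) / (bands present in either)."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical allele labels and band calls, for illustration only.
ALLELES = ["M1_a", "M1_b", "M2_a", "M2_b", "M2_c"]
cultivar_x = score_bands({"M1_a", "M2_a"}, ALLELES)
cultivar_y = score_bands({"M1_a", "M2_b"}, ALLELES)

d_xy = jaccard_dissimilarity(cultivar_x, cultivar_y)
print(cultivar_x, cultivar_y, round(d_xy, 3))
```

A matrix of such pairwise values over all cultivars is the usual input to distance-based clustering.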

SNP/INDEL polymorphism at Seq7-8 and its association with yield traits
The DNA sequences of Seq7-8 locus were aligned using ClustalW algorithm (Thompson et al., 1997) (Rambaut, 2014). The allelic data obtained from the sequence polymorphism were subjected to single marker analysis. The associations between the SNP/INDEL allelic states at each variable position and the yield traits including overall yield, SN, seed weight, SL, seed width, EL, EW, HSW and HEW were analyzed by using GLM procedure in SAS.
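For a single marker or variable position, a single marker analysis amounts to a one-way comparison of trait means between allelic classes. The sketch below computes the corresponding F statistic in plain Python. It is not the SAS GLM procedure used in the study, the function name `single_marker_f` is hypothetical, and the trait values are invented for illustration.

```python
def single_marker_f(trait_values, allele_states):
    """One-way F statistic for a trait grouped by allelic state.

    Returns (F, df_between, df_within). Assumes at least two allelic
    classes and some within-class variation.
    """
    groups = {}
    for value, state in zip(trait_values, allele_states):
        groups.setdefault(state, []).append(value)

    n = len(trait_values)
    k = len(groups)
    grand_mean = sum(trait_values) / n

    # Variation explained by the allelic classes.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Residual variation within each allelic class.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented trait values for six plants and their 0/1 allele scores.
trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
allele = [0, 0, 0, 1, 1, 1]
f_stat, df_b, df_w = single_marker_f(trait, allele)
print(f_stat, df_b, df_w)
```

The P value would then be read from the F distribution with the returned degrees of freedom.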

DNA length polymorphism of Ghd7
For the 12 rice cultivars, 13 alleles with an average of 1.86 alleles per locus were detected for the seven DNA markers. The mean number of alleles detected in the present study is lower than the results reported in similar previous studies by Nagaraju et al. (2002), Joshi and Behera (2006) and Herrera et al. (2008) (3.8, 4.58, 2.6 and 4.23 alleles respectively). However, the number of alleles detected per locus in the present study is comparable with that reported by Singh et al. (2004). Out of the seven DNA markers, four (RM5346, RM5436, Seq1-2 and Seq7-8) were polymorphic, although all seven DNA markers gave successful amplifications for all 12 cultivars. Furthermore, in this study the number of alleles detected per locus was in the range of one to three. The minimum number of alleles (one) was detected at three loci: RM 5499, RM 1135 and G7rq. Two alleles were detected at three loci (RM5346, RM5436 and Seq7-8), while three alleles (the highest polymorphism) were detected for the Seq1-2 marker. Figure 1 shows the allelic diversity of the seven DNA markers within the Ghd7 locus for the 12 rice cultivars. The cultivar Bg 90-2 was used as the reference haplotype since it could be identified as the highest yielding genotype (data recorded at RRDI, Bathalagoda). Accordingly, ten marker haplotypes could be identified, including the reference haplotype of Bg 90-2. The rice cultivars in the same yield category did not fall under the same haplotype. Instead, the high-yielding Bg 366, the moderately-yielding Bg 310 and the low-yielding Bw 272-6b fell under the same haplotype, reflecting a close genetic relationship among these three cultivars regardless of their yield status. In contrast, the other nine cultivars fell under nine distinct marker haplotypes, revealing a higher genetic divergence among them.
Furthermore, the unrooted unweighted neighbor-joining tree constructed from the DNA band data categorized the 12 rice cultivars as shown in Figure 1. Accordingly, Bg 366, At 307, Bg 310, Bw 367 and Bw 272-6b were grouped into one cluster. At 362 and Bg 300 grouped together, while Bg 352 and Pachchaperumal clustered together. Clustering of cultivars within the same group suggests a close genetic relationship. However, Bg 90-2, Bg 250 and Suwadhal remained un-clustered with three unique haplotypes, revealing their genetic divergence with respect to all the other cultivars (Figure 1). Moreover, the two landraces, Pachchaperumal and Suwadhal, showed little genetic similarity to each other, which is also reflected by the marker-haplotype analysis, in which they fell under two distinct haplotypes. Furthermore, the rice cultivars belonging to the same yield category were not co-clustered, indicating a low genomic similarity among them and revealing that Ghd7 is not the only genetic player in deciding the rice yield.
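The haplotype count follows from grouping cultivars that share an identical allele pattern: one haplotype shared by three cultivars plus nine unique patterns gives 1 + 9 = 10 haplotypes for the 12 cultivars. A minimal sketch of that grouping step is given below; the cultivar names and 0/1 patterns are hypothetical and do not reproduce the scored data.

```python
from collections import defaultdict


def group_haplotypes(profiles):
    """Group cultivars that share an identical allele pattern."""
    groups = defaultdict(list)
    for cultivar, pattern in profiles.items():
        groups[tuple(pattern)].append(cultivar)
    return dict(groups)


# Hypothetical 0/1 allele patterns, for illustration only.
profiles = {
    "cv_A": [1, 0, 1, 0],
    "cv_B": [1, 0, 1, 0],
    "cv_C": [0, 1, 1, 0],
    "cv_D": [1, 0, 0, 1],
}

haplotypes = group_haplotypes(profiles)
n_haplotypes = len(haplotypes)

# Haplotypes carried by more than one cultivar.
shared = [members for members in haplotypes.values() if len(members) > 1]

print(n_haplotypes, shared)
```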

Association between the polymorphic DNA markers and the yield related traits
The genetic polymorphism among the rice cultivars further gave an insight into marker-trait associations, which facilitate marker assisted selection (MAS), where superior cultivars are selected with the use of molecular markers. The results of the marker-trait association analysis in the Maha and Yala seasons are given in Table 3 and Table 4 respectively (P<0.05). A total of 30 significant marker-trait associations could be detected in the Maha season, while only 25 were detected in the Yala season (P<0.05). These results vary from those in similar previous studies reporting multiple seasons (Vanniarajan et al., 2012). Such discrepancies might be due to differences in the number and the variation of the selected rice cultivars, the yield related traits, the selected markers and the environment in which the particular study was conducted. The incompatibility of the results between the two seasons might be due to environmental factors such as temperature and light conditions, which affect the overall yield and complicate the process of MAS (Schmierer et al., 2004; Reyna and Sneller, 2001).
In Maha season, out of the four polymorphic markers, the Seq7-8 marker was significantly associated with seven traits: overall yield, SL, EL, seed weight, seed width, SN and EW. In Yala season, both the RM5436 and RM5346 markers were significantly associated with six traits; overall yield, SN, seed weight, seed width, EW, SL and EL, seed weight, SL, seed width, EL and EW respectively. Out of the polymorphic DNA markers, only the RM5346 and Seq7-8 markers showed significant associations with different yield traits in both the Maha and Yala seasons. DNA marker RM5346 was significantly associated with overall yield, seed weight, SL, seed width, EL and EW, and Seq7-8 was significantly associated with SL, EL, SN, seed width and EW in both seasons (P<0.05). This suggests that the RM5346 and Seq7-8 markers could be more useful than the other tested markers in implementing MAS for the studied rice cultivars.
Figure 1: Unrooted unweighted neighbor-joining tree constructed for rice cultivars based on polymorphic marker loci linked to the Ghd7 QTL region using Darwin software based on Jaccard coefficient (Perrier et al., 2003). The yield status is indicated at the right side of the figure.
Table notes: Means denoted by the same letters within the column are not significantly different at P<0.05. Presence of the allele is shown as 1 and absence is shown as 0. *Non-estimable: Only one individual in one genotypic class.
The association between the polymorphic marker allele and the yield category (high, moderate and low), analyzed by cross tabulation and the chi-square test, is given in Table 5 (P<0.05). The Cramer's V-square coefficient revealed that there was only a weak association between the marker allele and the yield category (Cramer's V-square was close to 0). This reveals that although the type of the DNA marker allele has considerable effects on the overall yield of the rice cultivars, it has no significant effect on the yield category (P<0.05).
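To make the Cramer's V-square statistic concrete, the sketch below computes the chi-square statistic of a contingency table and divides it by n × min(r − 1, c − 1), the standard definition. The counts are hypothetical and are not those of Table 5; the function assumes that every row and column total is positive.

```python
def cramers_v_squared(table):
    """Return (chi-square, Cramer's V squared) for a contingency table.

    `table` is a list of rows of observed counts. Assumes every row and
    column total is positive, so all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return chi2, chi2 / (n * k)


# Hypothetical counts: rows = allele present/absent,
# columns = high/moderate/low yield category.
no_association = [[2, 2, 2], [2, 2, 2]]
# Hypothetical 2 x 2 table in which allele and category coincide.
perfect_association = [[4, 0], [0, 4]]

chi2_none, v2_none = cramers_v_squared(no_association)
chi2_full, v2_full = cramers_v_squared(perfect_association)
print(v2_none, v2_full)
```

Values near 0 correspond to the weak association reported here, while values near 1 would indicate that the allele almost determines the yield category.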

SNP/INDEL polymorphism at Seq7-8 locus
The analyses of DNA length polymorphisms further facilitated the sequencing of rice cultivars at the Seq7-8 locus. This locus was specifically chosen due to its significant polymorphism. The nucleotide variations in the Seq7-8 locus for the 12 rice cultivars are shown in Figure 2. When all 12 DNA sequences were aligned, a total of 23 polymorphic nucleotide positions, including 11 single nucleotide polymorphisms (SNPs), were detected. Only bi-allelic SNPs were detected at each SNP position and, accordingly, ten haplotypes could be identified. Moreover, Bg 90-2 and Bg 352 fell under the same haplotype, reflecting a close genetic relationship, while At 307 and Bg 310 shared a separate haplotype that was distinct from that of the first pair (Figure 3). The present findings revealed that the polymorphic SNPs and INDELs among these 12 rice cultivars at the Seq7-8 locus could be successfully employed in assessing genetic diversity. The sequence alignment and the identification of 23 polymorphic nucleotide positions at the Seq7-8 locus enabled the construction of an unrooted neighbor-joining tree (Figure 3). Accordingly, Bg 90-2 clustered with Bg 352 and At 307 clustered with Bg 310. The high genetic similarity within each of these two pairs of rice cultivars is also confirmed by the sequence alignment (Figure 2).
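The step from an alignment to a list of polymorphic positions can be sketched as follows. Columns containing more than one character are variable; a variable column containing a gap is called an INDEL here and any other variable column an SNP. The sequences are short hypothetical strings, not the Seq7-8 data, and this simple gap rule is an assumption of the sketch.

```python
def variable_sites(aligned):
    """Classify variable alignment columns as 'SNP' or 'INDEL'.

    `aligned` maps cultivar names to equal-length aligned sequences in
    which '-' marks a gap. Positions are 1-based alignment columns.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    sites = {}
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) > 1:
            sites[i + 1] = "INDEL" if "-" in column else "SNP"
    return sites


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "cv_A": "ACGTAC",
    "cv_B": "ACGTAC",
    "cv_C": "ATGTAC",
    "cv_D": "ACG-AC",
}

sites = variable_sites(aligned)
# Cultivars with identical sequences share one sequence haplotype.
n_seq_haplotypes = len(set(aligned.values()))
print(sites, n_seq_haplotypes)
```

With 12 sequences in which two pairs are identical, the same counting gives 12 − 2 = 10 sequence haplotypes, in line with the number reported above.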

Association between Ghd7 sequence polymorphism and yield traits
The rice yield related traits evaluated in this study and the state of the association of these traits with SNPs/INDELs in the Maha and Yala seasons are given in Tables 6 and 7 respectively (P<0.05). The association analysis revealed that only some of the SNPs/INDELs were significantly associated with rice yield traits (Tables 6 and 7). In Maha season, only four SNPs among the 23 nucleotide variations showed strong associations with the overall yield (P<0.05). The overall yield was significantly higher in the presence of the G, A, T and G bases at base positions 747, 862, 876 and 898 respectively. Moreover, all SNPs and INDELs were strongly associated with seed width. These strong associations indicate a high contribution of nucleotide variation to the development of seeds (P<0.05). In Yala season, a total of two SNPs and three INDELs showed significant associations with the overall yield (P<0.05). A significantly higher overall yield was observed for the SNPs at positions 747 (G), 876 (T) and 889 (G) when the respective allele was present. Moreover, a higher yield was recorded in Yala season for the INDELs at positions 870 (A) and 894 (A) when the corresponding alleles were absent (P<0.05). Furthermore, the highest numbers of significant associations (23 and 20) were detected between seed width and the polymorphic SNP/INDEL variations in the Maha and Yala seasons respectively (P<0.05). However, the number and the type of the remaining associations were highly variable between the two seasons.
Because the yield parameters differed between the two seasons, the association results also varied, which in turn supports an environmental influence on the variation in overall yield (Table 6 and Table 7). Thus, these dissimilarities between the results of the two seasons might complicate the process of MAS, as observed by Moreau et al. (2004a), Moreau et al. (2004b), Schmierer et al. (2004) and Reyna and Sneller (2001).
However, no significant associations could be detected between any of the nucleotide variations and the traits HSW and HEW in either season (P<0.05). This indicates that the nucleotide variation at the Seq7-8 locus has no detectable influence on the HSW and HEW measurements. This result is in agreement with the marker-trait association analysis, in which neither the type of the Seq7-8 marker allele nor the presence/absence of the respective allele influenced the HSW and HEW measurements. Similarly, Suresh et al. (2017) analyzed the associations between the whole sequence of the Ghd7 gene and grain yield parameters in a set of indica rice genotypes belonging to high and low yielding categories, and their results are quite comparable with our findings. Previously, the nucleotide diversity of the whole Ghd7 gene and its associations with rice yield traits have been reported for different rice cultivars as mentioned above. However, in this study we analyzed the nucleotide diversity of Ghd7 and its associations with yield traits for a set of Sri Lankan rice cultivars. Since none of the Ghd7 related studies have been conducted for Sri Lankan rice germplasm, the results of the present study cannot be directly compared with those of the other studies. However, all these findings from different studies collectively give a deep insight into the features of the Ghd7 QTL and its contribution to the yield performance of rice.
Table note: Means denoted by the same letters within the column are not significantly different at P<0.05.

CONCLUSIONS
The genetic divergence among the cultivars was well supported by the DNA length polymorphism and the SNP/INDEL polymorphisms. Based on DNA length polymorphism, six distinct marker haplotypes could be recognized. The SNPs/INDELs present in the Seq7-8 locus also reveal a high genetic diversity among the genotypes, which was validated by the sequence based haplotype alignment. The analysis of SNP/INDEL polymorphism enabled the identification of ten haplotypes. Accordingly, Bg 90-2 and Bg 352 fall under one haplotype, and At 307 and Bg 310 fall under another distinct haplotype, suggesting a close genetic relationship within each pair. The association of SNP/INDEL polymorphism with yield traits revealed a higher contribution of sequence polymorphism to the determination of seed width, but a lower contribution to the HSW and HEW parameters. The strong associations of the four polymorphic DNA markers (RM5436, RM5346, Seq7-8 and Seq1-2) with yield traits (SN, seed weight, SL, seed width, EL, EW, HSW, HEW and overall yield) reveal their applicability in MAS.