Phylogeographic and phylogenetic analyses of a selected set of wild and naturalized Solanum spp. in Sri Lanka

Solanum spp. account for a large share of the world's biodiversity. Sri Lanka, one of the world's biodiversity hotspots, is home to a wide range of Solanum spp. that have been named and morphologically characterized. However, their origins and times of origin have not yet been resolved. Hence, this study was conducted to identify the origins and origination events of 13 wild and naturalized Solanum spp. found in Sri Lanka, using the DNA barcoding locus matK in comparison with worldwide Solanum spp. through phylogenetic and divergence dating approaches. In this study, the 13 Sri Lankan Solanum spp. separated into four defined phylogenetic groups, viz. Old world, Morelloids, Acanthophora and Trova. The studied Sri Lankan Solanum spp. would have originated in Africa 2.007 million years ago (MYA) in the Pleistocene epoch and spread through mammalian migration across Mediterranean land bridges. The Australian relatives of Sri Lankan Solanum spp. would have colonized Australia from South Asia through South East Asia. Floristic connectivity in the Pleistocene epoch may have introduced Asian Solanum spp. to South East Asia, from where mid-Miocene collisions between the Australian and Asian plates, as well as over water Long Distance Dispersal (LDD), may have allowed these species to colonize Australia. Our analysis demonstrated that most of the Solanum spp. found in Sri Lanka were introduced from India during the Pleistocene ice age. We suggest that pre-Pleistocene migrations of Solanum spp. such as S. nigrum may have occurred through over water LDD.


INTRODUCTION
Solanaceae (nightshades) is one of the most cosmopolitan and economically important families within the angiosperms (Samuels, 2009). It contains around 2500 species within 100 genera composed of 14 main clades (Knapp, 1991; Olmstead et al., 2008). Solanaceae contains major vegetable species such as eggplant (Solanum melongena), chili pepper (Capsicum spp.), potato (S. tuberosum) and tomato (S. lycopersicum), and nearly 180 nightshade species are also grown as crops in diverse countries (Samuels, 2015). Solanum is the largest genus in the Solanaceae, with 1,250 to 1,750 species distributed in many parts of the world (Frodin, 2004). In general, only 7% of species in Solanaceae are considered established crop species, although many other underutilized Solanaceous species with edible parts are available (Samuels, 2015).
Sri Lanka is one of the biodiversity hot spots in the world (Bossuyt et al., 2004) and is home to 15 wild / naturalized and 13 exotic Solanum species (Dassanayake, 1987), many of which are also found in abundance in nearby India. These Solanum species possess valuable economic potential as edible and medicinal plants (Jeyakumar et al., 2016). Different parts of the wild Solanum plants exhibit important medicinal properties including anti-diabetic (Gupta et al., 2005), anticonvulsant (Adesina, 1985), anticancer (Jain et al., 2011), antibacterial (Salar, 2009; Doss et al., 2009), and wound healing properties (Huang et al., 2008) as well as larvicidal activity against mosquitoes (Bansal et al., 2009). Although numerous taxonomic studies have been undertaken, three kinds of work remain limited for the Solanum species present in Sri Lanka: phylogenetic analyses to decipher the possible origins and divergence times of the species, bioprospecting programs to characterize important bio-active compounds, and biotechnological attempts to use these values in economic development.
The history of the geographical distribution of plant species around the world is a topic of interest to researchers (Bell et al., 2017; Baldwin and Wagner, 2010; Davis et al., 2002). Gondwana fragmentation followed by continental drift is one of the key hypotheses that explain the colonization of biota in various regions (Yuan et al., 2005; Balme, 1980; Conti et al., 2004; Macey et al., 2000). However, the divergence times of most present-day angiosperms are below 75 million years (Renner, 2004), whereas Gondwana fragmentation occurred 80-160 million years ago (MYA) (Hall, 1998; Chatterjee and Scotese, 1999). Although the fragmentation of Gondwanaland into major continents is considered the first reason for the geographic distribution of plants, the wide distribution of plant families such as nightshades can only be explained through two other hypotheses: land to land distribution through continental connections (Wolfe, 1975; Tiffney, 1985) and over water long distance dispersal (LDD) (Renner, 2004). The formation and breakdown of central and regional land bridges and the occurrence of discrete ice ages might have maintained floristic continuity across countries (Wolfe, 1975; Tiffney, 1985). Over water LDD is also considered a prime cause of the higher rates of dispersal events in sessile flora (Baldwin and Wagner, 2010; Booth, 2017; Gillespie et al., 2012; Renner, 2004). Gillespie et al. (2012) propose three primary means of LDD, namely wind dispersal, dissemination via oceanic currents or floating rafts, and distribution mediated by migratory birds. These three mechanisms are considered regional distribution events, and many studies suggest that they were more instrumental in recent floral diversification events than Gondwana fragmentation.
The phylogenies, cladistic relationships and overall evolution of the Solanaceae are well studied (Knapp, 1991; Weese and Bohs, 2007; Särkinen et al., 2013; Bohs and Olmstead, 1997; Volkov et al., 2001). Species in the genus Solanum have many diverse origins, such as Eurasian, South American, North American, Asian, Australian and African, and some of these species are endemic to their original geographical ranges (i.e. native ranges). Many studies have used plastid barcoding markers such as matK, ndhF, trnS-G and trnL-F (Särkinen et al., 2013; Bohs and Olmstead, 1997; Weese and Bohs, 2007) and nuclear barcoding markers such as waxy, ITS and 5S rDNA (Särkinen et al., 2013; Volkov et al., 2001) to study these species. However, none of these published studies have considered the Solanum spp. in Sri Lanka, leaving a significant knowledge gap in the phylogenetic grouping, origins and origination events of the species present in the country.
Establishing the phylogroups and origination events of all the Solanaceae species in Sri Lanka is an uphill task, although it would provide a strong platform for germplasm conservation, for cataloging the Sri Lankan species in worldwide diversity structures, and for applications in plant breeding, bioprospecting and biotechnology that use this wealth of genomic diversity for economic development. If molecular phylogenetic trees could be established for at least the prominent and economically important Solanaceae species, molecular geneticists would be able to characterize the orthologs of the important genes present in the clades and sister groups. Therefore, the objective of the present study was to establish the molecular phylogenetic relationships, origins and origination events of a selected group of economically important Solanum spp. in Sri Lanka.

DNA sequencing
The purified DNA isolated from 13 wild and naturalized Solanum spp. in Sri Lanka was PCR amplified using matK primers (forward primer: 5' CGA TCT ATT CAT TCA ATA TTT C 3' and reverse primer: 5' TCTAGCACACGAAAGTCGAAGT 3') (Jeyakumar et al., 2016), with standard reagents and conditions and a primer annealing temperature of 48 °C. The purified PCR products were subjected to Sanger DNA sequencing using an ABI 3500 Genetic Analyzer Version 4405186. The matK DNA sequences obtained for these 13 species were published in Jeyakumar et al. (2016) in comparison with the morphogenetic diversity structure of the species (Table 1, GenBank Accession Numbers KX258741 to KX258754). The same dataset was used in the present analysis to estimate the origination events of these species.
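As a quick sanity check on the primer pair quoted above, the following minimal Python sketch reports primer length, GC fraction and a rule-of-thumb melting temperature. The Wallace rule used here is an assumption chosen for simplicity (it is only a rough guide for short oligonucleotides) and is not the method reported in this study; the function names are illustrative.

```python
# Primer sequences as quoted in the text, with spaces removed.
FORWARD = "CGATCTATTCATTCAATATTTC"
REVERSE = "TCTAGCACACGAAAGTCGAAGT"


def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    This rule of thumb is an assumption of the sketch, not a value
    taken from the study.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(seq), round(gc_fraction(seq), 3), wallace_tm(seq))
```

Annealing temperatures are commonly set a few degrees below the lower of the two estimates, so a rough check like this only indicates whether a reported value is plausible.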

Phylogenetic analysis
A set of Basic Local Alignment Search Tool (BLAST) searches was performed for the 13 matK sequences to retrieve all the orthologous Solanum matK sequences available in GenBank (Table 1). A total of 59 sequences identified through the BLAST searches were aligned manually with the sequences of the 13 wild and naturalized Solanum species in Sri Lanka using the software MEGA 7 (Kumar et al., 2016). A species from the sister group of the Solanum clade, Jaltomata bicolor, was used as the out-group to root all the trees in the phylogenetic analysis. The bootstrap+consensus algorithm was executed in the Maximum Likelihood framework for 1000 replicates in the RAxML-HPC2 Workflow (Stamatakis, 2006) on the CIPRES Science Gateway (Miller et al., 2010). GTRCAT was used as the nucleotide substitution model to obtain an accurate and rapid estimate of the substitution rate at each site. A heuristic tree search was initially run under the Maximum Parsimony criterion in PAUP V4 (Swofford, 2001), and the resulting strict consensus tree was used as the starting topology in RAxML. The resulting consensus tree was further modified using FigTree v1.4.3 (Rambaut, 2014).
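The resampling step behind the 1000 bootstrap replicates can be illustrated with a minimal Python sketch. It shows only how one nonparametric bootstrap replicate is drawn from an alignment (columns sampled with replacement); it does not reproduce RAxML's tree search or consensus step, and the toy taxon names and sequences are made up for illustration.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Draw one bootstrap replicate by resampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # The same resampled column indices are applied to every sequence,
    # so each replicate column is an intact column of the original alignment.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment (hypothetical data, not from Table 1).
toy = {"taxon_A": "ATGCATGC", "taxon_B": "ATGCTTGC", "taxon_C": "ATGAATGC"}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(toy), "sequences")
```

In a full analysis, a tree is inferred from each replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears.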

Calibration of divergence time
The software jModelTest v.2 (Posada, 2008) was initially run on the CIPRES Science Gateway (Miller et al., 2010) to identify the best-fitting nucleotide substitution model for the data set under the Akaike information criterion (AIC) (Akaike, 1974). The software BEAST 2.0 (Drummond and Rambaut, 2007) was used to carry out the divergence time estimation in a Bayesian framework. According to the selected model, TVM+I+G, the rate (AC = 1.147, AG = 1.057, AT = -0.216, CG = 0.729, CT = 1.057 and GT = 1.0) and shape parameters were assigned (Zharkikh, 1994). The uncorrelated lognormal clock (Drummond et al., 2006) was used to account for the variation of evolutionary rates across lineages, using a lognormal prior with a mean of 0.001 and a standard deviation of 1, and a birth-death model was used as the tree prior to account for speciation and extinction. A total of eight S. dulcamara seed fossils (7.3-2.6 MYA) from the Pliocene epoch (Mai, 1988; Reid and Reid, 1907; Szafer, 1946) and a S. nigrum seed fossil (5.3-11.6 MYA) (van der Beek and van der Burgh, 1987) were used to calibrate two nodes of the time tree developed in the present study. The Time to Most Recent Common Ancestor (TMRCA) for the crown of the tree was calibrated using a lognormal prior with a mean of 0.001, a standard deviation of 0.1 and an offset of 13.3 MYA. The TMRCA for the root of the tree was calibrated to 17 MYA using a lognormal prior. All node calibrations except those based on the fossil data were done as described in Särkinen et al. (2013). The Markov Chain Monte Carlo (MCMC) analysis was run for 100 million generations with a 10% burn-in to achieve maximum chain convergence. The resulting log file was analyzed in Tracer v1.4 (Rambaut and Drummond, 2007) to assess the Effective Sample Size (ESS) and chain convergence. The maximum clade credibility (MCC) tree was visualized and further edited in FigTree v1.4.3 (Rambaut, 2014).
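Model selection under AIC amounts to computing AIC = 2k - 2 ln L for each candidate model and keeping the lowest score. The Python sketch below illustrates this with hypothetical log-likelihoods; the numbers are invented for the example and are not the values obtained in this study. The parameter counts cover only substitution-model parameters, on the assumption that branch-length parameters are shared by all candidates and therefore do not change the ranking.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, free substitution-model parameters) pairs.
# These values are illustrative only and do not come from the study.
candidates = {
    "JC": (-5230.0, 0),
    "HKY+G": (-5105.0, 5),
    "TVM+I+G": (-5080.0, 9),
    "GTR+I+G": (-5079.5, 10),
}

scores = {model: aic(lnl, k) for model, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for model, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{model:10s} AIC = {score:.1f}")
print("best-fitting model:", best)
```

The example shows why a richer model is not always preferred: the extra parameter of GTR+I+G improves the likelihood too little to offset its penalty.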

Phylogenetic Analysis
All the Solanum matK sequences obtained from BLAST searches (E value greater than 98) with the Sri Lankan Solanum spp. were included in this analysis (Table 1). The reading frame of the matK locus contained 1530 base pairs (bp). Sequences with more than 50% missing bp were excluded from the analysis, as Maximum Likelihood (ML) inference can be misled by large amounts of missing data distributed in a non-random fashion (Xi et al., 2015). Although some of the sequences included in the analysis contained missing data, the ML majority rule consensus tree and the Bayesian Maximum Clade Credibility (MCC) tree branched similarly at the major nodes (Figures 1 and 2). The posterior probability (PP) and bootstrap (bs) values were high at these nodes, while a few nodes towards the tips that were present in the ML tree collapsed in the MCC tree, yielding low PP values even though they had high bs values.
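The exclusion rule can be sketched in a few lines of Python. The sketch reads the rule as dropping any sequence with more than half of its aligned sites missing, which is an interpretation of the sentence above, and it treats gaps, `?` and `N` as missing data, which is also an assumption. The sequence names are hypothetical.

```python
MISSING = set("-?Nn")  # characters counted as missing data (an assumption)


def missing_fraction(seq: str) -> float:
    """Fraction of aligned sites in the sequence that are missing."""
    if not seq:
        return 1.0
    return sum(ch in MISSING for ch in seq) / len(seq)


def filter_alignment(alignment: dict, max_missing: float = 0.5) -> dict:
    """Keep sequences whose missing fraction does not exceed max_missing."""
    return {
        name: seq
        for name, seq in alignment.items()
        if missing_fraction(seq) <= max_missing
    }


# Toy alignment with hypothetical names.
toy = {
    "complete": "ATGCATGCAT",
    "partly_missing": "ATG---GCAT",
    "mostly_missing": "A--------T",
}
kept = filter_alignment(toy)
print(sorted(kept))
```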
The initial Maximum Parsimony tree produced by the heuristic search had a well resolved topology. The resolution of the tree was sufficient to use it as the starting tree topology in RAxML. The ML tree separated the analyzed Solanum species (Table 1) into a total of 10 phylogenetic groups, which were congruent with phylogenetic reconstructions obtained using different genes in previously published work (Weese and Bohs, 2007; Särkinen et al., 2013; Bohs and Olmstead, 1997). The 13 Sri Lankan Solanum species were separated into four phylogenetic groups, namely Old world, Morelloids, Acanthophora and Trova (Olmstead and Bohs, 2006). The robustness of these clusters was shown by high node support, with PP values larger than 95% and bs values of 100 (Figure 1). Because our analysis lacked some of the phylogenetic groups of the genus Solanum, the topology at the connections between clades was not well resolved. Thus, sequences of other DNA barcoding loci, as well as matK sequences from more Solanum spp., must be incorporated in future studies to identify robust relationships with higher resolution. However, in the present analysis, the major relationships between prominent sister clades remained constant (Särkinen et al., 2013).

Calibration of Divergence Time and Origination Events
The MCMC chains run in the tree search converged with a 10% burn-in. The initial 100,000 trees were discarded as burn-in. The trees were sampled from the stationary distribution in tree space, as all the calculated ESS values were higher than 200, indicating that the MCMC chains were run long enough (10 million runs) to sample independently and to avoid poor mixing. The divergence time for each node was calibrated according to the TMRCA calibrations.
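The two diagnostics mentioned here, burn-in removal and the effective sample size, can be sketched in plain Python. The ESS estimator below uses a simple autocorrelation sum truncated at the first non-positive lag; this is an assumption made for illustration and is not necessarily the estimator implemented in Tracer. It is quadratic in the chain length, so it is meant for small examples only.

```python
def discard_burnin(samples: list, fraction: float = 0.10) -> list:
    """Drop the first `fraction` of the chain as burn-in."""
    start = int(len(samples) * fraction)
    return samples[start:]


def effective_sample_size(samples: list) -> float:
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simple convention assumed for this sketch.
    """
    n = len(samples)
    if n < 2:
        return float(n)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# A strongly trending (poorly mixed) toy chain versus an alternating one.
trending = [float(i) for i in range(100)]
alternating = [1.0, -1.0] * 50
print(len(discard_burnin(trending)))
print(effective_sample_size(trending) < effective_sample_size(alternating))
```

A chain whose successive samples are highly correlated carries far fewer independent draws than its length suggests, which is why an ESS threshold such as 200 is used as a check on run length.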
The three wild Solanum species in Sri Lanka, S. pubescens, S. violaceum and S. trilobatum, formed a monophyletic group that was sister to the S. melongena group. This group clustered together with S. linnaeanum, forming a well resolved monophyletic group with clades that are saturated with native Australian Solanum spp. The age of the most recent common ancestor of this monophyletic group was calculated as 7.779 MYA [95% highest posterior density (HPD) of 3.158-10.236 MYA]. Interestingly, Sri Lankan S. hispidum and S. torvum together formed a sister clade to Indian S. torvum. The divergence time between these two sister Solanum spp. and their Indian descendant was calculated to be 3.886 MYA (95% HPD of 0.148-8.621 MYA). The uncorrelated relaxed molecular clock estimated that the S. nigrum inhabiting Sri Lanka descended from an Indian S. nigrum 5.424 MYA (95% HPD of 5.30-5.88 MYA). S. melongena, S. pubescens Vern. wal-thibbatu, S. violaceum Vern. thittha thibbatu and S. trilobatum Vern. wel-thibbatu all clustered under the phylogenetic group Old world. This group is closely related to the phylogenetic group Sisymbrifolium, sharing a common ancestor 1.556 MYA in the Miocene epoch.
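The 95% HPD values reported above are, by definition, the shortest intervals that contain 95% of the posterior samples of a node age. A minimal Python sketch of that definition follows; the sample values are made up for illustration and the function is not taken from BEAST or Tracer.

```python
import math


def hpd_interval(samples: list, mass: float = 0.95) -> tuple:
    """Return the shortest interval containing at least `mass` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    best_lo, best_hi = ordered[0], ordered[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    for i in range(1, n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical posterior node ages in MYA with one extreme draw.
ages = [1.0] * 19 + [100.0]
low, high = hpd_interval(ages)
print(low, high)
```

Unlike an equal-tailed interval, the HPD interval excludes the extreme draw in this example, which is why it is preferred for skewed posterior distributions of node ages.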
S. virginianum Vern. Katuwelbatu, which is native to the South Asian region (Pandey et al., 2008), formed a clade with S. aculeatissimum, which is known as a wild Solanum species in India. However, the native range of S. aculeatissimum is believed to be in Africa (Hepper and Jaeger, 1986). It can be argued that mammalian distribution driven by late Neogene climatic changes (Cerling et al., 1997) and the formation of land connections (Qiu et al., 2001) may have introduced African floral species to Asia. In particular, extinct herbivorous mammals such as Stegodon are known to have migrated from Africa to Asia 4 MYA (Ao et al., 2016) and might have been an ideal vector for distributing Solanum species from the Afrotropical region to the Oriental region. The present analysis confirms that the African S. aculeatissimum diverged from its Asian sister taxa 2.007 MYA in the Pleistocene epoch, which parallels the mammalian migration events (Cerling et al., 1997). After the introduction of S. aculeatissimum to the Oriental region, it might have radiated into the Asian region, giving rise to S. virginianum. The species [...] (Rohling et al., 1998); however, according to the present analysis, S. nigrum Vern. Small Kalukenweriya would have been introduced to Sri Lanka before the Pleistocene ice age, probably through over water LDD. The S. nigrum population from India has not mixed with the Sri Lankan population, since this was a very early divergence event. Moreover, the phylogenetic group Trova contained three species in the current analysis. Of these three species, the two Sri Lankan species, S. hispidum Vern. Gonabatu and S. torvum Vern. Thibbatu, were found to be sister taxa that diverged 2.097 MYA in the Pliocene epoch. This group formed a clade with S. torvum (KC535802) isolated from India, implying that S. hispidum and S. torvum split from a common ancestor in India and then colonized Sri Lanka. S. hispidum would have evolved from S. torvum through a major speciation event in the Pleistocene epoch. It is logical to argue that S. hispidum is a more recent species in the nightshade family. A similar pattern is shown by Sri Lankan endemic vertebrate and invertebrate species, which share similar morphological traits although they keep endemic genetic structures by maintaining unique haplotypes within the island, irrespective of numerous chances to mix due to the frequent formation of land bridges at low sea levels (Bossuyt et al., 2004). This suggests that the two S. nigrum populations are evolving into two new species in India and Sri Lanka, but further morphological and molecular phylogenetic analyses are required to resolve their taxonomic status.
It is evident from the present analysis that the diversity of the Sri Lankan Solanum species may also be closely aligned with that of Australian Solanum spp. S. melongena, S. pubescens, S. violaceum, S. trilobatum, S. torvum, S. hispidum and S. virginianum of the Old world clade (Särkinen et al., 2013; Olmstead et al., 2008) were all deeply rooted with a clade that is saturated with numerous native Australian Solanum species, implying that they would have been distributed across Lydekker's and Wallace's Lines, as explained in Toussaint et al. (2015). Although, as described in the "out of India" hypothesis, Gondwana fragmentation moved many taxa from Africa through India to South East Asia (Ali et al., 2013), the common ancestor of this Australian-Asian clade dates back to more recent ages. Thus, the only passageway for mixing Asian flora with Australian flora is through South East Asia. The South East Asian floral diversity is known to originate from east Eurasia and Australia. Large migratory events have been recorded in the mid Miocene epoch, soon after the Sunda plate collided with the Australian plate (Morley, 1998). LDD of plants by migrating birds along routes such as the East Asian-Australasian Flyway is also possible. Moreover, the dispersal routes of the late Neogene may have opened up passageways for animal distribution from South Asia to South East Asia.

CONCLUSION
The current analyses describe, for the first time, the phylogenetic positions, origins and origination events of 13 Sri Lankan wild and naturalized Solanum spp. The Solanum diversity in Sri Lanka is closely connected with the Indian Solanum diversity, although the origins of these clades differ largely from each other. From the molecular dating results, we identified two major means of diversification, namely land bridges of the Pleistocene ice age and over water LDD. Although the same species names are given to the Solanum species in India and Sri Lanka, the geographical isolation events date back to the Pleistocene epoch and before; thus, a comprehensive systematic and morphological revision is needed to distinguish between Indian and Sri Lankan Solanum spp. The worldwide mixing of Solanaceae germplasm through continental and regional bridges and migratory birds would have played a significant role in the group's history. Disturbance to the migratory behavior of birds caused by developmental activities, environmental pollution, habitat fragmentation and global warming may therefore hinder the further reshaping of these germplasm distribution patterns. If these dispersive forces are greatly weakened in the future, the natural evolution of these species would slow down dramatically and become restricted to smaller conservation patches within bio-geographical regions.

Figure 1: The majority rule consensus tree constructed in the Maximum Likelihood (ML) framework. The major phylogenetic groups of Solanum spp. included in this analysis are presented. Node support values are indicated on the respective nodes by black dots (over 95 posterior probability and 100 bootstrap value) or white dots (over 85 posterior probability and 95 bootstrap value). Nodes with posterior probabilities lower than 85 and bootstrap values less than 90 are not shown as dots. The Operational Taxonomic Units of Sri Lankan Solanum spp. are indicated in bold letters.

Figure 2: The Maximum Clade Credibility (MCC) tree showing the divergence times and geographic distribution of the tips (operational taxonomic units). The geological time periods parallel to our divergence dating are given below the tree, and the tree is colored accordingly. The color key below the tree indicates the native range (A) and the current state of distribution (B) of the respective Solanum spp. in the world. The nodes of the tree are labeled with the divergence time (in million years ago). The Operational Taxonomic Units of Sri Lankan Solanum spp. are indicated in bold letters.

Table 1: The matK sequence data used in this study.