INTRODUCTION
At this stage in the history of biology, genomics bears the hallmarks of a mature approach for addressing many interesting scientific questions, as advanced tools enable powerful new strategies to understand complex biological processes such as evolution. Nowadays, many species on our planet can potentially serve as model species and be studied from a genomic point of view. Indeed, the growing availability of non-model genomes stimulates research in evolutionary biology as never before, on both new and long-standing questions, improving our understanding of genome function, speciation and adaptation, and guiding the conservation and management of biodiversity and threatened species around the world (Meadows & Lindblad-Toh 2017).
Mammals play key roles in numerous ecological functions, provide important human benefits such as food, recreation and income, and possess many charismatic characteristics that make them important flagships for conservation efforts (Hoffman et al. 2011). Still, the conservation status of mammals around the world is worrisome, and their diversity and abundance have been rapidly depleted in the face of threats such as habitat loss and overexploitation (Schipper et al. 2008). Threat levels are not uniform across mammalian groups or biogeographic realms (Davidson et al. 2009), as the highest percentage of decreasing species is concentrated in tropical regions, mostly in Southeast Asia and in the Neotropics (Ceballos et al. 2017).
The Neotropical realm includes all of South America, Central America and the insular Caribbean and hosts a large diversity of living mammals: around 1617 recognized species, which represent almost 25% of all extant mammalian species of the world (Burgin et al. 2018). Neotropical mammals include several endemic groups such as caviomorph rodents (capybaras and spiny rats), xenarthrans (sloths, armadillos and anteaters), phyllostomid bats, marsupials (opossums), and platyrrhine monkeys (Patterson & Costa 2012). This remarkable mammalian diversity can be partly attributed to the large diversity of Neotropical landscapes and biomes, which encompass rainforests (both temperate and tropical), deserts, savannahs, scrublands, steppes, and mountains (Tews et al. 2004). The variety of landscapes and the diversity of mammals are the result of a dynamic and extensive history of geological processes, with long periods of South America’s isolation interrupted by a succession of continental connections that allowed faunal exchanges with Africa, Antarctica, Australia and North America at different times (Patterson & Costa 2012; Carrillo et al. 2015). The later, permanent connection with North America initiated the famous massive wave of faunal migration known as the Great American Biotic Interchange (GABI, Marshall et al. 1982; Bagley & Johnson 2014).
Understanding the forces that shaped Neotropical diversity has been the focus of many studies, as this knowledge is crucial to guide conservation actions. In this context, the use of genomic data to answer questions about Neotropical mammals promises to foster our understanding of Neotropical biodiversity and our capability to manage it properly. Despite the progress already achieved for mammalian species across other biogeographic realms, the use of genomic approaches to study Neotropical mammals is still in its early stages, a scenario that is about to change owing to the current effort to sequence critical taxa that were previously in genomic ignorance. Accordingly, this review aims to contribute to the discussion by exploring studies that use genomic approaches to investigate Neotropical mammals. Here, the term genomic will be used broadly to encompass any large-scale survey of genetic variation. Methodological alternatives have been dealt with in other recent reviews (see Pareek et al. 2011; Gasperskaja & Kučinskas 2017) and will not be covered in detail. We first give a brief overview of the evolution of genomic methods and then review the progress made in several fields of Neotropical mammalogy, aiming to outline some of the most fruitful avenues for linking genomics with several scientific questions. Finally, we discuss some caveats and future perspectives in the omic era for Neotropical mammals.
A BRIEF OVERVIEW OF THE OMICS REVOLUTION
DNA sequencing technologies have provided scientists with a myriad of possibilities for genetic investigation over the past 50 years. From 1968, when primer extension methods were used to sequence as many as 12 base pairs of bacteriophage lambda (Wu & Kaiser 1968), to recent technologies that can sequence up to 10k bases per read, such as PacBio (Rhoads & Au 2015), much has changed regarding the limitations and perspectives of many aspects of the biological sciences, especially evolutionary biology and related fields.
Although molecular sequencing arose in the 1960s with protein sequencing, it was only in the mid-1970s that scientists began to obtain hundreds of DNA bases in a single afternoon. At that time, two main methods were generally used to accomplish this: a chain terminator procedure developed by Sanger and Coulson (Sanger et al. 1977) and a chemical cleavage procedure developed by Maxam & Gilbert (1977). Both methods required the sequence to be read manually from a polyacrylamide electrophoresis gel exposed onto X-ray film. By 1987, with the use of automated fluorescent Sanger-based sequencing, more than 10 million bases had already been deposited in GenBank, although a complete mammalian genome was still unthinkable.
In the 1990s the US Congress approved the Human Genome Project (HGP), which took around USD 1 billion and many years to finally publish the first draft of a sequenced mammalian genome, the human genome, in 2001 (Lander et al. 2001). Following the human genome, several other genomes were sequenced, mainly those of particular economic and scientific interest (e.g. rice, Goff et al. 2002; mold, Galagan et al. 2003; and chimpanzee, The Chimpanzee Sequencing and Analysis Consortium 2005), but still at a slow pace and high cost. From 2007 onwards, a real revolution in sequencing technologies took place, leading to what is now known as Next Generation Sequencing (NGS) or massively parallel sequencing, and reducing the previous sequencing cost by as much as four times (Shendure et al. 2017). With technologies such as 454, Illumina and Ion Torrent, it is now possible to sequence the same human genome from the HGP in less than a week, spending no more than USD 4000. Moreover, not only is sequencing cheaper, but technologies like PacBio can also generate DNA reads up to 10k bp long (Rhoads & Au 2015).
Since the beginning of the NGS revolution, and given the relatively low cost of sequencing a single genome, several taxon-based initiatives for genome sequencing have been launched, such as the Earth Microbiome Project (Gilbert et al. 2014), which aims to construct a catalogue of all Bacteria and Archaea on the planet, and several smaller projects for sequencing the genomes of eukaryotic species, such as Genome 10K, which aims to sequence all vertebrate genomes (Koepfli et al. 2015), i5K, which aims to sequence 5000 arthropods (Levine 2011), and 10KP, which aims to sequence 10 000 plant genomes (Cheng et al. 2018), among others. Recently, a very ambitious project was launched, the Earth BioGenome Project (EBP), which aims to sequence, catalog and characterize the genomes of all eukaryotic life on Earth (1.5 million species) over a period of 10 years (Lewin et al. 2018).
Next generation sequencing technologies transformed evolutionary biology and many related areas into an exciting and promising field of knowledge, as scientists can now use organisms that are natural replicates of evolutionary processes to gain a deeper understanding of the evolution of life. Moreover, NGS released non-traditional model systems from genomic ignorance, and there is a consensus that expanding the number and diversity of models is important to the biological agenda (Bolker 2014; Braasch et al. 2014). Today, any species can be a new model to answer long-standing questions in biology, and the addition of newly sequenced genomes provides an ever-increasing framework for comparative analyses, such as those exemplified in the next sections.
MASSIVE DATA: HOW CAN THEY HELP?
The genomic era has already helped to identify and characterize sequences and molecular mechanisms involved in the biological processes of several organisms. Among vertebrates, a great effort has been applied to the generation of mammalian reference genomes, mainly because of their close relationship with humans. This effort is applied to address both new and long-standing questions in evolutionary biology; to improve our understanding of mammalian genome function, speciation, selection and adaptation; to contribute to our understanding of mammalian physiology and health in a comparative context; and to facilitate the conservation and management of biodiversity and harvested populations of mammalian species around the world (Meadows & Lindblad-Toh 2017). To date, we were able to register 198 sequenced mammalian genomes in public databases, and Fig. 1 depicts how these genomes are distributed among mammalian orders. Of these sequenced mammalian genomes, only 15 (7.5%) are from Neotropical mammals (Fig. 1). Table 1 reports in detail all the Neotropical mammalian genomes sequenced so far. Considering that 25% of all mammals are from the Neotropics, they are evidently underrepresented among the sequenced genomes.
In the following sections we illustrate how new sequencing technologies are providing great insights into several fields related to evolutionary biology, such as systematics, molecular ecology, conservation biology, molecular evolution and evo-devo, and how they are taking these fields to a whole new level. We highlight how genomics is providing new insights into the study of mammals in these fields, focusing on Neotropical mammals.
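The degree of under-representation described above can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text (198 sequenced mammalian genomes, 15 of them Neotropical, and roughly 25% of mammal species being Neotropical). The variable names are illustrative, and the "expected" count rests on the simplifying assumption, made here only for illustration, that sequencing effort would be proportional to species richness.

```python
# Figures quoted in the text; variable names are illustrative.
total_genomes = 198               # mammalian genomes registered in public databases
neotropical_genomes = 15          # of which from Neotropical mammals
neotropical_species_share = 0.25  # approximate share of mammal species that are Neotropical

# Observed share of sequenced genomes that come from Neotropical mammals.
observed_share = neotropical_genomes / total_genomes

# Count expected if sequencing effort tracked species richness (an assumption).
expected_genomes = neotropical_species_share * total_genomes

# How many times fewer genomes exist than expected under that assumption.
shortfall_factor = expected_genomes / neotropical_genomes

print(f"observed share: {observed_share:.1%}")
print(f"expected genomes: {expected_genomes:.1f}")
print(f"shortfall factor: {shortfall_factor:.1f}x")
```

Under this proportionality assumption, close to 50 Neotropical genomes would be expected, so the 15 available represent roughly a third of that number. Direct division gives an observed share of about 7.6%, in line with the approximate figure quoted in the text.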
Systematics and the rise of phylogenomics
Reliable phylogenetic trees are an important first step that should precede any biological study. The tree informs us about historical relationships, thus indicating the direction of evolutionary diversification. Traditionally, relationships within mammalian lineages were inferred using morphological characters and, more recently, using molecular data, ranging from a few genes to large amounts of sequence, the so-called phylogenomics (Patané et al. 2018). Phylogenomics uses thousands of concatenated aligned nucleotide positions from hundreds of genes or other genomic regions, supplied by full genome sequences from many organisms, to improve phylogenetic analysis (e.g. McCormack et al. 2011; Foley et al. 2016). Although genomic data have been effective in resolving some incongruent relationships along the mammalian tree (Foley et al. 2016; Chen et al. 2017), large-scale data may be most useful for providing a strongly supported foundation for a given species tree, since the considerable increase in taxa and characters has not met expectations of resolving, for a large number of taxonomic groups, the taxonomic inconsistencies left behind by the use of a few genes (Pyron 2015). The use of genomic data for the construction of phylogenies enables a more accurate inference of the relationships among the different branches, especially those that are difficult to place on a tree because of rapid divergence (e.g. Nery et al. 2012). It also allows a more accurate estimation of divergence times and evolutionary histories, which are crucial for establishing hypotheses on character evolution and for understanding the relationships among taxa. While genomic data will not solve all phylogenetic questions, they help to address caveats such as differences between gene and species trees resulting from incomplete sorting of ancestral polymorphism and from introgression (Chan & Ragan 2013).
The acknowledged importance of genomic-scale data for reconstructing the evolutionary history of life raises concerns about the computational and analytical challenges that emerge with the large amount of genetic data to be analysed (Giribet 2016). To overcome computational challenges, genomes may be subdivided into different types of data with greater or lesser ability to elucidate the phylogenetic relationships among organisms. Recent reviews summarize the most efficient uses of different types of genomic data, such as whole genomes, UCEs and transcriptome sequencing, among others, suggesting that the choice of data type depends on the degree of relatedness of the taxa to be investigated (see Lemmon & Lemmon 2013; Patané et al. 2018).
Most studies of the phylogenetic relationships of endemic Neotropical mammals, such as armadillos (Arteaga et al. 2012; Abba et al. 2015), sloths (de Moraes-Barros et al. 2011), anteaters (Barros et al. 2003; Collevatti et al. 2007), bats (Lim et al. 2003; Clare et al. 2011) and rodents (Upham & Patterson 2012), have relied either on a limited number of sequence-based markers (e.g., mtDNA and selected nuclear loci) or on short tandem repeat loci (i.e., microsatellites). Studies using large-scale data to investigate phylogenetic relationships in Neotropical mammals were first and foremost conducted in primates, and a fairly complete review of the systematics and evolution of New World primates was provided by Schneider & Sampaio (2015). In summary, the relationships of Neotropical primates have been extensively studied with other types of data, but some relationships remained controversial. A highly supported platyrrhine phylogeny was inferred using large molecular data sets or a RAD-sequencing approach, resolving issues concerning platyrrhine species diversification and relationships, including the branching order among families, the relative divergence of genera within families, and the phylogenetic placement of the genus Aotus, and allowing the generation of hypotheses regarding the origin, evolution and diversification of platyrrhine monkeys in the Neotropics (Wildman et al. 2009; Perelman et al. 2011; Kiesling et al. 2015; Valencia et al. 2018). More recently, phylogenomic analyses based on ultraconserved elements (UCEs) were used for a robust reconstruction of the relationships among species of capuchin monkeys (genus Sapajus), which allowed the accurate estimation of divergence times among species and showed that capuchin monkeys evolved rapidly, with high admixture rates (Lima et al. 2018).
Prior to the use of whole genome data, the mitochondrial genome had been widely applied to infer relationships among tropical mammals and also to answer questions regarding the origin and diversification of groups like caviomorph rodents (Voloch et al. 2013), spiny rats (Fabre et al. 2017), sloths (Ruiz-García et al. 2018b), the jaguarundi (Puma yagouaroundi, Ruiz-García et al. 2018c), the white-fronted capuchin (Cebus albifrons, Ruiz-García et al. 2018a), the mountain tapir (Tapirus pinchaque, Ruiz-García et al. 2016) and the squirrel monkeys (Saimiri spp., Chiou et al. 2011). All these phylogenomic studies provide further evidence for complex and simultaneous origins, as well as a rapid and early diversification, of the endemic Neotropical mammal fauna, and help to expand our understanding of the natural processes underlying the origin of Neotropical mammals.
In summary, although the contribution of genomics to phylogenetics has not met the great expectations held for other taxonomic groups, it still seems to have much to contribute to unravelling the evolution of Neotropical mammalian species, with the potential to shed light on studies of population dynamics, species delimitation, adaptive molecular evolution and comparative transcriptomics, to name a few (Pyron 2015).
Molecular ecology: from population genetics to population genomics
Genomic data can also be used to investigate intraspecific genetic diversity in population studies (e.g. population structure and connectivity, population sizes, cryptic population structuring and speciation). Recently, genomic methods such as genotyping-by-sequencing (Davey et al. 2011; Narum et al. 2013), exome sequencing (Li et al. 2010), transcriptome sequencing (Alvarez et al. 2015) and whole-genome resequencing (Begun et al. 2007) have been leading the field into population genomics. These new approaches have boosted the number of mammalian genome assemblies in public databases, which enables population genomic studies owing to the large number of characters available for mapping and for searching for genome-wide polymorphism data. Moreover, large-scale polymorphism analyses may improve the precision and accuracy of results in population genetics (Hendricks et al. 2018) and, unlike other markers, can also detect genomic regions under selection (Ellegren 2014).
Genomics can be very useful in population genetics for understanding which evolutionary processes shaped the genomes of organisms during microevolution, and how, mainly because genomic data provide many neutral characters that carry molecular signatures of the demographic processes that influence genome-wide loci (e.g. population size, gene flow, admixture, inbreeding and outbreeding depression) (Luikart et al. 2003; 2018). However, our ability to process, analyze and interpret genome-wide data advances at a slower pace than the accumulation of data. Luikart et al. (2018) recently described the state of the art of population genomics concepts, divergence models and inference methods for population genomics data sets, and suggested that we are better equipped than ever with appropriate tools to understand the evolution of species and populations, but also that there is room for improvement.
A broad range of questions has been addressed in mammalian lineages using population genomic approaches, such as: (i) to identify effective population size (Kijas et al. 2012), (ii) to detect population declines (Dobrynin et al. 2015), (iii) to estimate gene flow and hybridization rates (Abascal et al. 2016), (iv) to identify evolutionarily significant units (Mason et al. 2016), inbreeding and outbreeding depression (Berenos et al. 2016; Huisman et al. 2016), and adaptive introgression, (v) to identify candidate adaptive loci, and (vi) to improve landscape genomics, among others (Luikart et al. 2018).
The field of molecular ecology is of particular interest in the Neotropical region because of the region's great territorial extent and diversity of environments, both of which foster high biological diversity (Brown 2014). Still, the use of large amounts of data is recent for Neotropical mammals, and most population genetics surveys have relied on mitochondrial DNA and/or a small number of nuclear DNA markers. Nevertheless, there are examples of how our knowledge of the ecology and evolution of Neotropical mammals can benefit from genomic approaches. Analysis of population clustering using UCEs revealed widespread admixture among Sapajus monkey populations within the Amazon and even into the Cerrado and Atlantic Forest (Brazil), showing that the great rivers of the region do not act as a barrier to gene flow between monkey populations (Lima et al. 2018). As another example, SNPs obtained through a genotyping-by-sequencing approach in populations of the Plateau deer mouse (Peromyscus melanophrys) provided valuable information on the genomic diversity and structure of populations residing in a protected area in the Huautla Mountain Range, Mexico. The results suggest a single population with little or no genetic structuring, reinforcing the importance of adequate management and protection in protected areas for this endemic Mexican species (Vega et al. 2017).
Our knowledge of organismal diversification through hybridization has also benefited greatly from genomics (Twyford & Ennos 2012; Seehausen et al. 2014; Luikart et al. 2018), which allows us to describe more accurately genome-wide patterns of divergence or differential introgression within species (Harrison & Larson 2014). For example, focusing on Neotropical carnivorans, Li et al. (2016) used a wide range of genetic data obtained from genome-wide SNP arrays, autosomal, X- and Y-linked variants and complete mitochondrial genomes for 38 Felidae species distributed worldwide. Their results detailed the divergence patterns responsible for the phylogenetic discordance observed for the Leopardus lineage when few molecular data were used, showing a pattern of hybridization among different Leopardus species distributed in Brazil, as well as genetic differentiation between Brazilian species and those that inhabit Central America.
These results reflect the power of population genomic approaches to accurately describe the genetic diversity of Neotropical species. Since mammals are highly diverse in the Neotropics, this tool will help to infer their complex population dynamics, which usually result from rapid evolution across diverse habitats, and to shed light on the evolutionary processes that generate and maintain biodiversity in this realm.
Conservation biology
Genomic studies, in combination with historical and ecological knowledge, improve the precision with which parameters relevant to conservation efforts are estimated (Allendorf et al. 2010; Shafer et al. 2015). By highlighting fundamental information about population dynamics, whole-genome tools can guide management decisions, indicating units important for conservation, assessing the presence or absence of gene flow among populations, detecting local adaptation, and establishing conservation priorities (McMahon et al. 2014; Garner et al. 2016). Still, few studies using genomic information and focusing on mammalian conservation have been carried out to date (Shafer et al. 2015; Vega et al. 2017). These few studies nevertheless highlight the promise of genomic-based information for protecting mammals around the world. For example, genome-wide information and population genomics data were used: (1) to assist in the management of populations with specific demands, such as genetically differentiated populations of white-tailed deer (Odocoileus virginianus, Ambriz-Morales et al. 2016), populations of endangered Sunda pangolins (Manis javanica, Nash et al. 2018), and Amargosa and California voles (Microtus californicus, Krohn et al. 2018); (2) to assess the effectiveness of genetic rescue in a population of bighorn sheep (Ovis canadensis, Miller et al. 2012); and (3) to monitor disease in Tasmanian devils (Sarcophilus harrisii, Miller et al. 2011) and koalas (Phascolarctos cinereus, Johnson et al. 2018).
Although genetic approaches have long been used in conservation research on species from the Neotropics, the use of genome-wide information in studies focused on species preservation is incipient, since researchers are still devoting their efforts to generating reference genomes for endemic Neotropical species. Nardelli & Túnez (2017) recently published a review on the use of genetics applied to biological conservation in the Neotropics, and highlighted the concern that unequal efforts have concentrated studies on a few mammalian species and orders. The authors also emphasize the importance of using interdisciplinary approaches and genomic tools for the conservation of Neotropical mammals (Nardelli & Túnez 2017). As genomic studies continue to advance, we expect a promising scenario for research on mammalian species that will guide conservation actions in the Neotropical region, one of the most diverse regions of the planet.
Molecular evolution: the power of comparative genomics to unravel functional and structural genome evolution
Comparative genomics is one of the most powerful tools for studying the mechanisms behind the evolution of both genome function (e.g. unraveling adaptive signals of evolutionary innovations) and genome structure (e.g. revealing the contraction and expansion of genomes). Its main role is to identify variation in the sequence and structure of orthologous sequences in the genome of a target species against an existing genomic database. To that end, the comparative genomic method searches for differences in protein-coding or regulatory regions, and also identifies differences caused by deletions, insertions, copy number variations and inversions (Ellegren 2008).
Functional evolution
Using multiple genomes from related species is useful for revealing the origin of adaptive traits. Fortunately, the many ongoing genome sequencing projects on non-model organisms, particularly mammals, allow the identification of adaptive evolution specific to a certain lineage, which probably underlies evolutionary novelty at the phenotypic level (Ellegren 2014). In this context, genomic data provide new opportunities for inferring functional adaptation, which in most cases involves ecological change (Reznick & Ghalambor 2001).
Among mammals, some groups are ideal models for understanding the molecular basis of adaptive evolution because of the extreme morphological and physiological transformations they underwent during their evolutionary histories, which make their genomes raw material for elucidating the molecular mechanisms of ecological adaptation and speciation. Adaptive evolution may be achieved through many different pathways and genomic variations, such as the gain or loss of genes and gene families (Zhang 2003). Examples of this phenomenon have been reported in mammals around the world. Tsagkogeorga et al. (2017) used comparative evolutionary analysis of complete proteome data for eight bat species and found evidence of contraction of the Olfactory Receptor (OR) gene repertoire in the last common ancestor of all bats, reflecting the change from olfaction to echolocation. Nery et al. (2014) also showed how gene loss may be adaptive, as they reported an increased rate of pseudogenization in keratin genes, especially those related to hair development, in the cetacean lineage when compared to their terrestrial relatives. On the other hand, Qiu et al. (2012) reported the expansion of proteins related to hypoxic stress in yak species that inhabit high altitudes. Among Neotropical mammals, the recently sequenced capybara genome (Herrera-Alvarez et al. 2018 in press) revealed three significantly expanded gene families related to tumor reversion and cancer suppression by the immune system. The authors argued that these expansions could be a response that reduces the increased cancer risk associated with the evolution of a giant body size.
Genome-scale analyses may also provide insights into the action of selection across genomes. Given adequate species sampling, the reconstructed coding sequences can be examined for ‘footprints’ of selection (e.g. deviations in dN/dS ratios) that may correlate with phenotypic evolution. In the Neotropics, a comparative study of eight mitogenomes of phyllostomid bats suggested large changes in base composition in the mitogenomes of vampire bats, driven by positive selection, when compared with non-hematophagous taxa (Botero-Castro et al. 2018). Another Neotropical example concerns the evolution of the largest Brazilian predator, the jaguar (Panthera onca). Researchers sequenced the genome of P. onca and, in comparative analyses including all living Panthera species, found species-specific signatures of selection in two genes (ESRP1 and SSTR4) related to craniofacial development (Figueiró et al. 2017). The authors correlated this result with the fact that a massive head and strong bite are characteristics that differentiate the jaguar from the rest of the Panthera cats and, accordingly, may have been selected because of its specific diet, which includes large reptiles. In another example, Schneider et al. (2015) investigated molecular footprints of the melanism polymorphism in populations of wild Neotropical felid species to understand the adaptive role of melanism variation within and among felid species. The authors assessed population-level variation in the genomic regions surrounding Agouti signaling protein (ASIP) or the Melanocortin 1 receptor (MC1R) (previously identified as causes of melanism) and showed that three independent melanism mutations have occurred in Leopardus species, driven by different natural selection forces. Analyses of the common marmoset genome (Callithrix jacchus, Worley et al. 2014), in comparison with genomes of apes and Old World monkeys, have revealed positive selection in growth hormone/insulin-like growth factor genes, genes related to metabolic pathways and genes related to the immune system.
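For readers less familiar with the dN/dS ‘footprints’ of selection mentioned in this section, the conventional interpretation of the ratio can be summarized as follows. This is a textbook summary rather than the specific test used in any of the studies cited; the symbol ω is the customary notation, where d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S is the number of synonymous substitutions per synonymous site.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

In practice, most genes show ω well below 1 when averaged over all sites, so lineage- or site-specific elevations of ω are what are typically read as candidate signals of adaptive evolution.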
An intriguing phenomenon that has received more attention recently is evolutionary convergence, in which similar characteristics evolve independently in different evolutionary lineages because of similar selection pressures (Losos 2011; Stern 2013). Understanding the molecular basis of parallel and convergent evolution has indeed become an important focus of comparative genomic studies in the last two decades, shedding light on the origin of shared phenotypes among independent mammalian lineages. Comparative analysis of genomes makes it possible to identify whether the observed phenotypic convergence emerges from convergence at the molecular level, and also whether the phenomenon is adaptive in nature. An example of evolutionary convergence is the use of echolocation by bats and cetaceans. Parker et al. (2013) found evidence of molecular convergence in several genes linked to hearing or deafness and to vision in cetaceans and bats, both lineages of echolocating mammals.
The reduction or loss of genes or gene families has been shown to be adaptive, as mentioned before, and also convergent in different mammalian lineages that underwent similar ecological changes. Examples from mammalian species around the world include the loss of function of genes related to smell and taste in different lineages of marine mammals (Chikina et al. 2016), the pseudogenization of bitter taste receptors in carnivorous tetrapods (Li & Zhang 2013), and that of the umami taste receptors in red pandas and the giant panda, an evolutionary response to an herbivorous diet (Hu et al. 2017). In the Neotropics specifically, we have an example of a shared phenotype resulting from the independent loss of the CMAH gene in hominids and platyrrhine monkeys. This gene is responsible for coding the Neu5Gc mammalian membrane sugar, which is related to self-recognition by the innate immune system and to susceptibility to human diseases, and its independent loss represents an interesting case of parallel evolution (Springer et al. 2014). In another study, Emerling & Springer (2015) conducted a comparative genomic study focusing on Xenarthra, looking for altered phototransduction genes and evidence of rod monochromacy in mammals (i.e. complete color-blindness with poor visual acuity in dim light). They found that the Neotropical sloth Choloepus hoffmanni has inactivated genes that code for rod phototransduction proteins (SWS1 and PD6H genes). These same genes are inactivated in the deep-diving whale Balaenoptera acutorostrata (Meredith et al. 2013), revealing a pattern of evolutionary convergence to life in low-light environments. Although extant sloths do not inhabit underground environments, this discovery reinforces the hypothesis of a fossorial ancestor of xenarthrans.
Comparative genomics has also shown that most conserved sequences in mammalian ge- nomes are non-coding elements rather than protein-coding genes (Cañestro et al. 2007). The availability of new genome sequences also foster research on these conserved non-coding elements (CNEs), allowing their identification and role behind regulatory differences among species, since they usually include regulatory elements that affect the activity of nearby genes. The possibility to study genomic sequences beyond a gene-centered view stimulated the debate whether coding sequences are more important to promote evolution than non-coding sequences (Hoekstra & Coyne 2007, Wray 2007). The sequencing of the Neotropical short-tailed opossum (Monodelphis domestica) (Mikkelsen et al. 2007) and the comparison with eutherian genomes endorsed this debate by revealing a great difference in the contribution to evolutionary innovation between coding sequences and CNEs. Whereas the opossum genome seems to contain most coding genes also found in eutherian genomes, around 20% of eutherian CNEs are recent inventions that have evolved after the divergence of Eutheria and Metatheria, implying that true innovation in protein-coding genes is likely to be relatively rare, whereas CNEs appears as a major source of innovation during the evolution of mammals (Mikkelsen et al. 2007).
It is important to note that understanding the evolution of gene regulation requires information beyond genome sequences, such as experimental data on gene expression. Comparative genomic approaches focusing on regulatory regions should therefore be used to identify candidate regions potentially linked to morphological diversification between ecotypes or species, which can then be tested in the laboratory.
Genome structure
Genome size varies greatly across the tree of life, and several taxonomic groups, such as flowering plants, insects and teleost fishes, show extensive variation in this parameter (Kapusta et al. 2017). Among eukaryotes, mammalian genome sizes are comparatively less variable, ranging from 1.6 to 6.3 Gb (Kapusta et al. 2017). The available data indicate a general evolutionary trend toward smaller genomes in bats when compared to other mammals (Redi et al. 2015), and these small genomes appear to be an adaptation to the metabolic requirements of flight, as also seen in flying birds (Organ et al. 2007).
Genomics can reveal the expansion and contraction of gene families, an important part of the genome architecture of species. Such analyses help us understand the patterns of gene content and gene duplication that ultimately affect species evolution and are an important source of genome diversity and genome size variation (e.g. Petrov 2001; She et al. 2008; Kaessmann 2010; Chen et al. 2013). Genomics has also revealed that the differential accumulation and removal of transposable element (TE) sequences is a major determinant of genome size variation in mammals, and in some species TEs may constitute up to half of the genome (Canapa et al. 2005; Schulman & Kalendar 2005). This information challenges us to move away from a gene-centered view, because coding sequences alone cannot tell the whole story of life or account for organismal complexity (Redi et al. 2015). As the number of Neotropical mammal genomes sequenced at high coverage is still low, to our knowledge TE content analyses are available only for the opossum (52.2%; Mikkelsen et al. 2007), the alpaca (32.1%, similar to other camelids; Wu et al. 2014), the marmoset (the authors do not report the percentage but state that the value is similar to other primates; Worley et al. 2014), and the capybara (37.4%, similar to other rodents; Herrera-Alvarez et al. 2018).
Chromosomes are also an important part of genome architecture, since rearrangements are known to play a relevant role in evolution. Chromosomal studies have long been a subject of scientific interest and encompass chromosome structure, composition, rearrangements and cell processes such as mitosis and meiosis (Ferguson-Smith 2015). For many years, the random-breakage model (i.e. the absence of rearrangement hot spots) was the most widely used (Bailey et al. 2004), but more recent evidence points to the fragile breakage model, which suggests the existence of rearrangement-prone regions that could be enriched in repetitive DNA or duplicated segments (Peng et al. 2006; Deakin 2018). In mammals, the diploid number varies from 2n = 6 in the Indian muntjac (Muntiacus muntjak) to 2n = 100 in some rodents (Kulemzina et al. 2011). In the Neotropics, Zurano et al. (2015) studied the karyotypes of three endemic canids and found great variability in heterochromatic and telomeric sequences, which could explain the karyotypic diversification of canids, a group that exhibits interspecific variation in autosomes and sex chromosomes.
Three Neotropical mammalian groups have received particular attention in studies of chromosomal evolution: marsupials, rodents and bats. Marsupials first arose in South America, and those currently living in the Neotropics belong mostly to the family Didelphidae. Didelphids are known to have conserved diploid numbers (2n = 14, 18 and 22), although they show high diversity in structural features such as heterochromatin distribution, telomeric sequences and transposable elements, among others (Carvalho et al. 2002; Pagnozzi et al. 2002; Faresin-Silva et al. 2017). Neotropical rodents show a greater diversity of heterochromatin patterns and a wider range of diploid numbers, with at least 60 described, ranging from 2n = 14 (Akodon arviculoides) to 2n = 96 (Phyllomys medius). This variation resulted from chromosome rearrangements such as pericentric inversions and Robertsonian rearrangements (Di-Nizo et al. 2017). Finally, the family Phyllostomidae comprises Neotropical bats with diploid numbers ranging from 2n = 14 to 2n = 46 and with multiple sex chromosome systems (Volleth et al. 2002; Sotero-Caio et al. 2015; Sotero-Caio et al. 2017).
Today, many tools are available to make comparative chromosomal studies more powerful. High-coverage genomes will accordingly help to uncover the factors that modulate chromosomal diversity across evolutionary time, allowing researchers working on Neotropical mammals to go beyond descriptive studies.
Evo-devo
To understand the process of evolution, we also need to understand the roles of genes in development (for a review, see Carroll 2008). Evolutionary developmental biology, also known as evo-devo, investigates how development affects the phenotypic variation arising from genetic variation. Most classic research in evo-devo has focused on candidate genes, linking their functional evolution to evolutionary innovations and morphological variation, and on classic approaches such as mutant analysis and comparative gene expression studies. Although these approaches have yielded an important understanding of development, they are not enough to deliver a comprehensive panorama of the genetic basis of development, and we still lack information on how developmental processes merge to create a whole organism. The candidate gene approach also has limitations, because morphological changes are often linked to mutations in several genes or genetic pathways. Later, the evo-devo field benefited greatly from quantitative trait loci (QTL) mapping analysis, which identifies the genetic architecture underlying quantitative traits and provides a different and complementary perspective on the same questions addressed by evo-devo (for a review, see Parsons & Albertson 2013). In contrast to candidate gene approaches, QTLs provide more comprehensive insights into the molecular basis of phenotypic variation. More recently, QTL approaches that make use of the genome sequence information available for an increasing number of species are opening new ways to identify sets of candidate genes related to a given phenotype and to study the evolution of developmental regulatory networks, the core of evo-devo.
In this context, genome-scale analyses are already providing important insights into the action of selection across development, revealing the expansion and contraction of toolkit components, detecting constraints and the role of selection in developmental genes, and uncovering mechanisms underlying developmental change, among other important achievements (for reviews see Cañestro et al. 2007; Garfield & Wray 2009; Artieri & Singh 2010; Brujin et al. 2012; Pantalacci & Sémon 2014). Neotropical mammals are still not being used in evo-devo studies. However, evo-devo, like many fields of the biological sciences, is experiencing an explosion of genomic sequence information, and with decreasing sequencing costs it is only a matter of time until Neotropical mammals become study models in evo-devo, as is already the case for Neotropical fishes such as the South American lungfish (personal communication with Igor Schneider).
Beyond genomics
The evolution of genomes has primarily been addressed by examining genomic sequences and comparing gene content for certain gene families, often between distantly related organisms. The advent of sequencing the set of all RNA molecules, known as the transcriptome, gave access to previously unknown non-coding and coding RNAs and provided a better understanding of how genomes can vary even between phylogenetically close species (Gustincich et al. 2006; Wang et al. 2009). Classical uses of comparative transcriptomics involve searching for genes whose expression changes correlate with phenotypic changes and studying the diversity of gene expression within or between species (Melé et al. 2015; Lowe et al. 2017). Comparative transcriptomics can also be used to predict gene regulatory networks and to shed light on how the evolution of alternative splicing affects the evolution of species, which is an interesting but largely unexplored question (Nilsen & Graveley 2010; Chen et al. 2012). For example, Brawand et al. (2011) compared the transcriptomes and gene expression of several mammals and showed that gene expression varies among organs, lineages and chromosomes owing to differences in selective pressures. These results provide clues to the function and evolution of mammalian genomes. The use of transcriptome data to investigate Neotropical mammals is still scarce. One example is reported by Shaw et al. (2012), who studied the Jamaican fruit bat (Artibeus jamaicensis) and analyzed four different tissue types (including lung, spleen and kidney), identifying a high number of immune genes and providing the first description of 42 miRNAs.
CAVEATS AND FUTURE PERSPECTIVES
As discussed so far, owing to rapid genome-sequencing technologies and decreasing costs, we now have access to a great amount of molecular data for a wide variety of species, which would have been unthinkable two decades ago. Genomics has revolutionized many aspects of the biological sciences, as the use of large-scale molecular sequences facilitates the identification of candidate genes and genomic regions that are important to the development and evolution of a variety of traits, suggesting hypotheses for the origin of phenotypic diversity. However, the genomic approach does not answer all questions in a definitive way. To understand the evolution of organisms, we still require information beyond genome sequences, such as experimental data on transcriptional regulation and gene expression variation, or ecological experiments. In some cases, genomics can be seen as a first exploratory approach that generates lists of candidate genes or genomic regions potentially linked to morphological diversification between species, which can ultimately be tested in the field or laboratory. Regardless of the question addressed in a specific species, a molecular study will at some point aim to turn insights from genetic analyses into functional hypotheses that can be tested in experimental assays. Depending on the species, these assays may or may not be feasible, but advances in laboratory techniques allow functional experiments to be done not necessarily in the species itself, since transgenesis is already commonly used in molecular laboratories (Sosa et al. 2010).
It has been a long journey from the discovery of DNA to the ability to sequence, at relatively affordable costs, the genomes of virtually all organisms on Earth. Still, an exciting and long path lies ahead for all scientists aiming to work with Neotropical mammals using a genomic approach, and we think it will only be a short time until whole-genome analyses of variation are commonplace in ecological and evolutionary studies in the Neotropical realm. With whole-genome information for an ever-increasing number of Neotropical mammals, researchers will be able to identify the role of specific genomic regions in promoting phenotypic diversification among species. With a greater genomic understanding, coupled with laboratory advances such as transgenesis and genome editing approaches, we can expect an auspicious future in which many species, not only those traditionally used as model systems, can enter a phase of functional investigation following their genetic, genomic or transcriptomic characterization. The comparative genomic approach will offer substantial insight into the population-level forces underlying the evolutionary biology and ecology of Neotropical mammals. In the evo-devo field, genome comparisons between closely related species will shed light on which genes are expressed during ontogeny and provide insights into developmental evolution. Broad genomic approaches will be applied to phylogenetic reconstructions and provide new insights into systematics. Moreover, comparative genomics will allow the identification of non-coding elements across the genome that contributed to the recruitment of regulatory modules and networks to novel functions over evolutionary timescales in the evolution of Neotropical biodiversity.
In summary, new and long-standing questions are already being addressed using genomic approaches, but still at a slower pace in Neotropical mammals. Progress in all the research areas mentioned here will be greatly accelerated by a collaborative research program that allows the generation of high-quality, large-scale genomic and transcriptomic reference sequences of mammals from South and Central America. Specific efforts are needed to involve investigators from all of Latin America and elsewhere in addressing scientific questions that will contribute substantially to the discussion on the origin, diversity and distribution of Neotropical mammals. Much remains to be done and a comprehensive panorama is long overdue, but this is even more exciting at a time of increasing availability of genome-scale data. The complexity of life makes it difficult to study on a gene-by-gene basis, and only a comparative genomic approach can provide a global picture of the forces and mechanisms acting during biological evolution. In a context of rapid global climatic change, this look into the past is crucial to predict possible consequences in the future, which has already arrived.