Introduction
In order to accomplish the various steps of fertilization, sperm chromatin undergoes a striking condensation by replacing histones first with transition proteins and then with protamines, a class of positively charged proteins expressed only in male germ cells.1–3 In the course of evolution, this transition has arisen sporadically and independently in several metazoan groups,4 with protamines always present at the upper tips of the phylogenetic tree and the main types of basic proteins widely represented across both invertebrates (such as molluscs, tunicates and insects) and vertebrates.5–12
In humans, approximately 90% of the canonical histones are replaced by protamines during spermatogenesis (reviewed in Castillo et al),13 so that paternal chromatin ends up packaged into three major structural domains: one attached to the nuclear matrix at Matrix Attachment Regions (MARs)14,15; a second one containing the majority of sperm DNA, coiled into toroids by protamines16; and a third one corresponding to approximately 5–15% of the total DNA and bound to the retained histones.17–24 It has been hypothesized that histone retention is nonrandom and plays a role in ensuring proper gene expression, correct development, totipotency and inheritance of DNA methylation in the early embryo.17,25–29
Among the first findings supporting the nonrandomness of histone retention, two studies on the β-globin cluster demonstrated that the isoforms expressed early and those expressed in the adult were associated with histones and protamines, respectively.30,31 Subsequently, it was demonstrated that histone-bound DNA was mainly enriched in: i) exons, compared with flanking introns32; ii) gene-dense regions and developmentally regulated promoters33; iii) development-related gene families24,34; iv) transcription start sites and CpG-rich promoters.35
In apparent contrast, other analyses challenged these results, reporting that the majority of retained nucleosomes were associated with repeat elements, such as LINEs and SINEs,21 pericentric repeats36 and GC-poor, gene-poor regions.7,12 It has also been suggested that these conflicting findings may have resulted from technical issues.37 As a consequence, no final conclusion has yet been reached about the role and localization of histones in sperm chromatin, and the debate is still ongoing.38
In this study, our main aim was to clarify whether the retained histones are preferentially distributed in the GC-rich or in the GC-poor regions of the human genome, and whether this association has a functional meaning. To this end, we calculated the frequencies of sequences and genes associated with mononucleosomes or carrying the epigenetic marks H3K27me3 (repressive mark),24 H3K4me3 and H3K4me2 (activating marks) in human isochores.24 Isochores are defined as megabase-size regions of DNA (> 0.3 Mb, up to several Mb) that are fairly homogeneous in base composition and associated with a number of biological properties,39 such as gene expression and distribution,40 DNA methylation, CpG islands, replication timing and chromatin structure (as reviewed in Bernardi).39
According to the isochore map of human chromosomes, the human genome contains 3,159 isochore bands, a number close to the maximum number of chromosomal bands obtained at the highest resolution in early prophase.41 In vertebrate genomes, isochores are grouped into the following five families characterized by different ranges of GC level39: two GC-rich families, H3 (>53% GC) and H2 (46–53% GC); and three GC-poor families, H1 (41–46% GC), L2 (37–41% GC) and L1 (<37% GC). The GC-poor families, while representing approximately 85% of the genome, constitute the “genome desert” because they are characterized by a low gene density. The remaining 15%, represented by the two GC-rich families, is called the “genome core” since it contains more than half of the genes.41
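As an illustration of how the GC ranges above partition isochores into families, the classification can be sketched in Python as follows. This is a minimal sketch, not the procedure of Costantini et al; the function names are hypothetical, and the assignment of values lying exactly on a boundary (e.g. 46% or 53% GC) is our assumption.

```python
# Isochore family thresholds (% GC) as quoted in the text.
# Boundary handling is an assumption: each family's lower bound is inclusive,
# except H3, which requires strictly more than 53% GC.
GENOME_CORE = {"H2", "H3"}          # GC-rich families
GENOME_DESERT = {"H1", "L2", "L1"}  # GC-poor families


def isochore_family(gc_percent: float) -> str:
    """Return the isochore family for a GC level expressed in percent."""
    if not 0.0 <= gc_percent <= 100.0:
        raise ValueError(f"GC level must be within 0 and 100%, got {gc_percent}")
    if gc_percent > 53.0:
        return "H3"
    if gc_percent >= 46.0:
        return "H2"
    if gc_percent >= 41.0:
        return "H1"
    if gc_percent >= 37.0:
        return "L2"
    return "L1"


def compartment(family: str) -> str:
    """Map an isochore family to the 'genome core' or the 'genome desert'."""
    if family in GENOME_CORE:
        return "genome core"
    if family in GENOME_DESERT:
        return "genome desert"
    raise ValueError(f"unknown isochore family: {family}")


if __name__ == "__main__":
    for gc in (35.0, 39.5, 44.0, 50.0, 58.0):
        fam = isochore_family(gc)
        print(gc, fam, compartment(fam))
```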
Given that both the retained histones and the GC-rich isochores represent approximately 15% of the genome, we hypothesized that their localization might coincide. If this hypothesis were correct, we would expect essentially all of the retained histones to lie within the GC-rich isochores. Contrary to our expectations, we found that histones are also abundantly distributed in the GC-poor isochores. To explain this finding, we examined the distribution of genes carrying epigenetic marks. We found that genes whose function is not limited to the first stages of development are mainly associated with GC-rich isochores, while developmental and tissue-specific genes are associated with GC-poor ones. Our approach, which takes isochore structure into account, provides, for the first time, information about the association between isochores and histone retention in human sperm chromatin, supporting the hypothesis that the distribution of nucleosomes in sperm chromatin is not random and might be related to expression timing in the early phases of development.
Materials and methods
Retrieval of coordinates of selected sequences from the datasets of Hammoud et al and Samans et al12,24
Accession numbers of genes enriched in mononucleosomes and chromatin marks in human sperm chromatin were downloaded from the Supplementary Material of Hammoud et al.24 To extract the gene name, chromosome number, and transcription start and end sites for each accession, we used the Table Browser tool of the UCSC Genome Browser (www.genome.ucsc.edu) with the following selection criteria: Assembly: hg17 (for compatibility with the coordinates of isochores)41,42; Group: Genes and Gene Predictions; Track: RefSeq Genes; Table: refGene; Region: Genome. Entries with no match in refGene were discarded. The output was then inspected manually to remove entries having the same or overlapping coordinates. As a result, for mononucleosomes, we obtained a final dataset of 522 sequences from an initial dataset of 654 entries. For genes carrying the epigenetic marks H3K27me3 and H3K4me2, we obtained gene names and coordinates through the BioMart tool at the Ensembl website (www.ensembl.org/info/data/biomart/index.html). Their positions along the isochores were determined as described in the next section. As a control, we also retrieved these genes with the Table Browser tool, obtaining comparable results (data not shown).
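The manual removal of entries with the same or overlapping coordinates could also be approximated programmatically. The sketch below is an assumption, not the procedure actually followed: `GeneRecord` and `drop_overlapping` are hypothetical names, coordinates are assumed to follow the UCSC half-open convention (0-based start, exclusive end), and when entries overlap the one with the earliest start is kept, which a manual curator might not always choose. The accessions in the usage example are toy placeholders.

```python
from typing import Iterable, List, NamedTuple


class GeneRecord(NamedTuple):
    """Hypothetical minimal subset of the fields of one Table Browser row."""
    accession: str
    name: str
    chrom: str
    tx_start: int  # 0-based start, as in UCSC tables
    tx_end: int    # exclusive end


def drop_overlapping(records: Iterable[GeneRecord]) -> List[GeneRecord]:
    """Remove records that duplicate or overlap an already-kept record.

    Records are scanned per chromosome in order of start coordinate; the
    earliest-starting record is kept and any later record that overlaps a
    kept one is dropped.
    """
    kept: List[GeneRecord] = []
    last_end = {}  # chromosome -> end of the most recently kept record
    for rec in sorted(records, key=lambda r: (r.chrom, r.tx_start, r.tx_end)):
        # Half-open intervals overlap when the new start precedes the kept end.
        if rec.chrom in last_end and rec.tx_start < last_end[rec.chrom]:
            continue
        kept.append(rec)
        last_end[rec.chrom] = rec.tx_end
    return kept


if __name__ == "__main__":
    rows = [
        GeneRecord("NM_A", "GENE1", "chr1", 100, 200),
        GeneRecord("NM_B", "GENE2", "chr1", 150, 250),  # overlaps GENE1
        GeneRecord("NM_C", "GENE3", "chr1", 300, 400),
        GeneRecord("NM_D", "GENE1", "chr1", 100, 200),  # same coordinates as GENE1
        GeneRecord("NM_E", "GENE4", "chr2", 100, 200),
    ]
    print([r.accession for r in drop_overlapping(rows)])
```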
To strengthen our results, we also included in our analyses the sequences provided by Samans et al,12 since these authors challenged the findings of Hammoud et al.24 To this end, we downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) the supplementary file of sample GSM1160359, containing the coordinates of 14,823 peaks12 representing nucleosome binding sites in human sperm chromosomes. These coordinates were used to match the sequences to the isochore bands. Finally, for each dataset, we pooled the data of all chromosomes and calculated the frequency (in %) and the density of mononucleosomes and of chromatin marks (see below), the latter by dividing the number of genes by the length of isochores in Mb.
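The pooled frequency and density calculation can be sketched as below. This is a minimal, hypothetical illustration: it assumes that each gene or peak has already been assigned to an isochore family and that the total length of each family's isochores, pooled over chromosomes, is known. The `scale` parameter is our addition, so that the same function also covers the formula d = 10 × (number of genes/Mb of isochores) given in the next section.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

FAMILIES = ("L1", "L2", "H1", "H2", "H3")


def frequency_and_density(
    assigned: Iterable[str],
    family_length_bp: Dict[str, int],
    scale: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {family: (frequency in %, density)} from data pooled over chromosomes.

    assigned: one isochore family label per gene (or peak); labels outside the
        five families (e.g. None for unassigned genes) are ignored.
    family_length_bp: total length in bp of the isochores of each family.
    Density is scale * count / length in Mb (scale=1 gives genes per Mb).
    """
    counts = Counter(assigned)
    total = sum(counts[f] for f in FAMILIES)
    out = {}
    for fam in FAMILIES:
        freq = 100.0 * counts[fam] / total if total else 0.0
        length_mb = family_length_bp[fam] / 1_000_000
        density = scale * counts[fam] / length_mb if length_mb else 0.0
        out[fam] = (freq, density)
    return out


if __name__ == "__main__":
    # Hypothetical toy input, not data from this study.
    lengths = {"L1": 2_000_000, "L2": 4_000_000, "H1": 2_000_000,
               "H2": 1_000_000, "H3": 1_000_000}
    print(frequency_and_density(["L2", "L2", "H1", "H3"], lengths))
```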
Matching histone-enriched genes and sequences to the isochores
Coordinates of the human isochores and their GC levels were retrieved from Costantini et al.41 These data were used to generate, for each chromosome, a table containing all isochore bands with their positions (nucleotide start/end) and color (we used the color code reported in the literature: H3, red; H2, orange; H1, yellow; L2, light blue; L1, dark blue).39,41–43 This table was used to match each gene to the corresponding isochore band. We then plotted the frequencies of genes associated with mononucleosomes and modified histones. We also calculated the mononucleosome density across the isochore families with the formula d = number of genes/number of isochore bands, obtaining comparable results (data not shown). Finally, we calculated and plotted the densities of genes harboring modified histones per Mb of each isochore family, using the formula d = 10 × (number of genes/Mb of isochores).
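Matching a gene to its isochore band amounts to an interval lookup on a sorted table of bands. The sketch below assumes that each gene is represented by a single coordinate (here, hypothetically, its transcription start site) and that gene and isochore coordinates refer to the same assembly (hg17) and numbering convention; `IsochoreBand`, `BandIndex`, `tss` and `density_per_band` are hypothetical names, and the colors reproduce the code listed above.

```python
import bisect
from typing import Dict, List, NamedTuple, Optional

# Color code used for the isochore families in the text.
FAMILY_COLOR = {"H3": "red", "H2": "orange", "H1": "yellow",
                "L2": "light blue", "L1": "dark blue"}


class IsochoreBand(NamedTuple):
    start: int   # first nucleotide of the band
    end: int     # last nucleotide of the band (inclusive)
    family: str  # "L1", "L2", "H1", "H2" or "H3"


class BandIndex:
    """Per-chromosome lookup of the isochore band containing a coordinate."""

    def __init__(self, bands_by_chrom: Dict[str, List[IsochoreBand]]):
        self._bands = {c: sorted(bs) for c, bs in bands_by_chrom.items()}
        self._starts = {c: [b.start for b in bs] for c, bs in self._bands.items()}

    def find(self, chrom: str, pos: int) -> Optional[IsochoreBand]:
        starts = self._starts.get(chrom)
        if not starts:
            return None
        # Index of the last band starting at or before pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i < 0:
            return None
        band = self._bands[chrom][i]
        return band if pos <= band.end else None  # None if pos falls in a gap


def tss(tx_start: int, tx_end: int, strand: str) -> int:
    """Transcription start: txStart on the + strand, txEnd on the - strand
    (the exact off-by-one depends on the coordinate convention)."""
    return tx_start if strand == "+" else tx_end


def density_per_band(n_genes: int, n_bands: int) -> float:
    """d = number of genes / number of isochore bands."""
    return n_genes / n_bands


if __name__ == "__main__":
    # Hypothetical toy band table, not real isochore coordinates.
    idx = BandIndex({"chr17": [IsochoreBand(1, 1000, "L2"),
                               IsochoreBand(1001, 3000, "H3")]})
    band = idx.find("chr17", tss(1200, 2500, "+"))
    print(band.family, FAMILY_COLOR[band.family])
```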
Results
Distribution and density of mononucleosomes in the isochore families and their comparison with data from Samans et al12,24
We localized the genes associated with unmodified mononucleosomes24 on the isochore table, estimated their number within each isochore family and, after pooling the data for chromosomes 1 to 22, plotted their percent frequencies (Fig. 1a). Nearly half of the genes are located in L2 isochores and a quarter in H1 isochores, whereas only 20% reside in the genome core (H2 and H3 isochores). These results indicate that three-quarters of the genes retaining histones are embedded in GC-poor isochores (H1 and L2), which constitute the genome desert.
It should be noted that the genome desert is not devoid of genes; genes are simply less dense in this compartment than in the genome core.39 The histogram of isochore bands, shown for reference, reveals that the frequency trend of histone-enriched genes follows that of the isochores. The density, in contrast, increases slightly from GC-poor to GC-rich regions (Fig. 2a). Figure 1b reports the results of the analysis of the sequences retrieved from Samans et al12; in this case, the percentage of sequences found in GC-poor isochores is even higher.
Thus, in both datasets the majority of histone-enriched sites are embedded in GC-poor isochores.12,24 This trend, however, is more pronounced in the data from Samans’ group,12 where only 10% of the sites belong to GC-rich isochores. This difference could be attributed to the fact that the sequences from Samans et al also include nongenic regions.12 Consistently, the density shows a bias towards L1 and L2 isochores, which can most probably be attributed to the presence of nongenic regions (Fig. 2b).
Distribution and density of H3K27me3, H3K4me3 and H3K4me2 in isochore families
We estimated the frequency (in %) of genes associated with the chromatin marks H3K27me3 (repressive), H3K4me2 and H3K4me3 (activating) across the isochores (Fig. 3a, b and c, respectively).24 Each column represents the frequency (in %) of genes carrying marked histones in each isochore family of human chromosomes. As for Figure 1, the data for all chromosomes were pooled, and the percent frequency of isochores is reported for reference. Genes associated with H3K27me3 are mostly (∼70%) located in GC-poor isochores, in agreement with the results of Jung et al, who found that this modification is enriched in the gene-poor compartment.44 For H3K4me3, the total number of associated genes is lower, but the distribution is similar to that of the repressive mark. This is consistent with the proposal that these two modifications often co-occur, constituting a bivalent chromatin structure whose function would be to keep genes repressed while maintaining them poised for activation (gene poising).24 By contrast, Figure 3 shows that genes harboring H3K4me2 are enriched in GC-rich isochores.
In Figure 4 we plotted the percent frequencies of the three modifications, together with that of the isochore bands. Remarkably, H3K4me2 and H3K27me3 show opposite trends across the isochore families: the frequency of H3K4me2 follows a trend strikingly opposite to that of the isochore bands, whereas the frequencies of H3K27me3 and H3K4me3 are almost equally distributed along the five isochore families. We also calculated and plotted the density (number of genes/Mb) of each modified histone along the isochore families. The densities of all three modifications correlate with the GC level of isochores (Fig. 5), consistent with the higher gene density of the GC-rich compartment. This is particularly evident for H3K4me2, whose density is highest in H3, probably because this mark is associated with housekeeping genes, which are more frequent in H3 isochores.39
Localization of HOX clusters in human isochores
We obtained the coordinates of the four HOX clusters (which are associated with the H3K27me3 mark)24 from the GeneCards website (http://www.genecards.org) and determined their positions along the isochores. HOXB and HOXD reside in L2 isochores, whereas HOXA and HOXC belong to L1 isochores. Thus, all four loci are embedded in GC-poor isochores. Figure 6 shows the example of the HOXB cluster on chromosome 17: although the HOXB genes are GC-rich, they reside in a GC-poor isochore. The other HOX clusters gave similar results (data not shown).
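Whether a whole cluster, rather than a single coordinate, lies within GC-poor isochores can be checked by listing every band that the cluster overlaps. In this minimal sketch the band table is a hypothetical toy example, not real chromosome 17 data, and a cluster is counted as embedded in the genome desert only if all of the bands it overlaps belong to H1, L2 or L1; `Band`, `overlapping_bands` and `embedded_in_gc_poor` are hypothetical names.

```python
from typing import List, NamedTuple

GC_POOR = {"H1", "L2", "L1"}  # genome desert families


class Band(NamedTuple):
    start: int   # first nucleotide (inclusive)
    end: int     # last nucleotide (inclusive)
    family: str


def overlapping_bands(bands: List[Band], start: int, end: int) -> List[Band]:
    """Bands of one chromosome that overlap the closed interval [start, end]."""
    return [b for b in bands if b.start <= end and start <= b.end]


def embedded_in_gc_poor(bands: List[Band], start: int, end: int) -> bool:
    """True if the interval overlaps at least one band and all such bands are GC-poor."""
    hits = overlapping_bands(bands, start, end)
    return bool(hits) and all(b.family in GC_POOR for b in hits)


if __name__ == "__main__":
    # Hypothetical toy band table, not real chromosome 17 coordinates.
    toy = [Band(1, 100, "H2"), Band(101, 500, "L2"), Band(501, 900, "H3")]
    print(embedded_in_gc_poor(toy, 150, 300))  # True: inside the L2 band
    print(embedded_in_gc_poor(toy, 450, 600))  # False: spans L2 and H3
```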
Discussion
Sperm chromatin has long attracted the interest of the scientific community, especially for its peculiar structure and its relevance to studies on infertility, totipotency and embryo development. The prevailing view that the packaging of the paternal genome has a functional meaning is supported by several lines of evidence; for example, abnormalities in protamine and histone patterning are linked to infertility in both mouse and human and to incorrect embryo development.45–48 Furthermore, it has been reported that histones in sperm chromatin are characterized by a level of plasticity similar to that of embryonic stem cells, thanks to epigenetic marks and posttranslational modifications;11,49–51 hence, they might have a role in transgenerational epigenetic inheritance. Despite this interest, our knowledge of the function and localization of the determinants of sperm chromatin is still incomplete. In fact, as mentioned in the Introduction, while some studies suggest that histones are bound mostly to promoters and genes,24,30,33,34 others propose that histone retention in sperm is more frequent in repeats and gene-poor regions.12,21,36,52
In this study, for the first time, genomic sequences that retain histones in human sperm chromatin have been mapped along the isochore families in order to correlate their distribution with base composition.12,24 Our results show that most of the sequences, and a large proportion of the genes, associated with mononucleosomes, H3K27me3 and H3K4me3 are located in the GC-poor isochores L2 and H1 (∼68% of the human genome). Nonetheless, the density of genes retaining histones increases from the GC-poor to the GC-rich compartment, possibly because gene density is higher in the latter.39
In interpreting these results, it should be taken into account that: i) approximately half of the genes (including genes with high GC%) reside in the GC-poor, gene-poor compartment39; ii) in embryo samples, expression begins predominantly from genes located in GC-poor isochores53; and iii) genes constitute only a small percentage of the genome,39 so nucleosomes are also expected in nongenic regions. In this regard, the association of nucleosomes with repetitive sequences could play a role in postfertilization processes (reviewed in Castillo et al),13 in the three-dimensional organization of male gamete chromatin and in the inheritance of paternal chromatin structure.54
Interestingly, regarding point i) above, a comparison of the plot showing nucleosome retention with the isochore plot12,43 shows that most of the GC peaks associated with nucleosomes are actually located in GC-poor isochores. This is particularly clear for the HOX clusters, which are GC-rich sequences (appearing as GC peaks) embedded in larger GC-poor isochores.
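The distinction between local GC peaks and the GC level of the surrounding isochore can be illustrated with a sliding-window GC calculation. In the sketch below, the sequence is a hypothetical toy example, the window size, step and 10-point margin used to call a peak are arbitrary assumptions, and the background GC level is taken from the toy sequence itself rather than from the isochore GC levels of Costantini et al; this is not the method used to build the plots cited above.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("window contains no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(start, GC%) for each full window of the given size, advancing by step."""
    return [(i, gc_percent(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def gc_peaks(seq: str, window: int, step: int, background_gc: float,
             margin: float = 10.0) -> List[Tuple[int, float]]:
    """Windows whose GC exceeds the background (isochore) GC level by >= margin points."""
    return [(i, gc) for i, gc in windowed_gc(seq, window, step)
            if gc >= background_gc + margin]


if __name__ == "__main__":
    # Hypothetical toy sequence: a GC-rich stretch inside an AT-rich background.
    toy = "AT" * 50 + "GC" * 25 + "AT" * 50
    background = gc_percent(toy)
    print(background, gc_peaks(toy, window=50, step=50, background_gc=background))
```

With the toy sequence, the whole sequence has 20% GC, yet the 50-nt window starting at position 100 is called as a peak with 100% GC, mirroring a GC-rich gene inside a GC-poor isochore.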
Regarding point ii), Barton and coworkers showed that during the transition from early to late preimplantation stages,55 GC-rich isochores increase their expression activity while GC-poor ones decrease it. Moreover, it has been reported that most genes expressed during mouse brain development are located in GC-poor and LINE-rich regions.56,57 On the basis of these lines of evidence, it has been proposed that genes located in GC-poor isochores are actively expressed during development and are silenced by heterochromatinization at the end of this process.58,59 In this context, we speculate that the presence of bivalent chromatin (H3K27me3/H3K4me3) in GC-poor regions could facilitate this process.
Conclusions
In recent years, interest in, and progress on, sperm genetics and epigenetics have grown considerably, clarifying that histones persisting in sperm are not simply remnants of an inefficient replacement process but contribute to the paternal information delivered to the offspring. Nevertheless, there is an ongoing debate regarding the genomic location of retained histones.12,24,34,37,38,52 Our work proposes a “background approach”, in which we focused on the genome as a whole through the isochores; this approach allowed us to move beyond the opposition between histones associated with genes and histones associated with nongenic sequences, as we considered regions much larger than single sequences. Resolving this debate was not our goal; however, on the basis of our results, we conclude that the two views might be compatible.
We show that in human sperm DNA, genes and sequences enriched in histones are more frequent in GC-poor isochores, with the exception of genes associated with the activating mark H3K4me2. It has been reported that H3K4me3 marks genes related to development, organogenesis and tissue specificity, whereas H3K4me2 is more frequently associated with housekeeping genes.24 This is consistent with the fact that tissue-specific and developmental genes are enriched in GC-poor isochores, whereas housekeeping genes are more frequent in GC-rich ones.39
On the basis of these observations, we suggest that housekeeping genes, whose expression is not limited to the first stages of development, tend to reside in GC-rich isochores. In contrast, tissue-specific and developmental genes, whose expression is time-limited, may take advantage of a structure consisting of bivalent nucleosomes and GC-poor isochores, where they can easily be switched to a silent state immediately after their transcription. Histone retention in the sperm genome is therefore likely to be nonrandom and could be correlated with the isochores in which the genes are embedded and, ultimately, with the period in which these genes have to be expressed.60
Perspectives
This work opens new perspectives on the importance of isochore structure for the localization of genomic determinants, such as nucleosome-enriched sequences in sperm chromatin. Mapping histone-associated sequences along the chromosomes could help to reveal possible clustering, to match these sequences with determinants of three-dimensional structure, such as matrix attachment regions and lamina-associated domains (LADs), and, especially, to discover signals that might specify retention. It would also be interesting to investigate possible correlations between isochores, chromatin marks, other epigenetic regulators (such as DNA methylation and attachment to the nuclear matrix) and gene expression in the sperm chromatin of animal models. A future step will be to characterize in detail the genomic localization of single genes according to their expression activity during spermatogenesis or during the early zygotic stages.
Another relevant point for evolutionary and comparative genomics would be to understand how sperm chromatin marks are distributed along the isochores in other animal models, such as zebrafish, which do not use protamines to compact sperm chromatin.61 Studying animal models that differ in the extent of nucleosomal compaction and in isochore organization could increase our understanding of sperm packaging from an evolutionary point of view.
Declarations
Conflict of interest
The authors have no conflicts of interest related to this publication.
Authors’ contributions
AV conceived the research, produced, analyzed and interpreted data, and wrote the manuscript. SA participated in data interpretation and in correcting the manuscript.