Phages are diverse and abundant within microbial communities, where they play major roles in community evolution and adaptation. Phage replication is generally thought to be restricted to a single host or a narrow host range. Here we use published and newly generated proximity-ligation-based metagenomic Hi-C (metaHiC) data from various environments to explore virus–host interactions. We reconstructed 4,975 microbial and 6,572 phage genomes of medium quality or higher. MetaHiC yielded a contact network between genomes and enabled assignment of approximately half of phage genomes to their hosts, revealing that a substantial proportion of these phages interact with multiple species in environments as diverse as the oceanic water column or the human gut. This observation challenges the traditional view of a narrow host spectrum of phages by unveiling that multihost associations are common across ecosystems, with implications for how phages might impact ecology and evolution, and for phage therapy approaches.
Genome organization plays an important role in silencing compacted, heterochromatinized genes in the most virulent human malaria parasite, Plasmodium falciparum. However, it remains unclear how these genes spatially cluster or whether active genes are also organized in a specific manner. We used Micro-C to achieve near-nucleosome resolution DNA–DNA contact maps, which revealed previously undescribed inter- and intrachromosomal heterochromatic and euchromatic structures in the blood-stage parasite. We observed subtelomeric fold structures that facilitate interactions among heterochromatinized genes involved in antigenic variation. In addition, we identified long-range intra- and interchromosomal interactions among active, stage-specific genes. Both structures are mediated by AP2-P, an ApiAP2 DNA-binding factor, and a putative MORC chromatin remodeler, and functional specificity is achieved via combinatorial binding with other sequence-specific DNA-binding factors. This study provides insight into the organizational machinery used by this medically important eukaryotic parasite to spatially coordinate genes underlying antigenic variation and to co-activate stage-specific genes.
The composition of genomic sequences, such as GC content, nucleotide motifs, and repeats, varies from one species to another and within the same genome. Composition correlates with gene transcriptional activity and chromosome organization, and all genome sequences have coevolved with the chromatin-associated complexes they encode to precisely regulate these two features. However, when foreign DNA (including exogenous mobile elements and natural or artificial genes) invades or integrates into a host nucleus, it encounters regulatory mechanisms and rules under which it has not evolved. How host cells process and eventually adopt these unfamiliar exogenous sequences remains largely unexplored.
Multiciliated cells (MCCs) ensure fluid circulation in various organs. Their differentiation is marked by the amplification of cilia-nucleating centrioles, driven by a genuine cell-cycle variant, which is characterized by wave-like expression of canonical and non-canonical cyclins such as Cyclin O (CCNO). Patients with CCNO mutations exhibit a subtype of primary ciliary dyskinesia called reduced generation of motile cilia (RGMC). Here, we show that Ccno is activated at the crossroads of the onset of MCC differentiation, the entry into the MCC cell-cycle variant, and the activation of the centriole biogenesis program. Its absence blocks the G1/S-like transition of the cell-cycle variant, interrupts the centriologenesis transcription program, and compromises the production of centrioles and cilia in mouse brain and human respiratory MCCs. Altogether, our study identifies CCNO as a core regulator of entry into the MCC cell-cycle variant and the interruption of this variant as one etiology of RGMC.
Meiosis, endoreplication, and asynthetic fissions are variations of the canonical cell cycle where either replication or mitotic divisions are muted. Here, we identify a cell cycle variant, conserved across organs and mammals, in which both replication and mitosis are muted, and that orchestrates the differentiation of post-mitotic progenitors into multiciliated cells (MCCs). MCC progenitors reactivate most of the cell cycle transcriptional program but replace the temporal expression of cyclins E2 and A2 with non-canonical cyclins O and A1. In addition, the primary APC/C inhibitor Emi1 is silenced. Re-expressing cyclins E2 and A2 and/or Emi1 can induce partial replication or mitosis. This shows that a cell can co-opt the cell cycle genetic program and regulate only certain elements to qualitatively and quantitatively divert CDK activity toward differentiation rather than division. We propose that this cell cycle variant exploits the existence of a cytoplasmic (or centriolar) CDK threshold lower than the S-phase threshold.
The tidyCoverage R package provides a framework for intuitive investigation of collections of genomic tracks over genomic features, relying on the principle of tidy data manipulation. It defines two data structures, CoverageExperiment and AggregatedCoverage classes, directly extending the SummarizedExperiment fundamental class, and introduces a principled approach to exploring genome-wide data. This infrastructure facilitates the extraction and manipulation of genomic coverage track data across individual or multiple sets of thousands of genomic loci. This allows the end user to rapidly visualize track coverage at individual genomic loci or aggregated coverage profiles over sets of genomic loci. tidyCoverage seamlessly combines with the existing Bioconductor ecosystem to accelerate the integration of genome-wide track data in epigenomic analysis workflows. tidyCoverage emerges as a valuable tool, contributing to the advancement of epigenomics research by promoting consistency, reproducibility, and accessibility in data analysis.
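The two operations described above, extracting coverage around a set of loci and aggregating it into a single profile, can be illustrated with a minimal Python sketch. This is not the tidyCoverage API (the package itself is written in R); the function names `extract_windows` and `aggregate_profile`, the per-base coverage list, and the fixed `flank` size are all assumptions made for illustration.

```python
def extract_windows(coverage, centers, flank):
    """Return one coverage window per locus, centered on each position.

    `coverage` is a per-base list of values for one chromosome (assumed
    layout); loci whose window would run off either end are skipped.
    """
    windows = []
    for center in centers:
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(coverage):
            continue  # window not fully contained in the chromosome
        windows.append(coverage[lo:hi])
    return windows


def aggregate_profile(windows):
    """Average the windows position by position into a single profile."""
    if not windows:
        raise ValueError("no windows to aggregate")
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]
```

Under these assumptions, each window corresponds to track coverage at an individual locus, and the averaged profile corresponds to aggregated coverage over the whole set of loci.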
The growth of omic data presents evolving challenges in data manipulation, analysis and integration. Addressing these challenges, Bioconductor provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming offers a revolutionary data organization and manipulation standard. Here we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analyzing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas, spanning six data frameworks and ten analysis tools.
Genome-wide chromatin conformation capture assays provide formidable insights into the spatial organization of genomes. However, due to the complexity of the data structure, their integration in multi-omics workflows remains challenging. We present data structures, computational methods and visualization tools available in Bioconductor to investigate Hi-C, micro-C and other 3C-related data, in R. An online book (https://bioconductor.org/books/OHCA/) further provides prospective end users with a number of workflows to process, import, analyze and visualize any type of chromosome conformation capture data.
A multitude of proteins bind to DNA at regulatory regions to control the expression of neighbouring genes. Their binding can be revealed by chromatin accessibility assays such as DNase-seq, MNase-seq or ATAC-seq. Plotting the size of the sequencing fragments generated by these assays relative to the center of regulatory regions produces two-dimensional fragment density plots called V-plots. Such plots can reveal nucleosome positioning or transcription factor binding sites at regulatory regions. Here, we present VplotR, an R package to easily generate V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The use of VplotR will improve our understanding of molecular organization at regulatory regions.
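The counting step behind a V-plot can be sketched in a few lines of Python. This is an illustrative sketch only, not VplotR's implementation (VplotR is an R package): the function name `vplot_counts`, the half-open `(start, end)` fragment representation, and the `flank` and `max_size` defaults are assumptions.

```python
from collections import Counter


def vplot_counts(fragments, centers, flank=100, max_size=300):
    """Count fragments per (offset from center, fragment size) cell.

    `fragments` are half-open (start, end) intervals on one chromosome
    and `centers` are the midpoints of regulatory regions (assumed inputs).
    """
    counts = Counter()
    for start, end in fragments:
        size = end - start
        if size > max_size:
            continue  # ignore fragments longer than the plotted range
        midpoint = start + size // 2
        for center in centers:
            offset = midpoint - center
            if -flank <= offset <= flank:
                counts[(offset, size)] += 1
    return counts
```

Drawing the resulting counts as a heatmap, with offset on the x-axis and fragment size on the y-axis, gives the two-dimensional fragment density plot described above.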
Periodic occurrences of oligonucleotide sequences can impact the physical properties of DNA. For example, DNA bendability is modulated by 10-bp periodic occurrences of WW (W = A/T) dinucleotides. We present periodicDNA, an R package to identify k-mer periodicity and generate continuous tracks of k-mer periodicity over genomic loci of interest, such as regulatory elements. periodicDNA will facilitate investigation and improve understanding of how periodic DNA sequence features impact function.
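One way to quantify such periodicity is to histogram the pairwise distances between WW dinucleotide occurrences and measure the strength of a given period in that histogram with a discrete Fourier term. The Python sketch below illustrates this idea under stated assumptions; it is not the periodicDNA implementation, and the function names and the `max_dist` cutoff are hypothetical.

```python
import cmath
import math

WW = {"AA", "AT", "TA", "TT"}  # W = A/T


def ww_positions(seq):
    """Return start positions of all WW dinucleotides in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW]


def distance_histogram(positions, max_dist):
    """Count pairwise distances (1..max_dist) between sorted positions."""
    hist = [0] * (max_dist + 1)
    for idx, a in enumerate(positions):
        for b in positions[idx + 1:]:
            d = b - a
            if d > max_dist:
                break  # positions are sorted, so later ones are farther
            hist[d] += 1
    return hist


def power_at_period(hist, period):
    """Power of one Fourier component of the mean-centered histogram."""
    values = hist[1:]  # distances 1..max_dist
    mean = sum(values) / len(values)
    coeff = sum(
        (v - mean) * cmath.exp(-2j * math.pi * d / period)
        for d, v in enumerate(values, start=1)
    )
    return abs(coeff) ** 2 / len(values)
```

For a sequence in which WW dinucleotides recur every 10 bp, `power_at_period(hist, 10)` is expected to exceed the power at unrelated periods such as 7 or 13. Harmonics of the period (for example 5) can score equally high, so a fuller analysis would inspect the whole spectrum rather than a single value.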
RNA profiling has provided increasingly detailed knowledge of gene expression patterns, yet the different regulatory architectures that drive them are not well understood. To address this, we profiled and compared transcriptional and regulatory element activities across five tissues of Caenorhabditis elegans, covering ∼90% of cells. We find that the majority of promoters and enhancers have tissue-specific accessibility, and we discover regulatory grammars associated with ubiquitous, germline, and somatic tissue–specific gene expression patterns. In addition, we find that germline-active and soma-specific promoters have distinct features. Germline-active promoters have well-positioned +1 and −1 nucleosomes associated with a periodic 10-bp WW signal (W = A/T). Somatic tissue–specific promoters lack positioned nucleosomes and this signal, have wide nucleosome-depleted regions, and are more enriched for core promoter elements, which largely differ between tissues. We observe the 10-bp periodic WW signal at ubiquitous promoters in other animals, suggesting it is an ancient conserved signal. Our results show fundamental differences in regulatory architectures of germline and somatic tissue–specific genes, uncover regulatory rules for generating diverse gene expression patterns, and provide a tissue-specific resource for future studies.
Nuclear compartments have diverse roles in regulating gene expression, yet the molecular forces and components that drive compartment formation remain largely unclear [1]. The long non-coding RNA Xist establishes an intra-chromosomal compartment by localizing at a high concentration in a territory spatially close to its transcription locus [2] and binding diverse proteins [3–5] to achieve X-chromosome inactivation (XCI) [6,7]. The XCI process therefore serves as a paradigm for understanding how RNA-mediated recruitment of various proteins induces a functional compartment. The properties of the inactive X (Xi)-compartment are known to change over time, because after initial Xist spreading and transcriptional shutoff a state is reached in which gene silencing remains stable even if Xist is turned off [8]. Here we show that the Xist RNA-binding proteins PTBP1 [9], MATR3 [10], TDP-43 [11] and CELF1 [12] assemble on the multivalent E-repeat element of Xist [7] and, via self-aggregation and heterotypic protein–protein interactions, form a condensate [1] in the Xi. This condensate is required for gene silencing and for the anchoring of Xist to the Xi territory, and can be sustained in the absence of Xist. Notably, these E-repeat-binding proteins become essential coincident with transition to the Xist-independent XCI phase [8], indicating that the condensate seeded by the E-repeat underlies the developmental switch from Xist-dependence to Xist-independence. Taken together, our data show that Xist forms the Xi compartment by seeding a heteromeric condensate that consists of ubiquitous RNA-binding proteins, revealing an unanticipated mechanism for heritable gene silencing.
Cancer is characterized by genomic instability leading to deletion or amplification of oncogenes or tumor suppressors. However, most of the altered regions are devoid of known cancer drivers. Here, we identify lncRNAs frequently lost or amplified in cancer. Among them, we found amplified lncRNA associated with lung cancer (ALAL-1) as frequently amplified in lung adenocarcinomas. ALAL-1 is also overexpressed in additional tumor types, such as lung squamous carcinoma. The RNA product of ALAL-1 is able to promote the proliferation and tumorigenicity of lung cancer cells. ALAL-1 is a TNFα- and NF-κB–induced cytoplasmic lncRNA that specifically interacts with SART3, regulating the subcellular localization of the protein deubiquitinase USP4 and, in turn, its function in the cell. Interestingly, ALAL-1 expression inversely correlates with the immune infiltration of lung squamous tumors, while tumors with ALAL-1 amplification show lower infiltration of several types of immune cells. We have thus unveiled a pro-oncogenic lncRNA that mediates cancer immune evasion, pointing to a new target for immune potentiation.
An essential step for understanding the transcriptional circuits that control development and physiology is the global identification and characterization of regulatory elements. Here, we present the first map of regulatory elements across the development and ageing of an animal, identifying 42,245 elements accessible in at least one Caenorhabditis elegans stage. Based on nuclear transcription profiles, we define 15,714 protein-coding promoters and 19,231 putative enhancers, and find that both types of element can drive orientation-independent transcription. Additionally, more than 1,000 promoters produce transcripts antisense to protein-coding genes, suggesting involvement in a widespread regulatory mechanism. We find that the accessibility of most elements changes during development and/or ageing and that patterns of accessibility change are linked to specific developmental or physiological processes. The map and characterization of regulatory elements across C. elegans life provide a platform for understanding how transcription controls development and ageing.
Since the discovery of chromosome territories, it has been clear that DNA within the nucleus is spatially organized. During the last decade, a tremendous body of work has described architectural features of chromatin at different spatial scales, such as A/B compartments, topologically associating domains (TADs), and chromatin loops. These features correlate with domains of chromatin marking and gene expression, supporting their relevance for gene regulation. Recent work has highlighted the dynamic nature of spatial folding and investigated mechanisms of their formation. Here we discuss current understanding and highlight key open questions in chromosome organization in animals.