Create and document a new S4 class AggregatedCoverage;
Put together the set of functions from Day 1, 2 and 3 into a single constructor of AggregatedCoverage objects;
Create a method to handle plotting for your new S4 class.
Tip
At any time, if you are lost or do not understand how functions in the proposed solution work, type ?<function> in the R console and a help menu will appear.
You can also check the help tab in the corresponding quadrant.
Reminder
We aim to create a package which can plot the aggregated coverage of a genomic track over a set of GRanges of interest (at a fixed width).
4.1 Framework
Functions developed in Day 2 work well for a specific pair of .bigwig/.bed files. However, the more familiar you become with Bioconductor, the more you will realize that your genomic features of interest (TSSs, gene bodies, regulatory elements, binding motifs, …) may already be imported in R.
To make this package more “usable” by a broader Bioconductor audience, we will move to a more “Bioconductor-friendly” framework. We will create a single S4 class named AggregatedCoverage. The AggregatedCoverage constructor function will take 3 inputs:
The path to a single bigwig file;
The path to a single feature file (e.g. a bed or a narrowPeak file);
A width to use to recover the coverage around the center of each feature.
The coverage (from the bigwig file) over each genomic feature (from the feature file) will be extracted, then the mean signal +/- confidence interval (CI) scores will be computed, similarly to what has been done in Day 2.
The AggregatedCoverage class will be a direct adaptation of the SummarizedExperiment class. It will contain a colData (referring to the “samples”, i.e. the bigwig file imported as RleList), a rowData (describing the genomic distance to the center of each genomic range of interest) and exactly 3 assays: mean, lowCI and upCI.
4.2 Preparing colData
The colData slot of a SummarizedExperiment-derived object should be a data.frame. Each row represents an individual sample and each column describes a variable associated with each sample.
Question
Create a colData object for a single sample, e.g. the Scc1-vs-input bigwig file.
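One possible answer is sketched below. The file path and the column names (`sample`, `bigwig`) are assumptions made for illustration; adapt them to where the bigwig file actually lives in your package.

```r
# Hypothetical path: point this at the Scc1-vs-input bigwig file
# shipped with your package (e.g. somewhere under inst/extdata).
bigwig_path <- "path/to/Scc1-vs-input.bw"

# One row per sample, one column per variable describing the sample.
# The column names are illustrative, not prescribed by SummarizedExperiment.
colData <- data.frame(
    sample = "Scc1-vs-input",
    bigwig = bigwig_path,
    row.names = "Scc1-vs-input"
)
colData
```

Setting explicit row names is a design choice: they become the column names of the final object, which makes the single sample easy to identify.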
4.3 Preparing rowData
The rowData slot of a SummarizedExperiment-derived object should also be a data.frame. In our case, each row will represent the genomic distance to the center of the set of genomic ranges of interest.
Question
Create a data.frame which contains a single distance column: a numerical vector centered at 0 whose length equals a provided width variable (e.g. for width == 2000, the distance column would be the sequence from -1000 to 999, which has a length of 2000).
Answer
width <- 2000
rowData <- data.frame(distance = seq(-width / 2, width / 2 - 1, by = 1))
4.4 Preparing assays
The assays stored in a SummarizedExperiment-derived object should be a list of numerical matrices, with as many rows as rows in the matching rowData and as many columns as rows in the matching colData.
In our case, each matrix will represent a different metric:
The mean signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)
The upper confidence interval value of the signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)
The lower confidence interval value of the signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)
Question
Prepare a list of three matrices as described above. Use the functions defined in the previous exercises and the datasets provided as extdata.
Each matrix should have 1 column, since there is only 1 example bigwig.
Each matrix should have as many rows as the chosen width variable.
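The shape of the expected list can be sketched with simulated data. Here a random matrix stands in for the coverage extracted with the Day 2 functions (which are not reproduced here), and the confidence interval is a normal-approximation 95% CI, which is an assumption; substitute whatever CI computation your own functions use.

```r
set.seed(42)
width <- 2000
n_features <- 50

# Simulated stand-in for the extracted coverage:
# one row per genomic feature, one column per position around the center.
cov_mat <- matrix(
    rpois(n_features * width, lambda = 5),
    nrow = n_features, ncol = width
)

# Per-position mean and half-width of a normal-approximation 95% CI
# (assumption: replace with the CI used in your Day 2 functions).
pos_mean <- colMeans(cov_mat)
ci_half <- qnorm(0.975) * apply(cov_mat, 2, sd) / sqrt(n_features)

# Helper turning a vector into a 1-column matrix named after the sample.
as_col <- function(x) {
    matrix(x, ncol = 1, dimnames = list(NULL, "Scc1-vs-input"))
}

# One matrix per metric: `width` rows, 1 column (a single bigwig).
assays <- list(
    mean  = as_col(pos_mean),
    lowCI = as_col(pos_mean - ci_half),
    upCI  = as_col(pos_mean + ci_half)
)
vapply(assays, dim, integer(2))
```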
Everything is now ready to manually create an AggregatedCoverage object.
Question
Define a new AggregatedCoverage class, containing a SummarizedExperiment-derived object with two additional slots, named features (storing a GRanges object corresponding to the imported genomic loci of interest) and width (storing the width at which each genomic locus is resized).
You can now manually create a new AggregatedCoverage object with the methods::new() function. Start by creating a SummarizedExperiment, and then fill out the two extra slots manually.
Tip: You can edit the slots using the object@slot <- ... notation.
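A minimal sketch of the class definition and of the manual creation could look as follows. The toy values (a width of 10, a `toy` sample, two ranges on `chrI`) are hypothetical and only stand in for the real rowData, colData and assays prepared in the previous steps. `DataFrame()` is the S4Vectors counterpart of `data.frame()`.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and S4Vectors

# Class definition: a SummarizedExperiment with two extra slots.
setClass("AggregatedCoverage",
    contains = "SummarizedExperiment",
    slots = c(features = "GRanges", width = "numeric")
)

# Toy inputs (hypothetical values, for illustration only).
width <- 10
rowData <- DataFrame(distance = seq(-width / 2, width / 2 - 1, by = 1))
colData <- DataFrame(sample = "toy", row.names = "toy")
toy <- matrix(seq_len(width), ncol = 1, dimnames = list(NULL, "toy"))

# Step 1: build a regular SummarizedExperiment.
se <- SummarizedExperiment(
    assays = list(mean = toy, lowCI = toy - 1, upCI = toy + 1),
    rowData = rowData,
    colData = colData
)

# Step 2: promote it to the new class, then fill the extra slots manually.
AC <- methods::new("AggregatedCoverage", se)
AC@features <- GRanges("chrI", IRanges(start = c(100, 500), width = width))
AC@width <- width

# Direct slot assignment bypasses validity checks, so re-validate explicitly.
validObject(AC)
AC
```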
Instead of calling the methods::new() function manually, developers generally provide constructor functions to instantiate an S4 object.
Question
Wrap all the previous steps together in a single constructor function. By convention, this constructor function is named after the class of object it creates, here AggregatedCoverage.
The constructor should take three arguments:
The path to a single bigwig file;
The path to a single feature file (e.g. a bed or a narrowPeak file);
A width to use to recover the coverage around the center of each feature.
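One way such a constructor might be put together is sketched below. It does not reuse the Day 1 to 3 functions (they are not reproduced here); instead it inlines equivalent steps, which is an assumption about what those functions do. The argument names `bigwig` and `features`, the normal-approximation 95% CI, and the choice to drop ranges falling outside the imported coverage are also illustrative choices.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and S4Vectors

setClass("AggregatedCoverage",
    contains = "SummarizedExperiment",
    slots = c(features = "GRanges", width = "numeric")
)

AggregatedCoverage <- function(bigwig, features, width) {
    # Import the track as an RleList and the features as a GRanges.
    cov <- rtracklayer::import(bigwig, as = "RleList")
    gr <- rtracklayer::import(features)

    # Resize every feature to a fixed width around its center.
    gr <- resize(gr, width = width, fix = "center")

    # Keep only ranges lying entirely within the imported coverage.
    chr_len <- lengths(cov)[as.character(seqnames(gr))]
    keep <- unname(!is.na(chr_len) & start(gr) >= 1 & end(gr) <= chr_len)
    gr <- gr[keep]
    stopifnot(length(gr) > 0)

    # Coverage matrix: one row per feature, one column per position.
    mat <- do.call(rbind, lapply(cov[gr], as.numeric))

    # Mean and normal-approximation 95% CI (an assumption of this sketch).
    m <- colMeans(mat)
    half <- qnorm(0.975) * apply(mat, 2, sd) / sqrt(nrow(mat))

    sample_name <- basename(bigwig)
    as_col <- function(x) {
        matrix(x, ncol = 1, dimnames = list(NULL, sample_name))
    }
    se <- SummarizedExperiment(
        assays = list(
            mean = as_col(m),
            lowCI = as_col(m - half),
            upCI = as_col(m + half)
        ),
        rowData = DataFrame(distance = seq(-width / 2, width / 2 - 1, by = 1)),
        colData = DataFrame(bigwig = bigwig, row.names = sample_name)
    )
    methods::new("AggregatedCoverage", se, features = gr, width = width)
}
```

With this sketch, a call such as `AggregatedCoverage("path/to/track.bw", "path/to/features.bed", width = 2000)` (hypothetical paths) would return an object with 2000 rows and a single column.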
4.7 Implement this class and constructor in the package
Question
Add the new class definition to your package source code in a file named AllClasses.R.
Add the new constructor to your package source code in a file named AggregatedCoverage.R.
Document each file accordingly.
4.8 Plot AggregatedCoverage objects
Now that we have a dedicated class to store aggregated coverage signal metrics over a set of genomic features of interest, we can provide a plot method, so that passing an AggregatedCoverage object to the plot() generic function dispatches it to the right plotting function.
Question
Create a method with setMethod(<generic>, <class>, <fct>) to plot AggregatedCoverage objects. You can reuse the plotCoverage() function defined in Day 2.
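A minimal sketch of such a method is shown below. Because the signature of plotCoverage() from Day 2 is not shown here, base graphics stand in for it; in your package, the body of the method would more likely just call plotCoverage(). The toy object at the end uses hypothetical values and only serves to exercise the method.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and S4Vectors

setClass("AggregatedCoverage",
    contains = "SummarizedExperiment",
    slots = c(features = "GRanges", width = "numeric")
)

# plot() method: mean signal as a line, confidence interval as a shaded band.
setMethod("plot", signature(x = "AggregatedCoverage", y = "missing"),
    function(x, y, ...) {
        d  <- rowData(x)$distance
        m  <- assay(x, "mean")[, 1]
        lo <- assay(x, "lowCI")[, 1]
        up <- assay(x, "upCI")[, 1]
        # Empty frame first, so the band can be drawn under the line.
        graphics::plot(d, m, type = "n", ylim = range(lo, up),
            xlab = "Distance to feature center", ylab = "Mean coverage", ...)
        graphics::polygon(c(d, rev(d)), c(lo, rev(up)),
            col = "grey80", border = NA)
        graphics::lines(d, m)
        invisible(x)
    }
)

# Toy object (hypothetical values) to try the method.
d <- seq(-5, 4)
toy <- matrix(exp(-d^2 / 8), ncol = 1, dimnames = list(NULL, "toy"))
se <- SummarizedExperiment(
    assays = list(mean = toy, lowCI = toy - 0.1, upCI = toy + 0.1),
    rowData = DataFrame(distance = d),
    colData = DataFrame(sample = "toy", row.names = "toy")
)
ac <- methods::new("AggregatedCoverage", se,
    features = GRanges("chrI", IRanges(100, width = 10)), width = 10)
plot(ac)
```

Declaring `y = "missing"` in the signature restricts the method to calls of the form `plot(object)`, which is the usual pattern when the object already carries both the x and y values.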