4 Using S4 classes in a package

Aims

Create and document a new S4 class AggregatedCoverage;
Put together the set of functions from Day 1, 2 and 3 into a single constructor of AggregatedCoverage objects;
Create a method to handle plotting for your new S4 class

Tip

At any time, if you are lost or do not understand how functions in the proposed solution work, type ?<function> in the R console and a help menu will appear.

You can also check the help tab in the corresponding quadrant.

Reminder

We aim to create a package which can plot the aggregated coverage of a genomic track over a set of GRanges of interest (at a fixed width).

4.1 Framework

Functions developed in Day 2 work well for a specific couple of .bigwig/.bed files. However, the more used to Bioconductor you get, the more you will realize that you might already have your genomic features of interest (TSSs, genebodies, regulatory elements, binding motifs, …) already imported in R.

To make this package more “usable” by a broader Bioconductor audience, we will move to a more “Bioconductor-friendly” framework. We will create a single S4 class named AggregatedCoverage. The AggregatedCoverage constructor function will take 3 inputs:

The path to a single bigwig file;
The path to a single feature file (e.g. a bed or a narrowPeak file);
A width to use to recover the coverage around the center of each feature.

The coverage (from the bigwig file) over each genomic feature (from the feature file) will be extracted, then the mean signal +/- confidence interval (CI) scores will be computed, similarly to what has been done in Day 2.

The AggregatedCoverage class will be a direct adaptation of the SummarizedExperiment class. It will contain a colData (refering to the “samples”, i.e. the bigiwg file imported as RleList), a rowData (describing the genomic distance to the center of the each genomic range of interest) and exactly 3 assays: mean, lowCI and upCI.

4.2 Preparing colData

The colData slot of a SummarizedExperiment-derived object should be a data.frame. Each row represents an individual sample and each column describes a variable associated with each sample.

Question

Create a colData object for a single sample, e.g. the Scc1-vs-input bigwig file.

Answer

bw_file <- system.file("extdata", "Scc1-vs-input.bw", package = "JacquesTestPackage")
colData <- data.frame(
    file = bw_file,
    sample = 'Scc1-vs-input', 
    type = 'ChIPseq', 
    target = 'Scc1' 
)

4.3 Preparing rowData

The rowData slot of a SummarizedExperiment-derived object should also be a data.frame. In our case, each row will represent the genomic distance to the center of the set of genomic ranges of interest.

Question

Create a data.frame which contains a single distance column, which will be a numerical vector centered at 0 and whose length is half of a provided width variable (e.g. for width == 2000, the distance column would be a sequence from -1000 to 999 (length of 2000))

Answer

width <- 2000
rowData <- data.frame(
    distance = seq(-width/2, width/2-1, by = 1)
)

4.4 Preparing `assays`

The assays stored in a SummarizedExperiment-derived object should be a list of numerical matrices, with as many rows as rows in the matching rowData and as many columns as rows in the matching colData.

In our case, each matrix will represent a different metric:

The mean signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)
The upper confidence intervale value of a signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)
The lower confidence intervale value of a signal of a sample (from colData) at the corresponding distance from a genomic range of interest (from rowData)

Question

Prepare a list of three matrices as indicated hereabove. Use the functions defined in the previous exercises and the datasets provided as extdata.

Each matrix should have 1 column, since there is only 1 example bigwig.
Each matrix should have as many rows as the chosen width variable.

Answer

bed_file <- system.file("extdata", "Scc1-peaks.narrowPeak", package = "JacquesTestPackage")
l <- importFiles(bw_file, bed_file, width = width) |> 
    filterGRanges()
df <- computeCoverage(l)

assays <- list(
    'mean' = matrix(df$mean, ncol = 1), 
    'upCI' = matrix(df$ci_high, ncol = 1), 
    'lowCI' = matrix(df$ci_low, ncol = 1)
)

4.5 Build an `AggregatedCoverage` object

Everything is now ready to manually create an AggregatedCoverage object.

Question

Define a new AggregatedCoverage class, containing a SummarizedExperiment-derived object with two additional slots, named features (storing a GRanges object corresponding to the imported genomic loci of interest) and width (storing the width at which each genomic locus is resized).

Answer

library(SummarizedExperiment)
methods::setClass(
    "AggregatedCoverage", 
    contains = c("SummarizedExperiment"), 
    slots = list(
        features = 'GRanges', 
        width = 'integer'
    )
)

Question

You can now manually create a new AggregatedCoverage object with the methods::new() function. Start by creating a SummarizedExperiment, and then fill out the two extra slots manually.

Tip: You can edit the slots using the object@slot <- ... notation.

Answer

AC <- methods::new(
    "AggregatedCoverage",
    SummarizedExperiment::SummarizedExperiment(
        rowData = rowData,
        colData = colData,
        assays = assays
    )
)
AC@features <- l[['features']]
AC@width <- as.integer(width)
AC

4.6 Build an `AggregatedCoverage` constructor

Instead of manually using the methods::new() function, developers generally provide constructor functions to initiate an S4 object.

Question

Wrap all the previous steps together in a single constructor function. By convention, this constructor function is named after the class of object it creates, here AggregatedCoverage.

The constructor should take three arguments:

The path to a single bigwig file;
The path to a single feature file (e.g. a bed or a narrowPeak file);
A width to use to recover the coverage around the center of each feature.

Answer

AggregatedCoverage <- function(bw_file, features_file, width) {
    colData <- data.frame(
        file = bw_file
    )
    rowData <- data.frame(
        distance = seq(-width/2, width/2-1, by = 1)
    )
    l <- importFiles(bw_file, features_file, width = width) |> 
        filterGRanges()
    df <- computeCoverage(l)
    assays <- list(
        'mean' = matrix(df$mean, ncol = 1), 
        'upCI' = matrix(df$ci_high, ncol = 1), 
        'lowCI' = matrix(df$ci_low, ncol = 1)
    )
    AC <- methods::new(
        "AggregatedCoverage",
        SummarizedExperiment::SummarizedExperiment(
            rowData = rowData,
            colData = colData,
            assays = assays
        )
    )
    AC@features <- l[['features']]
    AC@width <- as.integer(width)
    return(AC)
}
AC <- AggregatedCoverage(bw_file, bed_file, width)
AC

4.7 Implement this class and constructor in the package

Question

Add the new class definition to your package source code in a file named AllClasses.R.
Add the new constructor to your package source code in a file named AggregatedCoverage.R.
Document each file accordingly.

4.8 Plot `AggregatedCoverage` objects

Now that we have a dedicated class to store aggregated coverage signal metrics over a set of genomic features of interest, we can provide a plot method that will be used to dispatch an AggregatedCoverage object to the right function when passed to the plot() generic function.

Question

Create a method with setMethod(<generic>, <class>, <fct>) to plot AggregatedCoverage objects. You can reuse the plotCoverage() function defined in Day 2.

Answer

setMethod("plot", "AggregatedCoverage", function(x) {
    df <- lapply(seq_len(ncol(x)), function(K) {
        data.frame(
            file = SummarizedExperiment::colData(x)$file[K],
            K = K,
            distance = SummarizedExperiment::rowData(x)$distance,
            score = SummarizedExperiment::assay(x, 'score')[, K],
            upCI = SummarizedExperiment::assay(x, 'upCI')[, K], 
            lowCI = SummarizedExperiment::assay(x, 'lowCI')[, K]
        )
    }) |> dplyr::bind_rows()
    ggplot(df, mapping = aes(
        x = distance, y = score, ymin = lowCI, ymax = upCI,
        col = basename(file), fill = basename(file)
    )) +
        geom_line() +
        geom_ribbon(col = NA, alpha = 0.2)
})

plot(AC)

Question

Add this method to your package source code in a file named AllMethods.R.
Document the method accordingly.

4.1 Framework

4.2 Preparing colData

4.3 Preparing rowData

4.4 Preparing assays

4.5 Build an AggregatedCoverage object

4.6 Build an AggregatedCoverage constructor

4.7 Implement this class and constructor in the package

4.8 Plot AggregatedCoverage objects

4.4 Preparing `assays`

4.5 Build an `AggregatedCoverage` object

4.6 Build an `AggregatedCoverage` constructor

4.8 Plot `AggregatedCoverage` objects