Primary URL:
https://ribocrypt.org/ (this leads to the browser page)
Welcome to RiboCrypt
RiboCrypt is an R package for interactive visualization in genomics. RiboCrypt works with any NGS-based method, but much emphasis is put on Ribo-seq data visualization.
This tutorial will walk you through usage of the app.
The RiboCrypt app currently supports creating interactive browser views of NGS tracks, using ORFik, RiboCrypt and massiveNGSpipe as the backend.
The browser is the main coverage plot display page. It contains a click panel on the left side and display panels on the right. It displays coverage of NGS data in either transcript coordinates (default), or genomic coordinates (like IGV). Each part will now be explained:
The display panel shows the primary settings (study, gene, sample, etc.). The possible select boxes are:
Each experiment usually has multiple libraries. Select which ones to display; by default, if you select multiple libraries, they are shown under each other.
Libraries are by default named:
The resulting name above could be:
Note that if the condition is KO (knockout), the fraction column usually contains a gene name (the name of the gene that was knocked out). Currently, the best way to find the SRR run number for a given sample is to go to the metadata tab and search for the study.
Here additional options are shown:
When you press “plot”, the data is displayed using the options specified in the display panel. The output contains the following parts:
This section collects the analysis tools, which usually operate at whole-genome scale.
This tab displays a heatmap of codon dwell times over all selected genes, for both A and P sites. Pressing “Differential” switches to a differential comparison of codon dwell times between libraries (this method requires at least 2 selected libraries!)
Study and gene selection work the same as for the browser above, with the additional option to select all genes (default).
- Select libraries (multiple allowed)
This tab displays a heatmap of coverage per read length at a specific region (like the start site of coding sequences) over all selected genes.
Study and gene selection work the same as for the browser above, with the additional option to select all genes (default).
Here additional options are shown:
- 5’ extension (extend the viewed window upstream from the point, default 30)
- 3’ extension (extend the viewed window downstream from the point, default 30)

Extension works as follows: first, the window is extended in transcript coordinates. Beyond the gene end, it is extended in genomic coordinates. If a chromosome boundary is reached, that gene is removed from the full set.
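To make the extension rule concrete, here is a minimal Python sketch of it (RiboCrypt itself is written in R, and none of these names are its API). It assumes a plus-strand transcript, 1-based coordinates, and that `point` is a transcript coordinate such as the CDS start; the `Transcript` record and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical transcript record (plus strand only, for simplicity)."""
    name: str
    tx_length: int      # spliced transcript length in nt
    genomic_start: int  # 1-based genomic position of the transcript's first nt
    genomic_end: int    # 1-based genomic position of the transcript's last nt
    chrom_length: int   # length of the chromosome the transcript lies on


def extend_window(tx: Transcript, point: int,
                  upstream: int = 30, downstream: int = 30) -> Optional[dict]:
    """Extend a window around `point` (1-based transcript coordinate).

    Returns the part of the window inside the transcript plus how many nt
    spill into the genomic flanks, or None if a chromosome boundary is hit.
    """
    start = point - upstream
    end = point + downstream
    # 1) Use transcript coordinates as far as the transcript reaches.
    tx_window = (max(1, start), min(tx.tx_length, end))
    # 2) Whatever lies beyond the transcript ends is taken from the genome.
    genomic_up = max(0, 1 - start)
    genomic_down = max(0, end - tx.tx_length)
    # 3) Drop the transcript if the genomic flank would leave the chromosome.
    if (tx.genomic_start - genomic_up < 1
            or tx.genomic_end + genomic_down > tx.chrom_length):
        return None
    return {"tx_window": tx_window,
            "genomic_upstream": genomic_up,
            "genomic_downstream": genomic_down}


def extend_all(transcripts, points, upstream=30, downstream=30):
    """Apply extend_window to every transcript, removing those that fail."""
    kept = {}
    for tx in transcripts:
        window = extend_window(tx, points[tx.name], upstream, downstream)
        if window is not None:
            kept[tx.name] = window
    return kept
```

For a minus-strand transcript the same logic applies with the genomic flanks mirrored.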
Given an experiment with at least one design column with two values, like wild type (WT) vs. knockout (of a specific gene), you can run differential expression analysis of genes. The output is an interactive plot in which you can also search for your target genes, making it more usable than conventional expression plots, which are often very hard to read.
Organism and experiment are explained above.
- Differential method: “FPKM ratio” is a plain FPKM ratio calculation without normalizing for other factors (like batch effects), a fast and crude check. “DESeq2” gives a robust analysis, but only works for experiments with a valid experimental design (i.e. the design matrix must be full rank; see the DESeq2 tutorial for details!)
- Select two conditions (which 2 factors to group by)
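The “FPKM ratio” method can be illustrated with the textbook FPKM formula. This Python sketch assumes that replicate FPKMs are averaged per condition and adds an optional pseudocount to avoid division by zero; both are assumptions, the function names are hypothetical, and the app's exact computation may differ.

```python
import math
from statistics import mean


def fpkm(count: int, length_nt: int, library_size: int) -> float:
    """Fragments per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_nt * library_size)


def fpkm_log2_ratio(counts_a, libsizes_a, counts_b, libsizes_b,
                    length_nt, pseudocount=1.0):
    """log2(mean FPKM in condition A / mean FPKM in condition B) for one gene.

    No covariate (e.g. batch) normalization is done, in line with the
    fast and crude nature of the method.
    """
    fpkm_a = mean(fpkm(c, length_nt, s) for c, s in zip(counts_a, libsizes_a))
    fpkm_b = mean(fpkm(c, length_nt, s) for c, s in zip(counts_b, libsizes_b))
    # The pseudocount (an assumption) avoids log2(0) and division by zero.
    return math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))
```

For example, a 1,000 nt gene with 200 reads in each KO library and 100 reads in each WT library (all libraries of 1 million reads) gives a log2 ratio of 1 with `pseudocount=0.0`.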
Displays all samples for a specific organism over the selected gene. This tab does not load coverage from bigwig files (as that would be very slow); instead, it uses precomputed fst files of coverage over all libraries. Note: not all isoforms are precomputed; by default, only the longest isoform is.
Organism, experiment and gene are explained above.
- Group on: the metadata column to order the plot by
- K-means clusters: how many k-means clusters to use. If > 1, samples are sorted by group within each cluster, but the k-means clusters take priority.
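The ordering rule can be sketched as a sort on a tuple key, assuming cluster labels have already been assigned (the k-means step itself is omitted). Putting the cluster first gives it priority, and the “Group on” column only orders samples within a cluster. The function and field names are hypothetical.

```python
def order_samples(samples, group_on, clusters=None):
    """Return sample IDs in plotting order.

    samples:  list of dicts with an "id" key and metadata columns
    group_on: metadata column to order by (the "Group on" setting)
    clusters: optional {sample_id: k-means cluster label}; when given,
              clusters take priority and group_on only orders within them
    """
    def key(sample):
        if clusters is None:
            return (sample[group_on],)
        return (clusters[sample["id"]], sample[group_on])

    # sorted() is stable, so ties keep their original relative order.
    return [s["id"] for s in sorted(samples, key=key)]
```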
This tab gives the statistics of an overrepresentation analysis per cluster. Using a chi-squared test, it gives the residual per metadata term (like tissue, cell line, etc.). A residual above +3 means the term is quite certainly overrepresented in that cluster (a residual below -3 correspondingly indicates underrepresentation).
If no clustering was applied, this tab gives the number of samples per metadata term (e.g. 40 brain samples, 30 kidney samples, etc.).
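As an illustration, this Python sketch computes Pearson residuals, (observed - expected) / sqrt(expected), for a cluster-by-term count table; this is what R's `chisq.test()` reports as `residuals`. Whether the app uses these or the standardized residuals is an assumption here, and the function names are hypothetical.

```python
import math


def pearson_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected).

    table[i][j] = number of samples in cluster i annotated with term j.
    Assumes every row and column has at least one count.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count if cluster and term were independent.
            expected = row_totals[i] * col_totals[j] / n
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


def overrepresented(table, clusters, terms, threshold=3.0):
    """(cluster, term, residual) triples whose residual exceeds +threshold."""
    res = pearson_residuals(table)
    return [(clusters[i], terms[j], r)
            for i, row in enumerate(res)
            for j, r in enumerate(row)
            if r > threshold]
```

With two clusters of 55 samples each, split 50/5 and 5/50 between brain and kidney, the residuals are about +4.3 on the diagonal, so brain is flagged for the first cluster and kidney for the second.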
This mode is very CPU intensive, so it requires certain precomputed results in the backend, namely:
- Premade collection experiments (an ORFik experiment of all experiments per organism)
- Premade collection count table and library sizes (for normalization purposes)
- Premade fst-serialized coverage calculation per gene (for instant loading of coverage over thousands of libraries)
Note that on the live app, the human collection (4000 Ribo-seq samples) takes around 30 seconds to plot for a gene of ~2,000 nucleotides; ~99% of that time is spent rendering the plot, not on actual computation. Optimization will be investigated in the future.
This tab displays a QC of p-shifted coverage per read length at a specific region (like the start site of coding sequences) over all selected genes.
The display panel shows what can be specified; the select boxes are the same as for the heatmap above.
When you press “plot”, the data is displayed using the options specified in the display panel. The output contains the following parts:
Top plot: read length relative usage
Bottom plot: Fourier transform (3-nt periodicity quality; a clean peak means good periodicity)
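Both QC plots can be sketched in plain Python. Here read length usage is expressed as a percentage of all reads (an assumption about how “relative” is defined), and the periodicity check uses a naive discrete Fourier transform of a mean-centered metagene coverage profile; for good Ribo-seq data the strongest peak should sit at a period of 3 nt. The function names are hypothetical.

```python
import cmath
import math


def read_length_usage(counts_by_length):
    """Percentage of reads per read length (top plot)."""
    total = sum(counts_by_length.values())
    return {length: 100 * count / total
            for length, count in counts_by_length.items()}


def periodicity_spectrum(profile):
    """Power per period (nt) from a naive DFT of a mean-centered profile."""
    n = len(profile)
    avg = sum(profile) / n
    centered = [x - avg for x in profile]
    spectrum = {}
    for k in range(1, n // 2 + 1):  # skip k = 0, the mean removed above
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centered))
        spectrum[n / k] = abs(coef) ** 2  # key: period in nt
    return spectrum


def dominant_period(profile):
    """Period (nt) with the highest power (bottom plot's main peak)."""
    spectrum = periodicity_spectrum(profile)
    return max(spectrum, key=spectrum.get)
```

A profile with a strong frame-0 signal, such as `[10, 2, 1] * 20`, has its dominant period at 3 nt.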
This tab displays the FASTQ QC output from fastp as an HTML page.
In the display panel, you can select organism, study and library.
Displays the HTML page.
The Metadata tab displays information about studies and custom predictions.

## SRA search
Search SRA for full information about a supported study.
Here you input a study accession number in the form of either:
The study abstract is displayed at the top, and a table of all metadata found for the study is displayed at the bottom.
Full table of supported studies with information about sample counts
Full list of predicted translons on all_merged tracks per species.
Translon annotation scheme:
All files are packed into ORFik experiments for easy access through the ORFik backend package:
File formats used internally in experiments are:
For our web page, the processing pipeline used is massiveNGSpipe, which wraps multiple tools:
If you’re not familiar with terms like “p-shifting” or “p-site offset”, it’s best to walk through the ORFikOverview vignette, especially chapter 6, “RiboSeq footprints automatic shift detection and shifting”.
RiboCrypt uses the shiny router system to create runnable links, support backward navigation in the web browser, etc. The API specification is as follows:
https://ribocrypt.org/ (this leads to the browser page)
Pages are selected with “#” followed by the page short name. The available pages are:
Example: https://ribocrypt.org/#tutorial sends you to this tutorial page
Settings can be specified using standard URL query parameters:
Example: https://RiboCrypt.org/?dff=all_merged-Homo_sapiens&gene=ATF4-ENSG00000128272#browser opens the browser with gene ATF4 selected (all other settings at their defaults).
A more complicated call would be: https://RiboCrypt.org/?dff=all_merged-Homo_sapiens&gene=ATF4-ENSG00000128272&tx=ENST00000404241&frames_type=area&kmer=9&go=TRUE&extendLeaders=100&extendTrailers=100&viewMode=TRUE&other_tx=TRUE#browser
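Such links can also be generated programmatically. This is a minimal sketch using Python's standard `urllib.parse.urlencode`; the parameter names are taken from the example URLs above, boolean settings are passed as the strings `TRUE`/`FALSE` as in those URLs, and the helper name `ribocrypt_link` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://RiboCrypt.org/"


def ribocrypt_link(page: str, **settings) -> str:
    """Build a RiboCrypt link: settings as query parameters, '#page' last.

    Booleans are expected as the strings "TRUE"/"FALSE", as in the
    example URLs; other values are converted with str().
    """
    query = urlencode(settings)
    return f"{BASE_URL}?{query}#{page}" if query else f"{BASE_URL}#{page}"
```

For example, `ribocrypt_link("browser", dff="all_merged-Homo_sapiens", gene="ATF4-ENSG00000128272")` reproduces the first example URL above, and passing `tx`, `frames_type`, `kmer`, `go`, `extendLeaders`, `extendTrailers`, `viewMode` and `other_tx` in that order reproduces the more complicated one.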
This app is created as a collaboration with:
Main authors and contact: