larmanguide.Rmd
This guide is intended to help members of the Larman Lab at JHU get up and running with `phipmake` on MARCC, with the help of the SLURM task scheduler. Some of the steps are generalizable and others are specific to the Larman Lab.
Some definitions:
`drake` is an R package written by Will Landau and part of rOpenSci, designed to support reproducible data analysis workflows. `drake` keeps track of which steps of a data analysis pipeline are out-of-date, so that you can update only those steps without re-running all of your code from scratch.

`phipmake` is an R package written to use tools from `drake` to implement the Larman Lab's PhIP-Seq analysis pipeline.

`drake` workflow to be built.

For usage with PhIPdb, the following R package installation steps should occur via a secure shell (SSH) connection to MARCC, using an account that is part of the hlarman1 group. For each step, I provide one option using interactive mode and one using MARCC's SLURM task scheduler.
To install packages using the SLURM scheduler, use the resources in `PhIPdb/Software/install_r_pkg`.

Alternatively, to install packages in an interactive session, load R using the following two commands:

Note: for heavy computation, you should not use the default login node. You can request an interactive session on a compute node with `srun`. However, installing R packages interactively from a compute node on MARCC causes some problems.
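A minimal sketch of requesting such a session, assuming standard SLURM `srun` options. `compute_session` is a hypothetical helper, and the CPU count and time limit are placeholders; it sets no partition, so add `--partition` if your MARCC allocation requires one. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
# Hypothetical helper: open an interactive shell on a compute node via srun.
# Arguments: CPUs per task (default 1) and time limit HH:MM:SS (default 1 hour).
# No partition is set; add --partition=... if your allocation needs one.
compute_session() {
  local cpus="${1:-1}"
  local walltime="${2:-01:00:00}"
  local cmd="srun --ntasks=1 --cpus-per-task=${cpus} --time=${walltime} --pty bash"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Preview only: print the command without running it
    printf '%s\n' "$cmd"
  else
    # Intentionally unquoted so the string splits into srun's arguments
    $cmd
  fi
}

# Example: 4 CPUs for 2 hours
# compute_session 4 02:00:00
```

Keep this kind of session for heavy computation rather than package installation, for the reason given above.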
Install `drake` and `phipmake` from GitHub:

```bash
# SLURM
sbatch --export=pkg="ropensci/drake" installGithub.sh
sbatch --export=pkg="brandonsie/phipmake" installGithub.sh
```
This should install `drake`, `phipmake`, and all required dependencies.
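Because the interactive option notes that `phipmake` depends on the GitHub version of `drake`, one option is to chain the two SLURM jobs so the `phipmake` install starts only after the `drake` install finishes successfully. This is a sketch using the standard `sbatch --parsable` and `--dependency=afterok` options; `submit_chained_installs` is a hypothetical helper, and it assumes it is run from the directory containing `installGithub.sh`.

```bash
# Hypothetical helper: install drake first, then phipmake once drake succeeds.
# Run from the directory that contains installGithub.sh.
# Set DRY_RUN=1 to print the sbatch commands instead of submitting them.
submit_chained_installs() {
  if [ ! -f installGithub.sh ]; then
    echo "installGithub.sh not found in $(pwd)" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo 'sbatch --parsable --export=pkg="ropensci/drake" installGithub.sh'
    echo 'sbatch --dependency=afterok:<drake_jobid> --export=pkg="brandonsie/phipmake" installGithub.sh'
    return 0
  fi
  local jid
  # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
  jid=$(sbatch --parsable --export=pkg="ropensci/drake" installGithub.sh) || return 1
  jid="${jid%%;*}"
  # afterok: start only if the drake job exits successfully
  sbatch --dependency=afterok:"$jid" --export=pkg="brandonsie/phipmake" installGithub.sh
}

# Example: cd to PhIPdb/Software/install_r_pkg first, then
# submit_chained_installs
```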
```r
# Interactive
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# drake is on CRAN, but phipmake depends on the more recently updated GitHub version.
remotes::install_github("ropensci/drake")
remotes::install_github("brandonsie/phipmake")
```
This should install `drake`, `phipmake`, and all required dependencies.
Optionally, install `future` and `future.apply` to enable parallel processing:

```r
# Interactive
install.packages("future")
install.packages("future.apply")
```

`drake` uses `future` to parallelize plans so that multiple targets can be generated simultaneously. `phipmake` uses `future.apply` to parallelize within certain individual targets.
`phipmake` relies on a parameters file, usually called `drake_params.tsv`, stored in the root directory of a project in `PhIPdb/ProcessedData`; for example, `PhIPdb/ProcessedData/phipseq_0100/drake_params.tsv`.

It's fine to copy and modify a `drake_params.tsv` file from a previous project. Alternatively, to create `drake_params.tsv` with default parameters for a new PhIP-Seq screen interactively, you can use the following command from `phipmake`:
```r
# Replace phipseq_9999 with the appropriate project ID.
phipmake:::write_drake_params(
  dir = "/data/hlarman1/PhIPdb/ProcessedData/phipseq_9999",
  screen_name = "phipseq_9999")
```
This will write `drake_params.tsv` to the specified directory, expecting a counts file named `counts.csv` and an enrichment file named `enrichment.csv`. `write_drake_params` also assumes a default enrichment threshold of 5, a metadata path of `/data/hlarman1/PhIPdb/Metadata/PeptideLibraries`, and an output file extension of `tsv`. All of these options can be modified via parameters passed to `write_drake_params`.
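If you take the copy-and-modify route instead, a minimal shell sketch follows. `copy_drake_params` is a hypothetical helper, and the project IDs in its comments are only examples; it refuses to overwrite an existing file. After copying, review any project-specific values in the file (for example, a screen name, if the file records one) before running the pipeline.

```bash
# Hypothetical helper: seed a new project's drake_params.tsv from an older one.
# Usage: copy_drake_params SRC_PROJECT_DIR DEST_PROJECT_DIR
copy_drake_params() {
  local src="$1/drake_params.tsv"
  local dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  # Never clobber an existing parameters file
  if [ -e "$dest_dir/drake_params.tsv" ]; then
    echo "already exists: $dest_dir/drake_params.tsv" >&2
    return 1
  fi
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/drake_params.tsv"
}

# Example (project IDs are illustrative):
# copy_drake_params /data/hlarman1/PhIPdb/ProcessedData/phipseq_0100 \
#                   /data/hlarman1/PhIPdb/ProcessedData/phipseq_9999
```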
To run `phipmake`, navigate to the `phipmake` software directory in PhIPdb and use the provided shell script, `runphipmake.sh`:
```bash
cd /data/hlarman1/PhIPdb/Software/phipmake

# Set up variables for the sbatch exports
pdir="/home-1/bsie1@jhu.edu/data/PhIPdb/ProcessedData/"
screen="phipseq_0101"
plans="Counts-Enrichment-Polyclonal-AVARDA"
targs_to_clean="NULL"
pjobs=4

# Notes on the variables above:
# screen: changes from project to project.
# plans, targs_to_clean: multiple values are delimited by "-"; there are four possible plans. See the notes below.
# pjobs: set to 1 to run without parallelization, so that future and future.apply are not needed.

sbatch --export=wd=$pdir$screen,plan=$plans,clean=$targs_to_clean,njobs=$pjobs runphipmake.sh
```
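Before submitting, it can help to confirm that the project directory exists and contains a parameters file. A sketch, assuming the file uses the default name `drake_params.tsv`; `check_project` is a hypothetical helper, not part of `phipmake`:

```bash
# Hypothetical pre-flight check for a phipmake project directory.
# Assumes the parameters file has the default name drake_params.tsv.
check_project() {
  local wd="$1"
  if [ ! -d "$wd" ]; then
    echo "project directory not found: $wd" >&2
    return 1
  fi
  if [ ! -f "$wd/drake_params.tsv" ]; then
    echo "missing parameters file: $wd/drake_params.tsv" >&2
    return 1
  fi
}

# Example, reusing the variables defined above:
# check_project "$pdir$screen" &&
#   sbatch --export=wd=$pdir$screen,plan=$plans,clean=$targs_to_clean,njobs=$pjobs runphipmake.sh
```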
Explanation of some of the parameters exported in the `sbatch` command:
`plans` specifies which of the four sub-workflows in the Larman Lab's `phipmake` pipeline to run; each can be toggled on or off. Multiple plans are delimited by hyphens ("-"). For example, to run only the Enrichment and Polyclonal parts of the pipeline, excluding Counts and AVARDA, set `plans="Enrichment-Polyclonal"`.

`targs_to_clean` specifies targets of the `phipmake` plan to force to be rebuilt, even if `drake` thinks they are up-to-date. These targets are also delimited by hyphens ("-"). For example, to indicate that data pulled from annotation files must be updated, set `targs_to_clean="counts_annotations-enrichment_annotations"`. This will delete `drake`'s cache for these two targets, so these targets and all of their downstream dependencies should be rebuilt.
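Both values are hyphen-delimited strings, presumably because `sbatch --export` already uses commas to separate variables. The sketch below checks a `plans` string against the four plan names used in this guide and joins target names with hyphens. `valid_plans` and `join_targets` are hypothetical helpers, and printing `NULL` for an empty target list assumes `NULL` means "no targets", as in the default `targs_to_clean="NULL"`.

```bash
# Hypothetical helper: succeed only if every "-"-delimited plan is one of
# the four sub-workflows named in this guide.
valid_plans() {
  local old_ifs="$IFS" p
  [ -n "$1" ] || return 1
  IFS=-
  for p in $1; do   # unquoted on purpose: split on "-"
    case "$p" in
      Counts|Enrichment|Polyclonal|AVARDA) ;;
      *)
        IFS="$old_ifs"
        echo "unknown plan: '$p'" >&2
        return 1
        ;;
    esac
  done
  IFS="$old_ifs"
}

# Hypothetical helper: join target names with "-" for targs_to_clean.
# Prints NULL when no targets are given (assumed to mean "clean nothing").
join_targets() {
  if [ "$#" -eq 0 ]; then
    echo NULL
    return 0
  fi
  local old_ifs="$IFS"
  IFS=-
  printf '%s\n' "$*"   # "$*" joins arguments with the first character of IFS
  IFS="$old_ifs"
}
```

For instance, running `valid_plans "$plans"` before `sbatch` would catch a misspelled plan name before the job is queued, and `targs_to_clean=$(join_targets counts_annotations enrichment_annotations)` reproduces the example value above.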