Introduction

This guide is intended to help members of the Larman Lab at JHU get up and running with phipmake on MARCC, with the help of the SLURM job scheduler. Some of the steps are generalizable; others are specific to the Larman Lab.

Some definitions:

  • drake is an R package, written by Will Landau and part of rOpenSci, designed to support reproducible data analysis workflows. drake keeps track of which steps of a data analysis pipeline are out of date, so that you can update only those steps without re-running all of your code from scratch.
  • phipmake is an R package that uses drake to implement the Larman Lab’s PhIP-Seq analysis pipeline.
  • plans are workflow recipes, consisting of objects to be built (targets) and instructions for building those objects (commands).
  • targets are individual objects in a drake workflow to be built.

Setup

1. MARCC

To use phipmake with PhIPdb, run the following R package installation steps over a secure shell (SSH) connection to MARCC, using an account that is part of the hlarman1 group. For each step, I provide one option using interactive mode and one using MARCC’s SLURM job scheduler.
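Before installing anything, it can help to confirm that the account you connected with really is in the hlarman1 group. The sketch below is a generic check and is not part of phipmake. The SSH hostname in the comment is left as a placeholder because it depends on MARCC's current login nodes.

```shell
#!/usr/bin/env bash
# Connect first with something like (hostname is a placeholder, see MARCC's docs):
#   ssh <your-username>@<marcc-login-host>

# Return 0 if the current user belongs to the named Unix group, 1 otherwise.
in_group() {
  # `id -nG` prints all of the user's group names on one line; split them
  # one per line and look for an exact, fixed-string match.
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group hlarman1; then
  echo "OK: $(id -un) is in hlarman1"
else
  echo "Warning: $(id -un) is not in hlarman1; shared PhIPdb paths may be unwritable" >&2
fi
```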

Interactive

Alternatively, to install packages in an interactive session, load R using the following two commands:

Note: for heavy computation, you should not use the default login node. You can request an interactive session on a compute node with srun (example below). However, installing R packages interactively from a compute node on MARCC causes some problems.
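The following is a minimal sketch of both options. It assumes that MARCC's environment modules provide a module named R and that a partition called shared exists. The partition, time limit and CPU count are placeholders; adjust them to match your allocation.

```shell
#!/usr/bin/env bash
# Placeholder resource requests (assumptions, not MARCC defaults).
PARTITION="${PARTITION:-shared}"
TIME_LIMIT="${TIME_LIMIT:-02:00:00}"
CPUS="${CPUS:-4}"

# Open an interactive bash shell on a compute node; --pty attaches your terminal.
interactive_node() {
  srun --partition="$PARTITION" --time="$TIME_LIMIT" \
       --cpus-per-task="$CPUS" --pty bash
}

# Load R and start it (the module name "R" is an assumption).
load_r() {
  module load R
  R
}
```

Use interactive_node from a login node when you need a compute node for heavy work. For package installation, stay on the login node and call load_r there.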

2. Install drake and phipmake from GitHub:

Interactive

This should install drake, phipmake, and all required dependencies.
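This is a sketch of running the installation non-interactively from a login node. It assumes the remotes package as the GitHub installer and an R module named R. PHIPMAKE_REPO is a hypothetical placeholder for the lab's phipmake repository, which this guide does not name. The CRAN mirror is given only so that Rscript does not stop to ask for one. If R reports that the library is not writable, create your personal library directory (the path in R_LIBS_USER) first.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: replace with the real "owner/repo" for phipmake.
PHIPMAKE_REPO="${PHIPMAKE_REPO:-OWNER/phipmake}"

install_phipmake() {
  module load R   # assumed module name; run this on a login node
  # First expression installs remotes if missing; the second installs drake and
  # phipmake from GitHub together with their CRAN dependencies.
  Rscript -e 'if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes", repos = "https://cloud.r-project.org")' \
          -e "remotes::install_github(c('ropensci/drake', '${PHIPMAKE_REPO}'))"
}
```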

3. Install optional dependencies for parallelization:

Interactive

drake uses future to parallelize plans so that multiple targets can be built simultaneously. phipmake uses future.apply to parallelize work within certain individual targets.
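Here is a sketch of installing the two optional packages from CRAN, assuming a login node and an R module named R. The commented R lines show the generic drake pattern for future-based parallelism. They are an illustration only and not necessarily how phipmake configures its own runs.

```shell
#!/usr/bin/env bash
install_parallel_deps() {
  module load R   # assumed module name
  # Install both packages, then fail loudly if either cannot be loaded.
  Rscript -e 'install.packages(c("future", "future.apply"), repos = "https://cloud.r-project.org")' \
          -e 'stopifnot(requireNamespace("future", quietly = TRUE), requireNamespace("future.apply", quietly = TRUE))'
}

# Generic drake usage once future is installed (illustration, run inside R):
#   future::plan(future::multisession, workers = 4)
#   drake::make(plan, parallelism = "future", jobs = 4)
```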

Usage on MARCC

Parameter Setup

phipmake relies on a parameters file, usually named drake_params.tsv and stored in the root directory of a project in PhIPdb/ProcessedData, for example PhIPdb/ProcessedData/phipseq_0100/drake_params.tsv.

To create drake_params.tsv interactively with default parameters for a new PhIP-Seq screen, use the following phipmake command (alternatively, it is fine to copy and modify a drake_params.tsv file from a previous project):

This writes drake_params.tsv to the specified directory and expects a counts file named counts.csv and an enrichment file named enrichment.csv. write_drake_params also assumes a default enrichment threshold of 5, a metadata path of /data/hlarman1/PhIPdb/Metadata/PeptideLibraries, and a desired output file extension of tsv. All of these defaults can be changed through arguments to write_drake_params.
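Copying an earlier screen's parameters file can be scripted as shown below. This sketch assumes PhIPdb lives at /data/hlarman1/PhIPdb, an inference from the default metadata path above. It refuses to overwrite an existing drake_params.tsv. It does not call write_drake_params, whose arguments are not listed here.

```shell
#!/usr/bin/env bash
# Assumed PhIPdb location; override by exporting PHIPDB beforehand.
PHIPDB="${PHIPDB:-/data/hlarman1/PhIPdb}"

# copy_params <old_project> <new_project>
# Runs in a subshell so that `exit` only ends this function.
copy_params() (
  src="$PHIPDB/ProcessedData/$1/drake_params.tsv"
  dst_dir="$PHIPDB/ProcessedData/$2"
  if [ ! -f "$src" ]; then
    echo "No drake_params.tsv found at $src" >&2
    exit 1
  fi
  if [ -e "$dst_dir/drake_params.tsv" ]; then
    echo "$dst_dir/drake_params.tsv already exists; not overwriting" >&2
    exit 1
  fi
  mkdir -p "$dst_dir" && cp "$src" "$dst_dir/drake_params.tsv" || exit 1
  echo "Copied to $dst_dir/drake_params.tsv; review file names, threshold and paths."
)
```

For example, run copy_params phipseq_0099 phipseq_0100, where phipseq_0099 is a hypothetical earlier project. Then edit the copied file so that its counts file, enrichment file, threshold and paths match the new screen.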

Running phipmake

Navigate to the phipmake software directory in PhIPdb and use the provided shell script.
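Below is a sketch of the submission step. The script name run_phipmake.sh is hypothetical; use the script that ships in the phipmake software directory. The sbatch call relies only on the standard --export option to pass plans and targs_to_clean into the job. Because sbatch separates --export entries with commas, that is presumably why plans and targs_to_clean use hyphens to hold several names in one value.

```shell
#!/usr/bin/env bash
# Hypothetical script name; substitute the script provided with phipmake.
PHIPMAKE_SCRIPT="${PHIPMAKE_SCRIPT:-run_phipmake.sh}"

# submit_phipmake <plans> [targs_to_clean]
submit_phipmake() (
  plans="$1"               # e.g. "Enrichment-Polyclonal"
  targs_to_clean="${2:-}"  # e.g. "counts_annotations-enrichment_annotations"; may be empty
  # ALL keeps the submitting shell's environment; the NAME=value pairs after it
  # are comma-separated, so multiple names inside one value use hyphens.
  sbatch --export=ALL,plans="$plans",targs_to_clean="$targs_to_clean" "$PHIPMAKE_SCRIPT"
)
```

For example, submit_phipmake Enrichment-Polyclonal counts_annotations-enrichment_annotations runs only the Enrichment and Polyclonal plans and forces the two annotation targets to be rebuilt.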

Explanation of some of the parameters exported in the sbatch command:

  • plans specifies which of the four sub-workflows in the Larman Lab’s phipmake pipeline (Counts, Enrichment, Polyclonal, and AVARDA) to run; each can be toggled on or off. Plan names are delimited by hyphens (“-”). For example, to run only the Enrichment and Polyclonal parts of the pipeline, excluding Counts and AVARDA, set plans="Enrichment-Polyclonal".
  • targs_to_clean specifies targets of the phipmake plan to force to be rebuilt, even if drake considers them up to date. These target names are also delimited by hyphens (“-”). For example, to indicate that data pulled from annotation files must be updated, set targs_to_clean="counts_annotations-enrichment_annotations". This deletes drake’s cached values for these two targets, so these targets and all downstream targets that depend on them should be rebuilt.
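A typo in plans is easy to make, so a small pre-flight check before submitting may be worthwhile. This hypothetical helper is not something phipmake provides. It knows only the four plan names mentioned above and assumes they are matched case-sensitively.

```shell
#!/usr/bin/env bash
VALID_PLANS="Counts Enrichment Polyclonal AVARDA"

# validate_plans <hyphen-delimited plans>; returns 1 on an empty or unknown name.
# Runs in a subshell so that the IFS and `set -f` changes stay local.
validate_plans() (
  plans="$1"
  [ -n "$plans" ] || { echo "plans is empty" >&2; exit 1; }
  set -f      # no filename globbing while splitting
  IFS='-'     # split the value on hyphens
  for p in $plans; do
    case " $VALID_PLANS " in
      *" $p "*) ;;  # known plan name
      *) echo "unknown plan: '$p'" >&2; exit 1 ;;
    esac
  done
)
```

With this helper, validate_plans Enrichment-Polyclonal succeeds. validate_plans Enrichment-Polyclona prints unknown plan: 'Polyclona' and returns 1.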

Types of Output Files

Maintenance & Updating

Updating annotation files

Adding a New Library

BLAST+, Tidying

Annotations

GitHub Pull Requests

PRs are welcome.