epitopefindr_avarda_vignette.Rmd
This vignette provides a walthrough of using
epitopefindr
to process an example dataset included with
the package, epitopefindr::pairwise_viral_hits
.
The dataset has been previously explored by Monaco et al. Biorxiv and describes a set of phage-displayed peptide sequences for which the patient’s antibody specificity demonstrated a statistically significant change when comparing the patient’s data from the PhIP-seq assay.
epitopefindr
may be useful to look for sequence
similarity among these enriched peptides as a step towards identifying
critical sequence motifs.
We can see that pairwise_viral_hits
is a
Biostrings::AAStringSet object consisting of the 159 pairwise enriched
peptides. These peptides have been annotated with taxonomic and protein
name information. This is our input to epitopefindr
.
library(epitopefindr)
library(magrittr)
epitopefindr::pairwise_viral_hits
epitopefindr::pairwise_viral_hits
#> AAStringSet object of length 159:
#> width seq names
#> [1] 56 GWTVQAVEVVENFLPVPGSTMD...HPSFAGLIDPSLDQEDFNAVLD Phlebovirus_Sandf...
#> [2] 56 YPRQMMHPSFAGLIDPSLDQED...FMFSKTINVSLRGAQKRDIEES Phlebovirus_Sandf...
#> [3] 56 GLRTSGEEVEELEYERLISSLS...LAVKERGLPYKVRLEKALMSGI Mammarenavirus_Pi...
#> [4] 56 MADEALYVYLEGPGATLPEQQQ...TLYPRGVALLSLRLSIIIPRGY Mastadenovirus_Hu...
#> [5] 56 IGEFSQEGYLLRPRLAKTELYF...VLEHTRDQLLSVGDVFDESRMA Parapoxvirus_Pseu...
#> ... ... ...
#> [155] 56 MEPRPGASTRRPEGRPQREPAP...DSIYSEADTEVGGRGDLRPPLT Simplexvirus_Huma...
#> [156] 28 PIIATSDPTPRRDAATKSRRRRPHSRRL Simplexvirus_Huma...
#> [157] 56 TKKGTSYPKLSKSYTNNKGKEV...SEQQSLYQNADAYVSVGSSKYN Influenzavirus A_...
#> [158] 56 TKKGTSYPKLSKSYTNNKGKEV...SEQQTLYQNVDAYVSVGSSKYN Influenzavirus A_...
#> [159] 56 MDVLSKSSLKELLAHLERTPLE...QNVLISRNEYYNQPYPDVTSLI Seadornavirus_Ban...
names(epitopefindr::pairwise_viral_hits)[1]
#> [1] "Phlebovirus_Sandfly fever sicilian virus_Nucleocapsid_1297"
epitopefindr
The all-in-one wrapper for running epitopefindr
is
epfind
. At its most basic, epfind
takes two
input parameters, (1) a set of named peptides, either as an AAStringSet
object or as a directory path to a .fasta file that can be read by
Biostrings::readAAStringSet, and (2) a path to which output files can be
written. More parameters for customization are documented at
?epfind
.
output.dir <- "AVARDA_vignette_data/"
msa.path <- paste0(output.dir,"/msa.pdf")
if(!file.exists(msa.path)){
epitopefindr::epfind(epitopefindr::pairwise_viral_hits, output.dir,
e.thresh = 0.0001, min.groupsize = 6)
unlink(paste0(output.dir,"intermediate_files"), recursive = TRUE, force = TRUE)
}
We can print some of the sequence motifs from this output. Consider
MSA 11 as an example. This 18 amino acid motif is represented in 7 of
our input peptides. Many of these positions are fully conserved, but
positions 3, 5, and 17 are only partially conserved. This illustrates
that epitopefindr
makes use of BLAST to retain evidence for
certain motifs that would be discarded with a gapped k-mer approach.
#convert pdf to png
print.range <- 11
imgdir <- paste0(output.dir,"/png/")
if(!dir.exists(imgdir)) dir.create(imgdir)
imgnames <- paste0(imgdir,"msa-",print.range,".png")
pdftools::pdf_convert(msa.path, format = "png", pages = print.range,
filenames = imgnames, verbose = FALSE)
#> Warning in sprintf(filenames, pages, format): 2 arguments not used by format
#> 'AVARDA_vignette_data//png/msa-11.png'
The output of epitopefindr
can be fed into network
graphing. Each vertex represents a fragment of an input sequence. Each
link represents an alignment between two peptide fragments. Group 11 is
highlighted in blue.
aln <- data.table::fread(paste0(output.dir,"/finalAlignments.csv"))
ver <- data.table::fread(paste0(output.dir,"/epitopeSummary.csv"), header = TRUE)
g11 <- ver[,c("id","11")]
g11 <- g11[!grep("NA$",g11$id),]
g11$color <- ifelse(g11$`11` == "", "red","blue")
set.seed(1)
g <- igraph::graph_from_data_frame(aln[,1:2], directed = FALSE, vertices = g11) %>%
(igraph::simplify) %>% plot(vertex.label = NA, vertex.size = 3,
arrow.size = 0, vertex.color = g11$color)