Frequently Asked Questions

What is LENS?

LENS (Landscape of Effective Neoantigens Software) is a workflow designed for predicting potentially therapeutically targetable tumor-specific and tumor-associated antigens from short-read sequencing data.

What antigen sources are supported?

Somatic single nucleotide variants (SNVs)
Somatic insertion and deletion variants (InDels)
Splice variants
Fusion events
Viruses
Endogenous Retroviruses (ERVs)
Cancer-testis antigens (CTAs)

What is RAFT? How is it different from LENS?

RAFT is a Nextflow-based workflow manager. LENS is one of the workflows it runs, and RAFT is also capable of running other workflows (see raft.py available-workflows). More information about RAFT can be found in the RAFT Documentation.

What input files are required?

Users must provide a manifest file (see Manifest) as well as multiple samples per patient. These samples include:

DNA normal sample (WES/WXS or WGS)
DNA tumor sample (WES/WXS or WGS)
RNA tumor sample

Optionally, a patient-matched, tissue-matched RNA normal sample can be provided. If such a sample is not available, then a non-patient-matched, tissue-specific (e.g. GTEx) sample can be provided.

What sequencing technologies are supported?

Illumina-based short read (50 bp - 150 bp) sequencing is currently supported. A long-read based tumor antigen workflow is in development.

How can I run LENS with my own reference files?

LENS can be run with user-supplied reference files. See the Usage page for more information.

How can I run LENS using different tools?

RAFT allows users to utilize alternative (supported) tools by changing the tool parameters within main.nf. See the Usage page for more information.

How do I know which tool versions are running?

Tool versions are available in the Technical Details page.

How do I know which Docker images the tools are running in?

RAFT (the workflow manager running LENS) relies upon containerization software (Docker or Singularity/Apptainer) for running its processes. The specific container being used for each process can be found in your LENS project’s nextflow.config file (/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config). Each process defined within RAFT’s modules (the directories in workflow/) has one or more Nextflow label directives assigned to it. For example, the fastp process includes the label fastp_container. If we search for this label within the nextflow.config file then we find the lines:

This reveals that the fastp process is being run using the BioContainers fastp v0.23.1 Docker image.
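The label lookup described above can also be scripted. The following is a minimal sketch, assuming the container is assigned inside a `withLabel: <label> { ... }` block in nextflow.config. The helper name `find_container_for_label`, the sample config text, and the image string in it are hypothetical placeholders, not copied from a real LENS project.

```python
import re
from typing import Optional


def find_container_for_label(config_text: str, label: str) -> Optional[str]:
    """Return the container set inside a `withLabel: <label> { ... }` block, or None."""
    # Match the block for exactly this label (optionally quoted) up to its closing brace.
    block_pattern = re.compile(
        r"withLabel\s*:\s*['\"]?" + re.escape(label) + r"['\"]?\s*\{(.*?)\}",
        re.DOTALL,
    )
    block = block_pattern.search(config_text)
    if block is None:
        return None
    # Inside the block, pick up the quoted value of the `container` setting.
    container = re.search(r"container\s*=\s*['\"]([^'\"]+)['\"]", block.group(1))
    return container.group(1) if container else None


# Hypothetical config excerpt; the image string is an illustrative placeholder.
SAMPLE_CONFIG = """
process {
  withLabel: fastp_container {
    container = 'registry.example/biocontainers/fastp:0.23.1'
  }
}
"""

image = find_container_for_label(SAMPLE_CONFIG, "fastp_container")
print(image)
```

To use it on a real project, read the project's nextflow.config into a string and pass it in place of `SAMPLE_CONFIG`. If the config lays out its label blocks differently from the assumption above, the function returns None.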

How do I change which tool versions are used?

Tool versions can be changed by using an alternative Docker image for the processes that run a specific tool. Docker images can be found on either DockerHub or BioContainers. The tool’s corresponding process labels will need to be modified in the nextflow.config file (/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config).

Note

Docker images may differ in construction (e.g. where binaries are located within the container), so the corresponding process definitions within each tool’s RAFT module (in /path/to/raft/projects/<PROJECT_ID>/workflow/) may need to be modified.

How do I change resource allocations for each tool?

Resource allocations (cpus and memory directives) can be modified in the nextflow.config file (/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config).

How is the workflow code in LENS executed?

See Technical Details for more information.

How is the CTA gene list defined?

The CTA (cancer-testis antigen) gene list (cta_and_self_antigen.homo_sapiens.gene_list) consists of CTAs drawn from the CTDatabase (http://www.cta.lncc.br/) list of genes. Note that some genes have been added to the CTDatabase base set.

Why are some of my CTA transcripts expressed in normal tissues?

The CTAs defined in the CTDatabase reference list are not necessarily testis-specific. To address this, LENS outputs contain columns that can be used to further filter CTA pMHCs (such as gene_detectable_normal_tissues, defined using Human Protein Atlas data).

What do the mTEC columns mean?

Medullary thymic epithelial cells (mTECs) are cells within the thymus involved in central tolerance (https://en.wikipedia.org/wiki/Medullary_thymic_epithelial_cells). LENS utilizes two mTEC expression datasets (Larouche et al, Genome Medicine, 2020 and Laumont et al, Science Translational Medicine, 2018) to estimate CTA expression within mTEC tissues. Briefly, lower mTEC expression suggests a reduced risk of central tolerance towards the pMHC of interest.

How is the ERV list defined?

The endogenous retrovirus (ERV) list is defined by the gEVE database version 1.1 (http://geve.med.u-tokai.ac.jp/). Note that the endogenous retroviral elements contained within the gEVE database are computationally predicted. The LENS report includes columns that can be used to further filter ERV pMHCs based on metadata contained within the erv_scores.25SEP2023.tsv metadata file. More information on these ERV scores can be found in the ERV section of the Technical Details page.

How is CCF calculated?

Cancer cell fraction (CCF, also known as clonality) is estimated using the formula:

where $f$ is the variant allele frequency, $\rho$ is the sample tumor purity, $N_T$ is the gene-level copy number, and $m$ is the multiplicity (Tarabichi et al., 2021).
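For orientation, the symbols defined above fit the widely used purity- and copy-number-adjusted relation between variant allele frequency and CCF. The form below is a sketch under the assumption that normal cells carry two copies of the locus. That assumption is not stated in this FAQ, so treat this as a reconstruction of the standard relation rather than a confirmed copy of the formula LENS implements.

$$ CCF = \frac{f \left( \rho N_T + 2 (1 - \rho) \right)}{m \rho} $$

As a worked example with illustrative values $f = 0.3$, $\rho = 0.6$, $N_T = 2$, and $m = 1$:

$$ CCF = \frac{0.3 \times (0.6 \times 2 + 2 \times 0.4)}{1 \times 0.6} = \frac{0.3 \times 2.0}{0.6} = 1.0 $$

which corresponds to a variant present in all cancer cells.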

How is the prioritization score calculated?

Tumor antigens are prioritized based upon the binding affinity, allele-specific transcript abundance, and CCF (if available), which aligns with previously published recommendations (Wells et al., 2020).

Specifically, binding affinity values ($\chi$, in nM) are transformed using:

$$ \frac{\lvert \chi - 1000 \rvert}{1000} $$

such that higher affinity interactions with smaller nanomolar values will have larger transformed values while maintaining a distribution mirroring the original.

The supporting read counts are log2-transformed and then normalized by dividing each observation by the maximum observed value so that they fall within $[0, 1]$.

The cancer cell fraction values do not require this normalization, as they are already bounded within $[0, 1]$. Next, each peptide and HLA allele combination is assigned a prioritization metric using:

$$ S = pMHC_{BA} \times pMHC_{RS} \times pMHC_{CCF} $$

where $pMHC_{BA}$ is the transformed binding affinity, $pMHC_{RS}$ is the log-transformed and normalized read support, and $pMHC_{CCF}$ is the estimated cancer cell fraction.

This metric can be used to prioritize potential targets for manufacture and application to the patient’s tumor.
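The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the LENS implementation. The function names and input values are hypothetical. It assumes binding affinities below 1000 nM and read counts of at least 1 (a plain log2 transform is undefined for 0), and it requires a CCF for every pMHC, since how a missing CCF is handled is not specified here.

```python
import math
from typing import List, Sequence


def transform_binding_affinity(affinity_nm: float) -> float:
    """Apply |x - 1000| / 1000 to a binding affinity x given in nM."""
    return abs(affinity_nm - 1000.0) / 1000.0


def normalize_read_support(read_counts: Sequence[int]) -> List[float]:
    """log2-transform read counts, then divide by the maximum so values lie in [0, 1]."""
    if any(count < 1 for count in read_counts):
        raise ValueError("read counts must be >= 1 for a plain log2 transform")
    logged = [math.log2(count) for count in read_counts]
    max_logged = max(logged)
    if max_logged == 0:
        # All counts equal 1, so dividing by the maximum is undefined.
        raise ValueError("at least one read count must exceed 1")
    return [value / max_logged for value in logged]


def prioritization_scores(
    affinities_nm: Sequence[float],
    read_counts: Sequence[int],
    ccfs: Sequence[float],
) -> List[float]:
    """Compute S = BA * RS * CCF for each peptide/HLA allele combination."""
    ba = [transform_binding_affinity(a) for a in affinities_nm]
    rs = normalize_read_support(read_counts)
    return [b * r * c for b, r, c in zip(ba, rs, ccfs)]


# Two hypothetical pMHCs: a strong, well-supported, clonal binder and a weaker subclonal one.
scores = prioritization_scores([50.0, 500.0], [64, 8], [1.0, 0.5])
print(scores)  # approximately [0.95, 0.125]
```

Because the three factors are multiplied, a pMHC that scores low on any one of them (weak binding, little read support, or low CCF) receives a low overall score.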

How are SNV and InDel peptides generated?

Answer coming soon!

How should I interpret the LENS report?

Answer coming soon!

How do I determine how many pMHCs are filtered at each step of LENS?

Answer coming soon!