Frequently Asked Questions
==========================


What is LENS?
-------------
LENS (Landscape of Effective Neoantigens Software) is a workflow for
predicting potentially therapeutically targetable tumor-specific and
tumor-associated antigens from short-read sequencing data.

What antigen sources are supported?
-----------------------------------
- Somatic single nucleotide variants (SNVs)
- Somatic insertion and deletion variants (InDels)
- Splice variants
- Fusion events
- Viruses
- Endogenous retroviruses (ERVs)
- Cancer-testis antigens (CTAs)

What is RAFT? How is it different from LENS?
--------------------------------------------
RAFT is a Nextflow-based workflow manager. LENS is one of the workflows that
RAFT runs, but RAFT is also capable of running other workflows (see
``raft.py available-workflows``). More information about RAFT can be found in
the `RAFT Documentation
<https://reproducible-analyses-framework-and-tools.readthedocs.io/en/latest/>`_.

What input files are required?
------------------------------
Users must provide a manifest file (see :doc:`manifest`) as well as multiple
samples per patient. These samples include:

- DNA normal sample (WES/WXS or WGS)
- DNA tumor sample (WES/WXS or WGS)
- RNA tumor sample

Optionally, a patient-matched, tissue-matched RNA normal sample can be
provided. If such a sample is not available, then a non-patient-matched,
tissue-specific (e.g. GTEx) sample can be provided.

What sequencing technologies are supported?
-------------------------------------------
Illumina-based short-read (50-150 bp) sequencing is currently supported. A
long-read-based tumor antigen workflow is in development.

Can I run LENS with my own reference files?
-------------------------------------------
Yes! See the :doc:`usage` page for more information.

Can I run LENS using different tools?
-------------------------------------
Yes! RAFT allows users to use alternative (supported) tools by changing the
tool parameters within ``main.nf``. See the :doc:`usage` page for more
information.

How do I know which tool versions are running?
----------------------------------------------
Tool versions are available in the :doc:`technical` page.

How do I know which Docker image each tool is running in?
---------------------------------------------------------
RAFT (the workflow manager running LENS) relies upon containerization software
(Docker or Singularity/Apptainer) to run its processes. The specific
container used for each process can be found in your LENS project's
``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``). Each
process defined within RAFT's modules (the directories in ``workflow/``) has
one or more Nextflow ``label`` directives assigned to it. For example, the
``fastp`` process includes the label ``fastp_container``. Searching for this
label within the ``nextflow.config`` file finds the lines:

.. code-block:: groovy

   withLabel: fastp_container {
     label = 'cloud'
     container = 'docker://quay.io/biocontainers/fastp:0.23.1--h79da9fb_0'
   }

This reveals that the ``fastp`` process is being run using the BioContainers
``fastp v0.23.1`` Docker image.
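
The same lookup can be scripted. The following is a minimal sketch, not part of
RAFT or LENS: the helper name ``container_for_label`` is hypothetical, and it
assumes each ``withLabel`` block is written as in the example above, with no
nested braces inside the block.

.. code-block:: python

   import re


   def container_for_label(config_text, label):
       """Return the container string set for `label`, or None if not found."""
       # Capture the body of the withLabel block (assumes no nested braces).
       block = re.search(
           r"withLabel:\s*['\"]?" + re.escape(label) + r"['\"]?\s*\{(.*?)\}",
           config_text,
           re.DOTALL,
       )
       if block is None:
           return None
       # Pull the quoted value assigned to `container` inside that block.
       container = re.search(
           r"container\s*=\s*['\"]([^'\"]+)['\"]", block.group(1)
       )
       return container.group(1) if container else None


   # Example text copied from the snippet above; in practice, read the
   # contents of your project's nextflow.config file instead.
   example = """
   withLabel: fastp_container {
     label = 'cloud'
     container = 'docker://quay.io/biocontainers/fastp:0.23.1--h79da9fb_0'
   }
   """
   print(container_for_label(example, "fastp_container"))

For the example text, this prints the ``docker://quay.io/biocontainers/fastp``
image string shown above.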

How do I change which tool versions are used?
---------------------------------------------
Tool versions can be changed by using an alternative Docker image for the
processes run by a specific tool. Docker images can be found on either
`DockerHub <https://www.dockerhub.com>`_ or
`BioContainers <https://www.biocontainers.org>`_. The entries for the tool's
corresponding process labels will need to be modified in the
``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``).

.. note::
   Docker images may differ in construction (e.g. where binaries are located
   within the container), so the corresponding process definitions within
   each tool's RAFT module (in
   ``/path/to/raft/projects/<PROJECT_ID>/workflow/``) may need to be modified.


How do I change resource allocations for each tool?
---------------------------------------------------
Resource allocations (``cpus`` and ``memory`` directives) can be modified in
the ``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``).


How is the workflow code in LENS executed?
------------------------------------------
See :doc:`technical` for more information.

How is the CTA gene list defined?
---------------------------------
The CTA (cancer-testis antigen) gene list
(``cta_and_self_antigen.homo_sapiens.gene_list``) consists of CTAs drawn from
the CTDatabase (http://www.cta.lncc.br/) list of genes. Note that
some genes have been added to the CTDatabase base set.

Why are some of my CTA transcripts expressed in normal tissues?
---------------------------------------------------------------
The CTAs defined in the CTDatabase reference list are not necessarily
testis-specific. To address this, LENS output contains columns that can be
used to further filter CTA pMHCs (such as ``gene_detectable_normal_tissues``,
defined using Human Protein Atlas data).

What do the mTEC columns mean?
------------------------------
Medullary thymic epithelial cells (mTECs) are cells within the thymus involved
in central tolerance
(https://en.wikipedia.org/wiki/Medullary_thymic_epithelial_cells). LENS
uses two mTEC expression datasets (Larouche et al., Genome Medicine, 2020
and Laumont et al., Science Translational Medicine, 2018) to estimate CTA
expression within mTEC tissues. Briefly, lower mTEC expression suggests a
reduced risk of central tolerance towards the pMHC of interest.

How is the ERV list defined?
----------------------------
The endogenous retrovirus (ERV) list is defined by the gEVE database version
1.1 (http://geve.med.u-tokai.ac.jp/). Note that the endogenous retroviral
elements contained within the gEVE database are computationally predicted. The
LENS report includes columns that can be used to further filter ERV pMHCs based
on metadata contained within the ``erv_scores.25SEP2023.tsv`` metadata file.
More information on these ERV scores can be found in the ERV section of the
:doc:`technical` page.

How is CCF calculated?
----------------------
Cancer cell fraction (CCF, also known as clonality) is estimated using the
formula:

.. image::
   ccf_calculation.png
   :width: 400

where :math:`f` is the variant allele frequency, :math:`\rho` is the sample
tumor purity, :math:`N_T` is the gene-level copy number, and :math:`m` is the
multiplicity (Tarabichi et al., 2021).
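
As a rough illustration only, the sketch below assumes the formula takes the
commonly used form :math:`CCF = f \, (\rho N_T + 2(1 - \rho)) / (\rho \, m)`,
with a normal-cell copy number of 2. Both the form and the function name
``estimate_ccf`` are assumptions made here; the image above remains the
authoritative definition used by LENS.

.. code-block:: python

   def estimate_ccf(f, rho, n_t, m, normal_copy_number=2):
       """Estimate cancer cell fraction under the assumed formula.

       f: variant allele frequency
       rho: sample tumor purity, in (0, 1]
       n_t: gene-level tumor copy number
       m: multiplicity (mutated copies per mutated cell)
       normal_copy_number: copy number in normal cells (assumed to be 2)
       """
       if not 0 < rho <= 1:
           raise ValueError("rho (tumor purity) must be in (0, 1]")
       if m <= 0:
           raise ValueError("m (multiplicity) must be positive")
       # Average number of mutated copies per tumor cell.
       mutated_copies = (f / rho) * (rho * n_t + normal_copy_number * (1 - rho))
       return mutated_copies / m


   # A heterozygous variant in a diploid region of a 50% pure sample.
   print(estimate_ccf(f=0.25, rho=0.5, n_t=2, m=1))

With these inputs the estimate evaluates to ``1.0``, i.e. a variant that would
be interpreted as clonal under the assumed formula.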


How is the prioritization score calculated?
-------------------------------------------
Tumor antigens are prioritized based on binding affinity, allele-specific
transcript abundance, and CCF (if available), in line with previously
published recommendations (Wells et al., 2020).

Specifically, binding affinity values are transformed using:

.. math::

   \frac{\lvert \chi - 1000 \rvert}{1000}

where :math:`\chi` is the binding affinity in nanomolar. Higher-affinity
interactions (smaller nanomolar values) therefore receive larger transformed
values, while the transformed distribution mirrors the original.

The supporting read counts are log2-transformed and normalized by dividing
each observation by the maximum observed count so that they range over
:math:`[0, 1]`.

The cancer cell fraction values do not require this normalization because they
are already bounded within :math:`[0, 1]`. Next, each peptide and HLA allele
combination is assigned a prioritization metric using:

.. math::

   S = pMHC_{BA} \times pMHC_{RS} \times pMHC_{CCF}

where :math:`pMHC_{BA}` is the transformed binding affinity,
:math:`pMHC_{RS}` is the log-transformed and normalized read support, and
:math:`pMHC_{CCF}` is the estimated cancer cell fraction.

This metric can be used to prioritize potential targets for manufacture and application to the patient's tumor.
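
The steps above can be sketched in a few lines of Python. This is an
illustrative reading of the description, not the LENS implementation: the
function names are hypothetical, read counts are assumed to be at least 1 so
that the log2 transform is defined, and normalization is assumed to divide by
the largest log-transformed value.

.. code-block:: python

   import math


   def transform_binding_affinity(affinity_nm):
       """Map a binding affinity in nM onto abs(x - 1000) / 1000."""
       return abs(affinity_nm - 1000.0) / 1000.0


   def normalize_read_support(read_counts):
       """log2-transform read counts, then scale by the largest value."""
       logged = [math.log2(c) for c in read_counts]  # assumes every count >= 1
       top = max(logged)
       if top == 0:
           raise ValueError("need at least one read count above 1")
       return [value / top for value in logged]


   def priority_scores(affinities_nm, read_counts, ccfs):
       """Return S = BA * RS * CCF for each peptide-HLA combination."""
       ba = [transform_binding_affinity(a) for a in affinities_nm]
       rs = normalize_read_support(read_counts)
       return [b * r * c for b, r, c in zip(ba, rs, ccfs)]


   # Two hypothetical pMHCs: a strong binder with good read support and a
   # clonal variant, versus a weaker binder with less support and lower CCF.
   scores = priority_scores([50.0, 500.0], [16, 4], [1.0, 0.5])
   print(scores)

In this example the first pMHC scores approximately 0.95 and the second 0.125,
so the first would be ranked higher.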

How are SNV and InDel peptides generated?
-----------------------------------------
Answer coming soon!

How should I interpret the LENS report?
---------------------------------------
Answer coming soon!

How do I determine how many pMHCs are filtered at each step of LENS?
--------------------------------------------------------------------
Answer coming soon!