Frequently Asked Questions
==========================


What is LENS?
-------------
LENS (Landscape of Effective Neoantigens Software) is a workflow designed for
predicting potentially therapeutically targetable tumor-specific and
tumor-associated antigens from short-read sequencing data.

What antigen sources are supported?
-----------------------------------
- Somatic single nucleotide variants (SNVs)
- Somatic insertion and deletion variants (InDels)
- Splice variants
- Fusion events
- Viruses
- Endogenous Retroviruses (ERVs)
- Cancer-testis antigens (CTAs)

What is RAFT? How is it different from LENS?
--------------------------------------------
RAFT is a Nextflow-based workflow manager. It is used to run the LENS
workflow, but it can also run other workflows (see
``raft.py available-workflows``). More information about RAFT can be found in
the `RAFT Documentation
<https://reproducible-analyses-framework-and-tools.readthedocs.io/en/latest/>`_.

What input files are required?
------------------------------
Users must provide a manifest file (see :doc:`preparing_your_samples`) as well as multiple
samples per patient. These samples include:

- DNA normal sample (WES/WXS or WGS)
- DNA tumor sample (WES/WXS or WGS)
- RNA tumor sample

Optionally, a patient-matched, tissue-matched RNA normal sample can be
provided. If such a sample is not available, then a non-patient-matched,
tissue-specific (e.g. GTEx) sample can be provided.

What sequencing technologies are supported?
-------------------------------------------
Illumina-based short-read (50 bp - 150 bp) sequencing is currently supported. A
long-read-based tumor antigen workflow is in development.

How can I run LENS with my own reference files?
-----------------------------------------------
In the Config Generator, modify reference file paths in the **References**
step. For CLI users, use ``--setup-only`` to generate the project configuration,
then edit the reference paths in
``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config`` before
launching with ``raft run-workflow``. See :doc:`running_lens` for more
information.

How can I run LENS using different tools?
-----------------------------------------
In the Config Generator, expand the **Tool selection** accordion for the step
you want to change and pick an alternative tool. For CLI users, pass
``--user-params`` to override the default tool for a given step (e.g.,
``--user-params fq_trim_tool=trim_galore``). For a list of supported
alternative tools, see :doc:`technical_details`.

How do I know which tool versions are running?
----------------------------------------------
Tool versions are available in the :doc:`technical_details` page.

How do I know which Docker image tools are running?
---------------------------------------------------
RAFT (the workflow manager running LENS) relies upon containerization software
(Docker or Singularity/Apptainer) for running its processes. The specific
container being used for each process can be found in your LENS project's
``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``). Each
process defined within RAFT's modules (the directories in ``workflow/``) has
one or more Nextflow ``label`` directives assigned to it. For example, the
``fastp`` process includes the label ``fastp_container``. Searching for this
label within the ``nextflow.config`` file reveals the lines:

.. code-block:: text

   withLabel: fastp_container {
     label = 'cloud'
     container = 'docker://quay.io/biocontainers/fastp:0.23.1--h79da9fb_0'
   }

This shows that the ``fastp`` process runs in the BioContainers
``fastp v0.23.1`` Docker image.
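
The same lookup can be scripted when many labels need checking. The following
is a minimal sketch, not part of RAFT or LENS: the helper name
``containers_by_label`` is hypothetical, and the regular expressions assume
each ``withLabel`` block has the flat shape shown above (no nested braces).

.. code-block:: python

   import re

   # Matches blocks such as: withLabel: fastp_container { ... }
   # Assumption: the block body contains no nested braces.
   WITH_LABEL_BLOCK = re.compile(r"withLabel:\s*['\"]?(\w+)['\"]?\s*\{([^{}]*)\}")
   CONTAINER_LINE = re.compile(r"container\s*=\s*['\"]([^'\"]+)['\"]")


   def containers_by_label(config_text):
       """Map each withLabel selector name to its container string, if present."""
       containers = {}
       for label, body in WITH_LABEL_BLOCK.findall(config_text):
           match = CONTAINER_LINE.search(body)
           if match:
               containers[label] = match.group(1)
       return containers


   example = """
   withLabel: fastp_container {
     label = 'cloud'
     container = 'docker://quay.io/biocontainers/fastp:0.23.1--h79da9fb_0'
   }
   """
   print(containers_by_label(example)["fastp_container"])

To apply it to a real project, read the contents of
``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config`` into a string
and pass that string to the helper.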

How do I change which tool versions are used?
---------------------------------------------
Tool versions can be changed by using an alternative Docker image for the
processes run by a specific tool. Docker images can be found on either
`DockerHub <https://www.dockerhub.com>`_ or
`BioContainers <https://www.biocontainers.org>`_. The entries for the tool's
corresponding process labels will need to be modified in the
``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``).

.. note::
   Docker images may differ in construction (e.g. where binaries are located
   within the container), so the corresponding process definitions within
   each tool's RAFT module (in
   ``/path/to/raft/projects/<PROJECT_ID>/workflow/``) may also need to be
   modified.


How do I change resource allocations for each tool?
---------------------------------------------------
Resource allocations (``cpus`` and ``memory`` directives) can be modified in
the ``nextflow.config`` file
(``/path/to/raft/projects/<PROJECT_ID>/workflow/nextflow.config``).


How is the workflow code in LENS executed?
------------------------------------------
See :doc:`technical_details` for more information.

How is the CTA gene list defined?
---------------------------------
The CTA (cancer-testis antigen) gene list
(``cta_and_self_antigen.homo_sapiens.gene_list``) is based on the CTDatabase
(http://www.cta.lncc.br/) list of genes. Note that some genes have been added
to the CTDatabase base set.

Why are some of my CTA transcripts expressed in normal tissues?
---------------------------------------------------------------
The CTAs defined in the CTDatabase reference list are not necessarily
testis-specific. To address this, the LENS output contains columns that can be
used to further filter CTA pMHCs (such as ``gene_detectable_normal_tissues``,
which is defined using Human Protein Atlas data).

What do the mTEC columns mean?
------------------------------
Medullary thymic epithelial cells (mTECs) are cells within the thymus involved
in central tolerance
(https://en.wikipedia.org/wiki/Medullary_thymic_epithelial_cells). LENS
utilizes two mTEC expression datasets (Larouche et al, Genome Medicine, 2020
and Laumont et al, Science Translational Medicine, 2018) to estimate CTA
expression within mTEC tissues. Briefly, lower mTEC expression suggests a
reduced risk of central tolerance towards the pMHC of interest.

How is the ERV list defined?
----------------------------
The endogenous retrovirus (ERV) list is defined by the gEVE database version
1.1 (http://geve.med.u-tokai.ac.jp/). Note that the endogenous retroviral
elements contained within the gEVE database are computationally predicted. The
LENS report includes columns that can be used to further filter ERV pMHCs based
on the ``erv_scores.25SEP2023.tsv`` metadata file.
More information on these ERV scores can be found in the ERV section of the
:doc:`technical_details` page.

How is CCF calculated?
----------------------
Cancer cell fraction (CCF, also known as clonality) is estimated using the
formula:

.. image::
   ccf_calculation.png
   :width: 400

where ``f`` is the variant allele frequency, :math:`\rho` is the sample tumor purity,
``N_T`` is the gene-level copy number, and ``m`` is the multiplicity (Tarabichi
et al., 2021).
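
As a worked illustration, the sketch below assumes the formula in the image
takes the form commonly used in the subclonal reconstruction literature,
``CCF = f * (rho * N_T + 2 * (1 - rho)) / (rho * m)``, with a normal-cell copy
number of 2. This form is an assumption and should be checked against the image
above; the function name ``cancer_cell_fraction`` is hypothetical and is not
part of LENS.

.. code-block:: python

   def cancer_cell_fraction(vaf, purity, tumor_copy_number, multiplicity,
                            normal_copy_number=2):
       """Estimate CCF from VAF, purity, copy number, and multiplicity.

       Assumed form: f * (rho * N_T + N_N * (1 - rho)) / (rho * m),
       where N_N is the normal-cell copy number (2 by default).
       """
       if not 0 < purity <= 1:
           raise ValueError("purity must be in (0, 1]")
       if multiplicity <= 0:
           raise ValueError("multiplicity must be positive")
       # Average copy number at the locus across tumor and normal cells.
       average_copies = (purity * tumor_copy_number
                         + (1 - purity) * normal_copy_number)
       return vaf * average_copies / (purity * multiplicity)


   # A heterozygous variant in a diploid region of a 50% pure sample.
   print(cancer_cell_fraction(vaf=0.25, purity=0.5,
                              tumor_copy_number=2, multiplicity=1))

Under these assumptions, a variant observed at 25% allele frequency in a 50%
pure, copy-neutral sample is present in every cancer cell.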


How is the prioritization score calculated?
-------------------------------------------
Tumor antigens are prioritized based upon the binding affinity, allele-specific
transcript abundance, and CCF (if available), which aligns with previously
published recommendations (`Wells et al., 2020
<https://doi.org/10.1016/j.molimm.2020.02.017>`_).

Specifically, binding affinity values are transformed using:

.. math::

   \frac{|\chi - 1000|}{1000}

where :math:`\chi` is the binding affinity in nanomolar, such that
higher-affinity interactions (smaller nanomolar values) have larger transformed
values while maintaining a distribution mirroring the original.

The supporting read counts are log2-transformed and normalized by dividing each
observation by the maximum observed count so that they fall within
:math:`[0, 1]`.

The cancer cell fraction values do not require this normalization because they
are already bounded within :math:`[0, 1]`. Next, each peptide and HLA allele
combination is assigned a prioritization metric using:

.. math::

   S = pMHC_{BA} \times pMHC_{RS} \times pMHC_{CCF}

where :math:`pMHC_{BA}` is the transformed binding affinity, :math:`pMHC_{RS}`
is the log-transformed and normalized read support, and :math:`pMHC_{CCF}` is
the estimated cancer cell fraction.

This metric can be used to prioritize potential targets for manufacture and application to the patient's tumor.
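
The calculation described above can be sketched in a few lines of Python. This
is an illustration rather than the LENS implementation: the function names are
hypothetical, read counts are assumed to be at least 1 (so that the log2
transform is defined), and normalization is assumed to divide each
log-transformed count by the largest log-transformed count.

.. code-block:: python

   import math


   def transform_binding_affinity(affinity_nm):
       """Apply |x - 1000| / 1000 to a binding affinity in nanomolar."""
       return abs(affinity_nm - 1000) / 1000


   def normalize_read_support(read_counts):
       """log2-transform counts, then divide by the largest transformed value."""
       logged = [math.log2(count) for count in read_counts]  # counts must be >= 1
       largest = max(logged)
       if largest == 0:
           # Every count is 1, so every transformed value is 0.
           return [0.0 for _ in logged]
       return [value / largest for value in logged]


   def priority_scores(affinities_nm, read_counts, ccfs):
       """Return S = BA * RS * CCF for each peptide and HLA allele combination."""
       binding = [transform_binding_affinity(a) for a in affinities_nm]
       support = normalize_read_support(read_counts)
       return [b * r * c for b, r, c in zip(binding, support, ccfs)]


   # Two hypothetical pMHCs: a strong binder with high read support and full
   # clonality, and a weaker, subclonal candidate.
   scores = priority_scores(affinities_nm=[50, 500],
                            read_counts=[16, 4],
                            ccfs=[1.0, 0.5])
   print(scores)

In this example the first candidate scores higher on all three components, so
it ranks first. How LENS handles a missing CCF is not covered by this sketch.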

How are SNV and InDel peptides generated?
-----------------------------------------

LENS generates SNV- and InDel-derived peptides through the following pipeline:

.. graphviz:: snv_indel_peptide_generation.dot

1. **Somatic variant calling** — three callers (``mutect2``, ``strelka2``,
   ``varscan2``) identify somatic variants independently.
2. **Filter for PASS** — ``bcftools`` retains only variants passing each
   caller's quality filters.
3. **Union somatic calls** — ``jacquard`` combines passing calls across all
   three callers.
4. **Annotate variants** — ``snpEff`` annotates variant effects (missense,
   frameshift, etc.).
5. **Filter for expressed transcripts** — ``salmon`` quantification is used to
   retain only variants in transcripts above the user-specified expression
   percentile (default: 75th).
6. **Phase variants** — ``whatshap`` performs read-backed phasing of somatic
   and germline variants, preserving local haplotype context.
7. **Build variant-specific VCFs** — for each variant of interest, a VCF is
   constructed that includes the focal somatic variant and any phased
   germline variants on the same haplotype.
8. **Extract exonic sequences** — ``samtools faidx`` extracts exon sequences
   from the reference FASTA for expressed transcripts harboring the variant.
9. **Apply variants to transcripts** — ``bcftools consensus`` applies the
   variant-specific VCFs to the exonic sequences, producing tumor and normal
   transcript sequences.
10. **Translate into peptides** — ``lenstools make-snv-peptides-context``
    translates the tumor and normal sequences into mutant and wildtype peptides.

For more detail on each step, see :doc:`technical_details`.

How should I interpret the LENS report?
---------------------------------------
See :doc:`interpreting_report` for a guide to reading the LENS report,
including key columns, priority scores, missing values, and common filtering
strategies.

How do I determine how many pMHCs are filtered at each step of LENS?
--------------------------------------------------------------------

The RAFT Output Viewer provides interactive pMHC filtering and counting. After
a LENS run completes, run:

.. code-block:: console

   raft generate-reports --project-id my-project-lens

In the viewer, select a patient and use the filter controls (antigen source, Top
N, binding affinity threshold) to see how many pMHCs remain at each filter
level. See :doc:`accessing_and_reviewing_outputs` for details.

Additionally, the run QC summary file
(``outputs/lens/*.run_qc_summary.tsv``) contains per-step counts for each
antigen source.
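
For scripted inspection, the summary files can be loaded with the Python
standard library. The sketch below is not part of RAFT or LENS: the helper
name ``load_qc_summaries`` is hypothetical, the files are assumed to be
tab-separated with a header row, and ``base_dir`` is assumed to be the
directory that contains ``outputs/``. No column names are assumed.

.. code-block:: python

   import csv
   import glob
   import os


   def load_qc_summaries(base_dir):
       """Return {file name: list of row dicts} for each run QC summary file."""
       pattern = os.path.join(base_dir, "outputs", "lens", "*.run_qc_summary.tsv")
       summaries = {}
       for path in sorted(glob.glob(pattern)):
           with open(path, newline="") as handle:
               # Each row becomes a dict keyed by the file's own header names.
               rows = list(csv.DictReader(handle, delimiter="\t"))
           summaries[os.path.basename(path)] = rows
       return summaries


   if __name__ == "__main__":
       # Print every row of every summary found under the current directory.
       for name, rows in load_qc_summaries(".").items():
           print(name)
           for row in rows:
               print("  ", row)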