=================
Technical Details
=================

Technical details regarding the internals of the off-the-shelf LENS workflow are described below.

Off-the-shelf defaults
======================

Default references
------------------

.. list-table::
   :header-rows: 1

   * - Workflow
     - Reference type
     - Reference
   * - DNA alignment
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - DNA alignment post-processing
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - DNA alignment post-processing
     - BED
     - hg38_exome.bed
   * - DNA alignment post-processing
     - Known sites VCF
     - Homo_sapiens_assembly38.dbsnp138.vcf.gz
   * - RNA alignment
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - RNA alignment
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - Transcript quantification
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Transcript quantification
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - Somatic variant calling
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Somatic variant calling
     - BED
     - hg38_exome.bed
   * - Somatic variant calling
     - Panel of normals VCF
     - 1000g_pon.hg38.vcf.gz
   * - Somatic variant calling
     - Allele frequencies VCF
     - af-only-gnomad.hg38.vcf.gz
   * - Somatic variant calling
     - Known sites VCF
     - small_exac_common_3.hg38.vcf.gz
   * - Variant annotation
     - snpEff annotation file
     - GRCh38.GENCODEv37
   * - Germline variant calling
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Germline variant calling
     - BED
     - hg38_exome.bed
   * - Variant phasing
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Variant phasing
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - Splice variant calling
     - Tool-specific reference
     - snaf-data
   * - Virus detection
     - Virus-specific sequences (no *Homo sapiens* homology)
     - virus_masked_hg38.fa
   * - Virus detection
     - Virus-specific sequences
     - virus.cds.2024f2.fa
   * - Fusion detection
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Fusion detection
     - Tool-specific reference
     - GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play
   * - Tumor purity detection
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Tumor purity detection
     - BED
     - hg38_exome.bed
   * - Copy number variant detection
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Copy number variant detection
     - BED
     - hg38_exome.bed
   * - CTA pMHC generation
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - CTA pMHC generation
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - CTA pMHC generation
     - CTA gene list
     - cta_and_self_antigen.homo_sapiens.gene_list
   * - ERV pMHC generation
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - ERV pMHC generation
     - ERV annotations
     - Hsap38.txt
   * - SNV pMHC generation
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - SNV pMHC generation
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - SNV pMHC generation
     - Canonical protein reference
     - gencode.v37.pc_translations.fa
   * - InDel pMHC generation
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - InDel pMHC generation
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - InDel pMHC generation
     - Canonical protein reference
     - gencode.v37.pc_translations.fa
   * - Fusion pMHC generation
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Fusion pMHC generation
     - GTF
     - gencode.v37.annotation.with.hervs.gtf
   * - pMHC characterization
     - Tool-specific reference
     - mhcflurry
   * - CTA annotation
     - CTA metadata
     - canonical_txs.mtec.norm.subcell.annot.tsv
   * - ERV annotation
     - ERV metadata
     - erv_scores.25SEP2023.tsv
   * - Sample swap detection
     - Genomic reference
     - Homo_sapiens.assembly38.no_ebv.fa
   * - Sample swap detection
     - Known sites VCF
     - somalier.sites.hg38.vcf.gz

Default tools
-------------

.. list-table::
   :header-rows: 1

   * - Workflow
     - Tool
     - Tool version
   * - DNA alignment
     - fastp
     - v0.23.1
   * - DNA alignment
     - bwa-mem2
     - v2.2.1
   * - DNA alignment post-processing
     - samblaster
     - v0.1.26
   * - RNA alignment
     - fastp
     - v0.23.1
   * - RNA alignment
     - star
     - v2.7.0f
   * - Transcript quantification
     - salmon
     - v1.1.0
   * - Somatic variant calling
     - mutect2
     - v4.1.6.0
   * - Somatic variant calling
     - varscan2
     - v2.1.1
   * - Somatic variant calling
     - strelka2
     - v2.2.9
   * - Somatic variant filtering
     - bcftools
     - v1.19
   * - Variant annotation
     - snpeff
     - v4.3k
   * - Somatic SNV/InDel filtering
     - snpsift
     - v4.3k
   * - Variant unionizing
     - jacquard
     - v1.1.4
   * - Germline variant calling
     - deepvariant
     - v1.1.0
   * - Somatic and germline merging
     - jacquard
     - v1.1.4
   * - Variant phasing
     - whatshap
     - v1.2.1
   * - HLA typing
     - seq2hla
     - v2.2
   * - Splice variant calling
     - snaf
     - v0.7.0
   * - Fusion detection
     - starfusion
     - v1.10.1
   * - Tumor purity detection
     - sequenza
     - v3.0.0
   * - Copy number variant detection
     - cnvkit
     - v0.9.9
   * - CTA pMHC generation
     - lenstools
     - v1.3
   * - ERV pMHC generation
     - lenstools
     - v1.3
   * - SNV pMHC generation
     - lenstools
     - v1.3
   * - InDel pMHC generation
     - lenstools
     - v1.3
   * - Viral pMHC generation
     - lenstools
     - v1.3
   * - Splice pMHC generation
     - lenstools
     - v1.3
   * - Fusion pMHC generation
     - lenstools
     - v1.3
   * - pMHC characterization
     - mhcflurry
     - v2.0.6
   * - Sample swap detection
     - somalier
     - v0.2.17

Default parameters
------------------

.. list-table::
   :header-rows: 1

   * - Workflow
     - Tool
     - Parameters
   * - DNA alignment
     - fastp
     - ``--in1 ${fq1} --out1 ${dataset}-${pat_name}-${run}_1.trimmed.fq.gz --in2 ${fq2} --out2 ${dataset}-${pat_name}-${run}_2.trimmed.fq.gz --failed_out ${dataset}-${pat_name}-${run}.fastp_fails.fq.gz --thread ${task.cpus}``
   * - DNA alignment
     - bwa-mem2
     - ``-R "@RG\\tID:${dataset}-${pat_name}-${run}\\tSM:${dataset}-${pat_name}-${run}\\tLB:NULL\\tPL:Illumina" ${fa} ${fq1} ${fq2} -t ${task.cpus - 1}``
   * - DNA alignment post-processing
     - samblaster
     - None
   * - RNA alignment
     - fastp
     - ``--in1 ${fq1} --out1 ${dataset}-${pat_name}-${run}_1.trimmed.fq.gz --in2 ${fq2} --out2 ${dataset}-${pat_name}-${run}_2.trimmed.fq.gz --failed_out ${dataset}-${pat_name}-${run}.fastp_fails.fq.gz --thread ${task.cpus}``
   * - RNA alignment
     - star
     - ``--quantMode TranscriptomeSAM --outSAMtype BAM SortedByCoordinate --twopassMode Basic --outSAMunmapped Within``
   * - Transcript quantification
     - salmon
     - ``quant --threads ${task.cpus} -t ${fa} -l a -a ${aln} -o .``
   * - Somatic variant calling
     - mutect2
     - ``Placeholder``
   * - Somatic variant calling
     - strelka2
     - ``Placeholder``
   * - Somatic variant calling
     - varscan2
     - ``Placeholder``
   * - Somatic variant filtering (mutect2)
     - bcftools
     - ``FILTER="PASS"``
   * - Somatic variant filtering (strelka2)
     - bcftools
     - ``FILTER="PASS"``
   * - Somatic variant filtering (varscan2)
     - bcftools
     - ``FILTER="PASS"``
   * - Variant annotation
     - snpeff
     - None
   * - Somatic SNV filtering
     - snpsift
     - ``ANN[*].EFFECT has 'missense_variant'``
   * - Somatic InDel filtering
     - snpsift
     - ``(ANN[*].EFFECT has 'conservative_inframe_insertion') || (ANN[*].EFFECT has 'conservative_inframe_deletion') || (ANN[*].EFFECT has 'disruptive_inframe_insertion') || (ANN[*].EFFECT has 'disruptive_inframe_deletion') || (ANN[*].EFFECT has 'frameshift_variant')``
   * - Variant unionizing
     - jacquard
     - ``--include_format_tags="GT,AF,AU,CU,GU,TU,TAR,TIR,FREQ,VAF"``
   * - Germline variant calling
     - deepvariant
     - ``--model_type WES``
   * - Variant merging
     - jacquard
     - None
   * - Variant phasing
     - whatshap
     - ``--ignore-read-groups``
   * - Splice variant filtering
     - split_snaf_by_sample
     - ``snaf_awk_filter: '\$5 == "True" && \$3 == 0.0', snaf_tumor_exp_threshold: '1000'``
   * - Fusion detection
     - star
     - ``--outReadsUnmapped None --twopassMode Basic --outSAMstrandField intronMotif --outSAMunmapped Within --chimSegmentMin 12 --chimJunctionOverhangMin 8 --chimOutJunctionFormat 1 --alignSJDBoverhangMin 10 --alignMatesGapMax 100000 --alignIntronMax 100000 --alignSJstitchMismatchNmax 5 -1 5 5 --chimMultimapScoreRange 3 --chimScoreJunctionNonGTAG -4 --chimMultimapNmax 20 --chimNonchimScoreDropMin 10 --peOverlapNbasesMin 12 --peOverlapMMp 0.1 --alignInsertionFlush Right --alignSplicedMateMapLminOverLmate 0 --alignSplicedMateMapLmin 30``
   * - Fusion detection
     - starfusion
     - ``--examine_coding_effect``
   * - Tumor purity detection
     - sequenza
     - ``sequenza_gc_wiggle: '-w 50', sequenza_bam2seqz: '', sequenza_seqz_binning: '-w 50'``
   * - Copy number variant detection
     - cnvkit
     - ``cnvkit_segment: '--drop-low-coverage', cnvkit_call: '--ploidy 2'``
   * - Expressed CTA detection
     - lenstools_filter_expressed_self_genes
     - ``-p 95``
   * - Expressed ERV detection
     - lenstools_filter_ervs_by_rna_coverage
     - ``--mean-depth 60``
   * - Expressed SNV detection
     - lenstools_filter_expressed_variants_parameters
     - ``-p 75``
   * - Expressed InDel detection
     - lenstools_filter_expressed_variants_parameters
     - ``-p 75``
   * - Expressed virus detection
     - lenstools_filter_viruses_by_rna_coverage
     - ``--coverage 25``
   * - pMHC characterization
     - mhcflurry
     - 8,9,10,11
   * - pMHC filtering
     - lenstools
     - < 500 nM

LENS workflow flowchart
=======================

Somatic Nucleotide Variants (SNVs)
==================================

Somatic single-nucleotide variants (SNVs), i.e., variants present in tumor tissue but absent from germline tissue, can be a source of tumor-specific immunogenic peptides.
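
The somatic SNV and InDel filters listed under *Default parameters* keep only records whose snpEff annotation carries one of a fixed set of effects. The Python sketch below approximates that selection for illustration. It is not LENS or snpSift code. It assumes the standard snpEff ``ANN`` INFO format, in which comma-separated annotations are ``|``-delimited and the second field holds one or more ``&``-joined effect terms. The function names ``effects_from_info`` and ``passes_effect_filter`` are hypothetical.

.. code-block:: python

   from __future__ import annotations

   # Effect sets copied from the snpSift expressions in the Default parameters table.
   SNV_EFFECTS = {"missense_variant"}
   INDEL_EFFECTS = {
       "conservative_inframe_insertion",
       "conservative_inframe_deletion",
       "disruptive_inframe_insertion",
       "disruptive_inframe_deletion",
       "frameshift_variant",
   }


   def effects_from_info(info: str) -> set[str]:
       """Collect every effect term from the ANN entry of a VCF INFO string.

       Assumes the standard snpEff ANN layout: annotations separated by ',',
       fields separated by '|', and the effect(s) in the second field,
       possibly joined with '&'.
       """
       effects: set[str] = set()
       for entry in info.split(";"):
           if not entry.startswith("ANN="):
               continue
           for annotation in entry[len("ANN="):].split(","):
               fields = annotation.split("|")
               if len(fields) > 1:
                   effects.update(fields[1].split("&"))
       return effects


   def passes_effect_filter(vcf_line: str, wanted: set[str]) -> bool:
       """Return True if any annotation on a VCF data line has a wanted effect."""
       if vcf_line.startswith("#"):
           return False  # skip header lines
       columns = vcf_line.rstrip("\n").split("\t")
       info = columns[7]  # INFO is the eighth VCF column
       return not effects_from_info(info).isdisjoint(wanted)

Like ``ANN[*].EFFECT has ...``, the sketch accepts a record if *any* of its annotations matches. A variant therefore passes even when the matching effect is on a transcript other than the first one listed.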
Somatic Insertion and Deletion Variants (InDels)
================================================

Somatic insertion/deletion variants (InDels), i.e., variants present in tumor tissue but absent from germline tissue, can be a source of tumor-specific immunogenic peptides.

Splice Variants
===============

Fusion Events
=============

Endogenous Retroviruses (ERVs)
==============================

Viruses
=======

Cancer-testis Antigens (CTAs)
=============================

Aberrantly expressed genes (e.g., CTAs) can be a source of tumor-associated immunogenic peptides.

Determining expressed CTA and self-antigens
-------------------------------------------

LENS considers only the CTAs and self-antigens included in the user-provided gene list. The default list is available at ``/path/to/raft/references/homo_sapiens/cta_self/cta_and_self_antigen.homo_sapiens.gene_list`` and includes loci described in the CTDatabase (http://www.cta.lncc.br/). Targetable CTA/self-antigen peptides are generated from the coding sequences of CTA transcripts whose expression exceeds the user-provided expression percentile (default: 95th percentile).

Generating patient-specific CTA coding sequences
------------------------------------------------

LENS performs germline variant calling (default: DeepVariant) as part of its workflow.

Reference generation notes
==========================