Manifest Specifications
=========================

This page provides the full column reference, sample naming conventions,
complex sample set configurations, and manifest validation for RAFT manifests.
For a task-oriented guide to creating manifests, see
:doc:`preparing_your_samples`.

Manifest columns
-----------------

A RAFT manifest must have at least the columns defined in the table below.
Columns can be in any order, and other columns containing non-RAFT metadata
are also allowed.

.. list-table:: RAFT Columns
   :widths: 25 25 25
   :header-rows: 1

   * - Column
     - Description
     - Allowed values
   * - Dataset
     - Name for collection of patients
     - Free text
   * - Patient_Name
     - Name for collection of samples
     - Free text
   * - Run_Name
     - Name for the specific sample
     - Free text (see note below)
   * - File_Prefix
     - Base name (or full path) of input files
     - Free text
   * - Sequencing_Method
     - Sequencing protocol for sample
     - (RNA-seq, WES, WXS, WGS)
   * - Normal
     - Is the sample normal or abnormal (tumor)?
     - (TRUE, FALSE)

.. note::

   A sample's ``Run_Name`` is instrumental in guiding samples through some
   RAFT workflows. A ``Run_Name`` should consist of a two-letter prefix that
   describes the type of sample, a delimiter (``-`` or ``_``), and an
   arbitrary unique identifier. The first letter of the prefix is either
   ``a`` (for abnormal) or ``n`` (for normal). The second letter is either
   ``r`` (for RNA) or ``d`` (for DNA). For example, a sample with an ``ar-``
   (or ``ar_``) prefix is an abnormal (tumor) RNA sample, while a sample with
   an ``nd-`` prefix is a normal DNA sample.

Complex sample sets
--------------------

Users may encounter situations that require more than one sample per sample
type per patient. For example, a patient may have a single set of DNA samples
(normal DNA and tumor DNA) but multiple RNA-seq samples (e.g. multiple
timepoints). RAFT's ``subjoin`` functionality supports these cases via a
``Group`` column in the manifest:

.. code-block:: console

   Patient_Name  Run_Name     Dataset  File_Prefix  Sequencing_Method  Normal  Group
   Pt01          ad-Pt01-03A  AML      9f7f7        WES                FALSE   1-2
   Pt01          nd-Pt01-11A  AML      8e74a        WES                TRUE    1-2
   Pt01          ar-Pt01-03A  AML      cdb288       RNA-Seq            FALSE   1
   Pt01          ar-Pt01-03B  AML      cdb289       RNA-Seq            FALSE   2

This scenario depicts a patient (``Pt01``) with a DNA normal sample, a DNA
tumor sample, and **two** RNA-seq samples. In this example, ``ar-Pt01-03A`` is
a pre-treatment sample and ``ar-Pt01-03B`` is a post-treatment sample. LENS
will be run on two distinct sample sets:

The pre-treatment sample set:

- ad-Pt01-03A
- nd-Pt01-11A
- **ar-Pt01-03A**

The post-treatment sample set:

- ad-Pt01-03A
- nd-Pt01-11A
- **ar-Pt01-03B**

Each sample set produces its own LENS report.

.. note::

   ``Group`` identifiers do not have to be numbers. Descriptive identifiers
   (e.g. ``pre-treatment`` and ``post-treatment``) are also supported.

Validating a manifest
----------------------

LENS automatically checks manifest integrity when ``raft`` is run in either
``run-ots`` or ``run-workflow`` mode. To manually verify a manifest:

.. code-block:: console

   raft check-manifest -m

This validates required columns, allowed values, ``Run_Name`` prefix
conventions, cross-sample HLA consistency, and other integrity checks.
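
To make the first three of these checks concrete, the following is a minimal
Python sketch of what such validation could look like. It is not RAFT's
implementation: the function name ``check_manifest_text`` is hypothetical, the
manifest is assumed to be tab-delimited, ``Sequencing_Method`` is compared
case-insensitively as an assumption (the example manifest above writes
``RNA-Seq``), and the rule that the first prefix letter must agree with the
``Normal`` column is an inference from the naming convention, not a documented
check.

.. code-block:: python

   import csv
   import io
   import re

   # Columns and allowed values copied from the table on this page.
   REQUIRED_COLUMNS = (
       "Dataset", "Patient_Name", "Run_Name",
       "File_Prefix", "Sequencing_Method", "Normal",
   )
   ALLOWED_METHODS = {"rna-seq", "wes", "wxs", "wgs"}  # lower-cased (assumption)
   ALLOWED_NORMAL = {"TRUE", "FALSE"}

   # Two-letter prefix ([an][rd]), a "-" or "_" delimiter, then an identifier.
   RUN_NAME_RE = re.compile(r"^([an])([rd])[-_].+$")


   def check_manifest_text(text, delimiter="\t"):
       """Return a list of problems found; an empty list means none were found."""
       reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
       header = reader.fieldnames or []
       missing = [col for col in REQUIRED_COLUMNS if col not in header]
       if missing:
           return ["missing required columns: " + ", ".join(missing)]

       problems = []
       for row in reader:
           line_no = reader.line_num  # physical line of the row just read
           run_name = (row["Run_Name"] or "").strip()
           method = (row["Sequencing_Method"] or "").strip()
           normal = (row["Normal"] or "").strip()

           if method.lower() not in ALLOWED_METHODS:
               problems.append(
                   f"line {line_no}: unexpected Sequencing_Method {method!r}")
           if normal not in ALLOWED_NORMAL:
               problems.append(
                   f"line {line_no}: Normal must be TRUE or FALSE, got {normal!r}")

           match = RUN_NAME_RE.match(run_name)
           if match is None:
               problems.append(
                   f"line {line_no}: Run_Name {run_name!r} lacks an a/n + r/d prefix")
           elif normal in ALLOWED_NORMAL and (match.group(1) == "n") != (normal == "TRUE"):
               # Assumption: the first prefix letter should agree with Normal.
               problems.append(
                   f"line {line_no}: Run_Name {run_name!r} disagrees with Normal={normal}")
       return problems


   if __name__ == "__main__":
       example = "\n".join([
           "Patient_Name\tRun_Name\tDataset\tFile_Prefix\tSequencing_Method\tNormal\tGroup",
           "Pt01\tad-Pt01-03A\tAML\t9f7f7\tWES\tFALSE\t1-2",
           "Pt01\tnd-Pt01-11A\tAML\t8e74a\tWES\tTRUE\t1-2",
       ])
       for problem in check_manifest_text(example):
           print(problem)

A sketch like this can be useful as a quick pre-flight check while assembling
a manifest, but ``raft check-manifest`` remains the authoritative validation,
since it also covers checks (such as cross-sample HLA consistency) that are
not reproduced here.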