Explain output
Let us continue with the NA18740 example from Getting Start. In this
part of the document, we explain in greater detail the outputs generated by mhcflow.
Note
By default, mhcflow removes all intermediate files. The snapshot
of each directory shown below contains only the "final" files.
For details about the intermediate files that have been removed, please
refer to the intermediate and intermediate_aux sections in the
file manifest files under each folder.
fisher/
A snapshot of the fisher/ directory is shown below:
NA18740_class1
├─ fisher/
│ ├─ log/
│ ├─ NA18740.fisher.chr6.idx
│ ├─ NA18740.fisher.hla_bed.idx
│ ├─ NA18740.fisher.unplaced.idx
│ ├─ NA18740.fisher.idx.final.tsv
│ ├─ NA18740.fisher.file_manifest.json
│ ├─ NA18740.dumper.file_manifest.json
NA18740.fisher.idx.final.tsv
: This file contains the final set of fished read IDs. It is the union of the following three files:
- NA18740.fisher.chr6.idx: Contains reads mapped to chromosome 6 in which matches to k-mer sequences are found.
- NA18740.fisher.unplaced.idx: Same as above, but from unplaced reads.
- NA18740.fisher.hla_bed.idx: Contains reads mapped to the HLA regions defined in the BED file.
NA18740.fisher.file_manifest.json
: The file manifest for the fisher component in JSON format, which provides detailed information about:
- input files
- output files
- auxiliary files, such as .log and .done files
- intermediate and intermediate auxiliary files.
NA18740.dumper.file_manifest.json
: The file manifest for the dumper step inside the fisher component.
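The union that yields the final set of fished read IDs can be sketched in Python as follows. This is only an illustration, not how mhcflow itself builds the file: it assumes that each .idx file holds one read ID per line, which this page does not confirm, and `union_read_ids` is a hypothetical helper name.

```python
from pathlib import Path


def union_read_ids(idx_paths):
    """Return the sorted union of read IDs found in the given files.

    Assumption: one read ID per line; blank lines are ignored.
    """
    read_ids = set()
    for path in idx_paths:
        with open(path) as handle:
            for line in handle:
                read_id = line.strip()
                if read_id:
                    read_ids.add(read_id)
    return sorted(read_ids)


if __name__ == "__main__":
    fisher_dir = Path("NA18740_class1/fisher")
    sources = [
        fisher_dir / "NA18740.fisher.chr6.idx",
        fisher_dir / "NA18740.fisher.unplaced.idx",
        fisher_dir / "NA18740.fisher.hla_bed.idx",
    ]
    # Only attempt the union when all three files are present.
    if all(p.exists() for p in sources):
        print(len(union_read_ids(sources)), "unique read IDs")
```

A set is used so that a read ID reported by more than one source is counted once.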
Note
For more details on file manifests and the files under the log/ directory,
please refer to the dedicated page here.
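To get a quick overview of what a file manifest records, one can load it and count the entries in each section. The sketch below deliberately assumes very little about the schema: only that the manifest is a JSON object at the top level. Section names such as intermediate and intermediate_aux are mentioned on this page, but their internal layout is not, so `summarize_manifest` (a hypothetical helper) just counts entries.

```python
import json
from pathlib import Path


def summarize_manifest(manifest_path):
    """Map each top-level section of a manifest to its number of entries.

    Assumption: the manifest is a JSON object. Lists and objects are
    counted by length; any other value counts as a single entry.
    """
    with open(manifest_path) as handle:
        manifest = json.load(handle)
    if not isinstance(manifest, dict):
        raise ValueError("expected a JSON object at the top level")
    summary = {}
    for section, entries in manifest.items():
        if isinstance(entries, (list, dict)):
            summary[section] = len(entries)
        else:
            summary[section] = 1
    return summary


if __name__ == "__main__":
    sample_dir = Path("NA18740_class1")
    if sample_dir.is_dir():
        # Visit every manifest one level below the sample directory.
        for manifest in sorted(sample_dir.glob("*/*.file_manifest.json")):
            print(manifest.name, summarize_manifest(manifest))
```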
realigner/
The realigner/ directory contains realignments of reads against the HLA
reference sequence in BAM format.
NA18740_class1
├─ realigner/
│ ├─ log/
│ ├─ NA18740.hla.realn.bam
│ ├─ NA18740.hla.realn.bam.bai
│ ├─ NA18740.realigner.file_manifest.json
NA18740.hla.realn.bam
: This file contains the realignment result and also serves as the input to the typer component. Realignments are sorted by coordinate and indexed.
NA18740.realigner.file_manifest.json
: The file manifest for the realigner component in JSON format.
typer/
All typing results are stored in the typer/ directory, as shown below.
For a detailed explanation of each file, please refer to the
mhctyper documentation.
finalizer/
The finalizer/ directory contains the sample-level HLA reference and
the corresponding realignment, which are convenient for downstream analyses
such as LOH detection.
NA18740_class1
├─ finalizer/
│ ├─ log/
│ ├─ NA18740.hla.fasta
│ ├─ NA18740.hla.nix
│ ├─ NA18740.hla.realn.bam
│ ├─ NA18740.hla.realn.bam.bai
│ ├─ NA18740.realigner.file_manifest.json
│ ├─ NA18740.finalizer.file_manifest.json
NA18740.hla.fasta
: The sample-level HLA reference, containing the alleles typed by the typer component.
NA18740.hla.nix
: The accompanying index for the sample-level HLA reference, produced by novoindex.
Note
In the context of paired normal and tumor samples, a "sample-level" HLA reference means a reference specific to the subject or individual.
NA18740.hla.realn.bam
: The realignment file, generated by aligning fished reads against the alleles included in the sample-level reference.
Note
Although the file NA18740.hla.realn.bam appears in both the finalizer/ and
realigner/ directories, the two files contain different sets of MHC alleles.
The BAM file in the finalizer/ directory includes only the alleles
typed by the typer component, whereas the one in the realigner/
directory contains a broader set of MHC alleles.
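To see which alleles the sample-level reference contains, one can list the record names in NA18740.hla.fasta. The following is a minimal sketch that assumes the file is an uncompressed, plain-text FASTA in which each record corresponds to one allele. The exact header layout is not described on this page, so the hypothetical helper `fasta_record_names` simply takes the first whitespace-delimited token after ">".

```python
from pathlib import Path


def fasta_record_names(fasta_path):
    """Return the record names of a plain-text FASTA file, in file order."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The name is the first token after ">"; an empty header gives "".
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


if __name__ == "__main__":
    reference = Path("NA18740_class1/finalizer/NA18740.hla.fasta")
    if reference.exists():
        for name in fasta_record_names(reference):
            print(name)
```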