Explain output

Let us continue with the NA18740 example from Getting Started. This part of the document explains the outputs generated by mhcflow in greater detail. A run produces one subdirectory per component:

.
├─ NA18740_class1/
  ├─ finalizer/
  ├─ fisher/
  ├─ realigner/
  └─ typer/
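As a quick sanity check after a run, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of mhcflow: the helper name `missing_components` is hypothetical, and the only thing it relies on is the four directory names shown in the tree.

```python
from pathlib import Path

# Component directories shown in the tree above.
COMPONENTS = ("finalizer", "fisher", "realigner", "typer")


def missing_components(outdir):
    """Return the component subdirectories that are absent from outdir."""
    root = Path(outdir)
    return [name for name in COMPONENTS if not (root / name).is_dir()]
```

For example, `missing_components("NA18740_class1")` returns an empty list when all four component directories are present.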

Note

By default, mhcflow removes all intermediate files, so the snapshot of each directory shown below contains only the "final" files. For details about the intermediate files that have been removed, please refer to the intermediate and intermediate_aux sections in the file manifest files under each folder.
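To see which intermediate files were removed, a manifest can be read with the standard json module. The sketch below assumes that each manifest is a JSON object and that intermediate and intermediate_aux appear as top-level keys; the exact schema is not described on this page, so treat the key layout as an assumption. The helper name `removed_sections` is hypothetical.

```python
import json


def removed_sections(manifest_path, sections=("intermediate", "intermediate_aux")):
    """Return {section: content} for the requested sections found in a manifest.

    Assumes the manifest is a JSON object with the section names as
    top-level keys (an assumption, not a documented schema).
    """
    with open(manifest_path) as fh:
        manifest = json.load(fh)
    return {name: manifest[name] for name in sections if name in manifest}
```

For example, `removed_sections("NA18740_class1/fisher/NA18740.fisher.file_manifest.json")` returns whatever the manifest stores under those two sections, or an empty dictionary if the keys are laid out differently.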

fisher/

A snapshot of the fisher/ directory is shown below:

NA18740_class1
├─ fisher/
  ├─ log/
  ├─ NA18740.fisher.chr6.idx
  ├─ NA18740.fisher.hla_bed.idx
  ├─ NA18740.fisher.unplaced.idx
  ├─ NA18740.fisher.idx.final.tsv
  ├─ NA18740.fisher.file_manifest.json
  ├─ NA18740.dumper.file_manifest.json
  • NA18740.fisher.idx.final.tsv: This file contains the final set of fished read IDs. It is the union of the following three files:
    • NA18740.fisher.chr6.idx: Contains reads mapped to chromosome 6 in which k-mer sequence matches are found.
    • NA18740.fisher.unplaced.idx: Same as above, but for unplaced reads.
    • NA18740.fisher.hla_bed.idx: Contains reads mapped to the HLA regions defined in the BED file.
  • NA18740.fisher.file_manifest.json: The file manifest for the fisher component in JSON format, which provides detailed information about:
    • input files
    • output files
    • auxiliary files, such as .log and .done files
    • intermediate and intermediate auxiliary files
  • NA18740.dumper.file_manifest.json: The file manifest for the dumper step inside the fisher component.
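The union relationship described above can be checked directly. The following is a minimal sketch under one assumption that this page does not confirm: each of the four files lists one read ID per line, with the ID in the first tab-separated column. The helper names `read_ids` and `union_matches_final` are hypothetical.

```python
from pathlib import Path


def read_ids(path):
    """Collect read IDs from the first tab-separated column of each non-empty line."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                ids.add(line.split("\t")[0])
    return ids


def union_matches_final(fisher_dir, sample):
    """Check that the final ID file equals the union of the three .idx files."""
    fisher_dir = Path(fisher_dir)
    union = set()
    for tag in ("chr6", "hla_bed", "unplaced"):
        union |= read_ids(fisher_dir / f"{sample}.fisher.{tag}.idx")
    final = read_ids(fisher_dir / f"{sample}.fisher.idx.final.tsv")
    return union == final
```

With the example above, the call would be `union_matches_final("NA18740_class1/fisher", "NA18740")`. If the real files carry a header line or extra columns, `read_ids` would need to be adapted.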

Note

For more details on file manifests and the files under the log/ directory, please refer to the dedicated page here.

realigner/

The realigner/ directory contains realignments of reads against the HLA reference sequence in BAM format.

NA18740_class1
├─ realigner/
  ├─ log/
  ├─ NA18740.hla.realn.bam
  ├─ NA18740.hla.realn.bam.bai
  ├─ NA18740.realigner.file_manifest.json
  • NA18740.hla.realn.bam: This file contains the realignment result and also serves as the input to the typer component. The alignments are coordinate-sorted and indexed (see the accompanying NA18740.hla.realn.bam.bai file).
  • NA18740.realigner.file_manifest.json: The file manifest for the realigner component in JSON format.
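The sort order of a BAM file is normally recorded in the SO field of the @HD header line. As a small sketch, the function below reads that field from header text such as the output of `samtools view -H NA18740.hla.realn.bam`. The function name `sort_order` is hypothetical, and the sketch assumes the realigner writes a standard @HD line, which this page does not state explicitly.

```python
def sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

A coordinate-sorted file would be expected to yield the string "coordinate".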

typer/

All typing results are stored in the typer/ directory, as shown below. For a detailed explanation of each file, please refer to the mhctyper documentation.

NA18740_class1
├─ typer/
  ├─ NA18740.a1.tsv
  ├─ NA18740.a2.tsv
  ├─ NA18740.hlatyping.res.tsv

finalizer/

The finalizer/ directory contains the sample-level HLA reference and the corresponding realignment, which are convenient for downstream analyses such as LOH detection.

NA18740_class1
├─ finalizer/
  ├─ log/
  ├─ NA18740.hla.fasta
  ├─ NA18740.hla.nix
  ├─ NA18740.hla.realn.bam
  ├─ NA18740.hla.realn.bam.bai
  ├─ NA18740.realigner.file_manifest.json
  ├─ NA18740.finalizer.file_manifest.json
  • NA18740.hla.fasta: The sample-level HLA reference, which contains the alleles typed by the typer component.
  • NA18740.hla.nix: The accompanying index for the sample-level HLA reference, produced by novoindex.

Note

In the context of paired normal and tumor samples, a "sample-level" HLA reference means a reference specific to the subject or individual.

  • NA18740.hla.realn.bam: The realignment file, generated by aligning fished reads against the alleles included in the sample-level reference.

Note

Although a file named NA18740.hla.realn.bam appears in both the finalizer/ and realigner/ directories, the two files contain different sets of MHC alleles. The BAM file in the finalizer/ directory includes only the alleles typed by the typer component, whereas the one in the realigner/ directory contains a broader set of MHC alleles.
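One way to see the difference between the two BAM files is to compare the reference sequence names in their headers. The sketch below parses the @SQ lines from header text (for example, the output of `samtools view -H` on each file) and reports the names found only in the first header. The function names `sq_names` and `only_in_first` are hypothetical, and the sketch assumes each allele appears as its own @SQ entry.

```python
def sq_names(header_text):
    """Return the set of reference names (SN fields) from the @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def only_in_first(header_a, header_b):
    """Return the sorted reference names present in header_a but not in header_b."""
    return sorted(sq_names(header_a) - sq_names(header_b))
```

Passing the realigner header first and the finalizer header second lists the alleles that are present only in the broader realigner reference.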