Getting started

Quick example

The following command runs lohhlamod on the s6 simulation dataset provided in the repository:

docker compose run --rm lohhlamod \
    lohhlamod --tbam ./simulation/s6/input/s6_t.hla.realn.ready.bam \
    --nbam ./simulation/s6/input/s6_n.hla.realn.ready.bam \
    --subject s6 \
    --hlaref ./simulation/s6/input/s6_n.hla.fasta \
    --tstates ./simulation/s6/input/tstates.tsv \
    --outdir ./s6_test

The output folder s6_test contains the following results. Detailed descriptions of each file can be found on the Explain Output page.

s6_test/
├─ hla_a.rds
├─ hla_b.rds
├─ hla_c.rds
├─ s6.loh.res.tsv
├─ s6_n.filt.bam
├─ s6_n.filt.bam.bai
├─ s6_t.filt.bam
└─ s6_t.filt.bam.bai

Input preparation

lohhlamod is designed as a specialized, high-performance module focused exclusively on the detection of HLA Loss of Heterozygosity (LOH).

The original LOHHLA framework included an internal HLA typing and realignment routine, which users could optionally skip. lohhlamod removes these components entirely to provide a leaner, more modular footprint. HLA typing is treated as an upstream prerequisite rather than an internal step, because most modern bioinformatics pipelines already have a preferred HLA typing method in place.

Required inputs

To run lohhlamod, you must provide:

HLA-aligned BAMs: Alignments against the subject-specific HLA reference for both the normal and tumor samples.
HLA reference: The subject-specific HLA alleles in FASTA format.
Tumor states: A TSV file containing tumor ploidy and purity estimates.

HLA reference and alignment

mhcflow is a re-engineered HLA typing tool based on Polysolver, optimized to produce the specific alignments required for downstream LOH analysis. You can find detailed preparation instructions here.

Input compatibility

lohhlamod expects BAM files where reads have been re-aligned to the patient's inferred HLA alleles (e.g., HLA-A, B, and C). If you are using a custom pipeline instead of mhcflow, ensure your HLA reference contains only the subject-specific alleles and that you align reads against this specific reference for both tumor and normal samples.
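For a custom pipeline, one quick sanity check is to confirm that the reference names in the BAM header match the sequence names in the HLA FASTA. The sketch below is not part of lohhlamod: the helper names fasta_names and sam_header_names are hypothetical, and the allele names in the demo are made up. It parses SAM header text from standard input, so on real data you would pipe in the output of `samtools view -H`.

```shell
#!/bin/sh
set -eu

# Sequence names in a FASTA file: the text after ">" up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Reference names from SAM header text on stdin (SN: tag of each @SQ line).
sam_header_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Demo on made-up data so the sketch runs without samtools or real BAMs.
demo_fasta="$(mktemp)"
printf '>hla_a_01_01_01_01\nACGT\n>hla_a_02_01_01_01\nACGA\n' > "$demo_fasta"

bam_names="$(printf '@HD\tVN:1.6\n@SQ\tSN:hla_a_01_01_01_01\tLN:4\n@SQ\tSN:hla_a_02_01_01_01\tLN:4\n' | sam_header_names)"
ref_names="$(fasta_names "$demo_fasta")"

if [ "$bam_names" = "$ref_names" ]; then
    echo "reference names match"
else
    echo "reference names differ"
fi
rm -f "$demo_fasta"
```

On real inputs, the equivalent comparison would use `samtools view -H tumor.bam | sam_header_names` for each BAM and `fasta_names` on the file passed to --hlaref (the file names here are placeholders).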

Estimated tumor ploidy and purity (--tstates)

lohhlamod uses estimated ploidy and purity to infer allelic copy number. These estimates can be obtained from most standard CNV algorithms.

Default ploidy and purity values

In cases where estimates are unavailable, lohhlamod will fall back to default values of ploidy = 2 and purity = 0.5. Currently, these defaults are hardcoded and cannot be modified via the command line.

Format

The file passed to --tstates must be tab-delimited. Refer to the tstates.tsv files provided under the simulation/ directory in this repository for exact formatting details.

Example:

SampleID	TumorPloidy	TumorPurityNGS
s1_t	2.33	1
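Because stray spaces in place of tabs are easy to miss, it can help to generate the file with printf and check it before running lohhlamod. The following is a minimal sketch, not something shipped with lohhlamod: the file name tstates.example.tsv and the helper check_tstates are hypothetical, and the plausibility limits (ploidy above 0, purity as a fraction in the range (0, 1]) are assumptions rather than documented constraints.

```shell
#!/bin/sh
set -eu

# Write a tab-delimited tstates file: one header line plus one tumor sample.
# The values mirror the example above; the file name is an assumption.
tstates="tstates.example.tsv"
printf 'SampleID\tTumorPloidy\tTumorPurityNGS\n' >  "$tstates"
printf 's1_t\t2.33\t1\n'                          >> "$tstates"

# Check that every line has exactly three tab-separated fields and that the
# data rows carry plausible values (assumed limits, see the note above).
check_tstates() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NF != 3 { printf "line %d: expected 3 fields, got %d\n", NR, NF; bad = 1; next }
        NR > 1 && ($2 + 0 <= 0 || $3 + 0 <= 0 || $3 + 0 > 1) {
            printf "line %d: implausible ploidy or purity\n", NR; bad = 1
        }
        END { exit bad }
    ' "$1"
}

check_tstates "$tstates" && echo "tstates OK"
```

check_tstates returns a non-zero exit status and prints the offending line numbers when a row is malformed, so it can gate a pipeline step.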

Command line interface

usage: lohhlamod
       [-h] --subject STR --tbam FILE --nbam FILE --hlaref FILE
       [--tstates FILE] --outdir DIR [--min-cov INT] [--min-necnt INT]
       [--threads INT]

options:
  -h, --help       show this help message and exit
  --subject STR    Specify the subject ID
  --tbam FILE      Specify the tumor bam file
  --nbam FILE      Specify the normal bam file
  --hlaref FILE    Specify HLA reference sequence
  --tstates FILE   Specify file including tumor purity and ploidy
  --outdir DIR     Specify the output directory
  --min-cov INT    Specify the minimum coverage at mismatch sites (30)
  --min-necnt INT  Specify the minimum number of diff events allowed for reads
                   mapping to HLA alleles (1)
  --threads INT    Specify the number of threads (16)
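As an illustration of how the optional flags combine with the required ones, the sketch below prints a full command line for one subject without executing it. The helper lohhlamod_cmd is hypothetical, and the input file naming (SUBJECT_t.hla.realn.ready.bam and so on) simply follows the s6 quick example, so adjust it to your own layout. The values given for --min-cov and --min-necnt repeat the numbers shown in parentheses in the help text, which appear to be the defaults.

```shell
#!/bin/sh
set -eu

# Print a lohhlamod command line for one subject (dry run; nothing is executed).
# Arguments: subject ID, input directory, output directory, thread count.
lohhlamod_cmd() {
    subject="$1"; indir="$2"; outdir="$3"; threads="$4"
    cat <<EOF
lohhlamod \\
    --subject ${subject} \\
    --tbam ${indir}/${subject}_t.hla.realn.ready.bam \\
    --nbam ${indir}/${subject}_n.hla.realn.ready.bam \\
    --hlaref ${indir}/${subject}_n.hla.fasta \\
    --tstates ${indir}/tstates.tsv \\
    --outdir ${outdir} \\
    --min-cov 30 \\
    --min-necnt 1 \\
    --threads ${threads}
EOF
}

lohhlamod_cmd s6 ./simulation/s6/input ./s6_test 4
```

Printing the command first makes it easy to review paths before a long run; once it looks right, the same text can be pasted into a shell or a job script.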