Algorithm & interpretation
Key Enhancements over the Original LOHHLA
While the original LOHHLA provides a robust foundation for LOH detection,
lohhlamod introduces several architectural and statistical refinements
designed to reduce false positives and improve robustness in real-world
clinical datasets.
Multi-metric evaluation
The original algorithm relies primarily on a copy number (CN) threshold (<0.5)
and its allelic-imbalance P-value (PVal_unique < 0.01). To address observed overcalling, particularly
in samples with noisy coverage, lohhlamod introduces additional bin-level statistics:
- Pct_CN_Diff_Supporting_Bins: Measures the proportion of 150bp bins that support a significant CN difference between the two alleles. A true LOH event typically spans the entire length of the allele. A suggested starting threshold for high-confidence calls is 75%.
  Threshold value
  In real-world datasets, it is not always possible to observe 100% of bins supporting a copy number difference, owing to the technical difficulties posed by the highly polymorphic MHC region. Applying this threshold can help filter out suspicious false-positive LOH calls.
- Allele-Specific Loss Metrics (Pct_A1_Loss_Supporting_Bins and Pct_A2_Loss_Supporting_Bins): These metrics use a one-sample t-test of allelic logR (H0: μ = −1) to determine whether an allele shows a statistically significant loss. This is particularly useful when the estimated CN of both alleles drops below 0.5 or even becomes negative.
  Why use a direct test?
  Relying solely on copy number estimates can be risky with noisy coverage data. A direct statistical test for allelic loss provides a more grounded way to identify true deletions.
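The bin-level percentages above can be illustrated with a minimal sketch, assuming each 150bp bin is summarized by per-position values and tested independently at the Alpha of 0.01 listed under Hidden cutoffs. Several details are assumptions rather than lohhlamod's documented behaviour: the CN-difference test is modeled as a one-sample t-test of per-position logR differences (A1 minus A2) against 0 (equivalent to a paired t-test), the loss test is taken as one-sided (μ < −1), and all function names are hypothetical. The Student's t CDF is computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import math
from statistics import mean, stdev

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_cdf(t, df):
    """P(T <= t) for Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t > 0 else tail

def t_test_pvalue(values, mu, alternative="two-sided"):
    """One-sample t-test p-value for H0: mean(values) == mu."""
    n = len(values)
    if n < 2:
        return 1.0  # too few positions in the bin to test
    sd = stdev(values)
    if sd == 0.0:
        return 1.0  # degenerate bin, treated as non-supporting in this sketch
    t = (mean(values) - mu) / (sd / math.sqrt(n))
    cdf = t_cdf(t, n - 1)
    if alternative == "less":
        return cdf
    if alternative == "greater":
        return 1.0 - cdf
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))

def pct_supporting_bins(bins, mu, alternative, alpha=0.01):
    """Percentage of bins whose t-test rejects H0 at level alpha."""
    if not bins:
        return 0.0
    hits = sum(t_test_pvalue(b, mu, alternative) < alpha for b in bins)
    return 100.0 * hits / len(bins)

# Per-position logR differences (A1 - A2) per bin -> CN-difference support.
diff_bins = [[0.9, 1.1, 1.0, 1.2, 0.8], [0.1, -0.1, 0.05, -0.05, 0.0]]
print(pct_supporting_bins(diff_bins, mu=0.0, alternative="two-sided"))  # 50.0

# Allelic logR of A1 per bin -> A1 loss support (H0: mu = -1).
a1_bins = [[-2.1, -1.9, -2.0, -2.2, -1.8], [-1.1, -0.9, -1.0, -1.05, -0.95]]
print(pct_supporting_bins(a1_bins, mu=-1.0, alternative="less"))  # 50.0
```

In each toy example only the first bin shifts clearly away from its null value, so half of the bins support the call. With such a result, the 75% starting threshold suggested above would not be met.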
BAF corrected for allelic capture bias
In many assays, one HLA allele may be captured more efficiently than its
counterpart due to probe affinity or sequence context. lohhlamod estimates
this "capture bias" from the normal sample and applies a correction factor
to the observed BAF in the tumor.
Plausibility check
If the correction results in a BAF > 1.0 (which can happen when the capture bias estimated from the normal and the imbalance observed in the tumor point in opposite directions), lohhlamod caps the BAF at 1.0 to maintain biological plausibility. When this happens, it is recommended to:
- Verify the HLA typing result.
- Check for potential sample swapping.
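The correction and the plausibility cap can be sketched as follows, assuming the correction factor is the ratio of the expected heterozygous BAF (0.5) to the BAF observed for the same allele in the normal, applied multiplicatively to the tumor BAF. The function name and this exact form of the factor are assumptions, not lohhlamod's documented formula; a multiplicative factor is used here because it reproduces the BAF > 1.0 situation described above.

```python
def correct_baf(tumor_baf, normal_baf, expected_normal_baf=0.5):
    """Hypothetical capture-bias correction of a tumor BAF.

    Returns (corrected_baf, was_capped). was_capped=True marks the cases in
    which the HLA typing and sample identity are worth double-checking.
    """
    if not 0.0 < normal_baf < 1.0:
        raise ValueError("normal BAF must lie strictly between 0 and 1")
    # Factor > 1 when this allele is under-captured in the normal sample.
    factor = expected_normal_baf / normal_baf
    corrected = tumor_baf * factor
    if corrected > 1.0:
        return 1.0, True  # cap at 1.0 to stay biologically plausible
    return corrected, False

# Allele under-captured in the normal (0.4), moderate tumor imbalance:
# corrected BAF is about 0.75 and no cap is needed.
print(correct_baf(0.6, 0.4))
# Bias and tumor imbalance point in opposite directions: 0.9 * 1.25 > 1.0.
print(correct_baf(0.9, 0.4))  # (1.0, True)
```

Returning the cap flag alongside the value keeps the clipping visible downstream, so capped calls can be routed to the manual checks listed above instead of being silently accepted.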
Global depth corrector
Log-ratio (LogR) calculation depends on normalizing depth differences between tumor and normal samples. Ideally, a neutral diploid state should result in a LogR value of 0.
In reality, however, the diploid baseline often shifts away from zero due to technical challenges. Furthermore, a global, library-wide depth corrector may not reflect the normalization needed locally across the HLA genes.
Unlike the original tool, which calculates a global corrector across the entire
BAM file, lohhlamod estimates the corrector locally from the HLA genes. By
applying the same high-quality alignment restrictions (via --min-necnt) to the
depth corrector as to the coverage calculation, the tool ensures that the
normalization baseline better represents the local observations.
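A locally estimated corrector and the resulting LogR can be sketched as below, assuming the corrector is the ratio of total normal depth to total tumor depth (the usual library-size style normalization), computed only over HLA-gene positions that already passed the same read filters as the coverage calculation. The function names are hypothetical, and the --min-necnt filtering is assumed to have been applied upstream rather than modeled here.

```python
import math

def local_depth_corrector(tumor_depths, normal_depths):
    """Normal/tumor depth ratio over filtered HLA positions (hypothetical)."""
    tumor_total, normal_total = sum(tumor_depths), sum(normal_depths)
    if tumor_total <= 0 or normal_total <= 0:
        raise ValueError("need positive total depth in both samples")
    return normal_total / tumor_total

def log_r(tumor_depth, normal_depth, corrector):
    """Depth-normalized log2 ratio; 0 corresponds to the neutral baseline."""
    return math.log2(tumor_depth * corrector / normal_depth)

# Tumor sequenced at half the normal depth, no copy number change:
tumor = [50, 60, 40]
normal = [100, 120, 80]
k = local_depth_corrector(tumor, normal)  # 2.0
print([log_r(t, n, k) for t, n in zip(tumor, normal)])  # [0.0, 0.0, 0.0]

# A position at half the expected tumor depth falls to LogR = -1.
print(log_r(25, 100, k))  # -1.0
```

Because the corrector is estimated from the same HLA positions that are later compared, a sample without copy number changes in these genes is approximately centered at LogR 0 in this sketch, independently of the genome-wide library size.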
Suggested Interpretation Workflow
When reviewing results, I recommend following this hierarchy:
- Check Bin Consistency: Look at Pct_CN_Diff_Supporting_Bins. If the proportion is low, the imbalance is likely localized noise rather than a true LOH event.
  Be careful
  This metric reports the percentage of bins supporting a copy number difference. However, an amplification event can also produce a high value. Therefore, do not use this metric in isolation.
- Evaluate Specific Loss: Check Pct_A1_Loss_Supporting_Bins and Pct_A2_Loss_Supporting_Bins.
  - In the case of LOH: Exactly one of the two metrics is expected to be high (see the HLA-A result in the simulated s6 case).
  - In the case of Homozygous Loss: Both metrics are expected to be high (see the HLA-C result in the simulated s7 case).
- Review LogR and BAF: Ensure that the median LogR estimates and the BAF shift agree with the bin-level statistics. A true loss should show a consistent shift away from the subject-specific neutral state in both metrics.
- Validate CN Estimates:
  - If the estimated copy numbers for both alleles are negative or near zero, review your coverage data by inspecting the tables saved in the .rds files and/or checking the diagnostic plots.
  - If the estimated copy number falls outside the calculated lower and upper bounds, the data may be skewed and the statistical assumptions may be violated.
  Can you trust copy number estimates?
  Copy number estimates are sensitive to noise in the coverage data. While the estimation is reliable in well-profiled assays (as demonstrated in the "best-case" simulation scenarios), lohhlamod is currently best used as a qualitative measure of whether an LOH event has occurred.
- Visual Confirmation: Always inspect the diagnostic plots generated by the lohhlaplot command, especially the paired allelic coverage distribution plot. Visual evidence remains a reliable validator.
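The first two steps of this hierarchy can be sketched as a simple triage function. The thresholds are assumptions: 75% is the suggested starting point for Pct_CN_Diff_Supporting_Bins given earlier, and reusing it for the allele-specific loss metrics is purely illustrative. The function and its labels are hypothetical; the later steps (LogR/BAF consistency, CN bounds, visual review) still require manual inspection.

```python
def triage(pct_cn_diff, pct_a1_loss, pct_a2_loss, threshold=75.0):
    """Hypothetical first-pass label from lohhlamod-style bin percentages."""
    a1_lost = pct_a1_loss >= threshold
    a2_lost = pct_a2_loss >= threshold
    # Both alleles lost: no allelic imbalance is expected, so this is checked
    # before the bin-consistency filter, which would otherwise discard it.
    if a1_lost and a2_lost:
        return "candidate homozygous loss"
    if pct_cn_diff < threshold:
        return "likely localized noise"
    if a1_lost:
        return "candidate LOH (A1 lost)"
    if a2_lost:
        return "candidate LOH (A2 lost)"
    # A consistent CN difference without a significant loss may be a gain.
    return "imbalance without loss: consider amplification"

print(triage(92.0, 88.0, 3.0))   # candidate LOH (A1 lost)
print(triage(10.0, 90.0, 85.0))  # candidate homozygous loss
print(triage(80.0, 5.0, 2.0))    # imbalance without loss: consider amplification
```

The last branch reflects the caution above: a high Pct_CN_Diff_Supporting_Bins on its own does not distinguish a loss from an amplification.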
Hidden cutoffs
The following parameters are currently hardcoded but may be exposed to the CLI in future releases:
| Parameter | Value | Description |
|---|---|---|
| Bin size | 150bp | The sliding window size for coverage calculation |
| Min. mismatches | 5 | Minimum number of mismatches required between alleles to attempt LOH detection. |
| Alpha | 0.01 | Significance threshold for statistical tests. |
| Gamma | 1 | LogR scaling (technology) parameter used in copy number estimation; 1 is the standard value for sequencing data. |
| Base quality | 20 | Minimum base quality required for samtools mpileup. |
| SAM flag | 3584 / 2 | Excludes QCFAIL (512), DUP (1024) and SUPPLEMENTARY (2048) reads (512 + 1024 + 2048 = 3584); requires PROPER_PAIR (2). |
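The SAM flag values in the table can be decoded with the bit definitions from the SAM specification. This is an illustrative sketch of the read filter implied by the table (reject reads carrying any bit in 3584, require PROPER_PAIR); it does not call samtools, and the helper names are hypothetical.

```python
# Flag bits as defined in the SAM specification (samtools flag names).
SAM_FLAGS = {
    0x1: "PAIRED", 0x2: "PROPER_PAIR", 0x4: "UNMAP", 0x8: "MUNMAP",
    0x10: "REVERSE", 0x20: "MREVERSE", 0x40: "READ1", 0x80: "READ2",
    0x100: "SECONDARY", 0x200: "QCFAIL", 0x400: "DUP", 0x800: "SUPPLEMENTARY",
}
EXCLUDE_MASK = 3584  # value from the table above
REQUIRE_MASK = 2     # PROPER_PAIR

def decode_flags(value):
    """Names of the bits set in a SAM flag value, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]

def passes_filter(flag):
    """True if a read has none of the excluded bits and all required bits."""
    return (flag & EXCLUDE_MASK) == 0 and (flag & REQUIRE_MASK) == REQUIRE_MASK

print(decode_flags(EXCLUDE_MASK))  # ['QCFAIL', 'DUP', 'SUPPLEMENTARY']
print(passes_filter(99))           # True: PAIRED, PROPER_PAIR, MREVERSE, READ1
print(passes_filter(99 | 0x400))   # False: marked as duplicate
```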