Nucleotide Models
Nucleotide-focused Python model types for parsed HGVS variants.
Type Aliases:
| Name | Description |
|---|---|
| NucleotideSequenceItem | Tagged union for supported inserted or replacement nucleotide sequence models. |
| NucleotideEdit | Tagged union for supported nucleotide edit models. |
NucleotideSequenceItem
module-attribute
NucleotideSequenceItem: TypeAlias = LiteralSequenceItem | RepeatSequenceItem | CopiedSequenceItem
Tagged union for supported inserted or replacement nucleotide components: LiteralSequenceItem, RepeatSequenceItem, and CopiedSequenceItem.
NucleotideEdit
module-attribute
NucleotideEdit: TypeAlias = NucleotideSequenceOmittedEdit | NucleotideSubstitutionEdit | NucleotideInsertionEdit | NucleotideDeletionInsertionEdit | NucleotideRepeatEdit
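Tagged union for supported nucleotide edit models: NucleotideSequenceOmittedEdit, NucleotideSubstitutionEdit, NucleotideInsertionEdit, NucleotideDeletionInsertionEdit, and NucleotideRepeatEdit.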
Allele
dataclass
Bases: Generic[VariantT]
One allele carrying one or more variants in cis.
Attributes:
| Name | Type | Description |
|---|---|---|
| variants | tuple[VariantT, ...] | Variants described on the same allele and therefore treated as occurring together in cis. |
Examples:
A nucleotide allele carrying multiple variants:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000001.11:g.[123G>A;345del]")
>>> len(variant.description.allele_one.variants)
2
A protein allele carrying multiple variants together:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_003997.1:p.[Ser68Arg;Asn594del]")
>>> len(variant.description.allele_one.variants)
2
__iter__
Return an iterator over variants carried by an allele in order.
Returns:
| Type | Description |
|---|---|
| Iterator[VariantT] | Iterator over variants carried by this allele, in the order they appear in the written description. |
Examples:
>>> from tinyhgvs import parse_hgvs
>>> allele = parse_hgvs(
... "NP_003997.1:p.[Ser68Arg;Asn594del]"
... ).description.allele_one
>>> len(tuple(allele))
2
>>> from tinyhgvs import parse_hgvs
>>> allele = parse_hgvs(
... "NC_000001.11:g.[123G>A;345del]"
... ).description.allele_one
>>> len(tuple(allele))
2
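The iteration behaviour described above can be pictured with a standalone sketch. `AlleleSketch` is a hypothetical stand-in written for illustration, not the tinyhgvs class, and it uses plain strings in place of parsed variants.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, TypeVar

VariantT = TypeVar("VariantT")


@dataclass(frozen=True)
class AlleleSketch(Generic[VariantT]):
    """Hypothetical stand-in for Allele; not the tinyhgvs implementation."""

    variants: tuple[VariantT, ...]

    def __iter__(self) -> Iterator[VariantT]:
        # Delegating to the tuple keeps the written order of the variants.
        return iter(self.variants)


# Plain strings stand in for parsed variants here.
allele = AlleleSketch(variants=("123G>A", "345del"))
in_order = list(allele)
```

Because the variants live in a tuple, iterating the allele and iterating `allele.variants` give the same sequence in this sketch.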
AllelePhase
Bases: str, Enum
Model describing phase between/among alleles.
Attributes:
| Name | Type | Description |
|---|---|---|
| TRANS | | Alleles are in trans. |
| UNCERTAIN | | Phase between/among alleles is uncertain. |
Examples:
In-trans alleles:
>>> from tinyhgvs import parse_hgvs
>>> variants = parse_hgvs("NM_004006.2:c.[2376G>C];[3103del]")
>>> variants.description.phase
<AllelePhase.TRANS: 'trans'>
Uncertain phase:
>>> variant = parse_hgvs("NC_000001.11:g.123G>A(;)345del")
>>> variant.description.phase
<AllelePhase.UNCERTAIN: 'uncertain'>
In-trans protein alleles:
>>> variants = parse_hgvs("NP_003997.1:p.[Ser68Arg];[Ser68=]")
>>> variants.description.phase
<AllelePhase.TRANS: 'trans'>
Protein alleles with uncertain phase:
>>> variant = parse_hgvs("NP_003997.1:p.(Ser73Arg)(;)(Asn103del)")
>>> variant.description.phase
<AllelePhase.UNCERTAIN: 'uncertain'>
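Since the enum derives from both str and Enum, its members can be compared with plain strings. The sketch below shows that general Python behaviour on a hypothetical `PhaseSketch` class that only mirrors the member values printed above; it is not imported from tinyhgvs.

```python
from enum import Enum


class PhaseSketch(str, Enum):
    """Hypothetical stand-in mirroring the member values shown above."""

    TRANS = "trans"
    UNCERTAIN = "uncertain"


# Members of a str-based Enum compare equal to their plain string values.
equals_plain_string = PhaseSketch.TRANS == "trans"

# Looking a member up by value returns the singleton member itself.
looked_up = PhaseSketch("uncertain")
```

This makes such values convenient to serialize or to compare against strings read from JSON or a database column.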
AlleleVariant
dataclass
AlleleVariant(allele_one: Allele[VariantT], allele_two: Allele[VariantT] | None = None, phase: AllelePhase | None = None, alleles_unphased: tuple[Allele[VariantT], ...] = ())
Bases: Generic[VariantT]
Structured representation of HGVS allele variant syntax.
An allele variant may describe:
- a single allele carrying one or more variants.
- two alleles with an explicit phase relationship.
- additional alleles whose relation to the established allele state is uncertain.
Attributes:
| Name | Type | Description |
|---|---|---|
| allele_one | Allele[VariantT] | First allele in the allele-variant description. |
| allele_two | Allele[VariantT] \| None | Second allele, if present. |
| phase | AllelePhase \| None | Phase relation between allele_one and allele_two. |
| alleles_unphased | tuple[Allele[VariantT], ...] | Additional alleles in uncertain relation to the allele state established by allele_one and allele_two. |
Examples:
Variants in cis on a single allele:
>>> from tinyhgvs import parse_hgvs
>>> desc = parse_hgvs("NC_000023.10:g.[30683643A>G;33038273T>G]").description
>>> len(tuple(desc))
1
>>> desc.allele_two is None
True
>>> desc.phase is None
True
>>> len(desc.allele_one.variants)
2
Two nucleotide alleles in trans:
>>> desc = parse_hgvs("NM_004006.2:c.[2376G>C];[3103del]").description
>>> desc.allele_two is not None
True
>>> desc.phase
<AllelePhase.TRANS: 'trans'>
>>> len(desc.allele_one.variants)
1
>>> len(desc.allele_two.variants)
1
Additional alleles with uncertain phase:
>>> desc = parse_hgvs(
... "NM_004006.2:c.[296T>G;476T>C];[476T>C](;)1083A>C"
... ).description
>>> desc.phase
<AllelePhase.TRANS: 'trans'>
>>> len(desc.unphased_alleles)
1
>>> len(desc.unphased_alleles[0].variants)
1
Variants in cis on a single protein allele:
>>> desc = parse_hgvs("NP_003997.1:p.[Ser68Arg;Asn594del]").description
>>> len(tuple(desc))
1
>>> desc.allele_two is None
True
>>> desc.phase is None
True
>>> len(desc.allele_one.variants)
2
Two protein alleles in trans:
>>> desc = parse_hgvs("NP_003997.1:p.[Ser68Arg];[Ser68=]").description
>>> desc.allele_two is not None
True
>>> desc.phase
<AllelePhase.TRANS: 'trans'>
>>> len(desc.allele_one.variants)
1
>>> len(desc.allele_two.variants)
1
phased_alleles
property
Return the established phased allele pair, if present.
Returns:
| Type | Description |
|---|---|
| tuple[Allele[VariantT], Allele[VariantT]] \| None | The established phased allele pair as (allele_one, allele_two), or None when no phased pair is established. |
Notes
This property reports only the primary phased allele pair
represented by allele_one and allele_two. Alleles in
alleles_unphased are not included.
Examples:
A single allele does not establish a phased pair:
>>> from tinyhgvs import parse_hgvs
>>> desc = parse_hgvs("NC_000001.11:g.[123G>A;345del]").description
>>> desc.phased_alleles is None
True
Two alleles with established phase return a pair:
>>> desc = parse_hgvs("NM_004006.2:c.[2376G>C];[2376=]").description
>>> desc.phase
<AllelePhase.TRANS: 'trans'>
>>> pair = desc.phased_alleles
>>> pair is not None
True
>>> len(pair[0].variants), len(pair[1].variants)
(1, 1)
Two alleles with uncertain phase do not return a pair:
>>> desc = parse_hgvs("NC_000001.11:g.123G>A(;)345del").description
>>> desc.phase
<AllelePhase.UNCERTAIN: 'uncertain'>
>>> pair = desc.phased_alleles
>>> pair is None
True
Additional alleles with uncertain relation to the established pair:
>>> desc = parse_hgvs(
... "NM_004006.2:c.[296T>G;476T>C];[476T>C](;)1083A>C"
... ).description
>>> pair = desc.phased_alleles
>>> pair is not None
True
>>> len(desc.alleles_unphased)
1
Two protein alleles with known phase:
>>> desc = parse_hgvs("NP_003997.1:p.[Ser68Arg];[Ser68=]").description
>>> desc.phase
<AllelePhase.TRANS: 'trans'>
>>> pair = desc.phased_alleles
>>> pair is not None
True
>>> len(pair[0].variants), len(pair[1].variants)
(1, 1)
Two predicted protein alleles with unknown phase:
>>> desc = parse_hgvs("NP_003997.1:p.(Ser73Arg)(;)(Asn103del)").description
>>> desc.phase
<AllelePhase.UNCERTAIN: 'uncertain'>
>>> desc.phased_alleles is None
True
>>> len(desc.unphased_alleles)
0
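The outcomes in these examples are consistent with a simple rule: a pair is reported only when a second allele exists and the phase between the two primary alleles is not uncertain. The sketch below encodes that rule as an assumption. `AlleleVariantSketch` and `PhaseSketch` are hypothetical stand-ins, alleles are reduced to tuples of strings, and the real property may be implemented differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PhaseSketch(str, Enum):
    TRANS = "trans"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class AlleleVariantSketch:
    """Hypothetical stand-in; each allele is a tuple of variant strings."""

    allele_one: tuple[str, ...]
    allele_two: Optional[tuple[str, ...]] = None
    phase: Optional[PhaseSketch] = None
    alleles_unphased: tuple[tuple[str, ...], ...] = ()

    @property
    def phased_alleles(self):
        # Assumption: no second allele, or uncertain phase, means no pair.
        if self.allele_two is None or self.phase is PhaseSketch.UNCERTAIN:
            return None
        # Alleles in alleles_unphased are deliberately left out.
        return (self.allele_one, self.allele_two)


single = AlleleVariantSketch(("123G>A", "345del"))
trans = AlleleVariantSketch(("2376G>C",), ("2376=",), PhaseSketch.TRANS)
uncertain = AlleleVariantSketch(("123G>A",), ("345del",), PhaseSketch.UNCERTAIN)
with_tail = AlleleVariantSketch(
    ("296T>G", "476T>C"), ("476T>C",), PhaseSketch.TRANS, (("1083A>C",),)
)
```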
unphased_alleles
property
unphased_alleles: tuple[Allele[VariantT], ...]
Return alleles with uncertain relation to the allele state
established by allele_one and allele_two.
Returns:
| Type | Description |
|---|---|
| tuple[Allele[VariantT], ...] | Alleles written after the established allele state whose relation to that state is uncertain. Empty when not present. |
Notes
This property exposes the additional alleles stored in
alleles_unphased. It does not include allele_one or
allele_two.
Examples:
Uncertain phase between two primary alleles does not create an unphased tail:
>>> from tinyhgvs import parse_hgvs
>>> desc = parse_hgvs("NC_000001.11:g.123G>A(;)345del").description
>>> len(desc.unphased_alleles)
0
One additional unphased allele written after the established state:
>>> desc = parse_hgvs(
... "NM_004006.2:c.[296T>G];[476T>C](;)1083A>C"
... ).description
>>> len(desc.unphased_alleles)
1
>>> len(desc.unphased_alleles[0].variants)
1
Multiple additional unphased alleles written after the established state:
>>> desc = parse_hgvs(
... "NM_004006.2:c.[296T>G];[476T>C](;)1083A>C(;)1406del"
... ).description
>>> len(desc.unphased_alleles)
2
One additional unphased protein allele written after the established state:
>>> desc = parse_hgvs("p.[Ser68Arg];[Asn594del](;)0").description
>>> len(desc.unphased_alleles)
1
>>> desc.unphased_alleles[0].variants[0].effect.kind
'no_protein_produced'
No additional unphased protein alleles:
__iter__
__iter__() -> Iterator[Allele[VariantT]]
Iterate over alleles in description order.
Yields:
| Type | Description |
|---|---|
| Allele[VariantT] | Alleles in description order. |
Notes
Iteration preserves the structural order of the HGVS allele-variant description. It does not infer or reorder alleles by phase.
Examples:
>>> from tinyhgvs import parse_hgvs
>>> desc = parse_hgvs("NM_004006.2:c.[2376G>C];[3103del]").description
>>> len(tuple(desc))
2
NucleotideAnchor
Bases: str, Enum
HGVS reference point used to interpret a nucleotide position.
Attributes:
| Name | Type | Description |
|---|---|---|
| ABSOLUTE | | Coordinate is read directly on the named reference sequence. |
| RELATIVE_CDS_START | | Coordinate is read relative to the CDS start site. |
| RELATIVE_CDS_END | | Coordinate is read relative to the CDS end site. |
Examples:
An intronic splice-site substitution uses direct coordinates:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> variant.description.location.start.anchor
<NucleotideAnchor.ABSOLUTE: 'absolute'>
A 5' UTR substitution is anchored to the CDS start:
>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
A 3' UTR substitution is anchored to the CDS end:
>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>
NucleotideCoordinateKind
Bases: str, Enum
Known/unknown state for a nucleotide coordinate.
Attributes:
| Name | Type | Description |
|---|---|---|
| KNOWN | | Coordinate has anchor, coordinate, and offset values. |
| UNKNOWN | | Coordinate is written as ?. |
NucleotideCoordinate
dataclass
NucleotideCoordinate(kind: NucleotideCoordinateKind, anchor: NucleotideAnchor | None = None, coordinate: int | None = None, offset: int | None = None)
Nucleotide coordinate written as a known position or ?.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | NucleotideCoordinateKind | Whether this coordinate is known or unknown. |
| anchor | NucleotideAnchor \| None | Reference point used to interpret the coordinate. |
| coordinate | int \| None | Primary HGVS coordinate as written. |
| offset | int \| None | Signed secondary displacement from the primary coordinate. |
Examples:
Duplication crossing an exon/intron border:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.11(NM_004006.2):c.260_264+48dup")
>>> variant.description.location.start.coordinate
260
>>> variant.description.location.end.coordinate
264
>>> variant.description.location.end.offset
48
Upstream intronic substitution:
>>> variant = parse_hgvs("NG_012232.1(NM_004006.2):c.264-2A>G")
>>> variant.description.location.start.coordinate
264
>>> variant.description.location.start.offset
-2
5' UTR substitution:
>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
>>> variant.description.location.start.coordinate
-1
>>> variant.description.location.start.offset
0
3' UTR substitution:
>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>
>>> variant.description.location.start.coordinate
1
>>> variant.description.location.start.offset
0
5' UTR intronic substitution:
>>> variant = parse_hgvs("NM_001385026.1:c.-106+2T>A")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
>>> variant.description.location.start.coordinate
-106
>>> variant.description.location.start.offset
2
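The way anchor, coordinate, and offset split a written position can be illustrated with a small standalone parser. This is a sketch only and not how tinyhgvs parses positions: `split_position` and `PositionSketch` are hypothetical names, and the regular expression covers just the simple position shapes used in the examples above (no `?` and no intervals).

```python
import re
from typing import NamedTuple


class PositionSketch(NamedTuple):
    anchor: str       # "absolute", "relative_cds_start" or "relative_cds_end"
    coordinate: int   # primary coordinate as written, sign included
    offset: int       # signed intronic displacement, 0 when absent


# Optional "-" or "*" prefix, digits, then an optional signed offset.
_POSITION = re.compile(r"^(?P<prefix>[-*]?)(?P<base>\d+)(?P<offset>[+-]\d+)?$")


def split_position(text: str) -> PositionSketch:
    """Split a simple coding-DNA position into anchor, coordinate, offset."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"unsupported position: {text!r}")
    base = int(match["base"])
    offset = int(match["offset"] or 0)
    if match["prefix"] == "-":
        # Upstream of the CDS start: the coordinate keeps its minus sign.
        return PositionSketch("relative_cds_start", -base, offset)
    if match["prefix"] == "*":
        # Downstream of the CDS end: the "*" is dropped from the number.
        return PositionSketch("relative_cds_end", base, offset)
    return PositionSketch("absolute", base, offset)
```

For instance, `split_position("-106+2")` separates the 5' UTR coordinate `-106` from the intronic offset `2`, matching the field values in the last example above.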
is_intronic
property
is_cds_start_anchored
property
Return True if variant's location is relative to the CDS start.
Examples:
is_cds_end_anchored
property
Return True if variant's location is relative to the CDS end.
Examples:
is_five_prime_utr
property
Return True for exonic positions in the 5' UTR.
Examples:
>>> from tinyhgvs import parse_hgvs
>>> position = parse_hgvs("NM_007373.4:c.-123C>T").description.location.start
>>> position.is_five_prime_utr
True
>>> position = parse_hgvs("NM_001385026.1:c.-106+2T>A").description.location.start
>>> position.is_five_prime_utr
False
>>> position.is_three_prime_utr
False
is_three_prime_utr
property
Return True for exonic positions in the 3' UTR.
Examples:
>>> from tinyhgvs import parse_hgvs
>>> position = parse_hgvs("NM_001272071.2:c.*1C>T").description.location.start
>>> position.is_three_prime_utr
True
>>> position = parse_hgvs("NM_001272071.2:c.*639-1G>A").description.location.start
>>> position.is_three_prime_utr
False
>>> position.is_five_prime_utr
False
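The UTR examples suggest that these predicates combine the anchor with the offset: a position counts as UTR only when it is exonic. The sketch below encodes that reading. `CoordinateSketch` is a hypothetical stand-in, and treating any non-zero offset as intronic is an assumption made for illustration, since the source does not describe `is_intronic`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinateSketch:
    """Hypothetical stand-in for a known nucleotide coordinate."""

    anchor: str  # "absolute", "relative_cds_start" or "relative_cds_end"
    coordinate: int
    offset: int = 0

    @property
    def is_intronic(self) -> bool:
        # Assumption: a non-zero offset marks an intronic position.
        return self.offset != 0

    @property
    def is_five_prime_utr(self) -> bool:
        # Anchored to the CDS start and exonic.
        return self.anchor == "relative_cds_start" and not self.is_intronic

    @property
    def is_three_prime_utr(self) -> bool:
        # Anchored to the CDS end and exonic.
        return self.anchor == "relative_cds_end" and not self.is_intronic


utr5 = CoordinateSketch("relative_cds_start", -123)            # c.-123
utr5_intron = CoordinateSketch("relative_cds_start", -106, 2)  # c.-106+2
utr3 = CoordinateSketch("relative_cds_end", 1)                 # c.*1
utr3_intron = CoordinateSketch("relative_cds_end", 639, -1)    # c.*639-1
```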
CopiedSequenceItem
dataclass
CopiedSequenceItem(source_reference: ReferenceSpec | None, source_coordinate_system: CoordinateSystem | None, source_location: Interval[NucleotideCoordinate], is_inverted: bool)
Copied nucleotide sequence used in an insertion or deletion-insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| source_reference | ReferenceSpec \| None | Source reference when the copied sequence comes from a different accession. |
| source_coordinate_system | CoordinateSystem \| None | Source coordinate system when it differs from the outer variant. |
| source_location | Interval[NucleotideCoordinate] | Inclusive interval on the source reference. |
| is_inverted | bool | Whether the copied sequence is inserted in reverse orientation. |
Examples:
A stretch of sequence from the same transcript is inserted in reverse orientation:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.849_850ins850_900inv")
>>> item = variant.description.edit.items[0]
>>> item.is_from_same_reference
True
>>> item.source_location.start.coordinate
850
>>> item.source_location.end.coordinate
900
>>> item.is_inverted
True
A copied sequence can also come from another chromosome:
>>> variant = parse_hgvs("NC_000002.11:g.47643464_47643465ins[NC_000022.10:g.35788169_35788352]")
>>> item = variant.description.edit.items[0]
>>> item.source_reference.primary.id
'NC_000022.10'
>>> item.source_coordinate_system
<CoordinateSystem.GENOMIC: 'g'>
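The examples use `is_from_same_reference` without describing it. One plausible reading, given that `source_reference` is set only for a different accession, is that the property is true when no source reference was written. The sketch below encodes that assumption on a hypothetical `CopiedItemSketch`; the `length` helper is also hypothetical and only illustrates what an inclusive interval implies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopiedItemSketch:
    """Hypothetical stand-in with the source interval as two integers."""

    source_reference: Optional[str]
    start: int
    end: int
    is_inverted: bool = False

    @property
    def is_from_same_reference(self) -> bool:
        # Assumption: no explicit source reference means the copy comes
        # from the outer variant's own reference sequence.
        return self.source_reference is None

    @property
    def length(self) -> int:
        # The source interval is inclusive on both ends.
        return self.end - self.start + 1


same = CopiedItemSketch(None, 850, 900, is_inverted=True)
other = CopiedItemSketch("NC_000022.10", 35788169, 35788352)
```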
LiteralSequenceItem
dataclass
Sequence edit component written as literal nucleotide bases.
Attributes:
| Name | Type | Description |
|---|---|---|
| value | str | Nucleotide bases. |
Examples:
A literal insertion of three nucleotides:
>>> from tinyhgvs import LiteralSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, LiteralSequenceItem)
True
>>> item.value
'CCT'
RepeatSequenceItem
dataclass
Sequence edit component written as a repeated unit.
Attributes:
| Name | Type | Description |
|---|---|---|
| unit | str | Repeat unit of nucleotide bases. |
| count | int | Number of units being repeated. |
Examples:
The insertion contains 100 copies of N:
>>> from tinyhgvs import RepeatSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, RepeatSequenceItem)
True
>>> item.unit
'N'
>>> item.count
100
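A consumer that needs the actual inserted bases could expand literal and repeat items along these lines. This is a sketch under stated assumptions: `LiteralSketch`, `RepeatSketch`, and `expand` are hypothetical names that only mirror the `value`, `unit`, and `count` attributes documented above, and copied items are left out because they require access to the source sequence.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class LiteralSketch:
    value: str


@dataclass(frozen=True)
class RepeatSketch:
    unit: str
    count: int


def expand(item: Union[LiteralSketch, RepeatSketch]) -> str:
    """Return the bases an item stands for."""
    if isinstance(item, RepeatSketch):
        # A repeat item is its unit written out count times.
        return item.unit * item.count
    return item.value


literal_bases = expand(LiteralSketch("CCT"))
repeat_bases = expand(RepeatSketch("N", 100))
```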
NucleotideRepeatBlock
dataclass
NucleotideRepeatBlock(count: int, unit: str | None = None, location: Interval[NucleotideCoordinate] | None = None)
One repeat block/unit in a nucleotide repeat description.
Attributes:
| Name | Type | Description |
|---|---|---|
| count | int | Number of repeated units. |
| unit | str \| None | Literal base(s) of the repeat unit. None when the repeat unit is described in the form location[count]. |
| location | Interval[NucleotideCoordinate] \| None | Location per repeat block. None when the repeat unit is described in the form unit[count]. |
Examples:
A literal 3 bp repeat unit repeated 23 times:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> block = variant.description.edit.blocks[0]
>>> block.unit
'CAG'
>>> block.count
23
>>> block.location is None
True
An RNA repeat variant composed of consecutive repeat units, each
described in the form location[count] rather than unit[count]:
>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> block = variant.description.edit.blocks[1]
>>> block.unit is None
True
>>> block.location.start.coordinate
466
>>> block.location.end.coordinate
489
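The two block forms, unit[count] and location[count], can be made concrete by writing blocks back out as text. The sketch below does that for a hypothetical `RepeatBlockSketch` whose location is simplified to a pair of integers. `render_block` and `render_blocks` are illustrative helpers, not tinyhgvs functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepeatBlockSketch:
    """Hypothetical stand-in; span is an inclusive (start, end) pair."""

    count: int
    unit: Optional[str] = None
    span: Optional[tuple[int, int]] = None


def render_block(block: RepeatBlockSketch) -> str:
    """Write one block as unit[count] or start_end[count]."""
    if block.unit is not None:
        return f"{block.unit}[{block.count}]"
    if block.span is None:
        raise ValueError("a block needs either a unit or a span")
    start, end = block.span
    return f"{start}_{end}[{block.count}]"


def render_blocks(blocks) -> str:
    # Consecutive blocks are written back to back, without a separator.
    return "".join(render_block(block) for block in blocks)


literal = render_block(RepeatBlockSketch(23, unit="CAG"))
by_location = render_blocks(
    [
        RepeatBlockSketch(4, span=(456, 465)),
        RepeatBlockSketch(9, span=(466, 489)),
        RepeatBlockSketch(3, span=(490, 499)),
    ]
)
```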
NucleotideSequenceOmittedEdit
Bases: str, Enum
Nucleotide edits whose altered sequence is not written explicitly.
Attributes:
| Name | Type | Description |
|---|---|---|
| NO_CHANGE | | No nucleotide change, written as =. |
| DELETION | | Deletion of the reference interval, written as del. |
| DUPLICATION | | Duplication of the reference interval, written as dup. |
| INVERSION | | Inversion of the reference interval, written as inv. |
Examples:
A coding DNA deletion:
>>> from tinyhgvs import NucleotideSequenceOmittedEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.5697del")
>>> variant.description.edit is NucleotideSequenceOmittedEdit.DELETION
True
A genomic duplication:
>>> variant = parse_hgvs("NC_000001.11:g.1234_2345dup")
>>> variant.description.edit
<NucleotideSequenceOmittedEdit.DUPLICATION: 'duplication'>
NucleotideSubstitutionEdit
dataclass
Model describing nucleotide substitution.
Examples:
A reference base C is substituted by A at the described location:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.33038255C>A")
>>> variant_edit = variant.description.edit
>>> variant_edit.reference
'C'
>>> variant_edit.alternate
'A'
>>> variant_edit.kind
'substitution'
NucleotideInsertionEdit
dataclass
NucleotideInsertionEdit(items: tuple[NucleotideSequenceItem, ...])
Model describing nucleotide insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| items | tuple[NucleotideSequenceItem, ...] | Inserted sequence items in the order they appear in the HGVS expression. |
| kind | Literal['insertion'] | Edit kind. |
Examples:
Literal nucleotide insertion:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'CCT'
A composite insertion can mix literal and copied sequence:
>>> variant = parse_hgvs("LRG_199t1:c.419_420ins[T;450_470;AGGG]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.items)
3
>>> variant_edit.items[0].value
'T'
>>> variant_edit.items[1].is_from_same_reference
True
>>> variant_edit.items[2].value
'AGGG'
Insertion of unspecified repeated bases:
>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
100
NucleotideDeletionInsertionEdit
dataclass
NucleotideDeletionInsertionEdit(items: tuple[NucleotideSequenceItem, ...])
Model describing nucleotide deletion-insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| items | tuple[NucleotideSequenceItem, ...] | Replacement sequence items in the order they appear in the HGVS expression. |
| kind | Literal['deletion_insertion'] | Edit kind. |
Examples:
A deleted interval is replaced by one literal sequence component.
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("LRG_199t1:c.850_901delinsTTCCTCGATGCCTG")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'TTCCTCGATGCCTG'
A deleted interval can be replaced by copied sequence from the same reference:
>>> variant = parse_hgvs("NC_000022.10:g.42522624_42522669delins42536337_42536382")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].source_location.start.coordinate
42536337
A deleted interval is replaced by repeated unspecified bases.
>>> variant = parse_hgvs("NM_004006.2:c.812_829delinsN[12]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
12
NucleotideRepeatEdit
dataclass
NucleotideRepeatEdit(blocks: tuple[NucleotideRepeatBlock, ...])
Model describing a top-level nucleotide repeat variant.
Attributes:
| Name | Type | Description |
|---|---|---|
| blocks | tuple[NucleotideRepeatBlock, ...] | Repeat blocks/units written in the HGVS description. |
| kind | Literal['repeat'] | Edit kind. |
Examples:
A DNA repeat variant with explicit repeat unit:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> variant_edit = variant.description.edit
>>> variant_edit.blocks[0].unit
'CAG'
>>> variant_edit.blocks[0].count
23
An RNA repeat variant composed of consecutive blocks/units, each represented as a location span:
>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.blocks)
3
>>> variant_edit.blocks[2].count
3
NucleotideVariant
dataclass
NucleotideVariant(location: Location[NucleotideCoordinate], edit: NucleotideEdit)
Model describing a nucleotide-level variant.
Attributes:
| Name | Type | Description |
|---|---|---|
| location | Location[NucleotideCoordinate] | |
| edit | NucleotideEdit | Nucleotide edit applied at the location. |
Examples:
A splice-site substitution is represented by a nucleotide location and a nucleotide substitution edit.
>>> from tinyhgvs import NucleotideSubstitutionEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> isinstance(variant.description.edit, NucleotideSubstitutionEdit)
True
>>> variant_description = variant.description
>>> variant_description.location.start.coordinate
357
>>> variant_description.location.start.offset
1
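Because the edit attribute is a tagged union, downstream code typically branches on the concrete edit type. The sketch below shows that pattern with isinstance checks on hypothetical stand-ins (`OmittedSketch`, `SubstitutionSketch`, `InsertionSketch`) that cover only part of the union. The enum values here are the HGVS tokens, chosen for the illustration, and are not the values used by tinyhgvs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class OmittedSketch(Enum):
    # Illustration only: values are HGVS tokens, not tinyhgvs values.
    DELETION = "del"
    DUPLICATION = "dup"


@dataclass(frozen=True)
class SubstitutionSketch:
    reference: str
    alternate: str


@dataclass(frozen=True)
class InsertionSketch:
    inserted: str


EditSketch = Union[OmittedSketch, SubstitutionSketch, InsertionSketch]


def describe(edit: EditSketch) -> str:
    """Render the edit part of a description by branching on its type."""
    if isinstance(edit, OmittedSketch):
        return edit.value
    if isinstance(edit, SubstitutionSketch):
        return f"{edit.reference}>{edit.alternate}"
    if isinstance(edit, InsertionSketch):
        return f"ins{edit.inserted}"
    raise TypeError(f"unsupported edit: {edit!r}")


rendered = [
    describe(SubstitutionSketch("G", "A")),
    describe(OmittedSketch.DELETION),
    describe(InsertionSketch("CCT")),
]
```

Ending the chain with a `TypeError` keeps the function from silently ignoring a union member that is added later.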