Nucleotide Models
Nucleotide-focused Python model types for parsed HGVS variants.
Type Aliases:
| Name | Description |
|---|---|
| NucleotideSequenceItem | Tagged union for supported inserted or replacement nucleotide sequence models. |
| NucleotideEdit | Tagged union for supported nucleotide edit models. |
NucleotideSequenceItem
module-attribute
NucleotideSequenceItem: TypeAlias = LiteralSequenceItem | RepeatSequenceItem | CopiedSequenceItem
Tagged union of the supported inserted or replacement nucleotide components: LiteralSequenceItem, RepeatSequenceItem, and CopiedSequenceItem.
NucleotideEdit
module-attribute
NucleotideEdit: TypeAlias = NucleotideSequenceOmittedEdit | NucleotideSubstitutionEdit | NucleotideInsertionEdit | NucleotideDeletionInsertionEdit | NucleotideRepeatEdit
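Tagged union of the supported nucleotide edit models: NucleotideSequenceOmittedEdit, NucleotideSubstitutionEdit, NucleotideInsertionEdit, NucleotideDeletionInsertionEdit, and NucleotideRepeatEdit.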
CoordinateSystem
Bases: str, Enum
Supported HGVS coordinate types.
The coordinate system identifies the biological reference frame in which the parsed variant's positions are expressed.
Attributes:
| Name | Description |
|---|---|
| GENOMIC | Genomic DNA coordinates, written as `g.` |
| CIRCULAR_GENOMIC | Circular genomic DNA coordinates, written as `o.` |
| MITOCHONDRIAL | Mitochondrial DNA coordinates, written as `m.` |
| CODING_DNA | Coding DNA coordinates, written as `c.` |
| NON_CODING_DNA | Non-coding DNA coordinates, written as `n.` |
| RNA | RNA coordinates, written as `r.` |
| PROTEIN | Protein coordinates, written as `p.` |
Examples:
Genomic DNA variant:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.11:g.33038255C>A")
>>> variant.coordinate_system
<CoordinateSystem.GENOMIC: 'g'>
Coding DNA variant:
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> variant.coordinate_system
<CoordinateSystem.CODING_DNA: 'c'>
Protein variant:
>>> variant = parse_hgvs("NP_003997.1:p.Trp24Ter")
>>> variant.coordinate_system
<CoordinateSystem.PROTEIN: 'p'>
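The coordinate type is the single letter before the first `.` in the variant description. A minimal sketch of extracting it, using a hypothetical stand-in enum whose `'g'`, `'c'`, and `'p'` values match the outputs above; the other values are assumed to follow the HGVS prefix letters, and none of this is tinyhgvs code:

```python
from enum import Enum


class CoordSystem(str, Enum):
    # Hypothetical stand-in; 'g', 'c' and 'p' match the examples above,
    # the remaining values are assumed to mirror the HGVS prefixes.
    GENOMIC = "g"
    CIRCULAR_GENOMIC = "o"
    MITOCHONDRIAL = "m"
    CODING_DNA = "c"
    NON_CODING_DNA = "n"
    RNA = "r"
    PROTEIN = "p"


def coordinate_system_of(hgvs: str) -> CoordSystem:
    """Read the coordinate-type prefix that follows the reference accession."""
    # Split on the first ':' only: nested descriptions such as
    # ins[NC_000022.10:g.1_2] contain further colons.
    _, sep, description = hgvs.partition(":")
    if not sep or len(description) < 2 or description[1] != ".":
        raise ValueError(f"no coordinate prefix in {hgvs!r}")
    return CoordSystem(description[0])  # ValueError for unknown letters
```

Splitting only on the first colon matters for descriptions like `NC_000002.11:g.…ins[NC_000022.10:g.…]`, where the inserted source carries its own accession and prefix.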
NucleotideAnchor
Bases: str, Enum
HGVS reference point used to interpret a nucleotide position.
Attributes:
| Name | Description |
|---|---|
| ABSOLUTE | Coordinate is read directly on the named reference sequence. |
| RELATIVE_CDS_START | Coordinate is read relative to the CDS start site. |
| RELATIVE_CDS_END | Coordinate is read relative to the CDS end site. |
Examples:
An intronic splice-site substitution uses direct coordinates:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> variant.description.location.start.anchor
<NucleotideAnchor.ABSOLUTE: 'absolute'>
A 5' UTR substitution is anchored to the CDS start:
>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
A 3' UTR substitution is anchored to the CDS end:
>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>
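Because the three anchors count from different reference points, comparing positions across them needs a common axis. The sketch below assumes standard HGVS `c.` numbering (there is no `c.0`, `c.-1` directly precedes `c.1`, and `c.*1` directly follows the last CDS nucleotide) and takes the CDS length as a hypothetical extra input, since an anchored coordinate alone does not carry it. The `Anchor` enum is a stand-in mirroring the values shown above.

```python
from enum import Enum


class Anchor(str, Enum):
    # Stand-in mirroring the NucleotideAnchor values shown above.
    ABSOLUTE = "absolute"
    RELATIVE_CDS_START = "relative_cds_start"
    RELATIVE_CDS_END = "relative_cds_end"


def cds_axis_key(anchor: Anchor, coordinate: int, offset: int, cds_length: int) -> tuple[int, int]:
    """Return a sort key that orders c. positions along the transcript.

    The first element is a gap-free position on the CDS axis (c.-1 -> 0,
    c.1 -> 1, c.*1 -> cds_length + 1); the second is the intronic offset,
    which places c.357+1 before c.358-1.
    """
    if anchor is Anchor.RELATIVE_CDS_START:
        base = coordinate + 1  # shift so that c.-1 lands on 0 (no c.0 exists)
    elif anchor is Anchor.RELATIVE_CDS_END:
        base = cds_length + coordinate
    else:
        base = coordinate
    return (base, offset)
```

Keeping the offset as a separate tuple element, rather than adding it to the base, avoids collisions between intronic positions and neighbouring exonic ones.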
NucleotideCoordinate
dataclass
NucleotideCoordinate(anchor: NucleotideAnchor, coordinate: int, offset: int = 0)
Nucleotide coordinate with anchor and signed offset semantics.
Attributes:
| Name | Type | Description |
|---|---|---|
| anchor | NucleotideAnchor | Reference point used to interpret the coordinate. |
| coordinate | int | Primary HGVS coordinate as written; for example, `357` in `c.357+1G>A`. |
| offset | int | Signed secondary displacement from the primary coordinate. Positive values move downstream and negative values move upstream. |
Examples:
Duplication crossing an exon/intron border:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.11(NM_004006.2):c.260_264+48dup")
>>> variant.description.location.start.coordinate
260
>>> variant.description.location.end.coordinate
264
>>> variant.description.location.end.offset
48
Upstream intronic substitution:
>>> variant = parse_hgvs("NG_012232.1(NM_004006.2):c.264-2A>G")
>>> variant.description.location.start.coordinate
264
>>> variant.description.location.start.offset
-2
5' UTR substitution:
>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
>>> variant.description.location.start.coordinate
-1
>>> variant.description.location.start.offset
0
3' UTR substitution:
>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>
>>> variant.description.location.start.coordinate
1
>>> variant.description.location.start.offset
0
is_intronic
property
is_five_prime_utr
property
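The examples above show how a written position splits into fields: a leading `*` selects the CDS-end anchor, a negative primary coordinate the CDS-start anchor, and a trailing `+n`/`-n` becomes the offset. The parser below is a hypothetical sketch of that mapping for single `c.`-style positions, returning plain `(anchor, coordinate, offset)` tuples. The two predicates show one plausible reading of `is_intronic` (non-zero offset) and `is_five_prime_utr` (CDS-start anchor); these definitions are assumptions, not the library's documented behaviour.

```python
import re

# Optional '*' (CDS-end anchor), a signed primary coordinate, then an
# optional signed offset such as +1 or -2.
_POSITION = re.compile(r"^(?P<star>\*)?(?P<coord>-?\d+)(?P<offset>[+-]\d+)?$")


def parse_position(text: str) -> tuple[str, int, int]:
    """Split a position like '357+1', '-1' or '*1' into (anchor, coordinate, offset)."""
    found = _POSITION.match(text)
    if found is None:
        raise ValueError(f"unsupported position: {text!r}")
    coordinate = int(found["coord"])
    offset = int(found["offset"]) if found["offset"] else 0
    if found["star"]:
        anchor = "relative_cds_end"
    elif coordinate < 0:
        anchor = "relative_cds_start"
    else:
        anchor = "absolute"
    return anchor, coordinate, offset


def is_intronic(position: tuple[str, int, int]) -> bool:
    # Assumption: any non-zero offset points into an intron.
    return position[2] != 0


def is_five_prime_utr(position: tuple[str, int, int]) -> bool:
    # Assumption: positions anchored to the CDS start lie in the 5' UTR.
    return position[0] == "relative_cds_start"
```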
CopiedSequenceItem
dataclass
CopiedSequenceItem(source_reference: ReferenceSpec | None, source_coordinate_system: CoordinateSystem | None, source_location: Interval[NucleotideCoordinate], is_inverted: bool)
Copied nucleotide sequence used in an insertion or deletion-insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| source_reference | ReferenceSpec \| None | Source reference when the copied sequence comes from a different accession. |
| source_coordinate_system | CoordinateSystem \| None | Source coordinate system when it differs from the outer variant. |
| source_location | Interval[NucleotideCoordinate] | Inclusive interval on the source reference. |
| is_inverted | bool | Whether the copied sequence is inserted in reverse orientation. |
Examples:
A stretch of sequence from the same transcript is inserted in reverse orientation:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.849_850ins850_900inv")
>>> item = variant.description.edit.items[0]
>>> item.is_from_same_reference
True
>>> item.source_location.start.coordinate
850
>>> item.source_location.end.coordinate
900
>>> item.is_inverted
True
A copied sequence can also come from another chromosome:
>>> variant = parse_hgvs("NC_000002.11:g.47643464_47643465ins[NC_000022.10:g.35788169_35788352]")
>>> item = variant.description.edit.items[0]
>>> item.source_reference.primary.id
'NC_000022.10'
>>> item.source_coordinate_system
<CoordinateSystem.GENOMIC: 'g'>
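In HGVS, a copied range marked `inv` is inserted as the reverse complement of the source interval. The sketch below shows how a copied item could be materialised once the source bases are available; the toy sequence, the 1-based inclusive indexing, and the helper names are illustrative assumptions rather than tinyhgvs functionality.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each DNA base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def materialise_copy(source: str, start: int, end: int, is_inverted: bool) -> str:
    """Return the copied bases for the 1-based inclusive interval start..end."""
    if not 1 <= start <= end <= len(source):
        raise ValueError("interval outside the source sequence")
    bases = source[start - 1 : end]
    return reverse_complement(bases) if is_inverted else bases
```

For example, with a source of `AACCGGTT`, the interval 3 to 5 yields `CCG`, and the inverted copy yields `CGG`.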
LiteralSequenceItem
dataclass
Model describing a literal-base sequence component of an edit.
Attributes:
| Name | Type | Description |
|---|---|---|
| value | str | Nucleotide bases. |
Examples:
A literal insertion of three nucleotides:
>>> from tinyhgvs import LiteralSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, LiteralSequenceItem)
True
>>> item.value
'CCT'
RepeatSequenceItem
dataclass
Model describing a repeated-unit sequence component of an edit.
Attributes:
| Name | Type | Description |
|---|---|---|
| unit | str | Repeat unit of nucleotide bases. |
| count | int | Number of times the unit is repeated. |
Examples:
The insertion contains 100 copies of N:
>>> from tinyhgvs import RepeatSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, RepeatSequenceItem)
True
>>> item.unit
'N'
>>> item.count
100
NucleotideRepeatBlock
dataclass
NucleotideRepeatBlock(count: int, unit: str | None = None, location: Interval[NucleotideCoordinate] | None = None)
One repeat block/unit in a nucleotide repeat description.
Attributes:
| Name | Type | Description |
|---|---|---|
| count | int | Number of repeated units. |
| unit | str \| None | Literal base(s) of the repeat unit. None when the block is described in the form `location[count]`. |
| location | Interval[NucleotideCoordinate] \| None | Location of the repeat unit in the block. None when the block is described in the form `unit[count]`. |
Examples:
A literal 3 bp unit, CAG, repeated 23 times:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> block = variant.description.edit.blocks[0]
>>> block.unit
'CAG'
>>> block.count
23
>>> block.location is None
True
An RNA repeat variant composed of consecutive blocks, each described in the form location[count] rather than unit[count]; here the second block repeats the unit found at 466_489:
>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> block = variant.description.edit.blocks[1]
>>> block.unit is None
True
>>> block.location.start.coordinate
466
>>> block.location.end.coordinate
489
NucleotideSequenceOmittedEdit
Bases: str, Enum
Nucleotide edits whose altered sequence is not written explicitly.
Attributes:
| Name | Description |
|---|---|
| NO_CHANGE | No nucleotide change, written as `=`. |
| DELETION | Deletion of the reference interval, written as `del`. |
| DUPLICATION | Duplication of the reference interval, written as `dup`. |
| INVERSION | Inversion of the reference interval, written as `inv`. |
Examples:
A coding DNA deletion:
>>> from tinyhgvs import NucleotideSequenceOmittedEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.5697del")
>>> variant.description.edit is NucleotideSequenceOmittedEdit.DELETION
True
A genomic duplication:
>>> variant = parse_hgvs("NC_000001.11:g.1234_2345dup")
>>> variant.description.edit
<NucleotideSequenceOmittedEdit.DUPLICATION: 'duplication'>
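Because the altered sequence is implied by the edit type, each member can be applied to a reference interval mechanically. The sketch below uses a stand-in enum (only the `'duplication'` value is confirmed by the output above; the other values are assumed) and a toy sequence with 1-based, inclusive positions. It is illustrative and not part of tinyhgvs.

```python
from enum import Enum

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class OmittedEdit(str, Enum):
    # Stand-in for NucleotideSequenceOmittedEdit; values other than
    # 'duplication' are assumptions.
    NO_CHANGE = "no_change"
    DELETION = "deletion"
    DUPLICATION = "duplication"
    INVERSION = "inversion"


def apply_omitted(seq: str, start: int, end: int, edit: OmittedEdit) -> str:
    """Apply an edit to the 1-based inclusive interval start..end of seq."""
    before, target, after = seq[: start - 1], seq[start - 1 : end], seq[end:]
    if edit is OmittedEdit.NO_CHANGE:
        return seq
    if edit is OmittedEdit.DELETION:
        return before + after
    if edit is OmittedEdit.DUPLICATION:
        # HGVS dup places the extra copy directly after the original (tandem).
        return before + target + target + after
    if edit is OmittedEdit.INVERSION:
        # Inversion replaces the interval with its reverse complement.
        return before + target.translate(_COMPLEMENT)[::-1] + after
    raise ValueError(f"unsupported edit: {edit!r}")
```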
NucleotideSubstitutionEdit
dataclass
Model describing a nucleotide substitution.
Examples:
The reference base C is replaced by A at the described location:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.33038255C>A")
>>> variant_edit = variant.description.edit
>>> variant_edit.reference
'C'
>>> variant_edit.alternate
'A'
>>> variant_edit.kind
'substitution'
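A substitution replaces exactly one reference base. The hypothetical helper below applies one to a toy sequence and first checks that the stated reference base matches, a common sanity check when validating variants against a reference. The 1-based position and plain-string sequence are assumptions for illustration.

```python
def apply_substitution(seq: str, position: int, reference: str, alternate: str) -> str:
    """Replace the base at a 1-based position after checking it matches reference."""
    if len(reference) != 1 or len(alternate) != 1:
        raise ValueError("a substitution changes exactly one base")
    if not 1 <= position <= len(seq):
        raise ValueError("position outside the sequence")
    observed = seq[position - 1]
    if observed != reference:
        raise ValueError(f"reference mismatch at {position}: expected {reference}, found {observed}")
    return seq[: position - 1] + alternate + seq[position:]
```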
NucleotideInsertionEdit
dataclass
NucleotideInsertionEdit(items: tuple[NucleotideSequenceItem, ...])
Model describing a nucleotide insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| items | tuple[NucleotideSequenceItem, ...] | Inserted sequence items, in the order they appear in the HGVS expression. |
| kind | Literal['insertion'] | Edit kind. |
Examples:
Literal nucleotide insertion:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'CCT'
A composite insertion can mix literal and copied sequence:
>>> variant = parse_hgvs("LRG_199t1:c.419_420ins[T;450_470;AGGG]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.items)
3
>>> variant_edit.items[0].value
'T'
>>> variant_edit.items[1].is_from_same_reference
True
>>> variant_edit.items[2].value
'AGGG'
Insertion of unspecified repeated bases:
>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
100
NucleotideDeletionInsertionEdit
dataclass
NucleotideDeletionInsertionEdit(items: tuple[NucleotideSequenceItem, ...])
Model describing a nucleotide deletion-insertion.
Attributes:
| Name | Type | Description |
|---|---|---|
| items | tuple[NucleotideSequenceItem, ...] | Replacement sequence items, in the order they appear in the HGVS expression. |
| kind | Literal['deletion_insertion'] | Edit kind. |
Examples:
A deleted interval is replaced by one literal sequence component:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("LRG_199t1:c.850_901delinsTTCCTCGATGCCTG")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'TTCCTCGATGCCTG'
A deleted interval can be replaced by copied sequence from the same reference:
>>> variant = parse_hgvs("NC_000022.10:g.42522624_42522669delins42536337_42536382")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].source_location.start.coordinate
42536337
A deleted interval is replaced by repeated unspecified bases:
>>> variant = parse_hgvs("NM_004006.2:c.812_829delinsN[12]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
12
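A deletion-insertion removes the whole located interval and puts the replacement in its place. Using a plain string for the already-materialised replacement (a simplification; the real `items` tuple may also hold repeat or copied components) and 1-based inclusive positions on a toy sequence, a minimal sketch is:

```python
def apply_delins(seq: str, start: int, end: int, replacement: str) -> str:
    """Replace the 1-based inclusive interval start..end with replacement."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval outside the sequence")
    return seq[: start - 1] + replacement + seq[end:]
```

Unlike an insertion, the replaced interval and the new sequence need not have the same length, which is what distinguishes `delins` from a run of substitutions.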
NucleotideRepeatEdit
dataclass
NucleotideRepeatEdit(blocks: tuple[NucleotideRepeatBlock, ...])
Model describing a top-level nucleotide repeat variant.
Attributes:
| Name | Type | Description |
|---|---|---|
| blocks | tuple[NucleotideRepeatBlock, ...] | Repeat blocks written in the HGVS description. |
| kind | Literal['repeat'] | Edit kind. |
Examples:
A DNA repeat variant with an explicit repeat unit:
>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> variant_edit = variant.description.edit
>>> variant_edit.blocks[0].unit
'CAG'
>>> variant_edit.blocks[0].count
23
An RNA repeat variant composed of consecutive blocks, each written as a location span with a count:
>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.blocks)
3
>>> variant_edit.blocks[2].count
3
NucleotideVariant
dataclass
NucleotideVariant(location: Interval[NucleotideCoordinate], edit: NucleotideEdit)
Model describing a nucleotide-level variant.
Attributes:
| Name | Type | Description |
|---|---|---|
| location | Interval[NucleotideCoordinate] | Inclusive nucleotide interval where the edit is applied. |
| edit | NucleotideEdit | Nucleotide edit applied at that interval. |
Examples:
A splice-site substitution is represented by a nucleotide location and a nucleotide substitution edit:
>>> from tinyhgvs import NucleotideSubstitutionEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> isinstance(variant.description.edit, NucleotideSubstitutionEdit)
True
>>> variant_description = variant.description
>>> variant_description.location.start.coordinate
357
>>> variant_description.location.start.offset
1