Nucleotide Models

Nucleotide-focused Python model types for parsed HGVS variants.

Type Aliases:

Name	Description
`NucleotideSequenceItem`	Tagged union for supported inserted or replacement nucleotide sequence models.
`NucleotideEdit`	Tagged union for supported nucleotide edit models.

NucleotideSequenceItem `module-attribute`

NucleotideSequenceItem: TypeAlias = LiteralSequenceItem | RepeatSequenceItem | CopiedSequenceItem

Tagged union for supported inserted or replacement nucleotide components: `LiteralSequenceItem`, `RepeatSequenceItem`, and `CopiedSequenceItem`.
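Because the alias is a plain union, callers usually branch on the concrete type of each item. The following is a minimal sketch of that pattern, not library code: the three dataclasses are local stand-ins that mirror only the documented fields, `CopiedSequenceItem` is simplified to integer `start`/`end` fields, and `describe_item` is a hypothetical helper.

```python
from dataclasses import dataclass


# Local stand-ins that mirror only the documented fields; they are NOT the
# tinyhgvs classes themselves.
@dataclass(frozen=True)
class LiteralSequenceItem:
    value: str


@dataclass(frozen=True)
class RepeatSequenceItem:
    unit: str
    count: int


@dataclass(frozen=True)
class CopiedSequenceItem:
    # Simplified: the documented model stores an Interval[NucleotideCoordinate].
    start: int
    end: int
    is_inverted: bool = False


def describe_item(item) -> str:
    """Return a short human-readable summary of one sequence item."""
    if isinstance(item, LiteralSequenceItem):
        return f"literal {item.value}"
    if isinstance(item, RepeatSequenceItem):
        return f"{item.count} x {item.unit}"
    if isinstance(item, CopiedSequenceItem):
        suffix = " (inverted)" if item.is_inverted else ""
        return f"copy of {item.start}-{item.end}{suffix}"
    raise TypeError(f"unsupported sequence item: {item!r}")
```

Raising on an unknown type keeps the dispatch exhaustive if another member is ever added to the union.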

NucleotideEdit `module-attribute`

NucleotideEdit: TypeAlias = NucleotideSequenceOmittedEdit | NucleotideSubstitutionEdit | NucleotideInsertionEdit | NucleotideDeletionInsertionEdit | NucleotideRepeatEdit

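Tagged union for supported nucleotide edit models: `NucleotideSequenceOmittedEdit`, `NucleotideSubstitutionEdit`, `NucleotideInsertionEdit`, `NucleotideDeletionInsertionEdit`, and `NucleotideRepeatEdit`.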
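One member of this union is an enum and the others are dataclasses carrying a `kind` tag, so a consumer has to handle both shapes. The sketch below illustrates one way to do that with local stand-ins. Only the enum value `'duplication'` and the kind `'substitution'` appear in this reference; the other enum values and the helper `edit_suffix` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


# Stand-ins for illustration. Only 'duplication' is a value shown in this
# reference; the other enum values are assumptions.
class NucleotideSequenceOmittedEdit(str, Enum):
    NO_CHANGE = "no_change"
    DELETION = "deletion"
    DUPLICATION = "duplication"
    INVERSION = "inversion"


@dataclass(frozen=True)
class NucleotideSubstitutionEdit:
    reference: str
    alternate: str
    kind: str = "substitution"


# HGVS tokens as listed in the NucleotideSequenceOmittedEdit attribute table.
_OMITTED_TOKENS = {
    NucleotideSequenceOmittedEdit.NO_CHANGE: "=",
    NucleotideSequenceOmittedEdit.DELETION: "del",
    NucleotideSequenceOmittedEdit.DUPLICATION: "dup",
    NucleotideSequenceOmittedEdit.INVERSION: "inv",
}


def edit_suffix(edit) -> str:
    """Render the edit part of an HGVS description for two edit shapes."""
    # Enum members carry no payload, so they are matched by type first.
    if isinstance(edit, NucleotideSequenceOmittedEdit):
        return _OMITTED_TOKENS[edit]
    # Dataclass edits are distinguished by their `kind` tag.
    if edit.kind == "substitution":
        return f"{edit.reference}>{edit.alternate}"
    raise NotImplementedError(f"no rendering sketched for kind {edit.kind!r}")
```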

CoordinateSystem

Bases: str, Enum

Supported HGVS coordinate types.

The coordinate system tells users what kind of biological reference frame is being used by the parsed variant.

Attributes:

Name	Type	Description
`GENOMIC`		Genomic DNA coordinates written as `g.`.
`CIRCULAR_GENOMIC`		Circular genomic DNA coordinates written as `o.`.
`MITOCHONDRIAL`		Mitochondrial DNA coordinates written as `m.`.
`CODING_DNA`		Coding DNA coordinates written as `c.`.
`NON_CODING_DNA`		Non-coding DNA coordinates written as `n.`.
`RNA`		RNA coordinates written as `r.`.
`PROTEIN`		Protein coordinates written as `p.`.

Examples:

Genomic DNA variant:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.11:g.33038255C>A")
>>> variant.coordinate_system
<CoordinateSystem.GENOMIC: 'g'>

Coding DNA variant:

>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> variant.coordinate_system
<CoordinateSystem.CODING_DNA: 'c'>

Protein variant:

>>> variant = parse_hgvs("NP_003997.1:p.Trp24Ter")
>>> variant.coordinate_system
<CoordinateSystem.PROTEIN: 'p'>
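Since the enum derives from `str`, a member can be looked up from the one-letter prefix. The sketch below uses a stand-in enum and a hypothetical helper, `coordinate_system_of`, that does naive string splitting; it is not the library's parser. The values `'g'`, `'c'`, and `'p'` match the reprs shown above, while the remaining values are assumed to follow the same one-letter pattern.

```python
from enum import Enum


class CoordinateSystem(str, Enum):
    # Stand-in enum. 'g', 'c' and 'p' match the documented reprs; the other
    # values are assumed from the prefixes listed in the attribute table.
    GENOMIC = "g"
    CIRCULAR_GENOMIC = "o"
    MITOCHONDRIAL = "m"
    CODING_DNA = "c"
    NON_CODING_DNA = "n"
    RNA = "r"
    PROTEIN = "p"


def coordinate_system_of(hgvs: str) -> CoordinateSystem:
    """Read the coordinate prefix that follows the first ':' in the string."""
    # Split on the FIRST colon so nested references inside ins[...] are ignored.
    _, colon, description = hgvs.partition(":")
    prefix, dot, _ = description.partition(".")
    if not colon or not dot:
        raise ValueError(f"no coordinate prefix found in {hgvs!r}")
    # Enum lookup by value raises ValueError for an unknown prefix.
    return CoordinateSystem(prefix)
```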

NucleotideAnchor

Bases: str, Enum

HGVS reference point used to interpret a nucleotide position.

Attributes:

Name	Type	Description
`ABSOLUTE`		Coordinate is read directly on the named reference sequence.
`RELATIVE_CDS_START`		Coordinate is read relative to the CDS start site.
`RELATIVE_CDS_END`		Coordinate is read relative to the CDS end site.

Examples:

An intronic splice-site substitution uses direct coordinates:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> variant.description.location.start.anchor
<NucleotideAnchor.ABSOLUTE: 'absolute'>

A 5' UTR substitution is anchored to the CDS start:

>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>

A 3' UTR substitution is anchored to the CDS end:

>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>

NucleotideCoordinate `dataclass`

NucleotideCoordinate(anchor: NucleotideAnchor, coordinate: int, offset: int = 0)

Nucleotide coordinate with anchor and signed offset semantics.

Attributes:

Name	Type	Description
`anchor`	`NucleotideAnchor`	Reference point used to interpret the coordinate.
`coordinate`	`int`	Primary HGVS coordinate as written. For example, `c.-81` uses `coordinate == -81` and `c.*24` uses `coordinate == 24`.
`offset`	`int`	Signed secondary displacement from the primary coordinate. Positive values move downstream and negative values move upstream.

Examples:

Duplication crossing an exon/intron border:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.11(NM_004006.2):c.260_264+48dup")
>>> variant.description.location.start.coordinate
260
>>> variant.description.location.end.coordinate
264
>>> variant.description.location.end.offset
48

Upstream intronic substitution:

>>> variant = parse_hgvs("NG_012232.1(NM_004006.2):c.264-2A>G")
>>> variant.description.location.start.coordinate
264
>>> variant.description.location.start.offset
-2

5' UTR substitution:

>>> variant = parse_hgvs("NM_007373.4:c.-1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_START: 'relative_cds_start'>
>>> variant.description.location.start.coordinate
-1
>>> variant.description.location.start.offset
0

3' UTR substitution:

>>> variant = parse_hgvs("NM_001272071.2:c.*1C>T")
>>> variant.description.location.start.anchor
<NucleotideAnchor.RELATIVE_CDS_END: 'relative_cds_end'>
>>> variant.description.location.start.coordinate
1
>>> variant.description.location.start.offset
0
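To make the anchor, coordinate, and offset semantics concrete, the sketch below turns the three fields back into the position text of an HGVS description. The enum and dataclass are local stand-ins that copy the documented fields and values, and `format_position` is a hypothetical helper rather than part of the documented API.

```python
from dataclasses import dataclass
from enum import Enum


class NucleotideAnchor(str, Enum):
    # Values copied from the reprs shown in this reference.
    ABSOLUTE = "absolute"
    RELATIVE_CDS_START = "relative_cds_start"
    RELATIVE_CDS_END = "relative_cds_end"


@dataclass(frozen=True)
class NucleotideCoordinate:
    # Stand-in with the documented fields only.
    anchor: NucleotideAnchor
    coordinate: int
    offset: int = 0


def format_position(position: NucleotideCoordinate) -> str:
    """Rebuild the position text, e.g. '264-2', '-1' or '*1'."""
    # c.*24 stores coordinate == 24, so the '*' has to be re-added, whereas
    # c.-81 stores coordinate == -81 and already carries its sign.
    prefix = "*" if position.anchor is NucleotideAnchor.RELATIVE_CDS_END else ""
    text = f"{prefix}{position.coordinate}"
    if position.offset:
        # '+d' always writes the sign, giving '+48' or '-2'.
        text += f"{position.offset:+d}"
    return text
```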

is_intronic `property`

is_intronic: bool

Return True for coordinate-anchored positions with an offset.

Examples:

>>> from tinyhgvs import parse_hgvs
>>> parse_hgvs("NM_004006.2:c.357+1G>A").description.location.start.is_intronic
True
>>> parse_hgvs("NG_012232.1(NM_004006.2):c.264-2A>G").description.location.start.is_intronic
True

is_five_prime_utr `property`

is_five_prime_utr: bool

Return True for positions in the 5' UTR.

Examples:

>>> from tinyhgvs import parse_hgvs
>>> position = parse_hgvs("NM_007373.4:c.-123C>T").description.location.start
>>> position.is_five_prime_utr
True
>>> position.is_three_prime_utr
False

is_three_prime_utr `property`

is_three_prime_utr: bool

Return True for positions in the 3' UTR.

Examples:

>>> from tinyhgvs import parse_hgvs
>>> position = parse_hgvs("NM_001272071.2:c.*1C>T").description.location.start
>>> position.is_three_prime_utr
True
>>> position.is_five_prime_utr
False
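The three properties above can plausibly be derived from `anchor` and `offset` alone. The sketch below shows one such derivation on a stand-in class named `PositionSketch`. It assumes that any non-zero offset marks an intronic position and that the two CDS-relative anchors correspond one-to-one to the UTRs; the library's actual rules may differ.

```python
from dataclasses import dataclass
from enum import Enum


class NucleotideAnchor(str, Enum):
    ABSOLUTE = "absolute"
    RELATIVE_CDS_START = "relative_cds_start"
    RELATIVE_CDS_END = "relative_cds_end"


@dataclass(frozen=True)
class PositionSketch:
    # Hypothetical stand-in for NucleotideCoordinate.
    anchor: NucleotideAnchor
    coordinate: int
    offset: int = 0

    @property
    def is_intronic(self) -> bool:
        # Assumption: a non-zero offset means the position lies in an intron.
        return self.offset != 0

    @property
    def is_five_prime_utr(self) -> bool:
        # Assumption: only 5' UTR positions are anchored to the CDS start.
        return self.anchor is NucleotideAnchor.RELATIVE_CDS_START

    @property
    def is_three_prime_utr(self) -> bool:
        # Assumption: only 3' UTR positions are anchored to the CDS end.
        return self.anchor is NucleotideAnchor.RELATIVE_CDS_END
```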

CopiedSequenceItem `dataclass`

CopiedSequenceItem(source_reference: ReferenceSpec | None, source_coordinate_system: CoordinateSystem | None, source_location: Interval[NucleotideCoordinate], is_inverted: bool)

Copied nucleotide sequence used in an insertion or deletion-insertion.

Attributes:

Name	Type	Description
`source_reference`	`ReferenceSpec \| None`	Source reference when the copied sequence comes from a different accession. `None` means the same outer reference.
`source_coordinate_system`	`CoordinateSystem \| None`	Source coordinate system when it differs from the outer variant. `None` means the same outer coordinate system.
`source_location`	`Interval[NucleotideCoordinate]`	Inclusive interval on the source reference.
`is_inverted`	`bool`	Whether the copied sequence is inserted in reverse orientation.

Examples:

A stretch of sequence from the same transcript is inserted in reverse orientation:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.849_850ins850_900inv")
>>> item = variant.description.edit.items[0]
>>> item.is_from_same_reference
True
>>> item.source_location.start.coordinate
850
>>> item.source_location.end.coordinate
900
>>> item.is_inverted
True

A copied sequence can also come from another chromosome:

>>> variant = parse_hgvs("NC_000002.11:g.47643464_47643465ins[NC_000022.10:g.35788169_35788352]")
>>> item = variant.description.edit.items[0]
>>> item.source_reference.primary.id
'NC_000022.10'
>>> item.source_coordinate_system
<CoordinateSystem.GENOMIC: 'g'>
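The examples use an `is_from_same_reference` property that is not listed in the attribute table. Given that `source_reference` is `None` for the same outer reference, the property could be as simple as the sketch below. This is an assumption about its behavior, shown on a hypothetical stand-in class that stores the reference as a plain string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopiedItemSketch:
    # Hypothetical stand-in: the documented field type is ReferenceSpec | None.
    source_reference: Optional[str]
    is_inverted: bool = False

    @property
    def is_from_same_reference(self) -> bool:
        # Assumption: no explicit source reference means "same as the outer variant".
        return self.source_reference is None
```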

LiteralSequenceItem `dataclass`

LiteralSequenceItem(value: str)

Sequence edit component made of literal nucleotide bases.

Attributes:

Name	Type	Description
`value`	`str`	Nucleotide bases.

Examples:

A literal insertion of three nucleotides:

>>> from tinyhgvs import LiteralSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, LiteralSequenceItem)
True
>>> item.value
'CCT'

RepeatSequenceItem `dataclass`

RepeatSequenceItem(unit: str, count: int)

Sequence edit component written as a repeated unit, in the form `unit[count]`.

Attributes:

Name	Type	Description
`unit`	`str`	Repeat unit of nucleotide bases.
`count`	`int`	Number of units being repeated.

Examples:

The insertion contains 100 copies of N:

>>> from tinyhgvs import RepeatSequenceItem, parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> item = variant.description.edit.items[0]
>>> isinstance(item, RepeatSequenceItem)
True
>>> item.unit
'N'
>>> item.count
100
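A repeat item is a compact form of a longer literal sequence. As a small illustration, the sketch below expands `unit` and `count` into the full string using a stand-in dataclass; `expand` is a hypothetical helper, not a documented method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatSequenceItem:
    # Stand-in with the two documented fields.
    unit: str
    count: int


def expand(item: RepeatSequenceItem) -> str:
    """Spell out the repeated unit as one literal sequence."""
    # String repetition concatenates `count` copies of `unit`.
    return item.unit * item.count
```

For `N[100]` the expanded value is a run of 100 `N` characters, which is mainly useful for computing the inserted length.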

NucleotideRepeatBlock `dataclass`

NucleotideRepeatBlock(count: int, unit: str | None = None, location: Interval[NucleotideCoordinate] | None = None)

One repeat block/unit in a nucleotide repeat description.

Attributes:

Name	Type	Description
`count`	`int`	Number of repeated units.
`unit`	`str \| None`	Literal bases of the repeat unit. `None` when the block is written as `location[count]`.
`location`	`Interval[NucleotideCoordinate] \| None`	Location spanned by the repeat unit. `None` when the block is written as `unit[count]`.

Examples:

A literal 3 bp repeat unit with 23 copies:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> block = variant.description.edit.blocks[0]
>>> block.unit
'CAG'
>>> block.count
23
>>> block.location is None
True

An RNA repeat variant composed of consecutive repeat blocks, each written as `location[count]` rather than `unit[count]`:

>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> block = variant.description.edit.blocks[1]
>>> block.unit is None
True
>>> block.location.start.coordinate
466
>>> block.location.end.coordinate
489

NucleotideSequenceOmittedEdit

Bases: str, Enum

Nucleotide edits whose altered sequence is not written explicitly.

Attributes:

Name	Type	Description
`NO_CHANGE`		No nucleotide change, written as `=`.
`DELETION`		Deletion of the reference interval, written as `del`.
`DUPLICATION`		Duplication of the reference interval, written as `dup`.
`INVERSION`		Inversion of the reference interval, written as `inv`.

Examples:

A coding DNA deletion:

>>> from tinyhgvs import NucleotideSequenceOmittedEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.5697del")
>>> variant.description.edit is NucleotideSequenceOmittedEdit.DELETION
True

A genomic duplication:

>>> variant = parse_hgvs("NC_000001.11:g.1234_2345dup")
>>> variant.description.edit
<NucleotideSequenceOmittedEdit.DUPLICATION: 'duplication'>

NucleotideSubstitutionEdit `dataclass`

NucleotideSubstitutionEdit(reference: str, alternate: str)

Model describing nucleotide substitution.

Examples:

A reference base C is substituted by A at the described location.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.33038255C>A")
>>> variant_edit = variant.description.edit
>>> variant_edit.reference
'C'
>>> variant_edit.alternate
'A'
>>> variant_edit.kind
'substitution'

NucleotideInsertionEdit `dataclass`

NucleotideInsertionEdit(items: tuple[NucleotideSequenceItem, ...])

Model describing nucleotide insertion.

Attributes:

Name	Type	Description
`items`	`tuple[NucleotideSequenceItem, ...]`	Inserted sequence items in the order they appear in the HGVS expression.
`kind`	`Literal['insertion']`	Edit kind.

Examples:

Literal nucleotide insertion:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000023.10:g.32862923_32862924insCCT")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'CCT'

A composite insertion can mix literal and copied sequence:

>>> variant = parse_hgvs("LRG_199t1:c.419_420ins[T;450_470;AGGG]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.items)
3
>>> variant_edit.items[0].value
'T'
>>> variant_edit.items[1].is_from_same_reference
True
>>> variant_edit.items[2].value
'AGGG'

Insertion of unspecified repeated bases:

>>> variant = parse_hgvs("NC_000023.10:g.32717298_32717299insN[100]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
100

NucleotideDeletionInsertionEdit `dataclass`

NucleotideDeletionInsertionEdit(items: tuple[NucleotideSequenceItem, ...])

Model describing nucleotide deletion-insertion.

Attributes:

Name	Type	Description
`items`	`tuple[NucleotideSequenceItem, ...]`	Replacement sequence items in the order they appear in the HGVS expression.
`kind`	`Literal['deletion_insertion']`	Edit kind.

Examples:

A deleted interval is replaced by one literal sequence component.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("LRG_199t1:c.850_901delinsTTCCTCGATGCCTG")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].value
'TTCCTCGATGCCTG'

A deleted interval can be replaced by copied sequence from the same reference:

>>> variant = parse_hgvs("NC_000022.10:g.42522624_42522669delins42536337_42536382")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].source_location.start.coordinate
42536337

A deleted interval is replaced by repeated unspecified bases.

>>> variant = parse_hgvs("NM_004006.2:c.812_829delinsN[12]")
>>> variant_edit = variant.description.edit
>>> variant_edit.items[0].unit
'N'
>>> variant_edit.items[0].count
12

NucleotideRepeatEdit `dataclass`

NucleotideRepeatEdit(blocks: tuple[NucleotideRepeatBlock, ...])

Model describing a top-level nucleotide repeat variant.

Attributes:

Name	Type	Description
`blocks`	`tuple[NucleotideRepeatBlock, ...]`	Repeat blocks/units written in the HGVS description.
`kind`	`Literal['repeat']`	Edit kind.

Examples:

A DNA repeat variant with explicit repeat unit:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NC_000014.8:g.123CAG[23]")
>>> variant_edit = variant.description.edit
>>> variant_edit.blocks[0].unit
'CAG'
>>> variant_edit.blocks[0].count
23

An RNA repeat variant composed of consecutive blocks, each written as a location span:

>>> variant = parse_hgvs("NM_004006.3:r.456_465[4]466_489[9]490_499[3]")
>>> variant_edit = variant.description.edit
>>> len(variant_edit.blocks)
3
>>> variant_edit.blocks[2].count
3

NucleotideVariant `dataclass`

NucleotideVariant(location: Interval[NucleotideCoordinate], edit: NucleotideEdit)

Model describing a nucleotide-level variant.

Attributes:

Name	Type	Description
`location`	`Interval[NucleotideCoordinate]`	Inclusive nucleotide interval where the edit is applied.
`edit`	`NucleotideEdit`	Nucleotide edit applied at that interval.

Examples:

A splice-site substitution is represented by a nucleotide location and a nucleotide substitution edit.

>>> from tinyhgvs import NucleotideSubstitutionEdit, parse_hgvs
>>> variant = parse_hgvs("NM_004006.2:c.357+1G>A")
>>> isinstance(variant.description.edit, NucleotideSubstitutionEdit)
True
>>> variant_description = variant.description
>>> variant_description.location.start.coordinate
357
>>> variant_description.location.start.offset
1
