
Protein Models

Protein-focused Python model types for parsed HGVS variants.

Type Aliases:

Name Description
ProteinEdit

Tagged union for supported protein edit models.

ProteinEffect

Tagged union for supported protein consequence models.

ProteinEffect module-attribute

Tagged union for supported protein consequence models.

ProteinCoordinate dataclass

ProteinCoordinate(residue: str, ordinal: int)

Protein position written as residue symbol plus ordinal.

Attributes:

Name Type Description
residue str

Amino-acid symbol.

ordinal int

Amino-acid position.

Examples:

The start position of a protein substitution at residue 24 carries both the amino-acid symbol and its ordinal.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_003997.1:p.Trp24Ter")
>>> position = variant.description.effect.location.start
>>> position.residue
'Trp'
>>> position.ordinal
24

ProteinSequence dataclass

ProteinSequence(residues: tuple[str, ...])

Ordered amino-acid sequence used by insertions and deletion-insertions.

Attributes:

Name Type Description
residues tuple[str, ...]

Ordered tuple of amino-acid symbols.

Examples:

A protein insertion adds three amino acids in order.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("p.Lys2_Gly3insGlnSerLys")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.sequence.residues
('Gln', 'Ser', 'Lys')
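Because `residues` is an ordered tuple of three-letter symbols, it can be turned back into HGVS text with a join, or mapped to one-letter codes with a lookup table. The helpers below are hypothetical and not part of tinyhgvs; the table holds the standard IUPAC codes plus `Ter` as `*`.

```python
# Standard three-letter to one-letter amino-acid codes; 'Ter' (stop) maps to '*'.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Pyl": "O", "Ter": "*",
}


def to_three_letter(residues: tuple[str, ...]) -> str:
    """Concatenate residues as written in HGVS, e.g. 'GlnSerLys'."""
    return "".join(residues)


def to_one_letter(residues: tuple[str, ...]) -> str:
    """Map each three-letter symbol to its one-letter code (KeyError if unknown)."""
    return "".join(THREE_TO_ONE[residue] for residue in residues)
```

For the inserted sequence `('Gln', 'Ser', 'Lys')`, these give `'GlnSerLys'` and `'QSK'`.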

ProteinSequenceOmittedEdit

Bases: str, Enum

Protein edits whose altered amino-acid sequence is not written explicitly.

Attributes:

Name Type Description
UNKNOWN

A protein change is expected, but the consequence is not known.

NO_CHANGE

No protein change, written as =.

DELETION

Deletion of the stated amino-acid interval.

DUPLICATION

Duplication of the stated amino-acid interval.

Examples:

A predicted but unspecified consequence at Met1:

>>> from tinyhgvs import ProteinSequenceOmittedEdit, parse_hgvs
>>> variant = parse_hgvs("LRG_199p1:p.(Met1?)")
>>> variant.description.effect.edit is ProteinSequenceOmittedEdit.UNKNOWN
True

An explicitly unchanged residue:

>>> variant = parse_hgvs("NP_003997.1:p.Cys188=")
>>> variant.description.effect.edit
<ProteinSequenceOmittedEdit.NO_CHANGE: 'no_change'>
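Since the enum subclasses both `str` and `Enum`, its members are singletons (so the `is` checks in the examples are safe) and also compare equal to their raw string values. A minimal sketch with a hypothetical mirror class: only the `'no_change'` value appears in the docs, so the other values are assumptions, and the suffix table is an illustrative helper.

```python
from enum import Enum


class SequenceOmittedEdit(str, Enum):
    UNKNOWN = "unknown"          # assumed value
    NO_CHANGE = "no_change"      # value shown in the documented repr
    DELETION = "deletion"        # assumed value
    DUPLICATION = "duplication"  # assumed value


# HGVS suffix written after the location for each sequence-omitted edit.
HGVS_SUFFIX = {
    SequenceOmittedEdit.UNKNOWN: "?",
    SequenceOmittedEdit.NO_CHANGE: "=",
    SequenceOmittedEdit.DELETION: "del",
    SequenceOmittedEdit.DUPLICATION: "dup",
}


def hgvs_suffix(edit: SequenceOmittedEdit) -> str:
    """Return the HGVS notation suffix for an edit with no explicit sequence."""
    return HGVS_SUFFIX[edit]
```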

ProteinSubstitutionEdit dataclass

ProteinSubstitutionEdit(to: str)

Protein substitution to another residue symbol.

Attributes:

Name Type Description
to str

Amino acid or stop symbol replacing the reference residue.

kind Literal['substitution']

Edit kind.

Examples:

A tryptophan residue is replaced by a termination codon.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_003997.1:p.Trp24Ter")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.to
'Ter'
>>> variant_edit.kind
'substitution'
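A substitution is fully described by the reference coordinate plus `to`, so the HGVS string can be rebuilt from those two pieces. This is a hypothetical formatting helper using stand-in dataclasses, not a tinyhgvs function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:  # stand-in mirroring ProteinCoordinate(residue, ordinal)
    residue: str
    ordinal: int


@dataclass(frozen=True)
class SubstitutionEdit:  # stand-in mirroring ProteinSubstitutionEdit(to)
    to: str
    kind: str = "substitution"


def format_substitution(accession: str | None, at: Coordinate, edit: SubstitutionEdit) -> str:
    """Rebuild e.g. 'NP_003997.1:p.Trp24Ter' from its parsed parts."""
    prefix = f"{accession}:" if accession else ""
    return f"{prefix}p.{at.residue}{at.ordinal}{edit.to}"
```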

ProteinInsertionEdit dataclass

ProteinInsertionEdit(sequence: ProteinSequence)

Model describing a protein insertion.

Attributes:

Name Type Description
sequence ProteinSequence

Inserted amino-acid sequence.

kind Literal['insertion']

Edit kind.

Examples:

Three amino acids are inserted between residues 2 and 3.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("p.Lys2_Gly3insGlnSerLys")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.sequence.residues
('Gln', 'Ser', 'Lys')
>>> variant_edit.kind
'insertion'

ProteinDeletionInsertionEdit dataclass

ProteinDeletionInsertionEdit(sequence: ProteinSequence)

Model describing a protein deletion-insertion.

Attributes:

Name Type Description
sequence ProteinSequence

Replacement amino-acid sequence.

kind Literal['deletion_insertion']

Edit kind.

Examples:

One residue is deleted and replaced by two amino acids.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("p.Cys28delinsTrpVal")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.sequence.residues
('Trp', 'Val')
>>> variant_edit.kind
'deletion_insertion'
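Insertions and deletion-insertions differ only in their keyword and share the same `ProteinSequence` payload. The sketch below (hypothetical helpers and stand-in types) rebuilds both example strings from a location and a residue tuple; it assumes a single-residue location has no end coordinate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:  # stand-in mirroring ProteinCoordinate
    residue: str
    ordinal: int


def format_span(start: Coordinate, end: Coordinate | None = None) -> str:
    """Render one position ('Cys28') or an interval ('Lys2_Gly3')."""
    first = f"{start.residue}{start.ordinal}"
    return first if end is None else f"{first}_{end.residue}{end.ordinal}"


def format_sequence_edit(kind: str, residues: tuple[str, ...]) -> str:
    """Render the suffix of an insertion or deletion-insertion edit."""
    keyword = {"insertion": "ins", "deletion_insertion": "delins"}[kind]
    return keyword + "".join(residues)
```

For example, `"p." + format_span(Coordinate("Cys", 28)) + format_sequence_edit("deletion_insertion", ("Trp", "Val"))` yields `'p.Cys28delinsTrpVal'`.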

ProteinRepeatEdit dataclass

ProteinRepeatEdit(count: int)

Model describing a top-level protein repeat variant.

Attributes:

Name Type Description
count int

Number of repeated amino-acid units.

kind Literal['repeat']

Edit kind.

Examples:

A protein repeat variant whose repeat unit spans an interval:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_0123456.1:p.Arg65_Ser67[12]")
>>> variant.description.effect.edit.count
12
>>> variant.description.effect.edit.kind
'repeat'
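In `p.Arg65_Ser67[12]`, the unit is the interval Arg65 to Ser67 and `count` is the number of copies. A minimal sketch, assuming the unit is the whole interval (both ends inclusive); the helpers are hypothetical, and positions are passed as plain `(residue, ordinal)` tuples for brevity.

```python
from __future__ import annotations


def format_repeat(start: tuple[str, int], end: tuple[str, int] | None, count: int) -> str:
    """Rebuild a repeat such as 'p.Arg65_Ser67[12]'."""
    unit = f"{start[0]}{start[1]}"
    if end is not None:
        unit += f"_{end[0]}{end[1]}"
    return f"p.{unit}[{count}]"


def repeated_residue_total(start_ordinal: int, end_ordinal: int, count: int) -> int:
    """Residues in the repeated stretch: unit length (inclusive) times copy count."""
    unit_length = end_ordinal - start_ordinal + 1
    return unit_length * count
```

For the example above, the unit is three residues long, so twelve copies span 36 residues.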

ProteinExtensionTerminal

Bases: str, Enum

Protein terminus toward which an extension variant extends.

Attributes:

Name Type Description
N

Extension toward the N-terminus.

C

Extension toward the C-terminus.

Examples:

N-terminal extension:

>>> from tinyhgvs import ProteinExtensionTerminal, parse_hgvs
>>> variant = parse_hgvs("NP_003997.2:p.Met1ext-5")
>>> variant.description.effect.edit.to_terminal is ProteinExtensionTerminal.N
True

C-terminal extension:

>>> variant = parse_hgvs("NP_003997.2:p.Ter110GlnextTer17")
>>> variant.description.effect.edit.to_terminal is ProteinExtensionTerminal.C
True

ProteinExtensionEdit dataclass

ProteinExtensionEdit(to_terminal: ProteinExtensionTerminal, to_residue: str | None, terminal_ordinal: int | None)

Model describing a protein extension consequence.

Attributes:

Name Type Description
to_terminal ProteinExtensionTerminal

Protein terminus toward which an extension variant extends.

to_residue str | None

Residue replacing the reference stop codon in a C-terminal extension. None for an N-terminal extension.

terminal_ordinal int | None

New terminal ordinal. Negative for an N-terminal extension, positive for a C-terminal extension with a known new stop, and None when the new stop is unknown.

kind Literal['extension']

Edit kind.

Examples:

An N-terminal extension:

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_003997.2:p.Met1ext-5")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.to_terminal
<ProteinExtensionTerminal.N: 'n'>
>>> variant_edit.to_residue is None
True
>>> variant_edit.terminal_ordinal
-5

A C-terminal extension with known new stop:

>>> variant = parse_hgvs("NP_003997.2:p.Ter110GlnextTer17")
>>> variant_edit = variant.description.effect.edit
>>> variant_edit.to_terminal
<ProteinExtensionTerminal.C: 'c'>
>>> variant_edit.to_residue
'Gln'
>>> variant_edit.terminal_ordinal
17

A C-terminal extension with unknown new stop:

>>> variant = parse_hgvs("p.Ter327ArgextTer?")
>>> variant.description.effect.edit.terminal_ordinal is None
True
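The three field combinations documented above map directly onto the three written forms. This hypothetical formatter uses stand-in types (enum values taken from the documented reprs) and only covers the forms shown in the examples.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Terminal(str, Enum):  # stand-in mirroring ProteinExtensionTerminal
    N = "n"
    C = "c"


@dataclass(frozen=True)
class ExtensionEdit:  # stand-in mirroring ProteinExtensionEdit
    to_terminal: Terminal
    to_residue: str | None
    terminal_ordinal: int | None


def format_extension(residue: str, ordinal: int, edit: ExtensionEdit) -> str:
    """Rebuild 'p.Met1ext-5', 'p.Ter110GlnextTer17' or 'p.Ter327ArgextTer?'."""
    if edit.to_terminal is Terminal.N:
        # N-terminal: no replacement residue; the ordinal is negative.
        return f"p.{residue}{ordinal}ext{edit.terminal_ordinal}"
    # C-terminal: None means the new stop is unknown, written as '?'.
    stop = "?" if edit.terminal_ordinal is None else str(edit.terminal_ordinal)
    return f"p.{residue}{ordinal}{edit.to_residue}extTer{stop}"
```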

ProteinFrameshiftStopKind

Bases: str, Enum

Whether the stop codon resulting from a frameshift is known (long form), unknown (long form, for example when no new stop codon is encountered), or omitted (short form).

Attributes:

Name Type Description
OMITTED

Short-form frameshift where stop codon information is omitted.

UNKNOWN

Long-form frameshift with unknown stop codon, such as p.Arg97ProfsTer?.

KNOWN

Long-form frameshift with known stop codon, such as p.Arg97ProfsTer23.

Examples:

Protein frameshift variant with unknown stop codon:

>>> from tinyhgvs import parse_hgvs
>>> unknown = parse_hgvs("NP_0123456.1:p.Arg97ProfsTer?")
>>> unknown.description.effect.edit.stop.kind
<ProteinFrameshiftStopKind.UNKNOWN: 'unknown'>

Short-form protein frameshift variant:

>>> short = parse_hgvs("NP_0123456.1:p.Arg97fs")
>>> short.description.effect.edit.stop.kind
<ProteinFrameshiftStopKind.OMITTED: 'omitted'>

ProteinFrameshiftStop dataclass

ProteinFrameshiftStop(ordinal: int | None, kind: ProteinFrameshiftStopKind)

Model describing stop codon information in a protein frameshift edit.

Attributes:

Name Type Description
ordinal int | None

Stop codon ordinal. None when stop codon information is omitted (short form) or unknown.

kind ProteinFrameshiftStopKind

Whether the stop is omitted, unknown, or known.

Examples:

Protein frameshift variant with a known stop codon:

>>> from tinyhgvs import parse_hgvs
>>> long = parse_hgvs("NP_0123456.1:p.Arg97ProfsTer23")
>>> long_stop = long.description.effect.edit.stop
>>> long_stop.ordinal
23
>>> long_stop.kind
<ProteinFrameshiftStopKind.KNOWN: 'known'>

A short-form frameshift variant leaves the stop ordinal as None:

>>> short = parse_hgvs("NP_0123456.1:p.Arg97fs")
>>> short_stop = short.description.effect.edit.stop
>>> short_stop.ordinal is None
True
>>> short_stop.kind
<ProteinFrameshiftStopKind.OMITTED: 'omitted'>
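The documented fields imply an invariant: only a KNOWN stop carries an ordinal. A minimal sketch that enforces it in `__post_init__` and classifies the text following `fs`; the types are stand-ins, and the parser is hypothetical and only recognizes the `Ter` spellings used on this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class StopKind(str, Enum):  # values taken from the documented reprs
    OMITTED = "omitted"
    UNKNOWN = "unknown"
    KNOWN = "known"


@dataclass(frozen=True)
class FrameshiftStop:
    ordinal: int | None
    kind: StopKind

    def __post_init__(self) -> None:
        # KNOWN requires an ordinal; OMITTED and UNKNOWN require None.
        if (self.kind is StopKind.KNOWN) != (self.ordinal is not None):
            raise ValueError(f"ordinal {self.ordinal!r} inconsistent with kind {self.kind.value!r}")


def parse_stop_suffix(suffix: str) -> FrameshiftStop:
    """Classify the text after 'fs': '' (short form), 'Ter?' or 'Ter<N>'."""
    if suffix == "":
        return FrameshiftStop(None, StopKind.OMITTED)
    if suffix == "Ter?":
        return FrameshiftStop(None, StopKind.UNKNOWN)
    if suffix.startswith("Ter") and suffix[3:].isdigit():
        return FrameshiftStop(int(suffix[3:]), StopKind.KNOWN)
    raise ValueError(f"unrecognized stop suffix: {suffix!r}")
```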

ProteinFrameshiftEdit dataclass

ProteinFrameshiftEdit(to_residue: str | None, stop: ProteinFrameshiftStop)

Model describing a protein frameshift consequence.

Attributes:

Name Type Description
to_residue str | None

First newly encoded residue. None in the short form.

stop ProteinFrameshiftStop

Stop codon information for the frameshift.

kind Literal['frameshift']

Edit kind.

Examples:

A short-form protein frameshift variant:

>>> from tinyhgvs import parse_hgvs
>>> short = parse_hgvs("NP_0123456.1:p.Arg97fs")
>>> short_edit = short.description.effect.edit
>>> short_edit.to_residue is None
True
>>> short_edit.stop.kind
<ProteinFrameshiftStopKind.OMITTED: 'omitted'>

A long-form protein frameshift variant:

>>> long = parse_hgvs("NP_0123456.1:p.Arg97ProfsTer23")
>>> long_edit = long.description.effect.edit
>>> long_edit.to_residue
'Pro'
>>> long_edit.stop.ordinal
23

A predicted long-form protein frameshift variant:

>>> predicted = parse_hgvs("p.(Arg97ProfsTer?)")
>>> predicted.description.is_predicted
True
>>> predicted.description.effect.edit.stop.kind
<ProteinFrameshiftStopKind.UNKNOWN: 'unknown'>
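Combining `to_residue` with the stop kind is enough to reproduce each written form. This hypothetical formatter flattens the nested `stop` object into two fields for brevity; it is a sketch, not the library's serializer.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class StopKind(str, Enum):
    OMITTED = "omitted"
    UNKNOWN = "unknown"
    KNOWN = "known"


@dataclass(frozen=True)
class FrameshiftEdit:  # simplified stand-in: stop fields flattened
    to_residue: str | None  # None in the short form
    stop_ordinal: int | None
    stop_kind: StopKind


def format_frameshift(residue: str, ordinal: int, edit: FrameshiftEdit) -> str:
    """Short form 'p.Arg97fs', or long form 'p.Arg97ProfsTer23' / 'p.Arg97ProfsTer?'."""
    if edit.stop_kind is StopKind.OMITTED:
        return f"p.{residue}{ordinal}fs"
    tail = "?" if edit.stop_kind is StopKind.UNKNOWN else str(edit.stop_ordinal)
    return f"p.{residue}{ordinal}{edit.to_residue}fsTer{tail}"
```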

ProteinUnknownEffect dataclass

ProteinUnknownEffect()

Model describing the protein consequence p.?.

Attributes:

Name Type Description
kind Literal['unknown']

Effect kind.

Examples:

The protein consequence is entirely unknown.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("NP_003997.1:p.?")
>>> variant.description.effect.kind
'unknown'

ProteinNoProteinProducedEffect dataclass

ProteinNoProteinProducedEffect()

Model describing the protein consequence p.0.

Attributes:

Name Type Description
kind Literal['no_protein_produced']

Effect kind.

Examples:

The variant predicts that no protein product is made.

>>> from tinyhgvs import parse_hgvs
>>> variant = parse_hgvs("LRG_199p1:p.0")
>>> variant.description.effect.kind
'no_protein_produced'

ProteinEditEffect dataclass

ProteinEditEffect(location: Location[ProteinCoordinate], edit: ProteinEdit)

Protein consequence at a specified location.

Attributes:

Name Type Description
location Location[ProteinCoordinate]

Location where the edit occurs.

edit ProteinEdit

Protein edit applied at the location.

kind Literal['edit']

Effect kind.

Examples:

A deletion spanning residues Lys23 to Val25 is represented by a protein location and a protein deletion edit.

>>> from tinyhgvs import ProteinSequenceOmittedEdit, parse_hgvs
>>> variant = parse_hgvs("NP_003997.2:p.Lys23_Val25del")
>>> effect = variant.description.effect
>>> effect.location.start.residue
'Lys'
>>> effect.location.end.residue
'Val'
>>> effect.edit is ProteinSequenceOmittedEdit.DELETION
True

A protein frameshift consequence:

>>> variant = parse_hgvs("NP_0123456.1:p.Arg97ProfsTer23")
>>> effect = variant.description.effect
>>> effect.location.start.residue
'Arg'
>>> effect.edit.to_residue
'Pro'

A protein extension consequence:

>>> variant = parse_hgvs("NP_003997.2:p.Ter110GlnextTer17")
>>> effect = variant.description.effect
>>> effect.location.start.residue
'Ter'
>>> effect.edit.kind
'extension'
>>> effect.edit.terminal_ordinal
17
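A location's start and end ordinals determine how many residues an edit covers; for `p.Lys23_Val25del` that is residues 23 through 25. A minimal sketch with stand-in types, assuming a single-residue location has no end coordinate (an assumption about `Location`, not confirmed here).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:  # stand-in mirroring ProteinCoordinate
    residue: str
    ordinal: int


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for Location[ProteinCoordinate]
    start: Coordinate
    end: Coordinate | None = None  # assumed None for a single residue


def residue_count(span: Span) -> int:
    """Number of residues covered, counting both ends of an interval."""
    if span.end is None:
        return 1
    if span.end.ordinal < span.start.ordinal:
        raise ValueError("interval end precedes start")
    return span.end.ordinal - span.start.ordinal + 1
```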

ProteinVariant dataclass

ProteinVariant(is_predicted: bool, effect: ProteinEffect)

Parsed protein-level consequence.

Attributes:

Name Type Description
is_predicted bool

Whether the effect was written in parentheses.

effect ProteinEffect

Parsed protein consequence model.

Examples:

An observed protein consequence is not predicted:

>>> from tinyhgvs import parse_hgvs
>>> parse_hgvs("NP_003997.1:p.Trp24Ter").description.is_predicted
False

A parenthesized protein consequence is predicted:

>>> parse_hgvs("LRG_199p1:p.(Met1?)").description.is_predicted
True

Frameshift consequences:

>>> variant = parse_hgvs("NP_0123456.1:p.Arg97ProfsTer23")
>>> variant.description.effect.edit.kind
'frameshift'

Extension consequences:

>>> variant = parse_hgvs("NP_003997.2:p.Ter110GlnextTer17")
>>> variant.description.effect.edit.kind
'extension'
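`is_predicted` records only whether the consequence was parenthesized, so rendering a variant back to text amounts to wrapping the description body. A hypothetical helper, not part of tinyhgvs:

```python
from __future__ import annotations


def format_protein_variant(body: str, is_predicted: bool, accession: str | None = None) -> str:
    """Prefix 'p.' and wrap the body in parentheses when the effect is predicted."""
    prefix = f"{accession}:" if accession else ""
    return f"{prefix}p.({body})" if is_predicted else f"{prefix}p.{body}"
```

For instance, `format_protein_variant("Met1?", True, "LRG_199p1")` gives `'LRG_199p1:p.(Met1?)'`.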