Genomics comes with a lot of terminology. Our glossary features some of the terms you may come across in genetics and genomics. From allele to zygosity, we explain everything in easy-to-understand language.



Allele

A variant form (or version) of a gene. Some genes have different forms that are found in the same place in the genome. Humans have two alleles for most genes, with one inherited from each parent. Individuals can have two of the same allele (homozygous) or have different alleles (heterozygous).

Alternative splicing

A process during gene expression where different combinations of exons are spliced together, enabling a single gene to encode multiple proteins.

Amino acid

Molecules that act as building blocks for proteins. A protein is made up of chains of amino acids. Properties of amino acids, such as size and charge, determine the structure, function, and folding of a protein.

Annotate; annotation

Adding information about a variant in the genomic data file, such as the variant’s chromosome location, gene location, or predicted effect on protein structure or function.


Autosome

A numbered chromosome, unrelated to the sex of an organism.




Base

Parts of the DNA building blocks that are commonly called the four ‘letters’ of DNA: adenine (A), cytosine (C), guanine (G) and thymine (T). The ‘letters’ of RNA are adenine (A), cytosine (C), guanine (G) and uracil (U).


Biallelic

Having variants on both copies (alleles) of a gene. An affected individual could be homozygous or compound heterozygous.


Bioinformatics

A field of biology that uses algorithms and software to analyse biological data, using the data to make biological discoveries, construct models or make predictions.



Call (a variant)

The process of identifying a variant from DNA sequence data. Sample DNA is sequenced and aligned to a reference genome for comparison. Differences in the sample are determined to be variants – they are ‘called’.
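As a rough illustration of what ‘calling’ means, the sketch below compares a sample sequence to a reference, base by base, and reports the differences. It is a simplification: it assumes the sample is already aligned and the same length as the reference, and it ignores insertions, deletions and sequencing errors. The function name `call_snvs` is made up for this example and is not part of any real variant-calling tool.

```python
def call_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Assumes the sample is already aligned to the
    reference and has the same length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


# One base differs between the two sequences, at position 4.
print(call_snvs("ACGTTGCA", "ACGATGCA"))  # [(4, 'T', 'A')]
```

Real variant callers work from many overlapping reads and weigh the evidence for each difference rather than comparing two strings directly.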

Cascade screening

Genetic testing of biological relatives of an individual with a variant known to cause a genetic condition. Testing aims to identify family members carrying the variant and their chance of developing the condition or passing the variant on to their children.


Chromosome

A compact, threadlike structure composed of a DNA molecule coiled around histone proteins. Humans have 22 pairs of numbered chromosomes (autosomes) and one pair of sex chromosomes (XX or XY), with one of each pair inherited from each parent.

Chromosomal microarray

A diagnostic test to identify structural changes in chromosomes, such as an altered number of whole chromosomes and copy number variants.


Codon

A sequence of 3 bases in mRNA that codes for a particular amino acid in a protein.

Copy number variant; copy number variation (CNV)

A difference in the number of copies of a specific section of DNA, such as large sequence duplications and deletions.



De novo

A ‘new’ variant that arises in a gamete (sperm or egg), early in embryonic development, or in cancer cells is a de novo variant. De novo variants will be seen in an affected individual but not their parents – the variant is not inherited.


Deletion

The loss (or deletion) of one or more nucleotides (DNA building blocks) from a DNA sequence.


Diploid

Having two complete sets of chromosomes, with each parent contributing one of each pair.

DNA (Deoxyribonucleic Acid)

The genetic material of life on earth. DNA is built from 4 nucleotides – adenine (A), cytosine (C), guanine (G) and thymine (T) – joined in strands by phosphodiester bonds. Two linked strands form a double helix of complementary base pairs (A-T and C-G).
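The base-pairing rule (A with T, C with G) means one strand fully determines the other. The short sketch below shows this; the function names are made up for illustration.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the base that pairs with each base, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands of the double helix run in opposite directions,
    so the partner strand is the complement read backwards.
    """
    return complementary_strand(strand)[::-1]


print(complementary_strand("ATGC"))  # TACG
print(reverse_complement("ATGC"))    # GCAT
```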

DNA sequence

The order of the nucleotide bases in a DNA molecule.

Driver mutation

In cancer, a variant in a gene that increases the rate of cell replication.



Epigenetic change

Modification of a DNA molecule by addition of chemical ‘tags’ without changing the DNA sequence. These changes can alter the way genes are turned on and off, and can be inherited.  


Exome

The portion of the genome that includes all protein-coding genes (the exons). It makes up 1-2% of the entire genome.


Exon

The protein-coding region of a gene.




Frameshift

A change in the reading frame – groups of 3 nucleotides – of a gene. An insertion or deletion that is not a multiple of 3 nucleotides will produce a frameshift.
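To see why the ‘multiple of 3’ rule matters for a frameshift, the sketch below splits a made-up sequence into groups of 3 before and after a deletion. The sequence and the helper name `codons` are chosen only for illustration.

```python
def codons(sequence: str) -> list[str]:
    """Split a sequence into complete groups of 3, dropping leftover bases."""
    usable_length = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable_length, 3)]


original = "ATGGCCTTTAAG"

# Deleting a single base (the 5th) shifts every group after it.
one_base_deleted = original[:4] + original[5:]

# Deleting a whole group of 3 bases removes one group but keeps the frame.
three_bases_deleted = original[:3] + original[6:]

print(codons(original))             # ['ATG', 'GCC', 'TTT', 'AAG']
print(codons(one_base_deleted))     # ['ATG', 'GCT', 'TTA']
print(codons(three_bases_deleted))  # ['ATG', 'TTT', 'AAG']
```

After the one-base deletion, every group downstream of the change is different, which is why frameshifts usually alter the protein far more than an in-frame deletion does.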

Fusion (gene/protein)

A gene made by joining parts of two different genes. The fusion gene codes for a fusion protein. Gene fusions are a common type of genetic variant in cancer.




Gene

A section of DNA that carries the code for a protein or RNA molecule. Individuals inherit genes from their parents. Genes contain information that determines physical and biological characteristics.

Gene expression

The process of turning genes on and off to decode a DNA sequence into a protein. Technically, this involves two processes: transcription, which ‘copies’ a gene into an mRNA molecule, and translation, which ‘reads’ the mRNA to make a protein.
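The two steps can be sketched in a few lines. This is a deliberately simplified illustration: it treats the input as the coding strand of a gene with no introns, so transcription is just swapping T for U, and it uses only a small excerpt of the standard codon table. The function names and the example sequence are made up.

```python
# Small excerpt of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "AAG": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """'Copy' DNA into mRNA: same letters, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """'Read' the mRNA 3 bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGGCCTTTAAGTAA")
print(mrna)             # AUGGCCUUUAAGUAA
print(translate(mrna))  # ['Met', 'Ala', 'Phe', 'Lys']
```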

Gene list

A list of candidate genes associated with a phenotype, used when scientists and clinicians are analysing an individual’s genomic sequence data to find a genetic cause for a condition.

Gene structure

The elements of a gene, including introns and exons, promoters, regulatory regions, and untranslated regions.


Genome

The entire set of DNA information of an organism, including all of the genes. It contains all the information necessary for the human body to develop and function. The human genome has about 3 billion DNA base pairs and around 20,000 protein-coding genes.


Genotype

The genetic makeup of an individual, comprising the alleles of all genes.

Germline variants

Genetic variants that are present in gametes (egg and sperm cells) and can potentially be inherited by offspring.




Hemizygous

Having only one copy of a gene as a result of having only one copy of the chromosome. Examples include the genes on the X-chromosome in males, or loss of alleles due to deletion of a section of chromosome.

Heterozygous; Heterozygote

Having two different alleles for a given gene, one inherited from each parent.


Histone

A protein complex within the cell nucleus. The long strands of chromosomal DNA coil around histones for a more compact shape.

Homozygous; Homozygote

Having two identical alleles for a given gene, one inherited from each parent.




Insertion

Addition of one or more nucleotides into a DNA sequence.


Intron

A sequence in DNA that resides between the exons in a gene but does not carry coding information for a protein. Introns are also called intervening sequences.




Karyotype

A laboratory-produced representation of a person’s complete set of chromosomes in numerical order.



Mendelian (inheritance)

Patterns that describe how characteristics are passed down from parents to children. They show how children can inherit traits due to a single gene (monogenic conditions). Examples include recessive, dominant and X-linked inheritance.


Mendeliome

A collective term used to refer to around 4000 genes known to carry variants associated with conditions caused by a single gene variant (monogenic conditions). The term derives from ‘Mendelian’ inheritance.


Missense variant

A genetic variant (nucleotide substitution) causing a change in one amino acid in the resulting protein.


Monogenic condition

A condition caused by a variant in a single gene.


mRNA

Messenger RNA (mRNA) carries the information needed to produce proteins. mRNA is produced by transcription of the DNA template. The initial transcript of the gene contains both introns and exons. Introns are spliced out to produce mature mRNA.

mRNA splicing

The process of removing intron sequences in the initial RNA transcript and joining the exon regions together to produce the messenger RNA (mRNA).

Multigene panel test

A laboratory test that looks at several candidate genes known to cause a condition. It is used to identify variants that may be the cause of the condition.


Mutation

A change in the DNA sequence. In a clinical setting, mutations are usually now called variants.



Next-generation sequencing (NGS)

DNA sequencing technology used for sequencing many genes at once. It is faster than preceding sequencing methods, such as Sanger sequencing. It is also called massively parallel sequencing. NGS technology is the method used for sequencing the entire genome.

Nonsense mediated decay (NMD)

A cellular pathway that breaks down mRNA transcripts carrying a nonsense variant which produces an early ‘stop codon’.

Nonsense variant

A gene change that causes a premature stop codon, a signal to stop producing the protein, rather than coding for an amino acid. This results in producing a short or truncated protein product. It can cause NMD (see above).


Nucleotide

The building block of the nucleic acids DNA and RNA. It is composed of a sugar, a phosphate, and a nitrogenous base. The base components in DNA are adenine (A), cytosine (C), guanine (G) and thymine (T); in RNA they are adenine (A), cytosine (C), guanine (G) and uracil (U).



Panel test


PanelApp

A publicly available knowledgebase and source of gene panels for the analysis of a genomic sequence. See PanelApp Australia.


Pathogenic

Disease-causing. A pathogenic variant affects cell function and causes a genetic condition.


Pedigree

A chart with symbols representing inheritance over 2 or more generations of a family.


Phenotype

The physical appearance and physiology of an individual, resulting from expression of an individual’s genetic makeup (genotype) and influenced by environmental factors.


Polymorphism

A variant that occurs frequently in a population, with a frequency >1%. Polymorphic genes contribute to typical variations within the population, e.g., the genes that control hair colour are polymorphic.


Proband

The individual through whom a family with a genetic disorder is ascertained – the first person in a family to be diagnosed with a genetic disorder.


Protein

Molecules encoded by genes, composed of amino acids in a sequence specified by the DNA sequence. The sequence of amino acids determines how a protein folds and functions.


Pseudogene

An inactive version of a gene. Pseudogenes began as functional protein-coding genes but have lost their ability to code for proteins due to mutations accumulated through evolution.




Read

A sequenced copy of a section of DNA, produced by a sequencing machine. Many reads of the same DNA region are needed for reliable variant identification when compared to a reference genome.

Reference sequence or reference genome

A ‘representative’ sequence of a gene or genome for comparison to individual gene or exome sequences. Reference sequences are assembled by scientists from many different genome sequences.

Regulatory gene

A gene encoding a protein that controls how other genes are turned on or off.

RNA (Ribonucleic Acid)

A nucleic acid similar to DNA but containing ribose sugar instead of deoxyribose sugar in its structure. RNA is often single-stranded, and the nucleotide bases are adenine (A), cytosine (C), guanine (G) and uracil (U).



Sanger sequencing

A method of determining the order of nucleotides in DNA, one gene at a time. It is used to confirm variants and to sequence single genes.

Segregation studies

Genetic testing of the parents and/or grandparents of an individual with a pathogenic variant, to gain information on how the variant arose or was inherited, e.g., de novo, recessive or dominant.

Sex chromosome

In mammals, the X chromosome and Y chromosome that typically determine the biological sex of the individual.


Sex-linked genes

Genes located on the sex chromosomes (X or Y chromosomes).

Single gene test

A laboratory test to identify variants in one gene associated with a trait or clinical presentation.


Singleton

Genomic testing performed on an individual subject, as compared to trio analysis, where the affected individual and their parents are tested.

SNP (Single nucleotide polymorphism)

A single base pair in DNA that shows polymorphism – having alternate alleles – in a population.

SNV (Single nucleotide variant)

A single base difference between an individual’s DNA sequence and a reference sequence in a genomic test.

Somatic variant

A change in DNA that occurs after fertilisation of egg and sperm and is not inherited.

Splice site variant

A genetic alteration in the DNA sequence at the boundary of an exon and intron – known as the splice site. Splice sites are two bases at either side of an intron recognised by the cell’s mechanism to cut the introns out of RNA. A change can disrupt RNA splicing and result in altered proteins.

Structural variant (SV)

Large deletions, insertions, inversions, translocations, gene fusions and gene duplications occurring in chromosomes.


Substitution

A variant where one nucleotide is replaced by one other nucleotide.




Transcript

The RNA produced by transcription of a gene, where the DNA sequence is ‘copied’ into an RNA sequence.


Translation

The process of a ribosome reading the mRNA sequence to bring the correct amino acids needed to produce a polypeptide or protein.

Trinucleotide repeat; triplet repeat

Three consecutive nucleotides that repeat in tandem at one location, e.g. CCGCCGCCGCCG. An increase in the number of repeats is called a triplet repeat expansion.
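Counting the repeats in a stretch of sequence can be sketched as below. The function name and the example sequence are made up for illustration, and the approach (a simple text pattern search) is only a toy stand-in for how laboratories actually measure repeat length.

```python
import re


def longest_tandem_repeat(sequence: str, unit: str) -> int:
    """Return the number of back-to-back copies of `unit` in the longest run."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    longest_run = max(
        (match.group(0) for match in pattern.finditer(sequence)),
        key=len,
        default="",
    )
    return len(longest_run) // len(unit)


# The CCG unit appears 4 times in a row in this made-up sequence.
print(longest_tandem_repeat("AACCGCCGCCGCCGTT", "CCG"))  # 4
```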


Trio analysis

The genomic testing of an affected individual and both their biological parents.



Uniparental disomy

A situation where an individual has two copies of a chromosome (or part of a chromosome) originating from one biological parent rather than one from each parent.


Variant

A change or variation in a DNA sequence as compared to a reference sequence. Variants range from single base changes to large rearrangements of DNA.

Variant classification

The scale used to describe the likelihood of a variant being pathogenic or benign. The classifications used are typically: class 5-Pathogenic, class 4-Likely Pathogenic, class 3-Variant of Uncertain Significance, class 2-Likely Benign and class 1-Benign.

Variant curation

The process of gathering evidence for and against a variant being pathogenic or benign.

Variant interpretation

The overall process of finding and prioritising the variants (gene changes) found in a genomic test, then collecting and curating evidence (variant curation) to determine how likely they are to explain the cause of a condition or cancer (variant classification) and identifying whether the result provides information on treatment changes for a patient.

VUS (VOUS), variant of uncertain significance

A change in DNA sequence where it is unclear whether it is the cause of a condition.


W, X, Y, Z

WES (whole exome sequencing)

Determining the sequence of all the exons in a genome.

WGS (whole genome sequencing)

Determining the sequence of all the DNA in an individual – both the regions that code for protein and the ‘non-coding’ regions.


X-inactivation

The inactivation of one copy of the X-chromosome in females. Biological females have two X chromosomes, XX, compared to males with one X chromosome and one Y chromosome. X-inactivation ‘evens up’ the dosage of X-linked genes in males and females.


Zygosity

The degree of similarity of the alleles at a given genomic location, usually defined by the terms homozygous, heterozygous or hemizygous.
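The three zygosity terms can be summarised as a simple rule of thumb, sketched below. The function name and the way alleles are written (as short text labels) are assumptions made for this example only.

```python
def zygosity(alleles: tuple[str, ...]) -> str:
    """Describe the alleles an individual has at one genomic location."""
    if len(alleles) == 1:
        # Only one copy present, e.g. an X-chromosome gene in a male.
        return "hemizygous"
    if len(alleles) == 2:
        return "homozygous" if alleles[0] == alleles[1] else "heterozygous"
    raise ValueError("expected one or two alleles")


print(zygosity(("A", "A")))  # homozygous
print(zygosity(("A", "G")))  # heterozygous
print(zygosity(("A",)))      # hemizygous
```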




Melbourne Genomics acknowledges the Wurundjeri people of the Kulin Nation, on whose lands we work, and all First Nations peoples across Victoria. We pay respect to Elders past and present. We also acknowledge the First Nations health professionals, researchers and leaders who are shaping the future of genomic medicine.

© 2014–2024 Melbourne Genomics Health Alliance