GRCh38

sequencing

Whole Genome Sequencing

Whole genome sequencing (also known as full genome sequencing, complete genome sequencing, or entire genome sequencing) is a laboratory process that determines the complete DNA sequence of an organism’s genome at a single time.

Genome sequencing determines the order of the DNA nucleotides, or bases, in a genome: the sequence of As, Cs, Gs, and Ts that make up an organism’s DNA. The human genome is made up of over 3 billion of these genetic letters.

Reference Genome

The reference genome provides a template by which sequencing reads can be mapped to their chromosomal locations. It is an indispensable resource for geneticists worldwide, who use it to piece together sequences, understand the context of reads, and find areas of genetic variation by comparing genomes against a “standard” sequence.
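As a rough illustration of what "mapping a read to the reference" means, here is a minimal Python sketch. It assumes exact string matching on a toy sequence; real aligners use indexed, mismatch-tolerant algorithms, and the function names `map_read` and `find_mismatches` are hypothetical.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based offset where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping hits are found.
        start = reference.find(read, start + 1)
    return positions


def find_mismatches(reference: str, read: str, offset: int) -> list[tuple[int, str, str]]:
    """Compare a read placed at `offset` against the reference.

    Returns (reference position, reference base, read base) for each
    position where the two disagree, i.e. candidate variation.
    """
    window = reference[offset:offset + len(read)]
    return [
        (offset + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


# Toy data, not real genomic sequence.
TOY_REFERENCE = "ACGTACGTTAGC"
exact_hits = map_read(TOY_REFERENCE, "ACGT")
differences = find_mismatches(TOY_REFERENCE, "TTCGC", 7)
```

The first call shows a read landing at more than one location, which is why repetitive regions are hard to map; the second shows how a placed read that disagrees with the "standard" sequence points to a possible variant.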

Genome Reference Consortium

The Genome Reference Consortium (GRC) is an international collective of academic and research institutes with expertise in genome mapping, sequencing, and informatics, formed to improve the representation of reference genomes.

The GRC is a collaborative effort that interacts with various groups in the scientific community; the primary member institutes are:

  • The Wellcome Trust Sanger Institute
  • The Genome Institute at Washington University
  • The European Bioinformatics Institute
  • The National Center for Biotechnology Information

GRCh38

In the final days of 2013, the Genome Reference Consortium (GRC) released the eagerly awaited GRCh38 human genome assembly, the first major revision of the human genome in more than four years. The sequence is available from the UCSC download server at the following location:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/
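For readers who want to fetch the assembly programmatically, a minimal Python sketch follows. The base URL is the one given above; the relative path `bigZips/hg38.fa.gz` is an assumption about the directory layout and should be checked against the server's listing before use. The helper names `build_url` and `download` are hypothetical.

```python
import shutil
import urllib.request

# Base location quoted in the article.
BASE_URL = "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/"

# Assumption: the compressed genome FASTA sits under bigZips/ with this name.
# Verify against the directory listing before relying on it.
FASTA_PATH = "bigZips/hg38.fa.gz"


def build_url(relative_path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a relative path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + relative_path.lstrip("/")


def download(url: str, destination: str) -> None:
    """Stream the remote file to disk without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Calling `download(build_url(FASTA_PATH), "hg38.fa.gz")` would start the transfer; the genome file is large, so it is streamed in chunks rather than read in one call.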

What is new in GRCh38?

Alternate sequences:

Alternate loci are mostly sequences in regions known to be highly polymorphic. Several human chromosomal regions exhibit sufficient variability to prevent adequate representation by a single sequence. To address this, the GRCh38 assembly provides alternate sequences for selected variant regions through the inclusion of alternate loci scaffolds (or alt loci).

GRCh38:

This assembly contains 261 alt loci, many of which are associated with the LRC/KIR area of chr19 and the MHC region on chr6.
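One way to see the alt loci in practice is to tally sequence names by category. The sketch below assumes the UCSC hg38 naming convention, in which alt scaffolds end in `_alt`, unlocalized scaffolds end in `_random`, and unplaced scaffolds start with `chrUn_`. It also assumes a tab-separated "name, size" listing as input. The sample names and sizes are hypothetical placeholders, not real scaffolds.

```python
from collections import Counter


def classify_sequence(name: str) -> str:
    """Classify a sequence name using the assumed UCSC hg38 suffix/prefix rules."""
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    return "chromosome"


def count_categories(chrom_sizes_text: str) -> Counter:
    """Count sequences per category from tab-separated 'name<TAB>size' lines."""
    counts = Counter()
    for line in chrom_sizes_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t")[0]
        counts[classify_sequence(name)] += 1
    return counts


# Hypothetical placeholder names and sizes, for illustration only.
SAMPLE = "\n".join([
    "chr6\t1000",
    "chr19\t900",
    "chr6_EXAMPLE1v1_alt\t50",
    "chr19_EXAMPLE2v1_alt\t40",
    "chr1_EXAMPLE3v1_random\t10",
    "chrUn_EXAMPLE4v1\t5",
])
sample_counts = count_categories(SAMPLE)
```

Run against the full sequence listing for an assembly, the "alt" tally is the figure to compare with the alt loci count quoted above.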

GRCh37:

This assembly contains 9 alternate sequences from the following 3 regions:

  • 6:28,477,796-33,448,353 — the MHC region
  • 17:43,384,863-44,913,631 — long inversion
  • 4:69,170,076-69,878,206
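To get a feel for the size of these regions, the coordinates can be parsed and their spans computed. This is a small Python sketch that assumes the coordinates are 1-based and inclusive, so the length is end minus start plus one. The names `parse_region` and `region_length` are hypothetical.

```python
import re

# Matches strings such as "6:28,477,796-33,448,353".
REGION_RE = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def parse_region(text: str) -> tuple[str, int, int]:
    """Split 'chrom:start-end' (with thousands separators) into its parts."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


def region_length(start: int, end: int) -> int:
    """Length in bases, assuming 1-based inclusive coordinates."""
    return end - start + 1


REGIONS = [
    "6:28,477,796-33,448,353",
    "17:43,384,863-44,913,631",
    "4:69,170,076-69,878,206",
]
lengths = {r: region_length(*parse_region(r)[1:]) for r in REGIONS}
```

Under that assumption the three regions span 4,970,558, 1,528,769, and 708,131 bases respectively, so the MHC region on chromosome 6 is by far the largest.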

Centromere:

GRCh38:

The large megabase-sized gaps that were previously used to represent centromeric regions in human assemblies have been replaced by sequences from centromere models created by Karen Miga et al. using centromere databases developed during her work in the Willard lab at Duke University and analysis software developed while working in the Kent lab at UCSC.

GRCh37:

GRCh37, like all previous builds of the reference genome, represented centromeres as standard 3-megabase gaps, which fails even to express the variation in size between different chromosomes’ centromeres.
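Gaps in an assembly are conventionally written as runs of the letter N, so they can be located with a simple scan. The Python sketch below works on a toy string; the function name `find_gaps` and the `min_length` parameter are hypothetical, and a real chromosome would be read from a FASTA file rather than typed inline.

```python
import re

# One or more consecutive N bases (upper or lower case) form a gap.
GAP_RE = re.compile(r"[Nn]+")


def find_gaps(sequence: str, min_length: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs for each run of N, 0-based and end-exclusive.

    Runs shorter than `min_length` are ignored, which helps separate large
    placeholder gaps from short stretches of ambiguous bases.
    """
    return [
        (m.start(), m.end())
        for m in GAP_RE.finditer(sequence)
        if m.end() - m.start() >= min_length
    ]


# Toy sequence, not real genomic data.
TOY_SEQUENCE = "ACGTNNNNNACGTNNAC"
all_gaps = find_gaps(TOY_SEQUENCE)
long_gaps = find_gaps(TOY_SEQUENCE, min_length=3)
```

Applied to a chromosome from an older build with `min_length` set in the megabase range, a scan like this would pick out the fixed-size centromere placeholders described above.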

Mitochondrial genome:

GRCh38:

The mitochondrial reference sequence included in the GRCh38 assembly and hg38 Genome Browser (termed “chrM” in the browser) is the Revised Cambridge Reference Sequence (rCRS) from MITOMAP with GenBank accession number J01415.2 and RefSeq accession number NC_012920.1.

GRCh37:

The previous hg19 Genome Browser used a different chrM sequence (RefSeq accession number NC_001807), which was not updated when the GRCh37 assembly later transitioned to the rCRS.