GENOME Research: Genome browser

Monday, April 18, 2011

Medaka Hd-rR: Whole Genome Sequencing Project

Sequencing of the medaka genome was started at the Academia Sequencing Center of the National Institute of Genetics (NIG) in mid-2002. The project was supported by the group grant Genome Science (a Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology of Japan).

The sequencing was conducted by the whole-genome shotgun strategy using the southern inbred strain Hd-rR. The genome was assembled from 13.8 million reads obtained from whole-genome shotgun plasmid, fosmid, and bacterial artificial chromosome (BAC) libraries. The total size of the assembled contigs was 700.4 megabases (Mb). Half of the assembled nucleotides lie in scaffolds of at least 1.41 Mb (or in contigs of at least 9.8 kilobases); these lengths are called the N50 values. This contiguity is sufficient to characterize the genomic structures of genes.
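As an illustration of how an N50 value like the ones quoted above is obtained, here is a minimal Python sketch. It is not the code used by the medaka project; the function name n50 and the example lengths are made up for illustration. The idea is to sort the scaffold (or contig) lengths from longest to shortest and walk down the list until at least half of the total bases have been covered.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop as soon as half of the total bases are covered.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Hypothetical toy assembly (lengths in bases, not medaka data).
example_lengths = [8, 6, 4, 2]
example_n50 = n50(example_lengths)
```

For the toy list, the total is 20 bases; the longest sequence (8) covers less than half, and adding the second (6) brings the running sum to 14, so the N50 of this toy assembly is 6.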

Four versions of the medaka genome sequence data, named 200406, 200506, version 0.9, and version 1.0, have been released to the public to provide users with timely information. The first two versions had shorter scaffolds that were not anchored on the medaka chromosomes because they were built in 2004 and 2005, before genetic markers were available. Versions 0.9 and 1.0 were created in 2006, when comprehensive genetic markers were available, so about 90% of their scaffolds and ultracontigs were located on the twenty-four medaka chromosomes. Versions 0.9 and 1.0 were built from identical contigs and scaffolds, but the assembly of version 1.0 is longer than that of version 0.9 because more genetic markers could be used to generate version 1.0. Version 0.9 remains publicly available because most of the data analysis in the medaka genome paper published in Nature (2007) was based on version 0.9.

The University of Tokyo Medaka Genome Browser (UTGB Medaka) is a web-based genome database browser that provides a range of information on the medaka genome, including assembly sequences, genes, clones, and genome sequences homologous to those of other species.

Thursday, February 10, 2011

Zv9, the most recent genome assembly for zebrafish.

The zebrafish (Danio rerio) is an important model organism for the study of vertebrate development and disease, organ function, behavior, and toxicology. Features that make the zebrafish so experimentally amenable include its short generation time, the large number of embryos produced per mating, and the development of transparent embryos outside the mother, which allows all stages of development to be observed. The Sanger Institute started the zebrafish genome sequencing project in 2001 and has released several genome assemblies, the latest of which is Zv9.

Zv9, the ninth assembly of the zebrafish genome, has recently been made available in the UCSC Genome Browser. This assembly comprises a total sequence length of 1.4 Gb in 4,560 scaffolds. Gaps remaining in the clone-based sequence were filled with sequence from WGS31, a combined Illumina and capillary assembly. The assembly integration process involves sequence alignments as well as cDNA, marker, and BAC/fosmid end sequence placements. Sequences that are based on clone contigs or are linked to chromosomes via markers are named 'Zv9_scaffold' followed by a number. WGS contigs that could not be placed onto chromosomes are named 'Zv9_NA' followed by a number.
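The naming convention above makes it easy to separate the two kinds of sequence when working with a list of Zv9 sequence names. The following Python sketch shows one way to do it. The function name classify_zv9_name and the example names are hypothetical, and the sketch assumes the number is appended directly to the prefix with no separator.

```python
import re

# Assumption: the number follows the prefix directly, e.g. "Zv9_scaffold12".
_PLACED = re.compile(r"^Zv9_scaffold(\d+)$")
_UNPLACED = re.compile(r"^Zv9_NA(\d+)$")


def classify_zv9_name(name):
    """Classify a Zv9 sequence name by its prefix.

    Returns a (category, number) tuple, where category is
    "scaffold" for clone-based or marker-linked sequences,
    "unplaced_wgs" for WGS contigs not placed on a chromosome,
    or "other" (with number None) for anything else.
    """
    match = _PLACED.match(name)
    if match:
        return ("scaffold", int(match.group(1)))
    match = _UNPLACED.match(name)
    if match:
        return ("unplaced_wgs", int(match.group(1)))
    return ("other", None)


# Hypothetical names used only to exercise the function.
examples = ["Zv9_scaffold12", "Zv9_NA7", "chr1"]
classified = [classify_zv9_name(n) for n in examples]
```

Names that match neither pattern, such as chromosome names, fall into the "other" category rather than raising an error, so the function can be run over a mixed list.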

This preliminary assembly was produced by The Wellcome Trust Sanger Institute, UK.

http://www.sanger.ac.uk/Projects/D_rerio/