3000 Rice Genomes Project

Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, rice production must increase by at least 25% to keep pace with population growth. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land and to ensure global food supply. On May 28, 2014, data from an international effort to resequence a core collection of 3,024 rice accessions from 89 countries were released as a global public good. These data provide a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. They also help characterize, in greater detail, the genomic diversity within O. sativa, and provide a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement.

The initial publications on the dataset, a data note (The 3,000 rice genomes project) and a commentary (The 3,000 rice genomes project: new opportunities and challenges for future rice research), were published in the journal GigaScience.

The main publication of the 3K RG project, published in the journal Nature, is entitled:

Genomic variation in 3,010 diverse accessions of Asian cultivated rice

The complete list of rice accessions sequenced for the 3K RG project is available on this site and in GigaDB.

RAW SEQUENCING DATA AVAILABILITY

The 3,024 sequenced rice genomes had an average sequencing depth of 14X, with average genome coverage and mapping rates of 94.0% and 92.5%, respectively. Raw sequencing data are available from GigaDB, EBI, NCBI (accession PRJEB6180), and DDBJ (accession ERP005654).
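As a rough illustration of what an average depth of 14X implies, the sketch below applies the usual relation depth = sequenced bases / genome size. The genome size of about 373 Mb is an assumption (it is not stated on this page), so the result is an order-of-magnitude estimate, not a figure reported by the project.

```python
# Back-of-the-envelope estimate of sequence volume from average depth.
# ASSUMPTION: a reference genome size of ~373 Mb; this page does not state it.
ASSUMED_GENOME_SIZE_BP = 373_000_000
N_ACCESSIONS = 3_024   # from the text above
MEAN_DEPTH = 14        # from the text above (14X)


def bases_for_depth(depth, genome_size_bp):
    """Depth = sequenced bases / genome size, so bases = depth * genome size."""
    return depth * genome_size_bp


per_accession_bp = bases_for_depth(MEAN_DEPTH, ASSUMED_GENOME_SIZE_BP)
total_bp = per_accession_bp * N_ACCESSIONS

print(f"Per accession: ~{per_accession_bp / 1e9:.1f} Gb")
print(f"Whole panel:   ~{total_bp / 1e12:.1f} Tb")
```

Under that assumed genome size, the panel corresponds to roughly 5 Gb of sequence per accession and on the order of 16 Tb of bases in total.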

ANALYZED DATA AVAILABILITY

To further enable use of the 3K RG dataset by the global rice community, we also released the primary analysis results for variant discovery on the sequencing data, with 24 additional genomes included, resulting in over 120 terabytes of downloadable data. The dataset is released under the stipulations for data analysts and data users in the Toronto Statement, in the following resources:

1. SNP-Seek download area

2. Amazon Web Services (AWS) Public Data Set. Through a partnership with AWS, the 3000 Rice Genome data is freely available on Amazon S3. This enables anyone to use AWS on-demand computing resources to perform analysis and create new products. You can learn more about accessing and utilizing the data on AWS from the 3000 Rice Genome on AWS page. You can view IRIC resources that use the 3K RG dataset on AWS here.

3. Philippine DOST-ASTI COARE facility: IRRI is collaborating with the Philippines’ Department of Science and Technology - Advanced Science and Technology Institute (DOST-ASTI) to use their data storage service, the COARE Data Catalog. The COARE Data Catalog is a web-based research repository that hosts a number of research datasets, including the 3kRG dataset, and offers a web interface to search and access them. To access the dataset, visit here and register for an account.

4. Web resources hosted by SouthGreen Bioinformatics Platform

- IRD Gigwa
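For the AWS option in the list above, a minimal sketch of browsing a public S3 bucket without credentials might look like the following. The bucket name `3kricegenome` is an assumption and should be confirmed on the 3000 Rice Genome on AWS page; the helper names (`listing_url`, `parse_listing`, `fetch_listing`) are hypothetical and exist only for this illustration. It uses the standard S3 ListObjectsV2 REST call and the Python standard library only.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# ASSUMPTION: bucket name; confirm it on the "3000 Rice Genome on AWS" page.
ASSUMED_BUCKET = "3kricegenome"
# XML namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(bucket: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build an unauthenticated ListObjectsV2 URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data: bytes | str) -> list[tuple[str, int]]:
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def fetch_listing(bucket: str, prefix: str = "", max_keys: int = 100):
    """Fetch and parse one page of the listing (performs a network request)."""
    with urllib.request.urlopen(listing_url(bucket, prefix, max_keys)) as resp:
        return parse_listing(resp.read())
```

Calling `fetch_listing(ASSUMED_BUCKET, max_keys=20)` would return the first twenty object keys with their sizes, provided the bucket allows anonymous listing. With over 120 terabytes hosted, it is worth inspecting sizes this way before downloading anything.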

INFORMATION ABOUT THE 3,000 ACCESSIONS

The following tables give more information about the resequenced accessions from the International Rice Genebank Collection at IRRI (Table 1), and from the China National Crop Genebank and the CAAS working collections (Table 2).

Table 1. Information for the 2,466 rice accessions from the International Rice Genebank Collection at the International Rice Research Institute

Table 2. Information for the 534 rice accessions from the China National Crop Genebank and the CAAS working collections

*MC = the mini-core collection accessions established by China Agricultural University [9]; IRMBN = the parental lines used in the international rice molecular breeding network, selected previously to represent the mini-core collections based on isozyme data.

Selected Varieties

We suggest using this list of 72 varieties in further studies. The selection was based on diversity, availability of seeds in the IRRI Genebank, and sequencing coverage.

Selected

Publications

♦ 3K R.G.P. The 3,000 rice genomes project. GigaScience 2014, 3:7.

♦ Li, J., Wang, J. and Zeigler, R. S. The 3,000 rice genomes project: new opportunities and challenges for future rice research. GigaScience 2014, 3:8.
