Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, rice production must increase by at least 25% to keep pace with population growth. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land and to ensure global food supply. On May 28, 2014, data from an international effort resequencing a core collection of 3,000 rice accessions from 89 countries was released as a global public good. This data provides a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also serves to understand, at a higher level of detail, the genomic diversity within O. sativa, and provides a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement.
The initial publications on the dataset, the data note "The 3,000 rice genomes project" and the commentary "The 3,000 rice genomes project: new opportunities and challenges for future rice research", are published in GigaScience. The complete list of rice accessions sequenced for the 3K RG project is now available on this site and in GigaDB.
RAW SEQUENCING DATA AVAILABILITY
The 3,000 sequenced rice genomes had an average sequencing depth of 14X, with average genome coverage and mapping rates of 94.0% and 92.5%, respectively. Raw sequencing data are available from GigaDB, EBI, NCBI (accession PRJEB6180), and DDBJ (accession ERP005654).
ANALYZED DATA AVAILABILITY
To further enable use of the 3K RG dataset by the global rice community, we have also released the primary analysis results for variant discovery on the sequencing data, with 24 additional genomes included, resulting in over 120 terabytes of downloadable data. We are now preparing two main manuscripts about this dataset, to be submitted on April 30, 2016. In the spirit of open scientific collaboration, the dataset is released under the stipulations for data analysts and data users in the Toronto Statement, through the following resources:
1. SNP-Seek download area
2. Amazon Web Services (AWS) Public Data Set. Through a partnership with AWS, the 3000 Rice Genome data is freely available on Amazon S3. This enables anyone to use AWS on-demand computing resources to perform analysis and create new products. You can learn more about accessing and utilizing the data on AWS from the 3000 Rice Genome on AWS page. You can view IRIC resources that use the 3K RG dataset on AWS here.
3. Philippine DOST-ASTI iRODS facilities: IRRI is collaborating with the Philippines’ Department of Science and Technology Advanced Science and Technology Institute (DOST-ASTI) to utilize their data storage service infrastructure. DOST-ASTI uses iRODS, an open-source data management platform designed to handle, manage, and store massive amounts of data across heterogeneous storage systems. Users who wish to get the 3K RG data can use either of two file transfer methods: iRODS icommands, which are aimed at power users and developers and offer greater transfer speeds, or WebDAV, an extension to the HTTP protocol that web browsers use.
Before downloading, please read the permissive license of the dataset at ASTI here.
To use the files from the ASTI resource directly in your applications over HTTPS, use the URL of the file you wish to access. As an example, to access the BAM file for accession IRIS_313-10371 as aligned to the Nipponbare reference genome, type this URL in your web browser to download the file:
Alternatively, you can use the wget command to download the file, for example:
wget --no-check-certificate https://anonymous:email@example.com/pub/3kRG/nipponbare/bam/01_Nipponbare_IRIS_313-10371.realigned.bam
The manifest lists of all BAM and VCF files and their locations, for all reference genomes, are in the following links:
When using a web browser, if a dialog box asking for credentials appears, type the user name anonymous and the password anonymous. If a security alert regarding connection safety also appears, you can ignore the message; the connection is safe. Click the links to proceed with the connection. Please see the iRODS instruction document for more details.
4. Web resources hosted by the Chinese Academy of Agricultural Sciences (CAAS)
5. Internally at IRRI, the sequence, alignment, and SNP call files, in FASTQ, BAM, and VCF formats, respectively, are available for copying; send an email request to the IRIC coordinator. Be advised that these files are very large.
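The HTTPS access to the ASTI resource described in item 3 can also be scripted. The following is a minimal Python sketch, not an official client. It assumes that the path layout of the single Nipponbare BAM example generalizes to other accessions, that the server accepts HTTP basic authentication with the anonymous/anonymous credentials mentioned for browsers, and that the host name is supplied by the user (`example.com` here is only a placeholder). The function names `nipponbare_bam_url` and `download` are hypothetical helpers introduced for illustration.

```python
import base64
import shutil
import urllib.request

# Path layout copied from the one Nipponbare example in the text.
# ASSUMPTION: other accessions follow the same naming pattern.
NIPPONBARE_BAM_TEMPLATE = (
    "/pub/3kRG/nipponbare/bam/01_Nipponbare_{accession}.realigned.bam"
)


def nipponbare_bam_url(host: str, accession: str) -> str:
    """Build the HTTPS URL of the Nipponbare-aligned BAM for one accession."""
    return "https://" + host + NIPPONBARE_BAM_TEMPLATE.format(accession=accession)


def download(url: str, dest: str,
             user: str = "anonymous", password: str = "anonymous") -> str:
    """Stream `url` to the local file `dest` using HTTP basic authentication.

    ASSUMPTION: the server accepts basic auth with these credentials.
    Certificate verification is left at Python's default (enabled), unlike
    the wget example, which passes --no-check-certificate.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    # Copy in 1 MiB chunks so a multi-gigabyte BAM never sits in memory.
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)
    return dest


if __name__ == "__main__":
    # "example.com" is a placeholder; substitute the real ASTI host name.
    url = nipponbare_bam_url("example.com", "IRIS_313-10371")
    print(url)
    # To actually fetch the file (it is large), uncomment:
    # download(url, "IRIS_313-10371.realigned.bam")
```

Streaming to disk in fixed-size chunks is a deliberate choice here, since individual BAM files can be many gigabytes.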
The FastQC report for the 3K sequencing data is available here.
The following tables give more information about the re-sequenced accessions from the International Rice Genebank Collection at IRRI (Table 1) and from the China National Crop Genebank and the CAAS working collections (Table 2).
Table 1. Information for the 2,466 rice accessions from the International Rice Genebank Collection at the International Rice Research Institute.
Table 2. Information for the 534 rice accessions from the China National Crop Genebank and the CAAS working collections.
*MC = the mini-core collection accessions established by China Agricultural University; IRMBN = the parental lines used in the International Rice Molecular Breeding Network, selected previously to represent the mini-core collections based on isozyme data.
We suggest using this list of 72 varieties in further studies. The selection was based on diversity, availability of seeds in the IRRI Genebank, and sequencing coverage.
♦ 3K R.G.P. The 3,000 rice genomes project. GigaScience 2014, 3:7.
♦ Li, J., Wang, J. and Zeigler, R. S. The 3,000 rice genomes project: new opportunities and challenges for future rice research. GigaScience 2014, 3:8.