3K RG on AWS
To make the 3K RG dataset easier for the global rice community to use, we have released the primary analysis results for variant discovery on the sequencing data, including 24 additional genomes. All data generated by the 3K RGP is now publicly available online as an Amazon Web Services (AWS) Public Data Set. You can learn more about accessing the data at the 3000 Rice Genome on AWS webpage. The manifest for the alignment files can also be used to download the files directly from AWS Public Data.
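As a rough illustration of manifest-driven download, the sketch below turns manifest entries into HTTPS URLs and fetches them. It rests on several assumptions that are not stated in the text above and should be checked against the 3000 Rice Genome on AWS webpage: the bucket name (`3kricegenome`), a manifest layout of one object key or `s3://` URI per line, and anonymous HTTPS access to the bucket. The helper names (`manifest_keys`, `https_url`, `download_all`) are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# ASSUMPTION: bucket name is illustrative; confirm it on the dataset's AWS page.
BUCKET = "3kricegenome"


def manifest_keys(manifest_text, bucket=BUCKET):
    """Return the object keys listed in a manifest.

    ASSUMPTION: the manifest is plain text with one entry per line, either a
    bare object key or a full s3://bucket/key URI. Blank lines and lines
    starting with '#' are skipped.
    """
    prefix = f"s3://{bucket}/"
    keys = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            line = line[len(prefix):]
        keys.append(line)
    return keys


def https_url(key, bucket=BUCKET):
    """Build a virtual-hosted-style S3 URL for an object key."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def download_all(keys, dest_dir="."):
    """Download each object into dest_dir, keeping only the file name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for key in keys:
        urlretrieve(https_url(key), dest / Path(key).name)
```

A typical use would be `download_all(manifest_keys(Path("manifest.txt").read_text()), "alignments")`, where `manifest.txt` is a locally saved copy of the manifest. Alignment files are large, so it may be worth filtering the key list to a few accessions before downloading.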
The data includes alignments of the 3,024 rice genome sequences to 5 published O. sativa genomes representing 3 major cultivated groups (aus, indica, and japonica), along with the raw SNP calls from these alignments. The software pipeline and the parameters used in the analysis are also available at the resource. In the current analysis, over 30 million variants (SNPs and small indels) were discovered across the 3,024 sequenced accessions, representing almost 10% of the total rice genome. These variants span all known and predicted rice genes, as well as the potential regulatory regions surrounding them. More in-depth analyses of this dataset could lead to inferences about novel alleles underlying important agronomic traits such as higher yield and stress tolerance (to pests and diseases, as well as resilience to climate change).
We have also developed SNP-Seek (Alexandrov et al. 2014), which can connect to the AWS 3K RG data. It lets users easily access particular genome regions of interest (e.g. an important gene for drought tolerance) across a selected set of accessions, visualize the region in a genome browser, and conduct further analyses on this subset of the data, as illustrated in the following examples.