Paweł Widera, 12/09/2013 02:45 PM
Table formatting corrected. Links to datasets fixed.


Datasets

How to label datasets

Datasets are labeled in the following way:

[[DatasetName]].tgz
[[DatasetName]]_Literature.tgz
[[DatasetName]]_NumberOfChains_Extraction.tgz

where:
DatasetName      Authors or web site that proposed the dataset
Literature       Special name given in the literature
NumberOfChains   Number of (extracted) chains contained in the dataset
Extraction       Extraction of the models/chains: none, first-first, first-all, all-all
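The labeling scheme above can be sketched in Python as a pair of helpers that build and parse archive names. This is only an illustration, not a tool shipped with the repository: the names `DatasetLabel`, `build_filename` and `parse_filename` are hypothetical, and the sketch assumes that an extraction of "none" corresponds to the plain `DatasetName.tgz` form.

```python
import re
from typing import NamedTuple, Optional

# Extraction modes that appear in a file name (assumption: "none" means
# the plain DatasetName.tgz archive, so it never shows up as a suffix).
EXTRACTIONS = ("first-first", "first-all", "all-all")


class DatasetLabel(NamedTuple):
    """Hypothetical container for the parts of a dataset label."""
    name: str
    literature: Optional[str] = None
    chains: Optional[int] = None
    extraction: Optional[str] = None


def build_filename(label: DatasetLabel) -> str:
    """Build an archive name following the three labeling patterns."""
    if label.chains is not None and label.extraction is not None:
        if label.extraction not in EXTRACTIONS:
            raise ValueError(f"unknown extraction: {label.extraction}")
        return f"{label.name}_{label.chains}_{label.extraction}.tgz"
    if label.literature is not None:
        return f"{label.name}_{label.literature}.tgz"
    return f"{label.name}.tgz"


_EXTRACTED = re.compile(r"(.+)_(\d+)_(first-first|first-all|all-all)")


def parse_filename(filename: str) -> DatasetLabel:
    """Split an archive name back into its parts.

    Only the DatasetName_NumberOfChains_Extraction form is recognized.
    Dataset names may themselves contain underscores, so the
    DatasetName_Literature form cannot be told apart from a plain name
    without a list of known datasets; such names are returned whole.
    """
    if not filename.endswith(".tgz"):
        raise ValueError(f"not a .tgz archive: {filename}")
    stem = filename[: -len(".tgz")]
    match = _EXTRACTED.fullmatch(stem)
    if match:
        return DatasetLabel(
            name=match.group(1),
            chains=int(match.group(2)),
            extraction=match.group(3),
        )
    return DatasetLabel(name=stem)
```

For example, `build_filename(DatasetLabel("ChewKedem", chains=34, extraction="first-first"))` yields `ChewKedem_34_first-first.tgz`. Whether the archives in the repository use exactly these names is not confirmed here.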

Available Datasets

The following datasets are available from the repository (special privileges required!):
DatasetName             Extraction   NumberOfChains  Size in MB  Link
LelukKoniecznyRoterman  -            -               3.5         Download
                        first-first  6               0.4         Download
                        first-all    15              0.9         Download
ChewKedem               -            -               3.8         Download
                        first-first  34              1.3         Download
                        first-all    54              2.0         Download
                        all-all      132             4.1         Download
ProteinKinaseResource   -            -               3.6         Download
                        first-first  45              2.4         Download
                        first-all    49              2.5         Download
                        all-all      106             4.0         Download
Skolnick                -            -               5.1         Download
                        first-first  33              1.1         Download
                        first-all    65              2.1         Download
                        all-all      179             5.9         Download
RostSander              -            -               7.4         Download
                        RS126        126             4.3         Download
                        first-first  119             4.4         Download
                        first-all    212             7.6         Download
KinjoHorimotoNishikawa  -            -               98          Download
                        first-first  1012            46          Download
                        first-all    2013            88          Download
DatasetName  Description                               Extraction   NumberOfChains  Size in MB  Link
Shah         Randomly selected 1000 proteins from PDB  -            -               114         Download
                                                       first-first  1000            41          Download
                                                       first-all    1943            80          Download
                                                       all-all      4007            124         Download
DatasetName           Description                                     Extraction   NumberOfChains  Size in GB          Link
PDB_SELECT30_04-2008  Downloaded from the PDB web site on 10/04/2008  -            -               1.1, ucmp*: 4.8     Download
                      with the criterion "Remove similar sequences    first-first  7183            0.285, ucmp*: 1.2   Download
                      at 30% identity"                                first-all    14651           0.60, ucmp*: 2.7    Download
                                                                      all-all      43025           1.2, ucmp*: 5.5     Download
PDB_SELECT25_10-2007  PDB_SELECT25 as of October 2007;                -            -               0.746, ucmp*: 3.4   Download
                      a list of non-redundant protein structures,     first-first  3464            0.12, ucmp*: 0.54   Download
                      updated every six months and mostly used in     first-all    8581            0.30, ucmp*: 1.4    Download
                      protein structure prediction                    all-all      31288           0.854, ucmp*: 4.1   Download

*ucmp: uncompressed