Version 6 (Anonymous, 04/11/2008 09:34 AM) → Version 7/21 (Anonymous, 04/11/2008 09:49 AM)
= Datasets =
== How to label datasets ==
Datasets are labeled in the following way:
{{{
DatasetName.tgz
DatasetName_Literature.tgz
DatasetName_NumberOfChains_Extraction.tgz
}}}
where:
|| '''!DatasetName''' || Authors or web site that proposed the dataset
|| '''Literature''' || Special name given in the literature
|| '''!NumberOfChains''' || Number of (extracted) chains contained in the dataset
|| '''Extraction''' || Extraction of the models/chains: ''none'', ''first-first'', ''first-all'', ''all-all''
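The naming scheme above can be checked mechanically. The following is a minimal sketch in Python, assuming that dataset names and literature names contain only letters and digits and that a file without an extraction suffix counts as extraction ''none''; the names `DatasetLabel` and `parse_label` are hypothetical and not part of the repository.

```python
import re
from typing import NamedTuple, Optional


class DatasetLabel(NamedTuple):
    """Parts of a dataset file name (hypothetical helper type)."""
    name: str
    literature: Optional[str]
    number_of_chains: Optional[int]
    extraction: str


# Assumption: names consist of letters and digits only, so "_" is
# always a separator between the parts of the label.
_LABEL = re.compile(
    r"^(?P<name>[A-Za-z0-9]+)"
    r"(?:_(?P<chains>\d+)_(?P<extraction>first-first|first-all|all-all)"
    r"|_(?P<literature>[A-Za-z0-9]+))?"
    r"\.tgz$"
)


def parse_label(filename: str) -> DatasetLabel:
    """Split a file name into the parts described in the table above."""
    match = _LABEL.match(filename)
    if match is None:
        raise ValueError(f"not a valid dataset label: {filename!r}")
    chains = match.group("chains")
    return DatasetLabel(
        name=match.group("name"),
        literature=match.group("literature"),
        number_of_chains=int(chains) if chains is not None else None,
        # A file without an extraction suffix is treated as "none".
        extraction=match.group("extraction") or "none",
    )
```

For example, `parse_label("ChewKedem_132_all-all.tgz")` yields the name `ChewKedem`, 132 chains and the extraction `all-all`, while `parse_label("RostSander_RS126.tgz")` yields the literature name `RS126`.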
== Available Datasets ==
The following datasets are available from the repository (special privileges required!):
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| !LelukKoniecznyRoterman || - || - || 3.5 || [source:Datasets/LelukKoniecznyRoterman.tgz Download]
|| || first-first || 6 || 0.4 || [source:Datasets/LelukKoniecznyRoterman_6_first-first.tgz Download]
|| || first-all || 15 || 0.9 || [source:Datasets/LelukKoniecznyRoterman_15_first-all.tgz Download]
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| !ChewKedem || - || - || 3.8 || [source:Datasets/ChewKedem.tgz Download]
|| || first-first || 34 || 1.3 || [source:Datasets/ChewKedem_34_first-first.tgz Download]
|| || first-all || 54 || 2.0 || [source:Datasets/ChewKedem_54_first-all.tgz Download]
|| || all-all || 132 || 4.1 || [source:Datasets/ChewKedem_132_all-all.tgz Download]
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| !ProteinKinaseResource || - || - || 3.6 || [source:Datasets/ProteinKinaseResource.tgz Download]
|| || first-first || 45 || 2.4 || [source:Datasets/ProteinKinaseResource_45_first-first.tgz Download]
|| || first-all || 49 || 2.5 || [source:Datasets/ProteinKinaseResource_49_first-all.tgz Download]
|| || all-all || 106 || 4.0 || [source:Datasets/ProteinKinaseResource_106_all-all.tgz Download]
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| Skolnick || - || - || 5.1 || [source:Datasets/Skolnick.tgz Download]
|| || first-first || 33 || 1.1 || [source:Datasets/Skolnick_33_first-first.tgz Download]
|| || first-all || 65 || 2.1 || [source:Datasets/Skolnick_65_first-all.tgz Download]
|| || all-all || 179 || 5.9 || [source:Datasets/Skolnick_179_all-all.tgz Download]
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| !RostSander || - || - || 7.4 || [source:Datasets/RostSander.tgz Download]
|| || RS126 || 126 || 4.3 || [source:Datasets/RostSander_RS126.tgz Download]
|| || first-first || 119 || 4.4 || [source:Datasets/RostSander_119_first-first.tgz Download]
|| || first-all || 212 || 7.6 || [source:Datasets/RostSander_212_first-all.tgz Download]
|| '''!DatasetName''' || '''Extraction''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| !KinjoHorimotoNishikawa || - || - || 98 || [source:Datasets/KinjoHorimotoNishikawa.tgz Download]
|| || first-first || 1012 || 46 || [source:Datasets/KinjoHorimotoNishikawa_1012_first-first.tgz Download]
|| || first-all || 2013 || 88 || [source:Datasets/KinjoHorimotoNishikawa_2013_first-all.tgz Download]
|| '''!DatasetName''' || '''Description''' || '''Extraction type / time''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| Shah1 || Randomly selected 1000 proteins from PDB || - || - || 114 || [source:Datasets/Shah.tgz Download]
|| || || first-first || 1000 || 41 || [source:Datasets/Shah_1000_first-first.tgz Download]
|| || || first-all || 1943 || 80 || [source:Datasets/Shah_1943_first-all.tgz Download]
|| || || all-all || 4007 || 124 || [source:Datasets/Shah_4007_all-all.tgz Download]
|| '''!DatasetName''' || '''Description''' || '''Extraction type / time''' || '''!NumberOfChains''' || '''Size in GB''' || '''Link'''
|| Shah2 || Downloaded from the PDB web site on 10/04/2007 || - || - || compressed: 1.1, uncompressed: 4.8 ||
|| || with the criterion "Remove similar sequences at 30% identity" || first-first || 7307 || ||
|| || || first-all || || ||
|| || || all-all || || ||
|| '''!DatasetName''' || '''Description''' || '''Extraction type / time''' || '''!NumberOfChains''' || '''Size in MB''' || '''Link'''
|| Shah3 || PDB_SELECT25 as of October 2007 || - || - || compressed: 0.746, uncompressed: 3.4 ||
|| || A list of non-redundant protein structures, || first-first || 3560 || ||
|| || updated every six months; || first-all || || ||
|| || mostly used in protein structure prediction || all-all || || ||