Datasets
How to label datasets
Datasets are labeled in the following way:
[[DatasetName]].tgz
[[DatasetName]]_Literature.tgz
[[DatasetName]]_NumberOfChains_Extraction.tgz
where:
| DatasetName | Authors or website that proposed the dataset |
| Literature | Special name given to the dataset in the literature |
| NumberOfChains | Number of (extracted) chains contained in the dataset |
| Extraction | Extraction of the models/chains: none, first-first, first-all, all-all |
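A minimal Python sketch of the labeling scheme, assuming the placeholders are joined literally with underscores and that extracted archives use exactly the extraction names listed above. `dataset_filename` and `parse_extracted` are hypothetical helpers, not part of the repository. Because a DatasetName can itself contain underscores (for example PDB_SELECT30_04-2008), the parser matches the chain count and extraction from the end of the file name; Literature-style names cannot be split unambiguously without knowing the dataset name, so the sketch only parses extracted-chain archives.

```python
import re
from typing import Optional

# Extraction modes from the naming table; "none" is the raw archive
# ([[DatasetName]].tgz), which carries no suffix.
EXTRACTIONS = ("first-first", "first-all", "all-all")


def dataset_filename(name: str, literature: Optional[str] = None,
                     chains: Optional[int] = None,
                     extraction: Optional[str] = None) -> str:
    """Build an archive name following the (assumed) labeling scheme."""
    if literature is not None:
        return f"{name}_{literature}.tgz"
    if extraction is None or extraction == "none":
        return f"{name}.tgz"
    if extraction not in EXTRACTIONS:
        raise ValueError(f"unknown extraction: {extraction}")
    if chains is None:
        raise ValueError("an extracted archive needs a chain count")
    return f"{name}_{chains}_{extraction}.tgz"


# Anchor the chain count and extraction at the END of the name, since the
# DatasetName part may contain underscores itself.
_EXTRACTED = re.compile(
    r"^(?P<name>.+)_(?P<chains>\d+)_"
    r"(?P<extraction>first-first|first-all|all-all)\.tgz$"
)


def parse_extracted(filename: str) -> Optional[dict]:
    """Split an extracted-chain archive name into its fields, or return None."""
    m = _EXTRACTED.match(filename)
    if m is None:
        return None
    return {"name": m["name"], "chains": int(m["chains"]),
            "extraction": m["extraction"]}
```

For example, `dataset_filename("ChewKedem", chains=34, extraction="first-first")` yields `ChewKedem_34_first-first.tgz`, and passing that string to `parse_extracted` recovers the three fields.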
Available Datasets
The following datasets are available from the repository (special access privileges are required):

| DatasetName | Extraction | NumberOfChains | Size in MB | Link |
| LelukKoniecznyRoterman | - | - | 3.5 | Download |
| | first-first | 6 | 0.4 | Download |
| | first-all | 15 | 0.9 | Download |

| DatasetName | Extraction | NumberOfChains | Size in MB | Link |
| ChewKedem | - | - | 3.8 | Download |
| | first-first | 34 | 1.3 | Download |
| | first-all | 54 | 2.0 | Download |
| | all-all | 132 | 4.1 | Download |

| DatasetName | Extraction | NumberOfChains | Size in MB | Link |
| ProteinKinaseResource | - | - | 3.6 | Download |
| | first-first | 45 | 2.4 | Download |
| | first-all | 49 | 2.5 | Download |
| | all-all | 106 | 4.0 | Download |

| DatasetName | Extraction | NumberOfChains | Size in MB | Link |
| RostSander | - | - | 7.4 | Download |
| | RS126 | 126 | 4.3 | Download |
| | first-first | 119 | 4.4 | Download |
| | first-all | 212 | 7.6 | Download |

| DatasetName | Extraction | NumberOfChains | Size in MB | Link |
| KinjoHorimotoNishikawa | - | - | 98 | Download |
| | first-first | 1012 | 46 | Download |
| | first-all | 2013 | 88 | Download |

| DatasetName | Description | Extraction | NumberOfChains | Size in MB | Link |
| Shah | 1000 proteins randomly selected from the PDB | - | - | 114 | Download |
| | | first-first | 1000 | 41 | Download |
| | | first-all | 1943 | 80 | Download |
| | | all-all | 4007 | 124 | Download |

| DatasetName | Description | Extraction | NumberOfChains | Size in GB | ucmp* size in GB | Link |
| PDB_SELECT30_04-2008 | Downloaded from the PDB website on 10/04/2008 with the criterion "Remove similar sequences at 30% identity" | - | - | 1.1 | 4.8 | Download |
| | | first-first | 7183 | 0.285 | 1.2 | Download |
| | | first-all | 14651 | 0.60 | 2.7 | Download |
| | | all-all | 43025 | 1.2 | 5.5 | Download |

| DatasetName | Description | Extraction | NumberOfChains | Size in GB | ucmp* size in GB | Link |
| PDB_SELECT25_10-2007 | PDB_SELECT25 as of October 2007: a list of non-redundant protein structures, updated every six months and mostly used in protein structure prediction | - | - | 0.746 | 3.4 | Download |
| | | first-first | 3464 | 0.12 | 0.54 | Download |
| | | first-all | 8581 | 0.30 | 1.4 | Download |
| | | all-all | 31288 | 0.854 | 4.1 | Download |
*ucmp: uncompressed
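The larger archives differ considerably in compressed and uncompressed (ucmp*) size. Before unpacking one of them, a sketch like the following, which uses only Python's standard `tarfile` module, can report how much disk space the contents will need. The function names are hypothetical, and the GB conversion assumes decimal gigabytes (10^9 bytes), which the page does not specify.

```python
import tarfile


def uncompressed_size_bytes(path: str) -> int:
    """Sum the sizes of the regular files stored in a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        # Directories, links and other non-file members are skipped.
        return sum(member.size for member in archive.getmembers() if member.isfile())


def to_gb(n_bytes: int) -> float:
    # Assumption: decimal gigabytes; use 2**30 instead if GiB are meant.
    return n_bytes / 10**9
```

Reading the member list requires decompressing the whole stream once, so this check takes roughly as long as a full extraction but writes nothing to disk.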