DataStorage » History » Version 8
Version 7 (Anonymous, 10/05/2007 11:08 PM) → Version 8/16 (Anonymous, 10/05/2007 11:13 PM)
= Data Storage =
This page describes the design of the database that is/will be used in order to store all necessary pieces information that are obtained from the "stand-alone" ProCKSI ''core'' application (see [wiki:DataStandardisation]).
== Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBase.png)]]
'''Explanation '''Explenation of the database design''':
* There are multiple similarity comparison ''Methods'': e.g. USM, MaxCMO, !DaliLite, DaliLite, ...
* There are multiple similarity ''Measures'': e.g. Z-score, TM-score, Number of Alignments, ...
* Some ''Methods'' produce ''Measures'' with the same name, but not necessarily the same meaning: e.g. !DaliLite/Z, DaliLite/Z, TMalign/Z, ...[[br]]
Thus, a ''!MethodMeasures'' relation is necessary.
* Each ''Method'' can have multiple (different) ''Parameters'': e.g. USM/Compressor, USM/Equation, ...
* Each ''Method'' can have multiple (different) ''!ParameterOptions'': USM/Compressor/bzip, USM/Compressor/gzip, ...
* A "!ParameterSet" is used to calculate the ''Similarity'' of ''!StructurePairs''. It is a collection of specific ''!ParameterSetOptions''. If a ''Method'' does not use any parameters, it is not included in the ''!ParameterSet'', but accessible via the ''!MethodMeasure'' relation.
* The ''!StructurePairs'' relation holds all possible combinations of ''Structures'', and a link to a further ''Results'' file in XML format. This file may contain results for multiple ''!StructurePairs'', e.g. alignments, matrices, etc.
* Each ''Structure'' is uniquely determined by its PDB code, model and chain. (Domains are not taken into accout yet.) The location of the PDB file is given and a link to a further ''Results'' file in XML format. This file may contain additional information for multiple ''Structures'', e.g. sequence, secondary structure, experimental resolution, ...
* Each ''Structure'' is extended by further classifiction information from ''CATH'' and ''SCOP''.
== Extended Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBaseExt4.png)]]
This proposal for an extended database design for the (static) Protein Multiverse aims to include not only ''Comparisons'' but also ''Transformations'' and ''Compositions'' (following the latest development of the I/O specificaitions for the ProCKSI "stand-alone" ''core'' application):
* A ''Transformation'' is a process that derives ONE (main) ''Result'' from ONE single input file.[[br]]
__Example__: The transformation of ''Structure'', ''Tree'', ''!SimilarityMatrix'', etc., using a certain ''Method'' with a certain ''!ParameterSet'', produces a contact map, a tree, ...
* A ''Comparison'' is a process that derives ONE (main) ''Result'' from TWO input files. [[br]]
__Example__ The comparison of ''Structures'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a similarity value and an alignment
* A ''Composition'' is a process that derive ONE (main) ''Result'' from SEVERAL input files that are grouped together into a ''DataSets''. [[br]]
__Example__ The composition of ''!SimilarityMatrices'', ''Trees'', using a ''Method'' with a certain ''!ParameterSet'', produces a consensus similarity matrix, a consensus tree, ...
This design does not allow ''Datasets'' to comprise other files than ''Structures'' although some of the ''Results'' need to be grouped into form a ''Dataset'', too.[[br]] ''Dataset''.[[br]]
__Example__ Contact maps that have been produces by a ''Transformation'' of ''Structures'' and that are available from within the ''Results'' need to form a ''Dataset'' in order to act as input for the ''Comparisons'' with the USM or MaxCMO ''Methods''.
== Extended Database Design for (dynamic) Management of Experiments (ProCKSI) ==
This has not been modelled yet, but the database for the (static) Protein Multiverse was designed with the ProCKSI integration in mind.
Some remarks:
* ''Experiments'' (formerly ''Requests'') apply "Methods" to "!DataSets" with a certain "!ParameterSet''.
* ''Packages'' (formerly ''Jobs'') deal with a subset of a "!DataSet" and a subset of the requested ''Methods'', partitioning the the 3D problem space, and are calculated using the ProCKSI's "stand-alone" core application "in one go". If they are sent to a queuing system, they become a ''Job'' there.
* It has to be discussed if there is still the need of a ''Tasks'' relation in the database, which have always been rather ''!RequestMethods''.
This page describes the design of the database that is/will be used in order to store all necessary pieces information that are obtained from the "stand-alone" ProCKSI ''core'' application (see [wiki:DataStandardisation]).
== Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBase.png)]]
'''Explanation '''Explenation of the database design''':
* There are multiple similarity comparison ''Methods'': e.g. USM, MaxCMO, !DaliLite, DaliLite, ...
* There are multiple similarity ''Measures'': e.g. Z-score, TM-score, Number of Alignments, ...
* Some ''Methods'' produce ''Measures'' with the same name, but not necessarily the same meaning: e.g. !DaliLite/Z, DaliLite/Z, TMalign/Z, ...[[br]]
Thus, a ''!MethodMeasures'' relation is necessary.
* Each ''Method'' can have multiple (different) ''Parameters'': e.g. USM/Compressor, USM/Equation, ...
* Each ''Method'' can have multiple (different) ''!ParameterOptions'': USM/Compressor/bzip, USM/Compressor/gzip, ...
* A "!ParameterSet" is used to calculate the ''Similarity'' of ''!StructurePairs''. It is a collection of specific ''!ParameterSetOptions''. If a ''Method'' does not use any parameters, it is not included in the ''!ParameterSet'', but accessible via the ''!MethodMeasure'' relation.
* The ''!StructurePairs'' relation holds all possible combinations of ''Structures'', and a link to a further ''Results'' file in XML format. This file may contain results for multiple ''!StructurePairs'', e.g. alignments, matrices, etc.
* Each ''Structure'' is uniquely determined by its PDB code, model and chain. (Domains are not taken into accout yet.) The location of the PDB file is given and a link to a further ''Results'' file in XML format. This file may contain additional information for multiple ''Structures'', e.g. sequence, secondary structure, experimental resolution, ...
* Each ''Structure'' is extended by further classifiction information from ''CATH'' and ''SCOP''.
== Extended Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBaseExt4.png)]]
This proposal for an extended database design for the (static) Protein Multiverse aims to include not only ''Comparisons'' but also ''Transformations'' and ''Compositions'' (following the latest development of the I/O specificaitions for the ProCKSI "stand-alone" ''core'' application):
* A ''Transformation'' is a process that derives ONE (main) ''Result'' from ONE single input file.[[br]]
__Example__: The transformation of ''Structure'', ''Tree'', ''!SimilarityMatrix'', etc., using a certain ''Method'' with a certain ''!ParameterSet'', produces a contact map, a tree, ...
* A ''Comparison'' is a process that derives ONE (main) ''Result'' from TWO input files. [[br]]
__Example__ The comparison of ''Structures'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a similarity value and an alignment
* A ''Composition'' is a process that derive ONE (main) ''Result'' from SEVERAL input files that are grouped together into a ''DataSets''. [[br]]
__Example__ The composition of ''!SimilarityMatrices'', ''Trees'', using a ''Method'' with a certain ''!ParameterSet'', produces a consensus similarity matrix, a consensus tree, ...
This design does not allow ''Datasets'' to comprise other files than ''Structures'' although some of the ''Results'' need to be grouped into form a ''Dataset'', too.[[br]] ''Dataset''.[[br]]
__Example__ Contact maps that have been produces by a ''Transformation'' of ''Structures'' and that are available from within the ''Results'' need to form a ''Dataset'' in order to act as input for the ''Comparisons'' with the USM or MaxCMO ''Methods''.
== Extended Database Design for (dynamic) Management of Experiments (ProCKSI) ==
This has not been modelled yet, but the database for the (static) Protein Multiverse was designed with the ProCKSI integration in mind.
Some remarks:
* ''Experiments'' (formerly ''Requests'') apply "Methods" to "!DataSets" with a certain "!ParameterSet''.
* ''Packages'' (formerly ''Jobs'') deal with a subset of a "!DataSet" and a subset of the requested ''Methods'', partitioning the the 3D problem space, and are calculated using the ProCKSI's "stand-alone" core application "in one go". If they are sent to a queuing system, they become a ''Job'' there.
* It has to be discussed if there is still the need of a ''Tasks'' relation in the database, which have always been rather ''!RequestMethods''.