Version 10 (Anonymous, 10/06/2007 09:54 AM) → Version 11/16 (Anonymous, 10/06/2007 10:02 AM)
= Data Storage =
This page describes the design of the database that is (or will be) used to store all necessary pieces of information obtained from the "stand-alone" ProCKSI ''core'' application (see [wiki:DataStandardisation]).
== Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBase.png)]]
'''Explanation of the database design''':
* There are multiple similarity comparison ''Methods'': e.g. USM, MaxCMO, !DaliLite, ...
* There are multiple similarity ''Measures'': e.g. Z-score, TM-score, Number of Alignments, ...
* Some different ''Methods'' produce ''Measures'' with the same name, but not necessarily the same meaning: e.g. !DaliLite/Z, TMalign/Z, ...[[br]]
Thus, a ''!MethodMeasures'' relation is necessary.
* Each ''Method'' can have multiple (different) ''Parameters'': e.g. USM/Compressor, USM/Equation, ...
* Each ''Method'' can have multiple (different) ''!ParameterOptions'': e.g. USM/Compressor/bzip, USM/Compressor/gzip, ...
* A ''!ParameterSet'' is used to calculate the ''Similarity'' of ''!StructurePairs''. It is a collection of specific ''!ParameterSetOptions''. [[br]]
If a ''Method'' does not use any parameters, it is not included in the ''!ParameterSet'', but accessible via the ''!MethodMeasure'' relation. [[br]]
Alternatively, such methods could have a ''Parameter'' "none" with a ''!ParameterOption'' "none", so that the ''!ParameterSet'' would always include all possible ''Methods''. [[br]]
It can be argued that another strong entity ''Options'' is needed that holds only the possible values and their descriptions, e.g. "CoM" and "Centre of Mass". Similar to ''!MethodMeasures'', the ''!ParameterOption'' relation would then only hold combinations of ''Parameters'' with ''Options''.
* The ''!StructurePairs'' relation holds all possible combinations of ''Structures'', and a link to a further ''Results'' file in XML format. This file may contain results for multiple ''!StructurePairs'', e.g. alignments, matrices, etc.
* Each ''Structure'' is uniquely determined by its PDB code, model and chain. (Domains are not taken into account yet.) The location of the PDB file is given, together with a link to a further ''Results'' file in XML format. This file may contain additional information for multiple ''Structures'', e.g. sequence, secondary structure, experimental resolution, ...
* Each ''Structure'' is extended by further classification information from ''CATH'' and ''SCOP''.
* A further point for discussion is that there is no list describing what additional information can be found in the external files.
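
The relations described above can be sketched as a small SQLite schema. This is only an illustration and not the actual ProCKSI schema: all table and column names (e.g. `method_id`, `pdb_file`, `results_file`) are assumptions, and the diagram above remains authoritative. The sketch shows why the ''!MethodMeasures'' relation keeps !DaliLite/Z and TMalign/Z apart, and how a ''Structure'' could be keyed by PDB code, model and chain.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
SCHEMA = """
CREATE TABLE Methods  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE Measures (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- One row per (Method, Measure) combination, so equally named measures
-- of different methods remain distinct.
CREATE TABLE MethodMeasures (
    id         INTEGER PRIMARY KEY,
    method_id  INTEGER NOT NULL REFERENCES Methods(id),
    measure_id INTEGER NOT NULL REFERENCES Measures(id),
    UNIQUE (method_id, measure_id)
);
CREATE TABLE Parameters (
    id        INTEGER PRIMARY KEY,
    method_id INTEGER NOT NULL REFERENCES Methods(id),
    name      TEXT NOT NULL,
    UNIQUE (method_id, name)
);
CREATE TABLE ParameterOptions (
    id           INTEGER PRIMARY KEY,
    parameter_id INTEGER NOT NULL REFERENCES Parameters(id),
    value        TEXT NOT NULL,
    UNIQUE (parameter_id, value)
);
-- A Structure is identified by PDB code, model and chain.
CREATE TABLE Structures (
    id           INTEGER PRIMARY KEY,
    pdb_code     TEXT NOT NULL,
    model        INTEGER NOT NULL,
    chain        TEXT NOT NULL,
    pdb_file     TEXT,
    results_file TEXT,
    UNIQUE (pdb_code, model, chain)
);
"""


def build_demo_db():
    """Create an in-memory database with a few example rows from this page."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Methods (name) VALUES (?)",
                     [("DaliLite",), ("TMalign",), ("USM",)])
    conn.executemany("INSERT INTO Measures (name) VALUES (?)",
                     [("Z-score",), ("TM-score",)])
    for method, measure in [("DaliLite", "Z-score"),
                            ("TMalign", "Z-score"),
                            ("TMalign", "TM-score")]:
        conn.execute(
            "INSERT INTO MethodMeasures (method_id, measure_id) "
            "SELECT m.id, s.id FROM Methods m, Measures s "
            "WHERE m.name = ? AND s.name = ?", (method, measure))
    conn.execute(
        "INSERT INTO Parameters (method_id, name) "
        "SELECT id, 'Compressor' FROM Methods WHERE name = 'USM'")
    conn.executemany(
        "INSERT INTO ParameterOptions (parameter_id, value) "
        "SELECT id, ? FROM Parameters WHERE name = 'Compressor'",
        [("bzip",), ("gzip",)])
    return conn


def methods_for_measure(conn, measure):
    """Return the names of all Methods that produce the given Measure."""
    rows = conn.execute(
        "SELECT m.name FROM MethodMeasures mm "
        "JOIN Methods m ON m.id = mm.method_id "
        "JOIN Measures s ON s.id = mm.measure_id "
        "WHERE s.name = ? ORDER BY m.name", (measure,))
    return [row[0] for row in rows]


def options_for(conn, method, parameter):
    """Return the option values available for one Parameter of one Method."""
    rows = conn.execute(
        "SELECT o.value FROM ParameterOptions o "
        "JOIN Parameters p ON p.id = o.parameter_id "
        "JOIN Methods m ON m.id = p.method_id "
        "WHERE m.name = ? AND p.name = ? ORDER BY o.value",
        (method, parameter))
    return [row[0] for row in rows]
```

In this sketch, `methods_for_measure(conn, "Z-score")` would list both !DaliLite and TMalign, which is exactly the ambiguity that the ''!MethodMeasures'' relation resolves. A ''!ParameterSet'' is left out on purpose, since its exact structure is still under discussion above.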
== Extended Database Design for the (static) Protein Multiverse ==
[[Image(ProteinMultiverseDataBaseExt4.png)]]
This proposal for an extended database design for the (static) Protein Multiverse aims to include not only ''Comparisons'' but also ''Transformations'' and ''Compositions'' (following the latest development of the I/O specifications for the ProCKSI "stand-alone" ''core'' application):
* A ''Transformation'' is a process that derives ONE (main) ''Result'' from ONE single input file.[[br]]
__Example__: The transformation of ''Structure'', ''Tree'', ''!SimilarityMatrix'', etc., using a certain ''Method'' with a certain ''!ParameterSet'', produces a contact map, a tree, ...
* A ''Comparison'' is a process that derives ONE (main) ''Result'' from TWO input files. [[br]]
__Example__: The comparison of ''Structures'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a similarity value and an alignment.
* A ''Composition'' is a process that derives ONE (main) ''Result'' from SEVERAL input files that are grouped together into ''!DataSets''. [[br]]
__Example__: The composition of ''!SimilarityMatrices'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a consensus similarity matrix, a consensus tree, ...
This design does not allow ''Datasets'' to comprise files other than ''Structures'', although some of the ''Results'' need to be grouped into a ''Dataset'', too.[[br]]
__Example__: Contact maps that have been produced by a ''Transformation'' of ''Structures'' and that are available from within the ''Results'' need to form a ''Dataset'' in order to act as input for the ''Comparisons'' with the USM or MaxCMO ''Methods''.
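
The three process types differ mainly in how many input files they take, which can be sketched as follows. This is a minimal illustration, not part of the ProCKSI code base: the class names `ProcessKind` and `Process`, the field names, and the file names in the usage note are hypothetical, and reading "SEVERAL" as "at least two" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ProcessKind(Enum):
    TRANSFORMATION = "transformation"  # ONE input file
    COMPARISON = "comparison"          # TWO input files
    COMPOSITION = "composition"        # SEVERAL input files (a DataSet)


@dataclass(frozen=True)
class Process:
    """One run of a Method with a ParameterSet on some input files."""
    kind: ProcessKind
    method: str
    parameter_set: str
    inputs: Tuple[str, ...]

    def __post_init__(self):
        n = len(self.inputs)
        if self.kind is ProcessKind.TRANSFORMATION and n != 1:
            raise ValueError("a Transformation takes exactly one input")
        if self.kind is ProcessKind.COMPARISON and n != 2:
            raise ValueError("a Comparison takes exactly two inputs")
        # Assumption: "several" is interpreted as two or more inputs.
        if self.kind is ProcessKind.COMPOSITION and n < 2:
            raise ValueError("a Composition takes several inputs")
```

For instance, `Process(ProcessKind.COMPARISON, "USM", "default", ("a.cm", "b.cm"))` is accepted, whereas passing two inputs to a `TRANSFORMATION` raises a `ValueError`.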
== Extended Database Design for (dynamic) Management of Experiments (ProCKSI) ==
This has not been modelled yet, but the database for the (static) Protein Multiverse was designed with the ProCKSI integration in mind.
Some remarks:
* ''Experiments'' (formerly ''Requests'') apply ''Methods'' to ''!DataSets'' with a certain ''!ParameterSet''.
* ''Packages'' (formerly ''Jobs'') deal with a subset of a ''!DataSet'' and a subset of the requested ''Methods'', partitioning the 3D problem space, and are calculated "in one go" using ProCKSI's "stand-alone" core application. If they are sent to a queuing system, they become a ''Job'' there.
* It has to be discussed whether a ''Tasks'' relation is still needed in the database, as ''Tasks'' have always been more like ''!RequestMethods''.
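
How ''Packages'' might partition the problem space can be sketched as below. Since nothing has been modelled yet, this is purely an assumption: it takes the three axes to be structures × structures × methods (all-against-all, including both orders and self-comparisons), and the function names and block-size parameters are hypothetical.

```python
from itertools import product


def chunks(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_packages(structures, methods, structures_per_block, methods_per_block):
    """Partition the structures x structures x methods space into packages.

    Each package covers a block of row structures, a block of column
    structures and a block of methods, so that it can be computed
    "in one go".
    """
    structure_blocks = chunks(structures, structures_per_block)
    method_blocks = chunks(methods, methods_per_block)
    return [
        {"rows": rows, "cols": cols, "methods": block}
        for rows, cols, block in product(structure_blocks,
                                         structure_blocks,
                                         method_blocks)
    ]
```

Every (structure, structure, method) cell falls into exactly one package, so the packages can be processed independently, e.g. as separate jobs in a queuing system.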