DataStorage » History » Version 13
Anonymous, 10/28/2007 01:04 AM
1 | 1 | Anonymous | = Data Storage = |
---|---|---|---|
2 | 1 | Anonymous | |
3 | 3 | Anonymous | This page describes the design of the database that is (or will be) used to store all necessary pieces of information obtained from the "stand-alone" ProCKSI ''core'' application (see [wiki:DataStandardisation]). |
4 | 1 | Anonymous | |
5 | 3 | Anonymous | == Database Design for the (static) Protein Multiverse == |
6 | 13 | Anonymous | The database stores results from ''Transformations'', ''Comparisons'' and ''Aggregations'': |
7 | 13 | Anonymous | * A ''Transformation'' is a process that derives ONE (main) ''Result'' from ONE single input file.[[br]] |
8 | 13 | Anonymous | __Example__: The transformation of ''Structure'', ''Tree'', ''!SimilarityMatrix'', etc., using a certain ''Method'' with a certain ''!ParameterSet'', produces a contact map, a tree, ... |
9 | 13 | Anonymous | * A ''Comparison'' is a process that derives ONE (main) ''Result'' from TWO input files. [[br]] |
10 | 13 | Anonymous | __Example__: The comparison of ''Structures'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a similarity value and an alignment. |
11 | 13 | Anonymous | * An ''Aggregation'' is a process that derives ONE (main) ''Result'' from SEVERAL input files that are grouped together into ''!DataSets''. [[br]] |
12 | 13 | Anonymous | __Example__: The aggregation of ''!SimilarityMatrices'', ''Trees'', etc., using a ''Method'' with a certain ''!ParameterSet'', produces a consensus similarity matrix, a consensus tree, ... |
13 | 1 | Anonymous | |
14 | 13 | Anonymous | [[Image(ProteinMultiverseDataBase6.png)]] |
15 | 1 | Anonymous | |
16 | 13 | Anonymous | * There are multiple (similarity comparison) ''Methods'': e.g. USM, MaxCMO, !DaliLite, ... |
17 | 13 | Anonymous | * Each ''Method'' is executed with a specific ''!ParameterSet'', which is a combination of different ''Parameters'' and their values: e.g. MaxCMO/restarts/10, USM/compressor/bzip2, ... |
18 | 13 | Anonymous | * If a ''Method'' does not accept any ''Parameters'', the ''!ParameterSet'' still exists but is empty: e.g. !DaliLite, CE, ... |
19 | 13 | Anonymous | * Each ''Method'' produces multiple similarity ''Measures'': e.g. !DaliLite/Z, FAST/Z, MaxCMO/Overlap, ... |
20 | 1 | Anonymous | |
21 | 13 | Anonymous | * Each ''Structure'' is uniquely determined by its PDB code, model and chain. (Domains are not taken into account yet.) The location of the PDB file is stored, together with a link to a ''Container'' file that holds further information in XML format: e.g. sequence, secondary structure, experimental resolution, ... |
22 | 13 | Anonymous | * Each ''Structure'' is extended by further classification information from ''CATH'' and ''SCOP'' in separate relations. |
23 | 13 | Anonymous | * Multiple ''Structures'' can be grouped together into ''!DataSets'', which are needed for ''Aggregations''. |
24 | 12 | Anonymous | |
25 | 13 | Anonymous | * The location of the ''Containers'' in which results are stored can be found in the ''Transformations'', ''Comparisons'', and ''Aggregations'' relations, respectively. |
26 | 13 | Anonymous | * Additionally, similarity values from ''Comparisons'' are stored directly in the database for quicker access. Alignments could be accessed in the same way, as soon as a standardised format has been defined. |
27 | 3 | Anonymous | |
28 | 13 | Anonymous | Note that this design does not allow ''!DataSets'' to comprise files other than ''Structures'', although some of the ''Results'' need to be grouped into a ''!DataSet'', too.[[br]] |
29 | 13 | Anonymous | __Example__: Contact maps that have been produced by a ''Transformation'' of ''Structures'' and that are available from within the ''Containers'' need to form a ''!DataSet'' in order to act as input for ''Comparisons'' with the USM or MaxCMO ''Methods''. |
30 | 3 | Anonymous | |
31 | 13 | Anonymous | === Storing Further Information and Results externally === |
32 | 1 | Anonymous | Similarity values are stored directly in the relational database. All further information regarding one structure (e.g. sequence, resolution, ...) or regarding a pair of structures (e.g. alignment, rotation/translation matrices, ...) is stored in external files.[[br]] |
33 | 8 | Anonymous | For storing further information for ''single structures'', there are several approaches: |
34 | 1 | Anonymous | * All information in one file: file too big |
35 | 5 | Anonymous | * All information in separate files grouped by the protein structure |
36 | 8 | Anonymous | |
37 | 5 | Anonymous | For storing further information for ''pairs of structures'', there are several approaches: |
38 | 5 | Anonymous | * All information in separate files grouped by methods: files too big |
39 | 5 | Anonymous | * All information in separate files grouped by pairs: too many files |
40 | 5 | Anonymous | * All information in separate files grouped by the first structure: files with unbalanced sizes |
41 | 5 | Anonymous | * All information in separate files with fixed size:[[br]] |
42 | 5 | Anonymous | "Bin-packing" algorithm decides where to put new information, and opens a new "bin" if necessary. "Bins" must be balanced from time to time in order to provide a fast retrieval of information. |
43 | 5 | Anonymous | |
44 | 5 | Anonymous | |
45 | 5 | Anonymous | == Extended Database Design for (dynamic) Management of Experiments (ProCKSI) == |
46 | 5 | Anonymous | |
47 | 1 | Anonymous | This has not been modelled yet, but the database for the (static) Protein Multiverse was designed with the ProCKSI integration in mind. |
48 | 1 | Anonymous | |
49 | 1 | Anonymous | Some remarks: |
50 | 13 | Anonymous | * ''Experiments'' (formerly ''Requests'') apply ''Methods'' to a ''!DataSet'' with a certain ''!ParameterSet''. |
51 | 13 | Anonymous | * ''Packages'' (formerly ''Jobs'') deal with a subset of a ''!DataSet'' and a subset of the requested ''Methods'', partitioning the 3D problem space, and are calculated "in one go" using ProCKSI's "stand-alone" core application. If they are sent to a queuing system, they become a ''Job'' there. |
52 | 1 | Anonymous | * It remains to be discussed whether a ''Tasks'' relation is still needed in the database; ''Tasks'' have always been more like ''!RequestMethods''. |
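
The fixed-size file approach listed under "Storing Further Information and Results externally" can be illustrated with a minimal sketch. The page does not say which bin-packing heuristic is intended, so the first-fit rule used here is an assumption, as are the names `Bin` and `place`, the capacity of 100 and the PDB-like keys. Periodic rebalancing of the bins is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One fixed-capacity external file (hypothetical model, not ProCKSI code)."""
    capacity: int
    records: dict = field(default_factory=dict)  # pair key -> record size

    @property
    def used(self) -> int:
        # Total size of all records currently stored in this bin.
        return sum(self.records.values())


def place(bins, key, size, capacity):
    """First-fit placement (assumed heuristic).

    Puts the record into the first bin that still has room and opens a
    new bin if none does. Returns the index of the receiving bin.
    """
    if size > capacity:
        raise ValueError("record is larger than the bin capacity")
    for number, candidate in enumerate(bins):
        if candidate.used + size <= candidate.capacity:
            candidate.records[key] = size
            return number
    bins.append(Bin(capacity, {key: size}))
    return len(bins) - 1


# Made-up structure pairs and record sizes, for illustration only.
bins = []
location = {}  # pair key -> bin number
for key, size in [(("1abc", "2xyz"), 60),
                  (("1abc", "3def"), 50),
                  (("2xyz", "3def"), 40)]:
    location[key] = place(bins, key, size, capacity=100)
```

In such a scheme the bin number returned by `place` is what a relation would need to record for each pair, so that a result can be found without scanning every file.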