DataStandardisation » History » Version 9

Anonymous, 10/09/2007 06:50 PM

= The ProCKSI "stand-alone" ''Core'' Application =

ProCKSI utilises a variety of similarity comparison methods (e.g. USM, MaxCMO, TMalign, ...), each producing different similarity measures (e.g. Zscore, TMscore, RMSD, ...). Each comparison method writes its output in a different format and with additional content such as alignments, rotation matrices, etc. Some of them produce just one output file, others a set of linked HTML files.

Additionally, there are pre- and post-processing methods, e.g. preparation of contact maps from structures, or clustering of similarity matrices, that have their own input parameters and produce different output.

The goal can be described as follows: [[br]]
Allow the ProCKSI "stand-alone" ''core'' application to
 1. be developed independently from the ProCKSI ''framework'' or ''server'' (incl. webserver/database), and allow collaborators to seamlessly integrate their own methods. One might even think of making the code publicly available and allowing the community to improve it.
 2. run on any (Linux) machine that has the necessary methods installed. This can be a collaborator's desktop machine, the ProCKSI cluster, the University Cluster, or even a machine on the Grid.
 3. further distribute the given task using local machines, Grid and Web Service technology in order to obtain the results without the need to schedule everything from one central point (''Orchestration'' vs. ''Choreography'').
 4. return its results in a standardised format that can easily be integrated into the ProCKSI database and thus reused by the ProCKSI framework and by all other experiments "on the command line".

= Standardising Results with XML =  
The principal API for the ProCKSI "stand-alone" ''core'' application can be visualised as follows:

[[Image(ProCKSI-core-API.png)]]

A single file in XML format is fed into the ProCKSI "stand-alone" ''core'' application, describing the entire dataset, all tasks and the necessary input parameters. At the end, a single output file in XML format is written, which may link to further external files in specific formats (e.g. PDF, CM, ...) if necessary.

== Input Specifications ==
This is the latest proposal for the XML input file:

By default, all results that the requested methods can produce are returned. Results that are not needed can be excluded explicitly. A log file is generated if a file name is provided.

Optional tags: '''exclude''' (measure, result), '''log'''[[BR]]
Optional attributes: '''description'''

{{{
<job id="ID" description="TEXT">
  <log filename="FILENAME" />

  <input type="structure|tree|contact map|similarity matrix">
    <item id="ID" label="TEXT" filename="FILENAME" />
    :::
    <item id="ID" label="TEXT" filename="FILENAME" />
  </input>

  <method id="ID" name="TEXT">
    <param name="TEXT">VALUE</param>
    :::
    <param name="TEXT">VALUE</param>

    <exclude>
      <measure>NAME</measure>
      :::
      <measure>NAME</measure>

      <result>NAME</result>
      :::
      <result>NAME</result>
    </exclude>
  </method>
  :::
  <method ...>
    ...
  </method>
</job>
}}}

The input data can be protein structures, similarity trees, contact maps or similarity matrices. All specified methods should be able to operate on the given data files. This dependency could be verified automatically using XML Schema.
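
The dependency between input type and methods can also be checked in code once the input file has been read. The following is only a minimal sketch in Python, not part of the proposal: the table `ACCEPTED_INPUT`, the parameter name `cutoff`, the ids and the file names are all hypothetical and serve purely as illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical compatibility table: which input types each method accepts.
# The method names appear in the text; the accepted types are assumptions.
ACCEPTED_INPUT = {
    "USM": {"contact map"},
    "MaxCMO": {"contact map"},
    "TMalign": {"structure"},
}

# Illustrative job file following the proposed layout (all values invented).
EXAMPLE_JOB = """
<job id="job1" description="demo">
  <input type="structure">
    <item id="s1" label="first" filename="first.pdb" />
    <item id="s2" label="second" filename="second.pdb" />
  </input>
  <method id="m1" name="TMalign">
    <param name="cutoff">5.0</param>
    <exclude>
      <measure>RMSD</measure>
    </exclude>
  </method>
  <method id="m2" name="USM" />
</job>
"""


def parse_job(xml_text):
    """Return the input type, the item ids and one dict per <method>."""
    job = ET.fromstring(xml_text)
    data = job.find("input")
    methods = []
    for method in job.findall("method"):
        methods.append({
            "id": method.get("id"),
            "name": method.get("name"),
            "params": {p.get("name"): p.text for p in method.findall("param")},
            "excluded_measures": [m.text for m in method.findall("exclude/measure")],
            "excluded_results": [r.text for r in method.findall("exclude/result")],
        })
    item_ids = [item.get("id") for item in data.findall("item")]
    return data.get("type"), item_ids, methods


def incompatible_methods(input_type, methods):
    """Names of the methods that cannot operate on the given input type."""
    return [m["name"] for m in methods
            if input_type not in ACCEPTED_INPUT.get(m["name"], set())]


input_type, item_ids, methods = parse_job(EXAMPLE_JOB)
print(input_type, item_ids)
print(incompatible_methods(input_type, methods))
```

A check like this would run before any method is started, so that a job mixing, say, structure input with a contact-map method is rejected early.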

Comments dxb:
 * The ''<job>'' tag could be renamed to ''<packages>''.
 * The ''<input>'' tag could be renamed to ''<dataset>'' or ''<data_set>'', in order to be consistent with the [wiki:DataStorage Extended Database Design].
 * The set of all ''<method>'' tags could be grouped together into ''<parameterset>'' or ''<parameter_set>'', in order to be consistent with the [wiki:DataStorage Extended Database Design].
This would allow the entire input description to be reused as it is and simply extended by the results.

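
The idea of reusing the input description and extending it by the results could look roughly like the following Python sketch. It is an illustration under assumptions, not the agreed design: the function `extend_with_results`, the ids, the file names and the score value are hypothetical, and the tags keep their current names rather than the renamed ones suggested in the comments.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative input description (all values invented).
INPUT_JOB = """<job id="job1">
  <input type="structure">
    <item id="s1" label="first" filename="first.pdb" />
    <item id="s2" label="second" filename="second.pdb" />
  </input>
  <method id="m1" name="TMalign" />
</job>"""


def extend_with_results(job, method_id, measures):
    """Return a copy of the input <job> with one comparison <output> appended."""
    result = copy.deepcopy(job)  # the input description itself stays untouched
    output = ET.SubElement(result, "output", {"type": "comparison"})
    method = ET.SubElement(output, "method", {"id": method_id})
    for name, value in measures.items():
        similarity = ET.SubElement(method, "similarity", {"measure": name})
        similarity.text = str(value)
    return result


job = ET.fromstring(INPUT_JOB)
result = extend_with_results(job, "m1", {"TMscore": 0.71})
print(ET.tostring(result, encoding="unicode"))
```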
== Output Specifications ==
This is the latest proposal for the XML output file:

Optional tags: '''log''', '''message''', '''similarity''' (used only if output is a ''comparison'') [[BR]]
Optional attributes: '''description''', '''node''', '''start''', '''end''', '''ref_id''' (only if output type is ''composition''), '''ref_id2''' (only if output type is not ''comparison'')

{{{
<job id="ID" description="TEXT" node="TEXT" start="TIME" end="TIME">
  <log filename="FILENAME" />

  <message type="error|warning|info">TEXT</message>
  :::
  <message type="error|warning|info">TEXT</message>

  <input type="structure|tree|contact map|similarity matrix">
    <item id="ID" label="TEXT" filename="FILENAME" />
    :::
    <item id="ID" label="TEXT" filename="FILENAME" />
  </input>

  <parameters>
    <method id="ID" name="NAME">
      <parameter name="TEXT">VALUE</parameter>
      :::
      <parameter name="TEXT">VALUE</parameter>
    </method>
    :::
    <method ...>
      ...
    </method>
  </parameters>

  <output type="transformation|comparison|composition" ref_id="" ref_id2=" ">
    <method id="ID">
      <message type="error|warning|info">TEXT</message>
      :::
      <message type="error|warning|info">TEXT</message>

      <similarity measure="NAME">VALUE</similarity>
      :::
      <similarity measure="NAME">VALUE</similarity>

      <file type="TEXT" label="TEXT" name="FILENAME" />
      :::
      <file type="TEXT" label="TEXT" name="FILENAME" />
    </method>
  </output>
  :::
  <output ...>
    ...
  </output>
</job>
}}}

Messages (errors, warnings or additional information) can be passed at the global level or at the method level. Input data and parameters defined in the input file can be repeated in the output if needed (self-contained output). An output can be a 1->1 transformation (e.g. structure -> contact map), a 2->1 comparison (e.g. 2*structure -> similarity measure) or an N->1 composition (e.g. N*tree -> total tree, or N*similarity matrix -> consensus similarity matrix). Results other than similarity measures for a pair of proteins are stored in external files and are only referenced from the XML file.
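
To illustrate how a consumer (for instance the ProCKSI framework) might read such an output file, here is a minimal Python sketch. It is an assumption-laden example rather than a reference implementation: the function `collect_results`, all ids, file names, time values and scores are invented, and the ''ref_id'' attributes are left out because the proposal does not yet fix their exact meaning.

```python
import xml.etree.ElementTree as ET

# Illustrative output file following the proposed layout (all values invented).
EXAMPLE_OUTPUT = """
<job id="job1" node="node01" start="10:00" end="10:05">
  <message type="info">job finished</message>
  <output type="transformation">
    <method id="m0">
      <file type="contact map" label="first" name="first.cm" />
    </method>
  </output>
  <output type="comparison">
    <method id="m1">
      <message type="warning">chain B ignored</message>
      <similarity measure="TMscore">0.71</similarity>
      <similarity measure="RMSD">2.4</similarity>
      <file type="alignment" label="s1 vs s2" name="s1_s2.aln" />
    </method>
  </output>
</job>
"""


def collect_results(xml_text):
    """Split an output file into messages, similarity values and file links."""
    job = ET.fromstring(xml_text)
    # Global messages are direct children of <job>.
    messages = [("job", m.get("type"), m.text) for m in job.findall("message")]
    similarities = {}
    files = []
    for output in job.findall("output"):
        for method in output.findall("method"):
            method_id = method.get("id")
            # Method-level messages are tagged with the id of their method.
            messages += [(method_id, m.get("type"), m.text)
                         for m in method.findall("message")]
            # External result files are only referenced, never read here.
            files += [(method_id, f.get("name")) for f in method.findall("file")]
            # Similarity values are expected for comparisons only.
            if output.get("type") == "comparison":
                for sim in method.findall("similarity"):
                    similarities[(method_id, sim.get("measure"))] = float(sim.text)
    return messages, similarities, files


messages, similarities, files = collect_results(EXAMPLE_OUTPUT)
print(messages)
print(similarities)
print(files)
```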

The alignment data could be described in the XML file, as there is no single format used by all programs. This is yet to be decided.

Comments dxb:
 * The ''<input>'' tag could be renamed to ''<dataset>'' or ''<data_set>'', in order to be consistent with the [wiki:DataStorage Extended Database Design].
 * The ''<parameters>'' tag could be renamed to ''<parameterset>'' or ''<parameter_set>'', in order to be consistent with the [wiki:DataStorage Extended Database Design].
 * The ''<output>'' tag could be renamed to ''<results>'', ''<resultset>'' or ''<result_set>''.
This would allow the input description to be taken as it is and extended by the results.