DataStandardisation » History » Version 5

Version 4 (Anonymous, 07/30/2007 12:25 PM) → Version 5/14 (Paweł Widera, 10/05/2007 08:02 PM)

= Standardising Results with XML =

ProCKSI utilises a variety of similarity comparison methods (e.g. USM, MaxCMO, TMalign, ...), each producing different similarity measures (e.g. Zscore, TMscore, RMSD, ...). Each comparison method also writes its output in a different format and with different additional content such as alignments, rotation matrices, etc. Some of them produce just one output file, others a set of linked HTML files.

The similarity comparisons are performed on compute nodes, while the database that shall contain all results is located on the head node. Thus, all results must be parsed and transmitted (in compressed form) from the compute nodes to the head node before they can be made available in the database. I have devised a very general concept: the results of the different methods are first parsed directly on the compute node and translated into a standardised format, which is then parsed again on the head node and entered into the database.

Hence, I have designed the prototype of an XML document that shall be used to store the results of similarity comparisons of pairs of protein structures with different comparison methods.

{{{
<SimilarityComparison>

  <Job>
    <ID> </ID>
    <Label> </Label>
  </Job>

  <Structures>
    <Structure>
      <ID> </ID>
      <Label> </Label>
    </Structure>
    <Structure>
      <ID> </ID>
      <Label> </Label>
    </Structure>
  </Structures>

  <Method>
    <ID> </ID>
    <Name> </Name>

    <Messages>
      <Errors>
        <Error> </Error>
      </Errors>
      <Warnings>
        <Warning> </Warning>
      </Warnings>
      <Notices>
        <Notice> </Notice>
      </Notices>
    </Messages>

    <Measures>
      <Measure>
        <Name> </Name>
        <Value> </Value>
      </Measure>
    </Measures>

    <Alignments>
      <Alignment> </Alignment>
    </Alignments>

    <Matrices>
      <Matrix>
        <Name> </Name>
        <Content> </Content>
      </Matrix>
    </Matrices>

    <Files>
      <File>
        <Label> </Label>
        <Name> </Name>
      </File>
    </Files>
  </Method>

</SimilarityComparison>
}}}

== Input ==

''Optional tags'': '''exclude''' (measure, result), '''log''' (no log generated if not specified) [[BR]]
''Optional attributes'': '''description'''

{{{
<job id="ID" description="TEXT">
  <log filename="FILENAME" />

  <input type="structure|tree|contact map|similarity matrix">
    <item id="ID" label="TEXT" filename="FILENAME" />
    ...
    <item id="ID" label="TEXT" filename="FILENAME" />
  </input>

  <method id="ID" name="TEXT">
    <param name="TEXT">VALUE</param>
    ...
    <param name="TEXT">VALUE</param>

    <exclude>
      <measure>NAME</measure>
      ...
      <measure>NAME</measure>

      <result>NAME</result>
      ...
      <result>NAME</result>
    </exclude>
  </method>
  ...
  <method>
    ...
  </method>
</job>
}}}

The input data can be protein structures, similarity trees, contact maps or similarity matrices. Every specified method must be able to operate on the given data files. This dependency could be verified automatically using XML Schema.
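
As a rough illustration of what such a check might look like, the following is a minimal XML Schema sketch for the `<job>` document. It is not part of the ProCKSI design. The element and attribute names are taken from the template, while the type name `inputType` and the choice of which attributes are required are assumptions made for this example. Only '''log''', '''exclude''' and '''description''' are stated to be optional in the text. The sketch restricts the `type` attribute of `<input>` to the four listed kinds of data and checks the overall structure. It does not yet express which method accepts which kind of input; that would need an additional mechanism such as XML Schema 1.1 assertions or a check in the parsing code.

{{{
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Allowed values of <input type="...">, as listed in the job template.
       The type name "inputType" is hypothetical. -->
  <xs:simpleType name="inputType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="structure"/>
      <xs:enumeration value="tree"/>
      <xs:enumeration value="contact map"/>
      <xs:enumeration value="similarity matrix"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="job">
    <xs:complexType>
      <xs:sequence>
        <!-- Optional: no log is generated if the tag is missing -->
        <xs:element name="log" minOccurs="0">
          <xs:complexType>
            <xs:attribute name="filename" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>

        <xs:element name="input">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" maxOccurs="unbounded">
                <xs:complexType>
                  <!-- Assumption: id and filename are mandatory, label is not -->
                  <xs:attribute name="id" type="xs:string" use="required"/>
                  <xs:attribute name="label" type="xs:string"/>
                  <xs:attribute name="filename" type="xs:string" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="type" type="inputType" use="required"/>
          </xs:complexType>
        </xs:element>

        <xs:element name="method" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="param" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="name" type="xs:string" use="required"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <!-- Optional: measures and results to leave out -->
              <xs:element name="exclude" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="measure" type="xs:string"
                                minOccurs="0" maxOccurs="unbounded"/>
                    <xs:element name="result" type="xs:string"
                                minOccurs="0" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <!-- Assumption: id is mandatory, name is not -->
            <xs:attribute name="id" type="xs:string" use="required"/>
            <xs:attribute name="name" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
      <xs:attribute name="description" type="xs:string"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
}}}

With a schema along these lines, a job file whose `<input>` declares an unknown type (for example `type="sequence"`) would be rejected by any validating parser before the job is dispatched to a compute node.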