DataManagement » History » Version 4

Anonymous, 07/10/2007 03:05 PM

= Data Management and Representation =

== Definitions ==

 * '''User''' (to be implemented in the future):
   * Represented by a unique email address
   * Authenticates to gain access to user data (requests, personalised settings)
   * Manages multiple requests
 * '''Request''':
   * Unique handle for the combination of a dataset, tasks, and request parameters
   * Request parameters: e.g. request description, settings for notification by email
 * '''Task''':
   * Something to be performed with the given dataset[[br]]
     e.g. calculation of PDB structure pictures (1D), or comparison of pairs of proteins with a given similarity method (2D)
   * Task parameters: e.g. parameters for each comparison method, output parameters for picture generation, ...
 * '''Job''':
   * Everything that lives in a queue[[br]]
     e.g. local queue (ProCKSI cluster), remote queue (University cluster), external queue (web service, grid)
   * Currently, a job is equal to a task:[[br]]
     e.g. task = pairwise comparison of the proteins in the ''entire'' dataset with ''several'' given similarity methods[[br]]
     jobs = ''separate'' jobs calculating all pairwise comparisons of the entire dataset with ''one'' similarity method
   * Future plans:[[br]]
     Divide the 3D problem space into subsets of datasets and methods, each subset being an independent job[[br]]
     See the next section for further details on the ''3D Problem Space''
 * '''Dataset''':
   * Currently: Collection of PDB structures and previously calculated similarity matrices
   * Future plans: Previously calculated similarity matrices should be uploaded in a post-processing step, not in a pre-processing step (ticket:28)
 * '''Results''':
   * Currently: Entire similarity matrices from different sources
   * Future plans: Generate similarity matrices directly from single pairwise comparison results stored in the database
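The definitions above can be read as a small data model. The following is a minimal sketch in Python, not ProCKSI's actual schema: all class and field names (`Dataset`, `Task`, `Job`, `Request`, `handle`, `notify_by_email`, `pairwise`, ...) and the example identifiers are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Collection of structures plus previously calculated similarity matrices."""
    structures: list[str] = field(default_factory=list)  # structure identifiers
    matrices: list[str] = field(default_factory=list)    # names of uploaded matrices


@dataclass
class Task:
    """Something to be performed with the dataset, e.g. one similarity method."""
    name: str
    parameters: dict[str, str] = field(default_factory=dict)
    pairwise: bool = True  # False for per-protein tasks such as picture generation


@dataclass
class Job:
    """Unit that lives in a queue."""
    task: Task
    queue: str = "local"  # hypothetical labels: "local", "remote", "external"


@dataclass
class Request:
    """Unique handle for a dataset, its tasks, and the request parameters."""
    handle: str
    dataset: Dataset
    tasks: list[Task] = field(default_factory=list)
    description: str = ""
    notify_by_email: bool = False

    def jobs(self) -> list[Job]:
        # Current behaviour described above: a job is equal to a task.
        return [Job(task=t) for t in self.tasks]


request = Request(
    handle="req-0001",
    dataset=Dataset(structures=["protA", "protB"]),
    tasks=[Task("USM"), Task("MaxCMO")],
)
print(len(request.jobs()))
```

Keeping `Job` separate from `Task` leaves room for the planned change in which one task is split into several independent jobs.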
== The 3D Problem and Solution Spaces ==
 * '''Problem Space''':[[br]]
   The problem space for an all-against-all comparison of a dataset of P protein structures using M different similarity comparison methods can be represented as a 3D cube:[[br]]
   x: Dataset: list of proteins[[br]]
   y: Dataset: list of proteins[[br]]
   z: Tasks: list of similarity comparison methods
 * '''Partitioning the Problem Space''':[[br]]
   To calculate all cells in the 3D problem space as efficiently as possible, the space can be subdivided into sub-cubes, which are called jobs when placed into the queue of a queueing system. Examples:[[br]]
   a. Comparison of ''one pair of proteins'' using ''one method'' in the task list => PxPxM jobs, each performing 1 comparison
   b. All-against-all comparison of the ''entire dataset'' with ''one method'' => M jobs, each performing PxP comparisons
   c. Comparison of ''one pair of proteins'' using ''all methods'' in the task list => PxP jobs, each performing M comparisons
   d. Intelligent partitioning of the 3D problem space, comparing a subset of proteins with a subset of methods
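The job counts for strategies a to c can be checked with a short sketch. The function name `partition`, the strategy labels, and the sample protein names are assumptions made for this illustration; strategy d is left out because the text does not specify how the subsets are chosen.

```python
from itertools import product


def partition(proteins, methods, strategy):
    """Split the P x P x M problem space into jobs.

    Each job is a list of (protein_x, protein_y, method) cells.
    """
    pairs = list(product(proteins, proteins))  # all P x P cells of one slice
    if strategy == "a":  # one pair, one method per job
        return [[(x, y, m)] for (x, y) in pairs for m in methods]
    if strategy == "b":  # entire dataset, one method per job
        return [[(x, y, m) for (x, y) in pairs] for m in methods]
    if strategy == "c":  # one pair, all methods per job
        return [[(x, y, m) for m in methods] for (x, y) in pairs]
    raise ValueError(f"unknown strategy: {strategy!r}")


proteins = ["p1", "p2", "p3"]  # P = 3
methods = ["USM", "MaxCMO"]    # M = 2

for strategy in "abc":
    jobs = partition(proteins, methods, strategy)
    print(strategy, len(jobs), "jobs of", len(jobs[0]), "comparisons")
# a 18 jobs of 1 comparisons
# b 2 jobs of 9 comparisons
# c 9 jobs of 2 comparisons
```

In every strategy the total number of cells is the same (PxPxM = 18 here); only the granularity of the jobs changes.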
 * '''Solution Space''':[[br]]
   Each similarity comparison ''method'' can provide several similarity ''measures''.[[br]]
   For one slice in the 3D problem space using one particular method, we might get several slices in the 3D solution space providing several measures.
 * '''Special Cases''':[[br]]
   The 3D problem space is reduced to a 2D problem space (1xPxM) when using methods that do not compare pairs of proteins but work on a single protein, e.g. calculating the PDB picture, or getting additional data from the iHOP web service.
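The relationship between problem slices and solution slices, and the reduced size of the special case, can be sketched as follows. The method and measure names in `MEASURES` are placeholders, not the measures that any real method reports, and both function names are hypothetical.

```python
# Hypothetical mapping from a method to the measures it provides.
MEASURES = {
    "method_1": ["measure_a", "measure_b"],
    "method_2": ["measure_c"],
}


def solution_slices(methods):
    """Return one (method, measure) slice per measure of each method."""
    return [(m, s) for m in methods for s in MEASURES[m]]


def problem_cells(num_proteins, num_methods, pairwise=True):
    """Number of cells: P x P x M for pairwise methods, 1 x P x M otherwise."""
    rows = num_proteins if pairwise else 1
    return rows * num_proteins * num_methods


print(solution_slices(["method_1", "method_2"]))
print(problem_cells(4, 2), problem_cells(4, 2, pairwise=False))
```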
== Lifecycle of Requests ==
A new request is submitted in the browser:[[br]]
 * The request is registered in the database, and the request parameters, the dataset (structures and matrices), and the user tasks are added.[[br]]
   Request Status: P = prepared[[br]]
The ''cron'' administrative tools check the status of all tasks periodically:[[br]]
 * As soon as the first task has started (R), the request is said to be ''running'' and its status is changed in the database.[[br]]
   Request Status: R = running
 * As soon as the last task has finished successfully (F) or with errors (E), the request is said to have ''finished'' and its status is changed in the database. The user gets a notification email if requested.[[br]]
   Request Status: F = finished
 * If a request has finished and the expiration date (soft limit) has been exceeded, the request is said to have ''expired'' and its status is changed in the database. The user gets a notification email if requested.[[br]]
   Request Status: X = expired
 * If a request has expired and the deletion date (hard limit) has been exceeded, the complete request, including all tasks and data, is deleted from the database and the hard disk. The user gets a notification email if requested.
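The rules above amount to deriving the request status from the task statuses and two dates. A minimal sketch follows, assuming hypothetical function names and arguments (`request_status`, `should_delete`, `expires_at`, `deletes_at`); it also assumes that a task in status C (completed but not yet post-processed) counts as started.

```python
from datetime import datetime


def request_status(task_statuses, now, expires_at=None):
    """Derive the request status (P, R, F or X) from its task statuses.

    task_statuses: list of task status letters (P, Q, R, C, F, E).
    expires_at:    soft limit, or None if no expiration date has been set yet.
    """
    done = {"F", "E"}             # finished successfully or with errors
    started = {"R", "C"} | done   # assumption: C counts as started
    if task_statuses and all(s in done for s in task_statuses):
        if expires_at is not None and now > expires_at:
            return "X"            # finished and past the soft limit
        return "F"
    if any(s in started for s in task_statuses):
        return "R"                # at least one task has started
    return "P"


def should_delete(status, now, deletes_at):
    """An expired request is deleted once the hard limit has passed."""
    return status == "X" and now > deletes_at


now = datetime(2007, 7, 10)
print(request_status(["R", "Q"], now))
```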
== Lifecycle of Tasks ==
'''Attention:''' Currently, a ''Job'' equals a ''Task''.

A new request is submitted in the browser:[[br]]
 * ProCKSI's main scheduler is always registered in the database as the ''Root Task''.[[br]]
   Status of Root Task: P = prepared
 * All ''User Tasks'' that the user has selected to be performed are prepared and registered in the database.[[br]]
   Status of User Tasks: P = prepared
 * The ''Root Task'' is submitted to the queueing system.[[br]]
   Status of Root Task: Q = Queued

The queueing system automatically starts the tasks in the order in which they were queued:[[br]]
 * As soon as the ''Root Task'' starts, it changes its own status in the database, analyses all ''User Tasks'', partitions the 3D problem space into sub-cubes, and submits these to the queueing system.[[br]]
   Status of Root Task:  R = Running[[br]]
   Status of User Tasks: Q = Queued
 * As soon as a ''User Task'' starts, it changes its own status in the database and performs its calculations.[[br]]
   Status of User Task: R = Running
 * As soon as any task finishes, it changes its own status in the database.[[br]]
   Task Status: C = Completed

The ''cron'' administrative tools check the status of tasks periodically:[[br]]
 * If a ''Task'' has been completed, its results are post-processed (e.g. registered in the database), the status of the task is changed in the database, and an expiration date and a deletion date are set. The user gets a notification email if requested.[[br]]
   Task Status: F = Finished (successfully)
 * If any serious problems have occurred, the status of the task is changed in the database accordingly.[[br]]
   Task Status: E = Errors
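The task lifecycle can be viewed as a small state machine over the status letters P, Q, R, C, F and E. The sketch below is an illustration under stated assumptions, not the ProCKSI implementation: the names `TRANSITIONS` and `TaskState` are hypothetical, and since the text does not say from which states E can be reached, the sketch assumes R and C.

```python
# Allowed transitions between task status letters, read off the lifecycle above.
TRANSITIONS = {
    "P": {"Q"},       # prepared  -> queued
    "Q": {"R"},       # queued    -> running
    "R": {"C", "E"},  # running   -> completed (or errors, an assumption)
    "C": {"F", "E"},  # completed -> finished or errors, set by the cron tools
    "F": set(),       # terminal
    "E": set(),       # terminal
}


class TaskState:
    """Tracks the status of one task and rejects illegal transitions."""

    def __init__(self, name):
        self.name = name
        self.status = "P"

    def advance(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.name}: illegal transition {self.status} -> {new_status}"
            )
        self.status = new_status


task = TaskState("USM")
for status in "QRCF":
    task.advance(status)
print(task.status)
```

Validating each status change in one place means a task cannot be marked as finished before the cron tools have seen it as completed.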
== Task and Job Dependencies ==
 * Some tasks must have finished successfully before a dependent task can be started:[[br]]
   e.g. ''Contacts'' must have been calculated before the ''USM'' and ''MaxCMO'' similarities can be calculated
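One way to obtain a valid start order from such dependencies is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); it is a swapped-in technique for illustration, as the text does not say how ProCKSI resolves dependencies. The names `dependencies` and `start_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Map each task to the tasks that must finish successfully before it may start.
dependencies = {
    "Contacts": set(),
    "USM": {"Contacts"},
    "MaxCMO": {"Contacts"},
}


def start_order(deps):
    """Return the tasks in an order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(dependencies)
print(order)  # 'Contacts' comes first; the order of the other two may vary
```

A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is a convenient point to reject an inconsistent task list.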