DataManagement » History » Version 2

Anonymous, 07/09/2007 10:48 AM


= Data Management and Representation =

== Definitions ==
 * '''User''' (to be implemented in the future):
   * Represented by a unique email address
   * Authentication to gain access to user data (requests, personalised settings)
   * Manage multiple requests
 * '''Request''':
   * Unique handle for the combination of a dataset, tasks, and request parameters
   * Request parameters: e.g. request description, settings for notification by email
 * '''Task''':
   * Something to be performed with the given dataset,[[BR]]e.g. calculation of PDB structure pictures (1D), or comparison of pairs of proteins with a given similarity method (2D)
   * Task parameters: e.g. parameters for each comparison method, output parameters for picture generation, ...
 * '''Job''':
   * Everything that lives in a queue,[[BR]]e.g. local queue (ProCKSI cluster), remote queue (University cluster), external queue (web service, grid)
   * Currently, jobs map directly onto a task, one job per similarity method:[[BR]]e.g. task = pairwise comparison of the proteins in the ''entire'' dataset with ''several'' given similarity methods;[[BR]]jobs = ''separate'' jobs, each calculating all pairwise comparisons of the entire dataset with ''one'' similarity method
   * Future plans:[[BR]]Divide the 3D problem space into subsets of datasets and methods, each subset being an independent job.[[BR]]See the next section for further details on the ''3D Problem Space''.
 * '''Dataset''':
   * Currently: collection of PDB structures and previously calculated similarity matrices
   * Future plans: previously calculated similarity matrices should be uploaded in a post-processing step, not in a pre-processing step (ticket:28)
 * '''Results''':
   * Currently: entire similarity matrices from different sources
   * Future plans: generate similarity matrices directly from single pairwise comparison results stored in the database

== The 3D Problem and Solution Spaces ==
 * '''Problem Space''':[[BR]]The problem space for an all-against-all comparison of a dataset of P protein structures using M different similarity comparison methods can be represented as a 3D cube:
   * x: Dataset: list of proteins
   * y: Dataset: list of proteins
   * z: Tasks: list of similarity comparison methods
 * '''Partitioning the Problem Space''':[[BR]]For the most efficient calculation of all cells in the 3D problem space, the space can be subdivided into sub-cubes, which are called jobs when placed into the queue of a queuing system. Examples:
   a. Comparison of ''one pair of proteins'' using ''one method'' in the task list => PxPxM jobs, each performing 1 comparison
   b. All-against-all comparison of the ''entire dataset'' with ''one method'' => M jobs, each performing PxP comparisons
   c. Comparison of ''one pair of proteins'' using ''all methods'' in the task list => PxP jobs, each performing M comparisons
   d. Intelligent partitioning of the 3D problem space, comparing a subset of proteins with a subset of methods
 * '''Solution Space''':[[BR]]Each similarity comparison ''method'' can provide several similarity ''measures''.[[BR]]For one slice in the 3D problem space using one particular method, we might therefore get several slices in the 3D solution space, one for each measure.
 * '''Special Cases''':[[BR]]The 3D problem space is reduced to a 2D problem space (1xPxM) when using methods that do not compare pairs of proteins but work on a single protein, e.g. calculating the PDB picture, or retrieving additional data from the iHOP web service.
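The partitioning options for the 3D problem space can be sketched in Python. This is a minimal illustration under stated assumptions, not ProCKSI's actual job scheduler: proteins and methods are plain strings, each cell of the problem space is a hypothetical `(protein_x, protein_y, method)` tuple, and a job is simply a list of cells. Like the PxP figures in the text, the sketch counts every ordered pair including self-comparisons; a real scheduler might skip the diagonal or symmetric pairs. The fixed-size block tiling used for option d is only one simple, non-adaptive way to compare subsets of proteins with subsets of methods, not the intelligent scheme the text envisages. All function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical cell type: (protein_x, protein_y, method)
Cell = Tuple[str, str, str]
Job = List[Cell]


def problem_space(proteins: List[str], methods: List[str]) -> List[Cell]:
    """All P x P x M cells of an all-against-all comparison."""
    return list(product(proteins, proteins, methods))


def partition_per_cell(proteins: List[str], methods: List[str]) -> List[Job]:
    """Option a: one job per (pair, method), i.e. P*P*M jobs of 1 comparison."""
    return [[cell] for cell in problem_space(proteins, methods)]


def partition_per_method(proteins: List[str], methods: List[str]) -> List[Job]:
    """Option b: one job per method, i.e. M jobs of P*P comparisons."""
    return [[(x, y, m) for x, y in product(proteins, proteins)] for m in methods]


def partition_per_pair(proteins: List[str], methods: List[str]) -> List[Job]:
    """Option c: one job per protein pair, i.e. P*P jobs of M comparisons."""
    return [[(x, y, m) for m in methods] for x, y in product(proteins, proteins)]


def chunks(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def partition_blocks(
    proteins: List[str], methods: List[str], protein_block: int, method_block: int
) -> List[Job]:
    """Option d, simplest form: tile the cube with fixed-size sub-cubes."""
    return [
        list(product(xs, ys, ms))
        for xs in chunks(proteins, protein_block)
        for ys in chunks(proteins, protein_block)
        for ms in chunks(methods, method_block)
    ]


def single_protein_space(proteins: List[str], methods: List[str]) -> List[Tuple[str, str]]:
    """Special case: single-protein methods only need 1 x P x M cells."""
    return list(product(proteins, methods))


if __name__ == "__main__":
    proteins = ["p1", "p2", "p3"]  # placeholder protein IDs, P = 3
    methods = ["m1", "m2"]         # placeholder method names, M = 2
    for label, partition in [("a", partition_per_cell),
                             ("b", partition_per_method),
                             ("c", partition_per_pair)]:
        jobs = partition(proteins, methods)
        print(f"{label}: {len(jobs)} jobs x {len(jobs[0])} comparisons")
    # a: 18 jobs x 1 comparisons
    # b: 2 jobs x 9 comparisons
    # c: 9 jobs x 2 comparisons
```

Whichever option is chosen, the union of all jobs covers the same PxPxM cells; the options differ only in how many jobs are queued and how much work each job carries.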
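The relation between the problem space and the solution space can be illustrated in the same way. In this hypothetical sketch, a mapping from each method to the measures it reports (all method and measure names are placeholders, not actual ProCKSI methods) is expanded into the depth axis of the solution space, with one PxP similarity matrix allocated per (method, measure) slice. The function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical type: one P x P matrix of similarity values (None = not yet computed).
Matrix = List[List[Optional[float]]]


def solution_axis(measures_per_method: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand the method axis (z) of the problem space into (method, measure) slices."""
    return [
        (method, measure)
        for method, measures in measures_per_method.items()
        for measure in measures
    ]


def empty_solution_space(
    proteins: List[str], measures_per_method: Dict[str, List[str]]
) -> Dict[Tuple[str, str], Matrix]:
    """Allocate one empty P x P matrix for every (method, measure) slice."""
    p = len(proteins)
    # Build each row separately so that the rows are independent lists.
    return {
        key: [[None] * p for _ in range(p)]
        for key in solution_axis(measures_per_method)
    }


if __name__ == "__main__":
    # Placeholder names: M = 2 methods yield 3 slices in the solution space.
    measures = {"method_1": ["score"], "method_2": ["score", "rmsd"]}
    print(solution_axis(measures))
    # [('method_1', 'score'), ('method_2', 'score'), ('method_2', 'rmsd')]
    space = empty_solution_space(["p1", "p2"], measures)
    print(len(space))  # 3
```

Here a single method slice of the problem space ("method_2") becomes two slices of the solution space, one per measure.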