UC-EELA-WISDOM
From EGI Knowledge Base
Use Case title: Participation of the EELA1 Project in the WISDOM Data Challenge-II
Short description: The objective of WISDOM1 is to propose new inhibitors for a family of proteins produced by Plasmodium falciparum. The WISDOM platform performs high-throughput virtual docking of millions of chemical compounds, available in ligand databases, against several targets. The second Data Challenge (DC-II) was carried out at the end of 2006.
Actors involved: Eight (8) EELA sites were involved in the process, among them UPV (Universidad Politécnica de Valencia, Spain), which coordinated, and ULA (Universidad de Los Andes, Venezuela), which proposed two Plasmodium vivax targets to be docked: one with the binding site in the loop from Asn117 to Tyr125, and one with the binding site in the loop from Ser117 to Tyr125.
Related Requirement: The DC-II used the WISDOM scripts developed in EGEE, which were launched from UPV and carefully monitored several times per day. Since both projects, EGEE and EELA, use the same middleware, no specific requirements had to be taken into account once the licenses for some of the tools had been arranged by the WISDOM consortium.
Steps: Once the RPMs have been distributed and installed at the different sites, job submission and monitoring are crucial to the progress of the docking process. DCs of this kind normally stop when 90-95% of the execution is completed, but EELA, in order to test the availability and robustness of the infrastructure, kept the execution going until 100% was reached.
There were several lessons learnt during the EELA DC:
- Difference of scale between EGEE and EELA. The scripts were not as efficient for a reduced infrastructure in which the sites have in total fewer resources;
- The automatic resubmission of scheduled jobs reduced throughput as the number of remaining jobs decreased. Changes to the scripts were made on the fly;
- Because it ran to 100% completion, the EELA DC detected that some files in the SEs were corrupted. The transferred files need to be double-checked.
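The last lesson, double-checking transferred files, can be illustrated with a checksum comparison. This is only a minimal sketch: the source does not say how the corrupted files were detected, so the use of SHA-256, the function names `sha256_of` and `find_corrupted`, and the idea of a digest manifest recorded before transfer are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(expected, directory):
    """List files that are missing or whose digest differs.

    `expected` maps a file name to the digest recorded before the
    transfer (a hypothetical manifest, not part of the WISDOM scripts).
    """
    corrupted = []
    for name, wanted in expected.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_of(path) != wanted:
            corrupted.append(name)
    return sorted(corrupted)
```

Running such a check after every transfer, rather than only at the end of the challenge, would surface corrupted files while resubmission is still cheap.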
As a consequence, it is highly advisable to:
- Adapt the scripts to the conditions of the infrastructure, considering the number of resources available, average queuing times and performance. Dynamic reconfiguration is desirable;
- Double check the input files, even if it takes several days;
- Check the evolution of the process several times per day;
- Check wall-time in all resources.
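The wall-time check in the last recommendation can be sketched as a simple filter over queue limits. The queue names, their limits and the safety factor below are hypothetical; only the average computing time of 2065 minutes per job comes from the statistics reported for this data challenge.

```python
def queues_with_enough_walltime(queues, expected_minutes, safety_factor=1.5):
    """Return the names of queues whose wall-time limit covers a job.

    `queues` maps a queue name to its wall-time limit in minutes.
    `safety_factor` is an assumed margin over the expected run time.
    """
    required = expected_minutes * safety_factor
    return sorted(name for name, limit in queues.items() if limit >= required)


# Hypothetical queue limits, in minutes.
example_queues = {"short": 720, "long": 4320, "infinite": 10080}
suitable = queues_with_enough_walltime(example_queues, expected_minutes=2065)
```

With these illustrative values, only the queues allowing at least 3097.5 minutes are kept, so a job of average length is not killed by the batch system before it finishes.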
The statistics of the EELA DC were:
- Number of original jobs of the first target: 2422;
- Number of jobs successfully completed: 2421;
- Total number of submitted jobs: 109551 (most of the jobs did not reach the queues);
- Average computing time per job: 2065 minutes;
- Total effective running time: 228 CPU-days;
- Results Obtained: 53 Gbytes.
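A few ratios can be derived from the figures above. The sketch below is illustrative only: it assumes that the completed jobs belong to the same set as the original jobs of the first target, and that 1 Gbyte equals 1000 Mbytes. The names are hypothetical.

```python
# Figures taken from the statistics listed above.
ORIGINAL_JOBS = 2422
COMPLETED_JOBS = 2421
SUBMITTED_JOBS = 109551
CPU_DAYS = 228
RESULTS_GBYTES = 53


def derived_statistics():
    """Compute simple ratios from the reported EELA DC figures."""
    return {
        # Fraction of the original jobs that completed successfully.
        "completion_rate": COMPLETED_JOBS / ORIGINAL_JOBS,
        # Submissions (including resubmissions) per original job.
        "submissions_per_original_job": SUBMITTED_JOBS / ORIGINAL_JOBS,
        # Effective running time expressed in CPU-hours.
        "cpu_hours": CPU_DAYS * 24,
        # Average output size per completed job (assumes 1 Gbyte = 1000 Mbytes).
        "mbytes_per_completed_job": RESULTS_GBYTES * 1000 / COMPLETED_JOBS,
    }


stats = derived_statistics()
```

The submissions-per-job ratio of roughly 45 shows how heavily the automatic resubmission mechanism was used on this infrastructure.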
Post-conditions: Once the targets are docked, further analysis has to be carried out by biologists or physicians.
It is worth mentioning that this application was identified, selected and proposed by the EELA Project from its beginning. Hence, the first steps were taken to propose to the WISDOM consortium several targets of interest to be docked in silico.
The following factors were very important during this data challenge:
- The wide bioinformatics user community in Latin America interested in using Grids to do research;
- The high social impact that a disease such as malaria has on that region;
- The experience provided by some EELA institutions.
