UC-UPV-epidemiology

From EGI Knowledge Base


Use Case title: Epidemiological Analysis on the Grid

Short description: Epidemiology research involves managing distributed sources of health data and processing them with compute-intensive models. It is one of the candidate areas defined in the Roadmap for the Take-up of HealthGrids, developed within the SHARE project (www.eu-share.org). Studying epidemic outbreaks and the efficiency of treatments is a very important topic. Currently, on the one hand, electronic information is starting to become available on a low-periodicity basis; on the other hand, complex simulation models for rapidly propagating severe endemic diseases, such as Dengue, are being developed, and these require large computing power and short response times. The use of Grids for integrating distributed databases and extracting knowledge is under development, and the availability of a larger-scale platform will improve both the quality of the models and the knowledge of the efficiency of therapies.

Actors involved:

  • Data Providers: Public health authorities owning epidemiological data.
  • Users: Epidemiology researchers.
  • Application Developers: Integrators who port the statistical and data mining tools to the Grid in order to compute the evolution of the epidemiological models.
  • Grid Operators: Infrastructure staff who monitor application performance and data integrity.

Related Requirements: The requirements are related to the following three areas:

  • Computing. The demand depends on the amount of health data used, but ranges from 50 to 100 CPU weeks per case study iteration. Each case study normally requires several iterations, although the complexity of the iterations decreases progressively.
  • Storage. This also depends on the amount of data used. The data is anonymised and filtered, so it is much smaller than the raw data.
  • Security. The data is dissociated from personal identifiers, which relaxes the data protection requirements. However, to avoid exposing information that could lead to the identification of rare cases, the data is encrypted when stored in the repositories.
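
To give a feel for the computing figure above, the following is a minimal sketch that converts a CPU-week budget into an approximate wall-clock time. The number of cores and the parallel efficiency are illustrative assumptions, not values taken from this use case, and the function name `wall_clock_days` is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # one CPU week = 168 CPU hours


def wall_clock_days(cpu_weeks: float, cores: int, efficiency: float = 1.0) -> float:
    """Estimate wall-clock days for a workload given in CPU weeks.

    `cores` and `efficiency` are assumptions made for illustration:
    efficiency = 1.0 means perfect scaling with no queueing or I/O overhead.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    cpu_hours = cpu_weeks * HOURS_PER_WEEK
    return cpu_hours / (cores * efficiency) / 24


# Lower and upper bound of the stated range, on an assumed 100 cores.
low = wall_clock_days(50, cores=100)
high = wall_clock_days(100, cores=100)
print(f"{low:.1f} to {high:.1f} days per iteration")
```

Under these assumptions, one iteration would take between 3.5 and 7 days on 100 fully available cores; real turnaround would also depend on queue waiting times and on how well the statistical tool parallelises.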

Pre-Conditions: Statistical analysis software must be installed on the resources. Open source packages, such as “R”, are being used. The installation of the software on the resources is negotiated directly with the site managers.

Steps:

  1. Gathering data from the epidemiology sources.
  2. Data pre-processing, mainly to improve data quality.
  3. Job and Data Splitting, according to the statistical tool and the resources available.
  4. Job Submission and Monitoring.
  5. Collection of Results.
  6. Analysis.
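
Steps 3 and 5 can be illustrated with a minimal sketch: the records are split into near-equal chunks, one per job, and the partial results are merged afterwards. The record layout (a `region` field), the per-job statistic (a simple case count), and all function names are hypothetical; the actual statistical tool and the gLite submission step are not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence


def split_into_chunks(records: Sequence[dict], n_jobs: int) -> List[List[dict]]:
    """Split records into at most n_jobs chunks whose sizes differ by at most one."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    if not records:
        return []
    n_jobs = min(n_jobs, len(records))  # never create empty chunks
    base, extra = divmod(len(records), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks


def count_cases(chunk: Sequence[dict]) -> Counter:
    """Stand-in for the work done by one job: count cases per region."""
    return Counter(record["region"] for record in chunk)


def merge_results(partials: Sequence[Counter]) -> Dict[str, int]:
    """Combine the per-job counts into a single result (step 5)."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Hypothetical anonymised records, for illustration only.
records = [{"region": r} for r in ["A", "B", "A", "C", "B", "A", "C"]]
chunks = split_into_chunks(records, n_jobs=3)
result = merge_results([count_cases(c) for c in chunks])
print(result)
```

This kind of split only works directly for statistics that can be combined from partial results, such as counts and sums; models that need the whole population at once would require a different decomposition.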

Post-conditions: Operators must oversee the preparation and execution of the jobs, as well as the gathering and post-processing of the results.

Projects involved: Own resources.

Middleware: gLite and some tools from Globus Toolkit 4 (GT4).

Application: This work mainly consists of migrating statistical processing tools to the Grid, where they are operated by Grid experts according to the interests of the users. The results are key to analysing the efficiency of therapies, which can only be assessed when dealing with the whole population. The efficiency of vaccines and the secondary effects of anti-rheumatic drugs are key use cases driven by public health authorities.
