Bioinformatics and Health Sciences

From EGI Knowledge Base

Bioinformatics Applications

GPS@: Grid genomic web portal

Bioinformatic analysis of the data produced by complete genome sequencing projects is one of the major challenges of the coming years. Such an analysis clearly requires integrating up-to-date databanks and relevant algorithms. Grid computing, such as the infrastructure provided by the European EGEE project, would be a viable way to distribute data, algorithms, computing and storage resources for genomics. Providing bioinformaticians with a good interface to the grid infrastructure is a further challenge. The GPS@ web portal (Grid Protein Sequence Analysis) aims to be such a user-friendly interface to these genomic resources on the EGEE grid. The NPSA (Network Protein Sequence Analysis) portal serves hundreds of bioinformaticians daily (about 3000 jobs/day). It is currently limited by its resources (one quad-CPU motherboard), so the server restricts both the number of users who can connect to the portal and the size of the data sets they can process. The same user community will be eager to use the grid version of the service, transparently, once it has proven to be as stable and as efficient as the original service.

NPSA/GPS@ portal

xmipp_MLrefine: Macromolecular 3D structure analysis

Electron microscopy is increasingly being used for structural characterization of large macromolecular complexes. In particular, cryo-electron microscopy, where macromolecules are rapidly frozen in a thin layer of vitreous ice, has allowed structural characterization of large biological assemblies almost in their native state. However, because a low electron dose is required to minimize radiation damage and the contrast between biological matter and ice is low, the recorded images typically suffer from large amounts of noise. In the single-particle approach, many electron microscopy images corresponding to different views of the specimen are combined in a reconstruction process to obtain three-dimensional structural information. Since the molecules under study generally adopt random orientations on the specimen support, the relative orientations of the experimental projections are not known and need to be determined. Furthermore, combining multiple images is only justified if the different views correspond to projections of (identical copies of) the same three-dimensional object. Although researchers traditionally aim to isolate a well-defined biochemical state of the specimen, many samples are large multi-domain protein complexes that exploit molecular flexibility for their functionality. Even in a well-defined chemical environment, such complexes may still display different conformations or assembly states. Therefore, one of the main limitations on the information attained in single-particle electron microscopy experiments is the structural heterogeneity of the sample.

In the favourable case of a single structural state, the relative orientations of the projections can be assigned based on a maximum cross-correlation criterion, comparing each experimental image with a library of projections of a reference volume. This volume, which represents the object to be reconstructed, is improved by iterative angular assignment and reconstruction using the new angles. If multiple structural states exist in the specimen, this procedure can be extended by considering multiple volumes in the process. However, due to the high levels of noise in electron microscopy images, the employed cross-correlation functions typically suffer from many false maxima, and the assignment of both orientation and reference volume becomes degenerate. We have developed a novel approach for multi-reference refinement of electron microscopy structures. This approach is based on maximum likelihood principles, which aim at finding the most likely model that describes the experimental data. Preliminary tests have shown that, compared to conventional multi-reference refinement, significant improvements can be obtained using this approach. The implications of these results are far-reaching in the field, and the work is performed under a Network of Excellence in 3DEM.
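The contrast between the two assignment strategies can be illustrated with a minimal Python sketch. It is not the xmipp_MLrefine implementation: the function names, the flattened toy images, the Gaussian noise model with a known `sigma`, and the equal priors over references are all assumptions made for illustration. A real refinement would also integrate over orientations and shifts.

```python
import math

def cross_correlation(a, b):
    """Normalized cross-correlation of two equally sized, flattened images."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

def hard_assign(image, references):
    """Conventional approach: pick the single best-correlating reference."""
    scores = [cross_correlation(image, ref) for ref in references]
    return max(range(len(references)), key=scores.__getitem__)

def soft_assign(image, references, sigma):
    """Likelihood-style approach: a probability for every reference.

    Assumes independent Gaussian pixel noise with standard deviation sigma
    and equal prior probability for each reference (illustrative assumptions).
    """
    sq_dists = [sum((x - r) ** 2 for x, r in zip(image, ref))
                for ref in references]
    # Subtract the minimum before exponentiating for numerical stability.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

# Toy data: two reference projections and one noisy experimental image.
references = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
image = [0.4, 0.6, 0.5, 0.5]

best = hard_assign(image, references)
probs = soft_assign(image, references, sigma=1.0)
```

With a noisy image, the hard assignment commits to one reference even when the evidence is weak, whereas the soft assignment keeps a weight for each candidate, so an ambiguous image contributes to several references in proportion to its probability.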

Xmipp_assign_multiple_CTFs: Micrograph CTF calculation

Images obtained from the electron microscope are affected by many forms of aberration arising from the complex interaction between matter and the electron beam in the microscope. These aberrations are mainly produced by the electron source, the magnetic lenses and the defocus used in experimental practice. Mathematically, the difference between a theoretical specimen projection and the actual experimental projection obtained in the micrograph is modeled by a linear transfer function, known in the electron microscopy field as the contrast transfer function (CTF). Although a well-established theory of image formation in transmission electron microscopy exists [1–4] that describes the CTF in parametric form, there is still a need for a good estimation method that allows the actual shape of the CTF affecting the experimental images to be determined.
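Written out, the linear model described above takes a simple form in Fourier space. The following is a sketch under common assumptions, not a formula quoted from the cited references: the symbols for the experimental and ideal projections and the additive noise term are notation introduced here for illustration.

```latex
\hat{I}_{\mathrm{exp}}(\mathbf{k}) \;=\; \mathrm{CTF}(\mathbf{k})\,\hat{I}_{\mathrm{ideal}}(\mathbf{k}) \;+\; \hat{N}(\mathbf{k})
```

Here $\mathbf{k}$ is the two-dimensional spatial frequency, $\hat{I}_{\mathrm{ideal}}$ is the Fourier transform of the theoretical specimen projection, $\hat{I}_{\mathrm{exp}}$ that of the recorded image, and $\hat{N}$ an assumed noise term. Estimating the CTF amounts to recovering the multiplicative factor from the experimental images alone.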

2D auto-regressive moving average (ARMA) modelling is a powerful parametric spectral estimation technique that we have applied to contrast transfer function (CTF) detection in electron microscopy. Parametric techniques such as auto-regressive (AR) and ARMA models allow a more exact determination of the CTF than traditional methods, which are based only on the Fourier transform of the complete image, or of parts of it followed by averaging (periodogram averaging). Previous work revealed that AR models can be used to improve CTF estimation and the detection of its zeros. ARMA models reduce the model order and the computing time and, more interestingly, achieve increased accuracy. ARMA models are generated from electron microscopy (EM) images, and a stepwise search algorithm is then used to fit all the parameters of a theoretical CTF model to the previously calculated ARMA model. Furthermore, this adjustment is truly two-dimensional, allowing astigmatic images to be properly treated. Finally, an individual CTF can be assigned to every point of the micrograph by means of an interpolation at the functional level, provided that a CTF has been estimated in each of a set of local areas. The user need only know a few a priori parameters of the experimental conditions of the micrographs to turn this technique into an automatic and very powerful tool for CTF determination prior to CTF correction in 3D-EM. The implications are far-reaching in the field, and the work is performed under a Network of Excellence in 3DEM.
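For comparison, the traditional baseline mentioned above, periodogram averaging, can be sketched in a few lines of Python. This illustrates the non-parametric method only, not the ARMA approach, and it is not Xmipp code. The function names and the synthetic white-noise "micrograph" are assumptions, and the direct DFT is used only to keep the example dependency-free (a practical implementation would use an FFT library).

```python
import cmath
import random

def power_spectrum(patch):
    """Periodogram of a square patch via a direct 2D DFT (O(n^4), small n only)."""
    n = len(patch)
    mean = sum(map(sum, patch)) / (n * n)
    # Remove the mean so the zero-frequency term does not dominate.
    centred = [[value - mean for value in row] for row in patch]
    spectrum = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    phase = -2j * cmath.pi * (u * x + v * y) / n
                    total += centred[x][y] * cmath.exp(phase)
            spectrum[u][v] = abs(total) ** 2 / (n * n)
    return spectrum

def averaged_periodogram(image, patch_size):
    """Average the periodograms of all non-overlapping patches of the image."""
    rows, cols = len(image), len(image[0])
    spectra = []
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            spectra.append(power_spectrum(patch))
    if not spectra:
        raise ValueError("patch_size is larger than the image")
    return [[sum(s[u][v] for s in spectra) / len(spectra)
             for v in range(patch_size)] for u in range(patch_size)]

# Synthetic stand-in for a micrograph: white Gaussian noise, not real data.
random.seed(0)
micrograph = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(16)]
psd = averaged_periodogram(micrograph, patch_size=4)
```

Averaging over patches reduces the variance of the spectral estimate at the cost of frequency resolution, which is the trade-off that parametric models such as AR and ARMA try to avoid.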

SPLATCHE: genome evolution modeling

SPLATCHE (SPatiaL And Temporal Coalescences in Heterogeneous Environment) is a cellular-automata-based modeling tool that allows one to (1) simulate the spread of humans across a geographically realistic landscape; (2) take into account the spatial and temporal heterogeneity of the environment to reconstruct the past demography of human populations; (3) generate the molecular diversity of one or several samples of genes drawn from any location in the current human range.
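The cellular-automaton idea behind step (1) can be sketched as a toy grid simulation in Python. This is not SPLATCHE's code or its actual demographic model: the logistic growth rule, the four-neighbour migration, the parameter values and the function name `step` are all assumptions chosen for illustration. Cells with zero carrying capacity stand in for uninhabitable terrain.

```python
def step(density, capacity, growth_rate, migration_rate):
    """Advance every cell by one generation: logistic growth, then migration."""
    rows, cols = len(density), len(density[0])
    updated = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            k = capacity[r][c]
            if k <= 0:
                continue  # uninhabitable cell: nobody lives or settles here
            n = density[r][c]
            n += growth_rate * n * (1.0 - n / k)  # logistic growth towards k
            # Habitable neighbours (up, down, left, right) that can receive migrants.
            targets = [(r + dr, c + dc)
                       for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if 0 <= r + dr < rows and 0 <= c + dc < cols
                       and capacity[r + dr][c + dc] > 0]
            emigrants = migration_rate * n if targets else 0.0
            updated[r][c] += n - emigrants
            for tr, tc in targets:
                updated[tr][tc] += emigrants / len(targets)
    return updated

# A 5x5 landscape with one uninhabitable cell and a founding population in a corner.
capacity = [[100.0] * 5 for _ in range(5)]
capacity[2][2] = 0.0
density = [[0.0] * 5 for _ in range(5)]
density[0][0] = 10.0

for _ in range(50):
    density = step(density, capacity, growth_rate=0.5, migration_rate=0.1)
```

Spatial heterogeneity enters through the `capacity` grid. Temporal heterogeneity could be modeled in the same sketch by passing a different `capacity` grid at each generation.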

To estimate the values of various parameters of interest for human populations, SPLATCHE uses a rejection-sampling Bayesian framework that requires hundreds of thousands of independent demographic and genetic simulations.
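The principle of rejection sampling can be shown with a minimal Python sketch. The simulator below is a hypothetical one-line stand-in for a full demographic and genetic simulation, and the parameter name `growth_rate`, the uniform prior, the summary statistic and the tolerance are all assumptions for illustration, not part of SPLATCHE.

```python
import random

def simulate_diversity(growth_rate, rng):
    """Toy stand-in for one demographic + genetic simulation (hypothetical).

    Returns a noisy summary statistic that depends on the parameter.
    """
    return growth_rate * 10.0 + rng.gauss(0.0, 1.0)

def rejection_sample(observed, n_simulations, tolerance, rng):
    """Keep the prior draws whose simulated statistic is close to the data."""
    accepted = []
    for _ in range(n_simulations):
        growth_rate = rng.uniform(0.0, 1.0)  # draw from a uniform prior
        simulated = simulate_diversity(growth_rate, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(growth_rate)
    return accepted

rng = random.Random(42)
posterior = rejection_sample(observed=5.0, n_simulations=20000,
                             tolerance=0.5, rng=rng)
estimate = sum(posterior) / len(posterior)  # posterior mean of the parameter
```

Because most draws are rejected, many simulations are needed to obtain a usable posterior sample. Each simulation is independent of the others, which is what makes this kind of workload straightforward to distribute across grid resources.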

To date, this unique modeling tool has been successfully used by our lab (CMPG at the University of Bern, Switzerland) to address important questions in the field of human evolutionary genetics, such as the geographic origin of modern human populations, the genetic signature of spatially expanding populations, and the genetic contacts between modern humans and Neanderthals.

In future, it will be applied to the identification of regions of the genome subject to selection (adaptation, genetic diseases), which is of vital importance for finding the genetic bases of complex diseases and for understanding the evolution of our species.

SPLATCHE depends on the population genetics program ARLEQUIN to analyse its output genetic data. ARLEQUIN was developed by our lab and can be run jointly with SPLATCHE on the grid. Intermediate output files therefore do not need to be saved, as we keep only the results analysed by ARLEQUIN.

SPLATCHE home page


Drug Discovery

Docking platform for tropical diseases: grid-enabled docking platform for in silico drug discovery

This application deploys a high-throughput virtual screening platform for in silico drug discovery against neglected diseases. The platform will be jointly developed by the SIMDAT and EGEE projects in collaboration with the SwissBioGRID initiative, the Swiss Institute of Bioinformatics (Biozentrum Basel), the INSTRUIRE regional grid in Auvergne and the CampusGRID Bonn Aachen regional grid. The first step for the EGEE project is to run several docking programs with large compound databases against malaria and dengue targets.

A computing environment called WISDOM (Wide In Silico Docking On Malaria) was set up and led to a very large-scale docking experiment during summer 2005. A detailed project description and all computation results are available from the WISDOM web site. The docking data are currently being analyzed. A WISDOM open day was held in Bonn on December 16, 2005, to launch a large initiative around the concept of virtual screening for neglected diseases using grid technology.

WISDOM home page

Healthgrid home page
