UC-IEG-VII
From EGI Knowledge Base
Use Case title: Ultrasound Computer Tomography
Short description: Ultrasound Computer Tomography is a new method of medical imaging in which an image is reconstructed by numerical techniques from the data measured by an ultrasound scanner that surrounds the object of interest.
In the basic setup, every point (transducer) of a cylindrical scanner emits in turn a pulse with a circular wave front, while all the others receive the scattered signal. Once the process has been repeated for all transducers, it is possible to reconstruct the object that produced the scattering patterns recorded along the way.
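The acquisition sequence described above can be sketched as follows. This is only an illustration of the emit/receive loop, not the scanner's control software: the function names are hypothetical, and the transducer count in the usage note is an arbitrary value, since the real number is not given here.

```python
def acquisition_pairs(n_transducers):
    """Yield (emitter, receiver) index pairs in acquisition order.

    Each transducer emits once; every other transducer records
    the scattered signal for that emission.
    """
    for emitter in range(n_transducers):
        for receiver in range(n_transducers):
            if receiver != emitter:
                yield emitter, receiver


def count_recordings(n_transducers):
    """Number of recorded signals in one full acquisition."""
    return n_transducers * (n_transducers - 1)
```

With 4 transducers (an arbitrary value), the loop produces 12 emitter/receiver pairs. The quadratic growth of this number with the transducer count is one reason why the input data set becomes large.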
The researchers from the Forschungszentrum Karlsruhe are investigating the application of this technique to the early detection of breast cancer. This means detecting tumours that are as small as possible and locating them as precisely as possible, which requires sub-millimetre resolution in three dimensions. Conventional reconstruction techniques are slow, which in practice prevents large-scale use of the method in hospitals.
Actors involved:
Requirements
The computational cost of image reconstruction is very high. A typical data set amounts to about 20 GB. Reconstructing the full data set at sub-millimetre resolution for the whole volume would take about 180 years on a single workstation.
However, such resolution is rarely needed, and in hospital cases one restricts the volume of study and limits the areas of high resolution. A normal use case then requires reconstruction times of about 30 days on a single workstation. This is evidently still too long, and our goal is to reduce it to something acceptable by exploiting Grid technologies.
The bottleneck in this application is data transmission. If we split the calculation into N jobs, we have to submit the 20 GB of data along with every job. This strategy has the advantage that each job can be restricted to a subset of the image, so the output of the individual jobs is smaller. Its drawback is that N times 20 GB of data must be sent at submission time.
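The trade-off can be made concrete with a small calculation. The sketch below uses the 20 GB and 30 days quoted above. It assumes ideal linear scaling of the compute time with N, and that the image is split into contiguous ranges of slices. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
DATASET_GB = 20               # input data shipped with every job (from the text)
SINGLE_WORKSTATION_DAYS = 30  # normal use case on one workstation (from the text)


def split_cost(n_jobs):
    """Return (total GB transferred, ideal compute days per job).

    Assumes perfect linear speed-up, which real jobs will not reach.
    """
    total_transfer_gb = n_jobs * DATASET_GB
    ideal_days_per_job = SINGLE_WORKSTATION_DAYS / n_jobs
    return total_transfer_gb, ideal_days_per_job


def image_subsets(n_slices, n_jobs):
    """Split n_slices image slices into n_jobs contiguous (start, stop) ranges."""
    base, extra = divmod(n_slices, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional slice each.
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, with N = 100 jobs the ideal compute time per job drops to 0.3 days, but 2000 GB must be transferred in total at submission time. The transfer volume grows linearly with N while the compute time per job shrinks, so there is a point beyond which adding jobs no longer pays off.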
Data are submitted as a tarball containing 3.5 million input files, which is unpacked locally into many files and directories. This will put stress on the filesystems of the local clusters. Accessing so many small files will probably also stress the local network, because NFS and even parallel distributed filesystems are very sensitive to this access pattern.
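To give a sense of scale, 20 GB spread over 3.5 million files means an average file size of under 6 kB. The sketch below computes that figure and also shows one possible way of reading the inputs sequentially from the tarball without unpacking it, using Python's standard tarfile module. This is an assumption made for illustration only, not the procedure used by the application, and the function names are hypothetical.

```python
import tarfile

DATASET_BYTES = 20e9   # about 20 GB (from the text), taking 1 GB = 1e9 bytes
N_INPUT_FILES = 3.5e6  # 3.5 million input files (from the text)


def average_file_size_bytes():
    """Average size of one input file, in bytes."""
    return DATASET_BYTES / N_INPUT_FILES


def iter_input_files(tar_path):
    """Yield (name, content) for each regular file in the tarball.

    Files are read one after another from the archive, so no
    directory tree of small files is created on the filesystem.
    """
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                handle = tar.extractfile(member)
                yield member.name, handle.read()
```

Reading the archive as one sequential stream replaces millions of small-file accesses with a single large read, which is the pattern shared filesystems handle best. Whether it fits depends on how the reconstruction code expects to access its inputs.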
Besides filesystem issues, we should consider always using SMP machines with at least 2 physical processors per motherboard for this application. The rationale is that one CPU can then handle data access while the other performs the floating-point calculation.
Besides computing requirements, there are other user requirements. One is to hide as much as possible the complexity of using a Grid infrastructure (Grid certificates, submission of large numbers of jobs, etc.). Effort also has to be put into a reliable strategy for upgrading software versions in the repositories and, of course, into fault tolerance.
