UC-VO-ILDG-search
From EGI Knowledge Base
Use Case title: Search for scientific data
Short description: An LQCD researcher is looking for an ensemble of gauge configurations that exhibit specific scientific properties.
Actors involved: An LQCD ‘researcher’
Prerequisites: Researcher is a registered member of the ILDG VO.
Steps:
- Researcher opens the ILDG Browser.
- Researcher creates an XPath query that identifies the scientific properties they are looking for (following the QCDML schema for an ensemble). The ILDG Browser includes a query constructor module for users who are not familiar with XPath.
- Researcher submits the query to the regional grid Metadata Catalogues (MDCs) and waits for results.
- Researcher browses the query results and identifies an ensemble of interest.
- Researcher uses the ILDG Browser to retrieve a list of the logical file names (LFNs) of the configurations within that ensemble.
- Researcher generates a proxy certificate and then uses the ILDG ‘getURL’ client to contact the regional grid file catalogues (FCs) and obtain a storage URL (SURL) for each of the configurations.
- Researcher uses the SURLs to download the configuration data to a local computer, for example with srmcp, globus-url-copy, or wget for the SRM, GSIFTP, and HTTP protocols, respectively. Because of the size and number of the datasets and potential bandwidth constraints, this download step may take a long time.
- Researcher performs analysis on retrieved datasets.
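The query step above can be illustrated with a minimal sketch. It assumes a toy metadata document whose element names (`catalogue`, `ensemble`, `name`, `beta`, `flavours`) are hypothetical and are not taken from the QCDML schema; it also runs the XPath locally with Python's standard library rather than against a Metadata Catalogue web service. The point is only to show how an XPath predicate narrows a set of ensembles down to those with the desired properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified metadata. The element names below are
# illustrative assumptions and do NOT follow the real QCDML schema.
SAMPLE = """
<catalogue>
  <ensemble>
    <name>ens-A</name>
    <beta>5.29</beta>
    <flavours>2</flavours>
  </ensemble>
  <ensemble>
    <name>ens-B</name>
    <beta>5.40</beta>
    <flavours>2</flavours>
  </ensemble>
  <ensemble>
    <name>ens-C</name>
    <beta>5.29</beta>
    <flavours>3</flavours>
  </ensemble>
</catalogue>
"""


def find_ensembles(xml_text, beta, flavours):
    """Return the names of ensembles whose beta and flavours match exactly."""
    root = ET.fromstring(xml_text)
    # Each [child='text'] predicate keeps only the elements that have a child
    # with that tag and exactly that text content.
    query = f".//ensemble[beta='{beta}'][flavours='{flavours}']"
    return [ensemble.findtext("name") for ensemble in root.findall(query)]


if __name__ == "__main__":
    print(find_ensembles(SAMPLE, "5.29", "2"))
```

The predicates compare text exactly, so "5.29" and "5.290" would not match each other; a real query would need to follow whatever value conventions the schema defines.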
Middleware/applications involved:
- Web service containers (hosting regional grid MDCs and FCs).
- Regional grid file catalogues (e.g. gLite File Catalogue or Globus Replica Location Service).
- File transfer services/clients – SRM-compliant, GridFTP, and so on.
- Bespoke ILDG clients:
  - ILDG Browser (http://forge.nesc.ac.uk/projects/qcdgrid/).
  - ILDG getURL client (currently in development).
Notable items:
- Different regional grids provide different data transfer protocols, as dictated by resource providers and historic decisions.
- At the time of writing, no common standards exist for interfacing with file catalogues from different middleware stacks (most notably, gLite File Catalogue and Globus RLS).
- At the time of writing, users have experienced poor bandwidth between Europe and Australasia. The level of network performance has, in some instances, made downloading ensembles of data impossible.
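Because regional grids expose different transfer protocols, a researcher holding a mixed list of SURLs has to pick a suitable client for each one. The sketch below shows one way to do that by inspecting the URL scheme, using only the protocol-to-client pairing given in the steps (srmcp for SRM, globus-url-copy for GSIFTP, wget for HTTP). The function names and the example host names are hypothetical. The code does not run any transfer and makes no assumption about each client's command-line options.

```python
from urllib.parse import urlsplit

# Protocol-to-client pairing as listed in the use case steps.
CLIENT_BY_SCHEME = {
    "srm": "srmcp",
    "gsiftp": "globus-url-copy",
    "http": "wget",
}


def client_for_surl(surl):
    """Return the name of the transfer client matching the SURL's scheme."""
    scheme = urlsplit(surl).scheme.lower()
    try:
        return CLIENT_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"no known transfer client for scheme {scheme!r}") from None


def group_by_client(surls):
    """Group SURLs by the client that would be used to download them."""
    groups = {}
    for surl in surls:
        groups.setdefault(client_for_surl(surl), []).append(surl)
    return groups


if __name__ == "__main__":
    # Hypothetical SURLs; the hosts and paths are placeholders only.
    surls = [
        "srm://se.example.org/ildg/ens-A/cfg.0001",
        "gsiftp://gridftp.example.org/ildg/ens-A/cfg.0002",
        "http://data.example.org/ildg/ens-A/cfg.0003",
    ]
    for client, items in group_by_client(surls).items():
        print(client, len(items))
```

Grouping by client first makes it easy to see up front whether every required client is installed locally before starting a long download.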
