ACM TechNews

Semantic Provenance for eScience: Managing the Deluge of Scientific Data

Internet Computing (08/08) Vol. 12, No. 4, P. 46; Sahoo, Satya S.; Sheth, Amit; Henson, Cory

In eScience, provenance information is the metadata that is essential for managing the exponentially growing volumes of scientific data produced by industrial-scale experiment protocols. The authors' semantic provenance architecture for eScience data combines expressive provenance information with domain-specific provenance ontologies and applies the result to data management. The authors write that provenance information must be expressive and software-interpretable to be used effectively for eScience data management; to achieve this, they combine provenance information with domain knowledge and an ontological underpinning. They argue for a new approach that separates the task of producing high-quality semantic provenance from the core functionality of workflow engines, and they present a "two-degrees-of-separation" strategy. Under this strategy, specialized services that reference one or more domain-specific provenance ontologies create the semantic provenance and can be embedded in scientific workflows on demand, while the workflow engine is equipped with a set of these services and a suite of domain-specific provenance ontologies as resources that can be flexibly blended into a scientific workflow according to user needs. The authors describe the semantic provenance framework for eScience as having three basic dimensions: semantic provenance annotation, domain provenance ontologies, and usage.
The first dimension consists of specialized tools that interface with a scientific workflow on demand to generate semantic provenance information; the second uses domain-specific provenance ontologies to model scientific processes, data, and agents as formally defined concepts linked by named relationships; and the third involves software agents that use reasoning tools to process the semantic provenance information and answer sophisticated domain queries.
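The three dimensions can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation: the concept names (`SamplePreparation`, `Spectrum`, and so on), the relationship names (`has_input`, `has_output`, `performed_by`), and the helpers `ProvenanceAnnotator`, `derived_from`, and `contributors` are all hypothetical. A plain dictionary hierarchy with one hand-written inference rule stands in for a formal ontology language and a real reasoner.

```python
from collections import deque

# Dimension 2 (sketch): a hypothetical domain provenance ontology. Concepts form a
# hierarchy (child -> parent, roots map to None) and named relationships carry
# domain/range constraints. All names here are illustrative, not from the paper.
CONCEPT_PARENT = {
    "Process": None, "Data": None, "Agent": None,
    "SamplePreparation": "Process", "MassSpecAnalysis": "Process",
    "RawSample": "Data", "PreparedSample": "Data", "Spectrum": "Data",
    "Researcher": "Agent",
}
RELATIONS = {  # relationship name -> (domain concept, range concept)
    "has_input": ("Process", "Data"),
    "has_output": ("Process", "Data"),
    "performed_by": ("Process", "Agent"),
}


def is_a(concept, ancestor):
    """Subsumption check: walk up the concept hierarchy toward the root."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = CONCEPT_PARENT[concept]
    return False


class ProvenanceAnnotator:
    """Dimension 1 (sketch): a service a workflow step calls on demand to record
    ontology-typed provenance, kept separate from any workflow-engine logic."""

    def __init__(self):
        self.types = {}       # individual -> concept
        self.triples = set()  # (subject, relationship, object)

    def declare(self, individual, concept):
        if concept not in CONCEPT_PARENT:
            raise ValueError(f"unknown concept {concept!r}")
        self.types[individual] = concept

    def assert_relation(self, subj, rel, obj):
        if rel not in RELATIONS:
            raise ValueError(f"unknown relationship {rel!r}")
        domain, rng = RELATIONS[rel]
        # Reject annotations that violate the ontology's domain/range constraints.
        if not (is_a(self.types[subj], domain) and is_a(self.types[obj], rng)):
            raise ValueError(f"({subj}, {rel}, {obj}) violates the ontology")
        self.triples.add((subj, rel, obj))


def derived_from(prov, item):
    """Dimension 3 (sketch): infer 'D is derived from X' whenever some process has
    output D and input X, then follow that rule transitively (breadth-first)."""
    producers, inputs = {}, {}
    for s, r, o in prov.triples:
        if r == "has_output":
            producers.setdefault(o, []).append(s)
        elif r == "has_input":
            inputs.setdefault(s, []).append(o)
    seen, queue = set(), deque([item])
    while queue:
        current = queue.popleft()
        for proc in producers.get(current, []):
            for src in inputs.get(proc, []):
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
    return seen


def contributors(prov, item):
    """Agents who performed any process that produced the item or its upstream data."""
    lineage = derived_from(prov, item) | {item}
    procs = {s for s, r, o in prov.triples if r == "has_output" and o in lineage}
    return {o for s, r, o in prov.triples if r == "performed_by" and s in procs}


# Example: provenance for a hypothetical two-step workflow, annotated step by step.
prov = ProvenanceAnnotator()
for name, concept in [
    ("sample_1", "RawSample"), ("prep_1", "SamplePreparation"),
    ("prepared_1", "PreparedSample"), ("ms_run_1", "MassSpecAnalysis"),
    ("spectrum_1", "Spectrum"), ("alice", "Researcher"), ("bob", "Researcher"),
]:
    prov.declare(name, concept)
for triple in [
    ("prep_1", "has_input", "sample_1"), ("prep_1", "has_output", "prepared_1"),
    ("prep_1", "performed_by", "alice"), ("ms_run_1", "has_input", "prepared_1"),
    ("ms_run_1", "has_output", "spectrum_1"), ("ms_run_1", "performed_by", "bob"),
]:
    prov.assert_relation(*triple)

print(sorted(derived_from(prov, "spectrum_1")))  # ['prepared_1', 'sample_1']
print(sorted(contributors(prov, "spectrum_1")))  # ['alice', 'bob']
```

Because the annotator is a separate object that a workflow step calls only when provenance is wanted, the workflow logic itself needs no changes, loosely mirroring the separation the authors propose. A production system would more typically express the ontology in a standard language such as OWL and answer queries with SPARQL and an ontology reasoner rather than hand-written rules.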



© Copyright 2008 Information, Inc. This service may be reproduced for internal distribution.