Describing Data and Workflow Provenance Using Design Patterns and Controlled Vocabularies

Smriti Rajkarnikar Tamrakar, University of Texas at El Paso


In any scientific experiment, researchers must access, compute, and analyze data to produce information useful to the scientific community. To trust such scientific research products, their users need to understand the procedure applied and the assumptions incorporated. Reusing and replicating reliable scientific data requires methods that help users understand the data's origin and derivation process, i.e., its provenance. Although several standards for representing provenance have been recommended, such as the PROV model (a W3C recommendation), they have not been widely adopted by scientific communities because of the difficulty of aligning them with community needs. Provenance usage studies in the literature suggest that adoption of these standards has not improved. In this research we propose controlled vocabularies for describing provenance data using three provenance design patterns. These provenance design patterns were used in three domains: Smart Cities, Water Modeling, and Biodiversity Modeling. We evaluate the proposed vocabulary with users of an interdisciplinary, international, USDA-funded water modeling project. The results show that, in general, provenance is important for understanding and trusting a final product. This work provides a building block for creating and evaluating complex provenance design patterns that can be embedded in systems that manipulate data and execute scientific workflows.

Subject Area

Information science|Computer science

Recommended Citation

Rajkarnikar Tamrakar, Smriti, "Describing Data and Workflow Provenance Using Design Patterns and Controlled Vocabularies" (2017). ETD Collection for University of Texas, El Paso. AAI10284128.