Technical Report: UTEP-CS-13-24


The process of collecting and transforming data can span multiple platforms, both physical and digital. Capturing provenance that consistently reflects the actions involved in such a process can be difficult and may require multiple tools. This paper presents an approach to capturing data provenance that is based on formal ontologies and software engineering practices. The approach starts by creating ontologies about data collection and transformation processes. These ontologies, referred to as Workflow-Driven Ontologies, establish a consistent view of the process that is independent of the platform used to carry it out. Next, software modules are generated that target the specific types of platforms on which data processes are implemented, so that data provenance can be captured as the process is carried out. The paper describes the software architecture of the approach and discusses how software modules are generated by leveraging the structure and terminology of Workflow-Driven Ontologies. The result is the creation and population of knowledge bases that capture the processes used to collect and transform data, as well as provenance about how individual datasets were produced.
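The idea of generating a capture module from an ontology can be illustrated with a minimal Python sketch. It is not the report's implementation: the `WorkflowOntology` and `KnowledgeBase` classes, the `make_capture_module` function, the predicate names (`isA`, `wasGeneratedBy`, `used`), and the example method and data names are all hypothetical, chosen only to show how ontology terms might drive what gets recorded while a process step runs.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class WorkflowOntology:
    """Hypothetical, simplified stand-in for a Workflow-Driven Ontology.

    Maps each method name to the data types it consumes and produces.
    """
    methods: Dict[str, Dict[str, str]]


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""
    triples: List[Triple] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))


def make_capture_module(ontology: WorkflowOntology, kb: KnowledgeBase) -> Callable:
    """Build a decorator factory whose vocabulary comes from the ontology."""
    counter = {"n": 0}  # used to mint unique identifiers for outputs

    def capture(method_name: str) -> Callable:
        spec = ontology.methods[method_name]  # KeyError if the method is unknown

        def decorator(func: Callable) -> Callable:
            @wraps(func)
            def wrapper(input_id: str, value):
                result = func(value)
                counter["n"] += 1
                output_id = f"{spec['output']}_{counter['n']}"
                # Record provenance using the ontology's terms (assumed predicates).
                kb.add(input_id, "isA", spec["input"])
                kb.add(output_id, "isA", spec["output"])
                kb.add(output_id, "wasGeneratedBy", method_name)
                kb.add(method_name, "used", input_id)
                return output_id, result

            return wrapper

        return decorator

    return capture


# Example usage with made-up terms.
ontology = WorkflowOntology(
    methods={
        "CalibrateReadings": {"input": "RawReadings", "output": "CalibratedReadings"}
    }
)
kb = KnowledgeBase()
capture = make_capture_module(ontology, kb)


@capture("CalibrateReadings")
def calibrate(readings):
    # Placeholder transformation; a real step would apply a calibration model.
    return [r * 2 for r in readings]


output_id, calibrated = calibrate("RawReadings_field01", [1, 2, 3])
```

In this sketch the transformation code stays unaware of provenance: the wrapper derived from the ontology records what was used and what was produced, which is one way to keep the recorded view independent of the platform that runs the step.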