PhD Defense by David Marchant

David Marchant will defend his PhD at 14:00 on July 28th in Auditorium M on Blegdamsvej.

Title: Herding Cats with MEOW: Using File Events to Automate the Creation and Maintenance of Scientific Workflows

Abstract: To manage complex scientific data processing, the concept of workflows has been adapted from the world of business. This is not a recent innovation, and a vast range of tools has arisen, to the point that all manner of specialised needs and use-cases can be accommodated by one tool or another. However, one requirement recently identified as unmet by the current crop of workflow management tools is adaptability at runtime. This may be needed for a variety of reasons, such as the exploratory nature of scientific workflows, error handling, or human-in-the-loop interactions.

This lack of dynamic support is caused by most workflow systems being built in a static, top-down paradigm where all of the constituent parts of a workflow are identified and scheduled before any processing takes place. Therefore, to meet the need to be dynamic, a new, bottom-up paradigm of scientific analysis is proposed. This new system is known as Managing Event Oriented Workflows (MEOW), and uses file system events to schedule scientific analysis on a continuous basis. An implementation is provided within the Python package mig_meow. This provides definitions for a variety of MEOW constructs, as well as widgets for use within Jupyter Notebooks. The aim is to make the system accessible to new users managing their analysis, and to provide provenance information about whatever processing has occurred. MEOW is designed to work primarily with the Minimum intrusion Grid system to enable shareable, repeatable, and completely dynamic scientific analysis. However, it is also capable of working in a reduced manner as an independent system in its own right.
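
As a rough illustration of the bottom-up idea described above, the sketch below pairs file-name patterns with processing functions and runs a function whenever a matching file appears or changes. It is not the mig_meow API: the names `scan`, `run_until_quiet`, `to_upper`, and `count_chars` are hypothetical, and simple polling of modification times stands in for real file system event monitoring. It uses only the Python standard library.

```python
import fnmatch
import tempfile
from pathlib import Path


def scan(root, seen):
    """Return files under root that are new or modified since the last scan."""
    events = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            mtime = path.stat().st_mtime_ns
            if seen.get(path) != mtime:
                seen[path] = mtime
                events.append(path)
    return events


def run_until_quiet(root, rules, max_rounds=10):
    """Run matching recipes on file events until a scan finds nothing new.

    rules is a list of (glob pattern, recipe) pairs; both are illustrative
    stand-ins, not constructs taken from mig_meow.
    """
    seen = {}
    log = []
    for _ in range(max_rounds):
        events = scan(root, seen)
        if not events:
            break
        for path in events:
            rel = path.relative_to(root).as_posix()
            for pattern, recipe in rules:
                if fnmatch.fnmatch(rel, pattern):
                    recipe(path)  # output files become events in the next round
                    log.append((pattern, rel))
    return log


def to_upper(path):
    """Hypothetical recipe: write an upper-cased copy into clean/."""
    out = path.parent.parent / "clean" / path.name
    out.parent.mkdir(exist_ok=True)
    out.write_text(path.read_text().upper())


def count_chars(path):
    """Hypothetical recipe: write the character count into stats/."""
    out = path.parent.parent / "stats" / (path.stem + ".count")
    out.parent.mkdir(exist_ok=True)
    out.write_text(str(len(path.read_text())))


def demo():
    """Drop one file into raw/ and let the rules chain from there."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "raw").mkdir()
        (root / "raw" / "sample.txt").write_text("meow")
        rules = [("raw/*.txt", to_upper), ("clean/*.txt", count_chars)]
        log = run_until_quiet(root, rules)
        result = (root / "stats" / "sample.count").read_text()
    return log, result
```

Calling `demo()` shows that no workflow is declared up front: the second step runs only because the first step's output file matched another rule. New rules could be added to the list between scans, which is the kind of runtime adaptability the abstract refers to.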

As well as MEOW, a variety of supplementary teaching materials are presented to make learning easier for new users. Supporting work, such as an investigation into converting between static and dynamic workflows, is also presented, as is a system for integrating remote cloud resources into existing scientific applications.

This thesis supports the scientific work of researchers by providing a tool for automating large amounts of scientific processing in an adaptable manner.

Supervised by Brian Vinter, Kenneth Skovhede, and James Avery.