
Executing Workflows

Besides offering experimental data, the DAE platform also provides access to reference algorithms. These algorithms are provided as web-services and can be executed without the need to install or compile specific document-analysis software.

CAVEAT: Please be aware that the following instructions apply to the standalone VirtualBox environment. In order to operate on the live DAE server, some modifications need to be taken into account.

List of Available Algorithms

The algorithms available on this demo platform are listed here. Those available on the live DAE platform are here.

Clicking on the "Algorithms" item on the left-hand side of this window lists only those algorithms that have had successful executions. Clicking on the sub-item "My Runs" gives access to those actually executed by the currently connected user.

Combining Algorithms into Workflows

The main advantage of exposing algorithms as web-services is to provide an open and flexible way to combine them into workflows, without needing to program or adapt to specific hardware or software constraints (cf. "An Open Architecture for End-to-End Document Analysis Benchmarking", Bart Lamiroy and Daniel Lopresti, 11th International Conference on Document Analysis and Recognition - ICDAR 2011 (2011) 42-47).

Step 1. Using Taverna to Execute a Simplified Workflow

This demonstration platform comes with a pre-installed version of Taverna, which is used to combine web-services into more complex workflows. In this section we open a predefined workflow of DAE web-services, execute it, and retrieve its results.

  1. To start Taverna, double-click on the Taverna Workbench icon on the desktop.
  2. Once started, the Taverna Workbench interface should look like this:

    Taverna Screen

  3. Select File -> Recent Workflows -> /home/dae/Taverna Workflows/Ocrad-Tesseract_comparison_localhost.t2flow as shown below
    Taverna Selection
  4. The final result should look like this:
    Final Taverna Window

The workflow thus opened (and displayed in the right hand panel) takes a predefined image, username and password, and feeds them as input to two separate sub-workflows: one (on the left) first converting the image to the pgm format (the first green box) and passing the result to the Ocrad OCR engine (second green box); another (on the right) passing the image to the Tesseract OCR engine. The results of both executions are then merged into the final output blue box, at the bottom.
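
The data flow described above can be sketched in a few lines of Python. This is only an illustration of the workflow's shape, not the way Taverna or the DAE services are implemented: the three service functions are hypothetical stand-ins that merely tag their input, and the image name "page.png" is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the remote DAE web-services (the green boxes).
# They only tag their input so that the data flow stays visible.
def convert_to_pgm(image, username, password):
    return f"pgm({image})"

def run_ocrad(pgm_image, username, password):
    return f"ocrad-result-for-{pgm_image}"

def run_tesseract(image, username, password):
    return f"tesseract-result-for-{image}"

def ocrad_branch(image, username, password):
    # Left branch: convert to pgm first, then run Ocrad on the result.
    pgm = convert_to_pgm(image, username, password)
    return run_ocrad(pgm, username, password)

def compare_ocr(image, username, password):
    # The two branches are independent, so they may run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(ocrad_branch, image, username, password)
        right = pool.submit(run_tesseract, image, username, password)
        # Merge step (the blue output box): one entry per branch.
        return [left.result(), right.result()]

results = compare_ocr("page.png", "dae", "dae")
print(results)
```

The list returned by compare_ocr plays the role of the merged output: one entry for the Ocrad path and one for the Tesseract path.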

Note:

The purple boxes are visually annoying but necessary data encapsulation and decapsulation steps; they can be ignored here for the sake of simplicity. More detailed and advanced information is available here, but can safely be left until later.

In order to execute the workflow, one only needs to click on the green "Run the Current Workflow" icon in the top banner. The Taverna interface then changes its configuration, and will progressively grey out the various boxes of the workflow as and when the corresponding stages are executed successfully.

After successful execution, the bottom pane should look approximately like this:

Execution Result

It shows a list of two results on the left side (one for each execution path: Ocrad and Tesseract). Clicking on either of them displays a URL on the right-hand side, pointing to the results of the execution. Copying and pasting these URLs into a web browser lets you view their contents.
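
Instead of a web browser, the result URLs can also be read from a short script. The following is a minimal sketch using only the Python standard library. It assumes that the results are plain text, and the URLs in result_urls are placeholders (inline "data:" URLs) to be replaced with the ones actually shown by Taverna.

```python
from urllib.request import urlopen

def fetch_result(url, timeout=10):
    """Download and decode the text behind one result URL."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not announce a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Placeholder URLs (assumption): replace them with the two URLs
# displayed in Taverna's result pane.
result_urls = [
    "data:text/plain;charset=utf-8,placeholder%20Ocrad%20output",
    "data:text/plain;charset=utf-8,placeholder%20Tesseract%20output",
]

for url in result_urls:
    print(fetch_result(url))
```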

Step 2. Understanding what Just Happened

The actions in Step 1 consisted of:

  1. opening a workflow;
  2. executing it.

Remark:

Incidentally, the web-services ended up being executed on the local demo host. However, the Taverna orchestrator was unaware of this, and from its viewpoint the services could have been hosted anywhere on the Internet.

It is important to note that the execution of this workflow was merely orchestrated by Taverna: the blue and purple boxes (i.e. basic data manipulation, encapsulation and decapsulation) were executed locally by the tool itself; the green boxes, i.e. the actual image processing web-services, were invoked in black-box mode over the network, beyond the direct control of Taverna.

This means that

  1. Taverna needs a way to find out where on the network the web-services are located;
  2. Taverna needs a way to identify these services' input and output parameters and types;
  3. somewhere on the network, web-services need to be listening for incoming connections and providing computational resources to handle requests.

2.1 Defining Service Repositories in Taverna

Note:

If you're running this demo platform from a host that is not connected to the Internet, you will most likely have received an error message when Taverna started, and the dae.cse.lehigh.edu URL may be missing.

This has no impact on the rest of the demo.

Taverna's service panel, on the top left side of the window, lists the URLs of all web-service providers it currently has access to. The following screenshot shows that the Demo version points to services located on the live DAE server (http://dae.cse.lehigh.edu/DAE/services/soap?wsdl) as well as to those hosted on the Demo host (http://localhost/localDAE/services/soap?wsdl).

Taverna Service Pane

Expanding the lines of each server will provide the list of all services hosted at that location.

New service providers can be added by clicking on the "Import new services" button, and by providing the appropriate URL. This falls under advanced use of the platform, and will be explained in detail here.

2.2 Accessing Service Input and Output Parameters

The input and output parameters and types are provided directly by the service providers' WSDL interfaces and are therefore imported transparently. They can be made visible in the Taverna interface by switching to the "Design" view and selecting the detail button in the Workflow Diagram pane. Below is a close-up of what should be visible, showing all input and output names and types.

Detailed Workflow Crop
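
To give an idea of how input and output names and types can be read from a WSDL description, here is a minimal sketch based on Python's standard XML parser. It is not how Taverna does it. The embedded WSDL is a hypothetical, heavily trimmed example: the operation name "runOcr", the port type "OcrPort" and the output part "result_url" are assumptions, and only the input names "page_image", "username" and "password" are borrowed from the demo workflows.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, trimmed WSDL 1.1 document (not the real DAE interface).
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example"
             targetNamespace="urn:example">
  <message name="OcrRequest">
    <part name="page_image" type="xsd:string"/>
    <part name="username" type="xsd:string"/>
    <part name="password" type="xsd:string"/>
  </message>
  <message name="OcrResponse">
    <part name="result_url" type="xsd:string"/>
  </message>
  <portType name="OcrPort">
    <operation name="runOcr">
      <input message="tns:OcrRequest"/>
      <output message="tns:OcrResponse"/>
    </operation>
  </portType>
</definitions>
"""

def list_operations(wsdl_text):
    """Return {operation: {"input": [(name, type)], "output": [(name, type)]}}."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to its list of (part name, part type) pairs.
    messages = {
        msg.get("name"): [
            (part.get("name"), part.get("type") or part.get("element"))
            for part in msg.findall("wsdl:part", WSDL_NS)
        ]
        for msg in root.findall("wsdl:message", WSDL_NS)
    }
    operations = {}
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        ports = {}
        for direction in ("input", "output"):
            ref = op.find(f"wsdl:{direction}", WSDL_NS)
            # Strip the namespace prefix from references such as "tns:OcrRequest".
            name = ref.get("message").split(":")[-1] if ref is not None else None
            ports[direction] = messages.get(name, [])
        operations[op.get("name")] = ports
    return operations

for name, ports in list_operations(SAMPLE_WSDL).items():
    print(name, ports)
```

To inspect a real interface, download the text behind one of the WSDL URLs listed in Taverna's service panel and pass it to list_operations. Note that real WSDL files often refer to schema elements rather than simple types, which this sketch reports by name but does not resolve.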

2.3 Providing Web Services

This falls under advanced use of the platform, and will be explained in detail here.

Step 3. Going a Bit Further ...

Both previous steps were only intended to provide a quick and general overview of how to run workflows. To keep things simple, they made some assumptions that we now need to lift in order to avoid misconceptions.

  • The example workflow has hard-coded username and password values. While this is an acceptable choice for a demo workflow on a sandboxed standalone server such as this demonstration environment, it obviously becomes impossible to maintain in a live environment.
    Most often, username and password will be part of the workflow's "input ports".

    Example:

    1. Open a new workflow, by selecting File -> Recent Workflows -> /home/dae/Taverna Workflows/Tesseract_localhost.t2flow. This is actually one of the sub-workflows of the previous one.
    2. Click on the green "Run the Current Workflow" icon. The following window should appear: Input Ports Window
    3. Select in turn each of the three tabs corresponding to the input ports "page_image", "password" and "username", click the "New value" button, and provide the appropriate value.
    4. Click on the "Run workflow" button at the bottom of the window and observe the same execution behavior as in the previous example.
  • The example workflows were stored locally on the demonstration environment. More elaborate workflows are available through the MyExperiment repository. They can be directly accessed via the Taverna interface, by clicking on the "MyExperiment" button, and searching for the "DAE" group.
    Note that most of the workflows on MyExperiment interact with the live DAE server. Consequently, the username and password are not dae/dae, but your personal credentials.
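
The first point above, supplying the credentials as inputs rather than hard-coding them, can be illustrated outside Taverna with a small Python sketch. It simply collects one value per input port from environment variables. The variable names (DAE_PAGE_IMAGE, DAE_PASSWORD, DAE_USERNAME) are assumptions made for this example and are not defined by the DAE platform or by Taverna.

```python
import os

# Input port names as used by the Tesseract_localhost workflow.
INPUT_PORTS = ("page_image", "password", "username")

def collect_inputs(env=os.environ):
    """Build one value per input port from DAE_* variables (hypothetical names)."""
    values = {}
    missing = []
    for port in INPUT_PORTS:
        key = "DAE_" + port.upper()
        if key in env:
            values[port] = env[key]
        else:
            missing.append(key)
    if missing:
        # Fail early instead of silently falling back to hard-coded values.
        raise KeyError("missing input values: " + ", ".join(missing))
    return values
```

Keeping the credentials out of the workflow definition means the same workflow can be shared, for instance on MyExperiment, without exposing anyone's personal account.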