OpenRefine is Java application with a browser based user-interface that provides a range of tools for cleaning tabular datasets. OpenRefine has been provided as part of a TM351 Data Management and Analysis virtual computing environment since 2016.
OpenRefine can be added to a container-builder environment and exposed via shortcuts in a JupyterLab or Jupyter notebook environment by making defined it as a custom web app.
As well as the web app specification, the application itself needs to be downloaded and installed during the build stage, and copied over to the deployed container via an output block. (Documentation for recommended output block weight ranges is available via the container-builder documentation.) Operating system packages required to run the application need to be installed in the deploy stage. Several utility environment path variables are also set.
An initialisation script that is run at startup time to ensure any required directories, etc., are available is also provided in an external file.
# The Open Refine application is pre-built and only needs Java to be available in the deploy stage.
# Required Java packages are thus installed as apt.deploy packages.
# Any packages specifically required in the build stage should be install as apt.build packages.
# wget is required in the build stage, but it is installed by default.
packages:
apt:
deploy:
- openjdk-17-jre
- openjdk-17-jre-headless
# The paths are referenced by the application, but we can also use them to specify
# the version of the application downloaded, and its target path.
environment:
- name: OPENREFINE_VERSION
value: 3.8.0
- name: OPENREFINE_PATH
value: "/var/openrefine"
# Add any initialisation scripts that need to run on startup
content:
- source: ./webapp_init
target: /etc/ou_webapp
overwrite: always
scripts:
# The pre-built Open Refine application is downloaded and unarchived during a build step.
# It would also be possible to build the application from source in this step, which might
# also require additional apt.build packages to be installed.
- stage: build
commands:
# Download the archived application
- wget -q -O openrefine-\${OPENREFINE_VERSION}.tar.gz https://github.com/OpenRefine/OpenRefine/releases/download/\${OPENREFINE_VERSION}/openrefine-linux-\${OPENREFINE_VERSION}.tar.gz
# Unarchive it
- tar xzf openrefine-\${OPENREFINE_VERSION}.tar.gz
# Copy it to a known path
- mv openrefine-\${OPENREFINE_VERSION} $OPENREFINE_PATH
output_blocks:
deploy:
# The COPY instruction copies from the "build" container to the "deploy" container.
# We copy the OpenRefine application from the original download path in the build stage
# to the target path we require in the deployed containerd.
# Could we also use the environment variable here?
- block: COPY --from=base /var/openrefine /var/openrefine
# We use a weight in the range: 2001 - 3000 Recommended: User blocks
# The weight is reminiscent of the OpenRefine default port (3333).
# Docs on weight ranges: https://docs.ocl.open.ac.uk/container-builder/v3/developer/output_block_weights.html
weight: 2333
content:
# This requires that we have a copy of the SVG icon available at the specified source path
# We could alternatively download the icon in a build step
# and copy it over to the deploy container via an output block.
- source: ./icons/openrefine.svg
target: /var/ou/icons/openrefine.svg
overwrite: always
With the application installed, we now need to provide a web_app specification which defines how to call it. The specification requires the path that the application should be published to when proxied in the VCE and the command required to call it. Additional metadata defines a timeout period that is used to raise an error if an application is slow to start, whether the application should be loaded in a new tab, and a description of the name and the path to any icon (icon_path) used to refer to the application when launching the application.
Web apps are
web_apps:
- path: openrefine
options:
command:
- /var/openrefine/refine
- -i
- "127.0.0.1"
- -p
- "{port}"
- -d
- /home/ou/TM351-24J/openrefine
- -H
- "*"
- -x
- refine.display.new.version.notice=false
timeout: 120
new_browser_tab: true
launcher_entry:
title: OpenRefine
icon_path: /var/ou/icons/openrefine.svg
OpenRefine requires that a specified directory is available when it starts up, so we need to include a startup script that ensures that the required directory is available and has appropriate permissions set.
scripts:
- stage: startup
name: 475-initialising-openrefine-webapp
commands:
- sudo LOCAL_HOME=/home/$USER/${MODULE_CODE}-${MODULE_PRESENTATION} /etc/ou_webapp/openrefine_init.sh
#! /bin/bash
# Local path: ./webapp_init/openrefine_init.sh
# File location: /etc/ou_webapp/openrefine_init.sh
# The script should be run via sudo, added elsewhere:
# echo "ou ALL=(ALL:ALL) NOPASSWD: /etc/ou_webapp/openrefine_init.sh" >> /etc/sudoers
mkdir -p $LOCAL_HOME/openrefine
chown ou:users $LOCAL_HOME/openrefine