OU-XML Health Checks and Quality Audits
As a structured document, OU-XML contains a wide variety of elements that are countable. As such, it is trivial to generate reports that describe how many elements of a particular kind are contained within a particular document. For example, we might count the number of learning outcomes defined in a particular unit, or the number of media assets it includes. These counts can feed into quality checks for the document. For example, we might require that the number of outcomes is non-zero (i.e. at least one learning outcome is described at a particular level within the OU-XML document); or, where media items are included, we might flag the unit as one that may have rights clearance considerations, or that may raise accessibility issues.
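As a minimal sketch of this kind of count-based check, the following uses Python's standard library XML parser on a tiny, made-up fragment. The `Figure` and `MediaContent` element names appear in the xpath values reported later in this section; the `LearningOutcome` element name, the sample document and the `count_elements()` and `health_flags()` helpers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified OU-XML fragment (illustration only).
SAMPLE = """<Item>
  <Unit>
    <LearningOutcomes>
      <LearningOutcome>Describe X.</LearningOutcome>
      <LearningOutcome>Explain Y.</LearningOutcome>
    </LearningOutcomes>
    <Session>
      <MediaContent type="audio" src="clip.mp3"/>
      <Figure><Image src="fig1.jpg"/></Figure>
    </Session>
  </Unit>
</Item>"""


def count_elements(xml_text):
    """Return a Counter mapping each element tag to its number of occurrences."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def health_flags(counts):
    """Turn raw element counts into simple pass/review flags."""
    return {
        # At least one learning outcome should be defined.
        "has_learning_outcomes": counts["LearningOutcome"] > 0,
        # Any media may need rights clearance or accessibility review.
        "needs_media_review": counts["MediaContent"] + counts["Figure"] > 0,
    }


counts = count_elements(SAMPLE)
flags = health_flags(counts)
print(counts["LearningOutcome"], flags)
```

A `Counter` returns zero for tags that never occur, so the same flags can be computed for documents that contain no outcomes or media at all.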
Furthermore, several structural elements are defined in such a way that they encourage, if not actually require, the inclusion of metadata or supplementary information. For example, an activity may carry an estimated time requirement, and a media item may carry a transcript or long description.
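One way to test for this sort of supplementary information is to look for parent elements that lack a populated child element. The sketch below assumes element names `Activity`, `Timing`, `Transcript` and `Description`, along with `id` and `src` attributes; these names, the sample fragment and the `missing_child()` helper are illustrative assumptions rather than a definitive statement of the OU-XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are assumptions.
SAMPLE = """<Session>
  <Activity id="act1"><Heading>Activity 1</Heading><Timing>Allow 10 minutes</Timing></Activity>
  <Activity id="act2"><Heading>Activity 2</Heading></Activity>
  <MediaContent type="video" src="v1.mp4"><Transcript><Paragraph>Hello.</Paragraph></Transcript></MediaContent>
  <MediaContent type="audio" src="a1.mp3"/>
</Session>"""


def has_content(el):
    """True if the element exists and contains some non-whitespace text."""
    return el is not None and "".join(el.itertext()).strip() != ""


def missing_child(root, parent_tag, child_tags):
    """Yield parent elements with none of the named direct children populated."""
    for parent in root.iter(parent_tag):
        if not any(has_content(parent.find(tag)) for tag in child_tags):
            yield parent


root = ET.fromstring(SAMPLE)
untimed = [a.get("id") for a in missing_child(root, "Activity", ["Timing"])]
undescribed = [
    m.get("src")
    for m in missing_child(root, "MediaContent", ["Transcript", "Description"])
]
print(untimed, undescribed)
```

This only checks that a field is populated; it says nothing about whether the timing estimate or transcript is actually any good.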
Finally, the OU-XML document may include references to assets that are themselves “testable”: web links to third party resources that can be link checked, for example.
In this section, we will review several ways in which we can produce quick summary reports that review the structural content of an OU-XML document and hint at possible health checks and quality audit data we might derive from it.
As well as running reports on individual OU-XML documents, we can also run reports over a set of OU-XML documents, or over assets retrieved from them. This means we could generate reports on the health of the OpenLearn OU-XML corpus as a whole, for example, or audit OU-XML documents from different faculties to see whether units from one faculty have a different asset mix to units from another.
Database Setup#
Set up a database connection to our OpenLearn curriculum asset database:
from sqlite_utils import Database
import pandas as pd
# Open database connection
dbname = "all_openlean_xml.db"
db = Database(dbname)
OU-XML Media Item Quality Checks#
Where media items are defined in an OU-XML document, they should include information supplementary to the resource reference such as a long description or transcript.
It is trivial to check that the corresponding fields are populated, even if we cannot so easily check the substance of the content of those fields.
A Source reference attribute is available that may be useful for rights tracking purposes etc.
Checking Figure Asset Definitions#
When image assets are used, there is often a requirement to provide a written description of each image that a screen reader can use to give visually impaired readers an alternative way of experiencing the asset.
By scraping individual figure items into a database table, we can trivially generate reports on the extent to which alternative descriptions are provided, and identify units where they may be missing.
For example, we can query all the figure assets referenced across all OpenLearn units to identify figures where there is no caption, description or alternative text given:
pd.read_sql("""SELECT xpath, code, name, type, caption, alt, description
FROM figures
WHERE caption IS NULL AND alt IS NULL AND description IS NULL;""",
con=db.conn)
| | xpath | code | name | type | caption | alt | description |
---|---|---|---|---|---|---|---|
0 | /MediaContent/Figure | L101 | A brief history of communication: hieroglyphic... | jpg | None | None | None |
1 | /Figure | H807 | Accessibility of eLearning | jpg | None | None | None |
2 | /Figure | | Addysg gynhwysol: deall yr hyn a olygwn (Cymru) | tif | None | None | None |
3 | /Figure | | Addysg gynhwysol: deall yr hyn a olygwn (Cymru) | tif | None | None | None |
4 | /Figure | | Addysg gynhwysol: deall yr hyn a olygwn (Cymru) | tif | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... |
2005 | /Introduction/MediaContent/Figure | | Розуміння вашої галузі | jpg | None | None | None |
2006 | /Introduction/MediaContent/Figure | | Розуміння вашої галузі | jpg | None | None | None |
2007 | /MediaContent/Figure | | Розуміння вашої галузі | jpg | None | None | None |
2008 | /Activity/Question/MediaContent/Figure | K314 | Розуміння проблем психічного здоров'я | jpg | None | None | None |
2009 | /Activity/Multipart/Part/Question/MediaContent... | K314 | Розуміння проблем психічного здоров'я | jpg | None | None | None |
2010 rows × 7 columns
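To identify the units where descriptions are most often missing, the same condition can be aggregated per unit. The following is a minimal sketch using the standard library `sqlite3` module and a small in-memory table that mimics the `figures` columns used in the query above; the sample rows and the `figure_description_coverage()` function name are made up for illustration. Against the real database, the `db.conn` connection could presumably be passed in instead.

```python
import sqlite3


def figure_description_coverage(conn):
    """Per unit: figure count and number lacking caption, alt and description."""
    sql = """
        SELECT code, name,
               COUNT(*) AS figures,
               SUM(caption IS NULL AND alt IS NULL AND description IS NULL)
                   AS undescribed
        FROM figures
        GROUP BY code, name
        HAVING SUM(caption IS NULL AND alt IS NULL AND description IS NULL) > 0
        ORDER BY undescribed DESC, name
    """
    return conn.execute(sql).fetchall()


# Made-up demo data with the same columns as the figures table queried above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE figures (xpath TEXT, code TEXT, name TEXT, type TEXT, "
    "caption TEXT, alt TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO figures VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("/Figure", "X100", "Unit A", "jpg", None, None, None),
        ("/Figure", "X100", "Unit A", "jpg", "Figure 1", None, None),
        ("/Figure", "X200", "Unit B", "png", None, "Alt text", None),
    ],
)
rows = figure_description_coverage(conn)
print(rows)
```

Like the original query, this only tests for NULL values, so fields holding empty strings would be counted as populated.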
Checking Audio Asset Definitions#
Ideally, every audio asset should have a transcript. We can generate summary reports from the database to identify audio items that do not have an associated transcript.
For example:
pd.read_sql("""SELECT xpath, code, name, type, caption, transcript, description
FROM media WHERE type='audio' AND transcript IS NULL""",
con=db.conn)
| | xpath | code | name | type | caption | transcript | description |
---|---|---|---|---|---|---|---|
0 | /Activity/Question/MediaContent | L313 | Advanced German: Language, culture and history | audio | None | None | None |
1 | /Activity/Question/MediaContent | L313 | Advanced German: Language, culture and history | audio | None | None | None |
2 | /Activity/Question/MediaContent | L313 | Advanced German: Language, culture and history | audio | None | None | None |
3 | /Activity/Question/MediaContent | L313 | Advanced German: Language, culture and history | audio | None | None | None |
4 | /Activity/Question/MediaContent | L313 | Advanced German: Language, culture and history | audio | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... |
112 | /Activity/Question/MediaContent | S110 | Using numbers and handling data | audio | None | None | None |
113 | /Activity/Question/MediaContent | S110 | Using numbers and handling data | audio | None | None | None |
114 | /Activity/Question/MediaContent | S110 | Using numbers and handling data | audio | None | None | None |
115 | /Activity/Question/MediaContent | S110 | Using numbers and handling data | audio | None | None | None |
116 | /MediaContent | S111 | What are waves? | audio | Audio 1 The song of a humpback whale. (1:09 min) | None | None |
117 rows × 7 columns
Lots of those appear to be audio files related to activities in language units, so perhaps the provision of a transcript would defeat the purpose of the activity?
The “whale song” item is interesting in its lack of a transcript or description: there is no transcript as such, but the sound could presumably still be described?
Checking Video Asset Definitions#
In the same way that we checked audio asset descriptions for the presence of transcripts, etc., we can also check for the presence of video transcripts.
Once again, reporting the xpath context reference for the asset might provide a clue as to why a transcript is missing in any particular case.
pd.read_sql("""SELECT xpath, code, name, type, caption, transcript, description
FROM media WHERE type='video' AND transcript IS NULL""",
con=db.conn)
| | xpath | code | name | type | caption | transcript | description |
---|---|---|---|---|---|---|---|
0 | /Activity/Question/MediaContent | L310 | Advanced French: At the science museum in Paris | video | None | None | None |
1 | /MediaContent | | An introduction to geology | video | Video 1.5 Magnetic reversal stripes | None | None |
2 | /MediaContent | S182 | Aquatic mammals | video | Video 2 Humpback whale feeding | None | None |
3 | /Activity/Question/MediaContent | S182 | Aquatic mammals | video | Video 3 Hunting behaviour of dolphins | None | None |
4 | /MediaContent | | Being an OU student | video | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... |
97 | /Activity/Question/MediaContent | U214 | Англійська у світі сьогодні | video | None | None | None |
98 | /Activity/Question/MediaContent | U214 | Англійська у світі сьогодні | video | None | None | None |
99 | /Activity/Question/MediaContent | U214 | Англійська у світі сьогодні | video | None | None | None |
100 | /Activity/Question/MediaContent | U214 | Англійська у світі сьогодні | video | None | None | None |
101 | /Activity/Question/MediaContent | U214 | Англійська у світі сьогодні | video | None | None | None |
102 rows × 7 columns
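Since the xpath context may hint at why a transcript is missing, it can be useful to summarise the missing transcripts by media type and context. The following sketch does this over a small in-memory table that mimics the `media` columns used above; the demo rows and the `missing_transcripts_by_context()` function name are invented for illustration.

```python
import sqlite3


def missing_transcripts_by_context(conn):
    """Count audio/video items without a transcript, grouped by type and xpath."""
    sql = """
        SELECT type, xpath, COUNT(*) AS items
        FROM media
        WHERE type IN ('audio', 'video') AND transcript IS NULL
        GROUP BY type, xpath
        ORDER BY type, items DESC
    """
    return conn.execute(sql).fetchall()


# Made-up demo data with the same columns as the media table queried above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (xpath TEXT, code TEXT, name TEXT, type TEXT, "
    "caption TEXT, transcript TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("/Activity/Question/MediaContent", "L000", "Unit L", "audio", None, None, None),
        ("/Activity/Question/MediaContent", "L000", "Unit L", "audio", None, None, None),
        ("/MediaContent", "S000", "Unit S", "audio", "Audio 1", None, None),
        ("/MediaContent", "S000", "Unit S", "video", None, "Some transcript", None),
        ("/Activity/Question/MediaContent", "L000", "Unit L", "video", None, None, None),
    ],
)
summary = missing_transcripts_by_context(conn)
for row in summary:
    print(row)
```

Items embedded in an activity question might then be reviewed separately from free-standing media items, where a missing transcript is harder to justify.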
Equation Quality Checks#
The equation object definition supports captions, alternative descriptions etc. and as such, completion of these tags is checkable.
Link Checking#
Web links in an OU-XML document may be relative links to content that is referenced relative to the location of the document itself, or absolute ones.
De-referencing the relative links may require knowledge of the local context, in which case in situ testing of materials derived from the source document may be more appropriate.
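A first step is to separate the two kinds of link so that each can be tested in the appropriate way. The sketch below assumes that links are represented as `a` elements with an `href` attribute; that assumption, the sample fragment and the `split_links()` helper are for illustration only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical fragment: the <a href="..."> link markup is an assumption.
SAMPLE = """<Session>
  <Paragraph>See <a href="https://www.example.org/report">this report</a> and
  <a href="../assets/notes.pdf">these notes</a>.</Paragraph>
</Session>"""


def split_links(xml_text):
    """Return (absolute, relative) lists of href values found in the document."""
    root = ET.fromstring(xml_text)
    absolute, relative = [], []
    for link in root.iter("a"):
        href = link.get("href")
        if not href:
            continue
        parts = urlsplit(href)
        if parts.scheme in ("http", "https") and parts.netloc:
            # Fully qualified web link: can be tested at any time.
            absolute.append(href)
        elif not parts.scheme and not parts.netloc:
            # No scheme or host: must be resolved against the local context.
            relative.append(href)
        # Other schemes (e.g. mailto:) are ignored in this sketch.
    return absolute, relative


absolute, relative = split_links(SAMPLE)
print(absolute, relative)
```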
Absolute links can be tested at any time. At a minimum, we can check that the referenced location resolves correctly and returns some sort of resource (for example, that the request does not return an HTTP 404 Not Found or HTTP 451 Unavailable for Legal Reasons error; HTTP 403 Forbidden errors might also be worth detecting, since they identify an authentication context or requirement associated with retrieving the referenced resource). If a request returns an HTTP 301 Moved Permanently response, there might be a case for updating the URL. Similarly, other ways of detecting redirects might be used to flag concerns about the current status of a link and whether it should be updated to point to a resource more directly.
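The status code checks described here might be sketched as follows using only the Python standard library. This is not the approach taken by any particular link checker: the `NoRedirect`, `fetch_status()` and `classify_status()` names and the category labels are assumptions made for illustration. Redirects are deliberately not followed so that 3xx responses can be reported rather than silently resolved.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def fetch_status(url, timeout=10):
    """Return (status_code, redirect_location) for url, or (None, None) on failure."""
    opener = urllib.request.build_opener(NoRedirect)
    # HEAD avoids downloading the body; some servers may reject it.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
    except (urllib.error.URLError, OSError):
        return None, None


def classify_status(code):
    """Map an HTTP status code to a coarse link health category."""
    if code is None:
        return "unreachable"
    if 200 <= code < 300:
        return "ok"
    if code in (301, 308):
        return "permanent redirect: consider updating the URL"
    if code in (302, 303, 307):
        return "temporary redirect"
    if code in (401, 403):
        return "authentication or permission required"
    if code == 404:
        return "broken"
    if code == 451:
        return "unavailable for legal reasons"
    return "other"


# Example usage (requires network access):
# code, location = fetch_status("https://www.open.ac.uk/")
# print(code, location, classify_status(code))
```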
As well as simply testing that links resolve, the presence of links in an OU-XML document also raises concerns regarding whether or not the returned resource is the intended resource. Web pages are not necessarily static and may be updated at any time by their owners. Or if a resource disappears and an alternative needs to be found, it may not be clear from the context of the OU-XML document linking to the resource what the resource contained or what was particularly notable about it; this in turn can lead to difficulties when it comes to identifying what an appropriate replacement resource might be.
The innovationOUtside/ouxml-link-checker#
The innovationOUtside/ouxml-link-checker addresses several issues associated with link quality checking, in particular:
checking link resolution:
broken links (404);
temporary and permanent redirects;
links using OU authentication (e.g. libezproxy linked domains suffixed by .libezproxy.open.ac.uk will have the libezproxy component stripped);
links using liblinks (of the form https://www.open.ac.uk/libraryservices/resource/website:ID&f=FID; the f=FID seems redundant - what does it do?)
link archiving: for each link to a public web resource, submit the link to the Internet Archive for archiving purposes. Ideally, OU-XML links to public web resources should have a reference to an archived copy of the referenced resource that can be “swapped in” in the case of the original resource disappearing. (This approach is used for maintaining link stability and integrity in Wikipedia, for example.)
capturing linked page screenshots: in many cases, simply being able to see a screenshot of a complete web page will provide enough information regarding the content of the page to be able to identify enough context to support the discovery of a replacement resource. A simple gallery of screenshots of pages referenced from an OU-XML document can also be used to support a quick review of linked-to content in an off-line way, as well as automated testing to identify whether the content of a page or a particular area of the page has changed either at all, or in a significant way. (This may be important for accessibility testing, for example, where the accessibility of linked-to resources needs to be checked.)
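Two of the URL handling steps listed above, stripping the libezproxy component and preparing links for archiving, can be sketched with simple URL manipulation. This is not the ouxml-link-checker implementation: the function names are hypothetical, and the sketch assumes proxied hosts are formed by simply appending `.libezproxy.open.ac.uk` to the original host name (proxied hosts that rewrite dots as hyphens are not handled). The two Internet Archive URL patterns are the public Wayback Machine "save" and "availability" endpoints; the functions only build the URLs and do not make any requests.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

EZPROXY_SUFFIX = ".libezproxy.open.ac.uk"


def strip_libezproxy(url):
    """Remove the libezproxy suffix from a URL's host name, if present."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(EZPROXY_SUFFIX):
        return url
    bare_host = host[: -len(EZPROXY_SUFFIX)]
    # Preserve an explicit port if one was given.
    netloc = bare_host if parts.port is None else f"{bare_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def wayback_save_url(url):
    """URL that asks the Wayback Machine to capture a page when requested."""
    return "https://web.archive.org/save/" + url


def wayback_lookup_url(url):
    """URL of the Wayback availability lookup for the closest archived copy."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


proxied = "https://www.example.com.libezproxy.open.ac.uk/article?id=1"
clean = strip_libezproxy(proxied)
print(clean)
print(wayback_save_url(clean))
print(wayback_lookup_url(clean))
```

Stripping the proxy component first means that the public address of the resource, rather than the authenticated one, is the one that gets tested and archived.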