Introduction

This living document provides a quick demonstration of how we can build a simple curriculum asset bank that supports discoverability and reuse of “micro-assets”, including glossary items, learning outcomes, activities, self-assessment questions and media assets.

The asset bank is generated by mining the structured OU-XML documents that are distributed alongside many OpenLearn Units and from which OpenLearn HTML documents are rendered.
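As a minimal sketch of what this sort of mining might look like, the following example pulls learning outcomes and glossary items out of a small XML fragment using the Python standard library. The fragment is made up for illustration, and the element names (`LearningOutcome`, `GlossaryItem`, `Term`, `Definition`) are assumptions about the OU-XML structure rather than a statement of the actual schema; the `extract_assets` and `flatten_text` helpers are hypothetical names, not part of any existing toolchain.

```python
import xml.etree.ElementTree as ET

# Hypothetical OU-XML-like fragment; element names are assumed for illustration.
SAMPLE_OUXML = """
<Item>
  <Unit>
    <UnitTitle>Example unit</UnitTitle>
    <LearningOutcomes>
      <LearningOutcome>explain what an asset bank is</LearningOutcome>
      <LearningOutcome>query a simple database</LearningOutcome>
    </LearningOutcomes>
  </Unit>
  <BackMatter>
    <Glossary>
      <GlossaryItem>
        <Term>micro-asset</Term>
        <Definition>A small, reusable piece of <i>course</i> content.</Definition>
      </GlossaryItem>
    </Glossary>
  </BackMatter>
</Item>
""".strip()


def flatten_text(element):
    """Return all text inside an element (including nested markup), whitespace normalised."""
    return " ".join("".join(element.itertext()).split())


def extract_assets(xml_string):
    """Collect learning outcomes and glossary items from an XML string."""
    root = ET.fromstring(xml_string)
    outcomes = [flatten_text(lo) for lo in root.iter("LearningOutcome")]
    glossary = []
    for item in root.iter("GlossaryItem"):
        term = item.find("Term")
        definition = item.find("Definition")
        # Skip malformed items rather than guessing at missing parts.
        if term is not None and definition is not None:
            glossary.append(
                {"term": flatten_text(term), "definition": flatten_text(definition)}
            )
    return {"learning_outcomes": outcomes, "glossary": glossary}


assets = extract_assets(SAMPLE_OUXML)
```

Each asset type ends up in its own list, which is what makes it straightforward to store them separately later on.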

The asset bank separates out different sorts of asset on the basis that when you are searching for a particular sort of resource, less is sometimes more. For example, if you are searching for glossary items, you don’t necessarily want to discover figure descriptions or video transcripts.

A Datasette in-browser client can be used to query the database. You can access it here. Currently, the demo database includes tables containing the unit listing, the raw OU-XML for each unit, extracted figures and media items, learning outcomes and glossary items. See the section on Querying the Database With Datasette for more information.
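To make the “one table per asset type” idea concrete, here is a minimal sketch using an in-memory SQLite database. The table names, column names and sample rows are all invented for the example and are not the schema of the demo database; `search_glossary` is a hypothetical helper.

```python
import sqlite3

# In-memory database for illustration; a real asset bank would use a file.
conn = sqlite3.connect(":memory:")

# One table per asset type (hypothetical schema).
conn.executescript(
    """
    CREATE TABLE glossary (unit_code TEXT, term TEXT, definition TEXT);
    CREATE TABLE figures (unit_code TEXT, caption TEXT, description TEXT);
    """
)
conn.executemany(
    "INSERT INTO glossary VALUES (?, ?, ?)",
    [
        ("UNIT_A", "granularity", "The level of detail at which an item is described."),
        ("UNIT_B", "asset", "A reusable piece of content."),
    ],
)
conn.executemany(
    "INSERT INTO figures VALUES (?, ?, ?)",
    [("UNIT_A", "Figure 1", "A diagram showing levels of granularity in a module.")],
)


def search_glossary(connection, keyword):
    """Return (unit_code, term) pairs whose term or definition mentions the keyword."""
    pattern = f"%{keyword}%"
    rows = connection.execute(
        "SELECT unit_code, term FROM glossary "
        "WHERE term LIKE ? OR definition LIKE ? "
        "ORDER BY unit_code, term",
        (pattern, pattern),
    )
    return rows.fetchall()


glossary_hits = search_glossary(conn, "granularity")
```

Because the query only touches the `glossary` table, the figure description that also mentions granularity does not clutter the results. A file-based SQLite database built along these lines is the sort of thing Datasette is designed to publish and query.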

Note that there is nothing “AI” or “future-technology” looking in this document, although I will probably add a simple doc2vec “semantic search” demo at some point that would perhaps have been classed as AI 5 years ago. Some of the asset extraction components might provide useful data inputs for machine learning systems, but that is out of scope for this document.

The document is historical looking in the sense that there is nothing contained in here (except the Doc2Vec bits, if I get round to them, and some named entity extraction bits, if I add them) that could not have been done 15 years ago.

However, the document is current and future looking in the sense that it demonstrates some very simple functionality that is not currently available, that may be useful, and that could be provided today to support production and maintenance.

Furthermore, whilst the demo is limited to a consideration of OU-XML source material for openly licensed, public OpenLearn units, the recipes could equally be applied to the OU-XML source material for OU modules produced over the last 15 years.

A Note on Granularity

When considering reuse, or a search for inspiration in course design, what is an appropriate level of granularity for discovering assets that might inspire a design or be reused directly?

When creating a new module in a similar area to a previous one, it is likely that one or more of the new module authors will have worked on previous related modules. As such, they will have a memory of the overall shape and content of the earlier module, a sense of the overall philosophy of the module in terms of learning design, and knowledge of micro-considerations, such as particularly notable activities or commissioned media assets.

A quick way of gaining awareness of the overall shape of a module is to review the unit titles and major headings within the units. A good way of generating awareness of the content of a module is to skim the unit headings and subheadings. Generating mind map style displays of major navigational elements (learning outcomes, headings/subheadings etc.) can also provide a high level, macroscopic view of a module or unit.
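As a rough sketch of how such a skimmable view might be generated, the following example walks a nested XML structure and emits an indented outline of headings. The sample document and the container element names (`Session`, `Section`, `SubSection`, each with a `Title` child) are assumptions made for illustration, and `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit structure; element names are assumed for illustration.
SAMPLE_UNIT = """
<Unit>
  <UnitTitle>Example unit</UnitTitle>
  <Session>
    <Title>1 Getting started</Title>
    <Section>
      <Title>1.1 Why reuse?</Title>
    </Section>
    <Section>
      <Title>1.2 What is an asset?</Title>
    </Section>
  </Session>
  <Session>
    <Title>2 Going further</Title>
  </Session>
</Unit>
""".strip()

# Elements treated as heading-bearing containers (an assumption).
HEADING_CONTAINERS = ("Session", "Section", "SubSection")


def outline(element, depth=0):
    """Return indented bullet lines for each titled container, depth first."""
    lines = []
    for child in element:
        if child.tag in HEADING_CONTAINERS:
            title = child.find("Title")
            if title is not None:
                text = " ".join("".join(title.itertext()).split())
                lines.append("  " * depth + "- " + text)
            # Recurse so that subheadings appear beneath their parent heading.
            lines.extend(outline(child, depth + 1))
    return lines


unit = ET.fromstring(SAMPLE_UNIT)
outline_lines = [unit.findtext("UnitTitle")] + outline(unit)
print("\n".join(outline_lines))
```

The same nested list could be handed to a mind map or tree visualisation tool; the indented text form is simply the cheapest thing to skim.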

A sense of the overall structure of a unit can also be determined by visually mapping the unit in terms of nominal length (based on simple word count/reading speed calculations, activity guidance times, video element durations, the presence of images, etc.).
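The nominal length calculation is simple arithmetic. A minimal sketch, assuming a reading speed of 200 words per minute (a figure picked purely for illustration, as are the function name and the sample numbers), might look like this:

```python
def nominal_minutes(word_count, activity_minutes=(), media_seconds=(), words_per_minute=200):
    """Estimate nominal study time in minutes, broken down by component.

    words_per_minute is an assumed reading speed, not a measured value.
    """
    reading = word_count / words_per_minute
    activities = sum(activity_minutes)   # guidance times, already in minutes
    media = sum(media_seconds) / 60      # audio/video durations, converted to minutes
    return {
        "reading": reading,
        "activities": activities,
        "media": media,
        "total": reading + activities + media,
    }


# A section with 3000 words, two activities (10 and 20 minutes)
# and two videos (150 and 90 seconds).
estimate = nominal_minutes(3000, activity_minutes=[10, 20], media_seconds=[150, 90])
```

Keeping the components separate, rather than returning only a total, is what allows a unit to be mapped visually in terms of how its time is split between reading, activity and media.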

Taking inspiration from modules or module components, particularly from other disciplines, relies on knowing which modules to look at. Typically, module teams are comprised of academics and staff tutors who have worked on modules previously, either in production or presentation, and who have a knowledge of those modules. Academics also have an awareness of other modules through discussion with academics who have worked on those modules. But discovery can also be supported by the ability to search module content at various levels of granularity. For example, in terms of learning design, searching learning outcomes across modules might identify modules that teach a particular skill, albeit in a different context, from which inspiration may be drawn (note that there is an issue here in identifying where in a module the teaching associated with a particular learning outcome is addressed). Particular topics might be identified by searching headings and subheadings, as well as through free or full-text searches of the complete content of a unit or module. In terms of media discovery and reuse, image assets might be identified through searching figure captions and long descriptions, and video and audio materials through searching transcripts, as well as activity descriptions where the media asset is referred to in the context of an activity, and so on.

When it comes to production processes, there are many proximal tasks on a day-to-day, hour-by-hour, or even minute-by-minute basis that might be supported by the reuse, either directly or through taking inspiration, of previously generated assets.

For example, when authoring learning outcomes, the focus is on creating a set of short sentences with a particular structure and sentiment (“you will understand X”, “you will be able to do Y”); when creating a glossary item, the task is to create a sentence or short paragraph appropriate for a learner at a particular level of understanding in the subject area; when creating an image, the initial task may be the selection or creation of the image, but then a figure description must be generated, and clues as to how to create an appropriate image description may be gleaned from seeing how other, similar images have been previously described; and so on.

The design of activities, SAQs and ITQs is another well-defined, scope-limited area where reuse of content or the taking of inspiration is also possible; but again, this requires the ability to discover those assets if they are to play any role in reuse.

Module Wide Discovery

Currently, searching for modules related to a particular topic might start at a high level by searching over the OU module website, or searching over OpenLearn for related module fragments. A search on the intranet might return items from various SharePoint collections, but the searches are not likely to be very targeted. OU academics have access to a wide range of modules on the VLE, but search within a module requires searching for the module code, selecting a particular presentation, and then searching just within that module presentation site. If you want to search across multiple units, this can be a laborious process. It also means that you cannot discover items in modules you don’t explicitly search over.

As often as not, the identification of learning designs, module or unit structures, or assets for reuse is recall-based: academics who have worked on modules before recall something that was previously produced or discussed. However, the ability to discover assets at an appropriate level of granularity through search is not currently supported. There is no central “OU-pedia”; there are no Faculty-wide “OU Glossaries” that aggregate glossary items from across modules produced by a particular Faculty or associated with a particular qualification pathway; there is no “OU learning outcomes” directory where modules can be discovered by searching through learning outcomes; there is no “OU SAQs” directory where SAQs from across multiple modules (current or previously presented) can be discovered; there is no OU module content image or video gallery (at least, not a teaching-directed one: there may be a rights-related collection somewhere? The OU Digital Archive historical image collection is more to do with PR images than module content images); and while the OU Digital Archive broadcast programs collection and the Historical OU TV and Radio collection both have some historical video assets, search is limited and the “currency” of the assets is debatable (though there may be something to be learned from how a particular topic was addressed in a historical teaching video, for example).

Mining OpenLearn

“Writing a course” is a production activity that takes months or years; creating a learning outcome, glossary item, figure, activity, SAQ or ITQ is something that takes of the order of minutes and hours, and for which minutes or hours might be saved through a consideration of previously produced assets, if they are readily discoverable at an appropriate level of granularity.

The approach taken in this review is to explore the extent to which we can disaggregate materials published on OpenLearn to support discovery of, the drawing of inspiration from, and the potential re-use of materials at a day-to-day level.

Items are identified at an appropriate level of granularity through the mining of OU-XML structured content documents. From the metadata associated with free materials published on OpenLearn, discovery can be supported across various contexts: within a particular unit, for example, in the context of a particular parent module, or at a particular level (beginner, intermediate, advanced, etc.). “Curriculum-wide” discovery is also supported across all the available materials.
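The idea of searching the same assets within different contexts can be sketched as a keyword search with optional metadata filters. Everything here is hypothetical: the `Outcome` record, its field names, the unit and module codes, and the `discover` function are invented for illustration and do not reflect the actual OpenLearn metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """A learning outcome plus the (hypothetical) metadata used to scope a search."""
    unit: str
    module: str
    level: str
    text: str


OUTCOMES = [
    Outcome("UNIT_A", "MOD_1", "beginner", "describe how data is stored in a table"),
    Outcome("UNIT_B", "MOD_1", "intermediate", "write a query to filter data"),
    Outcome("UNIT_C", "MOD_2", "beginner", "interpret data shown in a chart"),
]


def discover(items, keyword, unit=None, module=None, level=None):
    """Case-insensitive keyword match, narrowed by whichever filters are supplied."""
    keyword = keyword.lower()
    return [
        item
        for item in items
        if keyword in item.text.lower()
        and (unit is None or item.unit == unit)
        and (module is None or item.module == module)
        and (level is None or item.level == level)
    ]


curriculum_wide = discover(OUTCOMES, "data")                 # no filters: every unit
within_module = discover(OUTCOMES, "data", module="MOD_1")   # one parent module
at_level = discover(OUTCOMES, "data", level="beginner")      # one level
```

Leaving every filter unset gives the “curriculum-wide” search; supplying one or more filters narrows the same search to a unit, a parent module or a level.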

The approaches demonstrated here use freely available, openly licensed (for the most part) OpenLearn materials and as such can be freely and openly discussed. However, the same techniques can be applied internally to OU-XML structured content associated with currently presented courses as well as historical courses created within the last twenty years or so.

Improving the Student Experience

As well as supporting the discovery and reuse of materials for the purposes of production, the disaggregation of materials from the OU-XML source can also be used to support the student experience: a gradually unfolding glossary of terms constructed uniquely for each student based on the modules they have studied to date; a personal search archive of content based on materials they have studied so far; a “future-looking” map of learning outcomes associated with the modules they expect to take on their planned qualification pathway. And so on.

But that discussion is perhaps best saved for another time, and another place…