Unit Mapping
work in progress
One of the simplest ways of getting an overview of a unit is to generate an index or map of its major headings and subheadings.
Example of prior work: Generating Mind Maps from OU/OpenLearn Structured Authoring XML Documents
from sqlite_utils import Database
# Open database connection
dbname = "all_openlean_xml.db"
db = Database(dbname)
Let’s get the OU-XML for an arbitrary unit:
from lxml import etree
import pandas as pd
# If there are multiple units associated with H807, pick the first
h807_xml_raw = pd.read_sql("SELECT xml FROM xml WHERE code='H807'", con=db.conn).loc[0, "xml"]
# Parse the XML into an xml object
root = etree.fromstring(h807_xml_raw)
Bring in our simple utility function to help flatten elements, if required:
import unicodedata

def unpack(x):
    return etree.tostring(x)

# via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
def flatten(el):
    """Utility function for flattening XML tags."""

    def _flatten(el):
        if el is None:
            return ""  # Originally returned None; any side effects of move to ''?
        result = [(el.text or "")]
        for sel in el:
            result.append(_flatten(sel))
            result.append(sel.tail or "")
        return unicodedata.normalize("NFKD", "".join(result)) or " "

    return _flatten(el).strip()
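As a quick check of what `flatten()` does, the sketch below applies the same logic to a small fragment containing inline markup. The sample XML is invented for illustration, and the standard library parser is used as a fallback in case lxml is not installed.

```python
import unicodedata

try:
    from lxml import etree
except ImportError:  # fall back to the standard library parser
    import xml.etree.ElementTree as etree


def flatten(el):
    """Return the text of an element and all its descendants as one string."""

    def _flatten(el):
        if el is None:
            return ""
        result = [el.text or ""]
        for sel in el:
            result.append(_flatten(sel))
            result.append(sel.tail or "")
        return unicodedata.normalize("NFKD", "".join(result)) or " "

    return _flatten(el).strip()


# Invented fragment: a title containing inline markup
sample = etree.fromstring("<Title>Accessibility <i>and</i> eLearning</Title>")

print(sample.text)      # only the text before the first child element
print(flatten(sample))  # the text of the whole title
```

The difference matters for headings: reading `.text` alone stops at the first child element, so a title that contains inline markup would be cut short unless it is flattened first.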
We can now grab all the headings and subheadings and render a simple contents list for the unit. To display the contents, we can use a simple tree widget.
Let’s start by parsing out the title of the unit:
title = root.find("ItemTitle").text
code = root.find("CourseCode").text
title, code
('Accessibility of eLearning', 'H807_1')
We can now build up our tree from session and section headings:
#%pip install ipytree
# ipytree provides access to a jstree widget
from ipytree import Tree, Node

# Create a tree object
tree = Tree()

# Create a unit title node for our tree
node1 = Node(f"{title} ({code})")
# Add the unit title node to the top of the tree
tree.add_node(node1)

sessions = root.findall('.//Unit/Session')
unit_structure = {"title": {}}
for session in sessions:
    # Each session becomes a child of the unit title node
    title = session.find('.//Title').text
    subnode = Node(title)
    node1.add_node(subnode)
    # Each section becomes a child of its session node
    subsessions = session.findall('.//Section')
    for subsession in subsessions:
        heading = subsession.find('.//Title').text
        subnode.add_node(Node(heading))

tree
The tree widget doesn’t appear to render when I flow this document as part of a Jupyter Book, so I need to find an alternative tree display for this demo. In the meantime, here’s a screenshot to get a flavour of what you’re missing…
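As one possible fallback for static renderings, the same headings can be printed as an indented text outline. This is a minimal sketch rather than a replacement for the tree widget: the `outline()` helper and the sample XML are invented for illustration, and the element names follow the `Unit/Session/Section` paths used above.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the standard library parser
    import xml.etree.ElementTree as etree

# Invented stand-in for an OU-XML document
SAMPLE_XML = """
<Item>
  <CourseCode>X000_1</CourseCode>
  <ItemTitle>Example unit</ItemTitle>
  <Unit>
    <Session>
      <Title>Session one</Title>
      <Section><Title>First section</Title></Section>
      <Section><Title>Second section</Title></Section>
    </Session>
    <Session>
      <Title>Session two</Title>
    </Session>
  </Unit>
</Item>
"""


def outline(root, indent="  "):
    """Build an indented outline of session and section titles (hypothetical helper)."""
    lines = [f"{root.findtext('ItemTitle')} ({root.findtext('CourseCode')})"]
    for session in root.findall(".//Unit/Session"):
        lines.append(f"{indent}{session.findtext('Title')}")
        for section in session.findall(".//Section"):
            lines.append(f"{indent * 2}{section.findtext('Title')}")
    return "\n".join(lines)


root = etree.fromstring(SAMPLE_XML.strip())
text_outline = outline(root)
print(text_outline)
```

Because the result is plain text, it survives any publishing route that can display a code cell's output.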
It would be easy enough to generate a table containing the session and section headings across all units, and then use that table to support a heading-level search that retrieves items at that level of granularity.
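A sketch of what such a headings table might look like follows. The table name `headings`, its columns, the `heading_rows()` helper and the two sample documents are all assumptions made for illustration, and the standard library `sqlite3` module stands in for the database used above.

```python
import sqlite3

try:
    from lxml import etree
except ImportError:  # fall back to the standard library parser
    import xml.etree.ElementTree as etree

# Two invented units standing in for rows of the xml table
UNITS = [
    "<Item><CourseCode>X000_1</CourseCode><Unit><Session><Title>Getting started</Title>"
    "<Section><Title>Accessible design</Title></Section></Session></Unit></Item>",
    "<Item><CourseCode>Y111_1</CourseCode><Unit><Session><Title>Design basics</Title>"
    "<Section><Title>Colour and contrast</Title></Section></Session></Unit></Item>",
]


def heading_rows(xml_text):
    """Yield (code, level, session, heading) tuples for one unit (hypothetical helper)."""
    root = etree.fromstring(xml_text)
    code = root.findtext("CourseCode")
    for session in root.findall(".//Unit/Session"):
        session_title = session.findtext("Title")
        yield (code, "session", session_title, session_title)
        for section in session.findall(".//Section"):
            yield (code, "section", session_title, section.findtext("Title"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headings (code TEXT, level TEXT, session TEXT, heading TEXT)")
for xml_text in UNITS:
    conn.executemany("INSERT INTO headings VALUES (?, ?, ?, ?)", heading_rows(xml_text))

# Heading-level search: which headings mention "design"?
hits = conn.execute(
    "SELECT code, level, heading FROM headings WHERE heading LIKE ? ORDER BY code, level",
    ("%design%",),
).fetchall()
print(hits)
```

Keeping the parent session title on each section row means a search hit can be shown in context without going back to the original XML.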
Generating Tables of Contents Derived From Sections in Different Units
As well as generating tree listings of session and section headings for a single unit, we can also generate table of contents views over sections retrieved from multiple units.
TO DO: for example, search around a term to retrieve items from multiple units and generate a “customised” unit on a topic, e.g. ordered by level.
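One way the TO DO might eventually be approached, sketched under heavy assumptions: given heading records of the form (unit code, level, heading), filter them by a search term and order the matches so that session headings come before section headings. The records, the `custom_toc()` helper and the ordering rule are all invented for illustration.

```python
# Invented heading records: (unit code, level, heading)
HEADINGS = [
    ("X000_1", "section", "Accessible design"),
    ("Y111_1", "session", "Design basics"),
    ("Y111_1", "section", "Colour and contrast"),
    ("Z222_1", "section", "Designing for screen readers"),
]

# Assumed ordering rule: broader headings first
LEVEL_ORDER = {"session": 0, "section": 1}


def custom_toc(records, term):
    """Return matching headings as text lines, ordered by level and then unit code."""
    matches = [r for r in records if term.lower() in r[2].lower()]
    matches.sort(key=lambda r: (LEVEL_ORDER[r[1]], r[0]))
    return [
        f"{'  ' * LEVEL_ORDER[level]}{heading} [{code}]"
        for code, level, heading in matches
    ]


toc = custom_toc(HEADINGS, "design")
print("\n".join(toc))
```

Each line keeps its unit code so that the source of every item in the customised listing remains visible.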