Working With Equations
One of the asset types represented within OU-XML is the <Equation> type. This element can be used to represent mathematical and chemical equations. Internal OU readers can refer to the corresponding OU-XML docs here.
The Equation element tag accepts the following child tags: <Alternative>, <Caption>, <Description>, <Image>, <Label>, <MathML>, <SourceReference>, and <TeX>. There is also an <InlineEquation> tag that accepts <Image>, <Alternative>, <MathML> and <TeX> tags. The <Image>, <MathML> and <TeX> tags define the “expression” of the equation.
Equation items may be described using either a MathML expression or an Image. The MathML elements are rendered in the VLE using MathJax and via LaTeX for PDF print publications. Browsers such as Firefox are also capable of rendering MathML directly.
One of the problems with MathML as a structure is that it is not the sort of thing you would write by hand, and as such, it may be difficult to discover via simple search. (A simpler way of writing equations is to use LaTeX, for example.)
One way of supporting discovery might be to index the text in the subsection that an equation is contained in. We can find a local context from the path to each equation element, so for now, let’s just grab the path; we can then think about how to generate a search context around that path in another section.
For now, only handle equations that are described using MathML; ignore equations that are rendered as images.
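As a minimal sketch of how the expression type of an equation might be detected, the following uses the standard library `xml.etree.ElementTree` (rather than `lxml`, so that it runs without extra dependencies) on a made-up OU-XML fragment. The `SAMPLE` markup and the `expression_type` helper are assumptions for illustration only; they are not part of the OU-XML tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical OU-XML fragment, made up for illustration
SAMPLE = (
    '<Section>'
    '<Equation><MathML><math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mi>x</mi></math></MathML><Label>(1.1)</Label></Equation>'
    '<Equation><Image>K+</Image></Equation>'
    '<Equation><TeX>E = mc^2</TeX></Equation>'
    '</Section>'
)

def expression_type(equation):
    """Return the tag of the first expression child found, or None."""
    for tag in ("MathML", "TeX", "Image"):
        # find() with a bare tag name only looks at direct children
        if equation.find(tag) is not None:
            return tag
    return None

section = ET.fromstring(SAMPLE)
types = [expression_type(eq) for eq in section.iter("Equation")]
# Keep only the equations that carry a MathML expression
mathml_only = [eq for eq in section.iter("Equation")
               if expression_type(eq) == "MathML"]
```

A filter like `mathml_only` is one way to skip image-based equations before any further processing.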
Preparing the Ground#
As ever, we need to set up a database connection:
from sqlite_utils import Database
# Open database connection
xml_dbname = "all_openlean_xml.db"
xml_db = Database(xml_dbname)
eqns_dbname = "openlean_assets.db"
db = Database(eqns_dbname)
And get a sample XML file, selecting one that we know contains structurally marked up equation items:
import pandas as pd
pd.read_sql("SELECT * FROM xml WHERE xml LIKE '%<Equation>%'",
con=xml_db.conn)
 | code | name | xml | id |
---|---|---|---|---|
0 | T212 | An introduction to electronics | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | e70841f12a908401ab9e6a69923bdb684928c888 |
1 |  | An introduction to geology | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 4c8058285a4de53528f646ee2742dc8394fd4e38 |
2 | S276 | An introduction to minerals and rocks under th... | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 6bff78840be5165329dda278418bbbd54c909047 |
3 | T193 | Assessing risk in engineering, work and life | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 11e5486d113eebd6c01126c9c65b91591c211b9b |
4 | SK299 | Blood and the respiratory system | b'<?xml version="1.0" encoding="UTF-8"?>\n<?dc... | 904a100e4d41cf1a696b547eec1b2f625fc5bd78 |
5 |  | Discovering chemistry | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 884164a46f4066c6b26894c812484c74ab2e8531 |
6 |  | Mathematics for science and technology | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 84fea7b4cf86cdd4e31e3272572372972fb81fe2 |
7 | s315 | Metals in medicine | b'<?xml version="1.0" encoding="UTF-8"?>\n<Ite... | c2c90459369d82e28e768dfd9072047eab95be4d |
8 | SM123 | Particle physics | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 4095122554b7cc3cff824f31c3cf531087e63b2c |
9 | S112 | Scales in space and time | b'<?xml version="1.0" encoding="UTF-8"?>\n<?sc... | 75a013ae7e703481e8f0e05bde38d6d71fa732b6 |
10 |  | Succeed with maths: part 2 | b'<?xml version="1.0" encoding="UTF-8"?>\n<!--... | 0ab4176f55d221bdc8bd68fbcd2e9c575d2a5e4e |
11 |  | Taking your first steps into higher education | b'<?xml version="1.0" encoding="UTF-8"?>\n<?sc... | 0c0fb0a9800e8b37c269fd12cb8b600a44c1d189 |
12 |  | Teaching mathematics | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 7995427cc08b1874750af6c79cb45f7556addf4b |
13 | T271 | Toys and engineering materials | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | c86d9d40259a5889e62488029ce5454a8ddb6a00 |
14 | S215 | What chemical compounds might be present in dr... | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 2adf47ad5e2f97c2983da6e3f08a0e7066e67e8a |
15 | S111 | What is a metal? | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | 0961a26cafb1aa6132dfc455251d123bf390bf70 |
16 | DB123 | You and your money | b'<?xml version="1.0" encoding="utf-8"?>\n<?sc... | b2235d2d9c032379eb891b42b46c41ed96a495c0 |
from lxml import etree
import pandas as pd
# Grab an OU-XML file that is known to contain equation items
# Maybe also: Teaching mathematics
equation_xml_raw = pd.read_sql("SELECT xml FROM xml WHERE name='Discovering chemistry'",
con=xml_db.conn).loc[0, "xml"]
# Parse the XML into an xml object
root = etree.fromstring(equation_xml_raw)
Grabbing the Path#
By walking up the path that leads to an equation element, we can identify its context within an OU-XML document, and from that we should be able to generate a “local search context” we can index and search within in order to support discovery of an equation from terms included in its surrounding text, for example.
# We need to grab a model of the document tree
tree = etree.ElementTree(root)
# And then grab the paths
for e in root.xpath('//Equation'):
    display(tree.getpath(e))
'/Item/Unit[2]/Session[1]/Section[2]/Equation'
'/Item/Unit[3]/Session[3]/Section[3]/ITQ[1]/Answer/Equation'
'/Item/Unit[6]/Session[1]/Section/Equation'
'/Item/Unit[6]/Session[2]/Equation[1]'
'/Item/Unit[6]/Session[2]/Equation[2]'
'/Item/Unit[6]/Session[2]/Section[1]/Equation[1]'
'/Item/Unit[6]/Session[2]/Section[1]/Equation[2]'
'/Item/Unit[6]/Session[2]/Section[1]/Equation[3]'
'/Item/Unit[6]/Session[2]/Section[1]/Equation[4]'
'/Item/Unit[6]/Session[2]/Section[2]/ITQ[1]/Answer/Equation'
'/Item/Unit[6]/Session[2]/Section[2]/ITQ[2]/Question/Equation'
'/Item/Unit[6]/Session[5]/Section/Equation'
'/Item/Unit[6]/Session[6]/Equation'
'/Item/Unit[9]/Session[2]/Equation'
'/Item/Unit[9]/Session[2]/Section[2]/Equation'
'/Item/Unit[9]/Session[3]/ITQ[2]/Question/Equation'
'/Item/Unit[9]/Session[3]/ITQ[2]/Answer/Equation[1]'
'/Item/Unit[9]/Session[3]/ITQ[2]/Answer/Equation[2]'
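The paths above come from `lxml`'s `getpath()`. Where `lxml` is not available, a similar path can be approximated with the standard library. The sketch below is an assumption-laden stand-in, not a replacement for `getpath()`: the `element_paths` helper and the sample document are hypothetical, and the helper only mimics the convention seen above of adding an `[n]` index when an element has same-named siblings.

```python
import xml.etree.ElementTree as ET

def element_paths(root, tag):
    """Yield an absolute, getpath-style path for each element with the given tag."""
    def walk(el, path):
        if el.tag == tag:
            yield path
        # Count same-named children so an index is only added where needed
        totals = {}
        for child in el:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in el:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if totals[child.tag] > 1:
                step += f"[{seen[child.tag]}]"
            yield from walk(child, f"{path}/{step}")
    yield from walk(root, f"/{root.tag}")

# Hypothetical document structure, made up for illustration
doc = ET.fromstring(
    "<Item>"
    "<Unit><Session><Equation/></Session></Unit>"
    "<Unit><Session><Equation/><Equation/>"
    "<Section><Equation/></Section></Session></Unit>"
    "</Item>"
)
paths = list(element_paths(doc, "Equation"))
```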
Looking at the paths, we might then identify a context as a particular block-level element further up the tree. For example, we might say the context is the first element reached, as we walk back up the tree, from the set Section, SubSection, or Session. For an even more tightly defined search context, we might add activity element types to that list (Activity, ITQ, SAQ, or Exercise).
import re
def navigational_context(path, elements=None):
    """Find meaningful exact local context path."""
    elements = ["Section", "SubSection", "Session",
                "Activity", "ITQ", "SAQ", "Exercise"
               ] if elements is None else elements
    # Iterate the path elements in reverse order
    path_elements = path.split("/")
    path_len = len(path_elements)
    for i, subpath in enumerate(path_elements[::-1]):
        # Clean the numeric index from the path element
        if re.sub(r'\[\d+\]', '', subpath) in elements:
            return "/".join(path_elements[:path_len-i])
    # No context element found: fall back to the full path
    return path
We can then find the exact navigational path to the first local context element we meet at a desired level of granularity.
For example:
example_path = '/Item/Unit[9]/Session[3]/ITQ[2]/Question/Equation'
example_context = navigational_context(example_path)
example_context
'/Item/Unit[9]/Session[3]/ITQ[2]'
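The `elements` argument controls how tightly the context is drawn. As a small illustration, the sketch below repeats the function so that the block runs on its own, then asks for a coarser context that ignores activity elements, and also shows the fallback when no context element appears in the path. The second example path is made up.

```python
import re

def navigational_context(path, elements=None):
    """Same logic as the function above, repeated so this block is self-contained."""
    elements = ["Section", "SubSection", "Session",
                "Activity", "ITQ", "SAQ", "Exercise"
               ] if elements is None else elements
    path_elements = path.split("/")
    path_len = len(path_elements)
    for i, subpath in enumerate(path_elements[::-1]):
        if re.sub(r'\[\d+\]', '', subpath) in elements:
            return "/".join(path_elements[:path_len - i])
    return path

path = '/Item/Unit[9]/Session[3]/ITQ[2]/Question/Equation'
# Coarser context: skip the ITQ and stop at the enclosing Session
coarse = navigational_context(path, elements=["Section", "SubSection", "Session"])
# Hypothetical path with no context element: the path is returned unchanged
no_match = navigational_context('/Item/Unit[1]/Equation')
```

Here `coarse` is `'/Item/Unit[9]/Session[3]'`, whereas the default element list stops one level lower, at the ITQ.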
We could then index the text of that context block to support discovery of the equation:
from xml_utils import flatten, unpack
# Example text for indexing to support equation discovery
# We are rendering the flattened equation here, so it may not make much sense!
flatten( root.xpath(example_context)[0] ), \
flatten(root.xpath(example_context)[0].find("*//Equation"))
('The combination of sulfur dioxide with oxygen, and the decomposition of steam into hydrogen and oxygen are both reactions of great potential value. These reactions and their equilibrium constants at 427oC (700K) are as follows.2SO2(g)+O2(g)=2SO3\u2062 (g)K=106\u2062 mol−1\u2062\u2062 litre 2H2O(g)=2H2(g)+O2(g)K=10−33\u2062 mol\u2062 \u2062 litre−1Write expressions for the equilibrium constants of the two reactions.When the two reactions are attempted at 700K, neither seems to occur. Which of the two might be ‘persuaded’ to proceed at this temperature, and what form might your persuasion take?The equilibrium constant of the first reaction, K1, is given by K1=[SO3(g)]2[SO2(g)]2[O2(g)]That of the second,K2=[H2(g)]2[O2(g)][H2O(g)]2The data show that K2 is tiny: at equilibrium, the concentrations of the hydrogen and oxygen in the numerator (the top line of the fraction) are minute in comparison with the concentration of steam in the denominator (the bottom line of the fraction). So in a closed system at 700 K, significant amounts of hydrogen and oxygen will never be formed from steam. By contrast, K1 is large, so the equilibrium position at 700 K lies well over to the right of the equation, and conversion of sulfur dioxide and oxygen to sulfur trioxide is favourable. The fact that the reaction does not occur must be due to a slow rate of reaction. We may therefore be able to obtain sulfur trioxide in this way if we can find a suitable catalyst to speed up the reaction. A suitable catalyst is vanadium pentoxide, V2O5, and at 700 K, this reaction is the key step in the manufacture of sulfuric acid from sulfur, oxygen and water.',
'2SO2(g)+O2(g)=2SO3\u2062 (g)K=106\u2062 mol−1\u2062\u2062 litre 2H2O(g)=2H2(g)+O2(g)K=10−33\u2062 mol\u2062 \u2062 litre−1')
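The `flatten` helper is imported from `xml_utils` and its implementation is not shown here. As a rough idea of why the flattened text runs sentences and equation fragments together, the following sketch concatenates every text node beneath an element using the standard library. The `flatten_text` function and the sample markup are assumptions for illustration; the real `flatten` may behave differently.

```python
import xml.etree.ElementTree as ET

def flatten_text(element):
    """Concatenate all text nodes under an element, in document order."""
    return "".join(element.itertext())

# Hypothetical ITQ block, made up for illustration
itq = ET.fromstring(
    "<ITQ>"
    "<Question><Paragraph>Write the formula for water.</Paragraph></Question>"
    "<Answer><Paragraph>It is </Paragraph>"
    "<Equation><MathML><math><msub><mi>H</mi><mn>2</mn></msub>"
    "<mi>O</mi></math></MathML></Equation></Answer>"
    "</ITQ>"
)
flat = flatten_text(itq)
```

Because the markup structure is discarded, subscripts and superscripts collapse into the surrounding characters, which is fine for keyword indexing but not for display.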
Extracting Equation Items#
We can trivially extract equation items from a single OU-XML XML document object:
from xml_utils import unpack
def get_equation_items(root, typ='//Equation'):
    """Extract equations from an OU-XML XML object."""
    tree = etree.ElementTree(root)
    # Return the path and the unpacked XML for each equation
    return [(tree.getpath(eq), unpack(eq)) for eq in root.xpath(typ)]
What do we get?
get_equation_items(root)[:3]
[('/Item/Unit[2]/Session[1]/Section[2]/Equation',
b'<Equation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><MathML><math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mmultiscripts><mrow><mi>X</mi></mrow><mprescripts/><mrow><mi>Z</mi></mrow><mrow><mi>A</mi></mrow></mmultiscripts></mrow></math></MathML></Equation>'),
('/Item/Unit[3]/Session[3]/Section[3]/ITQ[1]/Answer/Equation',
b'<Equation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Image>K<sup>+</sup>, Ca<sup>2+</sup>, Al<sup>3+</sup>, S<sup>2-</sup>, F<sup>-</sup> and Br<sup>-</sup></Image></Equation>'),
('/Item/Unit[6]/Session[1]/Section/Equation',
b'<Equation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><MathML><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle mathvariant="normal"><mrow><mstyle mathvariant="normal"><mrow><mi>C</mi><mi>u</mi><mo>(</mo><mi>s</mi><mo>)</mo><mo>+</mo><msub><mrow><mn>4</mn><mi>H</mi><mi>N</mi><mi>O</mi></mrow><mrow><mn>3</mn></mrow></msub><mo>(</mo><mi>a</mi><mi>q</mi><mo>)</mo></mrow></mstyle><mo>=</mo><msub><mrow><msub><mrow><mstyle mathvariant="normal"><mrow><mi>C</mi><mi>u</mi></mrow></mstyle><mo>(</mo><mstyle mathvariant="normal"><mrow><mi>N</mi><mi>O</mi></mrow></mstyle></mrow><mrow><mn>3</mn></mrow></msub><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msub><mstyle mathvariant="normal"><mrow><mo>(</mo><mi>a</mi><mi>q</mi><mo>)</mo></mrow></mstyle><mo>+</mo><msub><mrow><mstyle mathvariant="normal"><mrow><mn>2</mn><mi>N</mi><mi>O</mi></mrow></mstyle></mrow><mrow><mn>2</mn></mrow></msub><mstyle mathvariant="normal"><mrow><mo>(</mo><mi>g</mi><mo>)</mo></mrow></mstyle><mo>+</mo><msub><mrow><mn>2</mn><mi>H</mi></mrow><mrow><mn>2</mn></mrow></msub><mi>O</mi><mo>(</mo><mi>l</mi><mo>)</mo></mrow></mstyle></math></MathML><Label>(5.1)</Label></Equation>')]
Most of these equations are represented using MathML, although the second item uses an <Image> element instead.
Let’s just get the <math> part from one of the equations:
import re
def clean_equation(mathml):
    """Get cleaned equation mathml."""
    mathml = mathml.decode() if isinstance(mathml, bytes) else mathml
    # Replace \n in multiline markup
    mathml = mathml.replace("\n", "")
    # Extract the <math>...</math> component
    eqs = re.findall(r'.*<MathML>(.*)</MathML>.*', mathml)
    # We might have an image rather than MathML...
    eq = eqs[0] if eqs else None
    return eq
# Get an example equation element
# We want the Mathml (second item / index [1] in the returned 2-tuple)
eq = get_equation_items(root)[2][1].decode()
eq = clean_equation(eq)
eq
'<math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle mathvariant="normal"><mrow><mstyle mathvariant="normal"><mrow><mi>C</mi><mi>u</mi><mo>(</mo><mi>s</mi><mo>)</mo><mo>+</mo><msub><mrow><mn>4</mn><mi>H</mi><mi>N</mi><mi>O</mi></mrow><mrow><mn>3</mn></mrow></msub><mo>(</mo><mi>a</mi><mi>q</mi><mo>)</mo></mrow></mstyle><mo>=</mo><msub><mrow><msub><mrow><mstyle mathvariant="normal"><mrow><mi>C</mi><mi>u</mi></mrow></mstyle><mo>(</mo><mstyle mathvariant="normal"><mrow><mi>N</mi><mi>O</mi></mrow></mstyle></mrow><mrow><mn>3</mn></mrow></msub><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msub><mstyle mathvariant="normal"><mrow><mo>(</mo><mi>a</mi><mi>q</mi><mo>)</mo></mrow></mstyle><mo>+</mo><msub><mrow><mstyle mathvariant="normal"><mrow><mn>2</mn><mi>N</mi><mi>O</mi></mrow></mstyle></mrow><mrow><mn>2</mn></mrow></msub><mstyle mathvariant="normal"><mrow><mo>(</mo><mi>g</mi><mo>)</mo></mrow></mstyle><mo>+</mo><msub><mrow><mn>2</mn><mi>H</mi></mrow><mrow><mn>2</mn></mrow></msub><mi>O</mi><mo>(</mo><mi>l</mi><mo>)</mo></mrow></mstyle></math>'
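The regular expression approach works for the single-equation strings seen here. As an alternative sketch, the `<math>` element could be located with an XML parser instead, which avoids relying on the exact layout of the markup. The `extract_math` function below is hypothetical and uses the standard library rather than `lxml`; registering an empty prefix for the MathML namespace is a convenience so that the serialised output uses a default `xmlns` rather than a generated prefix.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialise MathML elements with a default namespace rather than an ns0: prefix
ET.register_namespace("", MATHML_NS)

def extract_math(equation_xml):
    """Return the serialised <math> element from an <Equation>, or None."""
    if isinstance(equation_xml, bytes):
        equation_xml = equation_xml.decode()
    equation = ET.fromstring(equation_xml)
    math = equation.find(f".//{{{MATHML_NS}}}math")
    if math is None:
        # For example, an equation that is expressed as an image
        return None
    return ET.tostring(math, encoding="unicode")

# Shortened examples modelled on the items shown above
mathml_eq = (
    b'<Equation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    b'<MathML><math xmlns="http://www.w3.org/1998/Math/MathML">'
    b'<mi>X</mi></math></MathML></Equation>'
)
image_eq = b'<Equation><Image>K<sup>+</sup></Image></Equation>'

math_markup = extract_math(mathml_eq)
no_markup = extract_math(image_eq)
```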
In Firefox at least, we can render the <math> MathML markup text directly:
from IPython.display import HTML
# This works in firefox at least
HTML(eq)
To explore:
https://www.geeksforgeeks.org/html5-mathml-display-attribute/ ?
other MathML parsers?
Adding Equations to the Database#
We can create a simple database table to index the equations and either add a “context text column” to that table, or reference it in a separate table. For now, let’s just munge it all together.
all_eqn_tbl = db["equations"]
all_eqn_tbl.drop(ignore=True)
all_eqn_tbl.create({
    #"Alternative": str,
    #"Description": str,
    #"Label": str,
    #"SourceReference": str,
    #"Image": str,
    #"MathML":str,
    #"TeX": str,
    "equation": str, # This is the raw XML for the object
    "xpath": str,
    "typ": str,
    "search_context_path": str,
    "_id": str
}, pk=("_id", "xpath"))
# Note that in this case the _id is not unique
# because the same id may apply to multiple equations in the same document
# The _id is a reference for joining tables only
all_eqn_context = db["equations_context"]
all_eqn_context.drop(ignore=True)
all_eqn_context.create({
    "search_context": str,
    "search_context_path": str,
    "_id": str
}, pk=("_id", "search_context_path"))
# Enable full text search
# This creates an extra virtual table (equations_context_fts) to support the full text search
db[f"{all_eqn_context.name}_fts"].drop(ignore=True)
db[all_eqn_context.name].enable_fts(["search_context", "search_context_path", "_id"], create_triggers=True)
<Table equations_context (search_context, search_context_path, _id)>
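To illustrate what the compound primary key on the equations table implies, the sketch below recreates a similar table using the standard library `sqlite3` module (not `sqlite_utils`) in an in-memory database. The rows and the `doc-a` identifier are made up. Several equations may share an `_id` as long as their `xpath` differs, but inserting the same `(_id, xpath)` pair twice is rejected, which is one reason the tables are dropped before being recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE equations (
        equation TEXT,
        xpath TEXT,
        typ TEXT,
        search_context_path TEXT,
        _id TEXT,
        PRIMARY KEY (_id, xpath)
    )
""")

# Hypothetical rows: two equations from the same document and context
rows = [
    ("<math><mi>x</mi></math>", "/Item/Unit[1]/Session[1]/Equation[1]",
     "Equation", "/Item/Unit[1]/Session[1]", "doc-a"),
    ("<math><mi>y</mi></math>", "/Item/Unit[1]/Session[1]/Equation[2]",
     "Equation", "/Item/Unit[1]/Session[1]", "doc-a"),
]
conn.executemany("INSERT INTO equations VALUES (?, ?, ?, ?, ?)", rows)

# Re-inserting an existing (_id, xpath) pair violates the primary key
try:
    conn.execute("INSERT INTO equations VALUES (?, ?, ?, ?, ?)", rows[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM equations").fetchone()[0]
```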
We can now add our equations, and their context, to the database.
from xml_utils import create_id
for row in xml_db.query("""SELECT * FROM xml;"""):
    _root = etree.fromstring(row["xml"])
    # Get the tree structure
    tree = etree.ElementTree(_root)
    eq_items = [ ("Equation", eq) for eq in get_equation_items(_root, "//Equation")]
    eq_items.extend([ ("InlineEquation", eq) for eq in get_equation_items(_root, "//InlineEquation")])
    _id = create_id( (row["code"], row["name"]) )
    # From the list of equation items,
    # create a list of dict items we can add to the database
    eq_item_dicts = []
    eq_context_dicts = []
    _unique_contexts = []
    for (typ, eq) in eq_items:
        if eq[1]:
            # We can unpack and extract items from the XML
            eq_dict = {"equation": clean_equation(eq[1]),
                       "xpath": eq[0],
                       "typ": typ,
                       "search_context_path":navigational_context(eq[0]),
                       "_id": _id}
            eq_item_dicts.append(eq_dict)
        search_context_path = navigational_context(eq[0])
        if eq[1] and search_context_path not in _unique_contexts:
            _unique_contexts.append(search_context_path)
            search_context = unpack( _root.xpath(navigational_context(eq[0]))[0] ).decode()
            eq_context_dicts.append({"search_context_path": search_context_path,
                                     "_id": _id,
                                     # It might be better to flatten than unpack
                                     # For now, unpack lets us render the XML
                                     "search_context": search_context})
    # Add items to the database
    db[all_eqn_tbl.name].insert_all(eq_item_dicts)
    db[all_eqn_context.name].insert_all(eq_context_dicts)
We can now search for equations by context:
from xml_utils import fts
# Sample query
q = 'steam hydrogen oxygen'
example_eq_search = fts(db, "equations_context", q)
example_eq_search
 | search_context | search_context_path | _id |
---|---|---|---|
0 | <ITQ xmlns:xsi="http://www.w3.org/2001/XMLSche... | /Item/Unit[9]/Session[3]/ITQ[2] | 884164a46f4066c6b26894c812484c74ab2e8531 |
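The `fts` helper comes from `xml_utils` and is not shown here. As a rough sketch of the kind of query such a helper might run, the following uses the standard library `sqlite3` module with a standalone FTS5 table. It assumes the local SQLite build includes the FTS5 extension; the `search_contexts` function and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Standalone FTS5 table with the same column names as equations_context
conn.execute("""
    CREATE VIRTUAL TABLE equations_context_fts
    USING fts5(search_context, search_context_path, _id)
""")
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO equations_context_fts VALUES (?, ?, ?)",
    [
        ("decomposition of steam into hydrogen and oxygen",
         "/Item/Unit[9]/Session[3]/ITQ[2]", "doc-a"),
        ("copper reacts with nitric acid",
         "/Item/Unit[6]/Session[1]/Section", "doc-a"),
    ],
)

def search_contexts(conn, query):
    """Return (search_context_path, _id) pairs for rows matching the query."""
    sql = """
        SELECT search_context_path, _id
        FROM equations_context_fts
        WHERE equations_context_fts MATCH ?
        ORDER BY rank
    """
    return conn.execute(sql, (query,)).fetchall()

# Space-separated terms are combined with an implicit AND in FTS5
hits = search_contexts(conn, "steam hydrogen oxygen")
```

Passing the query as a bound parameter avoids having to quote the search string by hand.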
We can display that context:
from IPython.display import Markdown
from xml_utils import ouxml2md
def display_result_md(search_context):
    """Render Markdown for result path."""
    _md = ouxml2md( search_context )
    # Hack because Sphinx renders errors
    # Note: "###" is replaced first, so a "####" heading ends up as "HEADER: #"
    _md = _md.replace("###", "HEADER: ")
    _md = _md.replace("####", "SUBHEADER: ")
    display(Markdown(_md))
example_eq_search.drop_duplicates(subset="search_context_path")["search_context"].apply(display_result_md)
HEADER: # Question
The combination of sulfur dioxide with oxygen, and the decomposition of steam into hydrogen and oxygen are both reactions of great potential value. These reactions and their equilibrium constants at 427oC (700K) are as follows. 2SO2(g)+O2(g)=2SO3 (g)K=106 mol−1 litre 2H2O(g)=2H2(g)+O2(g)K=10−33 mol litre−1
Write expressions for the equilibrium constants of the two reactions.
When the two reactions are attempted at 700K, neither seems to occur. Which of the two might be ‘persuaded’ to proceed at this temperature, and what form might your persuasion take?
HEADER: # Answer
The equilibrium constant of the first reaction, K1, is given by K1=[SO3(g)]2[SO2(g)]2[O2(g)] That of the second, K2=[H2(g)]2[O2(g)][H2O(g)]2 The data show that K2 is tiny: at equilibrium, the concentrations of the hydrogen and oxygen in the numerator (the top line of the fraction) are minute in comparison with the concentration of steam in the denominator (the bottom line of the fraction). So in a closed system at 700 K, significant amounts of hydrogen and oxygen will never be formed from steam.
By contrast, K1 is large, so the equilibrium position at 700 K lies well over to the right of the equation, and conversion of sulfur dioxide and oxygen to sulfur trioxide is favourable. The fact that the reaction does not occur must be due to a slow rate of reaction. We may therefore be able to obtain sulfur trioxide in this way if we can find a suitable catalyst to speed up the reaction. A suitable catalyst is vanadium pentoxide, V2O5, and at 700 K, this reaction is the key step in the manufacture of sulfuric acid from sulfur, oxygen and water.
0 None
Name: search_context, dtype: object
We can also display the equations that are described in that context by using the search context path to join the equation context table with the equations table:
pd.read_sql(f"""SELECT e.* FROM equations e, equations_context_fts
WHERE equations_context_fts MATCH {db.quote(q)}
AND e._id=equations_context_fts._id
AND e.search_context_path=equations_context_fts.search_context_path;
""" , db.conn)["equation"].apply(lambda eq: display(HTML(eq+"<hr/>")));