MathML in Opera 9.5

Starting from Friday, MathML is enabled by default in Opera 9.5 weekly builds. Featurewise it is slightly behind the UserJS solution (not by much: mainly the mmultiscripts element was sacrificed and the operator dictionary is gone), as we currently support the MathML for CSS profile rather than MathML 2.0. But native support is much faster, does not affect the DOM and does not break CSS selectors, so it is the more reasonable solution in the long term.

MathML in Webkit?

Not sure where this activity will end up, but there is an attempt to enable basic MathML support in Webkit. Having certain subset of MathML working interoperably in multiple browsers would help that language to gain momentum.

Font subsetting in Prince formatter

YesLogic released a new revision of Prince 6.0 that supports TrueType font subsetting. Instead of embedding the whole font, it now embeds only the subset that is actually used in the document, which significantly cuts down the size of the PDF documents it produces.

MathML Basic is ready for implementation

During the summer face to face meeting, Math WG agreed to make further changes in the MathML for CSS profile. Those changes mainly affect usage of mrow elements in nested layout schemata (mrow is now mandatory in certain nested expressions) and make it easier to handle MathML in a CSS environment.

New working draft was released recently and looks quite realistic. Experimental support of profile in Opera 9.5 should not be far away now.

Inline-table baseline handling

Some time ago we asked CSS WG to give us better control over vertical alignment of inline expressions. It was necessary to align MathML multiline layout schemata, such as munderover, properly. So far we got table-baseline property that is expected to end up in one of CSS3 modules (maybe tables). It is implemented in Kestrel (called -o-table-baseline) and makes it easier to format math formulae and Ruby in Opera.

Formally the property is defined as follows:

Name: -o-table-baseline
Value: <integer> | inherit
Initial: 1
Applies to: inline-tables
Inherited: no
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

The '-o-table-baseline' property determines which row of an inline-table should be used as the baseline of the inline-table.

<integer>
The baseline of the nth row (as determined by the integer value) is the baseline of the inline-table. Value 1 corresponds to the first row and is the initial value. Negative values are allowed: -1 corresponds to the last row of the inline-table, and -n stands for the nth row from the bottom. If the absolute value of the property is larger than the number of rows in the inline-table, the initial value should be used instead. If the value is 0, the bottom margin edge of the inline-table should be treated as the baseline.
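
The row-selection rule is easy to get wrong for negative and out-of-range values, so here is a minimal sketch of it in JavaScript. It only restates the rule from the definition; resolveBaselineRow is a hypothetical helper written for illustration, not an Opera API, and it ignores the inherit keyword.

```javascript
// Sketch of the row-selection rule for -o-table-baseline.
// resolveBaselineRow is a hypothetical name, not part of any browser API.
function resolveBaselineRow(value, rowCount) {
  if (!Number.isInteger(value) || !Number.isInteger(rowCount) || rowCount < 1) {
    throw new RangeError("value must be an integer and rowCount a positive integer");
  }
  // 0 means: treat the bottom margin edge of the inline-table as the baseline.
  if (value === 0) {
    return { kind: "bottom-margin-edge" };
  }
  // If |value| exceeds the number of rows, fall back to the initial value (1).
  if (Math.abs(value) > rowCount) {
    value = 1;
  }
  // Positive values count from the top, negative values from the bottom.
  const row = value > 0 ? value : rowCount + value + 1;
  return { kind: "row", row: row };
}
```

For a three-row inline-table, resolveBaselineRow(-1, 3) picks row 3 and resolveBaselineRow(5, 3) falls back to row 1.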

New UserJS for MathML

Updated UserJS for MathML is ready. Complex indices, fences and mo tokens are supposed to be handled better now. Two different implementations are available for testing (XSLT one is larger but faster). Later we will keep the one that works better.

MathML in CSS Environment

Consistently integrating MathML into CSS environment is a tricky process that will take some time. It requires changes on both MathML and CSS sides. One also needs to ensure that appropriate CSS extensions are implemented.

The MathML part of the story is more or less done. I took some time to design a MathML profile that can work in a CSS environment. We had to sacrifice the mmultiscripts construction to make it work, but whatever is left is realistic and can be formatted with a default CSS stylesheet. Having such a profile significantly lowers the implementation barrier for browsers, so it is likely to speed up adoption of MathML. It was supposed to be branded as MathML Basic, but that was not approved by the WG and we ended up with a name that is likely to confuse the average user: MathML for CSS profile.

On the CSS side progress is slow, partly due to lack of interest in the issue from other browser vendors (MSIE seems to count on ECMA OMML, Mozilla has a different strategy, Apple remains silent). So far only one new property was added to CSS3. It provides better control over vertical alignment of inline-tables and is currently used to ensure proper baseline alignment of the MathML mover, munderover and msubsup layout schemata. Properties from the CSS3 borders and backgrounds module can be used to handle stretchy delimiters until a better solution is found.

MathML in Opera and Safari

While working on CSS formattable MathML profile, I noticed that MathML actually admits serialization that one could feed to any CSS2.1 rendering engine (including Opera 9, Prince formatter 6, Safari 3 and Mozilla browsers based on Gecko 1.9a3 or later). Here is demo page.

While this particular serialization sucks (mostly due to workarounds for browser bugs and lack of some crucial CSS3 stuff that is not yet implemented in browsers), generally speaking CSS rendering engines have necessary functionality to handle complex inline expressions and if we'll manage to ship reasonable MathML profile that works in CSS environment then barrier for adoption of MathML on web will be reduced significantly.

The trick with the serialization above is that CSS can handle many MathML layout schemata provided that the order of child elements follows their in-flow order. For instance, a root whose children appear in the order radix, radicand would work, while the order that MathML actually uses (radicand, radix) is not easy to handle with CSS. One can rewrite the MathML example above with maction so that it carries the children in both orders (radicand radix, then radix radicand). For native MathML formatters this is equivalent to the plain radicand radix form due to the fallback behaviour that maction has, while CSS formatters can pick the elements in the order that matches normal flow. By applying such a cyclic permutation to mover, munderover, mroot and msubsup one can get a MathML serialization that is CSS2.1 formattable. Here is an example.

It is one bug away from working in Safari (baseline alignment of inline-tables is broken), so the demo page mentioned at the beginning of the post relies on extra workarounds for that bug.

Note that with CSS3 named flows one could avoid maction monkey business, but browsers are not there yet. Once we get there then adding support for appropriate MathML profile should be easy for most of web browsers.
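
To make the cyclic permutation mentioned above concrete, here is a small JavaScript sketch that rotates the child list of each schema by one position. The child orders are those defined by MathML. The direction of rotation (last child moved to the front) is an assumption, chosen because it turns radicand, radix into radix, radicand as described in the post; the names rotateRight, mathmlOrder and flowOrder are made up for illustration.

```javascript
// Rotate a child list right by one position: the last child moves to the front.
function rotateRight(children) {
  if (children.length < 2) {
    return children.slice();
  }
  return [children[children.length - 1]].concat(children.slice(0, -1));
}

// Child order as defined by MathML for the schemata named in the post.
const mathmlOrder = {
  mover: ["base", "overscript"],
  munderover: ["base", "underscript", "overscript"],
  mroot: ["radicand", "radix"],
  msubsup: ["base", "subscript", "superscript"],
};

// Assumed in-flow order: each child list rotated by one position.
const flowOrder = {};
for (const name of Object.keys(mathmlOrder)) {
  flowOrder[name] = rotateRight(mathmlOrder[name]);
}
```

With this rotation munderover becomes overscript, base, underscript, which reads top to bottom like the rows of an inline-table.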

Mathematical formulae in new dimension

One should be crazy to expect mobile browsers to be able to render mathematical formulae, especially when most desktop browsers fail miserably on any kind of mathematical markup. But it seems that things look completely different when you look at them from a higher dimension.

The upcoming new version of Opera Mini turned out to be capable of handling complex inline layouts, including CSS formatted math formulae. It managed to handle quite sophisticated stress tests much better than some of its desktop friends do.

Biodiversity

Frogs and merlins were followed by lizards, foxes and seamonkeys. Now whole webfauna is represented. Time to send those annoying explorers back to Redmond, after all they only pollute CSS flow.

But, seriously, there is good news. David Baron recently implemented inline-blocks and inline-tables in Gecko. They work fine in recent nightly builds of Seamonkey, and XML MAIDEN documents are now formatted properly in Gecko based browsers. CSS2.1 is becoming more interoperable and in the long term we can count on it.

MathML and CSS

There are plans to modify MathML and extend the CSS3 spec to make MathML suitable for usage in an XML/CSS environment. This is easier to do now that Bert Bos has joined the Math working group and, working in Opera, I have the opportunity to coordinate efforts with Håkon Wium Lie and Morten Stenshorne.

The main problems that we have on CSS side are: insufficient control over vertical alignment of complex inline expressions such as inline-tables with multiple rows (according to Morten it is not difficult to solve this particular problem), lack of mechanism to control stretching of glyphs or any equivalent functionality that could be used to control stretching of math delimiters and stretchy operators (CSS3 image borders and/or stretchy backgrounds come to mind, not a perfect solution, but still usable until something better will be available) and limited scope of selectors and generated content, which makes it difficult to apply complex formatting to MathML formulae (something like CSS3 ::outside pseudo-element would be necessary, let us hope it will not be steamrolled out of existence by XBL fanboys). Basically these three issues are solvable (only ::outside might be difficult to get).

And of course we have multiple problems on MathML side. Order of children in presentational elements like mover, munderover, mmultiscripts, mroot does not match their in-flow position and makes formatting of such elements more difficult. Stretchy operators, delimiters and accents are not marked explicitly, many formatting related properties are inherited from operator dictionary, that makes matching of such operators using selectors and proper formatting impossible. Presentational elements like mpadded, mspace, mstyle and many presentational attributes often duplicate CSS functionality in CSS incompatible way. These issues are solvable, but require consensus within working group.

Joining Math WG

Recently I joined the W3C Math working group, where I will represent Opera Software. I am not the biggest supporter of MathML and am often considered among its opponents. However, there are a lot of things that can be done on the MathML side, and solving the existing problems requires some kind of coordination between Math WG, CSS folks and browser developers.

The intention is to address MathML/CSS compatibility issues, which would make integration of MathML with the rest of the technologies supported by browsers much easier. Today it is hard to judge what the actual outcome will be, but overall I am optimistic, as it seems that the essential part of MathML can be reformulated in a more CSS friendly manner and with a few CSS3 extensions one may merge it into the XML/CSS framework.

Once the technical issues are resolved and a realistic spec is available, I think browsers will be able to reconsider their position on MathML support. I am not sure whether this will finally bring more MathML content to the web, as MathML is quite verbose and that deters an essential part of its potential users, especially those from the LaTeX community. At the same time MathML3 may have both XML and non-XML input syntax, like RELAX NG and XQuery have, so if successful this step may provide some kind of bridge between the LaTeX and XML communities.

Moving to Osland

I am moving to Opera Software. Decision was made after e-mail conversation with Håkon Wium Lie, we again discussed issue of using CSS for formatting mathematics, and it seems that our views on topic are quite close. For me it is opportunity to switch from rather abstract research, to something that has actual applications in real life. I will be involved in core quality assurance and standards related activities.

No joy with HTML5

There was quite a long discussion on WHATWG list, some people approved approach, some complained about formatting of radicals and fences in CSS, some suggested to cut down proposal and limit scope to few basic constructs like fractions. However at the end of the day guy bossing on the top of WHATWG concluded that it is not worth to care about.

I am not particularly concerned. After all, the problem is not in the parser but in the rendering engine. Thus what is actually important is the CSS side of the story. On that side we see that rendering engines are gradually growing stronger, and it is clear that at some stage they will be powerful enough to handle maths regardless of progress (or lack of such) made on the markup side. At that stage we can send all those kids preaching what is good and what is bad for the web (mathematics is useless, tag soup, script-tease and dancing bears are cool of course) to hell and address our needs ourselves (actually we are slowly addressing them already in small projects, just results are not as good as they could be with wide support from browser vendors).

XML MAIDEN 2.1

As promised earlier, new version of XML MAIDEN that fixes issue with radicals just landed. Updated DTDs, style sheets and documentation can be found on XML MAIDEN site.

Mathematics in HTML5

Up to now, I did not pay much attention to HTML5 activity. After using XML for some time, I don't really understand what is the point in reviving weird tagsoup, bundled with annoying script-tease and optimizing it for broken MSHTML parser. Sometimes however there is something interesting going on.

Juan R. González-Álvarez raised issue of adding mathematical markup to HTML5. This sparked discussion. It is not clear whether this discussion will lead to something useful, but taking into account Håkon Wium Lie's reply I think it makes sense to submit concrete proposal. Let us see results.

XSL-TeX Experiment

The idea behind XSL-TeX was to develop an experimental LaTeX like mathematical markup that fits well in X(HT)ML + XSLT + CSS, X(HT)ML + XSLT + XSL FO and X(HT)ML + JS + CSS frameworks. Basically it is just an experiment intended to find out whether it is worth sending non-XML input to browsers, as well as to CSS and XSL formatters. It is inspired by the CanonML initiative.

The basic conceptual problem with LaTeX like input syntax is that there is no natural way to apply CSS selectors, Xpath, and DOM Core to LaTeX like markup due to highly non-uniform syntax where one has no analog of hierarchical tree with elements and attributes. In addition such a markup is unlikely to be suitable for client side usage as XSLT and/or JS needed to transform LaTeX like input into something suitable for rendering in browser (X(HT)ML+CSS, MathML, SVG) makes rendering slow and non incremental.

For these reasons the XSL-TeX initiative was abandoned almost as quickly as it arose. However, some resources (converters, style sheets) for handling XSL-TeX are available, and if someone is interested in taking the non-XML approach (s)he can reuse these style sheets.

Below is brief summary of input syntax and available tools.

Input Syntax (Brief summary)

Inline formula:
$E = mc^2$
Displayed formula:
$$E = mc^2$$
Subscripts:
Base_{subscript} or Base_s
Superscripts:
Base^{superscript} or Base^s
Stacked indices:
Base_{sub}^{sup} or Base^{sup}_{sub} or Base_s^s or Base^s_s
Presubscript:
\inf{sub}Base
Presuperscript:
\sur{sup}Base
Stacked prescripts:
\pre{sup}{sub}Base
Fractions:
\frac{numerator}{denominator}
Overscripts:
\stackrel{over}{Base}
Underscripts:
\stackrev{Base}{under}
Overbrace:
\overbrace{Base}
Overbrace with inscription:
\overbrace{Base}^{over}
Underbrace:
\underbrace{Base}
Underbrace with inscription:
\underbrace{Base}_{under}
Square root:
Operators:
\ope{U} \ope{U}_{sub} \ope{U}^{sup} \ope{U}_{sub}^{sup}
Common Operators:
∑ ∑_{sub} ∑^{sup} ∑_{sub}^{sup}
Fences:
\left( content \right)
\left[ content \right]
\left{ content \right}
\left| content \right|
\left. content \right|
Fences with indices:
\left( content \right)_{sub}
\left[ content \right]_{s}
\left{ content \right}_{sub}^{sup}
\left| content \right|^{sup}_{sub}
\left. content \right|_0^1
Vectors:
\vector{entry}{entry}{entry}{entry}
Matrices:
\matrix{
\row{cell}{cell}{cell}
\row{cell}{cell}{cell}
\row{cell}{cell}{cell}
}
Determinants:
\det{
\row{cell}{cell}{cell}
\row{cell}{cell}{cell}
\row{cell}{cell}{cell}
}
Cases:
\cases{
\case{value}{scope}
\case{value}{scope}
}
Bold:
\mathbf{Bold}
Italic:
\mathsl{Italic}
Strike:
\strike{Strike}
Overline:
\overline{overline}
Underline:
\underline{underline}
Linebreak:
\\
Softbreak (wrap point):
\wrap
Character escapes:
\{ \} \_ \^ \s (for \$) \b (for \)
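
To give an idea of what converting this syntax to XML involves, here is a minimal JavaScript sketch that handles only \frac{numerator}{denominator}, including nested fractions. It is not the code of xsltex.js or tex2xml.js, and the output element names frac, num and den are hypothetical placeholders rather than the vocabulary those tools produce.

```javascript
// Read a balanced {...} group that starts at index i of src.
// Returns the text between the braces and the index just past the closing brace.
function readGroup(src, i) {
  if (src[i] !== "{") {
    throw new SyntaxError('expected "{" at position ' + i);
  }
  let depth = 0;
  for (let j = i; j < src.length; j++) {
    if (src[j] === "\\") {
      j++; // skip the character after a backslash, e.g. the brace in \{
      continue;
    }
    if (src[j] === "{") {
      depth++;
    } else if (src[j] === "}") {
      depth--;
      if (depth === 0) {
        return { body: src.slice(i + 1, j), next: j + 1 };
      }
    }
  }
  throw new SyntaxError("unbalanced braces");
}

// Escape the characters that are special in XML text content.
function escapeXml(ch) {
  if (ch === "&") return "&amp;";
  if (ch === "<") return "&lt;";
  if (ch === ">") return "&gt;";
  return ch;
}

// Convert \frac{...}{...} to XML; everything else is copied as text.
// The element names frac, num and den are assumptions made for this sketch.
function convert(src) {
  let out = "";
  let i = 0;
  while (i < src.length) {
    if (src.startsWith("\\frac", i)) {
      const num = readGroup(src, i + 5);
      const den = readGroup(src, num.next);
      out += "<frac><num>" + convert(num.body) + "</num>" +
             "<den>" + convert(den.body) + "</den></frac>";
      i = den.next;
    } else {
      out += escapeXml(src[i]);
      i++;
    }
  }
  return out;
}
```

The other constructs in the list (indices, fences, matrices) would need the same kind of brace matching plus a lookahead for _ and ^, which is where most of the complexity of a real converter would sit.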

Style sheets for XSL-TeX

xsltex.xslt
XSLT style sheet that transforms XSL-TeX into XML and is suitable for rendering XSL-TeX in Opera 9. See examples of using this style sheet. It is not recommended to use this method for delivery (transformations are better done on the author side).
xsltex.js
JavaScript that transforms XSL-TeX into XML. It can be used for rendering XSL-TeX in Opera 9. Some examples of using this JS are available. As in the case of XSLT, transformations are better done on the author side.

Converters

XSL-TeX in Action
Online converter for transforming XSL-TeX into XML and/or XHTML. Output can be formatted in Opera 9 and/or Prince 5 using CSS. There are some examples of XHTML output produced by this converter. Can be used offline.
tex2xml.js
Macro for EmEditor that can be used to convert XSL-TeX to XML during authoring process. It converts selected part of mathematical expression only. Works in EmEditor version 5 (or 4 Professional). Can be enabled from
-[Macros]
--[Select...]
xsl-tex.js
Macro for EmEditor that can be used to convert XSL-TeX to XML during authoring process. It converts selected document fragment containing mathematical expressions. Works in EmEditor version 5 (or 4 Professional). Can be enabled from
-[Macros]
--[Select...]

Markup for radicals in XML MAIDEN 2.0 is not particularly good, nor is formatting. So we plan to improve both and release new version of DTD to address this issue.

Two interoperable implementations

Math WG will probably never stop surprising me. To be fair it is not the only working group in W3C that gets everything wrong, but in case of testsuite they managed to beat everyone. Their testsuite admits AT MOST two interoperable implementations.

That is if browser is MSIE (test="system-property('xsl:vendor')='Microsoft'") they transform testcases in proprietary markup explicitly optimized for particular versions of IE, if browser is Gecko based (test="system-property('xsl:vendor')='Transformiix'") they transform content MathML into presentational one that Mozilla can handle, while any other browser gets complete junk. Very smart.

UserJS for MathML

Generally speaking I don't like script-tease as such. But UserJS is a different story, it gives more power to user. One can use it to fix annoying websites, make browsing more convenient and even add new features to browser.

It is easy to write, easy to install (actually no installation needed) and quite powerful. But at the same time it is slow. One of the examples of using UserJS is our MathML implementation (not sure whether it is a feature or a fix for broken websites). It is not finished yet, as we are waiting for a couple of bug fixes in the Presto rendering engine, after which formatting of mathematical formulae in Opera is expected to be much easier. Once those bugs are fixed we will rewrite the UserJS to make it more efficient.

At this stage it does not handle stretchy delimiters, radicals and accents well, and has some problems with positioning indices. Most of those issues (excluding accents that would require support for Unicode combining diacritical marks in Presto) will be fixed after rewrite.

Formatting ISO 12083 with CSS

ISO 12083 Electronic Manuscript Standard is an ISO/NISO/ANSI standard for electronic articles and books. Among other things it defines markup for mathematical formulae that was used to embed math expressions in SGML documents. It can be used in the XML realm as well. It turns out that if we limit ourselves to a certain subset of the ISO 12083 mathematics DTD, then ISO 12083 formulae can be formatted with CSS. As you can see from sample articles, the subset can handle nested fractions, indices, fences and simple under/overscripts.

Great! Now we have browser that can render XML MAIDEN documents. Morten Stenshorne recently fixed several CSS related bugs and latest technical preview of Opera 9 (Merlin) can handle math stylesheets pretty well. It is first browser (and second CSS implementation after Prince) that managed to handle rather complex CSS stress and torture demo pages properly. Many thanks to Morten, Moose and Håkon for gifts like this.

XML MAIDEN 2.0

We are getting closer to solving the main formatting related problems and have almost managed to write a default style sheet. The idea behind it was to write a single CSS2.1 style sheet that would handle arbitrarily complex math formulae obtained by combining and nesting subscripts, superscripts, prescripts, under and over scripts, fractions, operators, matrices, vectors, determinants, cases, fences, radicals and other mathematical expressions.

That would allow any rendering engine with CSS2.1 support to render mathematical expressions. At first glance it was not obvious whether writing such a style sheet is possible at all, but now we almost have it. And it works fine in Prince 5.

The bad news however is that browsers do not have (yet) sufficient CSS2.1 support to process style sheet properly. Flawless support for inline-blocks and inline-tables is crucial for proper functionality of the present style sheet.

At the same time the rewrite of the DTD is almost finished and the XML MAIDEN 2.0 DTDs are available from the OASIS schema registry (there are plain and annotated versions). We probably need better markup for radicals and fence markers, but at this stage the current markup should be sufficient.

Style sheets, markup examples and documentation can be found on XML MAIDEN site.

New domain name

We now have new math stylesheets and almost finished rewrite of DTD. We plan to release both soon. We also plan to redesign project's website and move it to new location.

Australian frog won CSS race

We got new rendering engine that seems to address our needs. Yes, new. It is not Presto, nor Gecko, but new CSS formatter developed by small Australian company.

Recent version of formatter (Prince 5) has proper implementation for inline-blocks, inline-tables and every other part of CSS2.1 that we use for formatting maths. All we miss is frog logo, used in previous version of formatter. Logo is not a problem of course and now it should be much easier to write new style sheets for math formulae and by the way fix our DTD (hopefully nesting limitation will be removed). Many thanks to Michael Day and Xuehong Liu for delivering such a great CSS implementation.

XML MAIDEN 1.1

Experiment continues. We made some changes in the DTD to ensure that fence markers are marked explicitly. Originally we planned to rely on adjacent sibling selectors to distinguish indices that apply to fences from the rest, but it seems that those selectors ignore text nodes and thus are not suitable for our purpose. As usual, plain and annotated DTDs are available from the OASIS XML registry. We still have not resolved the issue with nesting limitations. We might need to rewrite the DTD from top to bottom to resolve those, but we don't want to do it today, as CSS support in browsers is still weak and we can't check whether style sheets designed for the new DTD work properly. So the DTD rewrite is postponed until better times.

XML MAIDEN 1.0

Yep, in spite of numerous problems on the CSS side, we decided to release the DTD. It imposes severe nesting limitations, but basic functionality is there and it is better than having nothing. Simple indices, fractions, under/over scripts and matrices should work (at least in Opera, but once other browsers implement CSS2 it will work cross browser). Smart resizing of fences is not implemented yet, and there are no radicals either (quite a moderate approach, use power notation instead). An annotated version of the DTD is also available.

Proof of concept

No progress on the browser side. But we can't wait forever. So we plan to publish the XML MAIDEN 1.0 DTD in the near future. Taking into account that CSS support in browsers is too weak at the moment, the markup will support only basic math expressions like simple indices, simple fractions (nesting of those will be limited), matrices and under/over scripts. Deep nesting will not be allowed, but hopefully we will be able to remove the nesting limitations in future. It will be just a proof of concept.

Matrix revolution

We had problem with formatting matrices, in particular drawing stretchy fences around matrix. Steve White suggested to wrap matrices in extra table and simulate fences by borders. It works, but requires using slightly redundant markup for matrices. Later Mark Schenk published article illustrating how to deal with matrices, without using redundant markup, he relies on generated content and relative positioning. We used Mark's solution in some style sheets, but later found another solution that relies on generated content and border collapse mechanism.

Fighting CSS bugs

Recently we had e-mail conversation with Håkon Wium Lie. We discussed possibility of using XML/CSS for publishing scientific articles on web, he seems to like idea and promised to speed up fixing of appropriate CSS related bugs in Opera to ensure that their rendering engine can handle CSS formatted math formulae properly.

Problems with CSS rendering engine

Damn. We are talking about using XML/CSS for publishing scientific web documents, but it seems that we don't have any browser with decent CSS support. The crucial thing is to have proper implementation of inline-tables as most of nontrivial inline expressions like fractions, matrices, under/over scripted expressions correspond to inline-table in CSS visual formatting model.

Anything based on Gecko fails due to bug 18217, it seems that Netscape developer responsible for that part of layout did not return from vacations. In MSIE you can force HTML table to behave like inline-table, by setting its display property to inline, but it does not work well and in addition we need something that can be used in XML not just HTML. Opera seems to support inline-tables, but not perfectly (there are some vertical alignment problems).

All this means that work on XML MAIDEN markup will be postponed until better times, that is until browsers improve their CSS implementations or maybe until a bunch of W3C bureaucrats fix their beloved MathML, so that we could avoid reinventing the wheel.

To hell with MathML

Some folks pointed out that there is the MathML standard for embedding mathematics in webpages. It would be nice if we could reuse it somehow, but unfortunately it is not something that one could call a markup language, it is rather some misconception. Apart from numerous technical flaws (despite being developed by W3C, this standard does not fit in W3C's own XML/CSS/DOM framework, which they consider the basis of future web architecture), it also uses awful syntax (each character is enclosed in redundant token elements) that is not only unsuitable for manual coding (yes, some of us prefer to code XML, XHTML, CSS and similar stuff in a plain text editor) but is completely unreadable in case you want to inspect the source. We may consider using MathML in future if W3C fixes it (that is, switches to reasonable syntax and integrates it with the rest of the W3C standards), but at this stage it is complete nonsense.

From SCALA to XML MAIDEN

Those who tried to use CSS for formatting mathematical formulae would definitely notice two things. First CSS support in current browsers is too weak to handle complex inline layouts, especially deeply nested mathematical formulae. Second existing markup languages such as ISO 12083 are not suitable for CSS formatting as many things in those languages are not marked explicitly in the way suitable for further formatting with CSS, order of subexpression often does not match their expected in-flow sequence and implied visual formatting model does not match that used in CSS.

In Razmadze Mathematical Institute, we were using alternative XML based markup, called SCALA (Scientific Content Authoring Language) for some time, to mark up math formulae in CSS friendly way. That markup is not properly documented and does not work well either. However basic design principles behind language are valuable and there is plan to rewrite it and turn into reasonable mathematical markup for XML/CSS framework. Project will be called XML MAIDEN (XML Manuscript Authoring, Interchange and Delivery Environment) and everybody who is interested in embedding mathematical formulae in web documents is encouraged to join effort.