mathtext: add support for unicode mathematics fonts #31064
base: text-overhaul
8009a09 to b8c28d2
I opened a new PR because this one is based on the text-overhaul branch and I messed up rebasing the old one. @QuLogic: Would you kindly take a look at this? Is this something you would consider merging? For further discussion, in case you don't reject this feature altogether: how should the math font be configured? This also depends a bit on how prominently it should be exposed. I could not think of a good way without introducing a new parameter in rcParams.
    0x22d3: 0x22d2,
}


unicode_math_lut: dict[str, dict[CharacterCodeType, CharacterCodeType]] = {
Most of these are 1-to-1 mappings of a block; I wonder if there is a more compact representation that could be used? Something like:
# (start, end, new_start)
# up digits
(0x30, 0x39, 0x30),
...
# bf latin lower case
(0x61, 0x7a, 0x1d41a),
maybe plus a small dictionary with some of the exceptions, depending on how they fit into the blocks.
The lookup table could be generated from those if necessary.
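For illustration, a minimal sketch of how such a compact description could be expanded into a lookup table. The names `BOLD_RANGES`, `ITALIC_RANGES`, `ITALIC_EXCEPTIONS` and `expand` are hypothetical and not part of this PR; the code points come from the Unicode Mathematical Alphanumeric Symbols block, where italic small h is the one well-known gap (it lives at U+210E instead of U+1D455).

```python
from typing import Optional

# Hypothetical compact description: (start, end, new_start), inclusive ranges.
BOLD_RANGES = [
    (0x30, 0x39, 0x1D7CE),  # digits 0-9 -> bold digits
    (0x41, 0x5A, 0x1D400),  # A-Z -> bold capitals
    (0x61, 0x7A, 0x1D41A),  # a-z -> bold small letters
]

ITALIC_RANGES = [
    (0x41, 0x5A, 0x1D434),  # A-Z -> italic capitals
    (0x61, 0x7A, 0x1D44E),  # a-z -> italic small letters
]

# Exceptions that do not fit the contiguous blocks.
ITALIC_EXCEPTIONS = {
    0x68: 0x210E,  # italic h is PLANCK CONSTANT; U+1D455 is reserved
}


def expand(ranges: list[tuple[int, int, int]],
           exceptions: Optional[dict[int, int]] = None) -> dict[int, int]:
    """Expand inclusive (start, end, new_start) ranges into a flat dict."""
    table: dict[int, int] = {}
    for start, end, new_start in ranges:
        for offset in range(end - start + 1):
            table[start + offset] = new_start + offset
    # Exceptions are applied last so that they override the block mapping.
    table.update(exceptions or {})
    return table


bold_lut = expand(BOLD_RANGES)
italic_lut = expand(ITALIC_RANGES, ITALIC_EXCEPTIONS)
```

The expanded dictionaries keep the constant-time lookup of a hard-coded table; only the source representation becomes shorter.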
This is exactly the way I generated the lookup table, offline:
- map the entire range
- fix missing/moved codepoints based on a smaller lookup table
At some point I did consider writing special mapping functions but I figured that a lookup table might be preferable for performance.
Do you prefer to generate the lookup table (for example on module load) instead of hardcoding the entire table?
# handle digits
_alphabet_map = {
    'rm': 'up',
    'it': 'up',  # convention! digits always upright - not handled in Parser
Related to #29253 (comment) I think.
Yes. I tried to replicate the logic of StixFonts here. This would have to be fixed in the parser and in all font classes at once, I believe. The parser would have to explicitly distinguish between the plain math environment and \mathit.
One might even consider passing that information to _get_glyph and implementing the italic vs. non-italic logic for math environments there instead of doing that in the parser.
lib/matplotlib/_mathtext.py
Outdated
# from here on: use the Math font
new_fontname = 'mathfont'


def _is_digit(codepoint: CharacterCodeType):
Type hints are missing the return type, though the functions aren't really re-used, so I'm not sure they're any better than inlining in the if with an explanatory comment.
I have added the type hints for now, because I believe that the function might be useful in other places, too. Let me know if you prefer to inline it instead.
}


class UnicodeMathFonts(TruetypeFonts):
IIUC, math fonts should have tables with various layout metrics. We currently have those hard-coded in the various FontsConstantsBase subclasses, and they are likely incorrect for an arbitrary math font.
So this will likely need to parse this data out of the font and implement at least get_axis_height that was added in #31046, get_xheight maybe using #31050, and get_quad from #31110. But it is likely that you will want to refactor some of those remaining uses of the constants so that they fetch the information from the fonts as well.
I fully agree. Doing this may involve some refactoring though, because the FontsConstantsBase subclass could no longer be determined purely from the fontname but would have to be populated dynamically based on the loaded OpenType font.
That said, I made some experiments locally. Unfortunately, FreeType does not parse the MATH table. We could use fonttools, which is a hard dependency anyway.
There are several open questions about how to map the OpenType layout metrics to the legacy TeX-inspired variables used in mathtext. Does it make sense to postpone that to a separate PR and focus on the basics here?
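As a rough sketch of the fonttools route, the following reads the axis height from the MATH table and scales it to points. The function names `font_units_to_points` and `read_axis_height` are hypothetical, and the attribute path `MathConstants.AxisHeight.Value` reflects my understanding of how fonttools exposes the table; treat it as an assumption to be checked against a real math font.

```python
def font_units_to_points(value: int, units_per_em: int, fontsize: float) -> float:
    """Scale a value given in font design units to points at *fontsize*."""
    return value * fontsize / units_per_em


def read_axis_height(path: str, fontsize: float) -> float:
    """Return the MATH-table axis height of the font at *path*, in points."""
    # Imported lazily so the helper above works without fonttools installed.
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    if "MATH" not in font:
        raise ValueError(f"{path} has no OpenType MATH table")
    constants = font["MATH"].table.MathConstants
    units_per_em = font["head"].unitsPerEm
    # AxisHeight is assumed to be a MathValueRecord; .Value is in design units.
    return font_units_to_points(constants.AxisHeight.Value, units_per_em, fontsize)
```

How values like this should map onto the existing TeX-inspired constants is exactly the open question raised above; the sketch only shows that the raw data is reachable.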
Adds support for generic unicode OpenType mathematics fonts such as STIX Two Math or Cambria Math
a8083e3 to 5988fac
PR summary
supersedes #31048
Add basic support for generic unicode OpenType/TrueType mathematics fonts such as STIX Two Math, Cambria Math, DejaVu Math, etc.
Currently, mathematics text rendering through mathtext in matplotlib supports a hard-coded set of fonts (configured via mathtext.fontset). Its design presumably predates the specification of mathematics alphabets in the Unicode standard. While it is possible to configure custom fonts (mathtext.fontset: custom), this requires setting separate fonts for the upright, italic, fraktur, double-struck, etc. variants, which is fundamentally incompatible with the way modern mathematics fonts are designed.
Unicode defines mathematical alphanumeric symbols as unique codepoints (see https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols), in contrast to different fonts all defining different styles for the same ASCII characters/codepoints.
One relatively modern way to render mathematical formulas uses mathematics fonts such as STIX Two Math, Cambria Math, Asana Math, etc. For LaTeX, this is implemented in the unicode-math package.
Instead of choosing a font based on the style (as is currently done in matplotlib) to render the same codepoints, this approach maps alphanumeric characters to different codepoints based on the style and renders them from a single font.
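To make the idea concrete, here is a small standalone sketch (not the code in this PR) that restyles a string by shifting ASCII letters and digits into the bold mathematical alphabet. The names `_BOLD_BASES` and `to_bold` are made up for this example; the base code points are those of the Unicode bold block.

```python
import unicodedata

# Base code points of the bold mathematical alphabet (one style only).
_BOLD_BASES = {"upper": 0x1D400, "lower": 0x1D41A, "digit": 0x1D7CE}


def to_bold(text: str) -> str:
    """Map ASCII letters and digits to their bold mathematical codepoints."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(_BOLD_BASES["upper"] + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(_BOLD_BASES["lower"] + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(_BOLD_BASES["digit"] + ord(ch) - ord("0")))
        else:
            out.append(ch)  # operators, spaces, etc. stay unchanged
    return "".join(out)


# The style is now carried by the codepoint itself, not by the font choice.
print(unicodedata.name(to_bold("x")))
```

A single math font that covers these codepoints can then render every style, which is what makes a per-style font configuration unnecessary.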
Shortcomings of the status quo:
This change implements functionality to use any installed Unicode OpenType/TrueType mathematics font in mathtext in a portable way. Currently, this can be enabled by setting the rcParams
I could think of different ways to configure this, though.
Internally, I have implemented a separate class UnicodeMathFonts(TruetypeFonts) so as not to interfere with the existing fontsets.
Running the test currently requires STIX Two Math to be installed on the system. For that reason, I have added it to the test data. One may think about vendoring STIX Two Math or DejaVu Math via mpl-data instead.
Examples
PR checklist