Subscribe to RSS Subscribe to Comments

freesoftwhere.org

Font sidecars: dream or nightmare?

At DebConf this year, I gave a talk about font-management on desktop Linux that focused, for the most part, on the importance of making font metadata accessible to users. That’s because I believe that working with fonts and managing a font collection are done primarily through the metadata — like other types of media content (audio files, video files, even still images, although most people fall back to looking at thumbnails on the latter). For example, you need to know the language support of a font, what features it offers (either OpenType or variable-font options), and you mentally filter fonts based on categories (“sans serif”, “handwriting”, “Garamond”, tags that you’ve assigned yourself, etc.).

That talk was aimed at Debian font-package maintainers, who voluntarily undertake the responsibility of all kinds of QA and testing for the fonts that end users get on their system, including whether or not the needed metadata in the font binaries is present and is correct. It doesn’t do much good to have a fancy font manager that captures foundry and designer information if those fields are empty in 100% of the fonts, after all.

At the end of the session, a question came up from the audience: what do we do about fonts that we can’t alter (even to augment metadata) from upstream? This is a surprisingly common problem. There are a lot of fonts in the Debian archive that have “if you change the font, you must change the name” licenses. Having to change the user-visible font name (which is what these licenses mean) defeats the purpose of filling in missing metadata. So someone asked whether or not a Debian package could, instead, ship a metadata “sidecar” file, much like photo editors use for camera-raw image files (which need to remain unmodified for archival purposes). Darktable, Digikam, RawTherapee, Raw Studio, and others do this.

It’s not a bad idea, of course. Anecdotally, I think a lot of desktop Linux users are unaware of some of the genuinely high-quality fonts available to them that just happen to be in older binary formats, like Adobe Utopia. The question is what format a metadata sidecar file would use.

The wonderful world of sides of cars

Photo editors use XMP, the Extensible Metadata Platform, created by Adobe. It’s XML-based. So it could be adapted for fonts; all that’s needed would be for someone to draw up a suitable XML namespace (or is it schema? Maybe both?). XML is mildly human-readable, so the file might be enlightening early-evening reading in addition to being consumable by software. But it’s not trivial, and utilizing it would require XML support in all the relevant font utilities. And implementing new support for that new namespace/schema. I’d offhandedly give it a 4 out of 10 for solving-the-problemhood.

On the other hand … almost all font-binary internals are already representable in another XML format, TTX, which is defined as part of the free-software FontTools project. There are a few metadata properties that aren’t directly included in OpenType (and, for the record, OpenType wraps both TrueType and PostScript Type 1, so it’s maximal), but a lot of those are either “derived” properties (such as is-this-a-color-font) or might be better represented at a package level (such as license details). TTX has the advantage of already being directly usable by just about every font-engineering tool here on planet Earth.

In fact, theoretically, you could ship a TTX sidecar file that a simple script could use to convert the Adobe Utopia Type-1 file into a modernized OpenType file. For licensing reasons, the package installer probably cannot do that on the user’s behalf, and the package-build tool certainly could not, because it would trigger the renaming clause. But you could leave it as a fun option for the user to run, or build it into a font manager (or a office-application extension).

Anyway, the big drawback to TTX as a sidecar is that creating hand-crafted TTX files is not easy. I know it’s been done at least once, but the people involved don’t talk about it as a personal high point. In addition, TTX isn’t built to handle non-font-binary content. You could stuff extra metadata into unclaimed high-order name table slots, but roundtripping those is problematic. But for the software-integration features, I’d call it at least a 8 out of 10 possibility.

Another existing metadata XML option is the metadata block already defined by the W3C for WOFF, the compressed web-font-delivery file format. “But wait,” you cry, leaping out of your chair in excitement as USB cables and breakfast cereal goes flying, “if it’s already defined, isn’t the problem already solved?” Well, not exactly. The schema is minimal in scope: it’s intended to serve the WOFF-delivery mechanism only. So it doesn’t cover all of the properties you’d need for a desktop font database. Furthermore, the metadata block is a component of the WOFF binary, not a sidecar, so it’s not fully the right format. That said, it is possible that it could be extended (possibly with the W3C on board). There is a per-vendor “extension” element defined. For widespread usage, you’d probably want to just define additional elements. Here again, the drawback would be lack of tool support, but it’s certainly more complete than DIY XML would ever be. 6 out of 10, easily.

There are also text-based formats that already exist and cover some of what is desired in a sidecar. For example, the FEA format (also by Adobe) is an intentionally human-writeable format geared toward writing OpenType feature definitions (think: ligature-substitution rules, but way more flexible. You can do chaining rules that match pre-expressions and post-expressions, and so on). FEA includes formal support for (as far as I can tell) all the currently used name table metadata in OpenType, and there is an “anonymous” block type you could use to embed other properties. Tool support is stellar; FEA usage is industry standard these days.

That having been said, I’m not sure how widespread support for the name table and anonymous stuff is in software. Most people use FEA to write GSUB and GPOS rules—and nothing else. Furthermore, FEA is designed to be compiled into a font, and done so when building the font from source. So going from FEA to a metadata database is a different path altogether, and the GSUB-orientation of the format means it’s kind of a hack. If the tool support is good and people agreed on how to stuff things into anonymous blocks, call it 5.5 out of 10.

Finally, there’s the metadata sidecars used by Google Fonts. These are per-font-family, and quite lightweight. The format is actually plain-text Google protobuf, although the Google Fonts repository has them all using the binary file extension, *.pb. Also, the Google Fonts docs are out of date: they reference the old approach, which used JSON. While this format is quite easy to read, write, and do something with, at the moment it is also minimal in scope: basic font and family name information, weight & style, copyright. That’s it. Certainly the extension mechanism is easy to adopt, however; this is trivial “key”:”value” stuff. And there are a lot of tools capable of reading protobuf and similar structured formats; very database-friendly. 6 out of 10 or so.

Choose a side

So where does that leave us? Easy answer: I don’t know. If I were packaging fonts today and wanting to simply record missing metadata, unsure of what would happen to it down the line, I might do it in protobuf format—for the sake of simplicity. It’s certainly feasible to imagine that protobuf metadata files could be converted to TTX or to FEA for consumption by existing tools, after all.

If I was writing a sidecar specifically to use for the Utopia example from above, as a shortcut to turn an old Type-1 font into an OpenType font with updated metadata, I would do it in TTX, since I know that would work. But that’s not ultimately the driving metadata-sidecar question as raised in the audience Q&A after my talk. That was about filling in a desktop-font-metadata database. Which, at the moment, does not exist.

IF a desktop-font-metadata plan crystalizes and IF relying on an XML format works for everyone involved, it’d certainly be less effort to rely on TTX than anything else.

To get right down to it, though, all of this metadata is meant to be the human-readable kind. It’d probably be easier to forgo the pain of writing or extending an XML namespace (however much you love the W3C) and keep the sidecar simple. If it proves useful, at least it’s simpler for other programs and libraries to read it and, perhaps, transform it into the other formats that utilities and build tools can do more interesting things with. Or vice-versa, like fontmake taking anonymous FEA blocks and plopping them into a protobuf file when building.

All that having been said, I fully expect the legions of XML fans (fxans?) to rush the stage and cheer for their favorite option….

Comments

  1. August 18th, 2018 | 11:00 pm

    The format is less important than coming up with what’s to be described and making the result be machine readable, stable, extensible, interoperable and purple.

    It has to have a ttx representation, of course, and it helps if you can easily check for at least basic errors with widely-available tools.

    Minor pedantic nit, you can’t extend an XML namespace – an XML namespace really just identifies (names) a format, an XML vocabulary, like SVG, MathML, MusicML, etc. A schema defines the language that’s named by the namespace, and you can extend a schema.

    I’d rather see metadata live in the actual font than alongside it, whenever possible. They don’t get separated that way. When we did woff i was really hoping we’d see browsers support displaying Page Font Info and let you follow a link to ‘buy this font” or “download this free font” or “see the font designer’s Web page” whatever. Silly me. I don’t think browsers eve provide JavaScript access to that metadata right now. But if they did, and on the desktop, it’d be really helpful. It’s crazy that EULA conditions aren’t machine readable in any useful way.

    So i think i’d be into extending the woff stuff and defining (if necessary) what it looks like in ttx too.

  2. Nate
    August 19th, 2018 | 4:18 pm

    Well, I might not have explained the core question here as precisely as I’d intended to.

    The question at hand is not “how can we store font metadata?” As you note, the best way (by far) is to keep it in the binary.

    The question was “how can we _ship_ metadata in a package inside of which we’re not allowed to patch the font binary itself?”

    So my first train of thought there was to see what existing font-metadata-storage formats are out there, then assess whether any of them already fit the bill.

    It does sort of split into two paths at that point, however. If I’m including a metadata sidecar in order to let the user patch the binary (which is allowed if they don’t intend to redistribute it), TTX is easy. If I’m including a metadata sidecar in order for an application or utility to index a richer set of properties for the font, that’s a different question and I’d want to hear from the app developers.

    And, of course, I defer entirely to your XML wizardry; as I alluded to in a couple of side quips in the post, I never finished XML school and it shows. I am glad to hear that W3C folks care about the subject though.

  3. August 20th, 2018 | 11:05 am

    Well, i’m no longer at W3C :) but i can still care.

    i think it’s worth exploring the “we’re not allowed to patch the font binary itself” aspect here. For example, if a user can patch a font, maybe a Linux rpm or deb package can apply such a patch too. If not, we need a way to make that “sidecar” available to Web browsers as well as local applications. A third possibility you don’t mention is for the “sidecar” to be a font itself, with just the metadata. That way font applications can already read it.

Based on FluidityTheme Redesigned by Kaushal Sheth