Subscribe to RSS Subscribe to Comments

All font metadata, all the time

A recurring obstacle within the problem of “making fonts work better for desktop Linux users” (see previous posts for context) is how much metadata is involved and, thus, needs to be shuffled around so that it can be exposed to the user or accessed by system utilities. People hunting for a font to install need to see metadata about it in the package manager / software center; people selecting a font from their collection need to see metadata about it in the application they’re using (and within the font-selection dialogs, if not elsewhere in the app’s UI). The computing environment needs to be aware of metadata to set up the correct rendering options and to pass along the right information to other components above and below in the stack.

As it stands now, there are several components on a GNOME-based system that already do track metadata about fonts: Pango, Fontconfig, AppStream, HarfBuzz, GTK+, etc. There are also a ton of metadata fields already defined in the OpenType font format specification. Out of curiosity, I decided to dig into the various data structures and see which properties are tracked where.

I put the results into a Markdown table as a GitLab “snippet” (a.k.a., gist…) and posted it as this font metadata cross-reference chart. For comparison’s sake, I also included the internal data structures of the old Fontmatrix font manager (which a lot of folks still reference as a high-water mark) and the … shall we say “terse” … font ontology defined in Nepomuk. I basically used the OpenType spec as the reference point, although some of the metadata in that column (the left-most) is more of a “derived property” than an actual field. Like “is this font a color font” — you might determine that by looking for one of the four possible color font tables (sbix, CBDT, SVG, or COLR), but there’s not a “color included” bit, exactly.

There are just over 50 properties, which is a soft number because it includes a lot of not-quite aligned items, such as where OpenType stores a separate field for the font vendor name and the font vendor’s URL, and some properties that are overly general in one or more implementations. That, and it includes several of the derived properties mentioned just above.

You’re welcome to suggest additions. I did focus on the user-visible metadata world, which means the table excludes some system-oriented properties like the em size and baseline height. My interest in primarily in things that directly tie in to how users select fonts for use. I’m definitely not averse to pull requests to add that system stuff to the table, though. That’s why I put it on a GitLab snipgist — personally, I find Markdown tables awful to work with. But I digress.

No idea where this goes from here. There was some talk around the Fonts BoF at GUADEC this past July about whether or not GNOME ought to cache font metadata in some sort of user-local database. My recollection is that this would be helpful in several places; certainly as it stands now, for example, the awesome work that Matthias Clasen has done reengineering the GTK+ font explorer seems to require mining each font binary, so saving those results for reuse seems like a clear win. If Pango and the top-level application are also parsing the font binary just to show stuff, that’s overkill.

In my impromptu GUADEC talk about font selection, I noted that other GNOME “content objects” like audio, video, and image files all seems to get their metadata taken care of by Tracker, and posited that Tracker could take that task on for fonts as well. Most developers there didn’t seem to think this was a great fit. Some people have issues with Tracker anyway, but Tracker is also (by design) a “user-level” service, so putting font metadata into it would not solve the caching needs of the various libraries and system components.

Nevertheless, I did toy with the idea of writing up a modern Nepomuk ontology for font objects based on the grand-unified-GitLab-snipgist table (Nepomuk being the standard that Tracker uses). That may just be out of personal self-interest in teasing out structure from the table itself. It might be useful, for example, to capture more structure about which properties are related in which ways — such as where they are nested within other properties, are duplicates, or are mutually exclusive. And a lot more for which Markdown isn’t remotely useful.

No comments yet. Be the first.

Leave a reply

Based on FluidityTheme Redesigned by Kaushal Sheth