Matplotlib project

This page contains the details of a technical writing project accepted for Google Season of Docs.

Project summary

Open source organization:
Matplotlib
Technical writer:
brunobeltran
Project name:
Improving feature discoverability by standardizing documentation of “implicit” types
Project length:
Long running (5 months)

Project description

Motivation

Historically, matplotlib's API has relied heavily on string-as-enum "implicit types". Besides mimicking MATLAB's API, these parameter-strings allow the user to pass semantically-rich values as arguments to matplotlib functions without having to explicitly import or verbosely prefix an actual enum value just to pass basic plot options (e.g. plt.plot(x, y, linestyle='solid') is easier to type and less redundant than something like plt.plot(x, y, linestyle=mpl.LineStyle.solid)).

Many of these string-as-enum implicit types have since evolved more sophisticated features. For example, a linestyle can now be either a string or a 2-tuple of an offset and an on/off dash sequence, and a MarkerStyle can now be either a string or a matplotlib.path.Path. While this is true of many implicit types, MarkerStyle is the only one (to my knowledge) that has been upgraded to a proper Python class.
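To make the "string or tuple" duality concrete, here is a minimal sketch of how a single implicit type ends up accepting several input forms. The helper normalize_linestyle is hypothetical (it is not a Matplotlib function), it covers only the four named line styles and their short aliases, and it assumes the tuple form is (offset, on/off sequence) as described in the Line2D.set_linestyle docs.

```python
# Hypothetical helper for illustration only; not part of Matplotlib.
SHORT_TO_LONG = {"-": "solid", "--": "dashed", "-.": "dashdot", ":": "dotted"}
LONG_NAMES = tuple(SHORT_TO_LONG.values())


def normalize_linestyle(linestyle):
    """Return a long style name, or a validated (offset, on_off) tuple."""
    if isinstance(linestyle, str):
        # Strings act like enum members: short aliases map to long names.
        name = SHORT_TO_LONG.get(linestyle, linestyle)
        if name not in LONG_NAMES:
            raise ValueError(f"{linestyle!r} is not a recognized linestyle string")
        return name
    # Otherwise assume the "advanced" form: (offset, on/off dash sequence).
    offset, on_off = linestyle
    on_off = tuple(on_off)
    if len(on_off) == 0 or len(on_off) % 2 != 0:
        raise ValueError("the on/off sequence needs an even, non-zero length")
    return (offset, on_off)


print(normalize_linestyle("--"))         # dashed
print(normalize_linestyle((0, [5, 2])))  # (0, (5, 2))
```

Every function that takes a linestyle has to describe both branches somewhere, which is where the duplicated tables discussed in this proposal come from.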

Because these implicit types are not classes in their own right, Matplotlib has historically had to roll its own solutions for centralizing documentation and validation of these implicit types (e.g. the docstring.interpd.update docstring interpolation pattern and the cbook._check_in_list validator pattern, respectively) instead of using the standard toolchains provided by Python classes (e.g. docstrings and the validate-at-__init__ pattern, respectively).
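The two roll-your-own patterns mentioned above can be sketched roughly as follows. All names here (DOC_SNIPPETS, interpolate_doc, check_in_list, set_joinstyle) are hypothetical stand-ins written for this example; they only imitate the shape of the docstring interpolation and check-in-list patterns and are not Matplotlib's actual helpers.

```python
# Hypothetical stand-ins; these imitate the patterns, not Matplotlib's code.
VALID_JOINSTYLES = ("miter", "round", "bevel")

# Pattern 1: shared documentation snippets interpolated into docstrings.
DOC_SNIPPETS = {
    "joinstyle_doc": "joinstyle : {'miter', 'round', 'bevel'}",
}


def interpolate_doc(func):
    """Fill %(name)s placeholders in func's docstring from DOC_SNIPPETS."""
    if func.__doc__:
        func.__doc__ = func.__doc__ % DOC_SNIPPETS
    return func


# Pattern 2: a shared validator that each consumer must remember to call.
def check_in_list(valid_values, **kwargs):
    """Raise ValueError if any keyword value is not in valid_values."""
    for name, value in kwargs.items():
        if value not in valid_values:
            allowed = ", ".join(map(repr, valid_values))
            raise ValueError(
                f"{value!r} is not a valid value for {name}; "
                f"supported values are {allowed}"
            )


@interpolate_doc
def set_joinstyle(joinstyle):
    """Set how line segments are joined.

    Parameters
    ----------
    %(joinstyle_doc)s
    """
    check_in_list(VALID_JOINSTYLES, joinstyle=joinstyle)
    return joinstyle
```

Both mechanisms work, but neither gives the reader a single named object to look up: the allowed values live in a snippet dictionary and the validation lives in whichever functions happen to call the checker.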

While these solutions have worked well for us, the lack of an explicit location to document each implicit type means that the documentation is often difficult to find, large tables of allowed values are repeated throughout the documentation, and often an explicit statement of the scope of an implicit type is completely missing from the docs. Take the plt.plot docs, for example: in the "Notes", a description of the MATLAB-like format-string styling method mentions linestyle, color, and markers options. There are many more ways to pass these three values than are hinted at, but for many users, this is their only source of understanding about what values are possible for those options until they stumble on one of the relevant tutorials. A table of Line2D attributes is included in an attempt to show the reader what options they have for controlling their plot. However, while the linestyle entry does a good job of linking to Line2D.set_linestyle (two clicks required) where the possible inputs are described, the color and markers entries do not. The color entry simply links to Line2D.set_color, which fails to offer any intuition for what kinds of inputs are even allowed.

It could be argued that this can be fixed by simply tidying up the individual docstrings that are causing problems, but the issue is unfortunately much more systemic than that. Without a centralized place to find the documentation, tidying up will simply lead to more and more copies of increasingly verbose documentation repeated everywhere each of these implicit types is used, making it even harder for beginners to find the parameter that they need. However, the current system, which forces users to slowly piece together their mental model of each implicit type by wiki-diving through our documentation or piecemeal from StackOverflow examples, is also not sustainable.

End Goal

Ideally, any mention of an implicit type should link to a single page that describes all the possible values that type can take, ordered from most simple and common to most advanced or esoteric. Instead of using valuable visual space in the top-level API documentation to piecemeal enumerate all the possible input types to a particular parameter, we can then use that same space to give a plain-word description of what plotting abstraction the parameter is meant to control.

To use the example of linestyle again, what we would want in the LineCollection docs is just:

  1. A link to complete docs for allowable inputs (a combination of those found in Line2D.set_linestyle and the linestyle tutorial).
  2. A plain-words description of what the parameter is meant to accomplish. To matplotlib power users, this is evident from the parameter's name, but for new users this need not be the case.

The way this would look in the actual LineCollection docs is just

"""
linestyles: `LineStyle` or list thereof, default: :rc:`lines.linestyle` ('-')
    A description of whether the stroke used to draw each line in the collection
    is dashed, dotted or solid, or some combination thereof.
"""

where the LineStyle type reference would be resolved by Sphinx to point towards a single, authoritative, and complete set of documentation for how Matplotlib treats linestyles.

Benefits

Some powerful features of this approach include

  1. Making the complete extent of what each function is capable of obvious in plain text (with zero clicks required).
  2. Making the default option visible (with zero clicks). Seeing the default option is often enough to jog the memory of returning users.
  3. Making a complete description of the "most common" and "easiest" options for a parameter easily available when browsing (with a single click).
  4. Making the process of discovering more powerful features and input methods as easy as "scroll down" to see more advanced options (with still only one click).
  5. Providing a centralized strategy for linking top-level "API" docs to the relevant "tutorials".
  6. Avoiding API-doc-explosion, where scanning through the many possible options for each parameter makes individual docstrings unwieldy.

Other benefits of this approach over the current docs are:

  1. Docs are less likely to become stale, due to centralization.
  2. Canonicalization of many of matplotlib's "implicit standards" (like what is a "bounds" versus an "extents") that currently have to be learned by reading the code.
  3. The process would highlight issues with API consistency in a way that can be more easily tracked via the GitHub issue tracker, helping with the process of improving our API.
  4. Faster doc build times, due to significant decreases in the amount of text needing to be parsed.

Implementation

The improvements described above will require two major efforts for which a dedicated technical writer will be invaluable. The first is to create one centralized "tutorial" page per implicit type. This will require working with the core developer team to identify a concrete list of implicit types whose documentation would be valuable to users (typically, because they contain powerful, hidden features of our library whose documentation is currently only found in difficult-to-stumble-across tutorials). For each implicit type, I will then synthesize the various relevant tutorials, API docs, and example pages into a single authoritative source of documentation that can be linked to anywhere that particular type is referenced.

Once the centralized documentation for a given implicit type is complete, the second major effort begins: replacing existing API documentation with links to the new documentation, with an eye towards making the experience of actually using this new documentation as easy as possible, both for those using Python's built-in help() utility and for those browsing our documentation online.

While the exact format of the documentation proposed here is subject to change as this project evolves, I have worked with the Matplotlib core team during their weekly "dev calls" to establish a consensus that the strategy proposed here is the most expedient, useful, and technically tractable approach to begin documenting these "implicit types" (notes on these calls are available on hackmd). I will use the existing "tutorials" infrastructure for the initial stages of creating the centralized documentation for each implicit type, allowing me to easily reference these pages as follows, without having to create any new public classes (again, using the LineCollection docs as an example):

"""
linestyles: LineStyle or list thereof, default: :rc:`lines.linestyle` ('-')
    A description of whether the stroke used to draw each line in the collection
    is dashed, dotted or solid, or some combination thereof. For a full
    description of possible LineStyles, see :doc:`tutorials/types/linestyle`.
"""

Moving forward, we could then easily change how these references are spelled once the core developer team agrees on the best long-term strategy for incorporating our new "types" documentation into bona fide Python classes, for example as I proposed in Matplotlib Enhancement Proposal 30.
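As one possible shape for such a class, the sketch below uses a string-valued Enum so that existing string arguments keep working while the class docstring becomes the single home for the documentation. This is only an illustration under my own assumptions: the CapStyle class and set_capstyle function here are hypothetical and are not taken from the text of the enhancement proposal.

```python
from enum import Enum


# Hypothetical class for illustration; the names and layout are assumptions.
class CapStyle(str, Enum):
    """How the two ends of an unclosed line are drawn.

    Allowed values, from most common to least: 'butt', 'projecting', 'round'.
    """

    butt = "butt"
    projecting = "projecting"
    round = "round"


def set_capstyle(capstyle):
    """Accept a plain string or a CapStyle member; validation happens here once.

    Calling CapStyle(...) raises ValueError for anything outside the enum.
    """
    return CapStyle(capstyle)


# Plain strings still work, so user code does not need new imports.
style = set_capstyle("butt")
print(style is CapStyle.butt)  # True
print(style == "butt")         # True, because members are also str instances
```

With this layout, a docstring elsewhere only needs to name the type, and help(CapStyle) or the rendered class page carries the full description.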

Finally, the preliminary list of implicit types that I propose documenting during this Google Season of Docs is:

  1. capstyle
  2. joinstyle
  3. bounds
  4. extents
  5. linestyle
  6. colors/lists of colors
  7. colornorm/colormap
  8. tick formatters

A living version of this document can be found on our Discourse.