This page contains the details of a technical writing project accepted for Season of Docs.
- Open source organization:
- Technical writer:
- Project name:
- Creating, reading, sharing: Optimizing Bokeh’s documentation
- Project length:
- Standard length (3 months)
Creating, reading, sharing: Optimizing Bokeh’s documentation
Bokeh is an extremely powerful tool for creating interactive, browser-based visualizations with Python. It has a sizeable user base (502k monthly conda downloads, 855k PyPI downloads) and a large number of contributors (455 contributors on GitHub). It comes as no surprise that Bokeh’s extensive documentation has grown organically and is therefore, in places, inconsistent, difficult to access, and convoluted.
Google’s Season of Docs provides a unique opportunity for a focused review and revision of the structure and contents of Bokeh’s documentation, and for making sure that the documentation and the associated tools and workflows are future-proof.
I have coded and documented open-source projects with Python and Sphinx (most recently: PyZillow and PyPresseportal). But what makes me a unique participant of Google’s Season of Docs is my background in journalism: I worked in newsrooms for more than 13 years, with many years as a managing editor and advocate of digital change. In addition to my journalistic duties, I had increasing responsibilities in designing and documenting new digital tools and style guides, as well as training newsroom staff.
After a recent move from Europe to the US, I decided to delve into new ways of bringing together my enthusiasm for communication and coding. I found technical writing to be the optimal combination of my skills and experiences in writing and tech. In this proposal, I will lay out how I will use Google’s Season of Docs to optimize Bokeh’s documentation in three ways: by making it more convenient to create and contribute to documentation, by making the documentation more straightforward to read, and by making it easier to share information from the documentation with others.
2. Current situation
Bokeh’s official documentation consists of these main units:
- Narrative documentation: Installation Guide, User Guide, Developers Guide, Release Notes
- Gallery and Demos (interactive examples with their source code)
- Reference Guide (documentation based on docstrings)
- Tutorial (extensive collection of Jupyter notebooks, hosted on mybinder.org)
- Docstrings and model help for IDEs
- Examples and READMEs in the project repository
Additionally, a wealth of information is available on the Bokeh support forum and on Stack Overflow, where Bokeh’s developers actively answer user questions, as well as on Bokeh’s Medium blog.
Most of the documentation web pages are created with Sphinx, using several standard and custom Sphinx extensions. The Reference Guide, for example, is generated from docstrings, using extensions such as ‘autodoc’ and the custom ‘bokeh_autodoc’. As is the nature of organically grown documentation, it contains redundancies and inconsistencies.
One of the first things I noticed when analyzing the existing documentation was the lack of clear style guidelines for documentation writing. The Bokeh Developer Guide contains some basic instructions. However, this document does not contain much guidance about style, especially regarding documentation that goes beyond docstrings. As a consequence, stylistic and structural issues make it more difficult to access the information in the documents, especially for newcomers.
- Using nouns, gerunds, and uncommon words instead of clear and strong verbs makes some of the text unnecessarily complicated: “The main observation is that the typical usage involves creating plot objects with the figure() function”. This should be rephrased to make reading easier. For example: “The figure() function is the function most commonly used to create plot objects.”
- Some sentences are very long, making them difficult to comprehend: “Next we can call vbar with the list of fruit name factors as the x coordinate, the bar height as the top coordinate, and optionally any width or other properties that we would like to set”. Longer sentences should be broken up into shorter sentences or bulleted lists. Simplifying sentences will be especially helpful for users with dyslexia or people who do not use English as their first language (see issue #10160).
- Inconsistent use of “you” and “we” is confusing and distracting: “There are two basic methods that can be used, depending on your use case” and “We can plot all the year series using separate calls” (two examples from the same page). Some pages address readers in yet other ways, such as: “users may have to install additional dependencies” or “one can create more complex layouts”.
- Typos, missing and superfluous words, and grammatical errors break up the flow of reading and damage the credibility of the information: “Bokeh make it simple to create basic bar charts” or “See the Glyphs section od the User’s Guide”.
- Structural inconsistencies can be frustrating for readers. For example, one page has well-annotated examples, while another page offers no explanation of its examples.
Bokeh’s documentation landing page is rather short and does not include information on all available resources (no mention of the extensive library of docstrings and model help functions, the support forums, the demos or the Medium blog). This also makes it more difficult for new users to get started with Bokeh.
To utilize the eleven-week doc development phase most efficiently, I will focus on three key elements:
3.1. Improve creating the docs
Most of Bokeh’s documentation is written by contributors who include documentation as part of pull requests for new functionality and bug fixes. While I will use some of the doc development phase to edit and refactor the existing documents, I will emphasize making the workflows for creating and maintaining the documentation future-proof: It should be as easy as possible for contributors to keep a consistently high standard of documentation.
I will ensure this with two approaches:
- I will create a set of practical, simple style guidelines to be included in the existing Developers Guide. These guidelines will address style, grammar, and structure. Additionally, I will use internal communication channels such as Slack to make sure that Bokeh’s contributors are aware of the style guidelines. I will also offer one-on-one training and Q&A sessions for the team.
- I will work with the core team to find an optimal set of tools for documentation quality control, which will be added to Bokeh’s existing workflows for PRs (pull requests) and CI (continuous integration). Depending on the team’s requirements, this could mean adding tools such as pydocstyle, proselint, or sphinxcontrib-spelling to Bokeh’s testing suite, pre-commit setup, or GitHub Actions. I have added a working proof of concept to the GitHub Actions of one of my own open-source projects.
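To illustrate the kind of automated check such a workflow could run, here is a minimal sketch in plain Python. It is not pydocstyle, proselint, or sphinxcontrib-spelling, and it is not part of Bokeh’s tooling. The function name `check_prose` and the 30-word threshold are assumptions chosen for illustration. The sketch flags two of the style issues described above: overly long sentences and mixing “we” with “you”.

```python
import re

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def check_prose(text, max_words=30):
    """Return a list of style warnings for a piece of documentation text.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    warnings = []

    # Flag sentences that exceed the word limit.
    for sentence in SENTENCE_END.split(text.strip()):
        word_count = len(sentence.split())
        if word_count > max_words:
            warnings.append(f"long sentence ({word_count} words): {sentence[:40]}...")

    # Flag text that addresses readers as both "we" and "you".
    words = set(re.findall(r"[a-z']+", text.lower()))
    if "we" in words and words & {"you", "your"}:
        warnings.append('mixed "we" and "you" forms of address')

    return warnings
```

A CI job could run a check like this over changed documentation files and fail the build, or merely post a comment, when the returned list is not empty.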
3.2. Improve reading the docs
The goal of good documentation is to make it easy for current and prospective users to find exactly the right information and to be able to make use of this information as efficiently as possible.
Key factors for a text’s usability are its style and structure: Writing documentation in a clear, consistent style allows readers to pick up information quickly, without distractions and superfluous language. A straightforward and transparent structure makes it easy to find relevant information quickly.
I will focus on both those areas, with an emphasis on usability for new users. This will include a thorough review of the narrative documentation, centered on the User Guide. I will also overhaul the documentation landing page to more clearly address different target audiences and make sure every user can find the right resources quickly.
Just as with improving the creation of docs outlined above, I will focus on laying a foundation for future improvements and continually high standards for Bokeh’s documentation.
3.3. Improve sharing the docs
More and more discussion around Bokeh is happening on third-party platforms. Many of these platforms use metadata such as Facebook’s OpenGraph to include rich previews of links. OpenGraph is used by services such as Facebook, Twitter, LinkedIn, Slack, and iMessage. Bokeh’s Discourse forum also uses OpenGraph to render link previews.
To make use of this technology, I will add metadata to Bokeh’s Sphinx-generated web pages, as described in issue #9792. The most efficient way would be to use a dedicated Sphinx extension. A few days ago, a first version of a Sphinx extension for OpenGraph data was published on GitHub. I will use some of the docs development phase to help improve this extension for use with Bokeh’s documentation.
I have also created a proof of concept that I am successfully using in one of my own open-source projects, PyPresseportal. This extension automatically collects relevant information such as title, description, image, and URL. It then inserts this information into the Sphinx-generated HTML-code as OpenGraph tags.
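The tag-generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the proof of concept or of any published extension. The function name `build_og_tags` and its parameters are hypothetical. In a Sphinx extension, a function like this could be called from a handler connected to Sphinx’s `html-page-context` event; that wiring is omitted here.

```python
from html import escape


def build_og_tags(title, description, url, image=None):
    """Return OpenGraph <meta> tags as a single HTML string.

    Hypothetical helper for illustration; fields without a value are skipped.
    """
    properties = {
        "og:title": title,
        "og:description": description,
        "og:url": url,
        "og:image": image,
    }
    return "\n".join(
        # Escape values so quotes or ampersands cannot break the attribute.
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
        if value
    )
```

Calling `build_og_tags("User Guide", "How to use Bokeh", "https://example.org/user_guide.html")` would produce three tags, because no image was supplied.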
A next step in developing this extension would be to introduce custom directives to manually define OpenGraph metadata (similar to Docutils’ existing ‘meta’ directive). Automatically generated metadata would only be used as a fallback, in case there is no manually entered data available.
Supporting Structured Data is considerably more complex than supporting OpenGraph, so I will focus primarily on including high-quality OpenGraph metadata for Bokeh’s documentation. All work that goes into supporting OpenGraph will, at the same time, lay the foundations for Structured Data support.
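The fallback rule could look roughly like this. The sketch assumes that the fallback applies per field, which the plan above does not specify, and the function name `resolve_metadata` is hypothetical.

```python
def resolve_metadata(manual, automatic):
    """Merge two metadata dicts, preferring manually entered values.

    Assumption for illustration: the fallback is applied field by field,
    and empty manual values count as "not entered".
    """
    resolved = dict(automatic)
    resolved.update({key: value for key, value in manual.items() if value})
    return resolved
```

With this rule, a page that manually defines only a description would still receive its automatically collected title, image, and URL.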
Members of the Sphinx and ReadTheDocs communities have expressed interest in developing extensions for OpenGraph and Structured Data (in issues #1758 and #5208, for example), and I will make sure to work with them closely.
To summarize, these are the deliverables I expect to come out of Season of Docs:
- Documentation style guidelines for Bokeh contributors
- Revised PR and CI workflows to include automated documentation quality control
- Edited and restructured User Guide
- Revised documentation landing page
- OpenGraph metadata included in the documentation web pages and a working Sphinx extension
In addition, I will keep a “doclog” to document my journey through Google’s Season of Docs on my personal website/Medium or Bokeh’s Medium blog. This will also serve as a basis for the project report for Google. I will do all work transparently, in the form of GitHub issues and pull requests, whenever possible.
Before community bonding phase: I am already actively participating in several discussions on Bokeh's GitHub repository and have been in touch with Bryan Van de Ven and Pavithra Eswaramoorthy, Bokeh's mentors for Google's Season of Docs. I will stay active in the conversation about Bokeh and will also further familiarize myself with Bokeh by building and publishing visualizations.
Community bonding phase (08/17 - 09/13):
- Get to know the core team, refine project plan in exchange with mentors
- Set up a schedule and communication channels for regular reporting and feedback with mentors
- Be active on Bokeh’s Slack to inform all interested Bokeh contributors about Google's Season of Docs and the goals for the doc development phase
- Gather feedback from Bokeh contributors to further refine the plan for the doc development phase
Doc development phase
Week 1, 09/14 - 09/20:
- Begin auditing and editing narrative documentation
- Begin drafting of style guidelines
Week 2, 09/21 - 09/27:
- Continue work on style guidelines
- Continue editing narrative documentation
- Begin overhauling the documentation landing page
Week 3, 09/28 - 10/04:
- Finalize style guidelines
- Finalize documentation landing page
Week 4, 10/05 - 10/11:
- Finalize editing of narrative documentation
- Discuss integrating tools for document quality control in PR/CI workflows with the Bokeh core team
Week 5, 10/12 - 10/18:
- Set up Q&A session with Bokeh contributors on Slack to discuss style guidelines, improvements to narrative documentation, and PR/CI workflows
- Begin developing my existing proof of concept for OpenGraph metadata into a deployable Sphinx extension
- Revise style guidelines based on feedback from Q&A session with Bokeh contributors
Week 6, 10/19 - 10/25:
- Begin testing of tools for document quality control in PR and CI workflows
- Continue development of Sphinx extension for metadata
Week 7, 10/26 - 11/01:
- Testing of Sphinx extension
- Second Q&A session with Bokeh contributors on Slack
- Revise deliverables based on feedback from second Q&A session
Week 8, 11/02 - 11/08:
- Deploy Sphinx extension and publish improved narrative documentation and documentation landing page
Week 9, 11/09 - 11/15:
- Deploy document quality control tools into PR and CI workflows
- Update and publish Developers Guide to include style guidelines and PR and CI workflow additions
Week 10, 11/16 - 11/22:
- Finalize remaining tasks
Week 11, 11/23 - 11/29:
- Begin writing project report
- Begin writing project evaluation
Project finalization phase
Week 12, 11/30 - 12/05:
- Finalize and submit project report
Week 13, 12/03 - 12/10:
- Finalize and submit project evaluation
After conclusion of Google’s Season of Docs:
- I hope to stay involved in the development of Bokeh and continue working on Bokeh’s documentation.
- I plan on continuing the development of a Sphinx extension for OpenGraph/Structured Data metadata.
- I hope to use my background in journalism and my existing network to promote Bokeh as a tool in data journalism, for example by writing about Bokeh with a journalistic audience in mind or by offering conference talks about using Bokeh in journalistic settings.
6. About myself
I am originally a journalist, with a background in TV, online, and radio news. Working as a managing editor and reporter in TV and digital news has given me years of experience in writing and editing. At the same time, I worked on several projects promoting digital transformation and automation. I wrote numerous manuals covering digital tools and workflows, as well as style guides and communication strategies for digital news brands. I also trained team members in using those tools.
I have always been drawn to the intersections between communication and tech. A whole new world opened up to me when I learned to code in Python: I have been able to do data analysis and visualization for data journalism, for example. Learning to code has also allowed me to actively work together with software engineers to develop digital tools for newsroom workflows.
The manuals and documents I wrote at my previous job are unfortunately non-public. Therefore, I am now focusing on getting more involved with open-source projects (see below for examples). I have based my work in technical writing on style guides such as Google’s developer documentation style guide and the Microsoft style guide. I regularly use GitHub, Slack, and Linux. I have been writing narrative documentation as well as docstrings and type hints, using tools like Sphinx, Mypy, and Sphinx autodoc.
I am currently working freelance. My schedule provides the necessary flexibility to work on Bokeh’s documentation for around 25 hours per week during the doc development phase. I work in the Pacific Time Zone but am happy to accommodate the schedules and needs of the team.
7. Recent open-source documentation examples
PyZillow: PyZillow is a Python wrapper for the API of the real estate website Zillow.com. In addition to providing some code and acting as one of the code maintainers, I wrote the complete documentation. I used Sphinx for the narrative documentation, as well as for the module reference. I created the module reference with the Sphinx extension autodoc, based on the docstrings I added to the code.
PyPresseportal: PyPresseportal is a Python wrapper for the API of the website presseportal.de. The website presseportal.de is one of the biggest distributors of press releases and investor relations announcements in Germany. For example, almost all police and fire departments use this service to distribute their press releases. After using the API for many years as a journalist, I decided to develop a Python interface to access the website’s extensive data resources as Python objects. I wrote the code and the entire Sphinx-based documentation.