Data Commons project

This page contains the details of a technical writing project accepted for Season of Docs.

Project summary

Open source organization:
Data Commons
Technical writer:
KilimAnnejaro
Project name:
Improving DataCommons Getting Started Documentation
Project length:
Standard length (3 months)

Project description

In my career as a software engineer, I’ve found myself repeatedly frustrated by the experience of joining a new team or project, pulling down a code repository, running it, and watching the software break when key steps were missing from the documentation. I quickly realized that I could apply my lifelong passion for writing and composition to these needs, in the process creating a supportive environment for the developers I worked with to focus on technical innovation and creativity, rather than solving problems with known answers.

This technical innovation and creativity is urgently needed in many segments of society, especially by leaders in government and the nonprofit sector seeking to analyze datasets in their problem spaces. By making this data readily available as a service, Data Commons lowers the barrier to entry for analysts seeking data that is easy to access and close to the format they need for their roles. Data Commons does this by creating a Knowledge Graph of the data it ingests, in the process raising interesting questions about data quality and governance in the context of open source. By applying to Season of Docs with a proposal for the Data Commons organization, I hope to support and advance these technical efforts in the public-interest open data space.

Current Pain Points in the Documentation with Proposed Solutions

While the Data Commons website does contain directions for adding datasets to Data Commons, the directions are short and unclear, consisting of several bullet points directing the would-be data donor to use schema.org markup. As part of this project, I propose to clean up the ‘Get Involved’ section of dataCommons.org. I will create a tutorial for adding new datasets, explaining how datasets are scraped and incorporated into the Data Commons knowledge graph. I will also add an FAQ section covering the solutions most often suggested when a dataset owner reaches out to the Data Commons maintainers for assistance.
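To illustrate the kind of worked example the proposed tutorial could include, here is a minimal sketch of schema.org `Dataset` markup generated as JSON-LD. The helper name `dataset_jsonld` and every value passed to it are hypothetical placeholders, and the sketch makes no claim about which properties the Data Commons maintainers actually require.

```python
import json


def dataset_jsonld(name, description, url, license_url):
    """Return a schema.org Dataset description serialized as JSON-LD.

    Hypothetical helper for illustration only; the property set shown
    here (name, description, url, license) is a small subset of what
    schema.org defines for the Dataset type.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    return json.dumps(record, indent=2)


# All values below are made-up placeholders, not a real dataset.
markup = dataset_jsonld(
    name="Example county rainfall totals",
    description="Hypothetical monthly rainfall totals by county.",
    url="https://example.org/datasets/rainfall",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)
print(markup)
```

A dataset owner would typically place output like this inside a `<script type="application/ld+json">` element on the page that describes the dataset.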

The current set of examples for querying data offers only four interactive code examples, all of them Python notebooks. As part of this project, I will translate these notebooks into R and also create interactive demo versions of the present examples for Google Sheets and the REST API, embedding these demos into the current documentation.

Finally, the documentation does not offer examples of how the Data Commons knowledge graph can be used to build software applications. As part of this project, I will create, deploy, and document a sample tool in Python that uses the Data Commons API to let the end user construct graphical visualizations relating any two quantities connected within the knowledge graph. For example, one might use this tool to plot a linear regression relating weather data to information on common business patterns. As a stretch goal, I hope to extend this tool to other types of visualizations, such as pie charts and Venn diagrams.
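The regression step of such a tool can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `fit_line` is hypothetical, the two series are made-up numbers rather than values retrieved from Data Commons, and the proposed tool may well use a plotting or statistics library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up placeholder series standing in for two quantities that a
# real tool would look up in the knowledge graph.
weather_values = [1.0, 2.0, 3.0, 4.0]
business_values = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(weather_values, business_values)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Keeping the fit separate from data retrieval and plotting would let the same function serve any pair of quantities the user selects.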

Schedule

The season runs from September 14 to November 30, so my plan for completing this project looks like this:

September: Begin with rewriting the Get Involved section; mostly finish this work by the end of September.

October: Wrap up the Get Involved section and create the interactive code samples.

November: Create the sample visualization tool.