Structure of a Google Docs document

This guide explains the internal structure of a Google Docs document: the elements that make up a document and the relationships between them.

Top-level elements

The top-level elements of a document include the body and a number of other attributes of the document as a whole:

document: {
    body: ... ,
    documentStyle: ... ,
    lists: ... ,
    documentId: ... ,
    namedStyles: ... ,
    revisionId: ... ,
    title: ...
}

To manipulate global document features outside of the body content, it is almost always better to use one or more document templates, which you can use as a basis for generating new documents programmatically.

Body content

Most of the elements that you can, or would likely want to, work with programmatically are within the body content.

Structural elements

The body content is essentially just a sequence of StructuralElement objects. Each StructuralElement object is personalized by its content element, as shown in the following diagram:

The structural elements and their content objects contain all the document's text, inline images, and so on.
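
As a rough sketch of how this looks in practice, the following Python assumes the document has already been fetched and parsed into a dictionary. The helper name `element_kind` and the trimmed sample body are illustrative only and are not part of the API. The code reports which content type personalizes each structural element.

```python
# The four content types that can personalize a StructuralElement.
STRUCTURAL_KINDS = ("paragraph", "table", "tableOfContents", "sectionBreak")


def element_kind(structural_element):
    """Return the name of the content key present in the element, or None."""
    for kind in STRUCTURAL_KINDS:
        if kind in structural_element:
            return kind
    return None


# Hypothetical, heavily trimmed body used only for illustration.
sample_body = {
    "content": [
        {"endIndex": 1, "sectionBreak": {}},
        {"startIndex": 1, "endIndex": 7, "paragraph": {"elements": []}},
        {"startIndex": 7, "endIndex": 20, "table": {}},
    ]
}

kinds = [element_kind(element) for element in sample_body["content"]]
print(kinds)
```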

Paragraphs contain a special type of element called a ParagraphElement that works something like a StructuralElement: it is personalized by its own set of content element types, as shown in the following diagram:

For an example of a complete document structure, see the sample document dump. In that dump you can see many of the key structural and content elements, as well as the use of start and end indexes as described in the following section.

Start and end index

Most elements within the body content have the startIndex and endIndex properties. These indicate the offset of an element's beginning and end, relative to the beginning of its enclosing segment.

Indexes are measured in UTF-16 code units. This means that surrogate pairs consume two indexes. For example, the "GRINNING FACE" emoji, 😀, would be represented as "\uD83D\uDE00" and would consume two indexes.
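
Because Python strings count code points rather than UTF-16 code units, index arithmetic done with `len()` can drift once a string contains characters outside the Basic Multilingual Plane. The following is a minimal sketch of one way to count code units; the helper name `utf16_length` is an assumption made for illustration.

```python
def utf16_length(text):
    """Count UTF-16 code units, the unit in which indexes are measured."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


grinning_face = "\U0001F600"  # GRINNING FACE, a surrogate pair in UTF-16

print(len(grinning_face))           # 1: Python counts code points
print(utf16_length(grinning_face))  # 2: the surrogate pair takes two indexes
```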

For elements within a document body, these indexes represent offsets from the beginning of the body content, which is the "root" element.

The "personalizing" types for structural elements—Paragraph, Table, TableOfContents, and SectionBreak—don't have these indexes because their enclosing StructuralElement has these fields. This is also true of the personalizing types contained in a ParagraphElement.
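
Since the indexes live on the enclosing StructuralElement, locating the element that covers a given offset only requires looking at the outer objects. The sketch below assumes a parsed body `content` list, treats the end index as exclusive, and assumes that a missing `startIndex` means 0 (default values are often omitted from JSON output). The function name and sample data are hypothetical.

```python
def find_structural_element(content, index):
    """Return the StructuralElement whose [startIndex, endIndex) range holds index."""
    for element in content:
        # Assumption: an omitted startIndex stands for 0.
        start = element.get("startIndex", 0)
        end = element["endIndex"]
        if start <= index < end:
            return element
    return None


# Hypothetical trimmed body content.
content = [
    {"endIndex": 1, "sectionBreak": {}},
    {"startIndex": 1, "endIndex": 13, "paragraph": {}},
    {"startIndex": 13, "endIndex": 30, "paragraph": {}},
]

hit = find_structural_element(content, 13)
print(hit["startIndex"], hit["endIndex"])
```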

Paragraph structure

A paragraph is made up of the following:

  • elements: A sequence containing one or more instances of textRun.
  • paragraphStyle: An optional element that explicitly sets style properties for the paragraph.
  • bullet: An optional element that provides the bullet specification if the paragraph is part of a list.

Text runs

A text run represents a contiguous string of text that shares a single text style. A paragraph can contain multiple text runs, but a text run cannot cross paragraph boundaries. Consider, for example, a tiny document like the following:

The following diagram shows how you might visualize the sequence of paragraphs in the above document, each with its own text run(s) and optional bullet settings.
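
To make the paragraph and text run relationship concrete, here is a small sketch that collects the text runs of one parsed paragraph and checks whether the paragraph carries a bullet. The function name `paragraph_runs`, the sample paragraph, and the list ID are hypothetical; only the key names are meant to mirror the JSON structure described above.

```python
def paragraph_runs(paragraph):
    """Return (text, text_style) pairs for each text run in a paragraph."""
    runs = []
    for element in paragraph.get("elements", []):
        text_run = element.get("textRun")
        if text_run is None:
            continue  # some other kind of ParagraphElement
        runs.append((text_run.get("content", ""), text_run.get("textStyle", {})))
    return runs


# Hypothetical paragraph with two differently styled runs.
sample_paragraph = {
    "elements": [
        {"startIndex": 1, "endIndex": 7,
         "textRun": {"content": "Plain ", "textStyle": {}}},
        {"startIndex": 7, "endIndex": 12,
         "textRun": {"content": "bold\n", "textStyle": {"bold": True}}},
    ],
    "bullet": {"listId": "hypothetical-list-id"},
}

runs = paragraph_runs(sample_paragraph)
is_list_item = "bullet" in sample_paragraph
print(runs, is_list_item)
```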

Accessing elements

Many elements are modifiable using the BatchUpdate method. For example, using the InsertTextRequest request type, you can modify the content of any element that contains text; similarly, you can use UpdateTextStyleRequest to apply formatting to a range of text contained in one or more elements.
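
As an illustration, the following sketch builds a request list that inserts text and then makes the inserted range bold. It only constructs the request body and does not call the API. The helper name `build_requests` is hypothetical, and the field names reflect the request shapes as understood from the API reference, so check them against the reference before relying on them.

```python
def build_requests(insert_index, text):
    """Build an insert-then-bold request list for a batchUpdate call."""
    # Ranges are measured in UTF-16 code units, not Python code points.
    length = len(text.encode("utf-16-le")) // 2
    return [
        {"insertText": {"location": {"index": insert_index}, "text": text}},
        {
            "updateTextStyle": {
                "range": {
                    "startIndex": insert_index,
                    "endIndex": insert_index + length,
                },
                "textStyle": {"bold": True},
                "fields": "bold",  # update only the bold property
            }
        },
    ]


requests = build_requests(1, "Hello")
body = {"requests": requests}  # body to send with the batchUpdate method
print(body)
```

Requests in a batch are applied in order, which is why the style update can refer to the range that the preceding insert creates.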

To read elements of the document, use the get method to obtain a JSON dump of the complete document. (For a simple way to do this, see the Output doc as JSON sample.) You can then parse the resulting JSON to find the values of individual elements.

Parsing the content can be useful for various use cases. Consider, for example, a document cataloging app that lists documents that it finds. An app like this might want to extract the title, revision ID, and starting page number of a document, as shown in the following diagram:

Because there are no methods for reading these settings explicitly, your app would need to get the whole document, then parse the JSON to extract these values.
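
A cataloging step of this kind might look like the sketch below, which parses a JSON dump and pulls out the three values. The function name `catalog_entry` and the sample dump are hypothetical, and the assumption that the starting page number is stored under `documentStyle.pageNumberStart` should be verified against the API reference.

```python
import json


def catalog_entry(document_json):
    """Extract a few document-level values from a JSON dump of a document."""
    document = json.loads(document_json)
    style = document.get("documentStyle", {})
    return {
        "title": document.get("title"),
        "revisionId": document.get("revisionId"),
        # Assumption: the starting page number lives in documentStyle.pageNumberStart.
        "pageNumberStart": style.get("pageNumberStart"),
    }


# Hypothetical, heavily trimmed dump used only for illustration.
sample_dump = json.dumps({
    "title": "Quarterly report",
    "revisionId": "hypothetical-revision-id",
    "documentStyle": {"pageNumberStart": 1},
    "body": {"content": []},
})

entry = catalog_entry(sample_dump)
print(entry)
```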