Basic Concepts

If you are new to Freebase, this section covers the basic terminology and concepts required to understand how Freebase works.

  1. Graphs
  2. Topics
  3. Types and Properties
  4. Domains and IDs
  5. Compound Value Types
  6. Topic MIDs
  7. Namespaces, Keys, and Topic IDs
  8. More on Properties
  9. Summary

Graphs

Freebase data is stored in a data structure called a graph. A graph is composed of nodes connected by edges. In Freebase, the nodes are defined using /type/object and edges are defined using /type/link. By storing the data as a graph, Freebase can quickly traverse arbitrary connections between topics and easily add new schema without having to change the structure of the data.
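To make the node-and-edge idea concrete, here is a minimal in-memory sketch. It is not how Freebase is implemented; the Graph class, the node names, and the edge labels are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class Graph:
    """Toy graph: nodes are string IDs, edges are labeled links."""

    def __init__(self):
        # source node -> list of (edge label, target node)
        self.links = defaultdict(list)

    def add_link(self, source, label, target):
        self.links[source].append((label, target))

    def neighbors(self, source, label=None):
        """Targets reachable from `source`, optionally filtered by label."""
        return [
            target
            for (edge_label, target) in self.links.get(source, [])
            if label is None or edge_label == label
        ]


g = Graph()
g.add_link("bob_dylan", "wrote", "chronicles")
g.add_link("bob_dylan", "recorded", "blonde_on_blonde")
# A new kind of edge can be added without restructuring existing data.
g.add_link("chronicles", "has_subject", "music")

print(g.neighbors("bob_dylan"))
print(g.neighbors("bob_dylan", "wrote"))
```

Note that adding the "has_subject" link required no change to the links already stored, which is the property the paragraph above attributes to graph storage.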

Topics

Freebase has over 39 million topics about real-world entities like people, places, and things. Since Freebase data is represented as a graph, these topics correspond to nodes in the graph. However, not every node is a topic. See the section on Compound Value Types (CVTs) for an example of nodes that are not topics.

Examples of the types of topics found in Freebase:

Some topics are notable because they hold a lot of data (e.g., Wal-Mart), and some are notable because they link to many other topics, potentially in different domains of information. For example, abstract topics like love, poverty, and chivalry don't have many properties associated with them, but they appear often as book subjects, poetry subjects, film subjects, and so on, which makes them more notable.

Types and Properties

Any given topic can be seen from many different perspectives. For example:

  • Bob Dylan was a song writer, singer, performer, book author, and film actor;
  • Leonardo da Vinci was a painter, a sculptor, an anatomist, an architect, an engineer, ...;
  • Love is a book subject, film subject, play subject, poetry subject, ...;
  • Any city is a location, potentially a tourist destination, and an employer of civil servants.

In order to capture this multi-faceted nature of many topics, we introduce the concept of types in Freebase. Topics in Freebase can have any number of types assigned to them. The topic about Bob Dylan is assigned several types: the song writer type, the music composer type, the music artist (singer) type, the book author type, etc. Each type carries a different set of properties germane to that type. For example,

  • The music artist type contains a property that lists all the albums that Bob Dylan has produced, as well as a property for the musical instruments he was known to play;
  • The book author type contains a property that lists all the books Bob Dylan has written or edited, as well as his school of thought or movement as a writer;
  • The company type contains many properties for listing a company's founders, board members, parent company, divisions, employees, products, year-by-year revenue and profit records, etc.

Thus, a type can be thought of as a conceptual container of properties that are most commonly needed for describing a particular aspect of information. (You can think of a type as analogous to a relational table, and each "type" table has a foreign key into the one "identity" table that uniquely defines each topic.)
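The "container of properties" idea can be sketched as follows. This is an illustration under assumptions, not the Freebase schema: the type IDs, the property names in TYPE_PROPERTIES, and the properties_of helper are hypothetical.

```python
# Hypothetical schema: type ID -> names of the properties that type carries.
TYPE_PROPERTIES = {
    "/music/artist": ["album", "instruments_played"],
    "/book/author": ["works_written", "school_or_movement"],
}

# One "identity" record per topic; each assigned type is a view onto it.
topic = {
    "name": "Bob Dylan",
    "types": ["/music/artist", "/book/author"],
}


def properties_of(topic):
    """Collect the property IDs a topic gains through its assigned types."""
    return [
        f"{type_id}/{prop}"
        for type_id in topic["types"]
        for prop in TYPE_PROPERTIES[type_id]
    ]


for property_id in properties_of(topic):
    print(property_id)
```

Assigning another type to the topic would add that type's properties without touching the ones already present.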

Domains and IDs

Just as properties are grouped into types, types themselves are grouped into domains. Think of domains as the sections in your favorite newspaper: Business, Lifestyle, Arts and Entertainment, Politics, Economics, etc. Each domain is given an ID (identifier), e.g.,

The ID of a domain looks like a file path, or a path in a web address.

Each type is also given an ID, and its ID is based on the domain in which it belongs. For example, the Company type belongs in the Business domain, and it's given the ID /business/company. Here are some other examples:

Just as a type inherits the beginning of its ID from its domain, a property also inherits the beginning of its ID from the type it belongs to. For example, the Industry property of the Company type (used for specifying which industry a company is in) is given the ID /business/company/industry. Here are some other examples:

Thus, even though types are not arranged into hierarchies in Freebase, domains, types, and properties are given IDs that are conceptually arranged in a file directory-like hierarchy.
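Because a property ID begins with its type's ID, which in turn begins with its domain's ID, the three levels can be recovered by splitting on slashes. The helper below is a hypothetical sketch that assumes the simple three-part shape used in the examples in this section (/domain/type/property); it rejects anything else.

```python
def split_property_id(property_id):
    """Split an ID like /business/company/industry into its three levels."""
    parts = property_id.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected /domain/type/property, got {property_id!r}")
    domain_key, type_key, _property_key = parts
    return {
        "domain": f"/{domain_key}",
        "type": f"/{domain_key}/{type_key}",
        "property": "/" + "/".join(parts),
    }


print(split_property_id("/business/company/industry"))
```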

Compound Value Types

A Compound Value Type (CVT) is a type within Freebase that is used to represent data where each entry consists of multiple fields. CVTs may be a little confusing at first, but they are a very important part of the Freebase schema and allow it to more accurately model complex relationships between topics.

Consider the following example: the population of a city changes over time. That means that whenever you query Freebase for a population, you are at least implicitly asking for the population at a certain date. Two values are involved: a number of people and a date. This is a situation where a CVT becomes extremely useful. Without one, to model population data you would need to create a topic, name it something like "Vancouver's population in 1997", and store the information there.

A CVT can be thought of as a topic that does not require a display name. CVTs, like normal topics, have a GUID that can be referenced independently. However, the Freebase client treats them quite differently from topics. In most cases, every property of a CVT should be a disambiguation property.
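The population example can be sketched as a small record that bundles a number with its date, so that no artificially named topic is needed. Everything here is an assumption made for illustration: the DatedInteger class, the populations mapping, the city key, and the figures, which are placeholders rather than real statistics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatedInteger:
    """Hypothetical CVT: a nameless node bundling a count with its year."""
    number: int
    year: int


# Placeholder figures for illustration only, not real statistics.
populations = {
    "example_city": [
        DatedInteger(number=1000, year=1997),
        DatedInteger(number=1100, year=2001),
    ],
}


def population_in(city: str, year: int) -> Optional[int]:
    """Return the recorded population for `city` in `year`, or None."""
    for entry in populations.get(city, []):
        if entry.year == year:
            return entry.number
    return None


print(population_in("example_city", 1997))
```

Each DatedInteger has no name of its own; it is identified only by the values of its fields, which mirrors the role the text gives to disambiguation properties.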

Topic MIDs

While a topic might or might not be identifiable by namespace/key IDs, it can always be identified by a MID (Machine Identifier), which consists of /m/ followed by a base-32 unique identifier. MIDs are assigned to topics at creation time, and are managed throughout the topic's lifetime. They play a critical role when topics are merged or split, allowing external applications to track the logical topic even though the physical Freebase identity (the topic's GUID) may change. MIDs differ from the human-readable Freebase IDs (returned by the "id" property) in that they are:

  • Guaranteed to exist
  • Machine-generated
  • Designed to support offline comparison
  • Not designed to convey meaning to humans
  • Short (possibly fixed length)
  • Ideal for quick exchange of keys between external systems and components

MIDs are the recommended identifier to use to address topics in Freebase.
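An application that receives identifiers from several sources may want to tell MIDs apart from namespace/key IDs. The check below is a rough sketch: the text only says that a MID is /m/ followed by a base-32 identifier, so the exact character set used here (lowercase letters, digits, and underscore) is an assumption, and the sample MID is made up.

```python
import re

# Assumption: the part after /m/ uses only lowercase letters, digits, and "_".
MID_PATTERN = re.compile(r"/m/[0-9a-z_]+")


def is_mid(identifier):
    """Return True if `identifier` has the /m/<identifier> shape."""
    return MID_PATTERN.fullmatch(identifier) is not None


print(is_mid("/m/0abc12"))       # made-up MID, for illustration only
print(is_mid("/en/bob_dylan"))   # a namespace/key ID, not a MID
```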

Namespaces, Keys, and Topic IDs

The file directory-like hierarchy of domain, type, and property IDs is just one application of a more general concept: namespaces and keys. A namespace is like a file directory, and a key is like a file name. Just as all file names within a particular file directory must be unique among themselves, all keys within a particular namespace must also be unique among themselves.

As a more specific example, /business is the namespace corresponding to the Business domain. Within it, Business-related types are given keys (e.g., company) that are unique among themselves. Each type's ID is formed by appending its key to the namespace's ID (e.g., /business/company).
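The uniqueness rule for keys can be sketched with a dictionary per namespace. The Namespace class and its add_key method are hypothetical names chosen for this illustration and do not correspond to any Freebase API.

```python
class Namespace:
    """Toy namespace: keys must be unique within it, like file names in a directory."""

    def __init__(self, ns_id):
        self.ns_id = ns_id.rstrip("/")
        self.keys = {}

    def add_key(self, key, target):
        """Register `key` for `target` and return the resulting ID."""
        if key in self.keys:
            raise KeyError(f"key {key!r} already exists in {self.ns_id}")
        self.keys[key] = target
        return f"{self.ns_id}/{key}"


business = Namespace("/business")
type_id = business.add_key("company", "the Company type")
print(type_id)

try:
    business.add_key("company", "some other type")
except KeyError as error:
    print("rejected:", error)
```

The same key may still appear in a different namespace, just as two directories can each contain a file with the same name.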

There are several kinds of namespaces besides those that correspond to domains and types. The most important and most frequently encountered is the /en namespace. This is the English namespace in which most well-known topics can be given unique keys to form human-readable English IDs. For example, the prolific Bob Dylan is so well-known that his topic in Freebase is given the key bob_dylan in the /en namespace, and so the topic's ID is /en/bob_dylan. This ID allows you to access his topic in the web client with the simple URL

More on Properties

The last basic concept to discuss involves a major difference between Freebase properties and their analogue in relational database technologies, namely relational table columns. For each row, a relational table column can only hold one value. For example, consider a typical "book" relational table with a column named "author". For each row in the "book" table, the "author" column can only hold one foreign key to an "author" table. If a book happens to have several authors, then this simple relational schema design does not work, and we would have to make a new table to model the authorships. That is, we would need one "book" table, one "author" table, and one "authorship" table to store the n-to-n relationships between books and authors. The way you retrieve data also changes quite radically as you switch from one schema design to the other.

In contrast with conventional database technologies, Freebase considers multi-value properties to be so desirable in modeling real-life data that it supports them by default. That is, when the /book/written_work/author property was created, it was assumed to allow for multiple authors per book, and you can query a multi-value property and a single-value property in exactly the same way. There is no need to consider whether you must join with a third table that models the n-to-n relationship.
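The point that one-author and many-author books are read the same way can be sketched with a store in which every property holds a list of values. This is an illustrative model, not the Freebase query interface: the store layout, the get_values helper, and the /en/... book and author IDs are hypothetical.

```python
# Hypothetical store: (topic ID, property ID) -> list of values.
AUTHOR = "/book/written_work/author"
store = {
    ("/en/book_a", AUTHOR): ["/en/author_x"],
    ("/en/book_b", AUTHOR): ["/en/author_x", "/en/author_y"],
}


def get_values(topic_id, property_id):
    """Same call for single- and multi-value cases; always returns a list."""
    return list(store.get((topic_id, property_id), []))


# No "authorship" join table is needed for the book with two authors.
print(get_values("/en/book_a", AUTHOR))
print(get_values("/en/book_b", AUTHOR))
```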

Summary

  • A type is a conceptual container of related properties commonly needed to describe a certain aspect of a topic.
  • A topic can be assigned one or more types (the default type being /common/topic).
  • As properties are grouped into types, types are grouped into domains.
  • Domains, types, and properties are given IDs in a namespace/key hierarchy.
  • Common well-known topics are given IDs in the /en namespace, which are human-readable English strings.
  • Topics are uniquely identified within Freebase by GUIDs.
  • Properties are multi-value by default, and multi-value properties and single-value properties can be queried in the same way.