Module 1: Ask

1. Typology of stakeholders

Before you begin documenting your dataset and creating Data Cards, it's important that you identify and invite stakeholders from across the dataset lifecycle. This makes it easier to create Data Cards because it gives you the perspectives that you need to make well-informed decisions as you create content.

To help you explore and understand how cross-functional stakeholders engage in a dataset's lifecycle process, we created a typology that lets you unearth assumptions often made about individual stakeholders. Our typology is divided into three stakeholder groups that are involved in a dataset's lifecycle: producers, agents and users.

This typology represents a continuum of constantly shifting needs and expectations from datasets and their documentation. There is no one-size-fits-all solution.

Producers

Producers are the creators of datasets and documentation, and are responsible for dataset collection, ownership, launch, and maintenance.

At their core, you can think of producers as those responsible for the production and publication of datasets, and for their launch, adoption, and success.

Producers could also be the individuals or groups recruited to collect or label the data, and provide advice on methods or interpretation at various points during the data lifecycle.

Depending on the context, producers could also represent your current and future team members, partners, clients, or data-hosting platforms—all responsible for dataset maintenance or upkeep, deployment, and monitoring.

Agents

Agents are stakeholders who read your dataset documentation or Data Card and other machine-learning (ML) model-related documentation, and have the agency to use or determine how they or others might use the described datasets or AI systems.

Depending on their domains, agents could have an operational or reviewer role, such as a researcher in an academic setting who wants to gauge the appropriate usage of a dataset or a data scientist on a product team who wants to determine the overall fit of the dataset as it relates to product integration.

This distinction is important because reviewers include stakeholders who might never directly use the dataset, but still engage with the Data Card, such as industry consultants, investigative journalists, community representatives, and legal entities. Agents might or might not possess the technical expertise to navigate information presented in typical dataset documentation, but often have access to expertise as required.

Users

Users are individuals and representatives who interact with products that rely on models trained on datasets.

Users might consent to provide their data as a part of the product experience, but they typically require a significantly different set of explanations and controls grounded within product experiences, even when it comes to datasets.

Summary

The following table summarizes the stakeholder groups by their descriptions, responsibilities, examples, and common tasks:

| Stakeholder group | Description | Responsibilities | Examples | Common tasks |
|---|---|---|---|---|
| Producers | Create datasets and/or documentation. | Design, create, quality test, document, launch, adopt, maintain, and update datasets. | Researchers, data scientists and analysts, software engineers, and product and program managers | Dataset adoption, disclosure, future-proofing, fairness and security, and improvements |
| Agents | Evaluate and use the dataset for their work, products, organizations, or communities. | Use the Data Card, but might not interact with the dataset itself. | ML or product engineers, researchers, third-party vendors, subject-matter experts, industry consultants, policy experts, data service providers, and leadership or management | Manage complexity, be accountable, make trade-offs, deploy to production, and archive |
| Users | Interact with the products, devices, and apps created by agents who use the producer's datasets. | Possibly contribute their data through products, and provide helpful signals for producers and agents. | Data contributors, product users, and representatives of user cohorts | Use products, understand data and privacy, provide feedback, and raise concerns |

2. Map your stakeholders

Now that you have some familiarity with our typology, you can review your dataset's lifecycle to identify your stakeholders through this basic mapping activity. As you go through the activity, take note of who might interact with the dataset or its documentation. Also, consider how stakeholders might contribute to the Data Cards.

To map your stakeholders, follow these steps:

  1. List the producers who will create the Data Cards.

9019cf76931e3ae5.png

  2. List the agents who will read and use the Data Cards.

a6c5bfc2fadd8cb5.png

  3. List the users who will use or be affected by the dataset described in the Data Card.

210d18c6ec533955.png

  4. Use the following template to create a map of your stakeholders, their roles in the creation of Data Cards, and the purpose of their Data Cards. This map gives you an intuition for the downstream needs of dataset documentation, and the ability to assign priorities and responsibilities throughout the dataset-documentation process.

d24cf1a113189a25.png
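
If you want to keep your stakeholder map next to your dataset code, the mapping steps above could be sketched as a small data structure. The following Python is only an illustration under stated assumptions, not part of the Data Cards template: the `Stakeholder` class, its field names, the `build_map` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass

# The three stakeholder groups from the typology.
GROUPS = ("producer", "agent", "user")


@dataclass(frozen=True)
class Stakeholder:
    """One row of a stakeholder map (field names are illustrative)."""
    name: str
    group: str    # one of GROUPS
    role: str     # role in the creation of the Data Card
    purpose: str  # what this stakeholder needs the Data Card for

    def __post_init__(self):
        # Reject groups that are not part of the typology.
        if self.group not in GROUPS:
            raise ValueError(f"unknown stakeholder group: {self.group!r}")


def build_map(stakeholders):
    """Group stakeholders by typology group, keeping all three keys."""
    grouped = {group: [] for group in GROUPS}
    for stakeholder in stakeholders:
        grouped[stakeholder.group].append(stakeholder)
    return grouped


# Hypothetical entries, for illustration only.
stakeholder_map = build_map([
    Stakeholder("Data engineer", "producer",
                "writes the Data Card", "document collection and upkeep"),
    Stakeholder("Product data scientist", "agent",
                "reviews drafts", "judge fit for product integration"),
    Stakeholder("App user", "user",
                "gives feedback", "understand data and privacy"),
])

for group, members in stakeholder_map.items():
    print(group, [member.name for member in members])
```

Keeping an empty list for every group makes it easy to spot a group that you haven't listed anyone for yet.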

3. Agent information journeys (AIJs)

With your stakeholders mapped out, you can determine what's essential to convey to agents—your primary stakeholders—in your Data Card so that you can set them up for success.

Typically, the experience that a person has when interacting with technology is called a user journey. However, we're talking about an agent who needs to acquire enough information about a dataset to make an informed decision, so we call this experience an agent information journey (AIJ).

The goal of an AIJ is to understand the following:

  • The tasks for which agents might want a dataset.
  • The information that agents need to complete their tasks.
  • The process by which agents deduce information.

AIJs include the following:

51ce23c7a9aaa9e4.png

Example

For example, suppose one of your agents is a data scientist. An AIJ for a data scientist could look like the following:

As a data scientist, I want to know the structure of the dataset, so I ask...

... what is the data format?

... what is the modality of the dataset?

... how many features are there in the dataset?

... how many features are engineered?

... which features are strongly correlated?

... if there are any dependencies in the structure?

Here's another example for an agent who might work in product policy and sets guidelines related to the production and development of a product:

As a policy aide, I want to know how the data might be misused, so I ask...

... what was the intended use of the dataset?

... what application prompted the dataset creation?

... what are known dangerous or risky applications of the dataset?

... what is the risk to specific groups?

... how do intended uses of this dataset impact constituencies?

... how can one ask for recourse?
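
If you collect many AIJs, it can help to store them in a consistent shape. The following Python sketch models the "As a ..., I want to know ..., so I ask..." pattern from the examples above. The class name `AgentInformationJourney` and its fields (`agent`, `wants_to_know`, `questions`) are hypothetical names chosen for this illustration, not an official schema.

```python
from dataclasses import dataclass, field


@dataclass
class AgentInformationJourney:
    """An AIJ in the 'As a ..., I want to know ..., so I ask...' form."""
    agent: str          # who is asking, for example "data scientist"
    wants_to_know: str  # what the agent is trying to learn
    questions: list = field(default_factory=list)

    def render(self) -> str:
        """Return the AIJ as text, one question per line."""
        header = (
            f"As a {self.agent}, I want to know {self.wants_to_know}, "
            "so I ask..."
        )
        return "\n".join([header] + [f"... {q}" for q in self.questions])


# The policy aide example, shortened to three of its questions.
aij = AgentInformationJourney(
    agent="policy aide",
    wants_to_know="how the data might be misused",
    questions=[
        "what was the intended use of the dataset?",
        "what are known dangerous or risky applications of the dataset?",
        "how can one ask for recourse?",
    ],
)
print(aij.render())
```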

4. Write your AIJs

  1. Write a few AIJs based on the following prompts:

ab594f2e5ce86029.png

  2. Notice how you not only have your stakeholders in mind, but also some initial questions that you think they'd like answered from reading your Data Card. This means that you're a step closer to the final set of questions that you should include in your Data Card.

5. Optics

You might have noticed the use of the terms perspective, lens, and scope to frame AIJs. While these terms were defined earlier, they're part of a guiding metaphor that we call optics. We created it to help you think about how your agents might arrive at an understanding of your dataset.

Scopes

In optics, scopes use lenses and mirrors to spot, observe, magnify, reflect, and even test materials. In the context of datasets, it's a great metaphor because you focus and frame questions to reveal obvious, non-obvious, visible, and invisible aspects.

We refer to these as scopes: a means of asking a series of questions in succession to make sense of datasets. By stacking scopes of different granularities, you can create content that helps your agents establish a cohesive understanding of datasets through transparency reports.

The following table contains the three types of scopes in our framework, along with a description, an example, and the purpose of each:

| Scope | Description | Example | Purpose |
|---|---|---|---|
| Telescopic | Questions about attributes commonly found across multiple datasets. They tag characteristics. | Does this dataset contain personally identifiable information (PII)? | Introduce and set context for additional information that helps your agents navigate your Data Card or transparency artifact. |
| Periscopic | Questions about attributes specific to the producer's dataset. They describe observations. | How many features contain PII? | Generally reserved for provision of operational information, such as the dataset's shape and size, or functional information, such as sources or intentions. |
| Microscopic | Questions about unobservable aspects of datasets, such as decisions, processes, and impacts. They demand explanations. | How was PII anonymized in this dataset? | Elicit detailed explanations of decisions or summarize longer process documents that govern responses to the corresponding periscopic and telescopic questions. |

It's important that you consider these three types of scopes throughout your Data Card creation process. A Data Card with only telescopes describes only the obvious information about your dataset and doesn't add any distinct value. A Data Card with only periscopes can get overly technical without any details about context, relevance, or importance. A Data Card with only microscopes could cause agents to get lost in the details and lose sight of the big picture.

This is why we find that the interpretations of a Data Card are greatly influenced by the presence or absence of these levels of scopes. These questions let agents and producers assess risk, plan mitigations, and, where relevant, identify opportunities for better dataset creation. Together, telescopes, periscopes, and microscopes provide useful details so that numerous stakeholders can navigate your Data Card without getting disoriented and lost.

Example

In the Agent information journeys (AIJs) section, you saw some examples of AIJs, including one for a data scientist. If you look closely at that example, you might find that you can group some of those questions by scopes, including the following questions:

As a data scientist, I want to know the structure of the dataset, so I ask...

Telescopic

... what is the data format?

... what is the modality of the dataset?

Periscopic

... how many features are there in the dataset?

... how many features are engineered?

Microscopic

... which features are strongly correlated?

... if there are any dependencies in the structure?

You have likely already come up with some telescopic, periscopic, and microscopic questions with your agents in mind.
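
The grouping in the example above can also be expressed in code, which makes it easy to check whether any scope has no questions yet. The following Python is a minimal sketch under stated assumptions: the `Scope` enum and the `group_by_scope` and `missing_scopes` helpers are hypothetical names, and the scope assigned to each question simply mirrors the example.

```python
from enum import Enum


class Scope(Enum):
    TELESCOPIC = "telescopic"
    PERISCOPIC = "periscopic"
    MICROSCOPIC = "microscopic"


# Questions from the data scientist example, tagged by scope.
tagged_questions = [
    (Scope.TELESCOPIC, "what is the data format?"),
    (Scope.TELESCOPIC, "what is the modality of the dataset?"),
    (Scope.PERISCOPIC, "how many features are there in the dataset?"),
    (Scope.PERISCOPIC, "how many features are engineered?"),
    (Scope.MICROSCOPIC, "which features are strongly correlated?"),
    (Scope.MICROSCOPIC, "if there are any dependencies in the structure?"),
]


def group_by_scope(tagged):
    """Collect questions under each scope, keeping every scope as a key."""
    grouped = {scope: [] for scope in Scope}
    for scope, question in tagged:
        grouped[scope].append(question)
    return grouped


def missing_scopes(tagged):
    """Return the scopes that have no questions yet."""
    grouped = group_by_scope(tagged)
    return [scope for scope, questions in grouped.items() if not questions]


for scope, questions in group_by_scope(tagged_questions).items():
    print(scope.value, len(questions))

# A draft with only telescopic questions is missing the other two scopes.
print(missing_scopes([(Scope.TELESCOPIC, "what is the data format?")]))
```

A check like `missing_scopes` is one way to notice a Data Card draft that leans on a single type of scope.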

6. Restructure your AIJs with scopes

  • To restructure your AIJs with scopes, use the following sample prompt:

2b6e2a7a041060f4.png

7. Congratulations

Congratulations! You've begun to create a Data Card. Now you're ready to evaluate your questions.