Process to update OCD IDs

Introduction

Open Civic Data Identifiers (OCD IDs) are a common identifier format for political geographies. To create a CDF feed, you need to provide these identifiers as part of a GpUnit entity. This document provides guidance and best practices for adding OCD IDs to the opencivicdata GitHub repository.

How to update OCD IDs in Open Civic Data GitHub repository

Prerequisites

  1. Understand how to contribute to an open source project

Before you start:

  1. Prepare your workstation:

    Fork the OCD ID repo and clone your fork.

  2. Familiarize yourself with the repo:

    In the OCD ID repo, each supported country is represented by a directory and a CSV file that share the same name: country-<2 letter country code> (for example, identifiers/country-de and identifiers/country-de.csv for Germany).

    Inside the directory of the country you want to modify, you can find the CSV files that each contain part of the top-level country-specific CSV file. These are the files that you need to modify.

  3. How to create new OCD IDs:

    Structure and sources

    Take a look at the Open Civic Data documentation to familiarize yourself with the OCD ID structure. In general, a valid OCD ID has the following format, where the (/<type>:<type_id>) segment can repeat zero or more times: ocd-division/country:<country_code>(/<type>:<type_id>)*

    When naming identifiers, prefer standard codes from ISO. If no ISO code is available, use another standard, such as FIPS or NUTS.
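    As a quick pre-check before running the repository's tooling, the format above can be expressed as a regular expression. This is a minimal sketch, not repository code: is_valid_ocd_id is a hypothetical helper, and its character rules are our reading of the OCD ID specification (lowercase only; types made of letters, digits, and underscores; type IDs made of letters, digits, underscores, periods, hyphens, and tildes).

```python
import re

# Assumed character rules (our reading of the OCD ID spec, not repo code):
#   country code: two lowercase ASCII letters
#   type:         lowercase ASCII letters, digits, underscores
#   type_id:      Unicode word characters plus ".", "~" and "-"
OCD_ID_PATTERN = re.compile(
    r"ocd-division/country:[a-z]{2}"   # required country segment
    r"(?:/[a-z0-9_]+:[\w.~-]+)*"       # zero or more /<type>:<type_id> segments
)


def is_valid_ocd_id(ocd_id: str) -> bool:
    """Return True if ocd_id matches the assumed OCD ID format."""
    # Uppercase characters are not allowed anywhere in an OCD ID.
    if ocd_id != ocd_id.lower():
        return False
    return OCD_ID_PATTERN.fullmatch(ocd_id) is not None


print(is_valid_ocd_id("ocd-division/country:us/state:pa/cd:2"))  # True
print(is_valid_ocd_id("ocd-division/country:us/state:PA"))       # False (uppercase)
```

    Because the trailing group repeats zero or more times, a country-level ID such as ocd-division/country:de passes on its own.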

    General policies

    The following are general policies:

    Hierarchy

    The OCD ID hierarchy must be dictated by the administrative level that controls a division's boundaries, not necessarily by which divisions geographically contain it.

    • Example: In the United States, congressional districts are used in elections for the national House of Representatives, but their boundaries are determined by the States. So the congressional districts hang off of the states: ocd-division/country:us/state:pa/cd:2
    • Example: Murrysville is a municipality in Pennsylvania and is contained within Westmoreland County. However, Pennsylvania's municipalities are administered by the state, so the OCD ID hangs off the state: ocd-division/country:us/state:pa/place:murrysville

    Extra hierarchy might be used in cases where disambiguation is needed.
    • Example: There are 16 places in Pennsylvania named "Franklin Township." Ordinarily, these would each have the OCD ID ocd-division/country:us/state:pa/place:franklin, but that would be ambiguous. So instead, we can add the county to the OCD ID so each gets its own unique OCD ID. Ex: ocd-division/country:us/state:pa/county:adams/place:franklin
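    The hierarchy and disambiguation rules above can be illustrated with a small helper that joins (type, type_id) pairs onto the country prefix and only inserts the county segment when a place name is not unique within its state. build_ocd_id and place_ids are hypothetical helpers written for this guide, and county_b in the usage lines is a made-up county.

```python
from __future__ import annotations

from collections import Counter


def build_ocd_id(country: str, *segments: tuple[str, str]) -> str:
    """Join (type, type_id) pairs onto the country prefix of an OCD ID."""
    parts = [f"ocd-division/country:{country}"]
    parts += [f"{div_type}:{type_id}" for div_type, type_id in segments]
    return "/".join(parts)


def place_ids(country: str, state: str, places: list[tuple[str, str]]) -> list[str]:
    """Build place IDs that hang off the state, as in the Murrysville example.

    places holds (place, county) pairs. If two places in the state share a
    name, the county segment is added to each so that every ID is unique.
    """
    name_counts = Counter(place for place, _ in places)
    ids = []
    for place, county in places:
        segments = [("state", state)]
        if name_counts[place] > 1:
            segments.append(("county", county))  # extra hierarchy to disambiguate
        segments.append(("place", place))
        ids.append(build_ocd_id(country, *segments))
    return ids


# Congressional districts hang off the state that draws their boundaries.
print(build_ocd_id("us", ("state", "pa"), ("cd", "2")))
# "murrysville" is unique in this input; "franklin" appears twice.
for ocd_id in place_ids("us", "pa", [("murrysville", "westmoreland"),
                                     ("franklin", "adams"),
                                     ("franklin", "county_b")]):
    print(ocd_id)
```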
    Type
    • OCD ID types are typically specific to countries.

    • Some types, such as country, region, and place, are common throughout the repository.

    • However, the general guidance is to err on the side of using types that are specific and meaningful in the context of that country.

      • Example: For Admin Area 1 in the US, the types state, district, and territory are used.
      • Example: For Admin Area 1 in CA, the types province and territory are used.
      • Example: For Admin Area 1 in PT, the types region and autonomous_region are used.
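    To keep first-level types consistent while preparing a CSV file, you could check each first-level ID against a per-country list. This sketch only restates the three examples above; ADMIN1_TYPES and is_known_admin1_type are hypothetical, the repository may accept other types, and the function assumes its input is already known to be an Admin Area 1 ID (other divisions, such as districts drawn at the national level, can also hang directly off the country).

```python
# First-level (Admin Area 1) types, restated from the examples above.
ADMIN1_TYPES = {
    "us": {"state", "district", "territory"},
    "ca": {"province", "territory"},
    "pt": {"region", "autonomous_region"},
}


def is_known_admin1_type(ocd_id):
    """Check the type of an ID that is already known to be Admin Area 1.

    Returns True when the country is not listed in ADMIN1_TYPES or the ID
    stops at the country level, since there is nothing to compare against.
    """
    segments = ocd_id.split("/")
    # segments[0] is "ocd-division" and segments[1] is "country:<code>".
    country = segments[1].split(":", 1)[1]
    if country not in ADMIN1_TYPES or len(segments) < 3:
        return True
    div_type = segments[2].split(":", 1)[0]
    return div_type in ADMIN1_TYPES[country]


print(is_known_admin1_type("ocd-division/country:ca/province:on"))  # True
print(is_known_admin1_type("ocd-division/country:ca/state:on"))     # False
```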
    ID
    • In general, we want to use the same OCD ID for a district across redistricting, so when choosing a type_id for a new set of OCD IDs, choose the most stable one. Some questions to ask when deciding which identifier to use:
      • How likely is the identifier for a given district to change due to redistricting?
      • If the same officeholder holds district X before and after redistricting changes its boundaries, would I represent their term as continuous?
      • Are districts with identical boundaries or names across redistricting represented by the same identifier?
      • Example: In the US, congressional district numbers are used for US House districts because, even though their boundaries change with redistricting, a district's identity is strongly attached to its number: you would say someone has held the Nth seat for X years even when that period spans a redistricting.
      • Example: In Canada, we want to use district names to represent federal electoral districts. Although federal electoral district codes exist, they are not stable, because identical districts across redistricting are represented with different codes (for example, district 47012 before the 2012 redistricting isn't the same district as 47012 after it).
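    When a type_id is derived from a district name, as in the Canadian example, the name must be normalized into the allowed characters. This sketch follows our reading of the OCD ID specification (convert to lowercase, replace spaces with underscores, replace other disallowed characters with tildes); name_to_type_id is a hypothetical helper, so compare its output with the existing IDs for the country you are editing before relying on it.

```python
import re
import unicodedata

# Characters assumed to be allowed in a type_id: Unicode word characters
# (letters, digits, underscore) plus period, hyphen, and tilde.
_DISALLOWED = re.compile(r"[^\w.~-]")


def name_to_type_id(name: str) -> str:
    """Normalize a district name into a candidate OCD type_id."""
    # NFC keeps accented letters as single code points, so "è" stays "è".
    slug = unicodedata.normalize("NFC", name).strip().lower()
    slug = re.sub(r"\s+", "_", slug)   # runs of whitespace become "_"
    return _DISALLOWED.sub("~", slug)  # any other disallowed character becomes "~"


print(name_to_type_id("Abbotsford"))       # abbotsford
print(name_to_type_id("Rivière-du-Nord"))  # rivière-du-nord
print(name_to_type_id("St. John's East"))  # st._john~s_east
```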
    Redistricting

    Generally, when updating OCD IDs due to redistricting, reuse the existing set of OCD IDs rather than creating a new set.

    • If a district's identity (numeric ID, name, etc.) doesn't change after redistricting, use the same OCD ID.
    • For the creation of new districts after redistricting, create new OCD IDs.
    • For districts that no longer exist, update the ValidThrough field with the date redistricting went into effect.
    • If a district's ID is based on its name and the district is renamed after redistricting, create a new ID based on the new district name and add an alias where id = oldId and sameAs = newId. This makes newId canonical: any use of oldId maps to newId.
    • If a new ID collides with a historical OCD ID (for example, a new ID is identical to an abolished ID), append the ValidFrom year to the new ID. For more about this naming policy, see Create new files for current, abolished and renamed OCD-IDs.
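    The redistricting rules above can be sketched as a function that compares the district segments in force before and after redistricting. This is a hypothetical helper, not repository code. It assumes districts are identified by their "<type>:<type_id>" segment under a common parent ID, that renames are supplied explicitly (they cannot be inferred from IDs), that ValidThrough for abolished districts is the effective date as stated above, and that a colliding ID gets the year appended as "-<year>" (the exact suffix format is an assumption; follow the existing files for your country).

```python
from dataclasses import dataclass, field


@dataclass
class RedistrictingPlan:
    keep: list = field(default_factory=list)     # existing IDs reused as-is
    create: list = field(default_factory=list)   # (new ID, ValidFrom)
    abolish: list = field(default_factory=list)  # (old ID, ValidThrough)
    aliases: list = field(default_factory=list)  # (id=old ID, sameAs=new ID)


def plan_redistricting(parent, before, after, renames, historical, effective_date):
    """Classify districts according to the redistricting policy.

    parent:         parent ID, e.g. "ocd-division/country:xx"
    before, after:  sets of "<type>:<type_id>" segments before and after
    renames:        {old segment: new segment} for districts that were renamed
    historical:     segments of IDs abolished in earlier redistrictings
    effective_date: "YYYY-MM-DD" date the new districts take effect
    """
    plan = RedistrictingPlan()
    year = effective_date[:4]

    # New districts get new IDs; a collision with an abolished ID gets the year.
    created = {}
    for seg in sorted(after - before):
        new_seg = f"{seg}-{year}" if seg in historical else seg
        created[seg] = f"{parent}/{new_seg}"
        plan.create.append((created[seg], effective_date))

    for seg in sorted(before):
        old_id = f"{parent}/{seg}"
        if seg in after:
            plan.keep.append(old_id)  # identity unchanged: same OCD ID
        elif seg in renames:
            # Renamed: alias the old ID to the new, canonical one.
            target = created.get(renames[seg], f"{parent}/{renames[seg]}")
            plan.aliases.append((old_id, target))
        else:
            plan.abolish.append((old_id, effective_date))  # district no longer exists
    return plan


plan = plan_redistricting(
    "ocd-division/country:xx",              # hypothetical country
    before={"ed:north", "ed:south", "ed:old_town"},
    after={"ed:north", "ed:east", "ed:new_town"},
    renames={"ed:old_town": "ed:new_town"},
    historical={"ed:east"},                 # an earlier "east" was abolished
    effective_date="2024-01-01",
)
print(plan)
```

    In this example, ed:north is kept, ed:south is abolished, ed:old_town becomes an alias of the newly created ed:new_town, and ed:east is created as ed:east-2024 because an earlier ed:east was abolished.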

Update your local copy

Make all updates under the country-specific directory: ocd-repository/identifiers/country-<2 letter country code>. If the directory doesn't exist, create it.

  • If the OCD IDs CSV file already exists and represents old electoral boundaries, update this file to include the new constituencies. To do this, add ValidFrom and ValidThrough columns that contain dates in YYYY-MM-DD format.
    • ValidFrom should be the election date for new constituencies. It can be left blank for constituencies that already exist.
    • ValidThrough for outdated constituencies must be the day before the election.
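Filling in the two date columns can be scripted with Python's csv module. In this sketch, add_validity_columns is a hypothetical helper; it assumes each file has an id column, that you already know which IDs are new and which are outdated, and that the headers are spelled validFrom and validThrough (check the spelling used in the country's existing files).

```python
import csv
import io
from datetime import date, timedelta

# Assumed header spellings; match the country's existing files.
VALID_FROM = "validFrom"
VALID_THROUGH = "validThrough"


def add_validity_columns(src, dst, election, new_ids, outdated_ids):
    """Copy CSV rows from src to dst, filling in the validity date columns.

    election:     datetime.date of the election that uses the new boundaries
    new_ids:      IDs of constituencies that start with this election
    outdated_ids: IDs of constituencies that end with this election
    Existing constituencies keep a blank ValidFrom, as allowed above.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames)
    for column in (VALID_FROM, VALID_THROUGH):
        if column not in fieldnames:
            fieldnames.append(column)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    day_before = election - timedelta(days=1)
    for row in reader:
        row.setdefault(VALID_FROM, "")
        row.setdefault(VALID_THROUGH, "")
        if row["id"] in new_ids:
            row[VALID_FROM] = election.isoformat()       # YYYY-MM-DD
        if row["id"] in outdated_ids:
            row[VALID_THROUGH] = day_before.isoformat()  # day before the election
        writer.writerow(row)


src = io.StringIO(
    "id,name\n"
    "ocd-division/country:xx/ed:a,A\n"  # outdated constituency (hypothetical)
    "ocd-division/country:xx/ed:b,B\n"  # new constituency (hypothetical)
)
dst = io.StringIO()
add_validity_columns(src, dst, date(2024, 3, 10),
                     new_ids={"ocd-division/country:xx/ed:b"},
                     outdated_ids={"ocd-division/country:xx/ed:a"})
print(dst.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, so that line endings are handled correctly.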

Update Aliases

Aliasing marks two OCD IDs as representing the same piece of political geography.

  • For example, aliasing could make sense for a place that is both a town and a county.

The general principle we follow:

  • If two pieces of political geography are coterminous, they shouldn't necessarily be aliased with each other. For example, at-large US congressional districts are coterminous with their states, but only by chance, due to current US population numbers.

Aliasing can also make sense where significant changes in laws/constitutional amendments would be required to split districts.

  • For example, the Washington state senate and state house districts are set to be the same by constitutional law.

If needed, add an aliases.csv file that lists old or alternative OCD IDs together with the canonical IDs they map to. Canonical IDs can use division types that have a local meaning and can have aliases with more familiar representations (for example, canonical: ocd-division/country:de/land, alias: ocd-division/country:de/state). See issue #170 for more information.

To update aliases.csv file:

  • Respect the order of columns in the aliases.csv file: id,sameAs,sameAsNote

    Column      Description
    id          The alias: the alternative (for example, old) OCD ID.
    sameAs      The canonical OCD ID that the alias refers to.
    sameAsNote  A note that describes how or why the division has multiple identifiers.
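The column order above can be enforced by writing aliases.csv with csv.DictWriter. write_aliases is a hypothetical helper; the checks it performs (no ID aliased to itself, no ID aliased twice) are sanity checks chosen for illustration, not documented repository rules. It writes a complete file, so to add rows to an existing aliases.csv, read the existing rows first and pass them in together with the new ones.

```python
import csv
import io

ALIAS_COLUMNS = ["id", "sameAs", "sameAsNote"]  # required column order


def write_aliases(dst, aliases):
    """Write (id, sameAs, sameAsNote) tuples to dst as a complete aliases.csv.

    id is the alias (for example, an old ID) and sameAs is the canonical ID.
    Raises ValueError before writing anything if a check fails.
    """
    rows, seen = [], set()
    for alias_id, same_as, note in aliases:
        if alias_id == same_as:
            raise ValueError(f"{alias_id} cannot be an alias of itself")
        if alias_id in seen:
            raise ValueError(f"{alias_id} is aliased more than once")
        seen.add(alias_id)
        rows.append(dict(zip(ALIAS_COLUMNS, (alias_id, same_as, note))))
    writer = csv.DictWriter(dst, fieldnames=ALIAS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


out = io.StringIO()
write_aliases(out, [(
    "ocd-division/country:xx/ed:old_town",  # hypothetical old ID
    "ocd-division/country:xx/ed:new_town",  # hypothetical canonical ID
    "renamed after redistricting",
)])
print(out.getvalue())
```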

Example: identifiers/country-in/aliases.csv

Run the script - compile.py

Run the compile.py Python script, located in opencivicdata/ocd-division-ids/scripts:

  • Pass the 2-letter country code as an argument so that the script knows which country's identifiers to update.
  • Example: python3 scripts/compile.py in (for updating OCD IDs of constituencies in India).
  • Python 2.x is not supported; use Python 3. You can download the latest version of Python 3 from python.org.
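If you need to regenerate several countries, a small wrapper can run the script once per country code. Using sys.executable runs the script with the same Python 3 interpreter that runs the wrapper. compile_command and compile_countries are hypothetical helpers; they assume you run them from the root of your ocd-division-ids clone and that lowercase codes are expected, and they use only the command-line form shown above. Stopping early on errors only works if compile.py exits with a non-zero status when it finds problems, which is an assumption.

```python
import re
import subprocess
import sys


def compile_command(country_code):
    """Build the compile.py command line for one 2-letter country code."""
    if not re.fullmatch(r"[a-z]{2}", country_code):
        raise ValueError(f"expected a 2-letter lowercase code, got {country_code!r}")
    # sys.executable is the interpreter running this wrapper, so compile.py
    # runs under the same Python 3 installation.
    return [sys.executable, "scripts/compile.py", country_code]


def compile_countries(codes):
    """Run compile.py for each code in turn."""
    for code in codes:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later countries are not compiled after a failure.
        subprocess.run(compile_command(code), check=True)
```

For example, compile_countries(["in", "de"]) runs scripts/compile.py for India and then for Germany.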

The script reads the CSV files updated in the previous steps (ocd-division-ids/identifiers/country-in/*.csv), validates their values, checks for data errors (such as the use of special characters) and duplicate entries, and writes the new OCD IDs to the top-level country CSV file.

The script prints error and warning messages if it finds issues in the updated CSV files. Resolve the issues and run the script again.
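For a rough idea of the kinds of checks described above, here is a stand-in that is not compile.py's actual logic. merge_country_csvs is a hypothetical helper: it reads every CSV in a country directory except aliases.csv, reports IDs with disallowed characters and IDs that appear more than once, and returns the merged rows sorted by ID. The character rule and the sort order are assumptions made for illustration.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Simplified ID rule (an assumption, not compile.py's actual checks).
ALLOWED_ID = re.compile(r"ocd-division/country:[a-z]{2}(?:/[a-z0-9_]+:[\w.~-]+)*")


def merge_country_csvs(country_dir):
    """Merge a country directory's CSVs; return (rows sorted by id, problems)."""
    rows_by_id = defaultdict(list)
    problems = []
    for path in sorted(Path(country_dir).glob("*.csv")):
        if path.name == "aliases.csv":
            continue  # aliases play a different role, so they are skipped here
        with path.open(newline="", encoding="utf-8") as f:
            # Data rows start on line 2 because line 1 is the header
            # (assuming no field spans several lines).
            for line_no, row in enumerate(csv.DictReader(f), start=2):
                ocd_id = row["id"]
                if ocd_id != ocd_id.lower() or not ALLOWED_ID.fullmatch(ocd_id):
                    problems.append(f"{path.name}:{line_no}: invalid id {ocd_id!r}")
                rows_by_id[ocd_id].append((path.name, row))
    merged = []
    for ocd_id in sorted(rows_by_id):
        entries = rows_by_id[ocd_id]
        if len(entries) > 1:
            files = ", ".join(name for name, _ in entries)
            problems.append(f"duplicate id {ocd_id} in: {files}")
        merged.append(entries[0][1])  # keep the first occurrence
    return merged, problems
```

For example, rows, problems = merge_country_csvs("identifiers/country-in") collects every reported problem in one pass, so you can fix them all before rerunning.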

Add a readme file

When you add a new country or a new level of coverage (for example, previously we only had an OCD ID for the country, but are now adding the first-level administrative districts), add or update a README.md file. This file must contain a quick outline of the political geography, including:

  • The types that we will use and their roles (such as "districts are the first-level administrative area in this country");
  • The relationships between types (such as "legislative districts are assigned on a per-district basis and do not cross district boundaries");
  • Any notable exceptions (such as, "there is one legislative district to cover all overseas citizens"); and
  • Links to any useful Wikipedia pages to help provide context for a reviewer or user.

Create a pull request

To create a pull request, use the following guidance:

  • Once compile.py completes without errors, check the newly written top-level country-<2 letter country code>.csv file to make sure it now includes the new or updated OCD IDs.
  • Create a pull request and add reviewers. The pull request must include your changes to both of the following:
    • The CSV files in the country-specific directory.
    • The top-level country-<2 letter country code>.csv file.
  • After two of the country's committers review and approve the pull request, one of the repository's owners or collaborators merges it.
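The check in the first step above can be made concrete by comparing the IDs in the compiled top-level file with the IDs you meant to add. missing_ids is a hypothetical helper and assumes the top-level file has an id column.

```python
import csv
import io


def missing_ids(top_level_csv, expected_ids):
    """Return the expected IDs that are absent from the compiled CSV, sorted."""
    present = {row["id"] for row in csv.DictReader(top_level_csv)}
    return sorted(set(expected_ids) - present)


# In-memory stand-in; for a real file, use something like
# open("identifiers/country-in.csv", newline="", encoding="utf-8").
compiled = io.StringIO("id,name\nocd-division/country:xx/ed:a,A\n")
print(missing_ids(compiled, ["ocd-division/country:xx/ed:a",
                             "ocd-division/country:xx/ed:b"]))
# ['ocd-division/country:xx/ed:b']
```

An empty list means every ID you expected made it into the compiled file.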