Introduction
Open Civic Data Identifiers (OCD IDs) are a common identifier format that defines political geographies. To create a CDF feed, you need to provide these identifiers as part of a GpUnit entity. This document provides guidance and best practices for adding OCD IDs to the opencivicdata GitHub repository.
How to update OCD IDs in the Open Civic Data GitHub repository
Prerequisites
Before you start:
Prepare your workstation:
Fork and clone the package from the OCD ID repo.
Familiarize yourself with the repo:
In the OCD ID repo, each supported country is represented by a directory and a CSV file that share the same name: `country-<2 letter country code>` (Example: `identifiers/country-de` and `identifiers/country-de.csv` for Germany).
Inside the directory of the country you want to modify, you can find the CSV files (Example) that include parts of the top-level, country-specific CSV file. These are the files that you need to modify.
How to create new OCD IDs:
Structure and sources
Take a look at the open civics data document to familiarize yourself with the OCD ID structure. In general, a valid OCD ID is in the following format:
ocd-division/country:<country_code>(/<type>:<type_id>)
Identifier naming prefers standard identifiers from ISO, or other standards, like FIPS and NUTS, if ISO isn't available.
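The format above can be checked mechanically. The following is a minimal sketch, not the repository's own validation: the function name `parse_ocd_id` and the ASCII-only character classes in the pattern are assumptions made for illustration, and the pattern lets the `/<type>:<type_id>` group repeat, as the multi-level examples in this document do.

```python
import re

# Assumption: ASCII lowercase letters, digits and a few punctuation marks.
# The authoritative character rules are in the OCD ID specification.
OCD_ID_PATTERN = re.compile(
    r"ocd-division/country:[a-z]{2}(/[a-z0-9_-]+:[a-z0-9_.~-]+)*"
)


def parse_ocd_id(ocd_id):
    """Split an OCD ID into (type, type_id) pairs, outermost level first."""
    if not OCD_ID_PATTERN.fullmatch(ocd_id):
        raise ValueError(f"not a well-formed OCD ID: {ocd_id!r}")
    # Drop the leading "ocd-division" element, then split each "type:type_id".
    segments = ocd_id.split("/")[1:]
    return [tuple(segment.split(":", 1)) for segment in segments]


levels = parse_ocd_id("ocd-division/country:us/state:pa/cd:2")
```

Here `levels` holds one pair per hierarchy level, starting with the country.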
General policies
The following are general policies:
Hierarchy
The OCD ID hierarchy must be dictated by the administrative level that controls the boundaries of the OCD IDs, not necessarily by the containment relationship of the OCD IDs.
- Example: In the United States, congressional districts are used in
elections for the national House of Representatives, but their boundaries
are determined by the States. So the congressional districts hang off of
the states:
ocd-division/country:us/state:pa/cd:2
- Example: Murrysville is a municipality in Pennsylvania and is
contained within Westmoreland County. However, the towns are administered by the State, so the OCD ID hangs off of the state:
ocd-division/country:us/state:pa/place:murrysville
Extra hierarchy might be used in cases where disambiguation is needed.
- Example: There are 16 places in Pennsylvania named "Franklin Township." Ordinarily, these would each have the OCD ID `ocd-division/country:us/state:pa/place:franklin`, but that would be ambiguous. So instead, we can add the county to the OCD ID so that each gets its own unique OCD ID. Example: `ocd-division/country:us/state:pa/county:adams/place:franklin`
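The disambiguation rule can be sketched as follows. This is only an illustration under stated assumptions: the helper `place_ids` is hypothetical, it hard-codes the US state/county/place types used in the examples above, and the county `beaver` in the sample input is illustrative rather than taken from this document.

```python
from collections import Counter


def place_ids(state, places):
    """Build place OCD IDs, adding the county only when a name repeats.

    `places` is a list of (county, place) pairs within one state.
    """
    name_counts = Counter(place for _, place in places)
    prefix = f"ocd-division/country:us/state:{state}"
    ids = []
    for county, place in places:
        if name_counts[place] > 1:
            # Ambiguous name: insert the county as an extra hierarchy level.
            ids.append(f"{prefix}/county:{county}/place:{place}")
        else:
            ids.append(f"{prefix}/place:{place}")
    return ids


ids = place_ids(
    "pa",
    [("adams", "franklin"), ("beaver", "franklin"), ("westmoreland", "murrysville")],
)
```

Only the two `franklin` entries receive the extra `county` level; `murrysville` keeps the shorter form that hangs directly off the state.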
Type
OCD ID types are typically specific to countries.
Some OCD ID types are common throughout the repository, like `country`, `region`, and `place`. However, the general guidance is to err on the side of specifying types in a more specific way that makes sense in the context of that country.
- Example: For Admin Area 1 in the US, the types `state`, `district`, and `territory` are used.
- Example: For Admin Area 1 in CA, the types `province` and `territory` are used.
- Example: For Admin Area 1 in PT, the types `region` and `autonomous_region` are used.
ID
- In general, we want to use the same OCD ID for a district across redistricting. When choosing a `type_id` to use for a new set of OCD IDs, choose the one that is most stable. Some questions to ask when determining which identifier to use are:
  - How likely is the identifier for a given district to change due to redistricting?
  - If the same officeholder holds district X before and after its boundaries have changed from redistricting, would I represent their term as continuous?
  - Are districts with identical boundaries or names across redistricting represented by the same identifier?
- Example: In the US, congressional district numbers are used for US House districts because, even though their boundaries change with redistricting, the identity is strongly attached to the number, and you would refer to someone as holding office for the Nth seat for X years even when those years span a redistricting.
- Example: In Canada, we want to use district names to represent federal electoral districts because, although federal electoral codes exist, they aren't stable: identical districts across redistricting are represented with different IDs (for example, district 47012 before the 2012 redistricting isn't the same district after it).
Redistricting
Generally, when updating OCD IDs due to redistricting, use the set of OCD IDs that exist rather than creating a new set.
- If a district's identity (numeric ID, name, etc.) doesn't change after redistricting, use the same OCD ID.
- For the creation of new districts after redistricting, create new OCD IDs.
- For districts that no longer exist, update the `ValidThrough` field with the date redistricting went into effect.
- If a district's ID is based on its name, and the district is renamed after redistricting, create a new ID based on the new district name and add an alias where `id = oldId` and `sameAs = newId`. This makes newId canonical, because any usage of oldId maps to newId.
- In the case where there is a collision between historical OCD IDs, such as a new ID that is identical to an abolished ID, append the `ValidFrom` year to the new ID. For more about this naming policy, see Create new files for current, abolished and renamed OCD-IDs.
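The first three rules above amount to a set comparison between the existing IDs and the IDs needed after redistricting. The sketch below is a hedged illustration only: `plan_redistricting` is a hypothetical helper, the `country:xx` IDs and the effective date are made-up sample values, and renames and collisions (the last two rules) still need manual handling.

```python
def plan_redistricting(existing_ids, new_ids, effective_date):
    """Classify OCD IDs as kept, newly created, or retired.

    `effective_date` is the date redistricting went into effect (YYYY-MM-DD).
    """
    existing, new = set(existing_ids), set(new_ids)
    return {
        # Identity unchanged: reuse the same OCD ID.
        "keep": sorted(existing & new),
        # District did not exist before: create a new OCD ID.
        "create": sorted(new - existing),
        # District no longer exists: record when it stopped being valid.
        "retire": {ocd_id: effective_date for ocd_id in existing - new},
    }


plan = plan_redistricting(
    ["ocd-division/country:xx/district:1", "ocd-division/country:xx/district:2"],
    ["ocd-division/country:xx/district:1", "ocd-division/country:xx/district:3"],
    "2024-01-01",
)
```

In this sample, `district:1` is kept, `district:3` is created, and `district:2` is retired with the effective date as its end of validity.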
Update your local copy
Any update needs to be made under the country-specific directory: ocd-repository/identifiers/country-<2 letter country code>. If one doesn't exist, create it.
- If the OCD IDs CSV file already exists and represents old electoral boundaries, you need to update this file to include the new constituencies. To do this, add a `ValidFrom` and a `ValidTo` column, each containing a date in YYYY-MM-DD format.
  - `ValidFrom` should be the election date for new constituencies. It can be left blank for constituencies that already exist.
  - `ValidTo` for outdated constituencies must be the day before the election.
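The date rule above is simple arithmetic, sketched here with the standard library. The function name `validity_dates` and the sample election date are assumptions for illustration; the column names follow the text above, so check the existing CSV header in the repository for the exact spelling.

```python
from datetime import date, timedelta


def validity_dates(election_day):
    """Return (ValidFrom for new constituencies, ValidTo for outdated ones)."""
    valid_from = election_day.isoformat()  # YYYY-MM-DD
    # Outdated constituencies stop being valid the day before the election.
    valid_to = (election_day - timedelta(days=1)).isoformat()
    return valid_from, valid_to


# Hypothetical election held on 1 March 2024.
valid_from, valid_to = validity_dates(date(2024, 3, 1))
```

Using `date` arithmetic rather than string manipulation keeps month and leap-year boundaries correct: for this sample, `valid_to` is `2024-02-29`.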
Update Aliases
Aliasing can be used to mark OCD IDs as representations of the same piece of political geography.
- For example, if a place is both a town and a county, this could make sense.
The general principles we're trying to push:
- If two pieces of political geography are coterminous, they shouldn't necessarily be aliased with each other. Example: at-large US congressional districts are coterminous with states, but this is by chance due to current US population numbers.
Aliasing can also make sense where significant changes in laws/constitutional amendments would be required to split districts.
- For example, the Washington state senate and state house districts are set to be the same by constitutional law.
If you need to, add a CSV file `aliases.csv` in which we add both old OCD IDs and their aliases. Canonical IDs can use division types that have a local meaning and can have aliases with more familiar representations (such as canonical: `ocd-division/country:de/land`, alias: `ocd-division/country:de/state`). See issue #170 for more information.
To update the aliases.csv file:
- Respect the order of columns in the aliases.csv file: `id`, `sameAs`, `sameAsNote`
| Column | Description |
|---|---|
| id | This column must have the aliases to the OCD IDs. |
| sameAs | This column must have the actual OCD IDs for which we add the aliases. |
| sameAsNote | A note that describes how or why the division has multiple identifiers. |
Example: identifiers/country-in/aliases.csv
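As a rough illustration of the column order described above, the following sketch renders alias rows with Python's `csv` module. The helper `render_aliases`, the `country:xx` IDs, and the note text are hypothetical sample values, not entries from the repository.

```python
import csv
import io

# Column order required for aliases.csv, as listed above.
ALIAS_COLUMNS = ["id", "sameAs", "sameAsNote"]


def render_aliases(rows):
    """Render alias rows as CSV text with the columns in the required order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ALIAS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = render_aliases([
    {
        # The alias (for example, an old ID) goes in `id`...
        "id": "ocd-division/country:xx/district:old_name",
        # ...and the canonical OCD ID it points to goes in `sameAs`.
        "sameAs": "ocd-division/country:xx/district:new_name",
        "sameAsNote": "Renamed after redistricting",
    }
])
```

Because `DictWriter` takes the field order from `fieldnames`, the header always comes out as `id,sameAs,sameAsNote` regardless of the key order in each row.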
Run the script - compile.py
Run the Python script located in opencivicdata/ocd-division-ids/scripts:
- You have to specify the 2 letter country code as an argument to the script so that it knows which country’s identifiers need an update.
- Example: `python3 scripts/compile.py in` (for updating OCD IDs of constituencies in India).
- Python 2.x isn't supported. Use Python 3 or later. You can download the latest version of Python 3 here.
The script takes data from the CSV files updated in the previous step (ocd-division-ids/identifiers/country-in/*.csv), validates the values in the files, checks for data errors (such as use of special characters) and duplicated data, and writes the new OCD IDs to the top-level country CSV file.
The script prints error and warning messages if any issues arise in the updated CSV files. Resolve the issues and run the script again.
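If you run the compile step repeatedly, a small wrapper can guard against a mistyped country code. This is a sketch under assumptions: `compile_command` and `run_compile` are hypothetical helpers, `repo_root` is the path to your local clone, and the only thing taken from this document is the command line `python3 scripts/compile.py <code>`.

```python
import re
import subprocess


def compile_command(country_code):
    """Build the command line shown above for a 2 letter country code."""
    if not re.fullmatch(r"[a-z]{2}", country_code):
        raise ValueError(f"expected a 2 letter lowercase code, got {country_code!r}")
    return ["python3", "scripts/compile.py", country_code]


def run_compile(country_code, repo_root):
    """Run the script from the root of a local clone (hypothetical wrapper).

    The script's messages go to the terminal; read them rather than relying
    only on the returned exit status.
    """
    completed = subprocess.run(compile_command(country_code), cwd=repo_root)
    return completed.returncode


command = compile_command("in")
```

Calling `run_compile("in", "/path/to/ocd-division-ids")` would then execute the same command as the India example, with the clone path replaced by your own.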
Add a readme file
When you add a new country or a new level of coverage (for example, previously we only had an OCD ID for the country, but are now adding the first-level administrative districts), add or update a README.md file. This file must contain a quick outline of the political geography, including:
- The types that we will use and their roles (such as "`districts` are the first-level administrative area in this country");
- The relationships between types (such as "legislative districts are assigned on a per-district basis and do not cross district boundaries");
- Any notable exceptions (such as, "there is one legislative district to cover all overseas citizens"); and
- Links to any useful Wikipedia pages to help provide context for a reviewer or user.
Create a pull request
To create a pull request, use the following guidance:
- Once the compile.py script completes without errors, check the newly written top-level country-<2 letter country code>.csv file to make sure it now includes the updated or new set of OCD IDs.
- Create a pull request and add reviewers. This pull request must include the modifications made in all of the following CSV files:
  - The CSV files in the country-specific directory. Example
  - The top-level country-<2 letter country code>.csv file. Example
- When the pull request is reviewed and approved by two of the country's committers, it's merged by one of the owners/collaborators of the package.