Data linkage is a major challenge for developers and users of decision-making information: how to link data from one source to data from another source for integrated viewing or analysis. A simple example is linking census tract data from Census 2010 with census tract data from the American Community Survey (ACS) 2012 5-year estimates. These data are differently sourced, so the starting point is a dataset from each source program. Linking the datasets for analysis involves using the common key field, or geocode, shared by the datasets to create one merged dataset.
Geocodes are structured handles that uniquely identify a geographic area. While geocodes are sometimes viewed as merely an alternative name for a geographic area, without them data linkage between differently sourced data can quickly become impossible. This section reviews the structure and use of some commonly used geocodes in developing and using decision-making data.
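A minimal sketch of this kind of linkage, in Python using only the standard library. The two tables, the field names (pop2010, mhi_acs), the tract codes 002000 and 002100, and all values are made-up placeholders for illustration, not published statistics.

```python
# Two differently sourced tract-level tables, each keyed by the same geocode.
# All field names and values are illustrative placeholders.
census2010 = {
    "15003001903": {"pop2010": 3100},
    "15003001904": {"pop2010": 2800},
    "15003002000": {"pop2010": 4100},
}
acs2012 = {
    "15003001903": {"mhi_acs": 61000},
    "15003001904": {"mhi_acs": 58000},
    "15003002100": {"mhi_acs": 72000},
}

def link_by_geocode(left, right):
    """Merge two geocode-keyed tables; return (merged rows, unmatched geocodes)."""
    merged = {}
    for geocode in sorted(left.keys() & right.keys()):
        # Combine the fields from both sources into one record.
        merged[geocode] = {**left[geocode], **right[geocode]}
    # Geocodes present in only one source cannot be linked.
    unmatched = sorted(left.keys() ^ right.keys())
    return merged, unmatched

merged, unmatched = link_by_geocode(census2010, acs2012)
for geocode, row in merged.items():
    print(geocode, row)
print("unmatched:", unmatched)
```

Reporting the unmatched geocodes, rather than silently dropping them, is a design choice that makes mismatched keys or mismatched vintages visible early.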
A geocode is defined as a structured character string containing no spaces that uniquely associates with a geographic area. In the case of the two census tract datasets, the geocode would be the 11-character string composed of the FIPS state code (2), FIPS county code (3) and Census 2010 census tract code (6).
The graphic below shows an area in Honolulu County (county FIPS code 003), Hawaii (state FIPS code 15). Census 2000 tracts are shown with a dark blue boundary. Census 2010 tracts are shown with a red boundary. Census 2000 tract 001902 is split into Census 2010 tracts 001903 and 001904. The tracts are labeled with only the 6-character tract code, but these codes are unique only within a county.
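The 2 + 3 + 6 structure can be sketched in Python as follows. The function name tract_geocode is hypothetical. The sketch assumes that each component is all digits and that a component supplied as a number should have its leading zeros restored.

```python
def tract_geocode(state, county, tract):
    """Build the 11-character tract geocode: state (2) + county (3) + tract (6)."""
    # Geocodes are character strings; zfill restores leading zeros that are
    # lost when a component has been stored as a number.
    parts = (str(state).zfill(2), str(county).zfill(3), str(tract).zfill(6))
    for part, width in zip(parts, (2, 3, 6)):
        if len(part) != width or not part.isdigit():
            raise ValueError(f"bad component {part!r}, expected {width} digits")
    return "".join(parts)

print(tract_geocode("15", "003", "001904"))  # 15003001904
print(tract_geocode(15, 3, 1904))            # 15003001904
```

Keeping geocodes as strings throughout avoids the most common linkage failure: a leading zero dropped by a spreadsheet or a numeric column type.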
The full Census 2000 census tract geocode for the area labeled “001902” is 15003001902. The full Census 2010 census tract geocode for the area labeled “001904” is 15003001904. Note the geocode has no identification or attribute identifying the vintage of the area (e.g., 2000 versus 2010). The following graphic illustrates the relationship between these geocodes.
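Because the geocode carries no vintage, one way to avoid mixing vintages is to store the vintage alongside the geocode. The sketch below is an assumption about how that could be done, not a Census Bureau format: it splits a tract geocode into its components and records the split described above as a small relationship table keyed by (vintage, geocode). The names TractId, parse_tract_geocode and successors are hypothetical.

```python
from collections import namedtuple

TractId = namedtuple("TractId", "state county tract")

def parse_tract_geocode(geocode):
    """Split an 11-character tract geocode into state, county and tract codes."""
    if len(geocode) != 11 or not geocode.isdigit():
        raise ValueError(f"not an 11-digit tract geocode: {geocode!r}")
    return TractId(geocode[:2], geocode[2:5], geocode[5:])

# The vintage is not part of the geocode, so it is kept as a separate key part.
# Relationship from the example above: Census 2000 tract 001902 was split
# into Census 2010 tracts 001903 and 001904.
successors = {
    (2000, "15003001902"): [(2010, "15003001903"), (2010, "15003001904")],
}

print(parse_tract_geocode("15003001902"))
for vintage, geocode in successors[(2000, "15003001902")]:
    print(vintage, parse_tract_geocode(geocode).tract)
```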
FIPS Codes and Census Geocodes
Federal Information Processing Standard (FIPS) codes have long been the standard geocodes for geographic areas in Federal programs. FIPS codes have also been widely adopted outside of government applications, becoming nearly universal. FIPS codes do not, however, cover key Census geography, including census tracts, block groups and census blocks.
American National Standards Institute codes (ANSI codes) are standardized numeric or alphabetic codes issued by the American National Standards Institute (ANSI) to ensure uniform identification of geographic entities across all federal government agencies. ANSI has taken over the management of geographic codes from the National Institute of Standards and Technology (NIST). Under NIST, the codes adhered to the Federal Information Processing Standards (FIPS). ANSI now issues two types of codes. It continues to issue the commonly used FIPS codes, although the acronym now stands for Federal Information Processing Series, because FIPS is no longer considered the standard. It also issues the Geographic Names Information System (GNIS) Identifiers, which were established by the United States Geological Survey (USGS). FIPS codes are the codes most commonly used by the Census Bureau, by most other Federal statistical agencies, and thus by most non-government data developers and users.
Relating Geography: SF1 header
Census geocodes for the decennial census can be determined in a geographic-relationship manner using the Summary File 1 (SF1) geographic header record/file. This dataset contains a record for each census tabulation block. Each record provides the array of geocodes for the areas in which that block is located (e.g., the census tract, city/place, school district, county, ZIP code area, congressional district, metro/CBSA, etc.). As an example, it is possible to determine the geocodes for the cities/places intersecting with a county or school district using this file.
A limitation of the SF1 geographic header is that its codes reflect the vintage of the decennial SF1 tabulation. For example, the initial Census 2010 SF1 contains codes for the 112th Congressional Districts, 2009 vintage metros/CBSAs, and cities/places as of 2010.
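The block-to-area lookup described above can be sketched as follows. The records are simplified stand-ins for SF1 geographic header records: only a few fields are shown, the lowercase field names are chosen for this sketch, and the place codes, block codes and the second county's tract code are made up. The sketch assumes a place geocode formed as state code (2) plus place code (5).

```python
# Simplified stand-ins for block-level geographic header records.
# Place, block and some tract codes below are made-up placeholders.
blocks = [
    {"state": "15", "county": "003", "tract": "001903", "block": "1000", "place": "11111"},
    {"state": "15", "county": "003", "tract": "001903", "block": "1001", "place": "11111"},
    {"state": "15", "county": "003", "tract": "001904", "block": "2000", "place": "22222"},
    {"state": "15", "county": "001", "tract": "020100", "block": "1000", "place": "33333"},
]

def block_geocode(rec):
    """15-character block geocode: state (2) + county (3) + tract (6) + block (4)."""
    return rec["state"] + rec["county"] + rec["tract"] + rec["block"]

def places_in_county(records, state, county):
    """Return the distinct place geocodes of blocks located in the given county."""
    found = set()
    for rec in records:
        if rec["state"] == state and rec["county"] == county:
            found.add(rec["state"] + rec["place"])
    return sorted(found)

print(block_geocode(blocks[0]))
print(places_in_county(blocks, "15", "003"))
```

Because every block record carries the codes of all the areas containing it, the same pattern works for any pair of geographies in the header, such as places by school district.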
Merging Subject Matter Data into Shapefiles
Shapefiles are typically developed in a manner that includes no subject matter. For example, most Census Bureau shapefiles do not include any statistical data beyond geographic area (size) and latitude-longitude. Merging demographic-economic data into a shapefile, so that the shapefile can be used to develop thematic pattern maps, requires linking the shapefile dbf (e.g., Census 2010 census tract boundary file) with a subject matter dbf (e.g., Census 2010 demographics or ACS 2011 demographics). The subject matter dbf is linked to the shapefile dbf using a common key, which is the geocode. The geocodes must be identically defined and of the same vintage.
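A minimal sketch of this merge, using plain Python lists of dictionaries as stand-ins for rows already read from the two dbf files. Reading and writing the dbf files themselves is not shown. The field names (GEOID, NAME, POP), the tract code 002000 and the values are hypothetical placeholders.

```python
# Stand-ins for rows read from a shapefile dbf and from a subject matter dbf.
# Field names and values are illustrative placeholders.
shape_rows = [
    {"GEOID": "15003001903", "NAME": "19.03"},
    {"GEOID": "15003001904", "NAME": "19.04"},
    {"GEOID": "15003002000", "NAME": "20"},
]
subject_rows = [
    {"GEOID": "15003001904", "POP": 2800},
    {"GEOID": "15003001903", "POP": 3100},
]

def merge_subject_matter(shape_rows, subject_rows, key="GEOID"):
    """Left-join subject matter fields onto shapefile rows by geocode."""
    lookup = {}
    for row in subject_rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate geocode {row[key]!r} in subject matter data")
        lookup[row[key]] = row
    fields = [f for f in subject_rows[0] if f != key] if subject_rows else []
    merged = []
    # Keep the shapefile row order: each dbf record corresponds by position
    # to a shape in the .shp file, so rows must not be reordered or dropped.
    for row in shape_rows:
        match = lookup.get(row[key], {})
        out = dict(row)
        for f in fields:
            out[f] = match.get(f)  # None when no subject matter row matches
        merged.append(out)
    return merged

for row in merge_subject_matter(shape_rows, subject_rows):
    print(row)
```

A left join that preserves row order is used here on purpose. An area with no matching subject matter record keeps its shape and receives empty values, which usually signals a geocode definition or vintage mismatch worth investigating.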
Join one of the upcoming Web sessions on Using Shapefiles, where applications that illustrate the use of geocodes are reviewed.