Additional Detail About EPIC's Linked Homicide Data

California Department of Health Services
Epidemiology and Prevention for Injury Control (EPIC) Branch
Violent Injury Surveillance Program
Linked Homicide File, 1990-1999

Updated August 2, 2002

Acknowledgements

The Violent Injury Surveillance Program (VISP) is funded in part by a grant from The California Wellness Foundation (TCWF). Created in 1992 as an independent, private foundation, TCWF’s mission is to improve the health of the people of California by making grants for health promotion, wellness education and disease prevention programs.

VISP is also funded in part by The California Endowment, established in May 1996 with the mission to expand access to affordable quality health care for under-served individuals and communities and to promote fundamental improvements in the health status of all Californians.

Overview

The California Department of Health Services (DHS), EPIC Branch has completed the Linked Homicide File for 1990-1999. This file contains information on victims and circumstances of the 34,542 homicides investigated by law enforcement agencies in California. This information comes from the Department of Justice (DOJ) Homicide File. Unlike the DOJ Homicide File, the Linked Homicide File also contains information from the DHS Vital Statistics Death Record File for 32,122 of the 34,542 records (93.0%). The two files were linked using information common to both data sets (victim’s name and sex, date of injury and death, county of homicide and death).

By linking these two files, we combine the strengths of law enforcement reporting and medical reporting in one data set. The DOJ Homicide File contains more information on the suspect and incident, whereas the Death Record File contains more information on the victim. The Linked Homicide File is intended for researchers to study homicide and provide evidence and support for the development of strategies for reducing this problem in California. 

The Criminal Justice Statistics Center of the DOJ provided the electronic Homicide File and the DHS provided the death records on a death statistical master file. The Homicide File contains information from Supplemental Homicide Reports, which are received monthly by DOJ from all local California law enforcement agencies as part of the National Uniform Crime Reporting system. Additional information in the Homicide File is obtained from local agency crime reports and newspaper clippings of homicides occurring in California. DHS receives death records from the Medical Examiner or Coroner of each county after they conduct an investigation of the death. 

General Approach to Linking the Two Files

We used the software Integrity® (formerly known as AUTOMATCH®) to perform the linkage. The linkage process consisted of the following steps: file preparation, selection of blocking variables, selection of matching variables, frequency analysis, linkage, clerical review, and file extraction. Integrity® is a probabilistic linkage program that uses the selected variables to calculate one score for each pair of records. Each variable used in the linkage process is assigned a weight. This weight takes into account the reliability of the variable (the conditional probability that the variable agrees given that the pair of records is a matched pair) and the probability of a random agreement for this variable. The probability of a chance agreement is computed by the program using a frequency analysis of both data sets for each variable, except social security number. If a variable is effective at discriminating between matched and unmatched pairs, the agreement weight for the variable will be large and positive, whereas the disagreement weight will be large and negative.
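The weighting idea described above can be illustrated with the standard Fellegi-Sunter log-ratio form used in many probabilistic linkage tools. This is a minimal sketch only: the source does not state the exact formulas Integrity® applies, and the probabilities in EXAMPLE_PROBS are invented for illustration. Here m is the probability that a variable agrees on a truly matched pair (its reliability) and u is the probability that it agrees by chance.

```python
import math

def agreement_weight(m, u):
    """Weight added when a variable agrees (positive when m > u)."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight added when a variable disagrees (negative when m > u)."""
    return math.log2((1 - m) / (1 - u))

def pair_score(comparisons, probs):
    """Sum the per-variable weights into one score for a record pair.

    comparisons maps variable name -> True (agrees) or False (disagrees).
    probs maps variable name -> (m, u).
    """
    total = 0.0
    for var, agrees in comparisons.items():
        m, u = probs[var]
        if agrees:
            total += agreement_weight(m, u)
        else:
            total += disagreement_weight(m, u)
    return total

# Hypothetical (m, u) values, not taken from the actual linkage.
EXAMPLE_PROBS = {
    "last_name": (0.95, 0.01),  # rarely agrees by chance: discriminates well
    "sex": (0.99, 0.50),        # agrees by chance half the time: weak evidence
}
```

With these illustrative values, agreement on last name contributes far more to the score than agreement on sex, because a chance agreement on last name is much less likely.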

During file preparation, we standardized names for cases where nicknames had been used. For example, Tim was changed to Timothy. We then transformed names using NYSIIS (New York State Identification and Intelligence System) codes. All fields for NYSIIS names have a fixed length of eight characters. 
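A minimal sketch of the name preparation step follows. Only the Tim to Timothy example comes from the text; the other nickname entries, the upper-casing, and the space padding are assumptions made for illustration. The NYSIIS encoding itself is not implemented here; fixed_width only shows how an already encoded name could be forced to eight characters.

```python
# Hypothetical nickname table; only TIM -> TIMOTHY is taken from the text.
NICKNAMES = {
    "TIM": "TIMOTHY",
    "BILL": "WILLIAM",
    "BOB": "ROBERT",
}

def standardize_first_name(name):
    """Trim, upper-case, and replace a known nickname with its full form."""
    cleaned = name.strip().upper()
    return NICKNAMES.get(cleaned, cleaned)

def fixed_width(code, width=8):
    """Truncate or space-pad a name code to a fixed length (assumed padding)."""
    return code[:width].ljust(width)
```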

After the blocking and matching strategy was determined, we performed a frequency analysis to estimate the probabilities of random agreement for each variable. Then we performed the linkage, followed by a clerical review of possible matches. After the clerical review, we processed the next blocking pass and then performed another clerical review. We continued this until all passes were complete. 

Details of the Linkage Process

The match we performed was a many-to-one match. We treated each record on the homicide file independently and allowed it to match to any record on the vital statistics file. This means that more than one record from the homicide file could match to the same record on the vital statistics file. We used this strategy because many of the names were similar, especially among Hispanics, and because it meant that a record was not excluded from further passes if its first match was incorrect.

The variables we used in the linkage process were social security number, last name, first name, middle name, sex, age, date of homicide, date of injury, date of death, and county. We used five blocking passes to narrow down the very large number of potential record pairs. If a pair of records met the blocking criteria, Integrity® assigned a weight to the pair according to the matching variables used in the pass. The five passes were:

  • Pass 1: Social Security Number and Sex
  • Pass 2: NYSIIS of last name and NYSIIS of first name
  • Pass 3: Year and month of incident (Homicide file) and Year and month of death (Vital Statistics file) and Sex
  • Pass 4: Year and month of incident (Homicide file) and Year and month of injury (Vital Statistics file) and Sex
  • Pass 5: NYSIIS of last name with NYSIIS of first name, NYSIIS of first name with NYSIIS of middle name, County of homicide and County of death
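Blocking of this kind can be sketched as follows: only record pairs that agree on a pass's blocking key are scored. This is an illustration, not the Integrity® implementation. The field names (nysiis_last, incident_year, and so on) are hypothetical, only Passes 2 and 3 are shown, and records with a missing key component are simply skipped, which is an assumption.

```python
from collections import defaultdict

def candidate_pairs(homicides, deaths, hom_key, death_key):
    """Yield (homicide, death) pairs whose blocking keys agree."""
    index = defaultdict(list)
    for d in deaths:
        key = death_key(d)
        if None not in key:          # skip records missing a key component
            index[key].append(d)
    for h in homicides:
        key = hom_key(h)
        if None not in key:
            for d in index.get(key, []):
                yield h, d

# Pass 2: NYSIIS of last name and NYSIIS of first name (same key on both files).
def pass2_key(rec):
    return (rec.get("nysiis_last"), rec.get("nysiis_first"))

# Pass 3: year and month of incident (Homicide file) against
# year and month of death (Vital Statistics file), plus sex.
def pass3_hom_key(rec):
    return (rec.get("incident_year"), rec.get("incident_month"), rec.get("sex"))

def pass3_death_key(rec):
    return (rec.get("death_year"), rec.get("death_month"), rec.get("sex"))
```

Because each homicide record is looked up independently, the same death record can be paired with more than one homicide record, which is consistent with the many-to-one approach described above.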

We then examined further the matches determined by Integrity®. Matches had to fall under one of the following criteria:

  1. Exact match on social security number AND 2 out of 3 names match
  2. Two out of 3 names AND date of injury/death
  3. Two out of 3 names AND (date of injury/death within 10 days) or (county codes match or are contiguous and age within 10 years)
  4. Exact match on date of injury/death AND first or last name match AND (county codes match or are contiguous or age within 5 years)

If the day in any date variable was unknown but the month and year matched those of another date, we considered the two dates to be within 10 days of each other. We used two out of three names as a criterion instead of a match on first and last name because on many records the names were in the wrong order.
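The acceptance criteria and the unknown-day rule can be expressed as a small check. This is a sketch under stated assumptions: the Comparison fields are hypothetical names, criterion 2 is read as an exact date match, and criterion 3 is read as two names AND (dates within 10 days OR (county match or contiguous AND age within 10 years)). The original wording of criterion 3 is ambiguous, so this grouping is one interpretation, not a confirmed rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# (year, month, day); day is None when unknown.
PartialDate = Tuple[int, int, Optional[int]]

def within_10_days(a: PartialDate, b: PartialDate) -> bool:
    """Apply the unknown-day rule, else compare full dates."""
    if a[2] is None or b[2] is None:
        # Unknown day: treated as within 10 days if year and month agree.
        return a[:2] == b[:2]
    return abs((date(*a) - date(*b)).days) <= 10

@dataclass
class Comparison:
    ssn_exact: bool
    names_matched: int           # how many of the 3 names agree (0 to 3)
    first_or_last_matched: bool
    date_exact: bool             # date of injury/death agrees exactly
    date_within_10: bool         # see within_10_days
    county_ok: bool              # county codes match or are contiguous
    age_diff: int                # absolute difference in years

def criteria_met(c: Comparison) -> List[int]:
    """Return the numbers of the criteria (1 to 4) that the pair satisfies."""
    two_names = c.names_matched >= 2
    rules = {
        1: c.ssn_exact and two_names,
        2: two_names and c.date_exact,
        3: two_names and (c.date_within_10 or (c.county_ok and c.age_diff <= 10)),
        4: c.date_exact and c.first_or_last_matched
           and (c.county_ok or c.age_diff <= 5),
    }
    return [n for n, ok in rules.items() if ok]
```

A pair would be kept as a match when criteria_met returns a non-empty list.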

Results of the Linkage

Since our goal was to link a death record to as many Homicide File records as possible, we did not limit the Death Record File to homicides; we used all 170,011 injury deaths from 1990-1999. Of the 34,629 records in the original Homicide File, we matched 32,236, and a further 161 records were possible matches requiring additional review. Upon review, we decided not to count these possible matches as matches, to preserve the integrity of the matches and simplify the process for future replication.

We found 69 death records that matched with two Homicide File cases. We reviewed these 138 cases to determine which of each pair was more likely to be an actual match. After deleting death record data from the 69 duplicates we had 32,167 matches.
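Because the matching was many-to-one, death records linked to more than one homicide record have to be identified before they can be reviewed. The sketch below shows one way to flag them, assuming the accepted links are available as (homicide_id, death_id) pairs; the identifiers are hypothetical. It only lists the candidates and does not decide which link is the actual match.

```python
from collections import defaultdict

def death_records_needing_review(links):
    """Return {death_id: [homicide_ids]} for death records linked more than once.

    links is an iterable of (homicide_id, death_id) pairs.
    """
    by_death = defaultdict(list)
    for hom_id, death_id in links:
        by_death[death_id].append(hom_id)
    return {d: homs for d, homs in by_death.items() if len(homs) > 1}
```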

We then deleted 45 cases that had a death year before 1990 and for which no matching death record showed a death year of 1990 or later. Four of these cases had been matched to a death record with a death year before 1990. That left us with 34,584 total cases and 32,163 matches.

Later we realized the file contained cases not considered "actual homicides". We deleted these 42 cases (41 of which had been linked with death records) of "Death in Custody - not reported on SHR". This brought the new total of cases in the Linked Homicide File to 34,542 with 32,122 matches, for a matching rate of 93 percent. The documentation, download file, and online dataset were all updated on August 2, 2002.
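The counts reported in this section can be reconciled step by step. The short calculation below uses only figures stated in the text and confirms the final totals and the matching rate.

```python
def reconcile():
    """Replay the reported adjustments to case and match counts."""
    cases, matches = 34_629, 32_236   # original Homicide File, initial matches

    matches -= 69                     # duplicate links removed -> 32,167

    cases -= 45                       # death year before 1990 -> 34,584 cases
    matches -= 4                      # 4 of those were matched -> 32,163

    cases -= 42                       # "Death in Custody" cases -> 34,542 cases
    matches -= 41                     # 41 of those were matched -> 32,122

    return cases, matches, matches / cases

cases, matches, rate = reconcile()
print(f"{cases:,} cases, {matches:,} matches, rate {rate:.1%}")
```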

Using the Data Set

The data are provided in a SAS data set readable by version 7 or later. Also included are a list of Frequently Asked Questions and a codebook. Take time to review the codebook closely. Some variables are not available for every year. For instance, “domestic violence” as a precipitating event was introduced in 1992, so for 1990 and 1991 other codes would have been used as appropriate. “Drive by shooting” was introduced in 1996. “Gang member” was introduced as a relationship in 1992.
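When analyzing trends, it can help to check whether a code existed in a given year before interpreting a count of zero. The sketch below records the three introduction years mentioned above; the lookup structure and the assumption that any code not listed was available from 1990 are illustrative only, and the codebook remains the authority.

```python
FIRST_YEAR_OF_FILE = 1990

# Introduction years stated in the text; labels are descriptive, not actual code values.
INTRODUCED = {
    "domestic violence": 1992,   # precipitating event
    "drive by shooting": 1996,
    "gang member": 1992,         # relationship
}

def code_available(label, year):
    """True if the code could have been recorded in the given year."""
    return year >= INTRODUCED.get(label, FIRST_YEAR_OF_FILE)
```

For example, code_available("drive by shooting", 1995) is False, so the absence of such cases in 1995 reflects coding practice rather than the absence of incidents.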

Another reason to familiarize yourself with the codebook is that certain fields may overlap in the information they provide. For example, when studying intimate partner violence, do not look only at cases that are coded with “domestic violence” as the precipitating event (PrecipEvent1, PrecipEvent2, PrecipEvent3). Some cases coded as domestic violence involve people who may live together but are not intimate partners (e.g., child kills parent). Use the relationship variables (Relation1, Relation2, Relation3, Relation4) to select the relationships you wish to define as intimate partner violence. Similarly, when studying child abuse, do not rely on either the precipitating event field or the relationship field alone, as separately they may not capture exactly what you wish to study. These are just a couple of the many nuances you may discover in this file. Please alert us to information you think would be useful to other users or questions you think should be added to our list of Frequently Asked Questions.
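The selection described above can be sketched as a filter on the four relationship variables. The variable names Relation1 through Relation4 and PrecipEvent1 come from the text, but the code values shown here (SPOUSE, PARENT, and so on) are hypothetical placeholders; the actual values must be taken from the codebook.

```python
RELATION_FIELDS = ("Relation1", "Relation2", "Relation3", "Relation4")

# Hypothetical values standing in for the relationships an analyst
# chooses to define as intimate partners.
INTIMATE_PARTNER_CODES = {"SPOUSE", "EX_SPOUSE", "BOYFRIEND_GIRLFRIEND"}

def is_intimate_partner_case(record):
    """True if any relationship variable holds a chosen intimate partner code."""
    return any(record.get(f) in INTIMATE_PARTNER_CODES for f in RELATION_FIELDS)

def select_intimate_partner_cases(records):
    """Filter on relationship, ignoring the precipitating event fields."""
    return [r for r in records if is_intimate_partner_case(r)]
```

A case coded with domestic violence as the precipitating event but with a parent relationship would not be selected, while a case with a spouse relationship would be selected regardless of its precipitating event.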

Suggested Citation:

California Department of Health Services, Epidemiology and Prevention for Injury Control (EPIC) Branch, Violent Injury Surveillance Program. Linked Homicide File, 1990-1999. October 2001.

