California Department of Health Services
Epidemiology and Prevention for Injury Control (EPIC) Branch
Violent Injury Surveillance Program
Linked Homicide File, 1990-1999
Updated August 2, 2002
Acknowledgements
The Violent Injury Surveillance Program (VISP) is funded in part by a grant from The California Wellness Foundation
(TCWF). Created in 1992 as an independent, private foundation, TCWF’s mission is to improve the health of the people
of California by making grants for health promotion, wellness education and disease prevention programs.
VISP is also funded in part by The California Endowment, established in May 1996 with the mission to expand access to
affordable quality health care for under-served individuals and communities and to promote fundamental improvements in
the health status of all Californians.
Overview
The California Department of Health Services (DHS), EPIC Branch has completed the Linked Homicide File for 1990-1999.
This file contains information on victims and circumstances of the 34,542 homicides investigated by law enforcement
agencies in California. This information comes from the Department of Justice (DOJ) Homicide File. What distinguishes
the Linked Homicide File from the DOJ Homicide File is that the linked file adds information
from the DHS Vital Statistics Death Record File for 32,122 of the 34,542 (93.0%) records. The two files were linked using
information common to both data sets (victim’s name and sex, date of injury and death, county of homicide and death).
By linking these two files, we combine the strengths of law enforcement reporting and medical reporting in one data
set. The DOJ Homicide File contains more information on the suspect and incident, whereas the Death Record File contains
more information on the victim. The Linked Homicide File is intended for researchers to study homicide and provide
evidence and support for the development of strategies for reducing this problem in California.
The Criminal Justice Statistics Center of the DOJ provided the electronic Homicide File, and DHS provided the
death records in a death statistical master file. The Homicide File contains information from Supplemental Homicide
Reports, which are received monthly by DOJ from all local California law enforcement agencies as part of the National
Uniform Crime Reporting system. Additional information in the Homicide File is obtained from local agency crime reports
and newspaper clippings of homicides occurring in California. DHS receives death records from the Medical Examiner or
Coroner of each county after they conduct an investigation of the death.
General Approach to Linking the Two Files
We used the software Integrity® (formerly known as AUTOMATCH®) to perform the
linkage. The linkage process consisted of the following steps: file preparation, selection of blocking variables,
selection of matching variables, frequency analysis, linkage, clerical review, and file extraction. Integrity® is a probabilistic linkage program that uses the selected variables to
calculate a single score for each pair of records. Each variable used in the linkage process is assigned a weight. This
weight takes into account the reliability of the variable (the conditional probability that the variable agrees,
given that the pair of records is a true match) and the probability of a random agreement on this variable.
The probability of a chance agreement of variables is computed by the program using a frequency analysis of both data
sets for each variable, except social security number. If a variable is effective at discriminating between matched and
unmatched pairs, the agreement weight for the variable will be large and positive, whereas the disagreement weight will
be large and negative.
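The program's exact formula is not given here, but the behavior described matches the standard Fellegi-Sunter weights, where m is the probability that a variable agrees for a true match and u is the probability of chance agreement. The sketch below assumes that formulation (base-2 logarithms) and uses made-up m and u values; it is an illustration, not Integrity®'s internal implementation.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    """Weight added to a pair's score when the variable agrees: log2(m / u)."""
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    """Weight added when the variable disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))

def pair_score(agrees: dict[str, bool], params: dict[str, tuple[float, float]]) -> float:
    """Sum the per-variable weights into one composite score for a record pair."""
    score = 0.0
    for variable, agreed in agrees.items():
        m, u = params[variable]
        score += agreement_weight(m, u) if agreed else disagreement_weight(m, u)
    return score

# Hypothetical (m, u) values for illustration only; they are not VISP's estimates.
PARAMS = {
    "sex": (0.98, 0.50),
    "first_name": (0.95, 0.01),
    "county": (0.90, 0.05),
}

print(round(agreement_weight(*PARAMS["sex"]), 2))         # small: sex agrees by chance half the time
print(round(agreement_weight(*PARAMS["first_name"]), 2))  # large: first names rarely agree by chance
```

Sex discriminates poorly because half of all random pairs agree on it, so its agreement weight stays near zero, while a first-name agreement adds several points to the score.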
During file preparation, we standardized names for cases where nicknames had been used. For example, Tim was changed
to Timothy. We then transformed names using NYSIIS (New York State Identification and Intelligence System) codes. All
fields for NYSIIS names have a fixed length of eight characters.
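A minimal sketch of the two preparation steps that surround the NYSIIS transformation: replacing a nickname with its formal name, and storing a code in a fixed eight-character field. The nickname table, the function names, and the choice of space padding are hypothetical; the actual lookup table used for this file is not documented, and the NYSIIS encoding itself is not reproduced here.

```python
# Hypothetical nickname table; the table actually used for this file is not documented.
NICKNAMES = {
    "TIM": "TIMOTHY",
    "BOB": "ROBERT",
    "BILL": "WILLIAM",
    "JIM": "JAMES",
}

FIELD_WIDTH = 8  # NYSIIS name fields have a fixed length of eight characters

def standardize_first_name(name: str) -> str:
    """Upper-case and trim a name, then replace a known nickname with its formal name."""
    key = name.strip().upper()
    return NICKNAMES.get(key, key)

def to_fixed_width(code: str, width: int = FIELD_WIDTH) -> str:
    """Truncate or right-pad (with spaces, an assumption) a code to a fixed-width field."""
    return code[:width].ljust(width)

print(standardize_first_name(" Tim "))  # TIMOTHY
```

Standardizing before encoding matters because a nickname and its formal name usually produce different phonetic codes and would otherwise never agree.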
After the blocking and matching strategy was determined, we performed a frequency analysis to estimate the
probabilities of random agreement for each variable. Then we performed the linkage, followed by a clerical review of
possible matches. After the clerical review, we processed the next blocking pass and then performed another clerical
review. We continued this until all passes were complete.
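One common way to turn a frequency analysis into a probability of random agreement is to sum, over every value, the product of that value's relative frequency in each file. The sketch below assumes this estimator; Integrity® may use a different or value-specific method. As noted above, this step was not applied to social security number.

```python
from collections import Counter

def chance_agreement(values_a: list[str], values_b: list[str]) -> float:
    """Probability that a random pair (one record from each file) agrees on a variable.

    u = sum over values v of P_A(v) * P_B(v), using relative frequencies in each file.
    """
    freq_a = Counter(values_a)
    freq_b = Counter(values_b)
    n_a, n_b = len(values_a), len(values_b)
    return sum((count / n_a) * (freq_b[v] / n_b) for v, count in freq_a.items())

print(chance_agreement(["M", "M", "F", "F"], ["M", "F"]))  # 0.5
```

A variable with few, evenly spread values (such as sex) gets a high chance-agreement probability, while a variable with many distinct values (such as a surname code) gets a low one, which is what makes its agreement weight large.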
Details of the Linkage Process
We performed a many-to-one match. We treated each record on the homicide file independently and
allowed it to match any record on the vital statistics file, so more than one homicide file record could match
the same vital statistics record. We used this strategy because many names were similar, especially among Hispanic
victims: if a death record was first matched to the wrong homicide record, it was not excluded from later passes and
could still be matched to the correct one.
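The many-to-one rule can be sketched as follows: each homicide record keeps its best-scoring death record above a cutoff, and a death record is never removed from the pool once used. The score dictionary, record IDs, and threshold are hypothetical stand-ins for the scores Integrity® produces.

```python
def many_to_one(scores: dict[tuple[str, str], float], threshold: float) -> dict[str, str]:
    """Assign each homicide record its best-scoring death record at or above threshold.

    scores maps (homicide_id, death_id) -> composite score. Death records stay
    available after being used, so several homicide records may share one.
    """
    best: dict[str, tuple[float, str]] = {}
    for (hom_id, death_id), score in scores.items():
        if score < threshold:
            continue
        if hom_id not in best or score > best[hom_id][0]:
            best[hom_id] = (score, death_id)
    return {hom_id: death_id for hom_id, (_, death_id) in best.items()}

scores = {("H1", "D1"): 12.0, ("H1", "D2"): 8.0, ("H2", "D1"): 10.0, ("H3", "D3"): 2.0}
print(many_to_one(scores, threshold=5.0))  # {'H1': 'D1', 'H2': 'D1'}
```

Here D1 is assigned to both H1 and H2; duplicates of this kind must be resolved by review after linkage.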
The variables we used in the linkage process were social security number, last name, first name, middle name, sex,
age, date of homicide, date of injury, date of death, and county. Comparing every Homicide File record with every
death record would mean roughly 5.9 billion pairs (34,629 × 170,011), so we used five blocking passes to limit
the comparisons to plausible candidates. If a pair of records met the blocking criteria for a pass, Integrity®
assigned a weight to the pair according to the matching variables used in that pass. The five passes were:
- Pass 1: Social Security Number and Sex
- Pass 2: NYSIIS of last name and NYSIIS of first name
- Pass 3: Year and month of incident (Homicide file) and Year and month of death (Vital Statistics file) and Sex
- Pass 4: Year and month of incident (Homicide file) and Year and month of injury (Vital Statistics file) and Sex
- Pass 5: NYSIIS of last name with NYSIIS of first name, NYSIIS of first name with NYSIIS of middle name, County of homicide and County of death
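A blocking pass only compares records that share an exact key, which can be sketched with a dictionary index. The field names (ssn, nysiis_last, incident_ym, and so on) are hypothetical, skipping blank keys is an assumption, and only passes 1 to 4 are shown; pass 5's cross-field name comparisons are omitted.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, str]
KeyFn = Callable[[Record], tuple]

def candidate_pairs(homicides: list[Record], deaths: list[Record],
                    key_h: KeyFn, key_d: KeyFn) -> set[tuple[str, str]]:
    """Return (homicide id, death id) pairs whose blocking keys are identical.

    Keys with a blank component (e.g. a missing SSN) are skipped so that
    blank values do not block together.
    """
    index: dict[tuple, list[str]] = defaultdict(list)
    for d in deaths:
        key = key_d(d)
        if all(key):
            index[key].append(d["id"])
    pairs = set()
    for h in homicides:
        key = key_h(h)
        if all(key):
            for death_id in index.get(key, []):
                pairs.add((h["id"], death_id))
    return pairs

# Passes 1-4 as (name, homicide-file key, death-file key); field names are hypothetical.
PASSES: list[tuple[str, KeyFn, KeyFn]] = [
    ("SSN + sex", lambda h: (h["ssn"], h["sex"]), lambda d: (d["ssn"], d["sex"])),
    ("NYSIIS last + first",
     lambda h: (h["nysiis_last"], h["nysiis_first"]),
     lambda d: (d["nysiis_last"], d["nysiis_first"])),
    ("incident vs death month + sex",
     lambda h: (h["incident_ym"], h["sex"]), lambda d: (d["death_ym"], d["sex"])),
    ("incident vs injury month + sex",
     lambda h: (h["incident_ym"], h["sex"]), lambda d: (d["injury_ym"], d["sex"])),
]
```

Running each pass in turn and scoring only its candidate pairs keeps the workload proportional to the number of records sharing a key rather than to the product of the file sizes.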
We then further examined the matches identified by Integrity®. To be accepted, a match had to meet at least one of
the following criteria:
- Exact match on social security number AND 2 out of 3 names match
- Two out of 3 names AND date of injury/death
- Two out of 3 names AND ((date of injury/death within 10 days) OR (county codes match or are contiguous AND age
within 10 years))
- Exact match on date of injury/death AND first or last name match AND (county codes match or are contiguous or
age within 5 years)
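The acceptance criteria can be written as a single boolean function over precomputed comparisons. The PairEvidence fields are hypothetical names for those comparisons, "within N years" is assumed to be inclusive, and the grouping of the third criterion (names AND (date within 10 days OR (county AND age))) is one reading of the rule.

```python
from dataclasses import dataclass

@dataclass
class PairEvidence:
    ssn_exact: bool       # social security numbers are identical
    names_agreeing: int   # how many of first, middle, and last name agree (0-3)
    first_or_last: bool   # first name or last name agrees
    date_exact: bool      # date of injury/death is identical
    date_within_10: bool  # date of injury/death is within 10 days
    county_ok: bool       # county codes match or are contiguous
    age_diff: int         # absolute age difference, in years

def accept(e: PairEvidence) -> bool:
    """True if the pair meets at least one of the four acceptance criteria."""
    two_names = e.names_agreeing >= 2
    return (
        (e.ssn_exact and two_names)
        or (two_names and e.date_exact)
        or (two_names and (e.date_within_10 or (e.county_ok and e.age_diff <= 10)))
        or (e.date_exact and e.first_or_last and (e.county_ok or e.age_diff <= 5))
    )
```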
If the day of a date was unknown but its month and year matched those of the date being compared, we considered the
two dates to be within 10 days of each other. We used two out of three names as a criterion instead of requiring a
match on first and last name because on many records the names were in the wrong order.
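The unknown-day rule can be sketched as a date comparison that takes an optional day. The function name and signature are hypothetical, and returning False when a day is unknown and the months differ is an assumption the text does not address.

```python
from datetime import date
from typing import Optional

def dates_within_10_days(y1: int, m1: int, d1: Optional[int],
                         y2: int, m2: int, d2: Optional[int]) -> bool:
    """Compare two dates whose day may be unknown (None).

    If either day is unknown, the dates count as within 10 days when year and
    month agree; otherwise (an assumption) they do not.
    """
    if d1 is None or d2 is None:
        return (y1, m1) == (y2, m2)
    return abs((date(y1, m1, d1) - date(y2, m2, d2)).days) <= 10

print(dates_within_10_days(1995, 3, None, 1995, 3, 28))  # True
```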
Results of the Linkage
Since our goal was to link a death record to as many Homicide File records as possible, we did not limit the Death
Record File to homicides; we used all 170,011 injury deaths from 1990-1999. Of the 34,629 records in the
original Homicide File, we matched 32,236, and another 161 records were possible matches requiring further review. Upon
review, we excluded these 161 possible matches to preserve the integrity of the matches and simplify the process for
future replication.
We found 69 death records that had each matched two Homicide File cases. We reviewed these 138 cases to determine which
case in each pair was more likely to be the actual match. After removing the death record data from the less likely
case in each of the 69 pairs, we had 32,167 matches.
We then deleted 45 cases that had a death year before 1990 that was not contradicted by a matching death record
showing a death year of 1990 or later. Four of these 45 cases had been matched to a death record with a death year
before 1990. This left 34,584 total cases and 32,163 matches.
We later found that the file contained 42 cases coded "Death in Custody - not reported on SHR," which are not
considered actual homicides. We deleted these 42 cases, 41 of which had been linked with death records. This
brought the total number of cases in the Linked Homicide File to 34,542, with 32,122 matches, for a matching rate of 93
percent. The documentation, download file, and online dataset were all updated on August 2, 2002.
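The case and match counts above can be reconciled step by step. This is only a check of the published arithmetic; the 161 excluded possible matches were never part of the 32,236 matches, so they do not appear.

```python
original_cases = 34_629
initial_matches = 32_236

# 69 death records had each matched two cases; the less likely link in each pair was removed.
matches = initial_matches - 69   # 32,167
# 45 cases with a pre-1990 death year were deleted; 4 of them had been matched.
cases = original_cases - 45      # 34,584
matches -= 4                     # 32,163
# 42 "Death in Custody - not reported on SHR" cases were deleted; 41 had been matched.
cases -= 42                      # 34,542
matches -= 41                    # 32,122

print(cases, matches, f"{matches / cases:.1%}")  # 34542 32122 93.0%
```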
Using the Data Set
The data are provided in a SAS data set readable by SAS version 7 or later. A list of Frequently Asked Questions and a
codebook are also included. Review the codebook closely: some variables are not available for every year.
For instance, “domestic violence” as a precipitating event was introduced in 1992, so for 1990 and 1991 other codes
would have been used as appropriate. “Drive by shooting” was introduced in 1996. “Gang member” was introduced as
a relationship in 1992.
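When computing trends for a code that was introduced partway through the decade, one option is to restrict the analysis to the years in which the code existed. The sketch below records only the introduction years stated above; the labels and the function name are hypothetical, and the actual code values and any other late-introduced codes must be taken from the codebook.

```python
# Introduction years stated in this documentation; labels are descriptive, not codebook values.
FIRST_YEAR = {
    "domestic violence (precipitating event)": 1992,
    "drive-by shooting": 1996,
    "gang member (relationship)": 1992,
}

def comparable_years(code: str, start: int = 1990, end: int = 1999) -> range:
    """Years in the file during which a code could have been recorded."""
    return range(max(start, FIRST_YEAR.get(code, start)), end + 1)

print(list(comparable_years("drive-by shooting")))  # [1996, 1997, 1998, 1999]
```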
Another reason to familiarize yourself with the codebook is that certain fields may overlap in the information they
provide. For example, when studying intimate partner violence, do not look only at cases that are coded with “domestic
violence” as the precipitating event (PrecipEvent1, PrecipEvent2, PrecipEvent3). Some cases coded as domestic violence
involve people who may live together, but are not intimate partners (e.g., child kills parent). Use the relationship
variables (Relation1, Relation2, Relation3, Relation4) to select the relationships you wish to define as intimate
partner violence. Similarly, when studying child abuse, do not rely on either the precipitating event field or the
relationship field alone, because neither may capture exactly what you wish to study on its own. These are just a
couple of the many nuances you may discover in this file. Please alert us to information you think would be useful to
other users or to questions you think should be added to our list of Frequently Asked Questions.
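Selecting intimate partner cases from the relationship variables rather than from the precipitating event can be sketched as a check across all four Relation fields. The relationship labels in the set are hypothetical placeholders for the codebook's actual values, and records are represented as plain dictionaries for illustration.

```python
# Hypothetical relationship labels; substitute the values defined in the codebook.
INTIMATE_PARTNER = {"HUSBAND", "WIFE", "COMMON-LAW HUSBAND", "COMMON-LAW WIFE",
                    "BOYFRIEND", "GIRLFRIEND"}
RELATION_FIELDS = ("Relation1", "Relation2", "Relation3", "Relation4")

def is_intimate_partner_case(record: dict[str, str]) -> bool:
    """True if any of the four relationship fields holds an intimate-partner value."""
    return any(record.get(field) in INTIMATE_PARTNER for field in RELATION_FIELDS)

# Coded as domestic violence, but the relationship is child-parent: not selected.
print(is_intimate_partner_case({"PrecipEvent1": "DOMESTIC VIOLENCE", "Relation1": "SON"}))  # False
```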
Suggested Citation:
California Department of Health Services, Epidemiology and Prevention for Injury Control (EPIC) Branch, Violent
Injury Surveillance Program. Linked Homicide File, 1990-1999. October 2001.