Imputation of Area-Level Covariates by Registry Linking

J. Sunil Rao, Jie Fan

Research output: Chapter in Book/Report/Conference proceeding › Chapter


Social epidemiological research has long studied the impact of social determinants of place on health outcomes. Geocoding is a well-known technique for obtaining such information: a geographical location is mapped to a census tract, and the relevant covariates are then extracted from tract-level databases. However, location information is sometimes unavailable, as is often the case with many of today's public databases (e.g., genomic data repositories). For some diseases, such as cancer, statewide registries exist that make it possible to build a linking model between analysis observations and a reference sample drawn from the registry, using variables common to both. We detail this methodology and then show how to use the linking model together with classified mixed model prediction to impute area-level covariates for analysis observations. We study empirical performance via a series of simulations and then perform predictive geocoding on colon cancer patients drawn (both analysis and reference samples) from the Florida Cancer Data Systems registry.

Original language: English (US)
Title of host publication: Handbook of Statistics
Publisher: Elsevier B.V.
Number of pages: 19
State: Published - 2017

Publication series

Name: Handbook of Statistics
ISSN (Print): 0169-7161


Keywords

  • Area-level covariates
  • Cancer registries
  • Census tracts
  • Geocoding
  • Imputation
  • Mixed models
  • Prediction
  • Spatial data

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Applied Mathematics

