Abstract:
In this paper, we show that combining ecological data with subsample data, in settings where a generalized linear model (GLM) is appropriate, provides two main benefits. First, including the individual-level subsample data can eliminate the biases associated with ecological inference in GLMs. Second, the available ecological data can be used to design optimal subsampling schemes that maximize information about the parameters. We present an application of this methodology to voter turnout studies, showing that small, optimally chosen subsamples combined with ecological data yield more precise estimates than a simple random subsample, and we discuss possible applications in epidemiology.