Large Datasets and You: A Field Guide


Blackwell, Matthew, and Maya Sen. 2012. “Large Datasets and You: A Field Guide.” The Political Methodologist 20 (1): 2-5.


The last five years have seen an explosion in the amount of data available to social scientists. Although a blessing, these extremely large data sources can cause problems for political scientists who work with standard statistical software, which is poorly suited to analyzing big datasets. In this essay, we describe a few approaches to handling extremely large datasets with the R programming language, both at the command line before the data reach R and within R itself. We show that handling large datasets comes down to either (1) choosing tools that can shrink the problem or (2) fine-tuning R to handle massive data files.

Last updated on 12/04/2012