Some links to useful datasets and miscellaneous items. I have not updated these materials in recent years.


Data on China

Basic situation


Lohmar, Zhang and Gale (2004) provide a short introduction to household data in China. As for large-scale survey work, data on agricultural production has been collected since 1949, but that data is scant and prone to bias. Since 1977 the National Bureau of Statistics has organized an annual survey of 68,000 households (with a 3-year rotation). The Ministry of Agriculture has surveyed 21,000 households every year as a panel since 1986.

Lohmar, B., Zhang, L. and Gale, F. (2004) "New Opportunities for Economic Analysis with Rural Household Data in China" (accessed February 5, 2006)


Data links


China Data Center at the University of Michigan

  • GIS data on various levels, statistical yearbooks, some historical data 

China Survey Data Network

  • Joint effort by CCER at Peking University and the China Data Center at the University of Michigan
  • Hosts a number of useful surveys

China Dimensions at the Socioeconomic Data and Applications Center (SEDAC)

  • Mainly GIS-coded data down to the county level
  • Generally data from the early 1990s, though some longer series are available
  • Holds basic maps, and some on economics and public health

China Health and Family Life Survey (CHFLS) at the University of Chicago

  • Survey on sexual behaviors in China 1999-2000
  • Covers 5,000 individuals in 83 communities across 18 provinces

China Health and Nutrition Survey (CHNS) at the University of North Carolina at Chapel Hill

  • Includes panel data from waves in 1989, 1991, 1993, 1997, 2000, and 2004 (2006 is funded at this point)
  • One of the most widely used surveys for micro-level research on China
  • Still ongoing; revisions and new data are posted occasionally

China Population Information and Research Center (CPIRC)

  • List of various data sources starting in 1982, mainly census data and supplements
  • Data needs to be purchased

Gansu Human Capital Project at Penn

  • Links to several recent or ongoing surveys: Gansu Survey of Children and Families, Rural School networks, and Human Capital in China’s Urban Labor Market

World Bank China LSMS

  • Basic LSMS for Hebei and Liaoning provinces 1996-1997
  • 780 households in 31 villages

Data on Germany

There is plenty of microdata on Germany, but much of it is limited to particular applications. The best sources I know of are the GSOEP and the CAMPUS files. There are moves to make more data available as public-use or scientific-use files; some datasets can be used on-site at the national and state statistical offices. Let me know if you have additions.


  • German Socio-Economic Panel (GSOEP)
    • Main longitudinal socio-economic survey, has a comprehensive English user guide
    • Started in 1984 and now covers 12,000 households; nationally representative
    • Includes information on household consumption, occupations, employment, earnings, health and satisfaction
    • (Information on health and insurance status is limited)
  • Luxembourg Income Study (LIS)
    • Combines national surveys on income, wealth and employment that can be used for cross-section studies (i.e. these are not panels)
    • Covers 30 countries on 4 continents. LIS harmonizes and standardizes the data
    • Registered users can send in programs to use the data remotely
  • Statistische Aemter des Bundes und der Laender
    • Wide range of microdata from the official data collection agency, available as public-use files or scientific-use files and on-site use
    • The CAMPUS data (a 3.5% sample of the 1998 microcensus) is directly available for download. You can also get a microcensus panel for 1996-1999 as a scientific-use file or for on-site analysis
    • Some of the data has information on health; there is also good information on hospitals         

German health system

Germany’s health insurance serves as a model for social health insurance projects worldwide. In fact, as an institution, social health insurance has been surprisingly successful and resilient over time; it celebrated its 125th birthday in 2008. The system has its problems, and since the early 1990s a wave of progressively more fundamental reforms has been enacted. A less well-known fact is that Germany also has a private health insurance sector, which covers about 10 percent of the population. The private system differs fundamentally from the social health insurance in aspects ranging from financing to quality, and mostly covers civil servants and high-income earners. That system was also affected by the recent round of reforms.

Up-to-date literature on the German health care system is somewhat rare, especially with regard to details. The recent reforms have changed fundamental elements of the system, and so far I have not found a good summary of them all, in either German or English. Below is some useful background; comments are welcome.



Recent reforms




Background on the system




Empirical research



Household datasets for development economic research

This list contains links to micro and macro-level data, mainly related to economics. I have not updated this list for a while.


Micro-level data


  • Demographic and Health Surveys (DHS)
    • Probably the best source for demographic and health data; comparable surveys for some 70 countries. Very good coverage for Africa
    • Survey data on request; summary statistics and good documentation openly accessible on website
    • Some countries have specialized surveys on HIV, gender etc. Some newer ones also have GIS coordinates
  • RAND Family Life Surveys
    • In-depth surveys on Malaysia, Indonesia, Guatemala and Matlab (a region of Bangladesh)
    • Also holds other household, firm and national economic surveys for Indonesia
  • International Food Policy Research Institute (IFPRI)
    • Mainly household and community-level surveys related to food security and production
    • Also has some institutional data, social accounting matrices and geospatial data
    • Need to request data through an online form


Not-only-economic data



Macro-level data


  • World Bank PovcalNet
    • Interactive tool to calculate your own poverty lines and compare them across countries
  • United Nations Common Database (UNCDB)
    • UN statistical database with more than 300 statistical series from the UN and other international organizations
    • The link is to an access point for Harvard affiliates


Other sources


  • DPLS Internet Crossroads
    • Data repository at the University of Wisconsin-Madison
    • Long and uncategorized list of links includes statistical offices of several smaller countries and organizations
  • HU-MIT Virtual Data Centre System
    • Great source for social sciences (including and beyond economics), keyword searchable
    • Holds the World Value Surveys, among other things
    • You need to log in at the top-right corner with your Harvard ID

Intestinal worms

Intestinal worms are very common in developing countries, particularly among children. They account for one of the largest disease burdens in those countries, although they are cheap and effective to treat.

Here are some links on the issue. My Master's thesis, "Scaling-Up and Mainstreaming: Increasing Impact of International Child Supports (ICS) De-Worming Program in Western Province, Kenya", is available on request. I also have an extensive bibliography, which was last updated in March 2005.

Articles on de-worming


  • Brooker, S. (2006). "Spatial Epidemiology of Human Schistosomiasis in Africa: Risk Models, Transmission Dynamics and Control." Transactions of the Royal Society of Tropical Medicine and Hygiene 101(1): 1-8

    • Great paper on spatial epidemiology and mapping; also has a map of schistosomiasis intensities in East Africa

  • Fenwick, A. (2006). "New Initiatives against Africa's Worms." Transactions of the Royal Society of Tropical Medicine and Hygiene 100(3): 200-207.

    • Policy paper on deworming, with a focus on the Schistosomiasis Control Initiative

  • Kabatereine, N. B., E. Tukahebwa, et al. (2006). "Progress Towards Countrywide Control of Schistosomiasis and Soil-Transmitted Helminthiasis in Uganda." Transactions of the Royal Society of Tropical Medicine and Hygiene 100(3): 208-215.

    • Details on the Ugandan national deworming program

  • Wang, L., Utzinger, J. and Zhou, X.N. (2008). "Schistosomiasis Control: Experiences and Lessons from China." The Lancet 372(9652): 1793-1795.

    • Worms in China


Information on de-worming and Western Kenya


  • Other materials on worms

    • All about schistosomiasis and helminth infections at the Schistosomiasis Control Initiative (SCI). Check out their publications list, which includes some important papers on implementation and policy approaches. SCI is involved in the successful Uganda program.

    • A flyer linking de-worming to the Millennium Development Goals

    • Relation of school-aged to community-level infections: Guyatt, H. L., S. Brooker, et al. (1999). "Can prevalence of infection in school-aged children be used as an index for assessing community prevalence?" Parasitology 118: 257-268.

    • On evaluation, and possible monitoring indicators: Gyorkos, T. W. (2003). "Monitoring and evaluation of large scale helminth control programmes." Acta Tropica 86: 275-282.

Research productivity tricks

  • Running Stata on a Unix/Linux server and getting automatic email updates
  • Outputting tables from Stata
  • Automatically generating pdfs of Stata tables and graphs
  • Better do-file editors for Stata


Running Stata on a Unix/Linux server and getting automatic email updates


For large datasets or long calculations, Stata can run substantially faster on a server. It also frees up your computer and minimizes the impact of power failures since servers tend to have backup power. To run a Stata do-file on a Unix/Linux server you can type

stata -b do yourdofile.do &

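The -b option tells Stata to run the do-file in batch mode; the output is written to yourdofile.log in the current directory. The ampersand & runs the job in the background and gives you your prompt back. If you want to log out or turn off your desktop while the server keeps working, start the job with nohup (nohup stata -b do yourdofile.do &) so that it is not stopped when your session ends. Use the Stata/MP executable (on most installations called stata-mp) if your server offers the multi-processor version of Stata.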
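If you start batch jobs often, a small shell function can save some typing. The sketch below is hypothetical: the function name run_stata_bg and the STATA_BIN variable are my own choices, not part of Stata. It checks that the do-file exists, starts it in batch mode under nohup in the background, and prints the process ID you would need to stop the job later.

```shell
#!/bin/sh
# Hypothetical helper: start a Stata do-file in batch mode in the background.
# STATA_BIN is an assumption; set it to stata-mp or a full path if needed.
run_stata_bg() {
  dofile="$1"
  if [ ! -f "$dofile" ]; then
    echo "no such do-file: $dofile" >&2
    return 1
  fi
  # nohup makes the job ignore the hangup signal sent when your session ends;
  # the trailing & puts it in the background. Stata writes <name>.log itself.
  nohup "${STATA_BIN:-stata}" -b do "$dofile" > /dev/null 2>&1 &
  # $! is the process ID of the job just started (use it with kill).
  echo "$!"
}
```

For example, run_stata_bg yourdofile.do prints a number that you can later pass to kill if the job needs to be stopped.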


Email when Stata job is completed


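A useful twist is to have the server email you once the job is done, so you don't have to keep checking. Here is one approach that puts "done subject" in the email's subject line and "done body" in the message body. Note the double ampersand &&, which starts the mail command only after the stata command has finished and reported success, and the pipe |, which passes the body text to mail. The final & sends the whole chain to the background.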

stata -b do yourdofile.do && echo "done body" | mail -s "done subject" youremail@yourhost.com &

This can be combined with other technology. For instance, many cell phone carriers let you send an SMS message by email; for T-Mobile in the US the address is yourcellnumber@tmomail.net. Use that address to receive a text message from the server when your Stata job is done.

To stop a Stata job that is already running, first list your current jobs under your user name and then terminate the job using its job or process ID number.

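  • If you are still logged in: type jobs to see the list, then kill % followed by the job number (e.g. kill %1).
  • If you were disconnected: type ps ux to see the list, then kill followed by the process ID (PID) from the second column. Use kill -9 only if the job does not respond to a plain kill.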

You can run multiple do-files either by re-running the Unix command for each file or, to run them sequentially, by having Unix call a single master do-file that in turn calls the others (i.e. calling do-files from within a do-file). Unix can also attach a file to the email, for instance the log-file from your Stata output. Unix has various mail programs (mail, mailx, etc.) that are quite flexible; the syntax for attachments differs between them, so check the documentation of the one installed on your server.
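A short shell loop can also run several do-files in sequence and produce a status report that you then mail to yourself. This is a sketch under assumptions: run_all, log_status and STATA_BIN are hypothetical names, and detecting errors by searching the log for Stata's return-code lines such as r(111); is a heuristic, not an official Stata feature. Batch mode writes each log, named after the do-file, into the current directory.

```shell
#!/bin/sh
# Sketch: run several do-files one after another in batch mode and print a
# one-line status per file. STATA_BIN is an assumption (stata, stata-mp, ...).

# Heuristic: Stata marks errors in the log with a line such as "r(111);".
log_status() {
  if [ ! -f "$1" ]; then
    echo "no log"
  elif grep -q '^r([0-9][0-9]*);' "$1"; then
    echo "error"
  else
    echo "ok"
  fi
}

run_all() {
  for dofile in "$@"; do
    # Runs in the foreground, so the next file starts only after this one ends.
    "${STATA_BIN:-stata}" -b do "$dofile"
    # Batch mode writes <name>.log into the current directory.
    echo "$dofile: $(log_status "$(basename "$dofile" .do).log")"
  done
}
```

After saving the functions in a file, say runall.sh, something like (. ./runall.sh; run_all clean.do analysis.do | mail -s "Stata jobs finished" youremail@yourhost.com) & runs the jobs in the background and mails the status lines once all of them are done.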


Email updates while Stata job is still running


You can also invoke the Unix mail program from within your Stata do-file. Use the following line in your Stata code (no ampersand & is needed, but note the "!" shell escape):

!echo "done body" | mail -s "done subject" youremail@yourhost.com

Stata will now ask Unix to send you an email. This is useful for getting a message when your do-file reaches a certain point but has not finished yet. You could also ask for several emails at various points to monitor progress, changing the body or subject line for each.
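To keep the do-file tidy, the mail call could live in a small shell script that Stata calls through the shell escape, e.g. !sh notify.sh "regressions done". The script below is a hypothetical sketch: the name notify.sh and the MAIL_TO and MAIL_BIN variables are my own assumptions, and it relies only on the mail -s subject address form used above. It adds the machine name and the time to the message, which helps when jobs run on several servers.

```shell
#!/bin/sh
# notify.sh (hypothetical): send a short progress email from a running job.
# MAIL_TO and MAIL_BIN are assumptions; override them in the environment.
notify() {
  step="$1"
  to="${MAIL_TO:-youremail@yourhost.com}"
  subject="Stata progress on $(uname -n): $step"
  body="Reached '$step' at $(date '+%Y-%m-%d %H:%M:%S')"
  # mail reads the message body from standard input.
  printf '%s\n' "$body" | "${MAIL_BIN:-mail}" -s "$subject" "$to"
}

# When run as a script with an argument, send one message for that step.
if [ "$#" -gt 0 ]; then
  notify "$1"
fi
```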


Outputting tables from Stata


Generating usable tables from Stata can be a painful experience. A particularly useful user-written program is Ben Jann's -estout-, which produces tables in plain text, tab-delimited, RTF, HTML and LaTeX formats. Check the estout website for examples and details. Aside from regression tables, estout has subroutines to produce tables of almost everything; for instance, estpost can tabulate summary statistics from summarize, tabulate and tabstat.

LaTeX users can fully leverage estout's flexibility. You can specify your own table headers and footers, and in this way use multicolumns and additional column headers. It is well worth spending the time to write code that produces exactly the table you want: whenever you change something later, you save yourself the manual formatting.


Automatically generating pdfs of Stata tables and graphs


Stata doesn't output PDF files directly, but you can use the shell escape to compile LaTeX files into PDF documents. Normally you would include the tex files produced by Stata in your main file and then compile that file manually. But you can also ask Stata to call pdflatex and do this for you, so you end up with a ready-made PDF.

For this trick you need a functioning LaTeX setup on your computer or server. The following example shows how to automatically compile a PDF containing a tex table produced by Stata with -estout- and an eps graph. First, set up the tex file you want to compile, using \input to include your table and \includegraphics for your graph.

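    % LaTeX file myoutput.tex
    \documentclass{article}
    \usepackage{graphicx}
    \begin{document}

    \section{My table}
    \input{mytable}

    \section{My graph}
    \includegraphics{mygraph}

    \end{document}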


Then write your Stata program to produce the table mytable.tex and the graph mygraph.eps. At the end of the do-file, call pdflatex via the shell escape "!" to compile the document. If you save your graphs as eps files, you first have to convert them to pdf via epstopdf. The option -no-shell-escape belongs to pdflatex, not Stata; it stops pdflatex from running shell commands from within the tex file.

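    * Stata do-file (needs the user-written estout package: ssc install estout)
    sysuse auto, clear
    eststo mytab: reg mpg weight length
    esttab mytab using "mytable.tex", style(tex) replace
    scatter mpg length
    graph export mygraph.eps, replace
    * convert the eps graph to pdf so that pdflatex can include it
    !epstopdf mygraph.eps
    * compile myoutput.tex into myoutput.pdf
    !pdflatex -no-shell-escape myoutput.tex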


Of course you can add anything your tex file can handle and pick your favorite way of including graphics and the like. You could even recompile your entire paper from within Stata, although you might want to check your numbers carefully.
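If you would rather keep the compilation out of the do-file, or want to rerun it after editing the tex file by hand, the same two steps fit into a short shell function. This is a sketch under assumptions: build_pdf and the EPSTOPDF and PDFLATEX variables are hypothetical names. The options -interaction=nonstopmode and -halt-on-error are standard pdflatex options that make it stop with an error instead of waiting for keyboard input, which matters when nobody is watching the job.

```shell
#!/bin/sh
# Sketch: convert eps graphs to pdf, then compile a LaTeX file with pdflatex.
# EPSTOPDF and PDFLATEX are assumptions; point them at your own binaries.
build_pdf() {
  texfile="$1"
  shift
  # Remaining arguments are eps graphs that the tex file includes.
  for eps in "$@"; do
    "${EPSTOPDF:-epstopdf}" "$eps" || return 1
  done
  # One pass is enough for a simple file; rerun if you use cross-references.
  "${PDFLATEX:-pdflatex}" -no-shell-escape -interaction=nonstopmode \
    -halt-on-error "$texfile" > /dev/null || return 1
  # pdflatex writes <name>.pdf into the current directory.
  pdf="$(basename "$texfile" .tex).pdf"
  [ -f "$pdf" ] && echo "$pdf"
}
```

With the files from the example above, build_pdf myoutput.tex mygraph.eps prints myoutput.pdf when the compilation succeeds.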


Better do-file editor


Up to Stata 10 the do-file editor was just a basic text editor; since Stata 11 it has been much improved. You can also use an external editor to interact with Stata, which has the additional advantage that the editor remains open if Stata crashes. The editor can also double for other programs (e.g. LaTeX editing) and functions (comparing multiple versions of do-files).

There are many good editors out there. Friedrich Huebler maintains instructions for integrating various editors with Stata. I previously used WinEdt ($30 for students) and am switching to the open-source Notepad++ (NPP). Both editors have many other functions and can be programmed with macros.

Both provide syntax highlighting and automatic backup and can run Stata code straight from the editor. NPP is free and has a number of useful plugins (most easily installed from within NPP), such as NppFTP, which lets you open remote documents directly in the editor. It also supports code folding. WinEdt, on the other hand, can automatically close PDF documents in Acrobat Reader, which is useful if you are recompiling LaTeX documents. It integrates easily with MiKTeX and has a toolbar with LaTeX symbols.