Background

Matte Hartog is a Research Fellow of the academic team at the Center for International Development's Growth Lab. The center works to advance the understanding of development and deploy breakthrough research, in collaboration with external stakeholders, to address the world’s most pressing challenges. 

Teaching

Projects

  • Document intelligence and large scale probabilistic matching

    We use deep learning for optical character recognition (a denoising CNN and an LSTM network for historical fonts) and natural language processing (fine-tuned BERT and Llama 1/2 for named-entity recognition), in combination with multimodal Transformer models (e.g. LayoutLMv3), to analyze ~500,000 book pages on patents and R&D labs from 1850 onwards. We also develop parallel computing libraries on Harvard’s supercomputing cluster to link these pages to billions of US Census records (using XGBoost), in order to study the professionalization of technological progress.
    Output: Work in progress; expected completion June 2023.

  • The impact of COVID-19
  • Using global investment data to identify Ukraine's European economic integration opportunities

    In collaboration with the World Bank, we use data on 120 million firms worldwide combined with international trade data to analyze Ukraine's opportunities to participate in European value chains. We apply traditional gravity models and economic complexity analysis to analyze trade and foreign direct investment (FDI).
    Output: Hartog, M., Lopez-Cordova, J.E. and Neffke, F. (2021), Assessing Ukraine's Role in European Value Chains: A Gravity Equation-cum-Economic Complexity Analysis Approach.
    Media coverage: This paper and follow-up work were covered by the Christian Science Monitor and Bloomberg.

  • Exponential random graph models

    We develop and apply exponential random graph models to analyze the evolution of collaboration networks.
    Output:
    Broekel, T. and Hartog, M. (2014), Determinants of cross-regional R&D collaboration networks: an application of exponential random graph models. In: T. Scherngell (ed.) Advances in Spatial Science. The geography of networks and R&D collaborations. New York: Springer, pp. 49-80.
    Broekel, T. and Hartog, M. (2013), Explaining the structure of inter-organizational networks using exponential random graph models. Industry and Innovation, 20 (3), pp. 277-295. Available here.

  • GMM estimators for causal inference

    We develop and apply GMM estimators for causal inference of the impact of industrial compositions on employment growth.
    Output: Hartog, M., Boschma, R. and Sotarauta, M. (2012), The impact of related variety on regional employment growth in Finland 1993-2006: High-tech versus medium/low-tech. Industry and Innovation, 19 (6), pp. 459-476. Available here.

  • Proportional hazard models

    We use survival models on all banks in the Netherlands that existed between 1850 and 1993 to explain their role in acquiring other banks and driving spatial clustering.
    Output: Boschma, R. and Hartog, M. (2014), Merger and Acquisition Activity as Driver of Spatial Clustering: The Spatial Evolution of the Dutch Banking Industry, 1850-1993. Economic Geography, 90 (3), pp. 247-266. Available here.

  • Agents of structural transformation of regional economies

    We use social security data and administrative records covering the full population of Sweden from 1974 onwards, accessible through Statistics Sweden's supercomputing environment, to study the agents (firms, entrepreneurs) of structural economic transformation of regions.
    Output: Neffke, F., Hartog, M., Boschma, R. and Henning, M. (2018), Agents of Structural Change: The role of entrepreneurs and expanding firms in regional diversification. Economic Geography, 94 (1), pp. 23-48. Available here.

  • Mexican Ministry of Health: Atlas of Economic Complexity

    Using data from the Mexican Ministry of Health, we created the Atlas of Economic Complexity of Mexico that serves as a diagnostic tool - for policy makers, entrepreneurs and firms - to analyze the productivity of departments, cities, and municipalities.
    Output: Mexican Atlas of Economic Complexity at http://complejidad.datos.gob.mx/

  • The role of the diaspora in the internationalization of the Colombian economy

    We use the ORBIS database, covering over 100 million establishments worldwide, in conjunction with other datasets to identify the Colombian diaspora and to study the resulting brain drain and its consequences.
    Output: Nedelkoska, L., Assumpcao, A., Grisanti, A., Hartog, M., Hinz, J., Lu, J., Muhaj, D., Protzer, E., Saxenian, A. and Hausmann, R. (2021), The Role of the Diaspora in the Internationalization of the Colombian Economy. CID Faculty Working Paper Series 2021.397, Harvard University, Cambridge, MA.

  • The importance of tenure and experience for wages in Saudi Arabia

    Through on-site work in Riyadh, Saudi Arabia, with government officials and the statistical office, we analyze administrative data covering the Saudi population, merged with Saudi export data. This allows us to track workers' job mobility and to analyze skill relatedness between industries and the impact on tenure and wages, using standard Topel models from labor economics.
    Output: Work in progress.

Software / computing

  • Polars is a very promising data analysis framework (in Python / Rust) for handling large datasets, particularly in terms of memory efficiency and expressive syntax. I wrote its Emacs implementation for Jupyter consoles and notebooks, available here.
  • I also contribute to the EconGeo package (in R), which computes geospatial indices and complexity (network) metrics.
  • I wrote the following notebooks in Python and R as a step-by-step guide to creating matrices of the co-occurrence of activities and their proximities, and subsequently calculating and analyzing economic complexity and product complexity indices, revealed comparative advantage (RCA) indices and product space visualizations using D3Plus.
  • With Frank Neffke, I wrote the following Stata .do files, which can be applied to large-scale social security / administrative data to identify agents of structural transformation of regional economies.
  • For network analysis purposes, Tom Broekel and I developed exponential random graph models, which can be used to study networks where longitudinal data is of poor quality. The corresponding software is included in the online appendices of these studies:
    • Broekel, T. and Hartog, M. (2013), Explaining the structure of inter-organizational networks using exponential random graph models. Industry and Innovation, 20 (3), pp. 277-295.
    • Broekel, T. and Hartog, M. (2014), Determinants of cross-regional R&D collaboration networks: an application of exponential random graph models. In: T. Scherngell (ed.) Advances in Spatial Science. The geography of networks and R&D collaborations. New York: Springer, pp. 49-80.
  • HPC (to be published): parallel computing libraries for CentOS / SLURM (e.g. PyOMP), developed on Harvard's supercomputing environment Cannon, in conjunction with Python libraries for OCR and NLP to analyze (historical, handwritten) documents.
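
To illustrate the kind of calculation the economic complexity notebooks listed above walk through, here is a minimal sketch in plain Python. It is not the notebooks' code. It assumes the standard Balassa definition of revealed comparative advantage and the usual co-occurrence-based proximity (shared specializations divided by the larger of the two ubiquities); the function names `rca_matrix` and `proximity`, the threshold of 1.0, and the toy export values are all made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def rca_matrix(exports):
    """Balassa RCA: (x_cp / x_c) / (x_p / x) for every (location, product) pair.

    `exports` maps (location, product) -> export value (hypothetical input format).
    """
    loc_total = defaultdict(float)
    prod_total = defaultdict(float)
    grand_total = 0.0
    for (loc, prod), value in exports.items():
        loc_total[loc] += value
        prod_total[prod] += value
        grand_total += value
    return {
        (loc, prod): (value / loc_total[loc]) / (prod_total[prod] / grand_total)
        for (loc, prod), value in exports.items()
    }


def proximity(rca, threshold=1.0):
    """Proximity between products based on co-occurrence of specializations.

    A location is 'specialized' in a product when its RCA >= threshold.
    Proximity(p, q) = (# locations specialized in both) / max(ubiquity_p, ubiquity_q).
    """
    specialized = defaultdict(set)  # product -> locations with RCA >= threshold
    products = set()
    for (loc, prod), value in rca.items():
        products.add(prod)
        if value >= threshold:
            specialized[prod].add(loc)
    prox = {}
    for p, q in combinations(sorted(products), 2):
        both = len(specialized[p] & specialized[q])
        denom = max(len(specialized[p]), len(specialized[q]))
        prox[(p, q)] = both / denom if denom else 0.0
    return prox


# Toy data (invented): three locations, three products.
exports = {
    ("A", "wine"): 80, ("A", "cars"): 20,
    ("B", "wine"): 10, ("B", "cars"): 90,
    ("C", "wine"): 50, ("C", "cars"): 10, ("C", "chips"): 40,
}
rca = rca_matrix(exports)
prox = proximity(rca)
for pair, value in sorted(prox.items()):
    print(pair, round(value, 3))
```

Keys of the proximity dictionary are alphabetically ordered product pairs, so each pair appears once. Economic and product complexity indices would be built on the same binary specialization matrix, but are left out of this sketch.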