Research



My research can be roughly categorized under the umbrella theme of mathematical and computational social sciences. My mind is trained mathematically for a long time, but I am also deeply motivated by problems that involves the human psyche and culture. There is a dialectical relationship between the behaviors of individuals and their shared ideas. These shared ideas include culture, beliefs, etc.. We can subsume these ideational components under the overarching term software. The said dialectical relationship is sometimes eloquently put in the following structure: individuals impact software impacts individuals. This is also closely related to the so-called structure-agency problem that social scientists and philosophers have been grappling with for a long time. Another semantic fun is that some of the components of our software have been described as "structured structures predisposed to function as structuring structures".  I also like to study this individual-software feedback loop that is constantly running in our life, even when we sleep.


One part of our software is our social networks. True that some elements of our networks have physical and biological basis, such as genetic relation or physical proximity. But most of our networks, such as friendships, are actually shared beliefs. If one of the two friends suddenly stops acting according to the friendship script, there is no friendship. That is even true for kinship. True that there is biological basis for the definition of these links, but it is in software that the existence of such a relation is incorporated into decision making and conduct.


When I was a master's student in physics, I learned about the existence of Network Science, and that some of its scholars are trained in physics and math but work on socially-relevant problems. Glad of this realization, I spent my PhD working on the mathematics of social dynamics on networks. I also took a great course from the department of sociology, 'Social Networks', in which I learned much about the theoretical aspects of social networks, social capital, and also social stratification. I began attending courses from that department and began collaborating with sociologists. Immediately after my PhD, I obtained an MA in sociology to strengthen the social-scientific component of my research. I also expanded my collaboration network.


The following summaries provide upshots into some of the projects I have been working on.
 

 

Computational test of biases in the mass media
 

Since 2015, I have been collaborating with Eran Shor at McGill University and Arnout van der Rijt at Utrecht University. We are working on the problem of quantifying biases in the mass media in the fields of sports, entertainment, politics, and science. Many studies in the literature investigated biases in these contexts along the dimensions of race, gender, and others, but most of them only compared media coverage. We is that we also account for audience demand. We study the coverage of over 2000 US newspapers on over 50,000 individuals. We operationalize audience demand using data from the social media and Wikipedia.

The following is an example depiction of the coverage distribution for a matched sample of about 10,000 individuals in the entertainment industry (e.g., actors, singers, etc.). I have plotted three example levels of audience demand for illustrative purposes.

Entertainment: Coverage distribution, matched on Demand&Age

 

 

 

 

Evolution of human collective cooperation
 

A question that pervades many social and behavioral disciplines is how human cooperation emerged and evolved and was sustained? How does the coexistence of the altruistic and the selfish tendencies of humans ramify into macro-scale societies? Different fields have contributed offering different insights, each shedding light on an aspect of the problem from its own angle. This is a multi-faceted problem, ripe with profundity and challenges for scholars with almost any background. I have been working on this problem since I joined PED in 2016.

The first problem I worked on involved an interesting multidisciplinary collaboration. It was a collaboration between the postdocts of Shing-Tunng Yau (as in, e.g., the 'Calabi–Yau manifold'), who won the Fields Medal for his prominent work in differential geometry, and my adviser, Martin A. Nowak, who is a leading mathematical biologist. As a result of this joining of minds, we developed a new framework for the analytical study of the evolutionary game theory on any network. This development was built on a duality between the problem at hand with that of coalescing random walks on networks. The story of this paper can be found in this article by my collaborator and friend Ben Allen. While I and Naghmeh Momeni were investigating the cooperative properties of many generative network models, we discovered network structures that we dubbed 'super-promoters' of cooperation. These networks have the rare property that their critical benefit-to-cost ratio is smaller than their average degree, which makes them extra-suitable for the evolution of cooperation. One of them is the 'clique of stars':

 

starClique



We also classified small graphs in terms of their merit for cooperative outcomes. For example, the figure below depicts every simple graph of size 6. As can be seen on the top, there are four graphs (black) who does not promote cooperation regardless of the benefit-to-cost ratio. There are also those who do promote cooperation over defection but only at a very high benefit-to-cost ratio (purple). The blue graphs are good for cooperation. For size 6, the chain graph (where the six nodes are situated on a line) is the best promoter of cooperation (but in our explorations in the realm of simple graphs we later found that this result does not hold for general network size). There are also graphs who, not only not promote cooperation, but they promote spite, in which individuals are willing to pay a cost to reduce the payoff of others. So the red graphs in this figure are in a sort of a hell of spite. 

 

n6

 

 

In the second problem I worked on, I juxtaposed the above-discussed coalescent theory framework with my knowledge from the social sciences regarding social capital and social network theory. As a result of this work, in a recent paper we systematically demonstrate the efficacy of bridging ties between communities in the promotion of collective cooperation. We found that cooperation thrives as a consequence of the presence of network brokers and individuals who are referred to as boundary spanners in the organizational behavior literature. These nodes establish long-range ties which sew the patches of the network together and thereby promote collective cooperation. A society devoid of such long ties will face paucity of civic engagement, economic growth, and generalized trust, depending on the context. This article in Scientific American gives an overview of our findings. We found exact expressions for the critical benefit-to-cost ratio of different interconnection schemes for various structures. For example, the following example settings involve the interconnection of uncooperative cliques.

Conjoining

 


 

Challenges of sampling offline social networks
 

Any empirical study that involves networks needs to observe and measure the network structure of interest. This comes up in diverse contexts; from the Internet architecture to that of the gene regulatory network within cells. Like other data gathering procedures, network sampling is concerned with inferring the network structure from a set of observations. Different contexts have different challenges for network sampling. Offline social networks have many economic and practical constraints. Usually the network of interactions and relations between people is observed via interviews and surveys. The more data one seeks to gather from the respondents, the more time needed. Respondent fatigue is a major obstacle because it engenders nosie and other biases in the data. Moreover, socio-centric data is preferred to ego-centric. That is, inferring the properties of the network structure would be more reliable if we track the names that each respondent mentions, find them and interview them also, then track their list, until saturation. This is prohibitively costly. So for reasons of economy and to circumvent respondent fatigue, many social network studies have employed the so-called Fixed Choice Design (FCD), where each respondent is only asked to name a fixed number of ties.

For example, the following figure illustrates the FCD procedure in a simple schematic nutshell. Suppose we randomly select q=2% of the population and ask each of these respondents (egos) to nominate two friends (alters). In this synthetic example setup, the original underlying network has 300 nodes, from which 6 are chosen for interview. We can construct a network based on the names of the egos and their mentioned alters. One e-e tie is observed (i.e., a respondent mentioned an alter who turned out to be among the respondents), 11 e-a ties (ties to alters who are not chosen as respondents), there are 9 open triars (Λ structures), and one triangle. How do we infer the structural properties of the underlying network (on the left) from the crude network that is observed (on the right)?

I solved this problem analytically, and found closed-form solutions to estimate several important network statistics (including the first and second moments of the degree distribution, network size and density, and average clustering coefficient). The paper is submitted. A rather old preprint can be found here.  A more recent preprint can be found here, and the supplementary material is here.

fcd

 

 

 

Statistical methods for disease co-morbidity networks
 

As the human sciences get more data-oriented, network science opens its way into new terrains, along with other approaches (such as machine learning). Medicine is of the fields of study in which network approaches are  being adopted (see here and here, as instances of these ). Some of these network-based approaches focus on the networks of diseases, that is, networks whose nodes are disease and whose links signify some relation between the diseases. Examples include genetic/interactomic association, and similarity of symptoms, etiological factors, or pertinent drugs. One network-based approach is to construct the disease network using the longitudinal in-patient data that hospitals retain. This allows the study of temporal co-morbidity patterns to know how developing a certain disease impact the risk of developing other diseases. But different studies in the literature adopt different methods to construct the inter-disease network, upon which the subsequent analyses rest. So I decided to write a survey paper to apply different statistical methods in the network science literature to this problem and compare their strengths and weaknesses. Since 2015, in a collaboration with David Buckeridge from the McGill Clinical and Health Informatics group, we worked on a longitudinal dataset of about one million people spanning over 16 years. The paper can be found here.

 

The figure below depicts the local network of a dense cluster of pregnancy-related diseases (blue) and their statistically-significant co-morbidity partners (orange). The dictionary for the three-digit disease codes (ICD9) can be found here.

 

disease

 

 


 

Social networks and health
 

Social networks describe the pathways of interaction and influence between people in the society. A practical application that motivates scholars from the social and health sciences to study social networks is how to utilize our knowledge of the social network structure to promote health-related awareness. Since 2010, I have been collaborating on this project with Jack Sandberg from George Washington University, Steven Rytina from McGill University, and other colleagues. Located in the Niakhar region of the African country of Senegal, this project aims to study the effect of social networks on social learning and diffusion regarding health behavior and beliefs. In the data gathering phase, I developed an search algorithm to be used in the field by the interviewers for the real-time identification of individuals in a noisy database using the respondents' noisy and incomplete inputs. We had a government-retained database of names and it was fun to develop automated methods to go through the prodigious amount of noise in order to excavate meaningful results. The following is a network of names, where each link means that the Jaro-Winkler similarty index for the two names or name spellings is above 0.9.

 names

After years of development, the data collection phase is finally over. The social network data has uniquely rich and promising features. It is a socio-centric multiplex network with 15 layers for about 1000 individuals. More details can be found on the project website nsnhp.org.

 

 

 

 

Growing networks
 

One of the seminal problems that led to the rapid growth and popularity of network science was that of network growth. Much attention was attracted to models that were propounded to offer intuition into how the structural properties of networks evolve over time as new incoming nodes are added to the network. These analyses were confined to the `steady-state' limit of the system, that is, the long-time equilibrium. In this limit, many effects of the initial substrate network has also vanished. For my PhD thesis, I set out to solve these problems for arbitrary times.

Suppose we have a network with degree distribution P(k), and it starts growing via the successive addition of new nodes. Each incoming node attaches to m existing nodes in the network. What is the expected degree distribution of the network at (arbitrary, not necessarily large) time t? This was the first problem I solved in this project.

The solution for preferential attachment is:

 

pkt3

 

We solved this problem for uniform attachment, too. We also solved the generalized version of this problem for multiplex networks




 

Degree correlations
 

Another important quantity in the mathematical modeling of dynamical processes on complex networks is the nearest-neighbor degree distribution (NNDD). Denoted by p(ℓ|k), it characterizes the connectivity of neighbors of nodes. That is, suppose we select a random node and its degree is k. We ask the question: if we pick a random neighbor of this node, what is the probability that the selected neighbor has degree ℓ? The answer is given by p(ℓ|k). Consider a growing network, in which nodes sequentially enter the network and establish m links to existing nodes with probabilities proportional to their degrees. This is the basic setup of preferential attachment. I found the NNDD of networks generated by preferential attachment:

       p(L|k)



I also solved this problem for arbitrary time. The full expression for p(ℓ|k) as a function of time is presented in this paper.

 

 

 

Inter-layer degree distribution in multi-layer networks
 

In social networks, there are more than one type of relationship between people. We can envisage different networks between the same set of people, each representing a distinct type of relationship. For example, for a given set of people in a lab, there can be networks of co-authorship, kinship, friendship, cohabitation, geographical propensity, co-supervision of students, and citation (citing the papers of one another).
 
Multiplex networks are multi-layer depictions of systems with multiple types of ties. Each layer stores the same set of nodes, but a different set of links. Real networks evolve, and the evolution of layers are coupled to one another. When scholar X write a paper, the likelihood that X cites a paper by scholar Y is influenced by their connectivity some other layers. That is, in addition to the reputation of Y (which pertains to Y's connectivity in the layer of citations), it also matters if X and Y are connected in the friendship layer, or the co-supervision layer. Models of evolving and growing multiplex networks emulate this coupling feature of the connections.
 
In this paper, we obtained closed-form solutions for growing multiplex networks under uniform and preferential growth. Suppose there are M layers, and each incoming node on average connects to βmexisting nodes in layer m. We can generalize the notion of degree to the multi-layer setting, and define the degree vector of node x, denoted by kx, whose m-th component is the degree of node x in layer m. Let us denote the fraction of nodes whose degree vector is k in the long-time limit as n(k). Our solution for n(k) under preferential attachment is as follows:
 
multi-pref

 

If the attachment mechanism of incoming nodes is uniformly at random, then the solution is:

multi-unif

 

 

 

 

Models of social diffusion and influence on networks
 

A class of problems that scholars from mathematics, physics, and computer science have been working on seriously for about two decades is dynamical processes taking place on graphs. Nodes are assumed to have states, and the states of nodes affect one another, modeling some sort of interaction. These models are used for diverse applications. Some of them are motivated by socially-relevant problems such as diffusion of information and influence across social networks. During my PhD, I worked on these models. In the early days of my immersion in the social sciences, I published papers on the effects of social networks on migration decisions, and the effect of social hierarchies in the dynamics of influence. I also worked on generalizations of the so-called voter model (which is a zeroth-order intuitive model of social influence) to account for individual differences in levels of clout, and to account for stubborn individuals and the effect of exogenous stimulation (e.g., partisan media).