On the day I met Brett Ostrum, in a conference room in Redmond, Wash., he was wearing a black leather jacket and a neat goatee, and his laptop was covered with stickers that made it appear you could glimpse its electronic innards. That was logical enough, because those circuits were his responsibility: He was the corporate vice president at Microsoft in charge of the company’s computing devices, most notably Xbox and the Surface line of laptops and tablets.
It was early 2018, and things were going pretty well for him. Despite Microsoft’s lineage as a software company, and as a brand not exactly synonymous with good design, it was making the most of its late start in the hardware business. Mr. Ostrum and his team were winning market share and high marks from critics.
But he saw a problem on the horizon. It came in the form of extensive surveys Microsoft used to monitor employees’ attitudes. Mr. Ostrum’s business unit scored average or above average on most measures — except one. Employees reported being much less satisfied with their work-life balance than their counterparts elsewhere at the company.
Employers are monitoring their workers more often and using more tracking tools than ever. What's surprising is that a growing number of employees don't mind.
Advancements in technologies―including sensors, mobile devices, wireless communications, data analytics and biometrics―are rapidly expanding monitoring capabilities and reducing the cost of surveillance, and that's prompting more employers to use these tools.
In 2015, about 30 percent of large employers were monitoring employees in nontraditional ways, such as analyzing e-mail text, logging computer usage or tracking employee movements, says Brian Kropp, group vice president of HR practice for Gartner, a research and advisory firm. By 2018, that number had jumped to 46 percent, and Gartner projects it will reach well over 50 percent this year.
Barbara Grosz has a fantasy that every time a computer scientist logs on to write an algorithm or build a system, a message will flash across the screen that asks, “Have you thought about the ethical implications of what you’re doing?”
Until that day arrives, Grosz, the Higgins Professor of Natural Sciences at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS), is working to instill in the next generation of computer scientists a mindset that considers the societal impact of their work, and the ethical reasoning and communications skills to do so.
“Ethics permeates the design of almost every computer system or algorithm that’s going out in the world,” Grosz said. “We want to educate our students to think not only about what systems they could build, but whether they should build those systems and how they should design those systems.”
At a time when computer science departments around the country are grappling with how to turn out graduates who understand ethics as well as algorithms, Harvard is taking a novel approach.
ON MARCH 18, 2018, at around 10 P.M., Elaine Herzberg was wheeling her bicycle across a street in Tempe, Arizona, when she was struck and killed by a self-driving car. Although there was a human operator behind the wheel, an autonomous system—artificial intelligence—was in full control. This incident, like others involving interactions between people and AI technologies, raises a host of ethical and proto-legal questions. What moral obligations did the system’s programmers have to prevent their creation from taking a human life? And who was responsible for Herzberg’s death? The person in the driver’s seat? The company testing the car’s capabilities? The designers of the AI system, or even the manufacturers of its onboard sensory equipment?
“Artificial intelligence” refers to systems that can be designed to take cues from their environment and, based on those inputs, proceed to solve problems, assess risks, make predictions, and take actions. In the era predating powerful computers and big data, such systems were programmed by humans and followed rules of human invention, but advances in technology have led to the development of new approaches. One of these is machine learning, now the most active area of AI, in which statistical methods allow a system to “learn” from data, and make decisions, without being explicitly programmed. Such systems pair an algorithm, or series of steps for solving a problem, with a knowledge base or stream—the information that the algorithm uses to construct a model of the world.
Ethical concerns about these advances focus at one extreme on the use of AI in deadly military drones, or on the risk that AI could take down global financial systems. Closer to home, AI has spurred anxiety about unemployment, as autonomous systems threaten to replace millions of truck drivers, and make Lyft and Uber obsolete. And beyond these larger social and economic considerations, data scientists have real concerns about bias, about ethical implementations of the technology, and about the nature of interactions between AI systems and humans if these systems are to be deployed properly and fairly in even the most mundane applications.
Consider a prosaic-seeming social change: machines are already being given the power to make life-altering, everyday decisions about people. Artificial intelligence can aggregate and assess vast quantities of data that are sometimes beyond human capacity to analyze unaided, thereby enabling AI to make hiring recommendations, determine in seconds the creditworthiness of loan applicants, and predict the chances that criminals will re-offend.
But such applications raise troubling ethical issues because AI systems can reinforce what they have learned from real-world data, even amplifying familiar risks, such as racial or gender bias. Systems can also make errors of judgment when confronted with unfamiliar scenarios. And because many such systems are “black boxes,” the reasons for their decisions are not easily accessed or understood by humans—and therefore difficult to question, or probe.
One day this fall, Ashutosh Garg, the chief executive of a recruiting service called Eightfold.ai, turned up a résumé that piqued his interest.
It belonged to a prospective data scientist, someone who unearths patterns in data to help businesses make decisions, like how to target ads. But curiously, the résumé featured the term “data science” nowhere.
Instead, the résumé belonged to an analyst at Barclays who had done graduate work in physics at the University of California, Los Angeles. Though his profile on the social network LinkedIn indicated that he had never worked as a data scientist, Eightfold’s software flagged him as a good fit. He was similar in certain key ways, like his math and computer chops, to four actual data scientists whom Mr. Garg had instructed the software to consider as a model.
The idea is not to focus on job titles, but “what skills they have,” Mr. Garg said. “You’re really looking for people who have not done it, but can do it.”
This is a book about models. It describes dozens of models in straightforward language and explains how to apply them. Models are formal structures represented in mathematics and diagrams that help us to understand the world. Mastery of models improves your ability to reason, explain, design, communicate, act, predict, and explore.
This book promotes a many-model thinking approach: the application of ensembles of models to make sense of complex phenomena. The core idea is that many-model thinking produces wisdom through a diverse ensemble of logical frames. The various models accentuate different causal forces. Their insights and implications overlap and interweave. By engaging many models as frames, we develop nuanced, deep understandings. The book includes formal arguments to make the case for multiple models along with myriad real-world examples.
The book has a pragmatic focus. Many-model thinking has tremendous practical value. Practice it, and you will better understand complex phenomena. You will reason better. You will exhibit fewer gaps in your reasoning and make more robust decisions in your career, community activities, and personal life. You may even become wise.
Twenty-five years ago, a book of models would have been intended for professors and graduate students studying business, policy, and the social sciences along with financial analysts, actuaries, and members of the intelligence community. These were the people who applied models and, not coincidentally, they were also the people most engaged with large data sets. Today, a book of models has a much larger audience: the vast universe of knowledge workers, who, owing to the rise of big data, now find working with models a part of their daily lives.
Organizing and interpreting data with models has become a core competency for business strategists, urban planners, economists, medical professionals, engineers, actuaries, and environmental scientists among others. Anyone who analyzes data, formulates business strategies, allocates resources, designs products and protocols, or makes hiring decisions encounters models. It follows that mastering the material in this book—particularly the models covering innovation, forecasting, data binning, learning, and market entry timing—will be of practical value to many.
Thinking with models will do more than improve your performance at work. It will make you a better citizen and a more thoughtful contributor to civic life. It will make you more adept at evaluating economic and political events. You will be able to identify flaws in your logic and in that of others. You will learn to identify when you are allowing ideology to supplant reason and have richer, more layered insights into the implications of policy initiatives, whether they be proposed greenbelts or mandatory drug tests.
These benefits will accrue from an engagement with a variety of models—not hundreds, but a few dozen. The models in this book offer a good starting collection. They come from multiple disciplines and include the Prisoners’ Dilemma, the Race to the Bottom, and the SIR model of disease transmission. All of these models share a common form: they assume a set of entities—often people or organizations—and describe how they interact.
The models we cover fall into three classes: simplifications of the world, mathematical analogies, and exploratory, artificial constructs. In whatever form, a model must be tractable. It must be simple enough that within it we can apply logic. For example, we cover a model of communicable diseases that consists of infected, susceptible, and recovered people that assumes a rate of contagion. Using the model we can derive a contagion threshold, a tipping point, above which the disease spreads. We can also determine the proportion of people we must vaccinate to stop the disease from spreading.
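The threshold reasoning in this paragraph can be sketched in a few lines. The sketch assumes the standard textbook form of the SIR model, which the passage does not spell out: the basic reproduction number R0 is the transmission rate divided by the recovery rate, the disease spreads when R0 exceeds 1, and the share of people to vaccinate is 1 - 1/R0. The function names and sample rates are illustrative, not taken from the book.

```python
def basic_reproduction_number(transmission_rate: float, recovery_rate: float) -> float:
    """R0: expected new infections caused by one infected person
    in a fully susceptible population (textbook SIR assumption)."""
    if recovery_rate <= 0:
        raise ValueError("recovery_rate must be positive")
    return transmission_rate / recovery_rate


def vaccination_threshold(r0: float) -> float:
    """Share of the population to vaccinate so the disease cannot spread.
    Returns 0.0 when R0 is at or below the tipping point of 1."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


# Illustrative rates only (assumed for this example).
r0 = basic_reproduction_number(transmission_rate=0.5, recovery_rate=0.25)
print(f"R0 = {r0}, vaccinate at least {vaccination_threshold(r0):.0%}")
```

Under these assumed rates each infected person infects two others on average, so the disease is above the tipping point and half the population would need to be vaccinated.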
As powerful as single models can be, a collection of models accomplishes even more. With many models, we avoid the narrowness inherent in each individual model. A many-models approach illuminates each component model’s blind spots. Policy choices made based on single models may ignore important features of the world such as income disparity, identity diversity, and interdependencies with other systems. With many models, we build logical understandings of multiple processes. We see how causal processes overlap and interact. We create the possibility of making sense of the complexity that characterizes our economic, political, and social worlds. And we do so without abandoning rigor—model thinking ensures logical coherence. That logic can then be grounded in evidence by taking models to data to test, refine, and improve them. In sum, when our thinking is informed by diverse, logically consistent, empirically validated frames, we are more likely to make wise choices.
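One way to see why a collection of models can outperform its members is an algebraic identity for averaged numerical predictions, often called the diversity prediction theorem: the squared error of the average prediction equals the average squared error of the individual models minus the variance of their predictions. The sketch below checks that identity on made-up numbers. It covers only the simple case of averaging numeric forecasts, and the function name and sample values are assumptions for illustration.

```python
from statistics import fmean


def ensemble_breakdown(predictions, truth):
    """Return (ensemble_error, average_error, diversity) for numeric predictions.

    ensemble_error: squared error of the averaged prediction
    average_error:  mean squared error of the individual predictions
    diversity:      variance of the predictions around their average
    """
    avg = fmean(predictions)
    ensemble_error = (avg - truth) ** 2
    average_error = fmean([(p - truth) ** 2 for p in predictions])
    diversity = fmean([(p - avg) ** 2 for p in predictions])
    return ensemble_error, average_error, diversity


# Three hypothetical model forecasts of a quantity whose true value is 55.
ens, avg_err, div = ensemble_breakdown([40.0, 50.0, 60.0], truth=55.0)
print(ens, avg_err, div)
```

Because diversity is never negative, the averaged prediction can be no worse than the typical individual model, and it gets better the more the models disagree.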
Without models, making sense of data is hard. Data helps describe reality, albeit imperfectly. On its own, though, data can’t recommend one decision over another. If you notice that your best-performing teams are also your most diverse, that may be interesting. But to turn that data point into insight, you need to plug it into some model of the world — for instance, you may hypothesize that having a greater variety of perspectives on a team leads to better decision-making. Your hypothesis represents a model of the world.
Though single models can perform well, ensembles of models work even better. That is why the best thinkers, the most accurate predictors, and the most effective design teams use ensembles of models. They are what I call many-model thinkers.
“We have charts and graphs to back us up. So f*** off.” New hires in Google’s people analytics department began receiving a laptop sticker with that slogan a few years ago, when the group probably felt it needed to defend its work. Back then people analytics—using statistical insights from employee data to make talent management decisions—was still a provocative idea with plenty of skeptics who feared it might lead companies to reduce individuals to numbers. HR collected data on workers, but the notion that it could be actively mined to understand and manage them was novel—and suspect.
Today there’s no need for stickers. More than 70% of companies now say they consider people analytics to be a high priority. The field even has celebrated case studies, like Google’s Project Oxygen, which uncovered the practices of the tech giant’s best managers and then used them in coaching sessions to improve the work of low performers. Other examples, such as Dell’s experiments with increasing the success of its sales force, also point to the power of people analytics.
But hype, as it often does, has outpaced reality. The truth is, people analytics has made only modest progress over the past decade. A survey by Tata Consultancy Services found that just 5% of big-data investments go to HR, the group that typically manages people analytics. And a recent study by Deloitte showed that although people analytics has become mainstream, only 9% of companies believe they have a good understanding of which talent dimensions drive performance in their organizations.
What gives? If, as the sticker says, people analytics teams have charts and graphs to back them up, why haven’t results followed? We believe it’s because most rely on a narrow approach to data analysis: They use data only about individual people, when data about the interplay among people is equally or more important.
People’s interactions are the focus of an emerging discipline we call relational analytics. By incorporating it into their people analytics strategies, companies can better identify employees who are capable of helping them achieve their goals, whether for increased innovation, influence, or efficiency. Firms will also gain insight into which key players they can’t afford to lose and where silos exist in their organizations.
Fortunately, the raw material for relational analytics already exists in companies. It’s the data created by e-mail exchanges, chats, and file transfers—the digital exhaust of a company. By mining it, firms can build good relational analytics models.
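As a rough illustration of how such digital exhaust might be turned into a relational measure, the sketch below counts how many distinct colleagues each person exchanges messages with. It is a minimal example under stated assumptions: the record format (sender, recipient), the names, and the choice of metric are all hypothetical and are not the framework the article goes on to present.

```python
from collections import defaultdict


def contact_counts(messages):
    """Map each person to the number of distinct colleagues
    they exchanged at least one message with, in either direction."""
    partners = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore notes-to-self
        partners[sender].add(recipient)
        partners[recipient].add(sender)
    return {person: len(others) for person, others in partners.items()}


# Hypothetical message log: (sender, recipient) pairs.
messages = [("ana", "ben"), ("ben", "ana"), ("ana", "cy"), ("dee", "ana")]
print(contact_counts(messages))
```

Repeated exchanges between the same two people count once here, so the measure reflects breadth of contact rather than message volume.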
In this article we present a framework for understanding and applying relational analytics. And we have the charts and graphs to back us up.
You want to know which teams are at the forefront of analytics? Just look around at the teams still playing.
Once upon a time, there was the Oakland Athletics and a sacred tome called "Moneyball." It was about baseball teams winning with statistics. Only it wasn't about that at all. It was about market inefficiency. Then John Henry bought the Boston Red Sox, hired Bill James, made Theo Epstein his general manager, and Moneyball spread to a big market.
We're several iterations past all of that. Things move fast in technology, so fast it can even carry a tradition-based industry like baseball into the digital age. These days, every team is playing Moneyball. All of them, as in 30 for 30.
"At this point, I think everyone assumes that their counterpart is smart," Brewers general manager David Stearns said. "And everyone is doing what they can do to unearth competitive advantages." To call it Moneyball is not right, either. Michael Lewis is still turning out ground-breaking work, but to fully capture what is happening in big league front offices, circa 2018, the next inside look at analytics and baseball would need to be authored by someone like the late Stephen Hawking. It's hard to say what you'd call it. "The Singularity" has already been taken.
In machine learning and deep learning we can’t do anything without data. So the people who create datasets for us to train our models are the (often under-appreciated) heroes. Some of the most useful and important datasets are those that become important “academic baselines”; that is, datasets that are widely studied by researchers and used to compare algorithmic changes. Some of these become household names (at least, among households that train models!), such as MNIST, CIFAR-10, and ImageNet.
We all owe a debt of gratitude to those kind folks who have made datasets available for the research community. So fast.ai and the AWS Public Dataset Program have teamed up to try to give back a little: we’ve made some of the most important of these datasets available in a single place, using standard formats, on reliable and fast infrastructure. For a full list and links see the fast.ai datasets page.
fast.ai uses these datasets in the Deep Learning for Coders courses, because they provide great examples of the kind of data that students are likely to encounter, and the academic literature has many examples of model results using these datasets which students can compare their work to. If you use any of these datasets in your research, please show your gratitude by citing the original paper (we’ve provided the appropriate citation link below for each), and if you use them as part of a commercial or educational project, consider adding a note of thanks and a link to the dataset.
An algorithm that was being tested as a recruitment tool by online giant Amazon was sexist and had to be scrapped, according to a Reuters report. The artificial intelligence system was trained on data submitted by applicants over a 10-year period, much of which came from men, it claimed.
Reuters was told by members of the team working on it that the system effectively taught itself that male candidates were preferable. Amazon has not responded to the claims.
Reuters spoke to five members of the team who developed the machine learning tool in 2014, none of whom wanted to be publicly named. They told Reuters that the system was intended to review job applications and give candidates a score ranging from one to five stars.
"They literally wanted it to be an engine where I'm going to give you 100 resumes, it will spit out the top five, and we'll hire those," said one of the engineers who spoke to Reuters.
In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.
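A description of this kind is typically published as JSON-LD embedded in the dataset's web page. The sketch below builds one such record in Python using the schema.org `Dataset` type and a handful of its properties (`name`, `description`, `creator`, `datePublished`, `license`). The helper function and every value passed to it are hypothetical placeholders, and the full set of recommended fields should be taken from the published guidelines rather than from this example.

```python
import json


def dataset_jsonld(name, description, creator, date_published, license_url):
    """Build a minimal schema.org Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": creator},
        "datePublished": date_published,
        "license": license_url,
    }


# All values below are placeholders for illustration.
record = dataset_jsonld(
    name="Example rainfall measurements",
    description="Daily rainfall totals from a hypothetical weather station.",
    creator="Example Weather Lab",
    date_published="2018-09-05",
    license_url="https://example.org/license",
)
print(json.dumps(record, indent=2))
```

The printed JSON would be placed in a `script` element of type `application/ld+json` on the page that hosts the dataset.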
When colleges try to understand their students, they resort to a common tool: the survey.
And surveys are fine, says Dayna Weintraub, director of student-affairs research and assessment at Rutgers University at New Brunswick. But she also recognizes their drawbacks: poor response rates, underrepresentation of particular demographic groups, and, in certain instances, answers that lack needed candor.
And so, to assess and change student conduct in a more effective way, Weintraub and her colleagues have tried a new approach: find existing, direct, and detailed data on how Rutgers students conduct themselves, and combine them.
Leading the effort was Kevin Pitt, director of student conduct at the New Jersey university. Working alongside Weintraub, he and his team analyzed, with granular specificity, the behavior patterns of students in a variety of contexts: consuming excessive alcohol or drugs, in questionable sexual situations, and others. Pitt and his team examined student-level trends within those areas, combining a variety of previously siloed databases to sketch a more-informative picture of student life at Rutgers.
It seems like every business is struggling with the concept of transformation. Large incumbents are trying to keep pace with digital upstarts, and even digital-native companies born as disruptors know that they need to transform. Take Uber: at only eight years old, it’s already upended the business model of taxis. Now it’s trying to move from a software platform to a robotics lab to build self-driving cars.
And while the number of initiatives that fall under the umbrella of “transformation” is so broad that it can seem meaningless, this breadth is actually one of the defining characteristics that differentiate transformation from ordinary change. A transformation is a whole portfolio of change initiatives that together form an integrated program.
And so a transformation is a system of systems, all made up of the most complex system of all — people. For this reason, organizational transformation is uniquely suited to the analysis, prediction, and experimental research approach of the people analytics field.
People analytics — defined as the use of data about human behavior, relationships and traits to make business decisions — helps to replace decision making based on anecdotal experience, hierarchy and risk avoidance with higher-quality decisions based on data analysis, prediction, and experimental research. In working with several dozen Fortune 500 companies with Microsoft’s Workplace Analytics division, we’ve observed companies using people analytics in three main ways to help understand and drive their transformation efforts.
Walk up a set of steep stairs next to a vegan Chinese restaurant in Palo Alto in Silicon Valley, and you will see the future of work, or at least one version of it. This is the local office of Humanyze, a firm that provides “people analytics”. It counts several Fortune 500 companies among its clients (though it will not say who they are). Its employees mill around an office full of sunlight and computers, as well as beacons that track their location and interactions. Everyone is wearing an ID badge the size of a credit card and the depth of a book of matches. It contains a microphone that picks up whether they are talking to one another; Bluetooth and infrared sensors to monitor where they are; and an accelerometer to record when they move.
“Every aspect of business is becoming more data-driven. There’s no reason the people side of business shouldn’t be the same,” says Ben Waber, Humanyze’s boss. The company’s staff are treated much the same way as its clients. Data from their employees’ badges are integrated with information from their e-mail and calendars to form a full picture of how they spend their time at work. Clients get to see only team-level statistics, but Humanyze’s employees can look at their own data, which include metrics such as time spent with people of the same sex, activity levels and the ratio of time spent speaking versus listening.
The University of Arizona is tracking freshman students’ ID card swipes to anticipate which students are more likely to drop out. University researchers hope to use the data to lower dropout rates. (Dropping out refers to those who have left higher education entirely and those who transfer to other colleges.)
The card data tells researchers how frequently a student has entered a residence hall, library, and the student recreation center, which includes a salon, convenience store, mail room, and movie theater. The cards are also used for buying vending machine snacks and more, putting the total number of locations near 700. There’s a sensor embedded in the CatCard student IDs, which are given to every student attending the university.
“By getting their digital traces, you can explore their patterns of movement, behavior and interactions, and that tells you a great deal about them,” Sudha Ram, a professor of management information systems who directs the initiative, said in a press release.
The most illuminating moment of the Eagles’ enchanted season was a Week 3 play ridiculed in Philadelphia but celebrated here by a small cadre of people who recognized its significance almost immediately.
What fueled the excitement among members of the EdjSports crew was not the outcome of the play — a 6-yard sack of Carson Wentz on fourth-and-8 that gifted the Giants good field position — but rather the call itself. Leading by 7-0 on the Giants’ 43-yard line a few minutes before halftime, the Eagles opted not to punt.
By keeping Philadelphia’s offense on the field in a situation almost always played safe in the risk-averse N.F.L., Coach Doug Pederson did not buck conventional wisdom so much as roll his eyes at it.
An intern at EdjSports, responding to a flurry of text messages from his colleagues about the play, ran the numbers at home. The Eagles, by going for it, improved their probability of winning by 0.5 percent. Defending his decision (again) at a news conference the next day, Pederson cited that exact statistic.
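The arithmetic behind a figure like this can be sketched as an expected-value comparison: weight the win probability after a successful conversion and after a failed one by the chance of converting, then subtract the win probability after a punt. The sketch below shows only that comparison. EdjSports' actual model is not described here, and every number in the example is made up for illustration.

```python
def win_probability_go(p_convert, wp_if_converted, wp_if_stopped):
    """Expected win probability when the offense stays on the field."""
    return p_convert * wp_if_converted + (1 - p_convert) * wp_if_stopped


def gain_from_going(p_convert, wp_if_converted, wp_if_stopped, wp_if_punt):
    """Change in win probability from going for it instead of punting.
    A positive value favors going for it."""
    return win_probability_go(p_convert, wp_if_converted, wp_if_stopped) - wp_if_punt


# Hypothetical inputs, not the Eagles' actual situation.
gain = gain_from_going(
    p_convert=0.5, wp_if_converted=0.8, wp_if_stopped=0.6, wp_if_punt=0.65
)
print(f"Going for it changes win probability by {gain:+.1%}")
```

The decision turns on all four inputs at once, which is why a play that fails on the field can still have been the better call.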