The milestones of data journalism

Sound Graph 3, Sarah Morris (2017).

In an increasingly technological world challenged by the spread of misinformation, calls for robust, authenticated information are more pronounced than ever. In 2010, Sir Tim Berners-Lee, father of the World Wide Web, declared that “data-driven journalism is the future”. Fast-forward to 2019, and data-driven journalism has become the norm. Loosely defined as a “workflow” of analyzing, filtering, and visualizing data into a story, data-driven journalism is about telling legitimate stories based on ethically sourced, reliable data.

As newsrooms grow more accustomed to the practice and gain access to the tools and techniques needed to analyze data, it is worth noting that the concept of authenticated information is not new. Starting with the first time data was published in a newspaper, this blog post traces the hidden history of data journalism and commemorates the milestones reached by some of the key figures in its development.

Data enters the newsroom

While health scientists and statisticians were the first to use data-driven methods, it was in 1821 that data first appeared in a newspaper. On 5 May 1821, a four-page table listing the schools in Manchester and Salford was published in the first issue of The Guardian (then named ‘The Manchester Guardian’), detailing how many students were enrolled in each school and their average annual spending.

While it may look simplistic today, the table reportedly caused a sensation at the time. By revealing how many students received free education, and how many disadvantaged children were living in these cities, this data was the first of its kind to illustrate urban inequalities. The data was leaked by a source known only as ‘NH’, whose motivation for sharing it with the Guardian was straightforward:

“At all times such information it contains is valuable; because without knowing … the best opinions which can be formed of the condition and future progress of society must be necessarily incorrect.”

– NH, The Guardian, 1821

Data similarly appeared in U.S. news in an 1849 issue of the New York Tribune. In 1832, 3,515 of New York’s then 250,000 residents had died in a cholera outbreak. The second epidemic, in the summer of 1849, was brought by a ship from Europe after infected passengers fled a dockside quarantine. In response, the New York Tribune published a line chart tracking the death toll from cholera in New York City. In the 1840s, graphs in newspapers were unheard of to many readers, so to help them understand the figure, the Tribune added a 300-word annotation explaining how to read a line graph, from the axis labels to the slope of the curve and its coordinates.

Although the use of data to understand our environment dates back to pioneers such as William Playfair and Florence Nightingale, its arrival in the newsroom should not be understated. During the 19th century, newspapers were printed by hand, meaning that each image had to be hand-crafted onto a stencil. Since newspapers were printed daily, reproducing such figures not only demanded a high level of expertise but also had to be done against a tight deadline. Moreover, figures like these had previously appeared only in textbooks that came with a hefty price tag. Embedding data in news articles marked a step towards the democratization of data, where information was no longer restricted to the privileged.

Co-opting computers into journalism

Not published in LIFE. Detroit, July 1967. A Detroit police officer stands guard over a grocery store looted during race riots.
Caption from LIFE. “Determined to protect their own property at any cost, both Negro and white store owners brought out weapons and stood ready to use them.”

Gradually, data-driven methods were applied more frequently, and newspapers grew more experimental in how they distributed information. During the Long, Hot Summer of 1967, racial tensions and civil rights movements peaked in the United States. Nearly 160 riots took place nationwide, of which the Detroit Riots were among the most violent: 43 people were reported dead and over 7,000 arrested. In a piece published by Nieman Lab, Philip Meyer of the Detroit Free Press recalled that many reporters covering the incident could not understand who the rioters were or what motivated them. Assumptions about the rioters ranged from their being part of a low-class, uneducated population to being rural migrants from the South who had failed to integrate. To get to the heart of the issue, Meyer launched a project to identify the rioters using social scientific methods and “computer-assisted reporting” (CAR) – the use of computers to collect and analyze data for news articles.

In collaboration with the Detroit Urban League and the University of Michigan’s Institute for Social Research, Meyer’s team surveyed Black residents in the riot areas and used a computer to analyze the responses. Contrary to popular theories, the research found no correlation between economic status and participation in the riots: college graduates were just as likely to have taken part as high school dropouts. And rather than migrants from the South, residents raised in the North were three times as likely to have rioted. As for motivation, the survey found that the protestors’ main grievances were police brutality, overcrowded living conditions, poor housing, and unemployment.
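Meyer’s core method, cross-tabulating survey responses to test a popular assumption, is easy to sketch today. A minimal illustration in Python; the records and rates below are invented for the example, not the 1967 Detroit survey data:

```python
from collections import defaultdict

# Hypothetical survey records: (education level, participated in riot?).
# These rows are illustrative only, not Meyer's actual data.
responses = [
    ("college graduate", True), ("college graduate", False),
    ("college graduate", True), ("high school dropout", True),
    ("high school dropout", False), ("high school dropout", True),
]

def participation_rate_by_group(records):
    """Cross-tabulate participation by education level."""
    totals = defaultdict(int)
    participants = defaultdict(int)
    for group, participated in records:
        totals[group] += 1
        if participated:
            participants[group] += 1
    return {g: participants[g] / totals[g] for g in totals}

rates = participation_rate_by_group(responses)
# Similar rates across groups would undercut the "uneducated rioter" theory.
print(rates)
```

A real analysis would of course weight the sample and test significance, but the cross-tab is the heart of the technique.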

Source: Special survey report published by the Detroit Free Press on August 20, 1967 (University of Michigan, Institute of Social Research).

In parallel, the Missouri School of Journalism, together with Investigative Reporters and Editors (IRE), hosted seminars in the U.S. and worldwide under what is known today as the National Institute for Computer-Assisted Reporting (NICAR). Gradually, communities for investigative journalism such as the Global Investigative Journalism Network (GIJN) solidified, making the adoption of data more widespread on a global scale.

Computer-assisted reporting flourished over the following decades. Meyer’s project not only highlighted the importance of data in journalism but also exposed the racialized system embedded in Detroit’s white-dominated institutions; it earned a Pulitzer Prize and led to Meyer’s book, “Precision Journalism”, which transformed journalistic practice by encouraging newsrooms to adopt social scientific methodologies and data collection. In 1989, the Atlanta Journal-Constitution received a Pulitzer Prize for “The Color of Money”, its series on racial disparities in home-loan practices, earning worldwide recognition for this method of journalism.

Tracing the roots of data journalism over the past two centuries, these milestones show that one thing remains unchanged: the motivation for applying data methods to journalism is always the attempt to depict a true, fact-based image of our societies. In aggregate, these breakthroughs cultivated the prime conditions for data journalism in the 21st century, where rapid technological advancement and the rise of open data repositories refined the practice into the “data journalism” we are familiar with today. (Source: OpenNews)

Pioneering data journalism in the 21st century

Data journalism hit another milestone in 2005 with the success of U.S. developer Adrian Holovaty, who worked at the Washington Post and started the hyperlocal community news platform EveryBlock (now known as Nextdoor). Holovaty fused crime data from the Chicago Police Department with Google Maps. His platform breaks crime reports down to every block in the city by type of crime, ZIP code, address, date, and even arbitrary route. Winning a Batten Award, the project was described as a public service and a tool for investigative journalists and residents alike, showing how data can both tell stories and help people understand the world around them.
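The mechanics behind such a mashup are straightforward to sketch: group geocoded incident records by block and by crime type, so they can be queried or plotted on a map. A minimal illustration in Python; the incident records and addresses are hypothetical, not the Chicago Police Department’s actual feed:

```python
from collections import Counter

# Hypothetical incident records: (block, crime type) -- illustrative only.
incidents = [
    ("1200 W MADISON ST", "theft"),
    ("1200 W MADISON ST", "burglary"),
    ("3400 N CLARK ST", "theft"),
    ("1200 W MADISON ST", "theft"),
]

# Counts per block, and per (block, type), as a block-level crime page needs.
by_block = Counter(block for block, _ in incidents)
by_block_and_type = Counter(incidents)

print(by_block["1200 W MADISON ST"])                       # 3 incidents
print(by_block_and_type[("1200 W MADISON ST", "theft")])   # 2 thefts
```

The real service layered these aggregates onto Google Maps; the grouping step shown here is what turns a raw incident log into a browsable, block-by-block view.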

In 2009, The Guardian became the first newspaper in the world to establish a section dedicated to data – the Datablog. Created and edited by Simon Rogers, the platform encourages users to analyze and visualize its raw datasets, popularizing the concept of data-driven journalism in contemporary newsrooms. Rogers is also recognized for his coverage of the WikiLeaks Afghanistan and Iraq war logs, and for crowdsourcing the analysis of 450,000 MP expenses records. Actively writing about data on his personal blog, Rogers has become a beacon for data journalism and is also the director of the Data Journalism Awards. Organized by the Global Editors Network with support from the Google News Lab and the John S. and James L. Knight Foundation, and in partnership with Chartbeat and Microsoft, the Data Journalism Awards were the first international awards recognizing innovative data journalism worldwide.

Simon Rogers

By 2011, the European Journalism Centre had published the Data Journalism Handbook and launched its data-driven journalism centre, which organizes workshops throughout Europe. In parallel, The Guardian published ‘Reading the Riots’, a series in which its investigative team applied the CAR techniques of Philip Meyer, and seminars on data journalism were held worldwide, from Wits University in South Africa to similar institutions in Asia and Australia. The result is a growing sophistication of data analysis in modern newsrooms, powered by communities and by tools such as open government data repositories that help journalists gather data and produce engaging news articles.

In today’s newsrooms, integrating data into journalistic pieces has elevated the ways in which articles tell compelling stories. The development of dedicated data sections, ranging from the Guardian’s Datablog to Der Spiegel, La Nación, Le Monde’s Les Décodeurs, and Nate Silver’s FiveThirtyEight, shows not only how integral data journalism has become, but also that newspapers should not be intimidated by finding new ways to incorporate large datasets into their articles. As the New York Tribune proved in 1849, readers will embrace data so long as they receive proper guidance to understand it. By visualizing data in a way that is easy to understand, these articles form the backbone of fact-based discourse, communicate complex subjects to a broader audience, and support investigations that depict reality in groundbreaking ways.

From social science methods and statistical analysis to today’s spreadsheets, data mapping, web scraping, and crowdsourcing, the increased transparency of open data has had a domino effect that helps us better understand our societies. Whether it is called “data journalism”, “data-driven journalism”, “precision journalism”, or “computer-assisted reporting”, it is clear that journalists are finding new, innovative ways to tell data-driven stories. As the Data Journalism Handbook notes, data journalism “went from being the province of a few loners to an established part of every newsroom.” At a time when the concept of “truth” is challenged daily, data journalism has never been so vital. Instead of swaying us with emotion, data-driven content provides the context and detail that help us make sense of reality. Perhaps most importantly, it reminds us of the essence of the craft: far beyond the numbers and code, as Simon Rogers himself put it, data journalism is about telling a great story in the best way possible.
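Of those tools, web scraping is the easiest to demonstrate: extracting structured records from an HTML page so they can be analyzed like any other dataset. A minimal sketch using Python’s standard-library `html.parser`; the HTML snippet is invented for illustration:

```python
from html.parser import HTMLParser

# A tiny invented HTML table, standing in for a page a journalist might scrape.
HTML = """
<table>
  <tr><td>Manchester</td><td>24</td></tr>
  <tr><td>Salford</td><td>11</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []      # start a fresh row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:       # keep only text inside <td> cells
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)
print(scraper.rows)  # [['Manchester', '24'], ['Salford', '11']]
```

In practice a journalist would fetch the page over HTTP and likely reach for a dedicated parsing library, but the principle is the same: turn markup meant for reading into rows and columns meant for analysis.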
