How data gathering practices support service-oriented journalism in restricted contexts
When we talk about "reporting" at Gazzetta, we refer to the process of mapping, gathering, and analyzing information to provide context for people's lives that they would not otherwise have. Often, and especially in restricted information environments, we find this new information buried in data.
This Reporting phase of the Gazzetta process is distinct from our Product Ideation phase, in which we create a reported concept. Reporting is about finding the story before we create that product, and it follows our Audience Research phase.
Now that you understand your intended audience and have identified their information gaps, it's time to move from understanding what kinds of information people need to seeking, aggregating, and analyzing data from various sources into useful findings and conclusions that address those needs.
The goal of the data gathering work in the Reporting phase isn't just to produce a single story, but to build an infrastructure that will serve you long into the future by bringing you relevant data that can track trends over time.
By actively seeking out and structuring this information, you can fill voids in public knowledge and uncover information that will be useful to your intended audience on their own paths to improving their livelihoods.
Jump: WHY everyone should gather data | HOW data gathering supports service-oriented journalism | WHY restricted contexts have a greater need for better data approaches
The Gazzetta process can be adapted to any newsroom, regardless of size or resources, by tailoring the scope to fit what is possible. By following this process, you will build empirically grounded reporting that is directly aligned with your intended audience's most pressing information needs.
In this post, we outline why using various tools for gathering data is an efficient use of limited resources to uncover otherwise hidden insights that can fill audience information needs, particularly in restricted information environments and for exile media working remotely from intended audiences.
Data gathering is for everyone
The idea of "data-driven journalism" can sound intimidating, conjuring images of large newsrooms with dedicated data teams and expensive software. But the reality is that a data-informed approach to reporting is a powerful tool for newsrooms of all sizes, including small, resource-constrained ones.
Data gathering should not sit on a wish list for when we have more time, funds, know-how, or personnel; our resource constraints actually demand that we find better and more efficient ways of extracting the information we need from increasingly complex data environments.
The key is not the sophistication of your tools, but the mindset: finding and harnessing publicly available data to better understand and serve your community. As we describe in detail in a related post, this can include crowdsourcing the data from others, compiling it yourself from disparate sources, or using software and tools to scrape it for you.
This mindset also means letting data drive the reporting from the start, building on insights gained through audience research, rather than treating data solely as evidence to support pre-existing conclusions, even conclusions drawn from that same audience research.
Data gathering isn't just for large organizations with established data and research teams. At Gazzetta, we're a small, distributed team ourselves, and we often have only two to three researchers per project.
You can even start with one or two people organizing sources in simple databases, using Google Sheets, Notion, or other platforms.
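As a minimal sketch of that starting point: a plain CSV with a handful of illustrative fields is enough to begin, and the same columns port directly into Google Sheets or Notion later. The schema below is a suggestion, not a fixed standard.

```python
import csv
from datetime import date

# Illustrative fields for a lightweight source tracker; adjust to your beat
FIELDS = ["source_name", "url", "topic", "format", "last_checked", "notes"]

sources = [
    {"source_name": "Ministry of Education statistics page",  # hypothetical example
     "url": "https://example.gov/edu-stats",
     "topic": "school admissions",
     "format": "HTML tables",
     "last_checked": date.today().isoformat(),
     "notes": "updated quarterly; archive before each update"},
]

with open("sources.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(sources)
```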
Data unlocks service opportunities for journalism
Journalism was not always rooted in quantitative information. But modern data-informed journalism, as a framework, draws from a rich history of integrating data and science into reporting:
- From record to report: Early newspapers served as "records" of official information, such as market prices and government proceedings. A fundamental shift occurred in the 19th century as journalism moved toward "reporting" on current events, a transformation driven by technology and social change. Pioneering figures like Florence Nightingale showed how data visualization could enhance reporting and drive social reform.
- The social science turn: The Progressive Era in the late 19th and early 20th centuries saw the emergence of the social survey movement, where journalists and reformers used systematic fieldwork and data visualization to document social problems like poverty. This period demonstrated the power of blending data gathering, storytelling, and advocacy for social change.
- Precision journalism: In the 1960s, Philip Meyer introduced "precision journalism," an explicit attempt to incorporate rigorous social science methods into daily news work. This movement championed the use of sample surveys and statistical analysis to move journalism closer to science and instill a new standard of methodological rigor and objectivity.
- Data journalism goes mainstream: Data journalism has become a significant development in news production, reshaping how journalists gather, analyze, and present information. Tools like web scraping, open-source intelligence (OSINT), and crowdsourcing have made it possible to collect, process, and verify vast amounts of data to uncover hidden facts and systemic patterns.
ChicagoCrime.org, an independent project by journalist-programmer Adrian Holovaty, scraped the Chicago Police Department's crime incident reports and placed them on Google Maps, allowing the public to search crimes by type, location, and date. Although the crime data was already publicly available in the incident reports, the site turned it into an easy-to-access, neighborhood-friendly resource. ChicagoCrime.org won the Knight-Batten Award in 2005 and was described as "setting a new standard for interactive journalism."
- Our current moment: Since the early 2020s, and intensifying in 2025, news organizations have laid off reporters en masse as ad revenues have fallen. As the industry navigates this era on lower budgets and with smaller teams, at a time when access to accurate and timely information is arguably more important than ever in the face of disinformation, mainstream conspiracy theories, and huge social and political shifts, streamlined newsrooms can use new and emerging tools like AI to shape their reporting.
This evolution of data's use in journalism highlights how hidden facts can be turned into identifiable patterns. It also reveals an ongoing negotiation between telling individual stories and analyzing systemic issues and trends.
Data gathering has greater utility in restricted contexts
In restricted information environments, the need for journalists to gather data that drives reporting becomes even more urgent. In these contexts, official information is often fragmented, controlled by the state, or simply not readily available on mainstream channels. Meanwhile, traditional reporting—relying on official sources, press releases, or open dialogue with government officials—is often impossible.
In one of the most restrictive information environments in the world, citizens in Myanmar are subject to complete and partial internet shutdowns, heavy restrictions on internet access, state-sponsored misinformation, restrictions on local media, and a strong reliance on Facebook as an unmoderated internet ecosystem.
These problems are compounded by political instability after the 2021 military coup and ongoing conflict, low technological infrastructure, low educational levels and graduation rates, and about half the population living below the national poverty line.
Accurate information exchange into and out of the country is difficult, at a time when greater access to timely and relevant information could have an immense impact on people’s daily lives and decision points.
The problem isn’t simply the lack of information infrastructure, but that information itself is poor and fragmented. For example, after the March 2025 earthquake that killed thousands, an information flurry on Facebook contained false speculation, and finding accurate information or verifying reports in a timely manner was challenging.
The service opportunity in this context is to provide clarity, comprehensiveness, and accuracy. Snowballing Telegram chat groups, for example, would be a useful data gathering practice for small newsrooms; a rough sketch of that approach follows below. Large, general-purpose models, in contrast, would not be successful at this because their information is not centered on utility for a specific community.
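As a sketch only, here is what snowballing could look like with the open-source Telethon library: start from seed groups, harvest the t.me links mentioned in their messages, and repeat on the newly found groups. The API credentials and seed group are placeholders you would supply, and real use would need rate limiting, careful operational security, and respect for group members' privacy.

```python
import asyncio
import re

from telethon import TelegramClient

API_ID, API_HASH = 12345, "your-api-hash"  # placeholders: obtain from my.telegram.org
client = TelegramClient("snowball-session", API_ID, API_HASH)

# Loose pattern for public t.me group links mentioned inside messages
LINK_RE = re.compile(r"t\.me/[A-Za-z0-9_+/]+")

async def snowball(seed_chats, rounds=2, per_chat=500):
    """Collect t.me links mentioned in the seed groups' messages,
    then repeat on the newly discovered groups for a fixed number of rounds."""
    seen, frontier = set(seed_chats), list(seed_chats)
    for _ in range(rounds):
        discovered = []
        for chat in frontier:
            try:
                async for msg in client.iter_messages(chat, limit=per_chat):
                    for match in LINK_RE.finditer(msg.message or ""):
                        link = match.group(0)
                        if link not in seen:
                            seen.add(link)
                            discovered.append(link)
            except Exception:
                continue  # skip private, deleted, or rate-limited chats
        frontier = discovered
    return seen

async def main():
    async with client:  # logs in interactively on first run
        groups = await snowball(["some_public_group"])  # hypothetical seed group
        print("\n".join(sorted(groups)))

asyncio.run(main())
```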
This is where data gathering, particularly through web scraping, becomes an act of information discovery and verification.
By systematically collecting data from non-traditional sources like social media conversations, online forums, and public but un-indexed databases, we can build a more accurate picture of reality than what is presented through official channels.
We know from our work that even in a highly controlled environment, a significant amount of data is still publicly available; it's just messy, unstructured, and often hidden.
For example, in a country without accurate data on the cost and admission requirements of public versus private secondary education, you could go to schools’ websites individually to see if tuition and admission requirements are listed.
If school websites do not list this information publicly, check online forums where parents discuss these topics, and see if any figures are given. If you have access, interview parents with children in these schools.
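That first pass over school websites can be scripted. The sketch below assumes a hand-compiled list of schools; the URLs and the tuition-matching pattern are illustrative and would need tuning to local languages and page formats.

```python
import csv
import re

import requests
from bs4 import BeautifulSoup

# Hypothetical, hand-compiled list of schools and their admissions pages
SCHOOLS = {
    "Example Public High School": "https://school-a.example.org/admissions",
    "Example Private Academy": "https://school-b.example.org/fees",
}

# Loose pattern: a mention of tuition/fees followed closely by a number
TUITION_RE = re.compile(r"(tuition|fees?)\b[^.]{0,80}?\d[\d,.]*", re.IGNORECASE)

rows = []
for name, url in SCHOOLS.items():
    try:
        html = requests.get(url, timeout=20).text
    except requests.RequestException:
        rows.append({"school": name, "url": url, "snippet": "UNREACHABLE"})
        continue
    # Flatten the page to plain text, then search for tuition mentions
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    match = TUITION_RE.search(text)
    rows.append({"school": name, "url": url,
                 "snippet": match.group(0) if match else ""})

with open("tuition_scan.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["school", "url", "snippet"])
    writer.writeheader()
    writer.writerows(rows)
```

Rows with empty snippets then become the shortlist for forum searches and parent interviews.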
The data you find may be enough to draw conclusions about the value of academic experiences based on geography or ease of admission. Or you may uncover a hidden system of tuition subsidies, weighted admission factors, or other information that leads to compelling reporting.
In one of our projects at Gazzetta, information about job availability, wages, and benefits is not systematically available.
In the audience research phase, we identified a strong need for information on how to increase income and livelihoods, including finding work that was less physically demanding.
Therefore, we identified some job roles that met these needs and searched websites listing job ads that included salary and benefits. We also scoured social media for posts by workers in these roles discussing their wages.
After we scraped and cleaned this data, we had a clearer picture of what kinds of jobs matched our intended audience’s needs and preferences, including geographic details like where wages were higher and cost of living lower, for comparison.
From there, we identified one specific job that met these needs and had sufficient data to move forward with reporting.
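To illustrate the cleaning-and-comparison step in this example, here is a hypothetical sketch in pandas; the file name and columns stand in for whatever your own scraper produces.

```python
import pandas as pd

# Hypothetical output of the scraping step: one row per job ad
ads = pd.read_csv("job_ads.csv")
# assumed columns: role, region, monthly_wage, cost_of_living_index

# Adjust wages by a regional cost-of-living index so regions are comparable
ads["adjusted_wage"] = ads["monthly_wage"] / ads["cost_of_living_index"]

summary = (
    ads.groupby(["role", "region"])
       .agg(listings=("monthly_wage", "size"),
            median_wage=("monthly_wage", "median"),
            median_adjusted_wage=("adjusted_wage", "median"))
       .sort_values("median_adjusted_wage", ascending=False)
)
print(summary.head(10))  # roles and regions where pay goes furthest
```

Sorting on the cost-of-living-adjusted wage is what surfaces the geographic comparison described above.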
Your job is to act as a public curator, transforming noise into signal to provide valuable information and to demonstrate a commitment to serving the public interest in a way that builds trust and resilience in the face of censorship.
In fact, the need for curating data in restricted information contexts is especially pressing, since state actors work to conceal information, erase public discussions, and limit information dissemination.
While domestic journalists may face restrictions, threats, and worse in doing this kind of data gathering work, exile journalists can more safely pursue these "acts of journalism" from afar.
Some data holders may assert that scraping or aggregating their data is illegal, constituting a national security threat or an intellectual property violation.
But the lack of useful information in restricted contexts shows the need for "civic scraping": the public interest in the information likely outweighs any private right over it, and powerful actors with ample resources at their disposal do not need the additional legal protections afforded by property rights and similar assertions.
We were inspired by The New School J+D Lab’s acts of journalism framework (still unpublished; subscribe to their emails to be notified when their new framework is released) and adapted it to restricted environments to illustrate our role:
- Listening: Not only surfacing community needs, but also monitoring censored social media discussions to understand what topics are most relevant to the community.
- Documenting: Not only capturing civic information, but also archiving information that is likely to be disappeared, such as social media posts or online articles, to create a permanent record (see the archiving sketch after this list).
- Inquiring: Not only investigating on behalf of community, but also investigating systemic issues using data to uncover facts and patterns that are otherwise hidden from view.
- Sense-making: Not only creating an understanding from complexity, but also analyzing large, unstructured datasets to find meaningful connections and insights.
- Amplifying: Not only distributing relevant information, but also redistributing reliable information at scale to communities that lack it.
- Mobilizing: Not only organizing collective action, but also facilitating mutual aid networks.
- Enabling: Not only providing supporting infrastructure, but also countering and dodging sophisticated state information structures.
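For the documenting act above, one practical option is the Internet Archive's Save Page Now endpoint. This is a minimal sketch with simplified error handling; in practice you would also keep local copies, since the archive itself can be blocked or rate-limited.

```python
import requests

def archive_url(url: str) -> str | None:
    """Ask the Internet Archive's Save Page Now service to snapshot a page.
    On success the request redirects to the new snapshot, whose URL we return."""
    try:
        resp = requests.get(f"https://web.archive.org/save/{url}", timeout=120)
    except requests.RequestException:
        return None
    return resp.url if resp.ok else None

snapshot = archive_url("https://example.org/post-likely-to-disappear")  # hypothetical target
print(snapshot or "archiving failed; retry later or save a local copy")
```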
These adapted acts of journalism guide you to build a living database, or information map, that is resilient to disruption and responsive to the needs of your intended audience.
In the next posts of our Reporting phase, you will find guides for mapping the information environments of your intended audience, conducting data gathering using online tools as well as manual methods, and verifying and analyzing that data for product ideation and creation of news products.
Join us as we continue through the Audience Research, Reporting, and other phases. If you haven't already, sign up for our newsletter so you don't miss out.
If you have feedback or questions, don’t hesitate to get in touch at hello@gazzetta.xyz.