7 min read

Advice on analyzing data for recurring news products

After gathering and cleaning your data, it's finally time to analyze it, verify it, and create reporting hypotheses to test

In our previous posts, we covered service-oriented data gathering and walked through our process for creating an information map and systematically collecting data from those sources using hybrid data collection methodology.

After cleaning and structuring that data, it’s finally time to take a hard look at it, verify it, and use it to come up with reporting hypotheses that you can use to create news products.

This post covers verifying and analyzing the data, and describes how we ultimately created news products for our intended audience from this reporting project.

Jump: HOW to discover patterns | HOW to verify your data | HOW to create and test a reporting hypothesis | HOW to use qualitative data | HOW we created news products | WHAT we learned

Analyzing the data, coming up with reporting hypotheses, and verifying the data do not always happen in a linear order. You may have to verify some data before coming up with a hypothesis about it. Or you may spot a trend, form a hypothesis, and then find that verification leads to an unexpected result.

In addition, we must be aware of cognitive biases that can lead us to seek out and verify only the data that supports our hypotheses. Working in teams and having a verification process in place helps guard against this.

As you read this post, keep in mind that analysis, verification, and forming research hypotheses all go hand-in-hand.

Now that you have your clean data, you can finally get to work using it.

For initial insights, use simple data visualization tools to explore your data. We created simple charts and graphs that quickly showed us the major trends and led us to ask further questions.

💡
Tools you can use, no matter your resource constraints

You can use a simple Google sheet to track your sources and a verification checklist. Use the sheet to manually tag each piece of information with its source and a confidence score (e.g., 1-5). Use simple graphs and pivot tables to visualize your data.

If you have the resources, you can use more sophisticated tools like Power BI or Tableau for data visualization and SQL for complex queries. The goal is to automate as much of the process as possible so you can spend more time on the qualitative, human side of the verification process.
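The spreadsheet tagging described above can just as easily start as a few lines of code. As a minimal sketch (the fields, facts, and threshold here are invented for illustration, not from our project), you might tag each piece of information with its source and a 1-5 confidence score, then filter for what is solid enough to report:

```python
# Minimal sketch of source tracking with confidence scores (1-5).
# All facts, sources, and the threshold below are illustrative.
records = [
    {"fact": "Avg. warehouse wage is $450/mo", "source": "job platform A", "confidence": 4},
    {"fact": "Agency X charges placement fees", "source": "worker interview", "confidence": 3},
    {"fact": "District B jobs offer free meals", "source": "forum post", "confidence": 2},
]

# Only facts at or above a confidence threshold go into reporting drafts.
REPORTABLE = 3
reportable = [r for r in records if r["confidence"] >= REPORTABLE]

for r in reportable:
    print(f'{r["fact"]} (via {r["source"]})')
```

The same structure maps directly onto a Google Sheet with one column per field, so the workflow transfers between tools.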

Next, find the outliers in your data. Look for data points that don’t fit the initial insights and ask why or what that means. Our structured database allowed us to easily filter for the outliers and examine their sources.

In a project about low-wage jobs in a major city, for instance, this approach helped us identify scam listings from some recruitment agencies that advertised implausibly high salaries. This reinforced the role these players have in the job market and what workers have to watch out for.
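Flagging those outliers can be done with a simple rule of thumb. This sketch (the salaries and the 3x-median cutoff are invented for illustration) flags listings that advertise wages far above the median as candidates for manual review:

```python
import statistics

# Hypothetical monthly salaries (USD) scraped from listings; values are illustrative.
salaries = [430, 450, 460, 440, 455, 470, 445, 2500, 465, 3000]

median = statistics.median(salaries)

# Flag listings advertising more than 3x the median wage as possible scams
# worth checking against their source (e.g., the posting agency).
suspicious = [s for s in salaries if s > 3 * median]
print(f"median={median}, flagged={suspicious}")
```

The flagged listings are leads, not conclusions: each one still needs to be traced back to its source and verified.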

We used other tools to spot patterns. For example, we used word frequency analysis to see which job skills were most in demand and which regions had the highest number of job postings.
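Word frequency analysis needs nothing more than a counter. A minimal sketch, with invented posting snippets standing in for real scraped listings:

```python
from collections import Counter
import re

# Illustrative job-posting snippets; real input would be the scraped listings.
postings = [
    "Warehouse packer, forklift license preferred, night shifts",
    "Kitchen helper, food safety certificate, meals provided",
    "Warehouse loader, forklift experience required",
]

# Lowercase and split into words, then count occurrences across all postings.
tokens = []
for text in postings:
    tokens += re.findall(r"[a-z]+", text.lower())

freq = Counter(tokens)
print(freq.most_common(3))
```

In practice you would also strip stopwords and group synonyms ("forklift license" vs. "forklift experience"), but even this raw count surfaces which skills recur.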

💡
What our initial analysis showed

To produce our reporting hypotheses, we put together our audience research and our data set and brainstormed what kinds of analysis of the data would fill the needs of a low wage worker in a major city.

We found that we had enough data to track wages:
- Over time
- Across districts of the city
- Among job platforms
- According to required education and experience level

We also had data about job benefits, including accommodation, meals, and insurance. This let us track:
- Changes in benefit offerings over time
- The relationship between job location and benefits
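Each of these cross-cuts is a grouped aggregation. A minimal stdlib sketch (districts and wages are invented for illustration) that averages wages by district; the same pattern works for month, platform, or education level:

```python
from collections import defaultdict
import statistics

# Illustrative listings; real data would come from the cleaned dataset.
listings = [
    {"district": "North", "month": "2024-01", "wage": 430},
    {"district": "North", "month": "2024-02", "wage": 445},
    {"district": "South", "month": "2024-01", "wage": 470},
    {"district": "South", "month": "2024-02", "wage": 480},
]

# Group wages by district, then average each group.
by_district = defaultdict(list)
for job in listings:
    by_district[job["district"]].append(job["wage"])

averages = {d: statistics.mean(wages) for d, wages in by_district.items()}
print(averages)
```

This is exactly what a pivot table does in a spreadsheet, so teams without coders can get the same result there.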

Understanding what data we had and what it could show us led us to form several reporting hypotheses to pursue. We dug deeper into these using our data, testing and verifying each hypothesis, and arrived at one initial concept to take to the news product creation stage.

Verify data through triangulation for accuracy and to build audience trust

Especially when collecting data from different sources, you need to verify the information you have found. The goal is to ensure the accuracy and reliability of every fact you present.

When people know you have a methodical, transparent process for verifying your information, it builds trust. This trust is what allows your journalism to have impact, especially when you are reporting on sensitive or controversial topics.

💡
Recommended reading

The Verification Handbook is a great resource for anyone working with user-generated or crowdsourced content.

It provides a systematic guide for verifying digital media, and it is free to download.

We use the triangulation method, in which we cross-reference information with at least three independent sources. Triangulation is effective for reducing the risk of relying on a single, potentially unreliable source.

We used internal and external triangulation to verify our data:

  • Internal triangulation: Use the data you’ve already collected to verify a fact. If our reporting hypothesis is that non-monetary benefits mask low wages, we can look for other job listings that mention the same benefits and see if they have similar wage patterns. We can also cross-reference our scraped data with our manual data from interviews with workers. Do workers' accounts corroborate what we see in the job postings?
  • External triangulation: Seek out new, independent sources. This might mean conducting new interviews, consulting experts, or searching for academic research that discusses elements of your hypothesis. For our project, we sought out interviews with labor rights advocates and economists who study our intended audience.

Another best practice we use is to implement a formal verification checklist for every major claim. This ensures consistency and rigor. The checklist includes questions like: "Is this fact supported by at least three independent sources?" and "Are there any conflicting reports?"
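The core of that checklist is mechanical enough to sketch in code. This is a minimal illustration (the claim, sources, and data structure are invented), encoding the two checklist questions: at least three independent sources, and no conflicting reports:

```python
# Minimal sketch of a triangulation check. A claim passes only if it is
# supported by at least three independent sources and none of them conflict.
# The data structure and threshold are illustrative.
def verify(claim):
    independent = {s["source"] for s in claim["support"]}
    conflicts = [s for s in claim["support"] if s["conflicts"]]
    return len(independent) >= 3 and not conflicts

claim = {
    "text": "Jobs with free accommodation pay lower wages",
    "support": [
        {"source": "scraped listings", "conflicts": False},
        {"source": "worker interview", "conflicts": False},
        {"source": "labor economist",  "conflicts": False},
    ],
}
print(verify(claim))
```

Running every major claim through the same function (or the same checklist rows in a sheet) is what makes the process consistent rather than ad hoc.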

Use the data to form a testable reporting hypothesis

Based on initial data analysis and verification, you can begin to form research hypotheses. These hypotheses are not the story itself, but rather a testable claim that you will attempt to prove or disprove through further verification.

This is the time to bring to the forefront your audience research and what you know about your audience segments. What are your audience needs? What questions does your audience have? How can your data fill those gaps?

By forming simple, testable statements, you will bring focus to your reporting. Instead of reporting a list of facts, you now have a narrative to prove or disprove. This is where data-driven journalism becomes investigative journalism.

💡
From a pattern to a hypothesis

Our data showed a pattern that jobs offering accommodation and meals had lower wages. One hypothesis we tested was: “Higher wage jobs without accommodation or meals may lead to lower monthly income due to cost of living and transportation.”

To test it, we researched and compared the cost of living in districts near the job locations, to show how workers could best take advantage of a higher-wage job, for example by commuting from a cheaper district.
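The comparison behind that hypothesis is simple arithmetic. With invented figures for illustration (not our project's actual numbers), a higher headline wage can still net less once rent, food, and transport are subtracted:

```python
# Invented figures for illustration: compare take-home income for a
# lower-wage job with accommodation/meals vs. a higher-wage job without them.
def net_income(wage, rent=0, food=0, transport=0):
    return wage - rent - food - transport

with_benefits = net_income(wage=400)  # housing and meals included in the job
without_benefits = net_income(wage=550, rent=120, food=60, transport=40)

print(with_benefits, without_benefits)
```

Swapping in real cost-of-living figures per district turns this into the district-by-district comparison described above.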

By methodically analyzing the data, you can identify the most promising story leads and avoid wasting time on dead ends. It helps focus limited resources on the stories that are most likely to have an impact.

As you go through this process, keep in mind the principle of data integrity. Be transparent about the limitations of your data. For example, if your scraped data is from a specific time period, be clear that your findings may not represent an entire year or show a broader trend. Being honest about your methodology builds credibility.

It is easy to fall into the trap of only looking for information that confirms your initial hypothesis. A good verification process forces you to look for contradictory evidence and consider alternative explanations.

Combine your data with qualitative sources for a more complete picture

Numbers alone can't tell a complete story; you need to add the human element. Combine your quantitative analysis with the qualitative data from your interviews, forum discussions, and other manual sourcing.

Qualitative data adds nuance. The data might show that wages are trending lower; interviews with workers can explain why. A worker might tell you that they accept lower pay because the promise of free housing is a significant relief, even if it comes with trade-offs. This kind of detail shows the complex realities behind the numbers.

Qualitative data makes your reporting more human. Use quotes, anecdotes, and personal stories to illustrate your data-driven points. This makes your reporting more relatable and compelling to your audience. For example, a quote from a worker who was scammed by an agency helps to contextualize the data we collected on fraudulent listings.

Report the story through data

Machines need data, humans need stories. The "story" isn't the data itself; it's the human insight derived from the data, then contextualised and narrated as a compelling story.

We found an insight about an information asymmetry: a disconnect between what migrant workers were being told and the reality of the market. This is the story we wanted to tell.

We found this story through the data analysis. We created a map showing the distribution of jobs across districts and a chart comparing the average wages for specific job roles across districts.

In the data, we found a lack of correlation between education level and wages for certain job roles. This surprising lack of correlation was a crucial insight because it contradicted common assumptions and provided an actionable takeaway.

After the process of narrowing in on this reporting topic, we scripted our narrative for video presentation. Because the insights from our data were complex and might not resonate with a general audience if presented as a report, we created a series of short videos with fictional characters representing the personas we identified in our audience research.

What we learned

In all information environments, official information is fragmented, controlled, or partially inaccessible through mainstream channels; this is especially true in autocratic contexts. Meanwhile, traditional reporting that relies on official sources, press releases, or open dialogue with government officials is increasingly difficult or impossible.

Therefore, there's a growing opportunity for news products to provide concrete utility to those disadvantaged by information gaps.

Working remotely puts us in a more secure position to gather data, but we still must take digital security precautions to protect ourselves, our work, and others.

This data-driven reporting process, from start to finish, is not just about reporting on what's newsworthy, but about building systems that proactively address the information needs of your intended audience.

By combining research methodology with a commitment to data-based reporting, we are building a resilient, trustworthy, and effective model for addressing information needs, especially in distorted information environments.

In the reporting phase, we move from understanding our audience's needs to creating content that directly addresses them.

Follow our process through the Audience Research, Reporting, and other phases. If you haven’t already, sign up for our newsletter so you don’t miss out.

If you have feedback or questions, don’t hesitate to get in touch at hello@gazzetta.xyz.