The Challenges of Artificial Intelligence in Investigative Journalism * Anna Bruno

The integration of artificial intelligence (AI) into journalistic investigations is opening new avenues for discovering and telling high-impact stories. However, like any powerful tool, it also presents significant challenges that require deep understanding and careful management. This article examines the main challenges of using AI in investigative journalism, focusing on model accuracy, bias and fairness, explainability and interpretability, and the resources and skills needed.

Practicality: resources, skills, and infrastructure

One of the first challenges investigative journalists face when approaching AI is practicality. Implementing technologies like machine learning requires a specific combination of skills, time, and resources, as well as the availability of high-quality and sufficiently large datasets to train models effectively.

Many news organizations have opted to outsource part or all of the AI project development process, or to establish partnerships with third parties. The New York Times, for example, used a third-party object detection platform for its investigation into bomb craters in Gaza, partly due to the enormous computing power required to process satellite images.

Information security, copyright, and data protection

The use of third-party tools like Google Pinpoint, Cloud Document AI, Gemini, or ChatGPT raises issues related to independence, power, and, in cases where editorial information is entered, information security, copyright, privacy, and data protection. In fact, many corporate guidelines explicitly prohibit the entry of confidential information, trade secrets, or personal data into these tools.

Accuracy: model training and human oversight

The quality and quantity of data used to train AI models have a direct impact on the quality of the results obtained, a principle summed up in the mantra “Garbage in, garbage out” (GIGO). This includes the classification of training data: in the Pulitzer Prize-winning “Missing in Chicago” investigation, a machine learning tool called Judy was used to classify the city’s police misconduct records, helping to identify 54 missing persons allegations in just four years. Crucially, however, the training data for the tool was created by 200 community volunteers, who manually labeled the records.

Achieving 100% accuracy in a model is rare. In fact, very high accuracy can be a sign of “overfitting”: a model that fits its training data too closely and thus performs poorly when tested on new data. An “overfitted” model is often one that has become too complex, perhaps because it has been over-trained and/or because irrelevant “noise” in the data is shaping the algorithm.

On the other hand, an “underfitted” model performs poorly on both training and test data, often because the model is too simple, based on too little data, and/or insufficiently trained. A successful algorithm will be neither one nor the other.
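The difference between the two failure modes can be seen by comparing accuracy on training data with accuracy on unseen test data. The following is a minimal Python sketch under stated assumptions: the toy dataset is invented, and three hand-written rules (underfit, overfit, balanced) stand in for trained models. The "overfit" rule is a 1-nearest-neighbour lookup, which memorizes every training point, noise included.

```python
# Hypothetical toy data: (number of prior complaints, 1 = flagged by a human reviewer).
# Two training labels are deliberately "noisy" (x=3 and x=8 go against the trend).
TRAIN = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
TEST = [(1.2, 0), (2.8, 0), (4.3, 0), (6.4, 1), (8.2, 1), (9.4, 1)]


def underfit(x):
    """Too simple: ignores the input and always predicts 0."""
    return 0


def overfit(x):
    """Memorizes the training set: copies the label of the nearest
    training point (1-nearest-neighbour), noisy labels included."""
    _, nearest_label = min(TRAIN, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def balanced(x):
    """A single threshold that follows the overall trend (hand-picked here)."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Share of examples where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)


for model in (underfit, overfit, balanced):
    print(f"{model.__name__:>8}: "
          f"train={accuracy(model, TRAIN):.2f} "
          f"test={accuracy(model, TEST):.2f}")
```

In this sketch the memorizing rule scores perfectly on the training data but worse than the simple threshold on the test data, while the constant rule does poorly on both. That is the pattern to look for when a reported accuracy figure seems too good.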

Feature engineering and human oversight

Feature engineering also needs careful handling, because it shapes the accuracy of any machine learning model: this involves selecting or extracting the aspects of the data that the model will use. Some features may need to be extracted from existing data, such as converting or splitting text data into categorical data, or using two figures from the data to calculate a new third measure. Specialized domain knowledge can be crucial for choosing the most relevant features.
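As an illustration of the two kinds of feature extraction just described, here is a minimal Python sketch. The records, field names, and keyword list are hypothetical and chosen only for the example; a real project would rely on subject expertise to decide which categories and ratios matter.

```python
# Hypothetical records; field names and keywords are illustrative assumptions.
RECORDS = [
    {"district": "A", "summary": "Officer failed to file missing person report",
     "complaints": 12, "officers": 48},
    {"district": "B", "summary": "Excessive force during arrest",
     "complaints": 9, "officers": 90},
    {"district": "C", "summary": "Rude language at traffic stop",
     "complaints": 3, "officers": 60},
]

# Keyword -> category mapping, checked in order (first match wins).
KEYWORD_CATEGORIES = [
    ("missing person", "missing_person"),
    ("force", "use_of_force"),
]


def categorize(summary):
    """Turn free text into a categorical feature via simple keyword matching."""
    text = summary.lower()
    for keyword, category in KEYWORD_CATEGORIES:
        if keyword in text:
            return category
    return "other"


def engineer_features(record):
    """Derive model-ready features from a raw record."""
    return {
        "district": record["district"],
        # Text data converted into a category.
        "category": categorize(record["summary"]),
        # Two existing figures combined into a new third measure.
        "complaints_per_100_officers": 100 * record["complaints"] / record["officers"],
    }


FEATURES = [engineer_features(r) for r in RECORDS]
for row in FEATURES:
    print(row)
```

The rate per 100 officers makes districts of different sizes comparable, which the raw complaint count alone does not.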

As a result, human oversight is fundamental for accurate AI use, and is a recurring theme both in newsroom guidelines for AI use and in concerns expressed by journalists themselves.

Interpretability and explainability

The opacity involved in automated decisions can also present problems for explaining or even understanding the results of AI models. These two qualities, explainability and interpretability, are separate: a model could be explainable (it is possible to explain what it does and why it arrives at a particular output) but not interpretable (it is not clear how it does it).

Interpretability and explainability can determine the choice of technology: many investigations opt for a “decision tree” or “random forest” algorithm over more powerful approaches like “neural networks” or “deep learning” because of interpretability (they also require less data).

One investigation by ProPublica on political emails, for example, used a decision tree-based algorithm “because they produce a human-readable tree of data partitions,” and an article in the Financial Times that analyzed how individual voter characteristics correlated with voting behavior used this approach as a basis to visualize a series of branching models, with accuracies ranging from 56 to 72 percent.
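To show what a “human-readable tree of data partitions” can look like, the following minimal Python sketch builds a tiny decision tree and prints it as nested if/else rules. It is not the method used in either investigation: the emails, the feature names (asks_for_money, mentions_deadline), and the labels are invented, and Gini impurity is assumed as the splitting criterion because it is a common choice.

```python
from collections import Counter

# Hypothetical labeled emails: (boolean features, label).
EMAILS = [
    ({"asks_for_money": True, "mentions_deadline": True}, "fundraising"),
    ({"asks_for_money": True, "mentions_deadline": False}, "fundraising"),
    ({"asks_for_money": True, "mentions_deadline": True}, "fundraising"),
    ({"asks_for_money": False, "mentions_deadline": True}, "event"),
    ({"asks_for_money": False, "mentions_deadline": True}, "event"),
    ({"asks_for_money": False, "mentions_deadline": False}, "newsletter"),
    ({"asks_for_money": False, "mentions_deadline": False}, "newsletter"),
]


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, features):
    """Return the feature whose yes/no split lowers impurity most, or None."""
    best, best_score = None, gini([label for _, label in rows])
    for feature in features:
        yes = [label for x, label in rows if x[feature]]
        no = [label for x, label in rows if not x[feature]]
        if not yes or not no:
            continue  # this feature does not separate the rows at all
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if score < best_score:
            best, best_score = feature, score
    return best


def build_tree(rows, features, depth=0, max_depth=2):
    """Return a leaf (label string) or a (feature, yes_branch, no_branch) tuple."""
    feature = best_split(rows, features) if depth < max_depth else None
    if feature is None:
        return Counter(label for _, label in rows).most_common(1)[0][0]
    rest = [f for f in features if f != feature]
    yes = [row for row in rows if row[0][feature]]
    no = [row for row in rows if not row[0][feature]]
    return (feature,
            build_tree(yes, rest, depth + 1, max_depth),
            build_tree(no, rest, depth + 1, max_depth))


def render(tree, indent=""):
    """Render the tree as indented if/else lines a reader can follow."""
    if isinstance(tree, str):
        return [f"{indent}-> {tree}"]
    feature, yes, no = tree
    return ([f"{indent}if {feature}:"] + render(yes, indent + "    ")
            + [f"{indent}else:"] + render(no, indent + "    "))


def predict(tree, x):
    """Follow the branches until a leaf label is reached."""
    while not isinstance(tree, str):
        feature, yes, no = tree
        tree = yes if x[feature] else no
    return tree


TREE = build_tree(EMAILS, ["asks_for_money", "mentions_deadline"])
print("\n".join(render(TREE)))
```

Every prediction from such a model can be traced to a short chain of yes/no questions, which is the property that makes tree-based approaches attractive when the result has to be explained to editors and readers.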

Diversity, impartiality, and fairness

AI’s tendency toward bias and lack of diversity is often at the center of reporting on algorithmic accountability, but it also poses a challenge for reporters who use the technology themselves. The categories of bias identified include:

Biased labeling
Biased features (also known as “curation bias”)
A biased target
Homogenization bias (where the output of one model is used for future models)
Active bias (where data is fabricated, such as fake news)
Unexpected machine decisions (where lack of context leads to “unsustainable responses”)

Bias is also a consideration when collecting data – minorities are typically underrepresented in datasets, leading to selection bias and lower accuracy for those groups – and in testing (if a model is not tested with diverse inputs or monitored for bias).

Large language models, for example, perform much worse with non-English languages and non-Western contexts, since these make up a much smaller portion of both training and test data. The same applies to material written by women.
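One simple way to monitor a model for the group-level gaps described above is to report accuracy separately for each group rather than as a single overall figure. The sketch below assumes a hypothetical set of evaluation results tagged by language; the group codes, labels, and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical evaluation results: (group, true label, predicted label).
RESULTS = [
    ("en", "relevant", "relevant"),
    ("en", "irrelevant", "irrelevant"),
    ("en", "relevant", "relevant"),
    ("en", "irrelevant", "irrelevant"),
    ("en", "relevant", "irrelevant"),
    ("sw", "relevant", "irrelevant"),
    ("sw", "irrelevant", "irrelevant"),
    ("sw", "relevant", "irrelevant"),
    ("sw", "relevant", "relevant"),
]


def accuracy_by_group(results):
    """Return ({group: accuracy}, {group: number of examples})."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in results:
        total[group] += 1
        correct[group] += int(truth == predicted)
    accuracy = {group: correct[group] / total[group] for group in total}
    return accuracy, dict(total)


ACCURACY, COUNTS = accuracy_by_group(RESULTS)
OVERALL = sum(truth == pred for _, truth, pred in RESULTS) / len(RESULTS)

print(f"overall: {OVERALL:.2f}")
for group in ACCURACY:
    # Always show the sample size: small groups give unstable estimates.
    print(f"{group}: accuracy={ACCURACY[group]:.2f} (n={COUNTS[group]})")
```

An overall score can hide a large gap between groups, and an underrepresented group will also have fewer test examples, so its accuracy estimate is less reliable. Both numbers are worth reporting.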

Conclusions

What stands out from this exploration of AI in investigative journalism is both the wide range of technologies used and the ways in which they have been employed. And this does not include the vast array of ways in which investigative journalists are using generative AI tools in particular for more routine tasks, such as idea generation and research, planning, editorial feedback, publishing, and distribution.

While there are a number of challenges for news organizations using AI in investigations, from accuracy and fairness to resources and explainability, one area that requires further research is the more subtle impact on workflow – and the new work created alongside the efficiencies: the work of finding specific AI tools and learning to use them; breaking down tasks into AI-suitable stages or preparing material for AI tools; effective prompt writing; editing and checking results.

This broad range of applications and contexts suggests that the idea of “artificial intelligence-assisted journalism” will eventually come to be seen as too vague a term to be useful, just as “computer assisted reporting” was seen as outdated and redundant at the start of this century. As literacy in this field grows, industry research and discussion may revolve around more specific terms and fields: “machine learning-based journalism,” for example, or “investigating with NLP,” or “custom GPTs in visual investigations.” We should expect a deeper and more critical understanding of artificial intelligence in general, as the power of AI comes under closer scrutiny in all aspects of our lives. Journalists will play a central role in that process.