The Lenfest Local Lab, The Brown Institute, and The Philadelphia Inquirer are building open-source content audit tools for local newsrooms

Our collaborative project won the Google GNI Innovation Challenge, securing $300,000 to build and test machine learning-based tools that help newsrooms analyze equity and representation in their work at scale.

We are pleased to announce new support from Google’s GNI Innovation Challenge to expand our partnership with The Brown Institute and The Philadelphia Inquirer and create open-source content audit and analysis tools for local newsrooms. The tools are aimed at making content audits faster and less expensive, which could speed the advancement of diversity, equity and inclusion (DEI) in local coverage. Our cross-disciplinary partnership between a lab, a local newsroom and a university will draw on each organization’s strengths (product and UX innovation, journalism, and data and computation) to address shared questions about the relationship between equity, quality, geography and service in local news.

What is a content audit?

Broadly speaking, a content audit is the process of systematically analyzing and assessing all of the content you have created. The goal is typically to reveal strengths and weaknesses in your content and discover opportunities to improve. Today many newsrooms are conducting content audits to assess the equity and representation of past news coverage in order to inform future work, and to gain insights that allow them to better serve, reflect and include communities.

Why are newsrooms undertaking content audits?

Journalism seeks to serve communities through fair reporting. To offer fairer reporting in the future, local news organizations need to have a firm grasp on a number of things about the communities in their coverage area, including the community’s needs, concerns and makeup, as well as a clear-eyed view of the communities they’ve excluded, misrepresented or underreported in the past. Content audits can start to give organizations context for which communities have been covered and how community life has been characterized in the news. This context can be the jumping off point for a practice of community listening paired with needs assessments, which can provide deeper and meaningful insights into the effect coverage may have had on residents over time.

Why do newsrooms need better tools for auditing content?

Traditional newsroom content audits have been done manually, meaning they can be costly, time-consuming, and potentially dated from the moment results are available. If an audit is not done internally, organizations can hire outside groups that assemble researchers to annotate content, which can include manually highlighting everything from the description of a community to the placement of a photo in a story.

The open-source tools we will develop use machine learning (ML) and natural language processing (NLP) technologies to help newsrooms automate the parts of the process that can be automated, allowing them to survey a far broader set of sources, content and practices. This will free up time for the more difficult task of analysis, including assessments of the culture, workflow and decision-making that result in published coverage.

Why speed and flexibility are key to this new wave of content audits

By using these tools to identify issues of equity and representation in coverage more efficiently, news organizations will be able to start developing strategies sooner to address the problems that are revealed.

It’s also important that any approaches to improving and operationalizing diversity, equity and inclusion practices be dynamic, able to continually adapt to changes in language and topics of concern among newsrooms and audiences. Our project will be developed with this flexibility in mind, allowing news organizations’ practices to grow alongside their communities.

Background on how our cross-disciplinary partnership started

Over the past year, the Lenfest Institute and the Brown Institute have collaboratively developed an automated approach to identifying and mapping locations found in news stories using a mix of natural language processing (NLP), deep learning and geolocation techniques. On top of that technology we built a proof-of-concept analysis tool for helping newsrooms audit their content and better understand which local communities are reported on and how. The tool provides insights into the geographic equity of coverage and the knowledge to pursue opportunities to fill gaps, fix problems and serve audiences with new products.

We have also partnered with The Philadelphia Inquirer to apply this location analysis prototype in two ways. First, we used the underlying location identification model to build and test a new product: a page that organizes COVID-19 coverage by the counties mentioned in the stories, allowing readers to quickly find the coronavirus-related stories about the places that matter to them.
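To make the idea of organizing stories by the counties they mention more concrete, here is a minimal Python sketch. It is not the model described above: it swaps in a simple lookup table (a hypothetical `GAZETTEER`) for the NLP, deep learning and geolocation steps, and the story IDs, story text and function names are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer mapping place names to counties. A real system
# would rely on a trained model and a geocoder; this table is a stand-in.
GAZETTEER = {
    "Camden": "Camden County",
    "Cherry Hill": "Camden County",
    "Doylestown": "Bucks County",
    "South Philly": "Philadelphia County",
}

# Try longer names first so multi-word places are matched as a whole.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def counties_mentioned(text):
    """Return the set of counties whose known place names appear in text."""
    return {GAZETTEER[name] for name in _PATTERN.findall(text)}


def group_by_county(stories):
    """Map each county to the IDs of the stories that mention it."""
    index = defaultdict(list)
    for story_id, text in stories.items():
        for county in sorted(counties_mentioned(text)):
            index[county].append(story_id)
    return dict(index)


# Invented example stories, for illustration only.
stories = {
    "s1": "A new clinic opened in Cherry Hill this week.",
    "s2": "Officials in Doylestown and Camden announced testing sites.",
    "s3": "This story names no places in the gazetteer.",
}
index = group_by_county(stories)
```

A lookup table like this misses ambiguous or unlisted place names, which is one reason a model-based approach is attractive; the sketch only shows the shape of the output, a county-to-stories index that a reader-facing page could be built on.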

We’ve also worked with two Inquirer news desks to conduct early content audits, assessing the geographic representation of real estate and visuals coverage in 2020. More details on those initial audits are included later in this post.

What we plan to do with support from Google

Move beyond location analysis. With this new support from the Google GNI Innovation Challenge, we will begin by fine-tuning our location analysis tool and exploring how to support its use in newsrooms of all sizes and localities, from bigger cities and regions to smaller towns. We will also move beyond location analysis, looking at many more facets of content auditing, including the diversity of sources, image analysis, and which images and stories are published depending on the topic, author, or location of a story.

With this data-informed approach, newsrooms that use these auditing tools can begin to understand how communities are reflected in their coverage by examining how editorial decisions manifest themselves in language, location, sources and visuals in stories. The tools will help newsrooms assess fairness by uncovering gaps in coverage, be it in a town or neighborhood, or with a specific community, related to gender, race, ethnicity or socio-economic background. They will highlight any topical or other coverage disparities measured relative to population, income, geographic distribution and other demographic benchmarks we develop in collaboration with newsrooms and researchers with expertise in equity and representation. All of these insights should point to opportunities for the newsroom and the business to address.
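One simple way to express a coverage disparity "relative to population" is to compare each area's share of stories with its share of residents. The Python sketch below assumes that definition; the benchmarks the project will actually use are still to be developed, and the function name, area labels and numbers here are hypothetical.

```python
from collections import Counter


def coverage_ratios(story_areas, population):
    """Return coverage share divided by population share for each area.

    A ratio near 1.0 means an area's share of stories roughly matches its
    share of residents; below 1.0 suggests under-coverage by this measure.
    Stories tagged with areas missing from `population` are ignored.
    """
    counts = Counter(area for area in story_areas if area in population)
    total_stories = sum(counts.values())
    total_population = sum(population.values())
    if total_stories == 0 or total_population == 0:
        raise ValueError("need at least one story and a nonzero population")

    ratios = {}
    for area, residents in population.items():
        coverage_share = counts[area] / total_stories
        population_share = residents / total_population
        ratios[area] = coverage_share / population_share
    return ratios


# Invented figures, for illustration only.
population = {"A": 100, "B": 100, "C": 200}
story_areas = ["A"] * 6 + ["B"] * 2  # no stories at all about area C
ratios = coverage_ratios(story_areas, population)
```

In this made-up example, area A receives three times the coverage its population share would suggest, area B is in proportion, and area C is not covered at all. A single ratio like this is only a starting point for the human analysis described in this post, not a verdict on fairness.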

Keep humans in the loop. We will also use a human-in-the-loop approach: researchers and, we hope, people from every department of a news organization, not just editors and reporters in the newsroom, will be involved in training, tuning and testing the algorithms, and in interpreting and applying the results.

While implementations of machine learning (ML) in newsrooms have been developed for optimizing subscriptions and automatically generating news stories, little has been done to automate tasks that support DEI efforts. Our project will develop an ML-based analysis suite that helps newsrooms reduce the time it takes to perform audits, identify problems faster and expand scope.

Support the transition from one-off audits to continuous accountability. The primary outcome we hope to achieve is a transition from one-off equity audits to automated computational processes that assist newsrooms in shaping inclusive coverage and products to engage readers. We hope that a secondary outcome is a re-imagination of news products that build on this effort of more inclusive and representative news coverage. We imagine the launches of new products that are direct responses to the insights and data provided by the tools, and we also imagine opportunities to help reporters identify new and better story opportunities, reader-facing products, and business opportunities.

Our work in Philadelphia

Philadelphia is one of many cities in the US where large-scale inequities exist, and where local news coverage would benefit from this type of analysis. According to the Philadelphia City Council’s 2020 Poverty Action Plan, Philadelphia has the “highest overall poverty rate among the nation’s ten largest cities.” In addition, 2018 Census Bureau data show that 24.5% of its population lives below the poverty line, nearly double the 13.1% national average. Studies that have looked at the city’s inequality indicator show that Philadelphia falls among the 10 most unequal cities in the country, and in the wake of a second wave of COVID-19, these issues will continue to be exacerbated.

Data points such as these are critical to analyzing the economic conditions, health and general availability of information for residents, and can be used to develop local benchmarks for equity and representation. By comparing one county to neighboring counties, or by comparing disparate census tracts, newsrooms have new opportunities to visualize and understand their own coverage.

To date, the mapping prototype we built has already allowed our partners at The Philadelphia Inquirer to analyze coverage and start making observations about its geographic equity.

Two geographic equity assessments at The Inquirer

🏠 The Philadelphia Inquirer Built Environment Desk

The Built Environment desk at The Inquirer includes coverage of real estate, architecture and transportation in the Philadelphia region. These types of stories often center around the places mentioned in the text, ensuring that for this first pass, the locations mapped would directly relate to analysis of equity and representation.

Here is one year’s worth of Built Environment story locations plotted on a map:

After reviewing the map, Cynthia Henry, the Built Environment desk editor, shared thoughts about how the tool could inform coverage going forward:

“We already have conversations like ‘Hey, we haven’t written about this neighborhood or area in a while. I wonder what’s going on there?’ … [The tool] could push us to question ourselves, expand our coverage, seek a broader range of sources, and lead us to stories that need to be told.” — Cynthia Henry

📸 The Philadelphia Inquirer Photo Desk

After hearing about the availability of the tool, Inquirer photographer Tim Tai asked our team to map photo assignment locations to provide insights into the geographic representation of The Inquirer’s visual coverage.

Here is a map showing most of The Inquirer’s 2020 photo assignments plotted out by Philadelphia neighborhood:

After reviewing the results, Tim explained what the map starts to show in terms of clusters and gaps in visual representation:

“The graphical interface is really good and starts to show us the neighborhoods and counties we’re taking photos in and which ones we’re not. For example we’ve taken a lot of photos in Camden County, but not really in nearby Burlington County. We cover a lot of stuff along the Main Line, but not a lot in eastern Bucks County. We photograph in South Philly a lot, but much less often in Southwest Philly.

The tool helps us start to answer basic questions we have such as: what geographic areas are photographers frequenting? What areas are being ignored?”— Tim Tai

What’s next

In 2021 and with this new support, our teams will start seeking out additional newsroom partners, developing a board of technical and academic advisors and creating a research plan for assessing newsroom DEI needs and goals, including community panels that will play a role in goal-setting exercises. At the same time we’ll start building a test set of data and start identifying datasets we can use to develop benchmarks for equity and representation.

Get Connected

If you’re interested in the project and would like to be involved, please contact Michael Krisch and Sarah Schmalbach at [email protected] or [email protected]. You can follow our progress here and via the Brown Institute.
