Nick Hagar

Postdoctoral Scholar @ Northwestern University, Generative AI in the Newsroom Initiative | Incoming Assistant Professor @ University of Minnesota

Recent Updates

🎉

I'm co-leading a SRCCON session with Jeremy Gilbert and Mandi Cai, titled 'Tiny Tools for Big Impact: Local and Lightweight LLMs for Journalists'.

📄

Sachita Nishal and I got a paper accepted to an ICA preconference, titled "'Good enough' news? Model substitution for local reporting."

🏆

My dataset study with Jack Bandy, Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl, was accepted at ICWSM!

Research Focus

My research centers on understanding the dynamics of collective attention in complex digital ecosystems. I employ computational methods, data science, and network analysis to investigate:

🤖

AI in Journalism & Media

The application and impact of artificial intelligence, particularly generative AI, in newsrooms and the broader media landscape.

📈

Information Discovery & Virality

How individuals and groups discover information online, and why certain content becomes popular or viral.

🌐

Platform Dynamics

Analyzing large social platforms and attention markets to understand their structural effects.

🔬

Computational Methods for Social Science

New computational techniques for studying platforms, content, and collective behavior.

Academic Papers & Talks

Good enough news? Model substitution for local reporting

Nick Hagar, Sachita Nishal

ICA (International Communication Association) Preconference, 2025

Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl

Nick Hagar, Jack Bandy

ICWSM (International AAAI Conference on Web and Social Media), 2025

Media Mentions & Appearances

For media inquiries: nicholas.hagar[at]northwestern.edu

Software & Code

Substack API

Unofficial wrapper for Substack APIs to fetch newsletters, posts, and more.

Common Crawl Genealogy

Data collection code for 50+ LLM training datasets.

Archive Check

CLI for collecting website data from the Internet Archive, GDELT, and more.

Explore all projects on GitHub

Open source contributions and experiments in data science, web scraping, and journalism tools

Blog & Newsletter

Prototypes

LLM data schema generator

An early attempt to automatically generate schemas for structured outputs with LLMs.

LLM Document Processing

About

I'm a postdoctoral scholar at Northwestern University, working on the Generative AI in the Newsroom Initiative. I'm also an incoming assistant professor at the University of Minnesota. I've worked at the New York Times, Meta, and Patreon. I also have a PhD from Northwestern University, where I was part of the Computational Journalism Lab.

I research how collective attention works in large, complex systems. I use data science and engineering to study how people discover information online, why things get popular, and what influences content creators. My first love is journalism, but I also study large social platforms (TikTok, Reddit, Facebook) and other attention markets (Netflix, Substack).

Before grad school, I worked in audience development and analytics at Fusion Media Group, Digiday, Pacific Standard and the Dallas Morning News. I've written for Pacific Standard, the Christian Science Monitor, McSweeney's Internet Tendency and others. I graduated from the Medill School of Journalism and am originally from Fort Wayne, Indiana.