LLMs in Survey and Public Opinion Research

A curated, searchable database of empirical studies using large language models across all phases of the research process. Find, compare, and cite papers with standardized metadata on tasks, LLMs, prompting approaches, and more.


About SurvAI Hub

What is this database?

SurvAI Hub is a repository of empirical studies that use large language models (LLMs) in one or more phases of the survey and public opinion research process. Each entry is coded with standardized metadata so that research designs can be compared systematically across studies; the categories are described in the coding scheme below. Entries come from keyword searches in three scientific databases and cover English-language empirical studies from 2019 to February 4, 2025, that use generative LLMs in the research process to gain an understanding of human attitudes or behaviors. For details on the search strategy and eligibility criteria, as well as a systematic review of the studies, see the accompanying paper:

von der Heyde, L., Keusch, F., Buskirk, T., & Eck, A. (2026). AI in the Loop?! A Systematic Review of the Use of Large Language Models in Survey and Public Opinion Research. https://doi.org/10.31235/osf.io/eubj4_v1.

How to use this database

Use the search bar to find papers by author, title, abstract, or LLM used. Use the filters to narrow the results by publication type and year, research phase, task, LLM family, prompting approach, and more. Multiple values can be selected within each filter. Click "Show Details" on any card to see the abstract, exact LLM(s) used, domain, citation, and DOI. Clicking the paper title opens its DOI link.

Design inspired by Jens Rupprecht (database forthcoming) at the University of Mannheim.

Coding Scheme

Every entry uses the following standardized coding scheme:

Field | Type | Description
Publication type | categorical | Type of publication: journal article, preprint, or conference paper
Phase | categorical (multi) | Research phase(s) the LLM was used in: before, during, or after data collection
Task | categorical (multi) | Specific research task(s) the LLM was used for
Domain | categorical (multi) | Academic domain the study was conducted in (missing values might occur)
LLM family | categorical (multi) | LLM family/families used (e.g., GPT, Llama, Claude)
LLM | text (multi) | Exact model(s) used (e.g., GPT-4, Llama-3.1)
Interaction approach | categorical (multi) | Zero-shot, one-/few-shot, or fine-tuning
Language | categorical (multi) | Language(s) of the prompt or LLM input data
Population | categorical (multi) | Population(s) the input data came from or the output refers to. Note that this may be only a subpopulation of the named country.
Data type | categorical (multi) | Type(s) of input data used: survey, social media, open-ended (e.g., interviews, free-text responses), or online reviews
Silicon sampling | boolean | Whether the study uses silicon sampling (coded only for studies with a data generation task)
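
For readers who want to work with the coded metadata programmatically, the scheme above could be mirrored in a small record type. The following Python sketch is only an illustration: the attribute names, the `Entry` class, the `matches` helper, and the example values are all hypothetical and do not describe how SurvAI Hub actually stores or filters its data. In particular, treating multiple selected filter values as "match any" is an assumption, not documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One coded study. Attribute names are hypothetical and simply mirror the coding scheme."""
    publication_type: str            # categorical, single value
    phase: list[str]                 # categorical (multi)
    task: list[str]                  # categorical (multi)
    llm_family: list[str]            # categorical (multi)
    llm: list[str]                   # free text (multi)
    interaction_approach: list[str]  # categorical (multi)
    language: list[str]              # categorical (multi)
    population: list[str]            # categorical (multi)
    data_type: list[str]             # categorical (multi)
    domain: list[str] = field(default_factory=list)  # may be missing
    silicon_sampling: Optional[bool] = None           # None when not coded for the study


def matches(entry: Entry, field_name: str, selected: set[str]) -> bool:
    """Return True if the entry carries at least one selected value.

    Assumption: an empty selection means "no filter", and several selected
    values are combined with OR.
    """
    if not selected:
        return True
    value = getattr(entry, field_name)
    values = [value] if isinstance(value, str) else value
    return any(v in selected for v in values)


# Invented values for illustration only; this is not a real database entry.
example = Entry(
    publication_type="preprint",
    phase=["after data collection"],
    task=["hypothetical task label"],
    llm_family=["GPT"],
    llm=["GPT-4"],
    interaction_approach=["zero-shot"],
    language=["English"],
    population=["United States"],
    data_type=["survey"],
)

print(matches(example, "llm_family", {"GPT", "Claude"}))
print(matches(example, "llm_family", {"Llama"}))
```

Keeping multi-valued fields as lists and leaving `silicon_sampling` as `None` when it is not coded follows the "(multi)" and "missing values" notes in the scheme, under the assumptions stated above.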

Missing a study?

The database currently features only the studies included in the accompanying systematic review. We aim to update it soon and to accept submissions that meet the same eligibility criteria and are coded with the same metadata.