
Humanities Data Benchmark

Welcome to the Humanities Data Benchmark report page. This page provides an overview of all benchmark tests, results, and comparisons.

Latest Benchmark Results

bibliographic_data
ID | Model | Date | Results
T27 | gpt-4o-mini | 2025-04-08 | fuzzy
T07 | gpt-4o | 2025-04-08 | fuzzy
T106 | claude-opus-4-20250514 | 2025-05-23 | fuzzy
T26 | gpt-4.5-preview | 2025-04-08 | fuzzy
T107 | claude-sonnet-4-20250514 | 2025-05-23 | fuzzy
T33 | gemini-2.0-flash-lite | 2025-04-08 | fuzzy
T30 | gemini-1.5-pro | 2025-04-08 | fuzzy
T29 | gemini-1.5-flash | 2025-04-08 | fuzzy
T08 | gemini-2.0-flash | 2025-04-08 | fuzzy
T35 | pixtral-large-latest | 2025-04-08 | fuzzy
T09 | claude-3-5-sonnet-20241022 | 2025-04-08 | fuzzy
T31 | claude-3-7-sonnet-20250219 | 2025-04-08 | fuzzy
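The tables list which metrics each run reports but do not define them. As a rough illustration of what a "fuzzy" score for the bibliographic_data task could look like, the Python sketch below rates each extracted string against its reference with the standard library's `difflib.SequenceMatcher.ratio()` and averages the results. This is an assumption: the benchmark's actual fuzzy-matching code (library, text normalization, aggregation) is in its repository and may differ, and `fuzzy_score`, `average_fuzzy` and the sample values are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(predicted: str, reference: str) -> float:
    """Similarity in [0, 1] between a predicted and a reference string.

    Hypothetical stand-in for the benchmark's "fuzzy" metric: difflib's
    ratio() is 2*M / T, where M is the number of matched characters and
    T is the combined length of both strings (1.0 means identical).
    """
    return SequenceMatcher(None, predicted, reference).ratio()


def average_fuzzy(pairs: list[tuple[str, str]]) -> float:
    """Mean fuzzy score over (predicted, reference) pairs; 0.0 if empty."""
    if not pairs:
        return 0.0
    return sum(fuzzy_score(p, r) for p, r in pairs) / len(pairs)


# Hypothetical field values from one bibliographic record.
pairs = [
    ("Basel 1848", "Basel 1848"),     # exact match, scores 1.0
    ("Zurich, 1901", "Zürich 1901"),  # near match, scores below 1.0
]
print(f"{average_fuzzy(pairs):.3f}")
```

A similarity ratio like this rewards near misses (a dropped umlaut or comma) instead of scoring them as outright failures, which suits transcribed historical metadata; higher is better.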
fraktur
ID | Model | Date | Results
T87 | gemini-exp-1206 | 2025-05-09 | fuzzy, cer
T22 | gemini-2.5-pro-exp-03-25 | 2025-05-09 | fuzzy, cer
T80 | gemini-2.5-pro-exp-03-25 | 2025-05-09 | fuzzy, cer
T91 | gemini-2.0-pro-exp-02-05 | 2025-05-09 | fuzzy, cer
T97 | gemini-2.5-pro-preview-05-06 | 2025-05-09 | fuzzy, cer
T90 | gemini-2.0-flash-lite | 2025-05-09 | fuzzy, cer
T89 | gemini-1.5-pro | 2025-05-09 | fuzzy, cer
T96 | gemini-2.5-flash-preview-04-17 | 2025-05-09 | fuzzy, cer
T86 | gemini-2.0-flash | 2025-05-09 | fuzzy, cer
T98 | claude-opus-4-20250514 | 2025-05-23 | fuzzy, cer
T99 | claude-sonnet-4-20250514 | 2025-05-23 | fuzzy, cer
T88 | gemini-1.5-flash | 2025-05-09 | fuzzy, cer
T83 | gpt-4.1 | 2025-05-09 | fuzzy, cer
T84 | gpt-4.1-mini | 2025-05-09 | fuzzy, cer
T79 | gpt-4o | 2025-05-09 | fuzzy, cer
T95 | pixtral-large-latest | 2025-05-09 | fuzzy, cer
T82 | gpt-4o-mini | 2025-05-09 | fuzzy, cer
T81 | gpt-4.5-preview | 2025-05-09 | fuzzy, cer
T93 | claude-3-5-sonnet-20241022 | 2025-05-09 | fuzzy, cer
T85 | gpt-4.1-nano | 2025-05-09 | fuzzy, cer
T94 | claude-3-opus-20240229 | 2025-05-09 | fuzzy, cer
T92 | claude-3-7-sonnet-20250219 | 2025-05-09 | fuzzy, cer
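The fraktur runs also report "cer", which conventionally stands for character error rate: the Levenshtein edit distance between the model's transcription and the reference, divided by the reference length. The Python sketch below computes that textbook definition; it assumes the benchmark uses the standard formula, and whether it normalizes Unicode, whitespace or historical letterforms such as the long s (ſ) before scoring is not stated here. `levenshtein`, `cer` and the sample words are illustrative, not the benchmark's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(prediction, reference) / len(reference)


# A transcription that normalizes the long s (ſ) versus a reference that
# keeps it: one substitution over ten reference characters.
print(cer("Geschichte", "Geſchichte"))  # → 0.1
```

Unlike a similarity score, lower CER is better, and it can exceed 1.0 when a model inserts far more text than the reference contains.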
metadata_extraction
ID | Model | Date | Results
T11 | gpt-4.5-preview | 2025-04-11 | f1_macro, f1_micro
T40 | gpt-4.5-preview | 2025-04-11 | f1_macro, f1_micro
T41 | gpt-4.5-preview | 2025-04-11 | f1_macro, f1_micro
T14 | gemini-exp-1206 | 2025-04-11 | f1_macro, f1_micro
T19 | gemini-2.5-pro-exp-03-25 | 2025-04-01 | f1_macro, f1_micro
T21 | gemini-2.0-pro-exp-02-05 | 2025-04-01 | f1_macro, f1_micro
T38 | gpt-4o | 2025-04-11 | f1_macro, f1_micro
T77 | gpt-4o-mini | 2025-04-17 | f1_macro, f1_micro
T70 | gpt-4.1-mini | 2025-04-17 | f1_macro, f1_micro
T72 | gpt-4.1-mini | 2025-04-17 | f1_macro, f1_micro
T71 | gpt-4.1-mini | 2025-04-17 | f1_macro, f1_micro
T43 | gpt-4o-mini | 2025-04-11 | f1_macro, f1_micro
T69 | gpt-4.1 | 2025-04-17 | f1_macro, f1_micro
T78 | gpt-4o-mini | 2025-04-17 | f1_macro, f1_micro
T101 | claude-opus-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T12 | gpt-4o-mini | 2025-04-11 | f1_macro, f1_micro
T44 | gemini-2.0-flash | 2025-04-11 | f1_macro, f1_micro
T76 | gpt-4o-mini | 2025-04-17 | f1_macro, f1_micro
T10 | gpt-4o | 2025-04-11 | f1_macro, f1_micro
T42 | gpt-4o-mini | 2025-04-11 | f1_macro, f1_micro
T56 | gemini-2.0-flash-lite | 2025-04-11 | f1_macro, f1_micro
T67 | gpt-4.1 | 2025-04-17 | f1_macro, f1_micro
T20 | gemini-2.0-flash-lite | 2025-04-11 | f1_macro, f1_micro
T39 | gpt-4o | 2025-04-11 | f1_macro, f1_micro
T57 | gemini-2.0-flash-lite | 2025-04-11 | f1_macro, f1_micro
T68 | gpt-4.1 | 2025-04-17 | f1_macro, f1_micro
T13 | gemini-2.0-flash | 2025-04-11 | f1_macro, f1_micro
T73 | gpt-4.1-nano | 2025-04-17 | f1_macro, f1_micro
T45 | gemini-2.0-flash | 2025-04-11 | f1_macro, f1_micro
T75 | gpt-4.1-nano | 2025-04-17 | f1_macro, f1_micro
T74 | gpt-4.1-nano | 2025-04-17 | f1_macro, f1_micro
T16 | gemini-1.5-pro | 2025-04-11 | f1_macro, f1_micro
T17 | claude-3-7-sonnet-20250219 | 2025-04-11 | f1_macro, f1_micro
T24 | claude-3-7-sonnet-20250219 | 2025-04-11 | f1_macro, f1_micro
T52 | claude-3-5-sonnet-20241022 | 2025-04-11 | f1_macro, f1_micro
T25 | claude-3-7-sonnet-20250219 | 2025-04-11 | f1_macro, f1_micro
T18 | claude-3-5-sonnet-20241022 | 2025-04-11 | f1_macro, f1_micro
T48 | gemini-1.5-flash | 2025-04-11 | f1_macro, f1_micro
T53 | claude-3-5-sonnet-20241022 | 2025-04-11 | f1_macro, f1_micro
T100 | claude-opus-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T15 | gemini-1.5-flash | 2025-04-11 | f1_macro, f1_micro
T102 | claude-opus-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T105 | claude-sonnet-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T60 | pixtral-large-latest | 2025-04-11 | f1_macro, f1_micro
T23 | pixtral-large-latest | 2025-04-11 | f1_macro, f1_micro
T103 | claude-sonnet-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T49 | gemini-1.5-flash | 2025-04-11 | f1_macro, f1_micro
T104 | claude-sonnet-4-20250514 | 2025-05-23 | f1_macro, f1_micro
T61 | pixtral-large-latest | 2025-04-11 | f1_macro, f1_micro
T63 | claude-3-opus-20240229 | 2025-04-11 | f1_macro, f1_micro
T62 | claude-3-opus-20240229 | 2025-04-11 | f1_macro, f1_micro
T36 | claude-3-opus-20240229 | 2025-04-11 | f1_macro, f1_micro
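The metadata_extraction runs report two F1 variants. Under the usual definitions, macro F1 computes F1 separately for each extracted field and takes the unweighted mean, while micro F1 pools true positives, false positives and false negatives across all fields before computing a single F1. The Python sketch below illustrates that difference; it assumes the benchmark follows these standard definitions, and the `Counts` class, the field names `sender` and `date`, and their counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # field values extracted correctly
    fp: int = 0  # values extracted that do not match the reference
    fn: int = 0  # reference values the model missed


def f1(c: Counts) -> float:
    """F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision
    and recall; 0.0 when there is nothing to score."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def f1_macro(per_field: dict[str, Counts]) -> float:
    """Unweighted mean of the per-field F1 scores."""
    if not per_field:
        return 0.0
    return sum(f1(c) for c in per_field.values()) / len(per_field)


def f1_micro(per_field: dict[str, Counts]) -> float:
    """F1 over TP/FP/FN pooled across all fields."""
    pooled = Counts(
        tp=sum(c.tp for c in per_field.values()),
        fp=sum(c.fp for c in per_field.values()),
        fn=sum(c.fn for c in per_field.values()),
    )
    return f1(pooled)


# Hypothetical counts: a frequent field the model gets mostly right
# and a rare field it often misses.
fields = {
    "sender": Counts(tp=9, fp=1, fn=0),
    "date": Counts(tp=1, fp=0, fn=1),
}
print(round(f1_macro(fields), 3), round(f1_micro(fields), 3))  # → 0.807 0.909
```

Macro averaging weights every field equally, so the rarely seen `date` field pulls the score down; micro averaging pools the counts, so the frequent `sender` field dominates. Reporting both shows whether a model does well only on the common fields.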
test_benchmark
ID | Model | Date | Results
T01 | gpt-4o | 2025-04-01 | score
T02 | gemini-2.0-flash | 2025-04-01 | score
T03 | claude-3-5-sonnet-20241022 | 2025-04-01 | score
test_benchmark2
ID | Model | Date | Results
T04 | gpt-4o | 2025-04-01 | score
T05 | gemini-2.0-flash | 2025-04-01 | score
T06 | claude-3-5-sonnet-20241022 | 2025-04-01 | score

About This Page

This benchmark suite is designed to test AI models on humanities data tasks. The tests run weekly and results are automatically updated.

For more details, visit the GitHub repository.