Humanities Data Benchmark
Welcome to the Humanities Data Benchmark report page. This page provides an overview of all benchmark tests,
results, and comparisons.
Latest Benchmark Results
bibliographic_data
ID | Model | Date | Results |
---|---|---|---|
T27 | gpt-4o-mini | 2025-04-08 | |
T07 | gpt-4o | 2025-04-08 | |
T106 | claude-opus-4-20250514 | 2025-05-23 | |
T26 | gpt-4.5-preview | 2025-04-08 | |
T107 | claude-sonnet-4-20250514 | 2025-05-23 | |
T33 | gemini-2.0-flash-lite | 2025-04-08 | |
T30 | gemini-1.5-pro | 2025-04-08 | |
T29 | gemini-1.5-flash | 2025-04-08 | |
T08 | gemini-2.0-flash | 2025-04-08 | |
T35 | pixtral-large-latest | 2025-04-08 | |
T09 | claude-3-5-sonnet-20241022 | 2025-04-08 | |
T31 | claude-3-7-sonnet-20250219 | 2025-04-08 | |
fraktur
ID | Model | Date | Results |
---|---|---|---|
T87 | gemini-exp-1206 | 2025-05-09 | |
T22 | gemini-2.5-pro-exp-03-25 | 2025-05-09 | |
T80 | gemini-2.5-pro-exp-03-25 | 2025-05-09 | |
T91 | gemini-2.0-pro-exp-02-05 | 2025-05-09 | |
T97 | gemini-2.5-pro-preview-05-06 | 2025-05-09 | |
T90 | gemini-2.0-flash-lite | 2025-05-09 | |
T89 | gemini-1.5-pro | 2025-05-09 | |
T96 | gemini-2.5-flash-preview-04-17 | 2025-05-09 | |
T86 | gemini-2.0-flash | 2025-05-09 | |
T98 | claude-opus-4-20250514 | 2025-05-23 | |
T99 | claude-sonnet-4-20250514 | 2025-05-23 | |
T88 | gemini-1.5-flash | 2025-05-09 | |
T83 | gpt-4.1 | 2025-05-09 | |
T84 | gpt-4.1-mini | 2025-05-09 | |
T79 | gpt-4o | 2025-05-09 | |
T95 | pixtral-large-latest | 2025-05-09 | |
T82 | gpt-4o-mini | 2025-05-09 | |
T81 | gpt-4.5-preview | 2025-05-09 | |
T93 | claude-3-5-sonnet-20241022 | 2025-05-09 | |
T85 | gpt-4.1-nano | 2025-05-09 | |
T94 | claude-3-opus-20240229 | 2025-05-09 | |
T92 | claude-3-7-sonnet-20250219 | 2025-05-09 | |
metadata_extraction
ID | Model | Date | Results |
---|---|---|---|
T11 | gpt-4.5-preview | 2025-04-11 | |
T40 | gpt-4.5-preview | 2025-04-11 | |
T41 | gpt-4.5-preview | 2025-04-11 | |
T14 | gemini-exp-1206 | 2025-04-11 | |
T19 | gemini-2.5-pro-exp-03-25 | 2025-04-01 | |
T21 | gemini-2.0-pro-exp-02-05 | 2025-04-01 | |
T38 | gpt-4o | 2025-04-11 | |
T77 | gpt-4o-mini | 2025-04-17 | |
T70 | gpt-4.1-mini | 2025-04-17 | |
T72 | gpt-4.1-mini | 2025-04-17 | |
T71 | gpt-4.1-mini | 2025-04-17 | |
T43 | gpt-4o-mini | 2025-04-11 | |
T69 | gpt-4.1 | 2025-04-17 | |
T78 | gpt-4o-mini | 2025-04-17 | |
T101 | claude-opus-4-20250514 | 2025-05-23 | |
T12 | gpt-4o-mini | 2025-04-11 | |
T44 | gemini-2.0-flash | 2025-04-11 | |
T76 | gpt-4o-mini | 2025-04-17 | |
T10 | gpt-4o | 2025-04-11 | |
T42 | gpt-4o-mini | 2025-04-11 | |
T56 | gemini-2.0-flash-lite | 2025-04-11 | |
T67 | gpt-4.1 | 2025-04-17 | |
T20 | gemini-2.0-flash-lite | 2025-04-11 | |
T39 | gpt-4o | 2025-04-11 | |
T57 | gemini-2.0-flash-lite | 2025-04-11 | |
T68 | gpt-4.1 | 2025-04-17 | |
T13 | gemini-2.0-flash | 2025-04-11 | |
T73 | gpt-4.1-nano | 2025-04-17 | |
T45 | gemini-2.0-flash | 2025-04-11 | |
T75 | gpt-4.1-nano | 2025-04-17 | |
T74 | gpt-4.1-nano | 2025-04-17 | |
T16 | gemini-1.5-pro | 2025-04-11 | |
T17 | claude-3-7-sonnet-20250219 | 2025-04-11 | |
T24 | claude-3-7-sonnet-20250219 | 2025-04-11 | |
T52 | claude-3-5-sonnet-20241022 | 2025-04-11 | |
T25 | claude-3-7-sonnet-20250219 | 2025-04-11 | |
T18 | claude-3-5-sonnet-20241022 | 2025-04-11 | |
T48 | gemini-1.5-flash | 2025-04-11 | |
T53 | claude-3-5-sonnet-20241022 | 2025-04-11 | |
T100 | claude-opus-4-20250514 | 2025-05-23 | |
T15 | gemini-1.5-flash | 2025-04-11 | |
T102 | claude-opus-4-20250514 | 2025-05-23 | |
T105 | claude-sonnet-4-20250514 | 2025-05-23 | |
T60 | pixtral-large-latest | 2025-04-11 | |
T23 | pixtral-large-latest | 2025-04-11 | |
T103 | claude-sonnet-4-20250514 | 2025-05-23 | |
T49 | gemini-1.5-flash | 2025-04-11 | |
T104 | claude-sonnet-4-20250514 | 2025-05-23 | |
T61 | pixtral-large-latest | 2025-04-11 | |
T63 | claude-3-opus-20240229 | 2025-04-11 | |
T62 | claude-3-opus-20240229 | 2025-04-11 | |
T36 | claude-3-opus-20240229 | 2025-04-11 | |
test_benchmark
ID | Model | Date | Results |
---|---|---|---|
T01 | gpt-4o | 2025-04-01 | |
T02 | gemini-2.0-flash | 2025-04-01 | |
T03 | claude-3-5-sonnet-20241022 | 2025-04-01 | |
test_benchmark2
ID | Model | Date | Results |
---|---|---|---|
T04 | gpt-4o | 2025-04-01 | |
T05 | gemini-2.0-flash | 2025-04-01 | |
T06 | claude-3-5-sonnet-20241022 | 2025-04-01 | |
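For readers who want to compare runs across models, the listings above could be loaded into records with a few lines of Python. This is a minimal sketch, not part of the benchmark suite: it assumes one `ID | Model | Date | Results |` row per line, and the names `Run`, `parse_rows`, and `SAMPLE` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Run(NamedTuple):
    """One benchmark run (hypothetical record type for this sketch)."""
    test_id: str
    model: str
    run_date: date


def parse_rows(text: str) -> list[Run]:
    """Parse pipe-delimited rows, skipping header and separator lines."""
    runs = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # A data row needs an ID starting with "T", a model, and a date.
        if len(cells) < 3 or not cells[0].startswith("T"):
            continue
        try:
            run_date = date.fromisoformat(cells[2])
        except ValueError:
            continue  # not an ISO date, so not a data row
        runs.append(Run(cells[0], cells[1], run_date))
    return runs


# Rows taken from the test_benchmark listing on this page.
SAMPLE = """\
ID | Model | Date | Results |
---|---|---|---|
T01 | gpt-4o | 2025-04-01 | |
T02 | gemini-2.0-flash | 2025-04-01 | |
T03 | claude-3-5-sonnet-20241022 | 2025-04-01 | |
"""

runs = parse_rows(SAMPLE)
# Count how many runs each model has in the parsed listing.
runs_per_model = Counter(run.model for run in runs)
```

The Results column is left unparsed here because its contents are not present in the listings above.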
About This Page
This benchmark suite tests AI models on humanities data tasks. The tests run weekly, and the
results are updated automatically.
For more details, visit the GitHub repository.