# Humanities Data Benchmark
Welcome to the Humanities Data Benchmark report. This page provides an overview of all benchmark tests, results, and model comparisons.
## Leaderboard
The table below shows the global average performance, cost efficiency, and time efficiency of each model across the seven core benchmarks: bibliographic_data, blacklist, company_lists, fraktur, medieval_manuscripts, metadata_extraction, and zettelkatalog.
The Model and Provider columns identify each AI system. Global Average is the mean performance score across all seven benchmarks (higher is better). Cost/Point and Time/Point are normalized efficiency metrics, calculated per test, averaged per benchmark, and then averaged globally; this multi-level normalization accounts for differing numbers of items, test configurations, and benchmark scales. For the efficiency metrics, lower is better: less cost or time per performance point achieved. The seven benchmark-specific columns show the average performance on each individual benchmark. Only models with results in all seven benchmarks are included.
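To make the aggregation concrete, here is a minimal sketch of how Cost/Point might roll up from individual tests to the global figure (the data, function names, and exact pipeline are hypothetical illustrations, not the benchmark's actual code):

```python
from statistics import mean

def cost_per_point(cost_usd, score):
    # Level 1 — per-test normalization: dollars per performance point.
    # Undefined (None) when the score is zero, matching the N/A entries.
    return cost_usd / score if score > 0 else None

def global_efficiency(benchmarks):
    # benchmarks: {name: [(cost_usd, score), ...]} — one tuple per test run.
    # Level 2: average the per-test ratios within each benchmark.
    # Level 3: average the per-benchmark values into one global number.
    per_benchmark = []
    for tests in benchmarks.values():
        ratios = [r for r in (cost_per_point(c, s) for c, s in tests) if r is not None]
        if ratios:
            per_benchmark.append(mean(ratios))
    return mean(per_benchmark)

# Hypothetical model with two benchmarks, two tests each:
example = {
    "fraktur":   [(0.10, 0.80), (0.20, 0.50)],  # per-test: 0.125, 0.4  -> 0.2625
    "blacklist": [(0.05, 0.50), (0.15, 0.75)],  # per-test: 0.1,   0.2  -> 0.15
}
print(round(global_efficiency(example), 5))  # -> 0.20625
```

Time/Point follows the same scheme, with total seconds in place of dollars at level 1.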
| Model | Provider | Global Average | Cost/Point | Time/Point | bibliographic_data | blacklist | company_lists | fraktur | medieval_manuscripts | metadata_extraction | zettelkatalog |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gemini-2.5-pro | Google | 0.739 | $0.2347 | 37.58s |  |  |  |  |  |  |  |
| gemini-2.5-flash-preview-09-2025 | Google | 0.728 | $0.1250 | 22.63s |  |  |  |  |  |  |  |
| gemini-2.5-flash | Google | 0.718 | $0.0582 | 29.67s |  |  |  |  |  |  |  |
| gemini-2.0-flash | Google | 0.659 | $0.0245 | 11.85s |  |  |  |  |  |  |  |
| claude-3-7-sonnet-20250219 | Anthropic | 0.650 | $0.8158 | 24.77s |  |  |  |  |  |  |  |
| gemini-2.0-flash-lite | Google | 0.649 | $0.0197 | 13.05s |  |  |  |  |  |  |  |
| gpt-4.1 | OpenAI | 0.646 | $0.6032 | 44.79s |  |  |  |  |  |  |  |
| gpt-5-mini | OpenAI | 0.643 | $0.3385 | 82.07s |  |  |  |  |  |  |  |
| gpt-4o | OpenAI | 0.629 | $0.4074 | 63.44s |  |  |  |  |  |  |  |
| claude-opus-4-1-20250805 | Anthropic | 0.629 | $4.2044 | 43.15s |  |  |  |  |  |  |  |
| mistral-large-latest | Mistral AI | 0.617 | $0.3852 | 28.29s |  |  |  |  |  |  |  |
| mistral-medium-2505 | Mistral AI | 0.605 | $0.0943 | 34.23s |  |  |  |  |  |  |  |
| gpt-5 | OpenAI | 0.603 | $2.0520 | 246.02s |  |  |  |  |  |  |  |
| mistral-medium-2508 | Mistral AI | 0.597 | $0.0960 | 57.37s |  |  |  |  |  |  |  |
| gpt-4.1-mini | OpenAI | 0.596 | $0.0729 | 79.96s |  |  |  |  |  |  |  |
| claude-3-5-sonnet-20241022 | Anthropic | 0.595 | $0.8166 | 23.59s |  |  |  |  |  |  |  |
| o3 | OpenAI | 0.594 | $0.9085 | 130.09s |  |  |  |  |  |  |  |
| gemini-2.5-flash-lite-preview-09-2025 | Google | 0.588 | $0.0763 | 5.45s |  |  |  |  |  |  |  |
| claude-opus-4-20250514 | Anthropic | 0.586 | $4.4639 | 50.88s |  |  |  |  |  |  |  |
| gemini-2.5-flash-lite | Google | 0.579 | $0.0366 | 11.40s |  |  |  |  |  |  |  |
| claude-sonnet-4-20250514 | Anthropic | 0.554 | $0.9203 | 42.71s |  |  |  |  |  |  |  |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | 0.544 | $0.1362 | 44.80s |  |  |  |  |  |  |  |
| pixtral-large-latest | Mistral AI | 0.502 | $0.8419 | 40.18s |  |  |  |  |  |  |  |
| gpt-5-nano | OpenAI | 0.450 | $0.2380 | 401.68s |  |  |  |  |  |  |  |
| claude-sonnet-4-5-20250929 | Anthropic | 0.433 | $1.1710 | 19.49s |  |  |  |  |  |  |  |
| gpt-4.1-nano | OpenAI | 0.430 | $0.0162 | 17.80s |  |  |  |  |  |  |  |
| gpt-4o-mini | OpenAI | 0.416 | $0.3556 | 2419.74s |  |  |  |  |  |  |  |
| claude-3-opus-20240229 | Anthropic | 0.395 | $6.5252 | 98.10s |  |  |  |  |  |  |  |
| pixtral-12b | Mistral AI | 0.374 | $0.0689 | 61.59s |  |  |  |  |  |  |  |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | 0.283 | $0.0000 | 326.11s |  |  |  |  |  |  |  |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | 0.171 | $0.6134 | 220.84s |  |  |  |  |  |  |  |
The following radar chart shows the performance distribution of top models across the seven core benchmarks:

## Latest Benchmark Results
The tables below show detailed results for each benchmark; each row represents a single test configuration and its most recent run. The Model and Provider columns identify the AI system used. Each test has a unique Test ID and shows the Date of its most recent execution. The Prompt and Rules columns indicate the configuration used. Results shows the performance score (fuzzy match for bibliographic_data and fraktur, F1-micro for metadata_extraction and zettelkatalog; higher is better). Cost (USD) is the total cost of processing all items in the test, and Test Time (s) is the total execution time across all items. Cost/Point and Time/Point show cost and time efficiency (dollars or seconds per performance point; lower is better).
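For intuition, the two scoring modes mentioned above can be sketched as follows (a minimal illustration, assuming character-level similarity for fuzzy matching and per-field micro-averaging for F1; the benchmark's actual implementations may differ):

```python
from difflib import SequenceMatcher

def fuzzy_match(predicted, reference):
    # Character-level similarity in [0, 1]; 1.0 means an exact match.
    return SequenceMatcher(None, predicted, reference).ratio()

def f1_micro(predicted, reference):
    # predicted/reference: dicts of field -> value (e.g. extracted metadata).
    # Micro-averaging pools true positives, false positives, and false
    # negatives across all fields before computing precision and recall.
    tp = sum(1 for k, v in predicted.items() if reference.get(k) == v)
    fp = len(predicted) - tp
    fn = sum(1 for k in reference if predicted.get(k) != reference[k])
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(fuzzy_match("Goethe, Faust", "Goethe, Faust"))  # -> 1.0
print(f1_micro({"title": "Faust", "year": "1808"},
               {"title": "Faust", "year": "1832"}))   # -> 0.5
```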
### bibliographic_data
| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gemini-2.5-flash-preview-09-2025 | Google | T0219 | 2025-10-01 | prompt.txt | None |  | $0.0307 | $0.0437 | 116.65 | 166.10 |
| gpt-5 | OpenAI | T0129 | 2025-10-01 | prompt.txt | None |  | $0.3421 | $0.4992 | 591.90 | 863.65 |
| gpt-5-mini | OpenAI | T0130 | 2025-10-01 | prompt.txt | None |  | $0.0582 | $0.0860 | 411.12 | 607.56 |
| gemini-2.5-flash | Google | T0195 | 2025-09-30 | prompt.txt | None |  | $0.0252 | $0.0376 | 195.82 | 292.59 |
| claude-sonnet-4-20250514 | Anthropic | T0107 | 2025-09-30 | prompt.txt | None |  | $0.1692 | $0.2531 | 127.79 | 191.16 |
| o3 | OpenAI | T0133 | 2025-10-01 | prompt.txt | None |  | $0.1885 | $0.2827 | 391.04 | 586.48 |
| gemini-2.5-pro | Google | T0128 | 2025-09-30 | prompt.txt | None |  | $0.1032 | $0.1554 | 227.18 | 342.25 |
| mistral-medium-2505 | Mistral AI | T0170 | 2025-10-01 | prompt.txt | None |  | $0.0222 | $0.0336 | 128.32 | 194.01 |
| gpt-4.1 | OpenAI | T0139 | 2025-10-01 | prompt.txt | None |  | $0.0952 | $0.1449 | 298.94 | 455.07 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0233 | 2025-10-17 | prompt.txt | None |  | $0.1268 | $0.1931 | 923.12 | 1405.84 |
| mistral-medium-2508 | Mistral AI | T0169 | 2025-10-01 | prompt.txt | None |  | $0.0220 | $0.0336 | 112.70 | 172.31 |
| claude-3-5-sonnet-20241022 | Anthropic | T0009 | 2025-09-30 | prompt.txt | None |  | $0.1682 | $0.2576 | 124.19 | 190.17 |
| gpt-4o | OpenAI | T0007 | 2025-09-30 | prompt.txt | None |  | $0.1136 | $0.1748 | 350.22 | 538.95 |
| claude-3-7-sonnet-20250219 | Anthropic | T0031 | 2025-09-30 | prompt.txt | None |  | $0.1765 | $0.2720 | 136.48 | 210.38 |
| gpt-4.1-mini | OpenAI | T0140 | 2025-10-01 | prompt.txt | None |  | $0.0199 | $0.0307 | 164.93 | 254.41 |
| mistral-large-latest | Mistral AI | T0187 | 2025-10-01 | prompt.txt | None |  | $0.0805 | $0.1259 | 136.28 | 213.12 |
| claude-opus-4-1-20250805 | Anthropic | T0127 | 2025-09-30 | prompt.txt | None |  | $0.9735 | $1.5435 | 203.32 | 322.38 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0234 | 2025-10-17 | prompt.txt | None |  | $0.0062 | $0.0099 | 151.02 | 241.35 |
| gemini-2.0-flash | Google | T0008 | 2025-09-30 | prompt.txt | None |  | $0.0052 | $0.0087 | 69.66 | 115.32 |
| gpt-5-nano | OpenAI | T0131 | 2025-10-01 | prompt.txt | None |  | $0.0281 | $0.0476 | 401.62 | 681.07 |
| claude-opus-4-20250514 | Anthropic | T0106 | 2025-09-30 | prompt.txt | None |  | $0.8992 | $1.5413 | 193.49 | 331.67 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0211 | 2025-10-01 | prompt.txt | None |  | $0.0048 | $0.0083 | 18.69 | 32.28 |
| gemini-2.5-flash-lite | Google | T0203 | 2025-10-01 | prompt.txt | None |  | $0.0039 | $0.0072 | 19.33 | 35.50 |
| pixtral-large-latest | Mistral AI | T0035 | 2025-09-30 | prompt.txt | None |  | $0.1079 | $0.2123 | 199.57 | 392.57 |
| gpt-4o-mini | OpenAI | T0027 | 2025-09-30 | prompt.txt | None |  | $0.0261 | $0.0526 | 233.18 | 468.66 |
| gemini-2.0-flash-lite | Google | T0033 | 2025-09-30 | prompt.txt | None |  | $0.0055 | $0.0140 | 90.14 | 230.26 |
| claude-3-opus-20240229 | Anthropic | T0138 | 2025-10-01 | prompt.txt | None |  | $0.6830 | $1.8417 | 237.58 | 640.62 |
| gpt-4.1-nano | OpenAI | T0141 | 2025-10-01 | prompt.txt | None |  | $0.0044 | $0.0139 | 105.83 | 330.56 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0237 | 2025-10-17 | prompt.txt | None |  | $0.0000 | $0.0000 | 361.78 | 1560.25 |
| gpt-4.5-preview | OpenAI | T0026 | 2025-04-08 | prompt.txt | None |  | N/A | N/A | 480.16 | 2367.38 |
| pixtral-12b | Mistral AI | T0181 | 2025-10-01 | prompt.txt | None |  | $0.0036 | $0.0197 | 38.06 | 207.04 |
| gemini-1.5-pro | Google | T0030 | 2025-04-08 | prompt.txt | None |  | N/A | N/A | 88.55 | 763.13 |
| gemini-1.5-flash | Google | T0029 | 2025-04-08 | prompt.txt | None |  | N/A | N/A | 62.39 | 586.44 |
| claude-sonnet-4-5-20250929 | Anthropic | T0225 | 2025-10-01 | prompt.txt | None |  | $0.1636 | N/A | 121.63 | N/A |
### blacklist
| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4.1-mini | OpenAI | T0307 | 2025-10-24 | prompt.txt | None |  | $0.0274 | $0.0287 | 77.51 | 81.03 |
| gemini-2.5-flash-preview-09-2025 | Google | T0231 | 2025-10-24 | prompt.txt | None |  | $0.0188 | $0.0198 | 188.45 | 197.72 |
| gpt-4o | OpenAI | T0305 | 2025-10-24 | prompt.txt | None |  | $0.1336 | $0.1433 | 186.39 | 199.98 |
| gpt-4.1 | OpenAI | T0232 | 2025-10-24 | prompt.txt | None |  | $0.1125 | $0.1219 | 185.64 | 201.16 |
| gemini-2.5-pro | Google | T0316 | 2025-10-24 | prompt.txt | None |  | $0.0583 | $0.0633 | 793.69 | 861.99 |
| gemini-2.0-flash-lite | Google | T0314 | 2025-10-24 | prompt.txt | None |  | $0.0116 | $0.0128 | 145.10 | 159.49 |
| gemini-2.0-flash | Google | T0313 | 2025-10-24 | prompt.txt | None |  | $0.0123 | $0.0135 | 145.32 | 159.96 |
| gpt-5 | OpenAI | T0309 | 2025-10-24 | prompt.txt | None |  | $0.5062 | $0.5587 | 1640.52 | 1810.67 |
| x-ai/grok-4 | xAI (via OpenRouter) | T0336 | 2025-10-24 | prompt.txt | None |  | $0.9520 | $1.0572 | 1496.67 | 1662.16 |
| mistral-medium-2508 | Mistral AI | T0327 | 2025-10-24 | prompt.txt | None |  | $0.0306 | $0.0342 | 170.53 | 190.87 |
| claude-opus-4-1-20250805 | Anthropic | T0324 | 2025-10-24 | prompt.txt | None |  | $1.4416 | $1.6265 | 304.49 | 343.54 |
| mistral-medium-2505 | Mistral AI | T0328 | 2025-10-24 | prompt.txt | None |  | $0.0306 | $0.0348 | 177.77 | 202.24 |
| gemini-2.5-flash | Google | T0315 | 2025-10-24 | prompt.txt | None |  | $0.0139 | $0.0158 | 216.33 | 246.30 |
| mistral-large-latest | Mistral AI | T0330 | 2025-10-24 | prompt.txt | None |  | $0.1344 | $0.1554 | 217.12 | 251.03 |
| claude-3-7-sonnet-20250219 | Anthropic | T0320 | 2025-10-24 | prompt.txt | None |  | $0.2945 | $0.3414 | 158.36 | 183.55 |
| gpt-5-mini | OpenAI | T0310 | 2025-10-24 | prompt.txt | None |  | $0.0844 | $0.0984 | 768.66 | 896.62 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0318 | 2025-10-24 | prompt.txt | None |  | $0.0040 | $0.0047 | 29.94 | 35.06 |
| claude-sonnet-4-5-20250929 | Anthropic | T0325 | 2025-10-24 | prompt.txt | None |  | $0.3212 | $0.3766 | 194.14 | 227.63 |
| gemini-2.5-flash-lite | Google | T0317 | 2025-10-24 | prompt.txt | None |  | $0.0040 | $0.0047 | 89.48 | 105.22 |
| o3 | OpenAI | T0312 | 2025-10-24 | prompt.txt | None |  | $0.2112 | $0.2484 | 478.97 | 563.25 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0333 | 2025-10-24 | prompt.txt | None |  | $0.0130 | $0.0159 | 89.34 | 108.86 |
| claude-opus-4-20250514 | Anthropic | T0322 | 2025-10-24 | prompt.txt | None |  | $1.4412 | $1.7591 | 301.84 | 368.43 |
| claude-sonnet-4-20250514 | Anthropic | T0323 | 2025-10-24 | prompt.txt | None |  | $0.2911 | $0.3685 | 179.58 | 227.31 |
| gpt-4.1-nano | OpenAI | T0308 | 2025-10-24 | prompt.txt | None |  | $0.0090 | $0.0115 | 69.88 | 89.36 |
| gpt-4o-mini | OpenAI | T0306 | 2025-10-24 | prompt.txt | None |  | $0.1313 | $0.1685 | 155.92 | 200.10 |
| pixtral-large-latest | Mistral AI | T0326 | 2025-10-24 | prompt.txt | None |  | $0.2711 | $0.3482 | 133.68 | 171.70 |
| pixtral-12b | Mistral AI | T0329 | 2025-10-24 | prompt.txt | None |  | $0.0186 | $0.0241 | 62.89 | 81.43 |
| gpt-5-nano | OpenAI | T0311 | 2025-10-24 | prompt.txt | None |  | $0.0370 | $0.0486 | 981.19 | 1289.00 |
| qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0335 | 2025-10-24 | prompt.txt | None |  | $0.0050 | $0.0070 | 114.55 | 160.22 |
| qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0334 | 2025-10-24 | prompt.txt | None |  | $0.0122 | $0.0175 | 293.34 | 422.08 |
| claude-3-5-sonnet-20241022 | Anthropic | T0319 | 2025-10-24 | prompt.txt | None |  | $0.2892 | $0.4392 | 141.91 | 215.53 |
| claude-3-opus-20240229 | Anthropic | T0321 | 2025-10-24 | prompt.txt | None |  | $1.4458 | $2.2659 | 258.87 | 405.70 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0331 | 2025-10-24 | prompt.txt | None |  | $0.0000 | $0.0000 | N/A | N/A |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0332 | 2025-10-24 | prompt.txt | None |  | $0.0171 | $0.1113 | 220.78 | 1437.97 |
### company_lists
| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-5 | OpenAI | T0347 | 2025-10-28 | prompt.txt | None |  | $0.7163 | $1.2200 | 1853.75 | 3157.12 |
| o3 | OpenAI | T0353 | 2025-10-28 | prompt.txt | None |  | $0.2726 | $0.4798 | 596.40 | 1049.60 |
| gemini-2.5-pro | Google | T0361 | 2025-10-28 | prompt.txt | None |  | $0.1753 | $0.3306 | 468.16 | 882.83 |
| claude-opus-4-20250514 | Anthropic | T0373 | 2025-10-28 | prompt.txt | None |  | $1.6953 | $3.4059 | 366.75 | 736.84 |
| gemini-2.0-flash | Google | T0355 | 2025-10-28 | prompt.txt | None |  | $0.0095 | $0.0193 | 141.12 | 286.34 |
| gemini-2.5-pro | Google | T0362 | 2025-10-28 | prompt_min.txt | None |  | $0.1439 | $0.3002 | 328.96 | 686.13 |
| gemini-2.5-flash | Google | T0359 | 2025-10-28 | prompt.txt | None |  | $0.0432 | $0.0907 | 291.69 | 612.05 |
| gemini-2.5-flash-preview-09-2025 | Google | T0236 | 2025-10-28 | prompt_min.txt | None |  | $0.0388 | $0.0822 | 187.04 | 395.64 |
| claude-3-7-sonnet-20250219 | Anthropic | T0369 | 2025-10-28 | prompt.txt | None |  | $0.3387 | $0.7236 | 246.12 | 525.79 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0395 | 2025-10-28 | prompt.txt | None |  | $0.0128 | $0.0274 | 334.89 | 716.34 |
| claude-opus-4-1-20250805 | Anthropic | T0378 | 2025-10-28 | prompt_min.txt | None |  | $1.6020 | $3.5332 | 494.73 | 1091.15 |
| gemini-2.0-flash-lite | Google | T0357 | 2025-10-28 | prompt.txt | None |  | $0.0075 | $0.0165 | 101.13 | 223.41 |
| gpt-5-mini | OpenAI | T0349 | 2025-10-28 | prompt.txt | None |  | $0.0853 | $0.1901 | 839.22 | 1870.80 |
| claude-opus-4-1-20250805 | Anthropic | T0377 | 2025-10-28 | prompt.txt | None |  | $1.6994 | $3.7940 | 495.85 | 1107.02 |
| gpt-5-mini | OpenAI | T0350 | 2025-10-28 | prompt_min.txt | None |  | $0.0740 | $0.1665 | 778.50 | 1750.76 |
| gpt-4.1-mini | OpenAI | T0343 | 2025-10-28 | prompt.txt | None |  | $0.0243 | $0.0547 | 163.93 | 368.93 |
| gemini-2.5-flash-preview-09-2025 | Google | T0235 | 2025-10-28 | prompt.txt | None |  | $0.2013 | $0.4551 | 483.83 | 1094.03 |
| mistral-large-latest | Mistral AI | T0389 | 2025-10-28 | prompt.txt | None |  | $0.1304 | $0.2978 | 266.34 | 608.15 |
| o3 | OpenAI | T0354 | 2025-10-28 | prompt_min.txt | None |  | $0.2749 | $0.6318 | 619.62 | 1424.11 |
| claude-sonnet-4-20250514 | Anthropic | T0375 | 2025-10-28 | prompt.txt | None |  | $0.3448 | $0.8030 | 272.75 | 635.14 |
| gpt-4o | OpenAI | T0337 | 2025-10-28 | prompt.txt | None |  | $0.1335 | $0.3135 | 463.46 | 1087.95 |
| gemini-2.0-flash-lite | Google | T0358 | 2025-10-28 | prompt_min.txt | None |  | $0.0069 | $0.0162 | 91.90 | 216.02 |
| gemini-2.5-flash | Google | T0360 | 2025-10-28 | prompt_min.txt | None |  | $0.0358 | $0.0844 | 242.59 | 571.53 |
| mistral-large-latest | Mistral AI | T0390 | 2025-10-28 | prompt_min.txt | None |  | $0.1148 | $0.2731 | 279.32 | 664.14 |
| mistral-medium-2505 | Mistral AI | T0385 | 2025-10-28 | prompt.txt | None |  | $0.0369 | $0.0881 | 238.87 | 570.63 |
| claude-sonnet-4-5-20250929 | Anthropic | T0379 | 2025-10-28 | prompt.txt | None |  | $0.3680 | $0.8886 | 314.00 | 758.19 |
| mistral-medium-2508 | Mistral AI | T0383 | 2025-10-28 | prompt.txt | None |  | $0.0379 | $0.0930 | 270.89 | 664.86 |
| gpt-4.1 | OpenAI | T0341 | 2025-10-28 | prompt.txt | None |  | $0.1110 | $0.2765 | 406.12 | 1011.69 |
| claude-3-5-sonnet-20241022 | Anthropic | T0367 | 2025-10-28 | prompt.txt | None |  | $0.3323 | $0.8346 | 233.88 | 587.36 |
| gpt-5-nano | OpenAI | T0351 | 2025-10-28 | prompt.txt | None |  | $0.0356 | $0.0900 | 780.68 | 1975.90 |
| gpt-4o | OpenAI | T0338 | 2025-10-28 | prompt_min.txt | None |  | $0.1185 | $0.3025 | 362.18 | 924.67 |
| gpt-4.1-mini | OpenAI | T0344 | 2025-10-28 | prompt_min.txt | None |  | $0.0215 | $0.0553 | 141.42 | 364.44 |
| claude-sonnet-4-5-20250929 | Anthropic | T0380 | 2025-10-28 | prompt_min.txt | None |  | $0.3289 | $0.8506 | 281.66 | 728.47 |
| claude-opus-4-20250514 | Anthropic | T0374 | 2025-10-28 | prompt_min.txt | None |  | $1.5534 | $4.0595 | 368.93 | 964.11 |
| mistral-medium-2508 | Mistral AI | T0384 | 2025-10-28 | prompt_min.txt | None |  | $0.0342 | $0.0894 | 294.97 | 771.99 |
| claude-3-7-sonnet-20250219 | Anthropic | T0370 | 2025-10-28 | prompt_min.txt | None |  | $0.3080 | $0.8069 | 240.89 | 631.11 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0366 | 2025-10-28 | prompt_min.txt | None |  | $0.0072 | $0.0190 | 31.60 | 83.75 |
| gpt-5 | OpenAI | T0348 | 2025-10-28 | prompt_min.txt | None |  | $0.5257 | $1.4404 | 1434.08 | 3929.70 |
| mistral-medium-2505 | Mistral AI | T0386 | 2025-10-28 | prompt_min.txt | None |  | $0.0300 | $0.0833 | 251.47 | 697.90 |
| gemini-2.0-flash | Google | T0356 | 2025-10-28 | prompt_min.txt | None |  | $0.0092 | $0.0256 | 125.15 | 347.59 |
| claude-sonnet-4-20250514 | Anthropic | T0376 | 2025-10-28 | prompt_min.txt | None |  | $0.3161 | $0.8906 | 274.61 | 773.66 |
| claude-3-5-sonnet-20241022 | Anthropic | T0368 | 2025-10-28 | prompt_min.txt | None |  | $0.2924 | $0.8266 | 218.63 | 618.04 |
| gpt-4.1-nano | OpenAI | T0346 | 2025-10-28 | prompt_min.txt | None |  | $0.0057 | $0.0161 | 111.25 | 315.16 |
| gpt-4.1-nano | OpenAI | T0345 | 2025-10-28 | prompt.txt | None |  | $0.0067 | $0.0192 | 119.48 | 341.26 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0365 | 2025-10-28 | prompt.txt | None |  | $0.0091 | $0.0270 | 38.88 | 115.48 |
| gpt-4.1 | OpenAI | T0342 | 2025-10-28 | prompt_min.txt | None |  | $0.0885 | $0.2719 | 248.61 | 764.17 |
| qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0397 | 2025-10-28 | prompt.txt | None |  | $0.0147 | $0.0457 | 139.96 | 435.86 |
| pixtral-large-latest | Mistral AI | T0382 | 2025-10-28 | prompt_min.txt | None |  | $0.2187 | $0.6925 | 441.43 | 1397.87 |
| gpt-4o-mini | OpenAI | T0339 | 2025-10-28 | prompt.txt | None |  | $0.0655 | $0.2193 | 326.35 | 1092.12 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0396 | 2025-10-28 | prompt_min.txt | None |  | $0.0131 | $0.0445 | 423.01 | 1434.36 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0394 | 2025-10-28 | prompt_min.txt | None |  | $0.1410 | $0.4870 | 2486.69 | 8588.47 |
| gemini-2.5-flash-lite | Google | T0363 | 2025-10-28 | prompt.txt | None |  | $0.0342 | $0.1212 | 296.39 | 1049.37 |
| gemini-2.5-flash-lite | Google | T0364 | 2025-10-28 | prompt_min.txt | None |  | $0.0072 | $0.0261 | 88.32 | 321.70 |
| gpt-5-nano | OpenAI | T0352 | 2025-10-28 | prompt_min.txt | None |  | $0.0274 | $0.1010 | 536.15 | 1973.48 |
| pixtral-12b | Mistral AI | T0387 | 2025-10-28 | prompt.txt | None |  | $0.0089 | $0.0334 | 72.07 | 269.86 |
| qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0399 | 2025-10-28 | prompt.txt | None |  | $0.0069 | $0.0268 | 465.24 | 1817.21 |
| qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0398 | 2025-10-28 | prompt_min.txt | None |  | $0.0118 | $0.0483 | 284.79 | 1169.75 |
| gpt-4o-mini | OpenAI | T0340 | 2025-10-28 | prompt_min.txt | None |  | $0.0646 | $0.3063 | 276.13 | 1308.89 |
| pixtral-large-latest | Mistral AI | T0381 | 2025-10-28 | prompt.txt | None |  | $0.1928 | $0.9326 | 313.38 | 1516.03 |
| claude-3-opus-20240229 | Anthropic | T0371 | 2025-10-28 | prompt.txt | None |  | $1.9587 | $11.2177 | 705.22 | 4038.81 |
| claude-3-opus-20240229 | Anthropic | T0372 | 2025-10-28 | prompt_min.txt | None |  | $1.3550 | $12.5063 | 503.03 | 4642.94 |
| x-ai/grok-4 | xAI (via OpenRouter) | T0401 | 2025-10-28 | prompt.txt | None |  | $1.7651 | $34.0710 | 5302.03 | 102341.52 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0393 | 2025-10-28 | prompt.txt | None |  | $0.0508 | $0.9962 | 754.65 | 14785.48 |
| x-ai/grok-4 | xAI (via OpenRouter) | T0402 | 2025-10-28 | prompt_min.txt | None |  | $1.4525 | $44.7691 | 3063.28 | 94414.70 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0391 | 2025-10-28 | prompt.txt | None |  | $0.0000 | N/A | N/A | N/A |
| pixtral-12b | Mistral AI | T0388 | 2025-10-28 | prompt_min.txt | None |  | $0.0073 | N/A | 24.35 | N/A |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0392 | 2025-10-28 | prompt_min.txt | None |  | $0.0000 | N/A | N/A | N/A |
| qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0400 | 2025-10-28 | prompt_min.txt | None |  | $0.0124 | N/A | 538.37 | N/A |
### fraktur
| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gemini-exp-1206 | Google | T0087 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 875.69 | 908.39 |
| gemini-2.5-pro | Google | T0132 | 2025-10-01 | prompt_optimized.txt | None |  | $0.1068 | $0.1112 | 244.01 | 254.18 |
| gemini-2.5-pro-exp-03-25 | Google | T0022 | 2025-05-09 | prompt.txt | None |  | N/A | N/A | 807.86 | 855.79 |
| gemini-2.5-pro-exp-03-25 | Google | T0080 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 799.79 | 847.24 |
| gemini-2.0-pro-exp-02-05 | Google | T0091 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 855.41 | 906.15 |
| gemini-2.5-pro-preview-05-06 | Google | T0097 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 891.10 | 964.40 |
| gemini-2.5-flash | Google | T0199 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0253 | $0.0276 | 243.38 | 265.12 |
| gemini-2.5-flash-preview-09-2025 | Google | T0223 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0287 | $0.0329 | 157.75 | 180.91 |
| gemini-2.0-flash-lite | Google | T0090 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0034 | $0.0041 | 60.48 | 73.40 |
| gemini-2.0-flash | Google | T0086 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0044 | $0.0060 | 55.03 | 75.39 |
| claude-3-7-sonnet-20250219 | Anthropic | T0092 | 2025-09-30 | prompt_optimized.txt | None |  | $0.1777 | $0.2605 | 196.58 | 288.25 |
| gemini-1.5-pro | Google | T0089 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 123.51 | 192.99 |
| gemini-2.5-flash-lite | Google | T0207 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0045 | $0.0072 | 37.23 | 59.10 |
| gemini-2.5-flash-preview-04-17 | Google | T0096 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 223.59 | 370.18 |
| gpt-4.1 | OpenAI | T0083 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0776 | $0.1362 | 224.89 | 394.54 |
| claude-opus-4-1-20250805 | Anthropic | T0123 | 2025-09-30 | prompt_optimized.txt | None |  | $0.9371 | $1.6615 | 289.28 | 512.90 |
| gemini-1.5-flash | Google | T0088 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 81.93 | 148.43 |
| mistral-large-latest | Mistral AI | T0191 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0683 | $0.1329 | 217.88 | 423.88 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0215 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0058 | $0.0114 | 33.93 | 66.79 |
| claude-3-5-sonnet-20241022 | Anthropic | T0093 | 2025-09-30 | prompt_optimized.txt | None |  | $0.1305 | $0.2632 | 133.05 | 268.26 |
| gpt-4o | OpenAI | T0079 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0790 | $0.1659 | 516.44 | 1084.96 |
| gpt-5-mini | OpenAI | T0121 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0353 | $0.0746 | 243.61 | 513.94 |
| claude-opus-4-20250514 | Anthropic | T0098 | 2025-09-30 | prompt_optimized.txt | None |  | $0.9807 | $2.1135 | 392.86 | 846.67 |
| mistral-medium-2505 | Mistral AI | T0178 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0211 | $0.0490 | 183.12 | 425.87 |
| pixtral-large-latest | Mistral AI | T0095 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0861 | $0.2121 | 137.85 | 339.53 |
| claude-sonnet-4-20250514 | Anthropic | T0099 | 2025-09-30 | prompt_optimized.txt | None |  | $0.2083 | $0.5820 | 298.19 | 832.94 |
| mistral-medium-2508 | Mistral AI | T0177 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0213 | $0.0642 | 484.57 | 1459.55 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0251 | 2025-10-17 | prompt_optimized.txt | None |  | $0.0059 | $0.0196 | 112.89 | 376.30 |
| gpt-4o-mini | OpenAI | T0082 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0223 | $0.0834 | 110.91 | 413.85 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0241 | 2025-10-17 | prompt_optimized.txt | None |  | $0.0000 | $0.0000 | 675.50 | 2659.44 |
| claude-3-opus-20240229 | Anthropic | T0094 | 2025-09-30 | prompt_optimized.txt | None |  | $0.6288 | $2.8326 | 214.63 | 966.82 |
| pixtral-12b | Mistral AI | T0185 | 2025-10-01 | prompt_optimized.txt | None |  | $0.0037 | $0.0171 | 376.97 | 1729.20 |
| gpt-4.5-preview | OpenAI | T0081 | 2025-05-09 | prompt_optimized.txt | None |  | N/A | N/A | 224.02 | 1349.51 |
| gpt-5 | OpenAI | T0120 | 2025-09-30 | prompt_optimized.txt | None |  | $0.2036 | $1.3394 | 493.97 | 3249.77 |
| o3 | OpenAI | T0137 | 2025-10-01 | prompt_optimized.txt | None |  | $0.1485 | $1.0606 | 357.90 | 2556.41 |
| gpt-4.1-mini | OpenAI | T0084 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0106 | $0.2306 | 74.24 | 1613.84 |
| gpt-5-nano | OpenAI | T0122 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0054 | $0.8930 | 70.61 | 11768.32 |
| gpt-4.1-nano | OpenAI | T0085 | 2025-09-30 | prompt_optimized.txt | None |  | $0.0013 | N/A | 11.57 | N/A |
| claude-sonnet-4-5-20250929 | Anthropic | T0229 | 2025-10-01 | prompt_optimized.txt | None |  | $0.2178 | N/A | 301.59 | N/A |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0246 | 2025-10-17 | prompt_optimized.txt | None |  | $0.0048 | N/A | 47.09 | N/A |
### medieval_manuscripts
| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4.1-mini | OpenAI | T0277 | 2025-10-24 | prompt.txt | None |  | $0.0125 | $0.0156 | 52.71 | 66.06 |
| gpt-5-mini | OpenAI | T0280 | 2025-10-24 | prompt.txt | None |  | $0.0618 | $0.0865 | 507.84 | 711.26 |
| claude-3-5-sonnet-20241022 | Anthropic | T0288 | 2025-10-24 | prompt.txt | None |  | $0.1236 | $0.1751 | 96.74 | 137.03 |
| gemini-2.5-flash-preview-09-2025 | Google | T0287 | 2025-10-24 | prompt.txt | None |  | $0.0106 | $0.0151 | 119.02 | 170.52 |
| gemini-2.5-flash | Google | T0271 | 2025-10-24 | prompt.txt | None |  | $0.0111 | $0.0161 | 185.44 | 268.76 |
| gemini-2.5-pro | Google | T0272 | 2025-10-24 | prompt.txt | None |  | $0.0391 | $0.0572 | 276.60 | 404.38 |
| claude-opus-4-1-20250805 | Anthropic | T0292 | 2025-10-24 | prompt.txt | None |  | $0.6464 | $0.9634 | 264.03 | 393.48 |
| claude-3-7-sonnet-20250219 | Anthropic | T0274 | 2025-10-24 | prompt.txt | None |  | $0.1277 | $0.1932 | 101.40 | 153.40 |
| gpt-4.1 | OpenAI | T0273 | 2025-10-24 | prompt.txt | None |  | $0.0535 | $0.0825 | 94.03 | 145.11 |
| claude-opus-4-20250514 | Anthropic | T0290 | 2025-10-24 | prompt.txt | None |  | $0.6472 | $0.9988 | 242.75 | 374.62 |
| mistral-medium-2508 | Mistral AI | T0295 | 2025-10-24 | prompt.txt | None |  | $0.0141 | $0.0218 | 84.91 | 131.03 |
| qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0303 | 2025-10-24 | prompt.txt | None |  | $0.0024 | $0.0038 | 39.60 | 62.46 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0286 | 2025-10-24 | prompt.txt | None |  | $0.0021 | $0.0034 | 18.96 | 30.43 |
| mistral-large-latest | Mistral AI | T0298 | 2025-10-24 | prompt.txt | None |  | $0.0578 | $0.0936 | 75.48 | 122.13 |
| gemini-2.0-flash-lite | Google | T0284 | 2025-10-24 | prompt.txt | None |  | $0.0029 | $0.0048 | 36.60 | 60.90 |
| mistral-medium-2505 | Mistral AI | T0296 | 2025-10-24 | prompt.txt | None |  | $0.0189 | $0.0315 | 402.69 | 670.04 |
| gpt-4o | OpenAI | T0275 | 2025-10-24 | prompt.txt | None |  | $0.0666 | $0.1121 | 107.60 | 181.15 |
| gemini-2.5-flash-lite | Google | T0285 | 2025-10-24 | prompt.txt | None |  | $0.0021 | $0.0036 | 43.30 | 74.40 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0299 | 2025-10-24 | prompt.txt | None |  | $0.0000 | $0.0000 | 39.56 | 68.91 |
| gpt-4o-mini | OpenAI | T0276 | 2025-10-24 | prompt.txt | None |  | $0.0487 | $0.0856 | 73.54 | 129.25 |
| gemini-2.0-flash | Google | T0283 | 2025-10-24 | prompt.txt | None |  | $0.0071 | $0.0125 | 83.75 | 147.71 |
| claude-sonnet-4-5-20250929 | Anthropic | T0293 | 2025-10-24 | prompt.txt | None |  | $0.1378 | $0.2506 | 161.13 | 292.96 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0301 | 2025-10-24 | prompt.txt | None |  | $0.0060 | $0.0111 | 46.40 | 85.46 |
| claude-sonnet-4-20250514 | Anthropic | T0291 | 2025-10-24 | prompt.txt | None |  | $0.1305 | $0.2504 | 142.79 | 274.06 |
| o3 | OpenAI | T0282 | 2025-10-24 | prompt.txt | None |  | $0.3410 | $0.6686 | 708.16 | 1388.55 |
| qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0302 | 2025-10-24 | prompt.txt | None |  | $0.0078 | $0.0155 | 241.69 | 479.55 |
| pixtral-large-latest | Mistral AI | T0294 | 2025-10-24 | prompt.txt | None |  | $0.1187 | $0.2655 | 98.34 | 220.01 |
| pixtral-12b | Mistral AI | T0297 | 2025-10-24 | prompt.txt | None |  | $0.0068 | $0.0164 | 34.56 | 82.89 |
| gpt-5 | OpenAI | T0279 | 2025-10-24 | prompt.txt | None |  | $0.5342 | $1.2965 | 2247.52 | 5455.15 |
| x-ai/grok-4 | xAI (via OpenRouter) | T0304 | 2025-10-24 | prompt.txt | None |  | $1.4935 | $3.7152 | 2573.65 | 6402.12 |
| claude-3-opus-20240229 | Anthropic | T0289 | 2025-10-24 | prompt.txt | None |  | $0.6457 | $1.6305 | 151.84 | 383.43 |
| gpt-4.1-nano | OpenAI | T0278 | 2025-10-24 | prompt.txt | None |  | $0.0032 | $0.0081 | 31.64 | 80.31 |
| gpt-5-nano | OpenAI | T0281 | 2025-10-24 | prompt.txt | None |  | $0.0094 | $0.0482 | 216.78 | 1117.42 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0300 | 2025-10-24 | prompt.txt | None |  | $0.0298 | N/A | 2000.00 | N/A |
### metadata_extraction
| Model ↕ | Provider ↕ | Test ID ↕ | Date ↕ | Prompt ↕ | Rules ↕ | Results ↕ | Cost (USD) ↕ | Cost/Point ↕ | Test Time (s) ↕ | Time/Point ↕ |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-5 | OpenAI | T0109 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.4823 | $0.6183 | 1030.30 | 1320.90 |
| o3 | OpenAI | T0135 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.1855 | $0.2541 | 394.51 | 540.43 |
| gpt-5 | OpenAI | T0108 | 2025-09-30 | prompt.txt | None | | $1.2982 | $1.8285 | 2922.29 | 4115.90 |
| gpt-5 | OpenAI | T0110 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.7651 | $1.1419 | 1711.52 | 2554.51 |
| o3 | OpenAI | T0134 | 2025-10-01 | prompt.txt | None | | $0.5053 | $0.8021 | 1038.49 | 1648.39 |
| gemini-2.0-flash-lite | Google | T0056 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0042 | $0.0066 | 45.77 | 72.65 |
| gemini-2.5-flash | Google | T0197 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0087 | $0.0139 | 195.57 | 310.43 |
| gemini-2.5-pro | Google | T0125 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0426 | $0.0687 | 203.80 | 328.71 |
| gpt-4.5-preview | OpenAI | T0011 | 2025-04-11 | prompt.txt | None | | N/A | N/A | 960.76 | 1575.02 |
| gemini-2.0-flash | Google | T0044 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0055 | $0.0090 | 51.72 | 84.78 |
| gpt-4.5-preview | OpenAI | T0040 | 2025-04-11 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | N/A | N/A | 373.44 | 622.41 |
| gemini-2.5-flash-preview-09-2025 | Google | T0221 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0108 | $0.0180 | 112.91 | 188.19 |
| gpt-4.5-preview | OpenAI | T0041 | 2025-04-11 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | N/A | N/A | 526.13 | 876.89 |
| o3 | OpenAI | T0136 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.3052 | $0.5087 | 624.76 | 1041.27 |
| mistral-medium-2505 | Mistral AI | T0174 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0231 | $0.0392 | 69.80 | 118.30 |
| gemini-exp-1206 | Google | T0014 | 2025-04-11 | prompt.txt | None | | N/A | N/A | 894.64 | 1542.48 |
| gemini-2.5-pro-exp-03-25 | Google | T0019 | 2025-04-01 | prompt.txt | None | | N/A | N/A | 877.02 | 1512.10 |
| gpt-5-mini | OpenAI | T0112 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0622 | $0.1072 | 413.46 | 712.86 |
| gemini-2.0-pro-exp-02-05 | Google | T0021 | 2025-04-01 | prompt.txt | None | | N/A | N/A | 856.03 | 1501.80 |
| gpt-5-nano | OpenAI | T0115 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0299 | $0.0524 | 429.53 | 753.56 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0239 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0000 | $0.0000 | 22.11 | 38.79 |
| gpt-4.1-mini | OpenAI | T0070 | 2025-09-30 | prompt.txt | None | | $0.0630 | $0.1124 | 152.75 | 272.78 |
| gpt-4o-mini | OpenAI | T0012 | 2025-09-30 | prompt.txt | None | | $0.3814 | $0.6935 | 211.44 | 384.43 |
| gemini-2.5-pro | Google | T0124 | 2025-09-30 | prompt.txt | None | | $0.1057 | $0.1921 | 493.26 | 896.84 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0213 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0026 | $0.0048 | 20.70 | 37.64 |
| gemini-2.0-flash-lite | Google | T0020 | 2025-09-30 | prompt.txt | None | | $0.0089 | $0.0165 | 105.24 | 194.88 |
| gpt-4o-mini | OpenAI | T0076 | 2025-09-30 | prompt.txt | None | | $0.3815 | $0.7064 | 217.37 | 402.54 |
| gpt-5-mini | OpenAI | T0111 | 2025-09-30 | prompt.txt | None | | $0.1486 | $0.2752 | 1069.71 | 1980.95 |
| gemini-2.5-flash | Google | T0196 | 2025-09-30 | prompt.txt | None | | $0.0217 | $0.0403 | 439.55 | 813.98 |
| gemini-2.5-flash-preview-09-2025 | Google | T0220 | 2025-10-01 | prompt.txt | None | | $0.0427 | $0.0791 | 307.46 | 569.36 |
| gpt-4.1-mini | OpenAI | T0071 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0238 | $0.0440 | 75.20 | 139.26 |
| mistral-large-latest | Mistral AI | T0189 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.1082 | $0.2004 | 64.28 | 119.03 |
| gpt-4.1-mini | OpenAI | T0072 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0393 | $0.0727 | 86.89 | 160.91 |
| gpt-4o | OpenAI | T0010 | 2025-09-30 | prompt.txt | None | | $0.2844 | $0.5367 | 432.47 | 815.98 |
| gpt-4o-mini | OpenAI | T0042 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.1368 | $0.2580 | 86.90 | 163.97 |
| gpt-4o-mini | OpenAI | T0077 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.1368 | $0.2580 | 84.87 | 160.14 |
| gemini-2.0-flash-lite | Google | T0057 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0047 | $0.0089 | 55.63 | 104.95 |
| gemini-2.5-pro | Google | T0126 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0617 | $0.1164 | 275.31 | 519.45 |
| gemini-2.0-flash | Google | T0013 | 2025-09-30 | prompt.txt | None | | $0.0118 | $0.0227 | 121.45 | 233.56 |
| gpt-4o | OpenAI | T0039 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.1756 | $0.3376 | 358.87 | 690.14 |
| gpt-4o-mini | OpenAI | T0043 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.2447 | $0.4706 | 120.99 | 232.67 |
| gemini-2.5-flash | Google | T0198 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0130 | $0.0250 | 196.67 | 378.22 |
| gpt-4.1 | OpenAI | T0067 | 2025-09-30 | prompt.txt | None | | $0.2347 | $0.4603 | 229.72 | 450.42 |
| mistral-medium-2508 | Mistral AI | T0173 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0232 | $0.0454 | 66.29 | 129.99 |
| gemini-2.5-flash-lite | Google | T0205 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0028 | $0.0055 | 24.16 | 47.38 |
| gpt-4o-mini | OpenAI | T0078 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.2447 | $0.4798 | 128.56 | 252.08 |
| gpt-5-mini | OpenAI | T0113 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0868 | $0.1702 | 628.66 | 1232.66 |
| gemini-2.5-flash-preview-09-2025 | Google | T0222 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0157 | $0.0307 | 167.65 | 328.72 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0212 | 2025-10-01 | prompt.txt | None | | $0.0066 | $0.0132 | 50.73 | 101.46 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0238 | 2025-10-17 | prompt.txt | None | | $0.0000 | $0.0000 | 71.75 | 143.50 |
| gpt-4o | OpenAI | T0038 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.1094 | $0.2188 | 136.28 | 272.55 |
| gpt-4.1 | OpenAI | T0068 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0905 | $0.1811 | 87.29 | 174.58 |
| gpt-4.1-nano | OpenAI | T0074 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0081 | $0.0161 | 70.52 | 141.04 |
| gpt-4.1 | OpenAI | T0069 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.1447 | $0.2893 | 139.77 | 279.53 |
| gpt-4.1-nano | OpenAI | T0073 | 2025-09-30 | prompt.txt | None | | $0.0217 | $0.0444 | 134.41 | 274.30 |
| mistral-medium-2508 | Mistral AI | T0171 | 2025-10-01 | prompt.txt | None | | $0.0601 | $0.1252 | 167.64 | 349.25 |
| gemini-2.0-flash | Google | T0045 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0063 | $0.0131 | 69.17 | 144.11 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0214 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0040 | $0.0082 | 31.45 | 65.52 |
| mistral-medium-2505 | Mistral AI | T0172 | 2025-10-01 | prompt.txt | None | | $0.0602 | $0.1282 | 163.77 | 348.45 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0248 | 2025-10-17 | prompt.txt | None | | $0.0327 | $0.0696 | 183.79 | 391.03 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0249 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0120 | $0.0255 | 54.29 | 115.50 |
| gpt-4.1-nano | OpenAI | T0075 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0137 | $0.0291 | 88.44 | 188.18 |
| gpt-5-nano | OpenAI | T0114 | 2025-09-30 | prompt.txt | None | | $0.0652 | $0.1417 | 1002.37 | 2179.06 |
| gemini-2.5-flash-lite | Google | T0204 | 2025-10-01 | prompt.txt | None | | $0.0068 | $0.0149 | 61.57 | 133.85 |
| gpt-5-nano | OpenAI | T0116 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0341 | $0.0741 | 465.75 | 1012.51 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0240 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0000 | $0.0000 | 33.01 | 71.76 |
| gemini-1.5-pro | Google | T0016 | 2025-04-11 | prompt.txt | None | | N/A | N/A | 325.48 | 723.29 |
| mistral-large-latest | Mistral AI | T0188 | 2025-10-01 | prompt.txt | None | | $0.2842 | $0.6316 | 168.93 | 375.40 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0250 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0205 | $0.0456 | 70.88 | 157.50 |
| claude-sonnet-4-5-20250929 | Anthropic | T0226 | 2025-10-01 | prompt.txt | None | | $0.5785 | $1.3147 | 243.73 | 553.93 |
| claude-opus-4-1-20250805 | Anthropic | T0118 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $1.0480 | $2.3818 | 101.89 | 231.56 |
| claude-sonnet-4-5-20250929 | Anthropic | T0227 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.2292 | $0.5209 | 102.12 | 232.08 |
| gemini-2.5-flash-lite | Google | T0206 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0040 | $0.0092 | 34.99 | 79.53 |
| mistral-medium-2505 | Mistral AI | T0176 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0371 | $0.0863 | 108.79 | 252.99 |
| mistral-large-latest | Mistral AI | T0190 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.1758 | $0.4088 | 108.78 | 252.98 |
| claude-3-7-sonnet-20250219 | Anthropic | T0017 | 2025-09-30 | prompt.txt | None | | $0.5398 | $1.2853 | 235.51 | 560.74 |
| gemini-1.5-flash | Google | T0048 | 2025-04-11 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | N/A | N/A | 113.85 | 271.08 |
| claude-3-7-sonnet-20250219 | Anthropic | T0025 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.3273 | $0.7794 | 140.90 | 335.48 |
| mistral-medium-2508 | Mistral AI | T0175 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0371 | $0.0883 | 100.16 | 238.48 |
| claude-3-5-sonnet-20241022 | Anthropic | T0018 | 2025-09-30 | prompt.txt | None | | $0.5403 | $1.3177 | 171.85 | 419.14 |
| claude-3-7-sonnet-20250219 | Anthropic | T0024 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.2127 | $0.5188 | 109.17 | 266.27 |
| claude-opus-4-20250514 | Anthropic | T0101 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $1.0565 | $2.5769 | 120.33 | 293.48 |
| claude-3-5-sonnet-20241022 | Anthropic | T0052 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.2131 | $0.5326 | 77.08 | 192.71 |
| claude-3-5-sonnet-20241022 | Anthropic | T0053 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.3281 | $0.8202 | 135.74 | 339.35 |
| gemini-1.5-flash | Google | T0015 | 2025-04-11 | prompt.txt | None | | N/A | N/A | 291.55 | 747.57 |
| claude-sonnet-4-5-20250929 | Anthropic | T0228 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.3493 | $0.9193 | 128.77 | 338.86 |
| claude-opus-4-20250514 | Anthropic | T0100 | 2025-09-30 | prompt.txt | None | | $2.6761 | $7.2328 | 282.63 | 763.87 |
| pixtral-large-latest | Mistral AI | T0023 | 2025-09-30 | prompt.txt | None | | $0.6883 | $1.9121 | 164.98 | 458.28 |
| claude-opus-4-1-20250805 | Anthropic | T0117 | 2025-09-30 | prompt.txt | None | | $2.6621 | $7.3947 | 238.52 | 662.57 |
| pixtral-large-latest | Mistral AI | T0061 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.4348 | $1.2789 | 102.15 | 300.45 |
| claude-sonnet-4-20250514 | Anthropic | T0105 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.3226 | $0.9487 | 113.88 | 334.95 |
| gemini-1.5-flash | Google | T0049 | 2025-04-11 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | N/A | N/A | 184.94 | 560.41 |
| claude-opus-4-1-20250805 | Anthropic | T0119 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $1.6173 | $5.0542 | 137.86 | 430.81 |
| claude-sonnet-4-20250514 | Anthropic | T0103 | 2025-09-30 | prompt.txt | None | | $0.5322 | $1.7169 | 211.51 | 682.28 |
| pixtral-large-latest | Mistral AI | T0060 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.2545 | $0.8208 | 66.61 | 214.87 |
| pixtral-12b | Mistral AI | T0184 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0314 | $0.1014 | 65.18 | 210.27 |
| claude-opus-4-20250514 | Anthropic | T0102 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $1.6251 | $5.4172 | 167.21 | 557.37 |
| pixtral-12b | Mistral AI | T0182 | 2025-10-01 | prompt.txt | None | | $0.0496 | $0.1710 | 109.70 | 378.27 |
| claude-sonnet-4-20250514 | Anthropic | T0104 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.2107 | $0.7804 | 78.40 | 290.37 |
| claude-3-opus-20240229 | Anthropic | T0036 | 2025-09-30 | prompt.txt | None | | $2.6852 | $10.7406 | 304.65 | 1218.61 |
| pixtral-12b | Mistral AI | T0183 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0182 | $0.0726 | 43.94 | 175.76 |
| claude-3-opus-20240229 | Anthropic | T0063 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $1.6204 | $6.4818 | 172.46 | 689.82 |
| claude-3-opus-20240229 | Anthropic | T0062 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $1.0614 | $5.0543 | 128.92 | 613.89 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0244 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | | $0.0337 | $0.1775 | 184.67 | 971.93 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0245 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | | $0.0583 | $0.5834 | 312.08 | 3120.79 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0243 | 2025-10-17 | prompt.txt | None | | $0.0581 | $0.9691 | 267.65 | 4460.85 |
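The Rules column above stores each run's configuration as a JSON object, with `None` meaning no restrictions. A minimal sketch of reading such a cell (`parse_rules` is a hypothetical helper for illustration, not the repository's actual loader):

```python
import json

def parse_rules(cell: str) -> dict:
    """Parse a Rules cell; the literal string 'None' means no restrictions."""
    if cell.strip() == "None":
        return {}
    # JSON true/false map to Python True/False
    return json.loads(cell)

rules = parse_rules('{"skip_signatures": true, "skip_non_signatures": false}')
print(rules)  # {'skip_signatures': True, 'skip_non_signatures': False}
```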
test_benchmark¶
| Model ↕ | Provider ↕ | Test ID ↕ | Date ↕ | Prompt ↕ | Rules ↕ | Results ↕ | Cost (USD) ↕ | Cost/Point ↕ | Test Time (s) ↕ | Time/Point ↕ |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4o | OpenAI | T0001 | 2025-09-30 | prompt.txt | None | N/A | $0.0045 | N/A | 7.66 | N/A |
| gemini-2.0-flash | Google | T0002 | 2025-09-30 | prompt.txt | None | N/A | $0.0002 | N/A | 3.05 | N/A |
| claude-3-5-sonnet-20241022 | Anthropic | T0003 | 2025-09-30 | prompt.txt | None | N/A | $0.0080 | N/A | 8.60 | N/A |
| gemini-2.5-flash | Google | T0193 | 2025-09-30 | prompt.txt | None | N/A | $0.0012 | N/A | 19.04 | N/A |
| gemini-2.5-flash-lite | Google | T0201 | 2025-10-01 | prompt.txt | None | N/A | $0.0263 | N/A | 99.68 | N/A |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0209 | 2025-10-01 | prompt.txt | None | N/A | $0.0001 | N/A | 1.07 | N/A |
| gemini-2.5-flash-preview-09-2025 | Google | T0217 | 2025-10-01 | prompt.txt | None | N/A | $0.0008 | N/A | 7.19 | N/A |
test_benchmark2¶
| Model ↕ | Provider ↕ | Test ID ↕ | Date ↕ | Prompt ↕ | Rules ↕ | Results ↕ | Cost (USD) ↕ | Cost/Point ↕ | Test Time (s) ↕ | Time/Point ↕ |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4o | OpenAI | T0004 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0039 | N/A | 10.09 | N/A |
| gemini-2.0-flash | Google | T0005 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0002 | N/A | 2.58 | N/A |
| claude-3-5-sonnet-20241022 | Anthropic | T0006 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0069 | N/A | 8.17 | N/A |
| gemini-2.5-flash | Google | T0194 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0003 | N/A | 10.83 | N/A |
| gemini-2.5-flash-lite | Google | T0202 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 1.54 | N/A |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0210 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 0.98 | N/A |
| gemini-2.5-flash-preview-09-2025 | Google | T0218 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0006 | N/A | 7.22 | N/A |
zettelkatalog¶
| Model ↕ | Provider ↕ | Test ID ↕ | Date ↕ | Prompt ↕ | Rules ↕ | Results ↕ | Cost (USD) ↕ | Cost/Point ↕ | Test Time (s) ↕ | Time/Point ↕ |
|---|---|---|---|---|---|---|---|---|---|---|
| claude-3-5-sonnet-20241022 | Anthropic | T0143 | 2025-10-01 | prompt.txt | None | | $2.5000 | $2.8568 | 1577.11 | 1802.20 |
| gpt-5 | OpenAI | T0165 | 2025-10-01 | prompt.txt | None | | $7.1243 | $8.1870 | 21108.28 | 24256.89 |
| gemini-2.5-pro | Google | T0155 | 2025-10-01 | prompt.txt | None | | $0.7095 | $0.8157 | 3744.41 | 4304.93 |
| gemini-2.5-flash-preview-09-2025 | Google | T0224 | 2025-10-01 | prompt.txt | None | | $0.3949 | $0.4591 | 2424.33 | 2818.30 |
| gemini-2.5-flash | Google | T0200 | 2025-09-30 | prompt.txt | None | | $0.1684 | $0.1965 | 3027.50 | 3531.85 |
| gpt-4.1 | OpenAI | T0160 | 2025-10-01 | prompt.txt | None | | $2.6952 | $3.1509 | 12936.40 | 15123.99 |
| claude-3-7-sonnet-20250219 | Anthropic | T0144 | 2025-10-01 | prompt.txt | None | | $2.5778 | $3.0183 | 1529.23 | 1790.50 |
| claude-sonnet-4-20250514 | Anthropic | T0148 | 2025-10-01 | prompt.txt | None | | $2.5141 | $2.9870 | 1456.90 | 1730.95 |
| gemini-2.0-flash | Google | T0151 | 2025-10-01 | prompt.txt | None | | $0.0795 | $0.0945 | 632.49 | 751.61 |
| o3 | OpenAI | T0168 | 2025-10-01 | prompt.txt | None | | $2.5396 | $3.0453 | 9024.04 | 10821.21 |
| claude-opus-4-1-20250805 | Anthropic | T0146 | 2025-10-01 | prompt.txt | None | | $12.5676 | $15.2166 | 1581.36 | 1914.67 |
| gpt-4o | OpenAI | T0066 | 2025-09-30 | prompt.txt | None | | $1.3031 | $1.5799 | 3291.08 | 3989.93 |
| gemini-2.0-flash-lite | Google | T0152 | 2025-10-01 | prompt.txt | None | | $0.0615 | $0.0757 | 618.19 | 761.16 |
| claude-sonnet-4-5-20250929 | Anthropic | T0230 | 2025-09-30 | prompt.txt | None | | $2.7730 | $3.4395 | 1428.77 | 1772.17 |
| gpt-5-mini | OpenAI | T0166 | 2025-10-01 | prompt.txt | None | | $1.3158 | $1.6634 | 22744.78 | 28753.17 |
| claude-opus-4-20250514 | Anthropic | T0147 | 2025-10-01 | prompt.txt | None | | $12.7064 | $16.1835 | 1760.17 | 2241.84 |
| mistral-medium-2508 | Mistral AI | T0179 | 2025-10-01 | prompt.txt | None | | $0.2675 | $0.3413 | 934.75 | 1192.46 |
| mistral-large-latest | Mistral AI | T0192 | 2025-10-01 | prompt.txt | None | | $1.1730 | $1.5031 | 862.11 | 1104.69 |
| pixtral-large-latest | Mistral AI | T0159 | 2025-10-01 | prompt.txt | None | | $2.1044 | $2.7040 | 1298.95 | 1669.01 |
| mistral-medium-2505 | Mistral AI | T0180 | 2025-10-01 | prompt.txt | None | | $0.2675 | $0.3448 | 835.72 | 1077.34 |
| gpt-5-nano | OpenAI | T0167 | 2025-10-01 | prompt.txt | None | | $0.3448 | $0.4475 | 5049.08 | 6552.84 |
| claude-3-opus-20240229 | Anthropic | T0145 | 2025-10-01 | prompt.txt | None | | $13.6030 | $17.8355 | 2789.97 | 3658.04 |
| gpt-4.1-mini | OpenAI | T0161 | 2025-10-02 | prompt.txt | None | | N/A | N/A | 29647.30 | 39063.63 |
| gemini-2.5-flash-lite | Google | T0208 | 2025-10-01 | prompt.txt | None | | $0.1044 | $0.1497 | 587.82 | 842.84 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0216 | 2025-10-01 | prompt.txt | None | | $0.3252 | $0.4750 | 1152.22 | 1682.91 |
| gpt-4.1-nano | OpenAI | T0162 | 2025-10-02 | prompt.txt | None | | N/A | N/A | 471.26 | 696.91 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0252 | 2025-10-17 | prompt.txt | None | | $0.5506 | $0.8159 | 19155.46 | 28385.64 |
| pixtral-12b | Mistral AI | T0186 | 2025-10-01 | prompt.txt | None | | $0.1384 | $0.2274 | 596.77 | 980.26 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0242 | 2025-10-17 | prompt.txt | None | | $0.0000 | $0.0000 | 44963.40 | 173666.98 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0247 | 2025-10-17 | prompt.txt | None | | $0.1827 | $1.7703 | 1035.96 | 10037.63 |
| gpt-4o-mini | OpenAI | T0164 | 2025-10-03 | prompt.txt | None | | $0.0201 | $1.3640 | 1229.45 | 83295.51 |
About This Page¶
This benchmark suite is designed to test AI models on humanities data tasks. The tests run monthly, and the results are updated automatically.
For more details, visit the GitHub repository.
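As a rough illustration of the efficiency metrics used throughout these tables, the sketch below assumes that Cost/Point and Time/Point divide a test's cost and runtime by its performance score, and that global figures average per benchmark first and then across benchmarks. This is an assumption based on the column definitions above, not the repository's actual code:

```python
from statistics import mean

def cost_per_point(cost_usd: float, score: float) -> float:
    """Dollars spent per performance point achieved in one test (lower is better)."""
    return cost_usd / score

def global_average(per_benchmark_scores: dict[str, list[float]]) -> float:
    """Average each benchmark's test scores first, then average across benchmarks."""
    return mean(mean(scores) for scores in per_benchmark_scores.values())

# Hypothetical numbers, for illustration only:
print(round(cost_per_point(0.48, 0.78), 4))                    # 0.6154
print(round(global_average({"a": [0.8, 0.6], "b": [0.9]}), 3))  # 0.8
```

The same per-test normalization applies to Time/Point, with seconds in place of dollars.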