Humanities Data Benchmark

Welcome to the Humanities Data Benchmark report page. This page provides an overview of all benchmark tests, results, and comparisons.

Leaderboard

The table below shows the global average performance, cost efficiency, and time efficiency of each model across the four core benchmarks: bibliographic_data, fraktur, metadata_extraction, and zettelkatalog.

The Model and Provider columns identify each AI system. Global Average represents the mean performance score across all four benchmarks (higher is better). Cost/Point and Time/Point show normalized efficiency metrics calculated per test, averaged per benchmark, then averaged globally; this multi-level normalization accounts for different numbers of items, test configurations, and benchmark scales. For efficiency metrics, lower values are better, indicating less cost or time needed per performance point achieved. The four benchmark-specific columns show average performance for each individual benchmark. Only models with results in all four benchmarks are included. Click on any column header to sort the table.
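
The three-level averaging described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the function name `global_averages`, the record keys (`model`, `benchmark`, `score`, `cost`, `seconds`), and the decision to skip zero-score tests when computing efficiency are all assumptions, and any per-item normalization the real pipeline applies is left out.

```python
from collections import defaultdict
from statistics import mean

CORE = ("bibliographic_data", "fraktur", "metadata_extraction", "zettelkatalog")


def global_averages(tests):
    """Aggregate per-test records into leaderboard rows (hypothetical helper)."""
    # Level 1: per-test values, grouped by (model, benchmark).
    scores = defaultdict(list)
    cost_pts = defaultdict(list)
    time_pts = defaultdict(list)
    for t in tests:
        if t["benchmark"] not in CORE:
            continue
        key = (t["model"], t["benchmark"])
        scores[key].append(t["score"])
        if t["score"] > 0:  # assumption: efficiency is undefined at score 0
            cost_pts[key].append(t["cost"] / t["score"])
            time_pts[key].append(t["seconds"] / t["score"])

    # Level 2: one average per (model, benchmark).
    per_model = defaultdict(dict)
    for (model, bench), values in scores.items():
        key = (model, bench)
        per_model[model][bench] = (
            mean(values),
            mean(cost_pts[key]) if cost_pts[key] else None,
            mean(time_pts[key]) if time_pts[key] else None,
        )

    # Level 3: average across benchmarks, keeping only models
    # that have results in all four core benchmarks.
    board = {}
    for model, benches in per_model.items():
        if set(benches) != set(CORE):
            continue
        rows = list(benches.values())
        costs = [r[1] for r in rows if r[1] is not None]
        times = [r[2] for r in rows if r[2] is not None]
        board[model] = {
            "global_average": mean(r[0] for r in rows),
            "cost_per_point": mean(costs) if costs else None,
            "time_per_point": mean(times) if times else None,
        }
    return board
```

Averaging per benchmark before averaging globally keeps a benchmark with many test configurations (metadata_extraction has three rule sets per model) from outweighing one with a single configuration.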

Model | Provider | Global Average | Cost/Point | Time/Point | bibliographic_data | fraktur | metadata_extraction | zettelkatalog
gemini-2.5-pro | Google | 0.765 | $0.3015 | 37.65s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash | Google | 0.752 | $0.0718 | 34.45s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-preview-09-2025 | Google | 0.746 | $0.1444 | 22.32s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.0-flash | Google | 0.678 | $0.0310 | 11.24s | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-7-sonnet-20250219 | Anthropic | 0.650 | $1.1037 | 29.21s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.0-flash-lite | Google | 0.649 | $0.0260 | 16.70s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1 | OpenAI | 0.646 | $0.9358 | 58.83s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5-mini | OpenAI | 0.621 | $0.5016 | 91.83s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4o | OpenAI | 0.617 | $0.5720 | 88.90s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5 | OpenAI | 0.607 | $2.8010 | 245.62s | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-5-sonnet-20241022 | Anthropic | 0.607 | $1.0678 | 26.77s | fuzzy | fuzzy | f1_micro | f1_micro
mistral-large-latest | Mistral AI | 0.602 | $0.5405 | 34.47s | fuzzy | fuzzy | f1_micro | f1_micro
claude-opus-4-1-20250805 | Anthropic | 0.598 | $5.7946 | 46.39s | fuzzy | fuzzy | f1_micro | f1_micro
mistral-medium-2505 | Mistral AI | 0.591 | $0.1271 | 33.54s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-lite | Google | 0.585 | $0.0435 | 6.09s | fuzzy | fuzzy | f1_micro | f1_micro
o3 | OpenAI | 0.574 | $1.2242 | 174.26s | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-lite-preview-09-2025 | Google | 0.570 | $0.1258 | 6.99s | fuzzy | fuzzy | f1_micro | f1_micro
mistral-medium-2508 | Mistral AI | 0.560 | $0.1311 | 84.27s | fuzzy | fuzzy | f1_micro | f1_micro
claude-opus-4-20250514 | Anthropic | 0.548 | $6.1998 | 64.52s | fuzzy | fuzzy | f1_micro | f1_micro
claude-sonnet-4-20250514 | Anthropic | 0.544 | $1.2451 | 55.68s | fuzzy | fuzzy | f1_micro | f1_micro
meta-llama/llama-4-maverick | Meta (via OpenRouter) | 0.516 | $0.2231 | 59.23s | fuzzy | fuzzy | f1_micro | f1_micro
pixtral-large-latest | Mistral AI | 0.507 | $1.1231 | 40.36s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1-mini | OpenAI | 0.500 | $0.1127 | 131.83s | fuzzy | fuzzy | f1_micro | f1_micro
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | 0.487 | $0.0712 | 1633.78s | fuzzy | fuzzy | f1_micro | f1_micro
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | 0.472 | $0.1840 | 176.98s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5-nano | OpenAI | 0.466 | $0.3687 | 636.97s | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-opus-20240229 | Anthropic | 0.398 | $7.5173 | 89.44s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1-nano | OpenAI | 0.371 | $0.0218 | 24.73s | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4o-mini | OpenAI | 0.328 | $0.4949 | 4210.65s | fuzzy | fuzzy | f1_micro | f1_micro
pixtral-12b | Mistral AI | 0.323 | $0.0952 | 99.43s | fuzzy | fuzzy | f1_micro | f1_micro
GLM-4.5V-FP8 | Z.ai (via sciCORE) | 0.314 | $0.0000 | 406.20s | fuzzy | fuzzy | f1_micro | f1_micro
claude-sonnet-4-5-20250929 | Anthropic | 0.307 | $2.1789 | 8.28s | fuzzy | fuzzy | f1_micro | f1_micro
x-ai/grok-4 | xAI (via OpenRouter) | 0.263 | $25.0504 | 1464.25s | fuzzy | fuzzy | f1_micro | f1_micro
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | 0.219 | $0.7975 | 127.24s | fuzzy | fuzzy | f1_micro | f1_micro

The following radar chart shows the performance distribution of top models across the four core benchmarks:

Radar Chart

Latest Benchmark Results

The tables below show detailed results for each benchmark, with each row representing a single test configuration run on the most recent date. The Model and Provider columns identify the AI system used. Each test has a unique Test ID (click to see full history) and shows the most recent execution Date. The Prompt and Rules columns indicate the configuration used. Results show the performance score (fuzzy match for bibliographic_data/fraktur, F1-micro for metadata_extraction/zettelkatalog; higher is better). Cost (USD) represents the total cost for processing all items in the test. Cost/Point shows cost efficiency ($/performance point; lower is better). Test Time (s) is the total execution time for all items. Time/Point shows time efficiency (seconds/performance point; lower is better).
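
The per-test efficiency columns can be read as simple ratios of a total to the performance score. The sketch below illustrates that reading using the published figures for test T0219 in the bibliographic_data table. The helper name `per_point` is hypothetical, and treating a missing or zero score as N/A is an assumption: it is consistent with rows that list a cost but show N/A for Cost/Point, but the page does not state the rule.

```python
from typing import Optional


def per_point(total: Optional[float], score: Optional[float]) -> Optional[float]:
    """Return total / score, or None (displayed as N/A) when it is undefined."""
    if total is None or score is None or score <= 0:
        return None
    return total / score


# Published values for test T0219 (bibliographic_data).
cost_usd, cost_per_point = 0.0307, 0.0437
test_time_s, time_per_point = 116.65, 166.10

# Inverting each ratio gives an estimate of the underlying score.
# The two estimates should agree up to rounding of the published figures.
score_from_cost = cost_usd / cost_per_point
score_from_time = test_time_s / time_per_point

# Recomputing Time/Point from the cost-derived score lands close to 166.10.
recomputed_time_per_point = per_point(test_time_s, score_from_cost)
```

Both estimates come out near 0.70, which suggests the score is on a 0 to 1 scale and that Cost/Point and Time/Point are plain quotients rather than per-item figures.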

bibliographic_data

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gemini-2.5-flash-preview-09-2025 | Google | T0219 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0307 | $0.0437 | 116.65 | 166.10
gpt-5 | OpenAI | T0129 | 2025-10-01 | prompt.txt | None | fuzzy | $0.3421 | $0.4992 | 591.90 | 863.65
gpt-5-mini | OpenAI | T0130 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0582 | $0.0860 | 411.12 | 607.56
gemini-2.5-flash | Google | T0195 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0252 | $0.0376 | 195.82 | 292.59
claude-sonnet-4-20250514 | Anthropic | T0107 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1692 | $0.2531 | 127.79 | 191.16
o3 | OpenAI | T0133 | 2025-10-01 | prompt.txt | None | fuzzy | $0.1885 | $0.2827 | 391.04 | 586.48
gemini-2.5-pro | Google | T0128 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1032 | $0.1554 | 227.18 | 342.25
mistral-medium-2505 | Mistral AI | T0170 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0222 | $0.0336 | 128.32 | 194.01
gpt-4.1 | OpenAI | T0139 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0952 | $0.1449 | 298.94 | 455.07
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0233 | 2025-10-17 | prompt.txt | None | fuzzy | $0.1268 | $0.1931 | 923.12 | 1405.84
mistral-medium-2508 | Mistral AI | T0169 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0220 | $0.0336 | 112.70 | 172.31
claude-3-5-sonnet-20241022 | Anthropic | T0009 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1682 | $0.2576 | 124.19 | 190.17
gpt-4o | OpenAI | T0007 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1136 | $0.1748 | 350.22 | 538.95
claude-3-7-sonnet-20250219 | Anthropic | T0031 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1765 | $0.2720 | 136.48 | 210.38
gpt-4.1-mini | OpenAI | T0140 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0199 | $0.0307 | 164.93 | 254.41
mistral-large-latest | Mistral AI | T0187 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0805 | $0.1259 | 136.28 | 213.12
claude-opus-4-1-20250805 | Anthropic | T0127 | 2025-09-30 | prompt.txt | None | fuzzy | $0.9735 | $1.5435 | 203.32 | 322.38
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0234 | 2025-10-17 | prompt.txt | None | fuzzy | $0.0062 | $0.0099 | 151.02 | 241.35
gemini-2.0-flash | Google | T0008 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0052 | $0.0087 | 69.66 | 115.32
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0259 | 2025-10-20 | prompt.txt | None | fuzzy | $0.0050 | $0.0084 | 115.49 | 193.02
gpt-5-nano | OpenAI | T0131 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0281 | $0.0476 | 401.62 | 681.07
claude-opus-4-20250514 | Anthropic | T0106 | 2025-09-30 | prompt.txt | None | fuzzy | $0.8992 | $1.5413 | 193.49 | 331.67
gemini-2.5-flash-lite-preview-09-2025 | Google | T0211 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0048 | $0.0083 | 18.69 | 32.28
gemini-2.5-flash-lite | Google | T0203 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0039 | $0.0072 | 19.33 | 35.50
pixtral-large-latest | Mistral AI | T0035 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1079 | $0.2123 | 199.57 | 392.57
gpt-4o-mini | OpenAI | T0027 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0261 | $0.0526 | 233.18 | 468.66
gemini-2.0-flash-lite | Google | T0033 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0055 | $0.0140 | 90.14 | 230.26
claude-3-opus-20240229 | Anthropic | T0138 | 2025-10-01 | prompt.txt | None | fuzzy | $0.6830 | $1.8417 | 237.58 | 640.62
gpt-4.1-nano | OpenAI | T0141 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0044 | $0.0139 | 105.83 | 330.56
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0253 | 2025-10-20 | prompt.txt | None | fuzzy | $0.0160 | $0.0518 | 516.04 | 1674.49
x-ai/grok-4 | xAI (via OpenRouter) | T0265 | 2025-10-20 | prompt.txt | None | fuzzy | $1.2925 | $4.8391 | 2024.03 | 7577.84
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0237 | 2025-10-17 | prompt.txt | None | fuzzy | $0.0000 | $0.0000 | 361.78 | 1560.25
gpt-4.5-preview | OpenAI | T0026 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 480.16 | 2367.38
pixtral-12b | Mistral AI | T0181 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0036 | $0.0197 | 38.06 | 207.04
gemini-1.5-pro | Google | T0030 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 88.55 | 763.13
gemini-1.5-flash | Google | T0029 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 62.39 | 586.44
claude-sonnet-4-5-20250929 | Anthropic | T0225 | 2025-10-01 | prompt.txt | None | fuzzy | $0.1636 | N/A | 121.63 | N/A

blacklist

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point

company_lists

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point

fraktur

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gemini-exp-1206 | Google | T0087 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 875.69 | 908.39
gemini-2.5-pro | Google | T0132 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.1068 | $0.1112 | 244.01 | 254.18
gemini-2.5-pro-exp-03-25 | Google | T0022 | 2025-05-09 | prompt.txt | None | fuzzy | N/A | N/A | 807.86 | 855.79
gemini-2.5-pro-exp-03-25 | Google | T0080 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 799.79 | 847.24
gemini-2.0-pro-exp-02-05 | Google | T0091 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 855.41 | 906.15
gemini-2.5-pro-preview-05-06 | Google | T0097 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 891.10 | 964.40
gemini-2.5-flash | Google | T0199 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0253 | $0.0276 | 243.38 | 265.12
gemini-2.5-flash-preview-09-2025 | Google | T0223 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0287 | $0.0329 | 157.75 | 180.91
gemini-2.0-flash-lite | Google | T0090 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0034 | $0.0041 | 60.48 | 73.40
gemini-2.0-flash | Google | T0086 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0044 | $0.0060 | 55.03 | 75.39
claude-3-7-sonnet-20250219 | Anthropic | T0092 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.1777 | $0.2605 | 196.58 | 288.25
gemini-1.5-pro | Google | T0089 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 123.51 | 192.99
gemini-2.5-flash-lite | Google | T0207 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0045 | $0.0072 | 37.23 | 59.10
gemini-2.5-flash-preview-04-17 | Google | T0096 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 223.59 | 370.18
gpt-4.1 | OpenAI | T0083 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0776 | $0.1362 | 224.89 | 394.54
claude-opus-4-1-20250805 | Anthropic | T0123 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.9371 | $1.6615 | 289.28 | 512.90
gemini-1.5-flash | Google | T0088 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 81.93 | 148.43
mistral-large-latest | Mistral AI | T0191 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0683 | $0.1329 | 217.88 | 423.88
gemini-2.5-flash-lite-preview-09-2025 | Google | T0215 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0058 | $0.0114 | 33.93 | 66.79
claude-3-5-sonnet-20241022 | Anthropic | T0093 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.1305 | $0.2632 | 133.05 | 268.26
gpt-4o | OpenAI | T0079 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0790 | $0.1659 | 516.44 | 1084.96
gpt-5-mini | OpenAI | T0121 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0353 | $0.0746 | 243.61 | 513.94
claude-opus-4-20250514 | Anthropic | T0098 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.9807 | $2.1135 | 392.86 | 846.67
mistral-medium-2505 | Mistral AI | T0178 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0211 | $0.0490 | 183.12 | 425.87
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0257 | 2025-10-20 | prompt_optimized.txt | None | fuzzy | $0.0174 | $0.0413 | 593.01 | 1405.23
pixtral-large-latest | Mistral AI | T0095 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0861 | $0.2121 | 137.85 | 339.53
claude-sonnet-4-20250514 | Anthropic | T0099 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.2083 | $0.5820 | 298.19 | 832.94
mistral-medium-2508 | Mistral AI | T0177 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0213 | $0.0642 | 484.57 | 1459.55
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0251 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0059 | $0.0196 | 112.89 | 376.30
gpt-4o-mini | OpenAI | T0082 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0223 | $0.0834 | 110.91 | 413.85
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0241 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0000 | $0.0000 | 675.50 | 2659.44
claude-3-opus-20240229 | Anthropic | T0094 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.6288 | $2.8326 | 214.63 | 966.82
pixtral-12b | Mistral AI | T0185 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0037 | $0.0171 | 376.97 | 1729.20
gpt-4.5-preview | OpenAI | T0081 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 224.02 | 1349.51
gpt-5 | OpenAI | T0120 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.2036 | $1.3394 | 493.97 | 3249.77
o3 | OpenAI | T0137 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.1485 | $1.0606 | 357.90 | 2556.41
gpt-4.1-mini | OpenAI | T0084 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0106 | $0.2306 | 74.24 | 1613.84
gpt-5-nano | OpenAI | T0122 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0054 | $0.8930 | 70.61 | 11768.32
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0263 | 2025-10-20 | prompt_optimized.txt | None | fuzzy | $0.0012 | $0.1938 | 194.63 | 32438.90
gpt-4.1-nano | OpenAI | T0085 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0013 | N/A | 11.57 | N/A
claude-sonnet-4-5-20250929 | Anthropic | T0229 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.2178 | N/A | 301.59 | N/A
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0246 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0048 | N/A | 47.09 | N/A
x-ai/grok-4 | xAI (via OpenRouter) | T0269 | 2025-10-20 | prompt_optimized.txt | None | fuzzy | $0.6505 | N/A | 2896.21 | N/A

medieval_manuscripts

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
claude-3-7-sonnet-20250219 | Anthropic | T0274 | 2025-10-21 | prompt.txt | None | fuzzy | $0.0573 | $0.0797 | 43.50 | 60.50
gpt-4.1 | OpenAI | T0273 | 2025-10-21 | prompt.txt | None | fuzzy | $0.0251 | $0.0364 | 54.42 | 78.98
gemini-2.5-flash | Google | T0271 | 2025-10-21 | prompt.txt | None | fuzzy | $0.0044 | $0.0065 | 48.13 | 71.30
gemini-2.5-pro | Google | T0272 | 2025-10-21 | prompt.txt | None | fuzzy | $0.0181 | $0.0268 | 112.99 | 167.39

metadata_extraction

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-5 | OpenAI | T0109 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.4823 | $0.6183 | 1030.30 | 1320.90
o3 | OpenAI | T0135 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1855 | $0.2541 | 394.51 | 540.43
gpt-5 | OpenAI | T0108 | 2025-09-30 | prompt.txt | None | f1_micro | $1.2982 | $1.8285 | 2922.29 | 4115.90
gpt-5 | OpenAI | T0110 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.7651 | $1.1419 | 1711.52 | 2554.51
o3 | OpenAI | T0134 | 2025-10-01 | prompt.txt | None | f1_micro | $0.5053 | $0.8021 | 1038.49 | 1648.39
gemini-2.0-flash-lite | Google | T0056 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0042 | $0.0066 | 45.77 | 72.65
gemini-2.5-flash | Google | T0197 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0087 | $0.0139 | 195.57 | 310.43
gemini-2.5-pro | Google | T0125 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0426 | $0.0687 | 203.80 | 328.71
gpt-4.5-preview | OpenAI | T0011 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 960.76 | 1575.02
gemini-2.0-flash | Google | T0044 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0055 | $0.0090 | 51.72 | 84.78
gpt-4.5-preview | OpenAI | T0040 | 2025-04-11 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | N/A | N/A | 373.44 | 622.41
gemini-2.5-flash-preview-09-2025 | Google | T0221 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0108 | $0.0180 | 112.91 | 188.19
gpt-4.5-preview | OpenAI | T0041 | 2025-04-11 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | N/A | N/A | 526.13 | 876.89
o3 | OpenAI | T0136 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3052 | $0.5087 | 624.76 | 1041.27
mistral-medium-2505 | Mistral AI | T0174 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0231 | $0.0392 | 69.80 | 118.30
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0261 | 2025-10-20 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0046 | $0.0077 | 62.40 | 105.76
gemini-exp-1206 | Google | T0014 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 894.64 | 1542.48
gemini-2.5-pro-exp-03-25 | Google | T0019 | 2025-04-01 | prompt.txt | None | f1_micro | N/A | N/A | 877.02 | 1512.10
gpt-5-mini | OpenAI | T0112 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0622 | $0.1072 | 413.46 | 712.86
gemini-2.0-pro-exp-02-05 | Google | T0021 | 2025-04-01 | prompt.txt | None | f1_micro | N/A | N/A | 856.03 | 1501.80
gpt-5-nano | OpenAI | T0115 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0299 | $0.0524 | 429.53 | 753.56
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0239 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0000 | $0.0000 | 22.11 | 38.79
gpt-4.1-mini | OpenAI | T0070 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0630 | $0.1124 | 152.75 | 272.78
gpt-4o-mini | OpenAI | T0012 | 2025-09-30 | prompt.txt | None | f1_micro | $0.3814 | $0.6935 | 211.44 | 384.43
gemini-2.5-pro | Google | T0124 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1057 | $0.1921 | 493.26 | 896.84
gemini-2.5-flash-lite-preview-09-2025 | Google | T0213 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0026 | $0.0048 | 20.70 | 37.64
gemini-2.0-flash-lite | Google | T0020 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0089 | $0.0165 | 105.24 | 194.88
gpt-4o-mini | OpenAI | T0076 | 2025-09-30 | prompt.txt | None | f1_micro | $0.3815 | $0.7064 | 217.37 | 402.54
gpt-5-mini | OpenAI | T0111 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1486 | $0.2752 | 1069.71 | 1980.95
gemini-2.5-flash | Google | T0196 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0217 | $0.0403 | 439.55 | 813.98
gemini-2.5-flash-preview-09-2025 | Google | T0220 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0427 | $0.0791 | 307.46 | 569.36
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0260 | 2025-10-20 | prompt.txt | None | f1_micro | $0.0117 | $0.0216 | 131.71 | 243.91
gpt-4.1-mini | OpenAI | T0071 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0238 | $0.0440 | 75.20 | 139.26
mistral-large-latest | Mistral AI | T0189 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1082 | $0.2004 | 64.28 | 119.03
gpt-4.1-mini | OpenAI | T0072 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0393 | $0.0727 | 86.89 | 160.91
gpt-4o | OpenAI | T0010 | 2025-09-30 | prompt.txt | None | f1_micro | $0.2844 | $0.5367 | 432.47 | 815.98
gpt-4o-mini | OpenAI | T0042 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1368 | $0.2580 | 86.90 | 163.97
gpt-4o-mini | OpenAI | T0077 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1368 | $0.2580 | 84.87 | 160.14
gemini-2.0-flash-lite | Google | T0057 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0047 | $0.0089 | 55.63 | 104.95
gemini-2.5-pro | Google | T0126 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0617 | $0.1164 | 275.31 | 519.45
gemini-2.0-flash | Google | T0013 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0118 | $0.0227 | 121.45 | 233.56
gpt-4o | OpenAI | T0039 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1756 | $0.3376 | 358.87 | 690.14
gpt-4o-mini | OpenAI | T0043 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.2447 | $0.4706 | 120.99 | 232.67
gemini-2.5-flash | Google | T0198 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0130 | $0.0250 | 196.67 | 378.22
gpt-4.1 | OpenAI | T0067 | 2025-09-30 | prompt.txt | None | f1_micro | $0.2347 | $0.4603 | 229.72 | 450.42
mistral-medium-2508 | Mistral AI | T0173 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0232 | $0.0454 | 66.29 | 129.99
gemini-2.5-flash-lite | Google | T0205 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0028 | $0.0055 | 24.16 | 47.38
gpt-4o-mini | OpenAI | T0078 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.2447 | $0.4798 | 128.56 | 252.08
gpt-5-mini | OpenAI | T0113 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0868 | $0.1702 | 628.66 | 1232.66
gemini-2.5-flash-preview-09-2025 | Google | T0222 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0157 | $0.0307 | 167.65 | 328.72
gemini-2.5-flash-lite-preview-09-2025 | Google | T0212 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0066 | $0.0132 | 50.73 | 101.46
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0238 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0000 | $0.0000 | 71.75 | 143.50
gpt-4o | OpenAI | T0038 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1094 | $0.2188 | 136.28 | 272.55
gpt-4.1 | OpenAI | T0068 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0905 | $0.1811 | 87.29 | 174.58
gpt-4.1-nano | OpenAI | T0074 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0081 | $0.0161 | 70.52 | 141.04
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0255 | 2025-10-20 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0092 | $0.0184 | 56.40 | 112.80
gpt-4.1 | OpenAI | T0069 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1447 | $0.2893 | 139.77 | 279.53
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0262 | 2025-10-20 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0066 | $0.0131 | 96.53 | 193.06
gpt-4.1-nano | OpenAI | T0073 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0217 | $0.0444 | 134.41 | 274.30
mistral-medium-2508 | Mistral AI | T0171 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0601 | $0.1252 | 167.64 | 349.25
gemini-2.0-flash | Google | T0045 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0063 | $0.0131 | 69.17 | 144.11
gemini-2.5-flash-lite-preview-09-2025 | Google | T0214 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0040 | $0.0082 | 31.45 | 65.52
mistral-medium-2505 | Mistral AI | T0172 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0602 | $0.1282 | 163.77 | 348.45
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0248 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0327 | $0.0696 | 183.79 | 391.03
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0249 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0120 | $0.0255 | 54.29 | 115.50
gpt-4.1-nano | OpenAI | T0075 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0137 | $0.0291 | 88.44 | 188.18
gpt-5-nano | OpenAI | T0114 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0652 | $0.1417 | 1002.37 | 2179.06
gemini-2.5-flash-lite | Google | T0204 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0068 | $0.0149 | 61.57 | 133.85
gpt-5-nano | OpenAI | T0116 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0341 | $0.0741 | 465.75 | 1012.51
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0240 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0000 | $0.0000 | 33.01 | 71.76
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0256 | 2025-10-20 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0145 | $0.0315 | 87.88 | 191.05
gemini-1.5-pro | Google | T0016 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 325.48 | 723.29
mistral-large-latest | Mistral AI | T0188 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2842 | $0.6316 | 168.93 | 375.40
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0254 | 2025-10-20 | prompt.txt | None | f1_micro | $0.0227 | $0.0505 | 147.26 | 327.23
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0250 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0205 | $0.0456 | 70.88 | 157.50
claude-sonnet-4-5-20250929 | Anthropic | T0226 | 2025-10-01 | prompt.txt | None | f1_micro | $0.5785 | $1.3147 | 243.73 | 553.93
claude-opus-4-1-20250805 | Anthropic | T0118 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $1.0480 | $2.3818 | 101.89 | 231.56
claude-sonnet-4-5-20250929 | Anthropic | T0227 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2292 | $0.5209 | 102.12 | 232.08
gemini-2.5-flash-lite | Google | T0206 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0040 | $0.0092 | 34.99 | 79.53
mistral-medium-2505 | Mistral AI | T0176 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0371 | $0.0863 | 108.79 | 252.99
mistral-large-latest | Mistral AI | T0190 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1758 | $0.4088 | 108.78 | 252.98
claude-3-7-sonnet-20250219 | Anthropic | T0017 | 2025-09-30 | prompt.txt | None | f1_micro | $0.5398 | $1.2853 | 235.51 | 560.74
gemini-1.5-flash | Google | T0048 | 2025-04-11 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | N/A | N/A | 113.85 | 271.08
claude-3-7-sonnet-20250219 | Anthropic | T0025 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3273 | $0.7794 | 140.90 | 335.48
mistral-medium-2508 | Mistral AI | T0175 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0371 | $0.0883 | 100.16 | 238.48
claude-3-5-sonnet-20241022 | Anthropic | T0018 | 2025-09-30 | prompt.txt | None | f1_micro | $0.5403 | $1.3177 | 171.85 | 419.14
claude-3-7-sonnet-20250219 | Anthropic | T0024 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2127 | $0.5188 | 109.17 | 266.27
claude-opus-4-20250514 | Anthropic | T0101 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $1.0565 | $2.5769 | 120.33 | 293.48
claude-3-5-sonnet-20241022 | Anthropic | T0052 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2131 | $0.5326 | 77.08 | 192.71
claude-3-5-sonnet-20241022 | Anthropic | T0053 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3281 | $0.8202 | 135.74 | 339.35
gemini-1.5-flash | Google | T0015 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 291.55 | 747.57
claude-sonnet-4-5-20250929 | Anthropic | T0228 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3493 | $0.9193 | 128.77 | 338.86
claude-opus-4-20250514 | Anthropic | T0100 | 2025-09-30 | prompt.txt | None | f1_micro | $2.6761 | $7.2328 | 282.63 | 763.87
pixtral-large-latest | Mistral AI | T0023 | 2025-09-30 | prompt.txt | None | f1_micro | $0.6883 | $1.9121 | 164.98 | 458.28
claude-opus-4-1-20250805 | Anthropic | T0117 | 2025-09-30 | prompt.txt | None | f1_micro | $2.6621 | $7.3947 | 238.52 | 662.57
pixtral-large-latest | Mistral AI | T0061 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.4348 | $1.2789 | 102.15 | 300.45
claude-sonnet-4-20250514 | Anthropic | T0105 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3226 | $0.9487 | 113.88 | 334.95
gemini-1.5-flash | Google | T0049 | 2025-04-11 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | N/A | N/A | 184.94 | 560.41
claude-opus-4-1-20250805 | Anthropic | T0119 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $1.6173 | $5.0542 | 137.86 | 430.81
claude-sonnet-4-20250514 | Anthropic | T0103 | 2025-09-30 | prompt.txt | None | f1_micro | $0.5322 | $1.7169 | 211.51 | 682.28
pixtral-large-latest | Mistral AI | T0060 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2545 | $0.8208 | 66.61 | 214.87
pixtral-12b | Mistral AI | T0184 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0314 | $0.1014 | 65.18 | 210.27
claude-opus-4-20250514 | Anthropic | T0102 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $1.6251 | $5.4172 | 167.21 | 557.37
pixtral-12b | Mistral AI | T0182 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0496 | $0.1710 | 109.70 | 378.27
claude-sonnet-4-20250514 | Anthropic | T0104 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2107 | $0.7804 | 78.40 | 290.37
claude-3-opus-20240229 | Anthropic | T0036 | 2025-09-30 | prompt.txt | None | f1_micro | $2.6852 | $10.7406 | 304.65 | 1218.61
pixtral-12b | Mistral AI | T0183 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0182 | $0.0726 | 43.94 | 175.76
claude-3-opus-20240229 | Anthropic | T0063 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $1.6204 | $6.4818 | 172.46 | 689.82
claude-3-opus-20240229 | Anthropic | T0062 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $1.0614 | $5.0543 | 128.92 | 613.89
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0244 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0337 | $0.1775 | 184.67 | 971.93
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0245 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0583 | $0.5834 | 312.08 | 3120.79
x-ai/grok-4 | xAI (via OpenRouter) | T0268 | 2025-10-20 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $1.6650 | $23.7858 | 2638.80 | 37697.19
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0243 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0581 | $0.9691 | 267.65 | 4460.85
x-ai/grok-4 | xAI (via OpenRouter) | T0266 | 2025-10-20 | prompt.txt | None | f1_micro | $3.0152 | $100.5078 | 4888.20 | 162940.09
x-ai/grok-4 | xAI (via OpenRouter) | T0267 | 2025-10-20 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $1.4549 | $72.7445 | 4039.29 | 201964.64

test_benchmark

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-4o | OpenAI | T0001 | 2025-09-30 | prompt.txt | None | N/A | $0.0045 | N/A | 7.66 | N/A
gemini-2.0-flash | Google | T0002 | 2025-09-30 | prompt.txt | None | N/A | $0.0002 | N/A | 3.05 | N/A
claude-3-5-sonnet-20241022 | Anthropic | T0003 | 2025-09-30 | prompt.txt | None | N/A | $0.0080 | N/A | 8.60 | N/A
gemini-2.5-flash | Google | T0193 | 2025-09-30 | prompt.txt | None | N/A | $0.0012 | N/A | 19.04 | N/A
gemini-2.5-flash-lite | Google | T0201 | 2025-10-01 | prompt.txt | None | N/A | $0.0263 | N/A | 99.68 | N/A
gemini-2.5-flash-lite-preview-09-2025 | Google | T0209 | 2025-10-01 | prompt.txt | None | N/A | $0.0001 | N/A | 1.07 | N/A
gemini-2.5-flash-preview-09-2025 | Google | T0217 | 2025-10-01 | prompt.txt | None | N/A | $0.0008 | N/A | 7.19 | N/A

test_benchmark2

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-4o | OpenAI | T0004 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0039 | N/A | 10.09 | N/A
gemini-2.0-flash | Google | T0005 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0002 | N/A | 2.58 | N/A
claude-3-5-sonnet-20241022 | Anthropic | T0006 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0069 | N/A | 8.17 | N/A
gemini-2.5-flash | Google | T0194 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0003 | N/A | 10.83 | N/A
gemini-2.5-flash-lite | Google | T0202 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 1.54 | N/A
gemini-2.5-flash-lite-preview-09-2025 | Google | T0210 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 0.98 | N/A
gemini-2.5-flash-preview-09-2025 | Google | T0218 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0006 | N/A | 7.22 | N/A

zettelkatalog

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
claude-3-5-sonnet-20241022 | Anthropic | T0143 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5000 | $2.8568 | 1577.11 | 1802.20
gpt-5 | OpenAI | T0165 | 2025-10-01 | prompt.txt | None | f1_micro | $7.1243 | $8.1870 | 21108.28 | 24256.89
gemini-2.5-pro | Google | T0155 | 2025-10-01 | prompt.txt | None | f1_micro | $0.7095 | $0.8157 | 3744.41 | 4304.93
gemini-2.5-flash-preview-09-2025 | Google | T0224 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3949 | $0.4591 | 2424.33 | 2818.30
gemini-2.5-flash | Google | T0200 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1684 | $0.1965 | 3027.50 | 3531.85
gpt-4.1 | OpenAI | T0160 | 2025-10-01 | prompt.txt | None | f1_micro | $2.6952 | $3.1509 | 12936.40 | 15123.99
claude-3-7-sonnet-20250219 | Anthropic | T0144 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5778 | $3.0183 | 1529.23 | 1790.50
claude-sonnet-4-20250514 | Anthropic | T0148 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5141 | $2.9870 | 1456.90 | 1730.95
gemini-2.0-flash | Google | T0151 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0795 | $0.0945 | 632.49 | 751.61
o3 | OpenAI | T0168 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5396 | $3.0453 | 9024.04 | 10821.21
claude-opus-4-1-20250805 | Anthropic | T0146 | 2025-10-01 | prompt.txt | None | f1_micro | $12.5676 | $15.2166 | 1581.36 | 1914.67
gpt-4o | OpenAI | T0066 | 2025-09-30 | prompt.txt | None | f1_micro | $1.3031 | $1.5799 | 3291.08 | 3989.93
gemini-2.0-flash-lite | Google | T0152 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0615 | $0.0757 | 618.19 | 761.16
claude-sonnet-4-5-20250929 | Anthropic | T0230 | 2025-09-30 | prompt.txt | None | f1_micro | $2.7730 | $3.4395 | 1428.77 | 1772.17
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0264 | 2025-10-20 | prompt.txt | None | f1_micro | $0.0549 | $0.0687 | 829.98 | 1039.52
gpt-5-mini | OpenAI | T0166 | 2025-10-01 | prompt.txt | None | f1_micro | $1.3158 | $1.6634 | 22744.78 | 28753.17
claude-opus-4-20250514 | Anthropic | T0147 | 2025-10-01 | prompt.txt | None | f1_micro | $12.7064 | $16.1835 | 1760.17 | 2241.84
mistral-medium-2508 | Mistral AI | T0179 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2675 | $0.3413 | 934.75 | 1192.46
mistral-large-latest | Mistral AI | T0192 | 2025-10-01 | prompt.txt | None | f1_micro | $1.1730 | $1.5031 | 862.11 | 1104.69
pixtral-large-latest | Mistral AI | T0159 | 2025-10-01 | prompt.txt | None | f1_micro | $2.1044 | $2.7040 | 1298.95 | 1669.01
mistral-medium-2505 | Mistral AI | T0180 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2675 | $0.3448 | 835.72 | 1077.34
gpt-5-nano | OpenAI | T0167 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3448 | $0.4475 | 5049.08 | 6552.84
claude-3-opus-20240229 | Anthropic | T0145 | 2025-10-01 | prompt.txt | None | f1_micro | $13.6030 | $17.8355 | 2789.97 | 3658.04
gpt-4.1-mini | OpenAI | T0161 | 2025-10-02 | prompt.txt | None | f1_micro | N/A | N/A | 29647.30 | 39063.63
x-ai/grok-4 | xAI (via OpenRouter) | T0270 | 2025-10-20 | prompt.txt | None | f1_micro | $14.3280 | $19.1860 | 25318.45 | 33902.87
gemini-2.5-flash-lite | Google | T0208 | 2025-10-01 | prompt.txt | None | f1_micro | $0.1044 | $0.1497 | 587.82 | 842.84
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0258 | 2025-10-20 | prompt.txt | None | f1_micro | $0.4187 | $0.6101 | 15627.93 | 22775.35
gemini-2.5-flash-lite-preview-09-2025 | Google | T0216 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3252 | $0.4750 | 1152.22 | 1682.91
gpt-4.1-nano | OpenAI | T0162 | 2025-10-02 | prompt.txt | None | f1_micro | N/A | N/A | 471.26 | 696.91
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0252 | 2025-10-17 | prompt.txt | None | f1_micro | $0.5506 | $0.8159 | 19155.46 | 28385.64
pixtral-12b | Mistral AI | T0186 | 2025-10-01 | prompt.txt | None | f1_micro | $0.1384 | $0.2274 | 596.77 | 980.26
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0242 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0000 | $0.0000 | 44963.40 | 173666.98
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0247 | 2025-10-17 | prompt.txt | None | f1_micro | $0.1827 | $1.7703 | 1035.96 | 10037.63
gpt-4o-mini | OpenAI | T0164 | 2025-10-03 | prompt.txt | None | f1_micro | $0.0201 | $1.3640 | 1229.45 | 83295.51

About This Page

This benchmark suite is designed to test AI models on humanities data tasks. The tests run monthly and results are automatically updated.

For more details, visit the GitHub repository.