Humanities Data Benchmark

Welcome to the Humanities Data Benchmark report page. This page provides an overview of all benchmark tests, results, and comparisons.

Leaderboard

The table below shows the global average performance, cost efficiency, and time efficiency of each model across the seven core benchmarks: bibliographic_data, blacklist, company_lists, fraktur, medieval_manuscripts, metadata_extraction, and zettelkatalog.

The Model and Provider columns identify each AI system. Global Average is the mean performance score across all seven benchmarks (higher is better). Cost/Point and Time/Point are normalized efficiency metrics: each is calculated per test, averaged per benchmark, and then averaged globally. This multi-level normalization accounts for differences in the number of items, test configurations, and benchmark scales. For both efficiency metrics, lower values are better, since they indicate less cost or time per performance point achieved. The seven benchmark-specific columns show the average performance on each individual benchmark. Only models with results in all seven benchmarks are included.
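The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's actual code: the record layout, the `leaderboard` and `mean_or_none` helpers, the model names, and every number in `RUNS` are hypothetical, and two benchmarks stand in for the seven core ones. Skipping zero-score tests when computing per-point ratios is also an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-test records: (model, benchmark, score, cost_usd, seconds).
# The values are invented for illustration and do not come from the report.
RUNS = [
    ("model-a", "fraktur", 0.80, 0.10, 200.0),
    ("model-a", "fraktur", 0.50, 0.05, 100.0),
    ("model-a", "zettelkatalog", 0.40, 0.02, 60.0),
    ("model-b", "fraktur", 0.90, 0.30, 90.0),
]
# Stand-in for the seven core benchmarks, kept short for readability.
REQUIRED = {"fraktur", "zettelkatalog"}


def mean_or_none(values):
    """Average a list, or return None when there is nothing to average."""
    return mean(values) if values else None


def leaderboard(runs, required):
    # Group the per-test records by model, then by benchmark.
    grouped = defaultdict(lambda: defaultdict(list))
    for model, bench, score, cost, seconds in runs:
        grouped[model][bench].append((score, cost, seconds))

    table = {}
    for model, benches in grouped.items():
        # Only models with results in every required benchmark are ranked.
        if not required.issubset(benches):
            continue
        scores, cost_pp, time_pp = [], [], []
        for bench in required:
            tests = benches[bench]
            # Per-benchmark average of the per-test scores.
            scores.append(mean(s for s, _, _ in tests))
            # Assumption: zero-score tests have no defined per-point ratio.
            scored = [t for t in tests if t[0] > 0]
            if scored:
                cost_pp.append(mean(c / s for s, c, _ in scored))
                time_pp.append(mean(t / s for s, _, t in scored))
        # Global values are the mean of the per-benchmark averages.
        table[model] = {
            "global_average": mean(scores),
            "cost_per_point": mean_or_none(cost_pp),
            "time_per_point": mean_or_none(time_pp),
        }
    return table


table = leaderboard(RUNS, REQUIRED)
print(table)
```

In this toy data, model-b is dropped because it has no zettelkatalog result. Averaging per benchmark before averaging globally keeps a benchmark with many test configurations from outweighing one with few.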

Model | Provider | Global Average | Cost/Point | Time/Point | bibliographic_data | blacklist | company_lists | fraktur | medieval_manuscripts | metadata_extraction | zettelkatalog
gemini-2.5-pro | Google | 0.739 | $0.2347 | 37.58s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-preview-09-2025 | Google | 0.728 | $0.1250 | 22.63s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash | Google | 0.718 | $0.0582 | 29.67s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.0-flash | Google | 0.659 | $0.0245 | 11.85s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-7-sonnet-20250219 | Anthropic | 0.650 | $0.8158 | 24.77s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.0-flash-lite | Google | 0.649 | $0.0197 | 13.05s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1 | OpenAI | 0.646 | $0.6032 | 44.79s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5-mini | OpenAI | 0.643 | $0.3385 | 82.07s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4o | OpenAI | 0.629 | $0.4074 | 63.44s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-opus-4-1-20250805 | Anthropic | 0.629 | $4.2044 | 43.15s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
mistral-large-latest | Mistral AI | 0.617 | $0.3852 | 28.29s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
mistral-medium-2505 | Mistral AI | 0.605 | $0.0943 | 34.23s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5 | OpenAI | 0.603 | $2.0520 | 246.02s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
mistral-medium-2508 | Mistral AI | 0.597 | $0.0960 | 57.37s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1-mini | OpenAI | 0.596 | $0.0729 | 79.96s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-5-sonnet-20241022 | Anthropic | 0.595 | $0.8166 | 23.59s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
o3 | OpenAI | 0.594 | $0.9085 | 130.09s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-lite-preview-09-2025 | Google | 0.588 | $0.0763 | 5.45s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-opus-4-20250514 | Anthropic | 0.586 | $4.4639 | 50.88s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gemini-2.5-flash-lite | Google | 0.579 | $0.0366 | 11.40s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-sonnet-4-20250514 | Anthropic | 0.554 | $0.9203 | 42.71s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
meta-llama/llama-4-maverick | Meta (via OpenRouter) | 0.544 | $0.1362 | 44.80s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
pixtral-large-latest | Mistral AI | 0.502 | $0.8419 | 40.18s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-5-nano | OpenAI | 0.450 | $0.2380 | 401.68s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-sonnet-4-5-20250929 | Anthropic | 0.433 | $1.1710 | 19.49s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4.1-nano | OpenAI | 0.430 | $0.0162 | 17.80s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
gpt-4o-mini | OpenAI | 0.416 | $0.3556 | 2419.74s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
claude-3-opus-20240229 | Anthropic | 0.395 | $6.5252 | 98.10s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
pixtral-12b | Mistral AI | 0.374 | $0.0689 | 61.59s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
GLM-4.5V-FP8 | Z.ai (via sciCORE) | 0.283 | $0.0000 | 326.11s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | 0.171 | $0.6134 | 220.84s | fuzzy | fuzzy | f1_micro | fuzzy | fuzzy | f1_micro | f1_micro

The following radar chart shows the performance distribution of top models across the seven core benchmarks:

[Figure: radar chart of top-model performance across the seven core benchmarks]

Latest Benchmark Results

The tables below show detailed results for each benchmark. Each row represents a single test configuration, as run on its most recent date. The Model and Provider columns identify the AI system used. Each test has a unique Test ID, and Date gives its most recent execution. The Prompt and Rules columns indicate the configuration used. Results shows the performance score (fuzzy match for bibliographic_data, blacklist, fraktur, and medieval_manuscripts; F1-micro for company_lists, metadata_extraction, and zettelkatalog; higher is better). Cost (USD) is the total cost of processing all items in the test, and Cost/Point is the cost efficiency (USD per performance point; lower is better). Test Time (s) is the total execution time for all items, and Time/Point is the time efficiency (seconds per performance point; lower is better).
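As a rough illustration of the per-point columns, the Python sketch below divides a total by the score and, going the other way, recovers the score that a table row implies. The helper names `per_point` and `implied_score` are hypothetical. Returning `None` for a zero score is an assumption about how the N/A entries arise; the report does not say. The input figures are taken from the gemini-2.5-flash-preview-09-2025 row of the bibliographic_data table.

```python
from typing import Optional


def per_point(total: float, score: float) -> Optional[float]:
    """Return total / score, or None when no performance points were earned."""
    if score <= 0:
        return None
    return total / score


def implied_score(total: float, per_point_value: float) -> float:
    """Invert the ratio to recover the score that a table row implies."""
    return total / per_point_value


# Figures from the gemini-2.5-flash-preview-09-2025 row of bibliographic_data.
cost_usd, cost_per_point = 0.0307, 0.0437
seconds, time_per_point = 116.65, 166.10

score_from_cost = implied_score(cost_usd, cost_per_point)
score_from_time = implied_score(seconds, time_per_point)
print(round(score_from_cost, 2), round(score_from_time, 2))
```

Both estimates come out at roughly 0.70, which is consistent with the two per-point columns sharing the same score as their denominator. The small gap between them reflects rounding in the published figures.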

bibliographic_data

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gemini-2.5-flash-preview-09-2025 | Google | T0219 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0307 | $0.0437 | 116.65 | 166.10
gpt-5 | OpenAI | T0129 | 2025-10-01 | prompt.txt | None | fuzzy | $0.3421 | $0.4992 | 591.90 | 863.65
gpt-5-mini | OpenAI | T0130 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0582 | $0.0860 | 411.12 | 607.56
gemini-2.5-flash | Google | T0195 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0252 | $0.0376 | 195.82 | 292.59
claude-sonnet-4-20250514 | Anthropic | T0107 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1692 | $0.2531 | 127.79 | 191.16
o3 | OpenAI | T0133 | 2025-10-01 | prompt.txt | None | fuzzy | $0.1885 | $0.2827 | 391.04 | 586.48
gemini-2.5-pro | Google | T0128 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1032 | $0.1554 | 227.18 | 342.25
mistral-medium-2505 | Mistral AI | T0170 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0222 | $0.0336 | 128.32 | 194.01
gpt-4.1 | OpenAI | T0139 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0952 | $0.1449 | 298.94 | 455.07
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0233 | 2025-10-17 | prompt.txt | None | fuzzy | $0.1268 | $0.1931 | 923.12 | 1405.84
mistral-medium-2508 | Mistral AI | T0169 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0220 | $0.0336 | 112.70 | 172.31
claude-3-5-sonnet-20241022 | Anthropic | T0009 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1682 | $0.2576 | 124.19 | 190.17
gpt-4o | OpenAI | T0007 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1136 | $0.1748 | 350.22 | 538.95
claude-3-7-sonnet-20250219 | Anthropic | T0031 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1765 | $0.2720 | 136.48 | 210.38
gpt-4.1-mini | OpenAI | T0140 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0199 | $0.0307 | 164.93 | 254.41
mistral-large-latest | Mistral AI | T0187 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0805 | $0.1259 | 136.28 | 213.12
claude-opus-4-1-20250805 | Anthropic | T0127 | 2025-09-30 | prompt.txt | None | fuzzy | $0.9735 | $1.5435 | 203.32 | 322.38
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0234 | 2025-10-17 | prompt.txt | None | fuzzy | $0.0062 | $0.0099 | 151.02 | 241.35
gemini-2.0-flash | Google | T0008 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0052 | $0.0087 | 69.66 | 115.32
gpt-5-nano | OpenAI | T0131 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0281 | $0.0476 | 401.62 | 681.07
claude-opus-4-20250514 | Anthropic | T0106 | 2025-09-30 | prompt.txt | None | fuzzy | $0.8992 | $1.5413 | 193.49 | 331.67
gemini-2.5-flash-lite-preview-09-2025 | Google | T0211 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0048 | $0.0083 | 18.69 | 32.28
gemini-2.5-flash-lite | Google | T0203 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0039 | $0.0072 | 19.33 | 35.50
pixtral-large-latest | Mistral AI | T0035 | 2025-09-30 | prompt.txt | None | fuzzy | $0.1079 | $0.2123 | 199.57 | 392.57
gpt-4o-mini | OpenAI | T0027 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0261 | $0.0526 | 233.18 | 468.66
gemini-2.0-flash-lite | Google | T0033 | 2025-09-30 | prompt.txt | None | fuzzy | $0.0055 | $0.0140 | 90.14 | 230.26
claude-3-opus-20240229 | Anthropic | T0138 | 2025-10-01 | prompt.txt | None | fuzzy | $0.6830 | $1.8417 | 237.58 | 640.62
gpt-4.1-nano | OpenAI | T0141 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0044 | $0.0139 | 105.83 | 330.56
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0237 | 2025-10-17 | prompt.txt | None | fuzzy | $0.0000 | $0.0000 | 361.78 | 1560.25
gpt-4.5-preview | OpenAI | T0026 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 480.16 | 2367.38
pixtral-12b | Mistral AI | T0181 | 2025-10-01 | prompt.txt | None | fuzzy | $0.0036 | $0.0197 | 38.06 | 207.04
gemini-1.5-pro | Google | T0030 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 88.55 | 763.13
gemini-1.5-flash | Google | T0029 | 2025-04-08 | prompt.txt | None | fuzzy | N/A | N/A | 62.39 | 586.44
claude-sonnet-4-5-20250929 | Anthropic | T0225 | 2025-10-01 | prompt.txt | None | fuzzy | $0.1636 | N/A | 121.63 | N/A

blacklist

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-4.1-mini | OpenAI | T0307 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0274 | $0.0287 | 77.51 | 81.03
gemini-2.5-flash-preview-09-2025 | Google | T0231 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0188 | $0.0198 | 188.45 | 197.72
gpt-4o | OpenAI | T0305 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1336 | $0.1433 | 186.39 | 199.98
gpt-4.1 | OpenAI | T0232 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1125 | $0.1219 | 185.64 | 201.16
gemini-2.5-pro | Google | T0316 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0583 | $0.0633 | 793.69 | 861.99
gemini-2.0-flash-lite | Google | T0314 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0116 | $0.0128 | 145.10 | 159.49
gemini-2.0-flash | Google | T0313 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0123 | $0.0135 | 145.32 | 159.96
gpt-5 | OpenAI | T0309 | 2025-10-24 | prompt.txt | None | fuzzy | $0.5062 | $0.5587 | 1640.52 | 1810.67
x-ai/grok-4 | xAI (via OpenRouter) | T0336 | 2025-10-24 | prompt.txt | None | fuzzy | $0.9520 | $1.0572 | 1496.67 | 1662.16
mistral-medium-2508 | Mistral AI | T0327 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0306 | $0.0342 | 170.53 | 190.87
claude-opus-4-1-20250805 | Anthropic | T0324 | 2025-10-24 | prompt.txt | None | fuzzy | $1.4416 | $1.6265 | 304.49 | 343.54
mistral-medium-2505 | Mistral AI | T0328 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0306 | $0.0348 | 177.77 | 202.24
gemini-2.5-flash | Google | T0315 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0139 | $0.0158 | 216.33 | 246.30
mistral-large-latest | Mistral AI | T0330 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1344 | $0.1554 | 217.12 | 251.03
claude-3-7-sonnet-20250219 | Anthropic | T0320 | 2025-10-24 | prompt.txt | None | fuzzy | $0.2945 | $0.3414 | 158.36 | 183.55
gpt-5-mini | OpenAI | T0310 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0844 | $0.0984 | 768.66 | 896.62
gemini-2.5-flash-lite-preview-09-2025 | Google | T0318 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0040 | $0.0047 | 29.94 | 35.06
claude-sonnet-4-5-20250929 | Anthropic | T0325 | 2025-10-24 | prompt.txt | None | fuzzy | $0.3212 | $0.3766 | 194.14 | 227.63
gemini-2.5-flash-lite | Google | T0317 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0040 | $0.0047 | 89.48 | 105.22
o3 | OpenAI | T0312 | 2025-10-24 | prompt.txt | None | fuzzy | $0.2112 | $0.2484 | 478.97 | 563.25
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0333 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0130 | $0.0159 | 89.34 | 108.86
claude-opus-4-20250514 | Anthropic | T0322 | 2025-10-24 | prompt.txt | None | fuzzy | $1.4412 | $1.7591 | 301.84 | 368.43
claude-sonnet-4-20250514 | Anthropic | T0323 | 2025-10-24 | prompt.txt | None | fuzzy | $0.2911 | $0.3685 | 179.58 | 227.31
gpt-4.1-nano | OpenAI | T0308 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0090 | $0.0115 | 69.88 | 89.36
gpt-4o-mini | OpenAI | T0306 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1313 | $0.1685 | 155.92 | 200.10
pixtral-large-latest | Mistral AI | T0326 | 2025-10-24 | prompt.txt | None | fuzzy | $0.2711 | $0.3482 | 133.68 | 171.70
pixtral-12b | Mistral AI | T0329 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0186 | $0.0241 | 62.89 | 81.43
gpt-5-nano | OpenAI | T0311 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0370 | $0.0486 | 981.19 | 1289.00
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0335 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0050 | $0.0070 | 114.55 | 160.22
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0334 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0122 | $0.0175 | 293.34 | 422.08
claude-3-5-sonnet-20241022 | Anthropic | T0319 | 2025-10-24 | prompt.txt | None | fuzzy | $0.2892 | $0.4392 | 141.91 | 215.53
claude-3-opus-20240229 | Anthropic | T0321 | 2025-10-24 | prompt.txt | None | fuzzy | $1.4458 | $2.2659 | 258.87 | 405.70
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0331 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0000 | $0.0000 | N/A | N/A
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0332 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0171 | $0.1113 | 220.78 | 1437.97

company_lists

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-5 | OpenAI | T0347 | 2025-10-28 | prompt.txt | None | f1_micro | $0.7163 | $1.2200 | 1853.75 | 3157.12
o3 | OpenAI | T0353 | 2025-10-28 | prompt.txt | None | f1_micro | $0.2726 | $0.4798 | 596.40 | 1049.60
gemini-2.5-pro | Google | T0361 | 2025-10-28 | prompt.txt | None | f1_micro | $0.1753 | $0.3306 | 468.16 | 882.83
claude-opus-4-20250514 | Anthropic | T0373 | 2025-10-28 | prompt.txt | None | f1_micro | $1.6953 | $3.4059 | 366.75 | 736.84
gemini-2.0-flash | Google | T0355 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0095 | $0.0193 | 141.12 | 286.34
gemini-2.5-pro | Google | T0362 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.1439 | $0.3002 | 328.96 | 686.13
gemini-2.5-flash | Google | T0359 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0432 | $0.0907 | 291.69 | 612.05
gemini-2.5-flash-preview-09-2025 | Google | T0236 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0388 | $0.0822 | 187.04 | 395.64
claude-3-7-sonnet-20250219 | Anthropic | T0369 | 2025-10-28 | prompt.txt | None | f1_micro | $0.3387 | $0.7236 | 246.12 | 525.79
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0395 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0128 | $0.0274 | 334.89 | 716.34
claude-opus-4-1-20250805 | Anthropic | T0378 | 2025-10-28 | prompt_min.txt | None | f1_micro | $1.6020 | $3.5332 | 494.73 | 1091.15
gemini-2.0-flash-lite | Google | T0357 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0075 | $0.0165 | 101.13 | 223.41
gpt-5-mini | OpenAI | T0349 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0853 | $0.1901 | 839.22 | 1870.80
claude-opus-4-1-20250805 | Anthropic | T0377 | 2025-10-28 | prompt.txt | None | f1_micro | $1.6994 | $3.7940 | 495.85 | 1107.02
gpt-5-mini | OpenAI | T0350 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0740 | $0.1665 | 778.50 | 1750.76
gpt-4.1-mini | OpenAI | T0343 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0243 | $0.0547 | 163.93 | 368.93
gemini-2.5-flash-preview-09-2025 | Google | T0235 | 2025-10-28 | prompt.txt | None | f1_micro | $0.2013 | $0.4551 | 483.83 | 1094.03
mistral-large-latest | Mistral AI | T0389 | 2025-10-28 | prompt.txt | None | f1_micro | $0.1304 | $0.2978 | 266.34 | 608.15
o3 | OpenAI | T0354 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.2749 | $0.6318 | 619.62 | 1424.11
claude-sonnet-4-20250514 | Anthropic | T0375 | 2025-10-28 | prompt.txt | None | f1_micro | $0.3448 | $0.8030 | 272.75 | 635.14
gpt-4o | OpenAI | T0337 | 2025-10-28 | prompt.txt | None | f1_micro | $0.1335 | $0.3135 | 463.46 | 1087.95
gemini-2.0-flash-lite | Google | T0358 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0069 | $0.0162 | 91.90 | 216.02
gemini-2.5-flash | Google | T0360 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0358 | $0.0844 | 242.59 | 571.53
mistral-large-latest | Mistral AI | T0390 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.1148 | $0.2731 | 279.32 | 664.14
mistral-medium-2505 | Mistral AI | T0385 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0369 | $0.0881 | 238.87 | 570.63
claude-sonnet-4-5-20250929 | Anthropic | T0379 | 2025-10-28 | prompt.txt | None | f1_micro | $0.3680 | $0.8886 | 314.00 | 758.19
mistral-medium-2508 | Mistral AI | T0383 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0379 | $0.0930 | 270.89 | 664.86
gpt-4.1 | OpenAI | T0341 | 2025-10-28 | prompt.txt | None | f1_micro | $0.1110 | $0.2765 | 406.12 | 1011.69
claude-3-5-sonnet-20241022 | Anthropic | T0367 | 2025-10-28 | prompt.txt | None | f1_micro | $0.3323 | $0.8346 | 233.88 | 587.36
gpt-5-nano | OpenAI | T0351 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0356 | $0.0900 | 780.68 | 1975.90
gpt-4o | OpenAI | T0338 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.1185 | $0.3025 | 362.18 | 924.67
gpt-4.1-mini | OpenAI | T0344 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0215 | $0.0553 | 141.42 | 364.44
claude-sonnet-4-5-20250929 | Anthropic | T0380 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.3289 | $0.8506 | 281.66 | 728.47
claude-opus-4-20250514 | Anthropic | T0374 | 2025-10-28 | prompt_min.txt | None | f1_micro | $1.5534 | $4.0595 | 368.93 | 964.11
mistral-medium-2508 | Mistral AI | T0384 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0342 | $0.0894 | 294.97 | 771.99
claude-3-7-sonnet-20250219 | Anthropic | T0370 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.3080 | $0.8069 | 240.89 | 631.11
gemini-2.5-flash-lite-preview-09-2025 | Google | T0366 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0072 | $0.0190 | 31.60 | 83.75
gpt-5 | OpenAI | T0348 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.5257 | $1.4404 | 1434.08 | 3929.70
mistral-medium-2505 | Mistral AI | T0386 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0300 | $0.0833 | 251.47 | 697.90
gemini-2.0-flash | Google | T0356 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0092 | $0.0256 | 125.15 | 347.59
claude-sonnet-4-20250514 | Anthropic | T0376 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.3161 | $0.8906 | 274.61 | 773.66
claude-3-5-sonnet-20241022 | Anthropic | T0368 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.2924 | $0.8266 | 218.63 | 618.04
gpt-4.1-nano | OpenAI | T0346 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0057 | $0.0161 | 111.25 | 315.16
gpt-4.1-nano | OpenAI | T0345 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0067 | $0.0192 | 119.48 | 341.26
gemini-2.5-flash-lite-preview-09-2025 | Google | T0365 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0091 | $0.0270 | 38.88 | 115.48
gpt-4.1 | OpenAI | T0342 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0885 | $0.2719 | 248.61 | 764.17
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0397 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0147 | $0.0457 | 139.96 | 435.86
pixtral-large-latest | Mistral AI | T0382 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.2187 | $0.6925 | 441.43 | 1397.87
gpt-4o-mini | OpenAI | T0339 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0655 | $0.2193 | 326.35 | 1092.12
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0396 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0131 | $0.0445 | 423.01 | 1434.36
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0394 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.1410 | $0.4870 | 2486.69 | 8588.47
gemini-2.5-flash-lite | Google | T0363 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0342 | $0.1212 | 296.39 | 1049.37
gemini-2.5-flash-lite | Google | T0364 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0072 | $0.0261 | 88.32 | 321.70
gpt-5-nano | OpenAI | T0352 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0274 | $0.1010 | 536.15 | 1973.48
pixtral-12b | Mistral AI | T0387 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0089 | $0.0334 | 72.07 | 269.86
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0399 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0069 | $0.0268 | 465.24 | 1817.21
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0398 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0118 | $0.0483 | 284.79 | 1169.75
gpt-4o-mini | OpenAI | T0340 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0646 | $0.3063 | 276.13 | 1308.89
pixtral-large-latest | Mistral AI | T0381 | 2025-10-28 | prompt.txt | None | f1_micro | $0.1928 | $0.9326 | 313.38 | 1516.03
claude-3-opus-20240229 | Anthropic | T0371 | 2025-10-28 | prompt.txt | None | f1_micro | $1.9587 | $11.2177 | 705.22 | 4038.81
claude-3-opus-20240229 | Anthropic | T0372 | 2025-10-28 | prompt_min.txt | None | f1_micro | $1.3550 | $12.5063 | 503.03 | 4642.94
x-ai/grok-4 | xAI (via OpenRouter) | T0401 | 2025-10-28 | prompt.txt | None | f1_micro | $1.7651 | $34.0710 | 5302.03 | 102341.52
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0393 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0508 | $0.9962 | 754.65 | 14785.48
x-ai/grok-4 | xAI (via OpenRouter) | T0402 | 2025-10-28 | prompt_min.txt | None | f1_micro | $1.4525 | $44.7691 | 3063.28 | 94414.70
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0391 | 2025-10-28 | prompt.txt | None | f1_micro | $0.0000 | N/A | N/A | N/A
pixtral-12b | Mistral AI | T0388 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0073 | N/A | 24.35 | N/A
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0392 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0000 | N/A | N/A | N/A
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0400 | 2025-10-28 | prompt_min.txt | None | f1_micro | $0.0124 | N/A | 538.37 | N/A

fraktur

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gemini-exp-1206 | Google | T0087 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 875.69 | 908.39
gemini-2.5-pro | Google | T0132 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.1068 | $0.1112 | 244.01 | 254.18
gemini-2.5-pro-exp-03-25 | Google | T0022 | 2025-05-09 | prompt.txt | None | fuzzy | N/A | N/A | 807.86 | 855.79
gemini-2.5-pro-exp-03-25 | Google | T0080 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 799.79 | 847.24
gemini-2.0-pro-exp-02-05 | Google | T0091 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 855.41 | 906.15
gemini-2.5-pro-preview-05-06 | Google | T0097 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 891.10 | 964.40
gemini-2.5-flash | Google | T0199 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0253 | $0.0276 | 243.38 | 265.12
gemini-2.5-flash-preview-09-2025 | Google | T0223 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0287 | $0.0329 | 157.75 | 180.91
gemini-2.0-flash-lite | Google | T0090 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0034 | $0.0041 | 60.48 | 73.40
gemini-2.0-flash | Google | T0086 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0044 | $0.0060 | 55.03 | 75.39
claude-3-7-sonnet-20250219 | Anthropic | T0092 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.1777 | $0.2605 | 196.58 | 288.25
gemini-1.5-pro | Google | T0089 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 123.51 | 192.99
gemini-2.5-flash-lite | Google | T0207 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0045 | $0.0072 | 37.23 | 59.10
gemini-2.5-flash-preview-04-17 | Google | T0096 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 223.59 | 370.18
gpt-4.1 | OpenAI | T0083 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0776 | $0.1362 | 224.89 | 394.54
claude-opus-4-1-20250805 | Anthropic | T0123 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.9371 | $1.6615 | 289.28 | 512.90
gemini-1.5-flash | Google | T0088 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 81.93 | 148.43
mistral-large-latest | Mistral AI | T0191 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0683 | $0.1329 | 217.88 | 423.88
gemini-2.5-flash-lite-preview-09-2025 | Google | T0215 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0058 | $0.0114 | 33.93 | 66.79
claude-3-5-sonnet-20241022 | Anthropic | T0093 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.1305 | $0.2632 | 133.05 | 268.26
gpt-4o | OpenAI | T0079 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0790 | $0.1659 | 516.44 | 1084.96
gpt-5-mini | OpenAI | T0121 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0353 | $0.0746 | 243.61 | 513.94
claude-opus-4-20250514 | Anthropic | T0098 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.9807 | $2.1135 | 392.86 | 846.67
mistral-medium-2505 | Mistral AI | T0178 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0211 | $0.0490 | 183.12 | 425.87
pixtral-large-latest | Mistral AI | T0095 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0861 | $0.2121 | 137.85 | 339.53
claude-sonnet-4-20250514 | Anthropic | T0099 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.2083 | $0.5820 | 298.19 | 832.94
mistral-medium-2508 | Mistral AI | T0177 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0213 | $0.0642 | 484.57 | 1459.55
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0251 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0059 | $0.0196 | 112.89 | 376.30
gpt-4o-mini | OpenAI | T0082 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0223 | $0.0834 | 110.91 | 413.85
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0241 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0000 | $0.0000 | 675.50 | 2659.44
claude-3-opus-20240229 | Anthropic | T0094 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.6288 | $2.8326 | 214.63 | 966.82
pixtral-12b | Mistral AI | T0185 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.0037 | $0.0171 | 376.97 | 1729.20
gpt-4.5-preview | OpenAI | T0081 | 2025-05-09 | prompt_optimized.txt | None | fuzzy | N/A | N/A | 224.02 | 1349.51
gpt-5 | OpenAI | T0120 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.2036 | $1.3394 | 493.97 | 3249.77
o3 | OpenAI | T0137 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.1485 | $1.0606 | 357.90 | 2556.41
gpt-4.1-mini | OpenAI | T0084 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0106 | $0.2306 | 74.24 | 1613.84
gpt-5-nano | OpenAI | T0122 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0054 | $0.8930 | 70.61 | 11768.32
gpt-4.1-nano | OpenAI | T0085 | 2025-09-30 | prompt_optimized.txt | None | fuzzy | $0.0013 | N/A | 11.57 | N/A
claude-sonnet-4-5-20250929 | Anthropic | T0229 | 2025-10-01 | prompt_optimized.txt | None | fuzzy | $0.2178 | N/A | 301.59 | N/A
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0246 | 2025-10-17 | prompt_optimized.txt | None | fuzzy | $0.0048 | N/A | 47.09 | N/A

medieval_manuscripts

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-4.1-mini | OpenAI | T0277 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0125 | $0.0156 | 52.71 | 66.06
gpt-5-mini | OpenAI | T0280 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0618 | $0.0865 | 507.84 | 711.26
claude-3-5-sonnet-20241022 | Anthropic | T0288 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1236 | $0.1751 | 96.74 | 137.03
gemini-2.5-flash-preview-09-2025 | Google | T0287 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0106 | $0.0151 | 119.02 | 170.52
gemini-2.5-flash | Google | T0271 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0111 | $0.0161 | 185.44 | 268.76
gemini-2.5-pro | Google | T0272 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0391 | $0.0572 | 276.60 | 404.38
claude-opus-4-1-20250805 | Anthropic | T0292 | 2025-10-24 | prompt.txt | None | fuzzy | $0.6464 | $0.9634 | 264.03 | 393.48
claude-3-7-sonnet-20250219 | Anthropic | T0274 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1277 | $0.1932 | 101.40 | 153.40
gpt-4.1 | OpenAI | T0273 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0535 | $0.0825 | 94.03 | 145.11
claude-opus-4-20250514 | Anthropic | T0290 | 2025-10-24 | prompt.txt | None | fuzzy | $0.6472 | $0.9988 | 242.75 | 374.62
mistral-medium-2508 | Mistral AI | T0295 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0141 | $0.0218 | 84.91 | 131.03
qwen/qwen3-vl-8b-instruct | Alibaba (via OpenRouter) | T0303 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0024 | $0.0038 | 39.60 | 62.46
gemini-2.5-flash-lite-preview-09-2025 | Google | T0286 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0021 | $0.0034 | 18.96 | 30.43
mistral-large-latest | Mistral AI | T0298 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0578 | $0.0936 | 75.48 | 122.13
gemini-2.0-flash-lite | Google | T0284 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0029 | $0.0048 | 36.60 | 60.90
mistral-medium-2505 | Mistral AI | T0296 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0189 | $0.0315 | 402.69 | 670.04
gpt-4o | OpenAI | T0275 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0666 | $0.1121 | 107.60 | 181.15
gemini-2.5-flash-lite | Google | T0285 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0021 | $0.0036 | 43.30 | 74.40
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0299 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0000 | $0.0000 | 39.56 | 68.91
gpt-4o-mini | OpenAI | T0276 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0487 | $0.0856 | 73.54 | 129.25
gemini-2.0-flash | Google | T0283 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0071 | $0.0125 | 83.75 | 147.71
claude-sonnet-4-5-20250929 | Anthropic | T0293 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1378 | $0.2506 | 161.13 | 292.96
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0301 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0060 | $0.0111 | 46.40 | 85.46
claude-sonnet-4-20250514 | Anthropic | T0291 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1305 | $0.2504 | 142.79 | 274.06
o3 | OpenAI | T0282 | 2025-10-24 | prompt.txt | None | fuzzy | $0.3410 | $0.6686 | 708.16 | 1388.55
qwen/qwen3-vl-30b-a3b-instruct | Alibaba (via OpenRouter) | T0302 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0078 | $0.0155 | 241.69 | 479.55
pixtral-large-latest | Mistral AI | T0294 | 2025-10-24 | prompt.txt | None | fuzzy | $0.1187 | $0.2655 | 98.34 | 220.01
pixtral-12b | Mistral AI | T0297 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0068 | $0.0164 | 34.56 | 82.89
gpt-5 | OpenAI | T0279 | 2025-10-24 | prompt.txt | None | fuzzy | $0.5342 | $1.2965 | 2247.52 | 5455.15
x-ai/grok-4 | xAI (via OpenRouter) | T0304 | 2025-10-24 | prompt.txt | None | fuzzy | $1.4935 | $3.7152 | 2573.65 | 6402.12
claude-3-opus-20240229 | Anthropic | T0289 | 2025-10-24 | prompt.txt | None | fuzzy | $0.6457 | $1.6305 | 151.84 | 383.43
gpt-4.1-nano | OpenAI | T0278 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0032 | $0.0081 | 31.64 | 80.31
gpt-5-nano | OpenAI | T0281 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0094 | $0.0482 | 216.78 | 1117.42
qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0300 | 2025-10-24 | prompt.txt | None | fuzzy | $0.0298 | N/A | 2000.00 | N/A

metadata_extraction

Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point
gpt-5 | OpenAI | T0109 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.4823 | $0.6183 | 1030.30 | 1320.90
o3 | OpenAI | T0135 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1855 | $0.2541 | 394.51 | 540.43
gpt-5 | OpenAI | T0108 | 2025-09-30 | prompt.txt | None | f1_micro | $1.2982 | $1.8285 | 2922.29 | 4115.90
gpt-5 | OpenAI | T0110 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.7651 | $1.1419 | 1711.52 | 2554.51
o3 | OpenAI | T0134 | 2025-10-01 | prompt.txt | None | f1_micro | $0.5053 | $0.8021 | 1038.49 | 1648.39
gemini-2.0-flash-lite | Google | T0056 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0042 | $0.0066 | 45.77 | 72.65
gemini-2.5-flash | Google | T0197 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0087 | $0.0139 | 195.57 | 310.43
gemini-2.5-pro | Google | T0125 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0426 | $0.0687 | 203.80 | 328.71
gpt-4.5-preview | OpenAI | T0011 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 960.76 | 1575.02
gemini-2.0-flash | Google | T0044 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0055 | $0.0090 | 51.72 | 84.78
gpt-4.5-preview | OpenAI | T0040 | 2025-04-11 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | N/A | N/A | 373.44 | 622.41
gemini-2.5-flash-preview-09-2025 | Google | T0221 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0108 | $0.0180 | 112.91 | 188.19
gpt-4.5-preview | OpenAI | T0041 | 2025-04-11 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | N/A | N/A | 526.13 | 876.89
o3 | OpenAI | T0136 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.3052 | $0.5087 | 624.76 | 1041.27
mistral-medium-2505 | Mistral AI | T0174 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0231 | $0.0392 | 69.80 | 118.30
gemini-exp-1206 | Google | T0014 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 894.64 | 1542.48
gemini-2.5-pro-exp-03-25 | Google | T0019 | 2025-04-01 | prompt.txt | None | f1_micro | N/A | N/A | 877.02 | 1512.10
gpt-5-mini | OpenAI | T0112 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0622 | $0.1072 | 413.46 | 712.86
gemini-2.0-pro-exp-02-05 | Google | T0021 | 2025-04-01 | prompt.txt | None | f1_micro | N/A | N/A | 856.03 | 1501.80
gpt-5-nano | OpenAI | T0115 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0299 | $0.0524 | 429.53 | 753.56
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0239 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0000 | $0.0000 | 22.11 | 38.79
gpt-4.1-mini | OpenAI | T0070 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0630 | $0.1124 | 152.75 | 272.78
gpt-4o-mini | OpenAI | T0012 | 2025-09-30 | prompt.txt | None | f1_micro | $0.3814 | $0.6935 | 211.44 | 384.43
gemini-2.5-pro | Google | T0124 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1057 | $0.1921 | 493.26 | 896.84
gemini-2.5-flash-lite-preview-09-2025 | Google | T0213 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0026 | $0.0048 | 20.70 | 37.64
gemini-2.0-flash-lite | Google | T0020 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0089 | $0.0165 | 105.24 | 194.88
gpt-4o-mini | OpenAI | T0076 | 2025-09-30 | prompt.txt | None | f1_micro | $0.3815 | $0.7064 | 217.37 | 402.54
gpt-5-mini | OpenAI | T0111 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1486 | $0.2752 | 1069.71 | 1980.95
gemini-2.5-flash | Google | T0196 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0217 | $0.0403 | 439.55 | 813.98
gemini-2.5-flash-preview-09-2025 | Google | T0220 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0427 | $0.0791 | 307.46 | 569.36
gpt-4.1-mini | OpenAI | T0071 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0238 | $0.0440 | 75.20 | 139.26
mistral-large-latest | Mistral AI | T0189 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1082 | $0.2004 | 64.28 | 119.03
gpt-4.1-mini | OpenAI | T0072 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0393 | $0.0727 | 86.89 | 160.91
gpt-4o | OpenAI | T0010 | 2025-09-30 | prompt.txt | None | f1_micro | $0.2844 | $0.5367 | 432.47 | 815.98
gpt-4o-mini | OpenAI | T0042 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1368 | $0.2580 | 86.90 | 163.97
gpt-4o-mini | OpenAI | T0077 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1368 | $0.2580 | 84.87 | 160.14
gemini-2.0-flash-lite | Google | T0057 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0047 | $0.0089 | 55.63 | 104.95
gemini-2.5-pro | Google | T0126 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0617 | $0.1164 | 275.31 | 519.45
gemini-2.0-flash | Google | T0013 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0118 | $0.0227 | 121.45 | 233.56
gpt-4o | OpenAI | T0039 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1756 | $0.3376 | 358.87 | 690.14
gpt-4o-mini | OpenAI | T0043 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.2447 | $0.4706 | 120.99 | 232.67
gemini-2.5-flash | Google | T0198 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0130 | $0.0250 | 196.67 | 378.22
gpt-4.1 | OpenAI | T0067 | 2025-09-30 | prompt.txt | None | f1_micro | $0.2347 | $0.4603 | 229.72 | 450.42
mistral-medium-2508 | Mistral AI | T0173 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0232 | $0.0454 | 66.29 | 129.99
gemini-2.5-flash-lite | Google | T0205 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0028 | $0.0055 | 24.16 | 47.38
gpt-4o-mini | OpenAI | T0078 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.2447 | $0.4798 | 128.56 | 252.08
gpt-5-mini | OpenAI | T0113 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0868 | $0.1702 | 628.66 | 1232.66
gemini-2.5-flash-preview-09-2025 | Google | T0222 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0157 | $0.0307 | 167.65 | 328.72
gemini-2.5-flash-lite-preview-09-2025 | Google | T0212 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0066 | $0.0132 | 50.73 | 101.46
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0238 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0000 | $0.0000 | 71.75 | 143.50
gpt-4o | OpenAI | T0038 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.1094 | $0.2188 | 136.28 | 272.55
gpt-4.1 | OpenAI | T0068 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0905 | $0.1811 | 87.29 | 174.58
gpt-4.1-nano | OpenAI | T0074 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0081 | $0.0161 | 70.52 | 141.04
gpt-4.1 | OpenAI | T0069 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1447 | $0.2893 | 139.77 | 279.53
gpt-4.1-nano | OpenAI | T0073 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0217 | $0.0444 | 134.41 | 274.30
mistral-medium-2508 | Mistral AI | T0171 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0601 | $0.1252 | 167.64 | 349.25
gemini-2.0-flash | Google | T0045 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0063 | $0.0131 | 69.17 | 144.11
gemini-2.5-flash-lite-preview-09-2025 | Google | T0214 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0040 | $0.0082 | 31.45 | 65.52
mistral-medium-2505 | Mistral AI | T0172 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0602 | $0.1282 | 163.77 | 348.45
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0248 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0327 | $0.0696 | 183.79 | 391.03
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0249 | 2025-10-17 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.0120 | $0.0255 | 54.29 | 115.50
gpt-4.1-nano | OpenAI | T0075 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0137 | $0.0291 | 88.44 | 188.18
gpt-5-nano | OpenAI | T0114 | 2025-09-30 | prompt.txt | None | f1_micro | $0.0652 | $0.1417 | 1002.37 | 2179.06
gemini-2.5-flash-lite | Google | T0204 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0068 | $0.0149 | 61.57 | 133.85
gpt-5-nano | OpenAI | T0116 | 2025-09-30 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0341 | $0.0741 | 465.75 | 1012.51
GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0240 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0000 | $0.0000 | 33.01 | 71.76
gemini-1.5-pro | Google | T0016 | 2025-04-11 | prompt.txt | None | f1_micro | N/A | N/A | 325.48 | 723.29
mistral-large-latest | Mistral AI | T0188 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2842 | $0.6316 | 168.93 | 375.40
meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0250 | 2025-10-17 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0205 | $0.0456 | 70.88 | 157.50
claude-sonnet-4-5-20250929 | Anthropic | T0226 | 2025-10-01 | prompt.txt | None | f1_micro | $0.5785 | $1.3147 | 243.73 | 553.93
claude-opus-4-1-20250805 | Anthropic | T0118 | 2025-09-30 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $1.0480 | $2.3818 | 101.89 | 231.56
claude-sonnet-4-5-20250929 | Anthropic | T0227 | 2025-10-01 | prompt.txt | {"skip_signatures": true, "skip_non_signatures": false} | f1_micro | $0.2292 | $0.5209 | 102.12 | 232.08
gemini-2.5-flash-lite | Google | T0206 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0040 | $0.0092 | 34.99 | 79.53
mistral-medium-2505 | Mistral AI | T0176 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.0371 | $0.0863 | 108.79 | 252.99
mistral-large-latest | Mistral AI | T0190 | 2025-10-01 | prompt.txt | {"skip_signatures": false, "skip_non_signatures": true} | f1_micro | $0.1758 | $0.4088 | 108.78 | 252.98
claude-3-7-sonnet-20250219AnthropicT00172025-09-30prompt.txtNonef1_micro$0.5398$1.2853235.51560.74
gemini-1.5-flashGoogleT00482025-04-11prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_microN/AN/A113.85271.08
claude-3-7-sonnet-20250219AnthropicT00252025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.3273$0.7794140.90335.48
mistral-medium-2508Mistral AIT01752025-10-01prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.0371$0.0883100.16238.48
claude-3-5-sonnet-20241022AnthropicT00182025-09-30prompt.txtNonef1_micro$0.5403$1.3177171.85419.14
claude-3-7-sonnet-20250219AnthropicT00242025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.2127$0.5188109.17266.27
claude-opus-4-20250514AnthropicT01012025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$1.0565$2.5769120.33293.48
claude-3-5-sonnet-20241022AnthropicT00522025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.2131$0.532677.08192.71
claude-3-5-sonnet-20241022AnthropicT00532025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.3281$0.8202135.74339.35
gemini-1.5-flashGoogleT00152025-04-11prompt.txtNonef1_microN/AN/A291.55747.57
claude-sonnet-4-5-20250929AnthropicT02282025-10-01prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.3493$0.9193128.77338.86
claude-opus-4-20250514AnthropicT01002025-09-30prompt.txtNonef1_micro$2.6761$7.2328282.63763.87
pixtral-large-latestMistral AIT00232025-09-30prompt.txtNonef1_micro$0.6883$1.9121164.98458.28
claude-opus-4-1-20250805AnthropicT01172025-09-30prompt.txtNonef1_micro$2.6621$7.3947238.52662.57
pixtral-large-latestMistral AIT00612025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.4348$1.2789102.15300.45
claude-sonnet-4-20250514AnthropicT01052025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.3226$0.9487113.88334.95
gemini-1.5-flashGoogleT00492025-04-11prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_microN/AN/A184.94560.41
claude-opus-4-1-20250805AnthropicT01192025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$1.6173$5.0542137.86430.81
claude-sonnet-4-20250514AnthropicT01032025-09-30prompt.txtNonef1_micro$0.5322$1.7169211.51682.28
pixtral-large-latestMistral AIT00602025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.2545$0.820866.61214.87
pixtral-12bMistral AIT01842025-10-01prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.0314$0.101465.18210.27
claude-opus-4-20250514AnthropicT01022025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$1.6251$5.4172167.21557.37
pixtral-12bMistral AIT01822025-10-01prompt.txtNonef1_micro$0.0496$0.1710109.70378.27
claude-sonnet-4-20250514AnthropicT01042025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.2107$0.780478.40290.37
claude-3-opus-20240229AnthropicT00362025-09-30prompt.txtNonef1_micro$2.6852$10.7406304.651218.61
pixtral-12bMistral AIT01832025-10-01prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.0182$0.072643.94175.76
claude-3-opus-20240229AnthropicT00632025-09-30prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$1.6204$6.4818172.46689.82
claude-3-opus-20240229AnthropicT00622025-09-30prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$1.0614$5.0543128.92613.89
qwen/qwen3-vl-8b-thinkingAlibaba (via OpenRouter)T02442025-10-17prompt.txt
{"skip_signatures": true, "skip_non_signatures": false}
f1_micro$0.0337$0.1775184.67971.93
qwen/qwen3-vl-8b-thinkingAlibaba (via OpenRouter)T02452025-10-17prompt.txt
{"skip_signatures": false, "skip_non_signatures": true}
f1_micro$0.0583$0.5834312.083120.79
qwen/qwen3-vl-8b-thinkingAlibaba (via OpenRouter)T02432025-10-17prompt.txtNonef1_micro$0.0581$0.9691267.654460.85

test_benchmark

| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4o | OpenAI | T0001 | 2025-09-30 | prompt.txt | None | N/A | $0.0045 | N/A | 7.66 | N/A |
| gemini-2.0-flash | Google | T0002 | 2025-09-30 | prompt.txt | None | N/A | $0.0002 | N/A | 3.05 | N/A |
| claude-3-5-sonnet-20241022 | Anthropic | T0003 | 2025-09-30 | prompt.txt | None | N/A | $0.0080 | N/A | 8.60 | N/A |
| gemini-2.5-flash | Google | T0193 | 2025-09-30 | prompt.txt | None | N/A | $0.0012 | N/A | 19.04 | N/A |
| gemini-2.5-flash-lite | Google | T0201 | 2025-10-01 | prompt.txt | None | N/A | $0.0263 | N/A | 99.68 | N/A |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0209 | 2025-10-01 | prompt.txt | None | N/A | $0.0001 | N/A | 1.07 | N/A |
| gemini-2.5-flash-preview-09-2025 | Google | T0217 | 2025-10-01 | prompt.txt | None | N/A | $0.0008 | N/A | 7.19 | N/A |

test_benchmark2

| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4o | OpenAI | T0004 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0039 | N/A | 10.09 | N/A |
| gemini-2.0-flash | Google | T0005 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0002 | N/A | 2.58 | N/A |
| claude-3-5-sonnet-20241022 | Anthropic | T0006 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0069 | N/A | 8.17 | N/A |
| gemini-2.5-flash | Google | T0194 | 2025-09-30 | a_prompt.txt | None | N/A | $0.0003 | N/A | 10.83 | N/A |
| gemini-2.5-flash-lite | Google | T0202 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 1.54 | N/A |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0210 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0001 | N/A | 0.98 | N/A |
| gemini-2.5-flash-preview-09-2025 | Google | T0218 | 2025-10-01 | a_prompt.txt | None | N/A | $0.0006 | N/A | 7.22 | N/A |

zettelkatalog

| Model | Provider | Test ID | Date | Prompt | Rules | Results | Cost (USD) | Cost/Point | Test Time (s) | Time/Point |
|---|---|---|---|---|---|---|---|---|---|---|
| claude-3-5-sonnet-20241022 | Anthropic | T0143 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5000 | $2.8568 | 1577.11 | 1802.20 |
| gpt-5 | OpenAI | T0165 | 2025-10-01 | prompt.txt | None | f1_micro | $7.1243 | $8.1870 | 21108.28 | 24256.89 |
| gemini-2.5-pro | Google | T0155 | 2025-10-01 | prompt.txt | None | f1_micro | $0.7095 | $0.8157 | 3744.41 | 4304.93 |
| gemini-2.5-flash-preview-09-2025 | Google | T0224 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3949 | $0.4591 | 2424.33 | 2818.30 |
| gemini-2.5-flash | Google | T0200 | 2025-09-30 | prompt.txt | None | f1_micro | $0.1684 | $0.1965 | 3027.50 | 3531.85 |
| gpt-4.1 | OpenAI | T0160 | 2025-10-01 | prompt.txt | None | f1_micro | $2.6952 | $3.1509 | 12936.40 | 15123.99 |
| claude-3-7-sonnet-20250219 | Anthropic | T0144 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5778 | $3.0183 | 1529.23 | 1790.50 |
| claude-sonnet-4-20250514 | Anthropic | T0148 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5141 | $2.9870 | 1456.90 | 1730.95 |
| gemini-2.0-flash | Google | T0151 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0795 | $0.0945 | 632.49 | 751.61 |
| o3 | OpenAI | T0168 | 2025-10-01 | prompt.txt | None | f1_micro | $2.5396 | $3.0453 | 9024.04 | 10821.21 |
| claude-opus-4-1-20250805 | Anthropic | T0146 | 2025-10-01 | prompt.txt | None | f1_micro | $12.5676 | $15.2166 | 1581.36 | 1914.67 |
| gpt-4o | OpenAI | T0066 | 2025-09-30 | prompt.txt | None | f1_micro | $1.3031 | $1.5799 | 3291.08 | 3989.93 |
| gemini-2.0-flash-lite | Google | T0152 | 2025-10-01 | prompt.txt | None | f1_micro | $0.0615 | $0.0757 | 618.19 | 761.16 |
| claude-sonnet-4-5-20250929 | Anthropic | T0230 | 2025-09-30 | prompt.txt | None | f1_micro | $2.7730 | $3.4395 | 1428.77 | 1772.17 |
| gpt-5-mini | OpenAI | T0166 | 2025-10-01 | prompt.txt | None | f1_micro | $1.3158 | $1.6634 | 22744.78 | 28753.17 |
| claude-opus-4-20250514 | Anthropic | T0147 | 2025-10-01 | prompt.txt | None | f1_micro | $12.7064 | $16.1835 | 1760.17 | 2241.84 |
| mistral-medium-2508 | Mistral AI | T0179 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2675 | $0.3413 | 934.75 | 1192.46 |
| mistral-large-latest | Mistral AI | T0192 | 2025-10-01 | prompt.txt | None | f1_micro | $1.1730 | $1.5031 | 862.11 | 1104.69 |
| pixtral-large-latest | Mistral AI | T0159 | 2025-10-01 | prompt.txt | None | f1_micro | $2.1044 | $2.7040 | 1298.95 | 1669.01 |
| mistral-medium-2505 | Mistral AI | T0180 | 2025-10-01 | prompt.txt | None | f1_micro | $0.2675 | $0.3448 | 835.72 | 1077.34 |
| gpt-5-nano | OpenAI | T0167 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3448 | $0.4475 | 5049.08 | 6552.84 |
| claude-3-opus-20240229 | Anthropic | T0145 | 2025-10-01 | prompt.txt | None | f1_micro | $13.6030 | $17.8355 | 2789.97 | 3658.04 |
| gpt-4.1-mini | OpenAI | T0161 | 2025-10-02 | prompt.txt | None | f1_micro | N/A | N/A | 29647.30 | 39063.63 |
| gemini-2.5-flash-lite | Google | T0208 | 2025-10-01 | prompt.txt | None | f1_micro | $0.1044 | $0.1497 | 587.82 | 842.84 |
| gemini-2.5-flash-lite-preview-09-2025 | Google | T0216 | 2025-10-01 | prompt.txt | None | f1_micro | $0.3252 | $0.4750 | 1152.22 | 1682.91 |
| gpt-4.1-nano | OpenAI | T0162 | 2025-10-02 | prompt.txt | None | f1_micro | N/A | N/A | 471.26 | 696.91 |
| meta-llama/llama-4-maverick | Meta (via OpenRouter) | T0252 | 2025-10-17 | prompt.txt | None | f1_micro | $0.5506 | $0.8159 | 19155.46 | 28385.64 |
| pixtral-12b | Mistral AI | T0186 | 2025-10-01 | prompt.txt | None | f1_micro | $0.1384 | $0.2274 | 596.77 | 980.26 |
| GLM-4.5V-FP8 | Z.ai (via sciCORE) | T0242 | 2025-10-17 | prompt.txt | None | f1_micro | $0.0000 | $0.0000 | 44963.40 | 173666.98 |
| qwen/qwen3-vl-8b-thinking | Alibaba (via OpenRouter) | T0247 | 2025-10-17 | prompt.txt | None | f1_micro | $0.1827 | $1.7703 | 1035.96 | 10037.63 |
| gpt-4o-mini | OpenAI | T0164 | 2025-10-03 | prompt.txt | None | f1_micro | $0.0201 | $1.3640 | 1229.45 | 83295.51 |
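
The Cost/Point and Time/Point columns in the tables above appear to be the raw cost or test time divided by the test's performance score. The page does not state the formula at this point, so the following Python sketch treats that relationship as an assumption. The helper names `per_point` and `implied_score` are hypothetical and are not part of the benchmark's code. The sketch recovers the score implied by one zettelkatalog row (T0143) from both its cost and its time columns.

```python
from typing import Optional


def per_point(value: Optional[float], score: Optional[float]) -> Optional[float]:
    """Value per performance point (ASSUMED formula: value / score).

    Returns None, which the tables would show as N/A, when the value or
    score is missing or the score is not positive.
    """
    if value is None or score is None or score <= 0:
        return None
    return value / score


def implied_score(value: float, value_per_point: float) -> float:
    """Invert the assumed formula to recover the score behind a table row."""
    return value / value_per_point


# Row T0143 (zettelkatalog): cost $2.5000, cost/point $2.8568,
# test time 1577.11 s, time/point 1802.20 s.
score_from_cost = implied_score(2.5000, 2.8568)
score_from_time = implied_score(1577.11, 1802.20)

print(round(score_from_cost, 3), round(score_from_time, 3))  # 0.875 0.875
```

Both columns give the same implied score for this row, which is consistent with the assumed formula. Rows whose Results or Cost cell is N/A, as in test_benchmark, would yield `None` from `per_point`.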

About This Page

This benchmark suite tests AI models on humanities data tasks. The tests run monthly, and the results are updated automatically.

For more details, visit the GitHub repository.