Metadata Extraction from Historic Rheinschifffahrt Letters¶
Table of Contents¶
- Overview
- Creator
- Historic Rheinschifffahrt Letters
- Ground Truth
- Guidelines for creating the ground truth
- Metadata schema
- Importing the ground truth from Google Sheets
- Scoring
- Scoring a letter
- Scoring the collection
- To Dos
Overview¶
This benchmark focuses on the extraction of metadata from historic letters. It provides a ground truth for the metadata categories `send_date`, `sender_persons`, and `receiver_persons` for a collection of letters from the Basler Rheinschifffahrt-Aktiengesellschaft dated between 1926 and 1932, and reports F1-micro and F1-macro scores across these categories.
Creator¶
This benchmark was created by the University of Basel's Research and Infrastructure Support RISE (rise@unibas.ch) between 2022 and 2025.
Historic Rheinschifffahrt Letters¶
This benchmark uses as input the digital collection "Basler Rheinschifffahrt-Aktiengesellschaft, insbesondere über die Veräusserung des Dieselmotorbootes 'Rheinfelden' und die Gewährung eines Darlehens zur Finanzierung der Erstellung des Dieselmotorbootes 'Rhyblitz' an diese Firma" (shelf mark: `CH SWA HS 191 V 10`, persistent link: http://dx.doi.org/10.7891/e-manuscripta-54917, referred to as "the collection" in what follows) of the Schweizer Wirtschaftsarchiv.
The collection consists of 68 letters of varying length (mostly 1-3 pages). The letters are dated between 1926 and 1932 and are written in German. They are mostly typewritten, with some handwritten annotations. They reflect the correspondence of the Basler Rheinschifffahrt-Aktiengesellschaft and are mostly signed by individuals or companies. Topics include business transactions, shipping schedules, and personnel matters. For this benchmark, a ground truth has been created for a subset of 57 letters (see below).
Ground Truth¶
The ground truth for the collection is created in the RSF Letters Ground Truth Google Sheet. It is then imported and used to benchmark LLMs with respect to information extraction tasks.
Guidelines for creating the ground truth¶
The ground truth for letters is created by filling out the `ground_truth` tab of the Google Sheet.
Metadata schema¶
The `ground_truth` tab of the Google Sheet uses the following metadata schema:
Field Name | Description | Data Type |
---|---|---|
transkribus_doc_url | URL link to the letter on Transkribus. | string (URL) |
document_number | The letter's number is between 1 and 68, inclusive (1 ≤ i ≤ 68). | zero padded integer |
done | Indicates whether the creation of the ground truth is completed. | boolean |
checked_by | Identifier of the person who is responsible for creating the ground truth. | string |
send_date | Date when the letter was sent. | ISO 8601 date or None |
letter_title | Title of the letter as diplomatically inscribed. | string or None |
sender_persons_inscribed | Sender person(s) as diplomatically inscribed. | string or None |
sender_persons | Individuals explicitly mentioned as senders in the document. | string or None |
receiver_persons_inscribed | Receiver person(s) as diplomatically inscribed. | string or None |
receiver_persons | Individuals associated with receiving the document, inferred or explicitly stated. | string |
has_signatures | Indicates whether the document contains signatures. | boolean |
signatures_recognised | Indicates whether all signatures have been mapped to persons as per ground_truth/persons.json. | boolean |
comment | Additional comments or annotations about the document. | string or None |
action_required | Indicates what action is required to get to document done. | string |
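To make the data types in the schema concrete, a single row could be checked roughly as follows. This is a minimal sketch, not part of the benchmark code: the helper name `validate_row` is hypothetical, and it assumes that booleans are exported as `TRUE`/`FALSE` and that an empty cell stands for None.

```python
from datetime import date


def validate_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one ground-truth row (hypothetical helper)."""
    problems = []

    # document_number: integer between 1 and 68, inclusive (zero padding is allowed).
    number = row.get("document_number", "")
    if not number.isdecimal() or not 1 <= int(number) <= 68:
        problems.append(f"document_number out of range: {number!r}")

    # send_date: ISO 8601 calendar date or empty (assumed to mean None).
    send_date = row.get("send_date", "")
    if send_date:
        try:
            date.fromisoformat(send_date)
        except ValueError:
            problems.append(f"send_date is not an ISO 8601 date: {send_date!r}")

    # Boolean fields: assumed to be exported as TRUE/FALSE.
    for field in ("done", "has_signatures", "signatures_recognised"):
        if row.get(field, "").upper() not in {"TRUE", "FALSE"}:
            problems.append(f"{field} is not a boolean: {row.get(field)!r}")

    return problems
```

An empty result means that the row passed these checks; the remaining string fields are not validated in this sketch.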
Persons¶
Persons are recorded in the `persons` tab of the Google Sheet. The metadata schema and workflow for persons are described in `ground_truth_persons_organizations.md`.
For `sender_persons_inscribed`, `sender_persons`, `receiver_persons_inscribed`, and `receiver_persons`:
- Use a pipe `|` to separate multiple values: `Mustermann, Hans | Musterfrau, Maria`.
- Indicate persons inferred from function and date with angle brackets: `<Mustermann, Hans>`.
- Indicate persons inferred from the correspondence history with double angle brackets: `<<Musterfrau, Maria>>`.
The `sender_persons` and `receiver_persons` fields use the names of persons as recorded in the normalized `persons` tab. Be sure to add inscribed variants to their respective `alternateName` fields.
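The pipe and angle-bracket conventions above can be decoded mechanically. The following is a minimal sketch under stated assumptions: `PersonRef` and `parse_persons` are hypothetical names, and the labels `"function"` and `"correspondence"` are chosen here for illustration only.

```python
from typing import NamedTuple, Optional


class PersonRef(NamedTuple):
    name: str
    inference: Optional[str]  # None (explicit), "function", or "correspondence"


def parse_persons(cell: Optional[str]) -> list[PersonRef]:
    """Split a persons cell on pipes and decode the angle-bracket markers."""
    if not cell or not cell.strip():
        return []
    refs = []
    for raw in cell.split("|"):
        value = raw.strip()
        if not value:
            continue
        # Check the double-bracket form first, since it also starts with "<".
        if value.startswith("<<") and value.endswith(">>"):
            refs.append(PersonRef(value[2:-2].strip(), "correspondence"))
        elif value.startswith("<") and value.endswith(">"):
            refs.append(PersonRef(value[1:-1].strip(), "function"))
        else:
            refs.append(PersonRef(value, None))
    return refs
```

For instance, `parse_persons("Mustermann, Hans | <<Musterfrau, Maria>>")` yields one explicit person and one person inferred from the correspondence history.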
Importing the ground truth from Google Sheets¶
Letters marked as done are exported from the `ground_truth_export` tab of the Google Sheet as a CSV file and saved to `ground_truth/letters.csv`.
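Reading the exported file could look like the sketch below. It assumes that the CSV header uses the field names of the metadata schema and that `document_number` is zero padded to two digits; `load_letters` is a hypothetical helper, and the inline sample only stands in for the real file.

```python
import csv
import io


def load_letters(csv_file) -> dict[str, dict[str, str]]:
    """Read exported ground-truth rows, keyed by document_number."""
    reader = csv.DictReader(csv_file)
    return {row["document_number"]: row for row in reader}


# Inline sample standing in for ground_truth/letters.csv (column subset is an assumption).
sample = io.StringIO(
    "document_number,send_date,sender_persons,receiver_persons\n"
    '01,1926-02-16,"Groschupf-Jaeger, Louis | Ritter-Dreier, Fritz","Christ-Wackernagel, Paul"\n'
)
letters = load_letters(sample)
print(letters["01"]["send_date"])  # 1926-02-16
```

For the real export, the same function can be called on a file opened with `open("ground_truth/letters.csv", newline="", encoding="utf-8")`.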
Scoring¶
We score the metadata that LLMs extract from the letters against the ground truth. Intuitively, we want to know whether the extracted metadata is correct and complete.
Scoring a letter¶
We score each letter for the metadata categories `send_date`, `sender_persons`, and `receiver_persons` by counting true positives (TP), false positives (FP), and false negatives (FN) against the ground truth.
Example¶
Consider the first letter as an example. The letter is composed of three pages:
[Figure: scans of the three pages of the letter]
The scoring of the letter is as follows:
Metric | Ground Truth | Prediction | TP | FP | FN |
---|---|---|---|---|---|
send_date | 1926-02-16 | 1926-02-16 | 1 | 0 | 0 |
sender_persons | Groschupf-Jaeger, Louis; Ritter-Dreier, Fritz | Basler Rheinschiffahrt-Aktiengesellschaft | 0 | 1 | 2 |
receiver_persons | Christ-Wackernagel, Paul | Herr Christ i/Fa. Paravicini, Christ & Co. | 1 | 1 | 0 |
- `send_date`: The prediction matches the ground truth (1 TP).
- `sender_persons`: The prediction is incorrect (1 FP) as "Basler Rheinschiffahrt-Aktiengesellschaft" is not a sender person, and the two actual sender persons ("Groschupf-Jaeger, Louis" and "Ritter-Dreier, Fritz") are missing (2 FN).
- `receiver_persons`: The prediction is partly correct as "Herr Christ" is mentioned as a receiver person (1 TP), and partly incorrect as "i/Fa. Paravicini, Christ & Co." is not a receiver person (1 FP).
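The counting for one category can be sketched with plain set operations. This is a simplified illustration that assumes exact matching between ground-truth and predicted values; `score_field` is a hypothetical helper, not the benchmark's scoring function.

```python
def score_field(truth: set[str], prediction: set[str]) -> tuple[int, int, int]:
    """Return (TP, FP, FN) for one metadata category using exact set matching."""
    tp = len(truth & prediction)  # values found in both sets
    fp = len(prediction - truth)  # predicted values not in the ground truth
    fn = len(truth - prediction)  # ground-truth values that were not predicted
    return tp, fp, fn


send_date = score_field({"1926-02-16"}, {"1926-02-16"})
sender_persons = score_field(
    {"Groschupf-Jaeger, Louis", "Ritter-Dreier, Fritz"},
    {"Basler Rheinschiffahrt-Aktiengesellschaft"},
)
print(send_date)       # (1, 0, 0)
print(sender_persons)  # (0, 1, 2)
```

Exact matching reproduces the `send_date` and `sender_persons` rows of the example. It would not reproduce the `receiver_persons` row, where "Herr Christ" is credited as a match for "Christ-Wackernagel, Paul", so the actual scoring must match names more leniently than this sketch does.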
Scoring the collection¶
With scores for each letter in place, we can calculate the overall performance of an LLM on the collection. We calculate F1-micro and F1-macro:
- F1 is the harmonic mean of precision and recall, where precision is TP / (TP + FP) and recall is TP / (TP + FN).
- F1-micro is the F1 score computed from the TP, FP, and FN counts summed across all categories.
- F1-macro is the unweighted average of the per-category F1 scores.
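As a worked example, the definitions above can be applied to the counts of the first letter. This is a sketch only: it treats undefined precision or recall (a zero denominator) as 0, which is an assumption, and for the full collection the counts would presumably be accumulated over all scored letters before this step.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; returns 0.0 when precision and recall are both zero or undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (TP, FP, FN) per category, taken from the example letter.
counts = {
    "send_date": (1, 0, 0),
    "sender_persons": (0, 1, 2),
    "receiver_persons": (1, 1, 0),
}

# Macro: average the per-category F1 scores.
f1_macro = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro: sum TP, FP, FN over all categories, then compute a single F1.
totals = [sum(column) for column in zip(*counts.values())]
f1_micro = f1(*totals)

print(round(f1_macro, 3), round(f1_micro, 3))  # 0.556 0.5
```

The per-category F1 scores are 1.0, 0.0, and 0.667, which average to 0.556, whereas the pooled counts (TP = 2, FP = 2, FN = 2) give a precision and recall of 0.5 each and therefore an F1-micro of 0.5.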
Rule parameters¶
- `inferred_from_function`: If true, the person is inferred from their function and the date (e.g., a letter from the Basler Personenschifffahrtsgesellschaft in 1925 signed by "der Präsident" was penned by Max Vischer-von Planta).
- `inferred_from_correspondence`: If true, the person is inferred from the correspondence history (e.g., "referring to your letter from last week").
- `skip_signatures`: If true, letters with signatures are not scored.
- `skip_non_signatures`: If true, letters without signatures are not scored.
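One way these parameters could act on the ground truth is sketched below. The names `Rules`, `is_scored`, and `keep_person` and the default values are hypothetical. The reading of the two inference flags (a disabled flag drops persons marked as inferred in that way) is an assumption, not something the description above states.

```python
from dataclasses import dataclass


@dataclass
class Rules:
    # Defaults are illustrative assumptions.
    inferred_from_function: bool = True
    inferred_from_correspondence: bool = True
    skip_signatures: bool = False
    skip_non_signatures: bool = False


def is_scored(has_signatures: bool, rules: Rules) -> bool:
    """Decide whether a letter takes part in scoring under the skip rules."""
    if has_signatures and rules.skip_signatures:
        return False
    if not has_signatures and rules.skip_non_signatures:
        return False
    return True


def keep_person(value: str, rules: Rules) -> bool:
    """Assumed reading: a disabled inference rule drops persons marked as inferred that way."""
    value = value.strip()
    if value.startswith("<<") and value.endswith(">>"):
        return rules.inferred_from_correspondence
    if value.startswith("<") and value.endswith(">"):
        return rules.inferred_from_function
    return True
```

With `Rules(skip_signatures=True)`, for example, `is_scored(True, rules)` returns `False`, so only letters without signatures would be scored.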
To Dos¶
- [ ] Add more fields to the metadata schema, namely `sender_organization` (inscribed and normalized), `receiver_organization` (inscribed and normalized), and fields for entities mentioned (persons, places, organizations, ships; inscribed and normalized).