Metadata Extraction from Historic Rheinschifffahrt Letters

Overview

This benchmark focuses on the extraction of metadata from historic letters. It provides a ground truth for the metadata categories send_date, sender_persons, and receiver_persons for a collection of letters of the Basler Rheinschifffahrt-Aktiengesellschaft written between 1926 and 1932, and reports F1-micro and F1-macro scores across these categories.

Creator

This benchmark was created by the University of Basel's Research and Infrastructure Support RISE (rise@unibas.ch) between 2022 and 2025.

Historic Rheinschifffahrt Letters

This benchmark uses as input the digital collection "Basler Rheinschifffahrt-Aktiengesellschaft, insbesondere über die Veräusserung des Dieselmotorbootes 'Rheinfelden' und die Gewährung eines Darlehens zur Finanzierung der Erstellung des Dieselmotorbootes 'Rhyblitz' an diese Firma" (shelf mark: CH SWA HS 191 V 10; persistent link: http://dx.doi.org/10.7891/e-manuscripta-54917; referred to as "the collection" in what follows) of the Schweizer Wirtschaftsarchiv.

The collection consists of 68 letters of varying length (mostly 1-3 pages). The letters are dated between 1926 and 1932 and are written in German; most are typewritten, some with handwritten annotations. They reflect the correspondence of the Basler Rheinschifffahrt-Aktiengesellschaft and are mostly signed by individuals or companies. The letters cover a variety of topics, including business transactions, shipping schedules, and personnel matters. For this benchmark, a subset of 57 letters has been ground-truthed (see below).

Ground Truth

The ground truth for the collection is created in the RSF Letters Ground Truth Google Sheet. It is then imported and used to benchmark LLMs with respect to information extraction tasks.

Guidelines for creating the ground truth

The ground truth for letters is created by filling out the ground_truth tab of the Google Sheet.

Metadata schema

The ground_truth tab of the Google Sheet uses the following metadata schema:

| Field Name | Description | Data Type |
|---|---|---|
| transkribus_doc_url | URL link to the letter on Transkribus. | string (URL) |
| document_number | The letter's number, between 1 and 68 inclusive. | zero-padded integer |
| done | Indicates whether the creation of the ground truth is completed. | boolean |
| checked_by | Identifier of the person responsible for creating the ground truth. | string |
| send_date | Date when the letter was sent. | ISO 8601 date or None |
| letter_title | Title of the letter as diplomatically inscribed. | string or None |
| sender_persons_inscribed | Sender person(s) as diplomatically inscribed. | string or None |
| sender_persons | Individuals explicitly mentioned as senders in the document. | string or None |
| receiver_persons_inscribed | Receiver person(s) as diplomatically inscribed. | string or None |
| receiver_persons | Individuals associated with receiving the document, inferred or explicitly stated. | string |
| has_signatures | Indicates whether the document contains signatures. | boolean |
| signatures_recognised | Indicates whether all signatures have been mapped to persons as per ground_truth/persons.json. | boolean |
| comment | Additional comments or annotations about the document. | string or None |
| action_required | Indicates what action is required to get the document to done. | string |
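
For orientation, the schema maps naturally onto a typed record. The dataclass below is an illustrative sketch only; the class name and the rendering of each data type as a Python type are our assumptions, not benchmark code:

```python
# Illustrative sketch of a ground-truth record; not part of the benchmark code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LetterGroundTruth:
    transkribus_doc_url: str                    # URL of the letter on Transkribus
    document_number: str                        # zero-padded, "01" .. "68"
    done: bool                                  # ground truth creation completed?
    checked_by: str                             # person responsible for the ground truth
    send_date: Optional[str]                    # ISO 8601 date, e.g. "1926-02-16"
    letter_title: Optional[str]
    sender_persons_inscribed: Optional[str]
    sender_persons: Optional[str]               # pipe-separated normalized names
    receiver_persons_inscribed: Optional[str]
    receiver_persons: str                       # pipe-separated normalized names
    has_signatures: bool
    signatures_recognised: bool
    comment: Optional[str]
    action_required: str
```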

Persons

Persons are recorded in the persons tab of the Google Sheet. The metadata schema and workflow for persons is described in ground_truth_persons_organizations.md.

For sender_persons_inscribed, sender_persons, receiver_persons_inscribed, and receiver_persons:

  • Pipe | is used to separate multiple values: Mustermann, Hans | Musterfrau, Maria.
  • Angle brackets indicate persons inferred from function and date: <Mustermann, Hans>
  • Double angle brackets indicate persons inferred from the correspondence history: <<Musterfrau, Maria>>
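
A minimal parsing sketch for these conventions; the function name and return shape are our own, not part of the benchmark code:

```python
# Sketch: split a persons field into values and their inference markers.
def parse_persons(field: str) -> list[tuple[str, str]]:
    """Return (name, source) pairs; source is one of 'explicit',
    'function_and_date', or 'correspondence_history'."""
    result = []
    for raw in field.split("|"):
        value = raw.strip()
        # Check double angle brackets before single ones.
        if value.startswith("<<") and value.endswith(">>"):
            result.append((value[2:-2].strip(), "correspondence_history"))
        elif value.startswith("<") and value.endswith(">"):
            result.append((value[1:-1].strip(), "function_and_date"))
        elif value:
            result.append((value, "explicit"))
    return result

# parse_persons("Mustermann, Hans | <<Musterfrau, Maria>>")
# -> [("Mustermann, Hans", "explicit"),
#     ("Musterfrau, Maria", "correspondence_history")]
```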

The sender_persons and receiver_persons fields use the names of persons as recorded in the normalized persons tab. Be sure to add inscribed variants to their respective alternateName fields.

Importing the ground truth from Google Sheets

Letters that are done are exported from the ground_truth_export tab of the Google Sheet as a CSV file and saved to ground_truth/letters.csv.
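
A minimal loading sketch; the path matches ground_truth/letters.csv above, while the use of csv.DictReader (rather than the benchmark's actual import code) is an assumption:

```python
# Sketch: read the exported ground truth into a list of dicts keyed by the
# column names of the metadata schema.
import csv

with open("ground_truth/letters.csv", newline="", encoding="utf-8") as f:
    letters = list(csv.DictReader(f))

print(f"Loaded {len(letters)} ground-truthed letters")  # expected: 57 per the collection description
```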

Scoring

We score the metadata extraction of letters by comparing the predictions of LLMs against the ground truth. Intuitively, we want to tell whether the extracted metadata is both correct and complete.

Scoring a letter

We score each letter on the metadata categories send_date, sender_persons, and receiver_persons by counting true positives (TP), false positives (FP), and false negatives (FN) against the ground truth.
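
A sketch of this per-category counting, assuming both ground truth and prediction are pipe-separated value strings compared as sets after exact matching; the real benchmark may match inscribed variants more leniently (e.g., "Herr Christ" counting for "Christ-Wackernagel, Paul", as in the example below):

```python
# Sketch: count TP/FP/FN for one metadata category of one letter.
from typing import Optional

def score_category(ground_truth: Optional[str],
                   prediction: Optional[str]) -> tuple[int, int, int]:
    """Return (TP, FP, FN) for one category, comparing values as sets."""
    gt = {v.strip() for v in (ground_truth or "").split("|") if v.strip()}
    pred = {v.strip() for v in (prediction or "").split("|") if v.strip()}
    return len(gt & pred), len(pred - gt), len(gt - pred)

# sender_persons of letter 1, as scored in the example below:
# score_category("Groschupf-Jaeger, Louis | Ritter-Dreier, Fritz",
#                "Basler Rheinschiffahrt-Aktiengesellschaft")  # -> (0, 1, 2)
```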

Example

Consider the first letter as an example. The letter is composed of three pages:

[Images: letter page 1, letter page 2, letter page 3]

The scoring of the letter is as follows:

| Metric | Ground Truth | Prediction | TP | FP | FN |
|---|---|---|---|---|---|
| send_date | 1926-02-16 | 1926-02-16 | 1 | 0 | 0 |
| sender_persons | Groschupf-Jaeger, Louis \| Ritter-Dreier, Fritz | Basler Rheinschiffahrt-Aktiengesellschaft | 0 | 1 | 2 |
| receiver_persons | Christ-Wackernagel, Paul | Herr Christ \| i/Fa. Paravicini, Christ & Co. | 1 | 1 | 0 |
  • send_date: The prediction matches the ground truth (1 TP).
  • sender_persons: The prediction is incorrect (1 FP), as "Basler Rheinschiffahrt-Aktiengesellschaft" is not a sender person, and the two actual sender persons, "Groschupf-Jaeger, Louis" and "Ritter-Dreier, Fritz", are missing (2 FN).
  • receiver_persons: The prediction is partly correct, as "Herr Christ" matches the receiver person "Christ-Wackernagel, Paul" (1 TP), and partly incorrect, as "i/Fa. Paravicini, Christ & Co." is not a receiver person (1 FP).

Scoring the collection

With scores for each letter in place, we can calculate the overall performance of an LLM on the collection. We calculate F1-micro and F1-macro:

  • F1 is the harmonic mean of precision and recall, where precision is TP / (TP + FP) and recall is TP / (TP + FN).
  • F1-micro pools the TP, FP, and FN counts across all letters and categories and computes a single F1 from the pooled counts.
  • F1-macro is the unweighted average of the per-category F1 scores.
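
A sketch of the two aggregations, assuming per-category (TP, FP, FN) totals have already been summed over all letters; the helper names are illustrative:

```python
# Sketch: F1-micro and F1-macro from per-category (TP, FP, FN) totals.
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def f1_micro(counts: dict[str, tuple[int, int, int]]) -> float:
    # Pool the counts across categories, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return f1(tp, fp, fn)

def f1_macro(counts: dict[str, tuple[int, int, int]]) -> float:
    # Compute F1 per category, then take the unweighted mean.
    return sum(f1(*c) for c in counts.values()) / len(counts)

# Example with the three benchmark categories (counts are made up):
# counts = {"send_date": (50, 3, 7), "sender_persons": (40, 10, 15),
#           "receiver_persons": (45, 8, 12)}
# f1_micro(counts), f1_macro(counts)
```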

Rule parameters

  • inferred_from_function: If true, the person is inferred from their function and the date (e.g., a letter from the Basler Personenschifffahrtsgesellschaft in 1925 signed by "der Präsident" was penned by Max Vischer-von Planta).
  • inferred_from_correspondence: If true, the person is inferred from the correspondence history (e.g., "referring to your letter from last week").
  • skip_signatures: If true, then letters with signatures are not scored.
  • skip_non_signatures: If true, then letters without signatures are not scored.
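
A sketch of how the two skip flags might gate scoring; the function is hypothetical, and the 'TRUE' encoding of the has_signatures field assumes the CSV export described above:

```python
# Sketch: decide whether a letter is scored under the skip flags.
def should_score(letter: dict, skip_signatures: bool = False,
                 skip_non_signatures: bool = False) -> bool:
    has_sig = letter.get("has_signatures") == "TRUE"  # assumed CSV encoding
    if skip_signatures and has_sig:
        return False
    if skip_non_signatures and not has_sig:
        return False
    return True
```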

To Dos

  • [ ] Add more fields to the metadata schema: sender_organization (inscribed and normalized), receiver_organization (inscribed and normalized), and fields for mentioned entities (persons, places, organizations, ships; inscribed and normalized).

Test Results