Asking Questions: finding answers in long oral-history recordings

Long oral-history recordings are rich historical sources, but their length and structure make them difficult to search and reuse. On 13 July 2023, CLARIN published the impact story State-of-the-Art Speech Recognition for Understanding Oral Histories, presenting the Semantic Search / Asking Questions framework developed around speech recognition, generated questions and semantic search. The practical goal is to help researchers and visitors ask a question, find the relevant passage and play the answer directly from the original testimony.

Long interviews are valuable precisely because they preserve detail, hesitation and context. The same qualities also make them hard to search, cite and reuse. The CLARIN impact story State-of-the-Art Speech Recognition for Understanding Oral Histories presents one of the ways I have worked on this problem: speech recognition combined with generated questions and semantic search.

CLARIN published the story on 13 July 2023. It describes Semantic Search / Asking Questions as a framework for navigating long oral-history recordings through pre-generated, timestamped questions. Instead of forcing users to guess the exact word they need, the interface helps them move through the testimony by meaning, context and the structure of the interview.
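As a rough illustration of what pre-generated, timestamped questions could look like as data, here is a minimal Python sketch. The class name, field names, helper functions and sample entries are all hypothetical and invented for this illustration; they are not the framework's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedQuestion:
    """A generated question tied to a span of the recording (hypothetical schema)."""
    text: str
    start_s: float  # where the answering passage begins, in seconds
    end_s: float    # where the answering passage ends, in seconds


def questions_near(questions, position_s, window_s=60.0):
    """Return questions whose answer span overlaps a window around a playback position."""
    lo, hi = position_s - window_s, position_s + window_s
    hits = [q for q in questions if q.start_s <= hi and q.end_s >= lo]
    return sorted(hits, key=lambda q: q.start_s)


def format_timestamp(seconds):
    """Render seconds as H:MM:SS for display next to a question."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Invented sample entries, not taken from any real testimony.
SAMPLE = [
    TimedQuestion("Where did the family live before the war?", 125.0, 190.5),
    TimedQuestion("How did the speaker travel to the city?", 3720.0, 3805.0),
    TimedQuestion("Who helped the speaker after arrival?", 3790.0, 3900.0),
]

if __name__ == "__main__":
    # List the questions a user could jump to around the one-hour mark.
    for q in questions_near(SAMPLE, position_s=3800.0):
        print(format_timestamp(q.start_s), q.text)
```

The point of the sketch is only that each question carries its own time span, so selecting a question is enough to seek the player to the original passage.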

We worked on the application at the Department of Cybernetics, University of West Bohemia in Pilsen: me, Martin Bulín, Pavel Ircing, Adam Frémund and Filip Polák. We used recordings available through the Malach Center for Visual History at Charles University and focused on a very practical question: how to get from a long testimony to the passage a person is actually looking for.

For me, the central design constraint was respect for the recordings. Testimonies are sensitive material, and any technical layer around them should help people navigate the original speech without flattening its meaning. Generated questions are useful here because they act as signposts: they guide the user to a passage, while the answer remains anchored in the original audio and video.

A domain-specific speech recogniser made sense for this material. Holocaust testimonies involve elderly speakers, non-native speech, emotion, historical vocabulary and registers that differ from generic web data. Under such conditions, it matters to measure the system on the target material and to keep the processing under control.

The later semantic-search interface added another useful step: users can ask a question and receive passages whose meaning is close to the query. The CLARIN page links both the public demo and code examples, so the story also points to material that can be inspected and tried.
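To make "passages whose meaning is close to the query" a little more concrete, here is a minimal Python sketch of one common approach: ranking passages by cosine similarity between embedding vectors. This is an assumption about how such a search can work in general, not a description of the deployed system. The function names, the three-dimensional vectors and the passage labels are made up by hand; a real system would obtain the vectors from a sentence-embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passages, top_k=2):
    """Return the top_k (score, start_s, text) tuples, best match first."""
    scored = [(cosine(query_vec, vec), start_s, text)
              for start_s, text, vec in passages]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy vectors invented for illustration: (start time in seconds, label, embedding).
PASSAGES = [
    (125.0, "passage about home and family", [0.9, 0.1, 0.0]),
    (3720.0, "passage about a journey", [0.1, 0.9, 0.2]),
    (5400.0, "passage about work", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    query_vec = [0.2, 0.8, 0.1]  # stands in for the embedded user question
    for score, start_s, text in rank_passages(query_vec, PASSAGES):
        print(f"{score:.2f}  {start_s:>7.1f}s  {text}")
```

Because every passage keeps its start time, the best-scoring result can be played directly from the original recording rather than shown only as text.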

The practical goal is simple: ask a question, get to the right place in a long recording, and play the answer directly from the original testimony.
