Seminar in Computational Linguistics

  • Date: –14.30
  • Location:
  • Speaker: Jussi Karlgren
  • Contact person: Artur Kulmizev
  • Lecture

The TREC Podcasts Track - One New Dataset, Two New Shared Tasks,
and Many New Interesting Questions

Podcasts are spoken documents across a wide range of genres and styles, 
with skyrocketing listenership across the world. To promote research on 
podcast material, which differs from other kinds of language data in 
interesting ways, Spotify has released a data set of 100 000 
English-language podcast episodes with full audio, full automatic 
transcriptions, and metadata. Transcribed podcast material differs from 
other collections of English-language data in several respects. The talk 
will describe some of the differences observed between this collection 
and other collections, and discuss how further differences might be 
studied.

The data set was used at the TREC Podcasts Track, organised by the US 
National Institute of Standards and Technology (NIST), with two shared tasks: 
segment retrieval and summarisation. This talk will describe the data 
set, give a brief overview of the 2020 Podcasts Track, and describe how 
the tasks will develop for this year's track.