Task description

Speech summarization has long been of great interest to the community because speech is the principal modality of human communication, and speech transcripts are not as easy to skim, search or browse as textual messages. Speech recorded in call centers offers a great opportunity to study goal-oriented, focused conversations between an agent and a caller. The Call Centre Conversation Summarization (CCCS) task consists of automatically generating summaries of spoken conversations in the form of textual synopses that inform a reader about the content of a conversation and might be used for browsing a large database of recordings. In contrast to news summarization, where extractive approaches have been very successful, the CCCS task aims to foster work on abstractive summarization, which depicts what happened in a conversation rather than what people actually said.

The MultiLing'15 CCCS track leverages conversations from the DECODA and LUNA corpora of French and Italian call center recordings. For both corpora, transcripts are available in the original language as well as in English translation (both manual and automatic). Recordings range in duration from a few minutes to 15 minutes and involve two or sometimes more speakers. Drawn from the public transportation and help desk domains, the dialogs offer a rich range of situations (with emotions such as anger or frustration) while staying within a coherent domain.

Given transcripts, participants in the task shall generate abstractive summaries that inform a reader about the main events of each conversation, such as the objective of the caller, whether and how it was solved by the agent, and the attitude of both parties. Evaluation will be performed by comparing submissions to reference synopses written by experts. Both the conversations and the reference summaries are kindly provided by the SENSEI project.
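The task description does not name the metric used to compare submissions with the reference synopses. As a rough illustration only, the sketch below assumes a ROUGE-style n-gram recall computed on whitespace-separated tokens. The function names `ngrams` and `ngram_recall` are hypothetical and are not part of any official evaluation tooling.

```python
from collections import Counter


def ngrams(text, n=2):
    """Return a multiset of lowercased word n-grams for a summary string."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, references, n=2):
    """Average clipped n-gram recall of a candidate against each reference.

    Assumption: this is a simplified stand-in for a ROUGE-like score,
    not the metric actually used by the task organizers.
    """
    cand_counts = ngrams(candidate, n)
    scores = []
    for ref in references:
        ref_counts = ngrams(ref, n)
        total = sum(ref_counts.values())
        if total == 0:
            continue  # skip references too short to contain any n-gram
        # Count each reference n-gram at most as often as the candidate has it.
        overlap = sum(min(count, cand_counts[gram])
                      for gram, count in ref_counts.items())
        scores.append(overlap / total)
    return sum(scores) / len(scores) if scores else 0.0
```

A score close to 1.0 means that most word sequences in the expert synopses also appear in the submitted summary. Because abstractive summaries may legitimately paraphrase, a surface-overlap measure of this kind gives only a partial view of quality.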

Data

Training data: can be downloaded from http://multiling.iit.demokritos.gr/

Test data: will be available on April 10th

Submission format: TBD

Submission form: TBD

Dates

  • Training data available: February 1st
  • Test data available: April 10th
  • Submission of system output: April 24th (note that this is different from the other MultiLing tasks)
  • Paper submission: May 1st