Task description

Speech summarization has been of great interest to the community because speech is the principal modality of human communication, and speech transcripts are not as easy to skim, search or browse as textual messages. Speech recorded from call centers offers a great opportunity to study goal-oriented and focused conversations between an agent and a caller. The Call Centre Conversation Summarization (CCCS) task consists of automatically generating summaries of spoken conversations in the form of textual synopses that inform a reader about the content of a conversation and might be used for browsing a large database of recordings. In contrast to news summarization, where extractive approaches have been very successful, the objective of the CCCS task is to foster work on abstractive summarization in order to depict what happened in a conversation instead of what people actually said.

The MultiLing'15 CCCS track leverages conversations from the DECODA and LUNA corpora of French and Italian call center recordings, both with transcripts available in their original language as well as in English translation (both manual and automatic). Recording durations range from a few minutes to 15 minutes, involving two or sometimes more speakers. Drawn from the public transportation and help desk domains, the dialogs offer a rich range of situations (with emotions such as anger or frustration) while staying in a coherent domain.

Given transcripts, participants in the task shall generate abstractive summaries informing a reader about the main events of the conversations, such as the objective of the caller, whether and how it was solved by the agent, and the attitude of both parties. Evaluation will be performed by comparing submissions to reference synopses written by experts. Both conversations and reference summaries are kindly provided by the SENSEI project.

To participate in this task, please contact the MultiLing organisers.

Participants have to submit one synopsis for each conversation, with a length limit of 7% of the conversation in terms of words. They can participate in any of the English, Italian and French tracks and submit up to 3 runs per track. Participants have to write a paper describing their system and submit it to the SIGDIAL special session.
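As an illustration of the length limit, the following minimal sketch computes a word budget for a synopsis and checks a candidate against it. It rests on assumptions that are not part of the task definition: words are counted by whitespace splitting, the 7% is taken relative to the number of words in the conversation transcript, and the budget is rounded down. The function names are hypothetical; the official conformance script remains the authority.

```python
LIMIT_PERCENT = 7  # the 7% length limit stated in the task description


def word_budget(transcript: str, percent: int = LIMIT_PERCENT) -> int:
    """Return the maximum synopsis length in words.

    Assumptions: words are whitespace-separated tokens and the
    budget is rounded down (integer arithmetic avoids float error).
    """
    return len(transcript.split()) * percent // 100


def within_limit(synopsis: str, transcript: str, percent: int = LIMIT_PERCENT) -> bool:
    """Check whether a synopsis fits the word budget of its transcript."""
    return len(synopsis.split()) <= word_budget(transcript, percent)
```

Under these assumptions, a 200-word transcript would allow a synopsis of at most 14 words.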

Data

  • Training data: can be downloaded from here
  • Test data: can be downloaded from here (since April 10)
  • Reference data: will be added after submission deadline
  • Submission format: guidelines, script for checking conformance
  • Submission form: get password from organizers and upload files here (before April 24th)

Results

Dates

  • Training data available: Feb 1st
  • Test data available: Apr 10th
  • Submission of system output: May 1st, extended from April 24th (note that this is different from the other MultiLing tasks)
  • Paper submission: May 7th, using the SIGDIAL submission procedure