MultiLing 2015: Multilingual Summarization of Multiple Documents, Online Fora and Call Centre Conversations


From Caesar’s 'Veni, Vidi, Vici' to 'What might be in a summary?' (Karen Spärck Jones, 1993), summarization techniques have been key to successfully grasping the main points of large amounts of information, and much research has been devoted to improving such techniques. In the past two decades, the progress of summarization research has been supported by evaluation exercises and shared tasks such as DUC, TAC and, more recently, MultiLing (2011, 2013). MultiLing is a community-driven initiative for benchmarking multilingual summarization systems, nurturing further research, and pushing the state of the art in the area. The aim of MultiLing 2015 is to continue this evolution and, in addition, to introduce new tasks promoting research on summarizing free human interaction in online fora and customer call centres. With this call we wish to invite the summarization research community to participate in MultiLing 2015.

The Tasks

MultiLing 2015 will feature the Multilingual Multi-document Summarization task familiar from previous editions, along with its predecessor, Multilingual Single-document Summarization. In addition, we will pilot two new tracks, Online Forum Summarization (OnForumS) and Call Centre Conversation Summarization (CCCS), in collaboration with the SENSEI EU project. We describe each task in turn below.

Multilingual Multi-document Summarization (MMS)

The multilingual multi-document summarization track aims to evaluate the application of (partially or fully) language-independent summarization algorithms on a variety of languages. Each system participating in the track will be called upon to provide summaries for a range of different languages, based on a news corpus. Participating systems will be required to apply their methods to a minimum of two languages. Evaluation will favor systems that apply their methods to more languages.

The corpus used in the multilingual multi-document summarization track will be based on WikiNews texts. Source texts will be UTF-8, clean texts (without any mark-up, images, etc.).

The task requires systems to generate a single, fluent, representative summary from a set of documents describing an event sequence. The document sets will be drawn from a given range of languages, and all documents in a set share the same language. The output summary should be in the same language as its source documents and should be 250 words at most.

Multilingual Single-document Summarization (MSS)

Following the pilot task of 2013, the multilingual single-document summarization task will be to generate a single-document summary for each of the given Wikipedia featured articles, drawn from about 40 provided languages. The provided training data will be the Single-Document Summarization Pilot Task data from MultiLing 2013. A new set of data will be generated based on additional Wikipedia featured articles. For each language, 30 documents are given. The documents will be UTF-8, without mark-up or images. For each document of the training set, a human-generated summary is provided. For MultiLing 2015, the character length of the human summary for each document will also be provided, called the target length. Each machine summary should be as close to the target length as possible. For the purpose of evaluation, all machine summaries longer than the target length will be truncated to the target length. The summaries will be evaluated via automatic methods, and participants will be required to perform a limited number of manual summary evaluations.
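The character-based truncation rule described above can be sketched as follows. This is only an illustration; the official evaluation scripts are not part of this call, and the function name is our own:

```python
def truncate_to_target(summary: str, target_length: int) -> str:
    """Truncate a machine summary to the target character length.

    Summaries at or under the target length are left untouched;
    longer ones are cut at exactly `target_length` characters,
    mirroring the evaluation rule stated in the task description.
    """
    if len(summary) <= target_length:
        return summary
    return summary[:target_length]
```

In practice this means any characters beyond the target length simply do not count toward evaluation, so systems gain nothing by exceeding it.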

The manual evaluation will consist of pairwise comparisons of machine-generated summaries. Each evaluator will be presented with the human-generated summary and two machine-generated summaries. The evaluation task is to read the human summary and then judge whether one machine-generated summary is significantly closer to the human-generated summary in information content (e.g. system A > system B or system B > system A), or whether the two machine-generated summaries contain comparable quantities of the information in the human-generated summary.

Online Forum Summarization (OnForumS)

Most major online news publishers, such as The Guardian or Le Monde, publish articles on different topics and encourage reader engagement through the provision of an online comment facility. A given news article can often give rise to thousands of reader comments — some related to specific points within the article, others that are replies to previous comments. The sheer volume of such user-supplied comments suggests the need for automated methods to summarize this content, which in turn poses an exciting and novel challenge for the summarization community.

The purpose of the Online Forum Summarization (OnForumS) track at MultiLing 2015 is to set the ground for investigating how such a mass of comments can be summarized. We posit that a crucial initial step in developing reader comment summarization systems is to determine what comments relate to, whether that be specific points within the text of the article, the global topic of the article, or comments made by other users. This constitutes a linking task. Furthermore, a set of link types or labels may be articulated to capture whether, for example, a comment agrees with, elaborates, or disagrees with the point made in the commented-upon text. Solving this labelled linking problem should facilitate the creation of reader comment summaries: comments relating to the same article content can be clustered, the points attracting the most comments can be identified, representative comments can be chosen for each key point, and the implications of labelled links can be digested (e.g., the number of comments for or against a particular point).

The OnForumS task at MultiLing 2015 is a particular specification of the linking task. Systems will take as input a news article with a reduced set of comments (sifted, according to predefined criteria, from what could otherwise be thousands of comments) and will be asked to link and label each comment to sentences in the article (which, for simplification, are assumed to be the appropriate units here), to the article topic as a whole, or to preceding comments. Precise guidelines for when to link, and for the link types, will be released as part of the formal task specification; we anticipate that the condition for linking will require sentences addressing the same assertion, and that link types will include at least agreement, disagreement, and sentiment indicators. The data will cover at least three languages (English, Italian, and French); a small set of link-labelled articles will be provided by the SENSEI project for each of these languages for illustration and for development. Additional languages may be covered if the data for them are provided by the participants in the task. These data could be either translations of the data for other languages, or comparable articles on the same topics.

Evaluation will be based on the results of a crowd-sourcing exercise, in which crowd workers are asked to judge whether potential links, and associated labels, are correct for each given test article plus associated comments.

Call Centre Conversation Summarization (CCCS)

Speech summarization has been of great interest to the community because speech is the principal modality of human communication, and speech transcripts are not as easy to skim, search, or browse as text. Speech recorded from call centres offers a great opportunity to study goal-oriented, focused conversations between an agent and a caller. The Call Centre Conversation Summarization (CCCS) task consists of automatically generating summaries of spoken conversations in the form of textual synopses that inform on the content of a conversation and might be used for browsing a large database of recordings. Compared to news summarization, where extractive approaches have been very successful, the CCCS task's objective is to foster work on abstractive summarization, depicting what happened in a conversation rather than what people actually said.

The MultiLing'15 CCCS track leverages conversations from the DECODA and LUNA corpora of French and Italian call centre recordings, both with transcripts available in their original language as well as in English translation (both manual and automatic). Recording durations range from a few minutes to fifteen minutes, involving two or sometimes more speakers. Set in the public transportation and help-desk domains, the dialogs offer a rich range of situations (with emotions such as anger or frustration) while remaining within a coherent domain.

Given the transcripts, participants in the task shall generate abstractive summaries informing a reader about the main events of the conversations, such as the objective of the caller, whether and how it was addressed by the agent, and the attitude of both parties. Evaluation will be performed by comparing submissions to reference synopses written by experts. Both conversations and reference summaries are kindly provided by the SENSEI project.

How can I participate?

For now you only need to fill in your contact details in the following form:

Make sure you also visit the MultiLing community website:


Important Dates (finalization pending)

(PLEASE PROVIDE FEEDBACK on the submission dates, if you plan to participate, by e-mailing: ggianna AT iit DOT demokritos DOT gr.)

  • Training data ready: (date to be finalized per task) Dec 12th, 2014
  • Test data available: Feb 15th, 2015
  • System submissions due: Feb 28th, 2015
  • Evaluation starts: Mar 1st, 2015
  • Evaluation ends: Mar 31st, 2015
  • Paper submission due: May 1st, 2015
  • Paper reviews due: May 15th, 2015
  • Camera-ready due: Jun 15th, 2015
  • Workshop: 1st week of Sep, 2015

NOTE: Individual task dates may differ. Please check the MultiLing
website for more information.


(Finalization pending) Co-located with SIGDIAL, Prague, Czech Republic

Program Committee Members

(Full list of PC members pending) The Program Committee members are:

  • George Giannakopoulos - NCSR Demokritos (overall chair, MMS Task chair)
  • Jeff Kubina, John Conroy - IDA Center for Computing Sciences (MSS Task chairs)
  • Mijail Kabadjov - University of Essex (OnForumS Task co-chair)
  • Josef Steinberger - University of West Bohemia, Czech Republic (OnForumS Task co-chair)
  • Benoit Favre - University of Marseille (CCCS Task co-chair)
  • Udo Kruschwitz and Massimo Poesio - University of Essex
  • Emma Barker, Rob Gaizauskas and Mark Hepple - University of Sheffield
  • Vangelis Karkaletsis - NCSR Demokritos
  • Fabio Celli - University of Trento
  • Lucy Vanderwende - Microsoft Research
  • Karolina Owczarzak - Oracle

Data Contributors (from MultiLing 2013)

  • Georgios Petasis, George Giannakopoulos - NCSR "Demokritos", Greece
  • Josef Steinberger - University of West Bohemia, Czech Republic
  • Mahmoud El-Haj - Lancaster University, UK
  • Ahmad Alharthi - King Saud University, Saudi Arabia
  • Maha Althobaiti - Essex University, UK
  • Corina Forascu - Romanian Academy Research Institute for Artificial Intelligence (RACAI), and Alexandru Ioan Cuza University of Iasi (UAIC), Romania
  • Jeff Kubina, John Conroy, Judith Schlesinger - IDA/Center for Computing Sciences, USA
  • Lei Li - Beijing University of Posts and Telecommunications (BUPT), China
  • Marina Litvak - Sami Shamoon College of Engineering, Israel
  • Sabino Miranda - Center for Computing Research, Instituto Politécnico Nacional, Mexico