From Caesar’s 'Veni, Vidi, Vici' to 'What might be in a summary?' (Karen Spärck Jones, 1993), summarization has been key to successfully grasping the main points of large amounts of information, and much research has been devoted to improving summarization techniques. In the past two decades, the progress of summarization research has been supported by evaluation exercises and shared tasks such as DUC, TAC and, more recently, MultiLing (2011, 2013). MultiLing is a community-driven initiative for benchmarking multilingual summarization systems, nurturing further research, and pushing the state of the art in the area. The aim of MultiLing 2015 is to continue this evolution and, in addition, to introduce new tasks promoting research on summarizing free human interaction in online fora and customer call centres. With this call we invite the summarization research community to participate in MultiLing 2015.
MultiLing 2015 will feature the Multilingual Multi-document Summarization task familiar from previous editions and its predecessor, the Multilingual Single-document Summarization task. In addition, we will pilot two new tracks, Online Forum Summarization (OnForumS) and Call Centre Conversation Summarization (CCCS), in collaboration with the SENSEI EU project (http://www.sensei-conversation.eu). We describe each task in turn below.
The Multilingual Multi-document Summarization track aims to evaluate the application of (partially or fully) language-independent summarization algorithms across a variety of languages. Each participating system will be called upon to provide summaries for a range of languages, based on a news corpus, and will be required to apply its method to a minimum of two languages. Evaluation will favor systems that apply their methods to more languages.
The corpus used in the Multilingual Multi-document Summarization track will be based on WikiNews texts (http://www.wikinews.org/). Source texts will be clean UTF-8 (without any mark-up, images, etc.).
The task requires systems to generate a single, fluent, representative summary from a set of documents describing an event sequence. Each document set will be in one of a given range of languages, with all documents in a set sharing the same language, and the output summary must be in the same language as its source documents. The output summary should be at most 250 words.
Following the pilot task of 2013, the Multilingual Single-document Summarization task will be to generate a single-document summary for each of the given Wikipedia featured articles, in one of the roughly 40 languages provided. The training data will be the Single-Document Summarization Pilot Task data from MultiLing 2013; a new test set will be generated from additional Wikipedia featured articles. For each language, 30 documents are given, as UTF-8 text without mark-up or images. For each document in the training set, a human-generated summary is provided. For MultiLing 2015, the character length of the human summary for each document, called the target length, will also be provided. Each machine summary should be as close to the target length as possible; for the purpose of evaluation, machine summaries longer than the target length will be truncated to it. The summaries will be evaluated via automatic methods, and participants will be required to perform some limited summarization evaluations.
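As an illustration, the truncation rule can be sketched as follows (a minimal sketch only; the function name is ours, and we assume a plain character cut-off, whereas the official evaluation scripts may handle whitespace or encoding differently):

```python
def truncate_to_target(summary: str, target_length: int) -> str:
    """Cut a machine summary down to the target character length.

    Summaries already within the target length are returned unchanged;
    longer ones are truncated to exactly target_length characters.
    """
    if len(summary) <= target_length:
        return summary
    return summary[:target_length]


# Example: a 250-character target applied to a longer summary.
clipped = truncate_to_target("x" * 300, 250)
```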
The manual evaluation will consist of pairwise comparisons of machine-generated summaries. Each evaluator will be presented with the human-generated summary and two machine-generated summaries. The evaluation task is to read the human summary and then judge whether one machine-generated summary is significantly closer to the information content of the human-generated summary (i.e. system A > system B or system B > system A), or whether the two machine-generated summaries contain comparable quantities of that information.
Most major on-line news publishers, such as The Guardian or Le Monde, publish articles on different topics and encourage reader engagement through the provision of an on-line comment facility. A given news article can often give rise to thousands of reader comments — some related to specific points within the article, others that are replies to previous comments. The great volume of such user-supplied comments suggests the need for automated methods to summarize this content, which in turn poses an exciting and novel challenge for the summarization community.
The purpose of the Online Forum Summarization (OnForumS) track at MultiLing 2015 is to set the ground for investigating how such a mass of comments can be summarized. We posit that a crucial initial step in developing reader comment summarization systems is to determine what comments relate to, be that specific points within the text of the article, the global topic of the article, or comments made by other users. This constitutes a linking task. Furthermore, a set of link types or labels may be articulated to capture whether, for example, a comment agrees with, elaborates on, or disagrees with the point made in the commented-upon text. Solving this labelled linking problem should facilitate the creation of reader comment summaries by allowing, for example, comments relating to the same article content to be clustered, the points attracting the most comment to be identified, representative comments to be chosen for each key point, and the implications of labelled links to be digested (e.g. numbers for or against a particular point).
The OnForumS task at MultiLing 2015 is a particular specification of the linking task: systems will take as input a news article with a reduced set of comments (sifted, according to predefined criteria, from what could otherwise be thousands of comments) and must link and label each comment to sentences in the article (which, for simplification, are assumed to be the appropriate units here), to the article topic as a whole, or to preceding comments. Precise guidelines for when to link and for the link types will be released as part of the formal task specification, but we anticipate that the condition for linking will require sentences addressing the same assertion, and that link types will include at least agreement, disagreement, and sentiment indicators. The data will cover at least three languages (English, Italian, and French); a small set of link-labelled articles will be provided by the SENSEI project for each of these languages for illustration and development. Additional languages may be covered if data for them are provided by task participants; such data could be either translations of the data for the other languages or comparable articles on the same topics.
Evaluation will be based on the results of a crowd-sourcing exercise, in which crowd workers are asked to judge whether potential links, and associated labels, are correct for each given test article plus associated comments.
Speech summarization has been of great interest to the community because speech is the principal modality of human communication, and speech transcripts are not as easy to skim, search, or browse as textual messages. Speech recorded from call centres offers a great opportunity to study goal-oriented, focused conversations between an agent and a caller. The Call Centre Conversation Summarization (CCCS) task consists of automatically generating summaries of spoken conversations in the form of textual synopses that inform on the content of a conversation and might be used for browsing a large database of recordings. Compared to news summarization, where extractive approaches have been very successful, the CCCS task's objective is to foster work on abstractive summarization, depicting what happened in a conversation rather than what people actually said.
The MultiLing 2015 CCCS track leverages conversations from the DECODA and LUNA corpora of French and Italian call centre recordings, with transcripts available in the original languages as well as in English translation (both manual and automatic). Recording durations range from a few minutes to 15 minutes, involving two or sometimes more speakers. Drawn from the public transportation and help desk domains, the dialogues offer a rich range of situations (including emotions such as anger or frustration) while remaining within a coherent domain.
Given the transcripts, participants in the task shall generate abstractive summaries informing a reader about the main events of a conversation, such as the caller's objective, whether and how the agent resolved it, and the attitude of both parties. Evaluation will be performed by comparing submissions to reference synopses written by experts. Both the conversations and the reference summaries are kindly provided by the SENSEI project.
For now you only need to fill in your contact details in the following form: http://go.scify.gr/multiling2015participation
Make sure you also visit the MultiLing community website: http://multiling.iit.demokritos.gr/
Finalization pending.
(PLEASE PROVIDE FEEDBACK on the submission dates, if you plan to participate, by e-mailing: ggianna AT iit DOT demokritos DOT gr.)
NOTE: Individual task dates may differ. Please check the MultiLing
website (http://multiling.iit.demokritos.gr) for more information.
(Finalization pending) Co-located with SIGDIAL, Prague, Czech Republic
(Full list of PC members pending) The Program Committee members are: