MultiLing Community Site: Task: MMS - Multi-document Summarization


Task: MMS - Multi-document Summarization - Data and information

Last updated 4152 days ago by George Giannakopoulos (Admin)

Categories: corpora

Test data

ATTENTION: Before you can download the data, please contact ggianna AT iit DOT demokritos DOT gr to get your team's username and password.

The test data can be downloaded from this link.

Note that 3 of the test data topics (M001, M002, M003) will not be taken into account during evaluation, since they have been used as training data (see below).

Training data - ** NEW **

The training data can be downloaded from this link, using the participant username and password provided via e-mail.

Task overview

This MultiLing task aims to evaluate the application of (partially or fully) language-independent summarization algorithms to a variety of languages. Each participating system will be asked to provide summaries for a range of different languages, based on corresponding corpora. The MultiLing Pilot of 2011 used 7 languages, while MultiLing 2015 will use 8. Participating systems will be required to apply their methods to at least two languages. Evaluation will favour systems that apply their methods to more languages.

The MultiLing task requires generating a single, fluent, representative summary from a set of documents describing an event sequence. The language of the document set will be within a given range of languages, and all documents in a set share the same language. The output summary should be in the same language as its source documents. It should be at most 250 words (for non-Chinese languages) or at most 750 bytes (for Chinese, in UTF-8 encoding).
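The length limits above can be checked before submission. The following is a minimal sketch, not part of the official task tooling: the function names are hypothetical, and counting words by splitting on whitespace is an assumption, since this page does not define how words are counted.

```python
MAX_WORDS = 250          # limit for non-Chinese languages
MAX_BYTES_CHINESE = 750  # limit for Chinese, measured on the UTF-8 encoding


def within_limit(summary: str, is_chinese: bool) -> bool:
    """Return True if the summary respects the stated length limit."""
    if is_chinese:
        return len(summary.encode("utf-8")) <= MAX_BYTES_CHINESE
    # Assumption: a "word" is a whitespace-delimited token.
    return len(summary.split()) <= MAX_WORDS


def truncate(summary: str, is_chinese: bool) -> str:
    """Cut a summary down to the limit (hypothetical helper)."""
    if is_chinese:
        data = summary.encode("utf-8")[:MAX_BYTES_CHINESE]
        # errors="ignore" drops a trailing character that was cut mid-sequence.
        return data.decode("utf-8", errors="ignore")
    # Note: joining on single spaces collapses the original whitespace.
    return " ".join(summary.split()[:MAX_WORDS])
```

Truncating at the byte level can split a multi-byte character, which is why the sketch decodes with `errors="ignore"`; a system that cuts at sentence boundaries instead would likely produce more fluent summaries.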

Sample input and output

The input and output data samples are based on the MultiLing 2013 equivalent task.

Sample input files for several languages (UTF-8 encoded, plain text files). Unzip the provided file to see the sample input files.

Sample output files for several languages (UTF-8 encoded, plain text files, 250 words max for non-Chinese languages or 750 bytes max for Chinese language). Unzip the provided file to see the sample output files.

References:

Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., and Varma, V. (2011). TAC2011 MultiLing Pilot Overview.
Giannakopoulos, G. (2013). Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing Workshop. MultiLing 2013, 20.
