Test data

ATTENTION: Before you can download the data, please contact ggianna AT iit DOT demokritos DOT gr to get your team's username and password.

The test data can be downloaded from this link.


Task overview 

This MultiLing task aims to evaluate the application of (partially or fully) language-independent summarization algorithms to a variety of languages. Each system participating in the task will be asked to provide summaries for a range of different languages, based on the corresponding corpora. The MultiLing Pilot of 2011 used 7 languages, while MultiLing 2015 will use 8. Participating systems will be required to apply their methods to at least two languages. Evaluation will favour systems that apply their methods to more languages.

The MultiLing task requires systems to generate a single, fluent, representative summary from a set of documents describing an event sequence. The language of each document set will be one of a given range of languages, and all documents in a set share the same language. The output summary should be in the same language as its source documents. It should be at most 250 words (for non-Chinese languages) or at most 750 bytes (for Chinese, in UTF-8 encoding).
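The length limits above can be checked before submission. The following is a minimal sketch, not an official tool: the function names and the is_chinese flag are hypothetical, and counting words by splitting on whitespace is an assumption, since the task description does not specify how words are counted.

```python
MAX_WORDS = 250   # limit for non-Chinese languages
MAX_BYTES = 750   # limit for Chinese, measured on the UTF-8 encoding


def fits_limit(summary: str, is_chinese: bool) -> bool:
    """Return True if the summary respects the task's length limit."""
    if is_chinese:
        return len(summary.encode("utf-8")) <= MAX_BYTES
    # Assumption: words are whitespace-separated tokens.
    return len(summary.split()) <= MAX_WORDS


def truncate_to_limit(summary: str, is_chinese: bool) -> str:
    """Cut a summary down to the limit (a crude fallback, not a summarizer)."""
    if is_chinese:
        clipped = summary.encode("utf-8")[:MAX_BYTES]
        # A cut may land inside a multi-byte character; errors="ignore"
        # drops the incomplete trailing bytes instead of raising.
        return clipped.decode("utf-8", errors="ignore")
    # Note: re-joining with single spaces collapses the original whitespace.
    return " ".join(summary.split()[:MAX_WORDS])
```

Truncating on bytes rather than characters matters for Chinese text, where a typical character occupies three bytes in UTF-8, so 750 bytes corresponds to roughly 250 characters.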

Sample input and output

The input and output data samples are based on the MultiLing 2013 equivalent task.

Sample input files are provided for several languages (UTF-8 encoded, plain text files). Unzip the provided file to see the sample input files.

Sample output files are provided for several languages (UTF-8 encoded, plain text files; at most 250 words for non-Chinese languages or at most 750 bytes for Chinese). Unzip the provided file to see the sample output files.

References:

  • Giannakopoulos, G., El-Haj, M., Favre, B., Litvak, M., Steinberger, J., and Varma, V. (2011). TAC2011 MultiLing Pilot Overview.
  • Giannakopoulos, G. (2013). Multi-document multilingual summarization and evaluation tracks in ACL 2013 MultiLing Workshop. MultiLing 2013, 20.