(2020-06-22) A training set for the Wikipedia headline (as well as single document summary) data is here.

(2020-06-22) The testing data with the headlines for the Wikipedia data is here.

(2019-12-16) Summary evaluation dataset versions available: v1 and v2
(2019-07-10) Submission process details updated.

This task examines how well automated systems can evaluate summaries written in different languages. It takes as input the summaries generated by automatic systems and by humans in the summarization tasks of MultiLing 2015, as well as in the single-document summarization tasks of 2015 and 2017.
This year we plan to employ variations of system and gold-standard summaries as inputs, to identify challenges beyond informativeness (e.g. sentence ordering, coherence) and to understand how robust systems are with respect to these challenges.
The output should be a grade for each summary. Ideally, the automatic evaluation should correlate maximally with human judgment, so the evaluation will be based on the correlation between estimated grades and human grades.
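
As an illustration of correlation-based scoring, the following minimal sketch compares a list of estimated grades with a list of human grades. The task description does not state which correlation coefficient is used, so Pearson and Spearman here are assumptions chosen for illustration, and the function names and sample grades are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical grades for five summaries (not taken from the task data).
human_grades = [1, 2, 3, 4, 5]
estimated_grades = [0.10, 0.40, 0.35, 0.80, 0.90]

print("Pearson: ", round(pearson(estimated_grades, human_grades), 3))
print("Spearman:", round(spearman(estimated_grades, human_grades), 3))
```

A rank-based coefficient such as Spearman only rewards ordering the summaries the way the human graders did, whereas Pearson also depends on the grades being linearly related, so the two can disagree for the same system.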

Please refer to the call for participation regarding important dates for the task.

The task coordinator is Dr. George Giannakopoulos (ggianna at iit dot demokritos dot gr) and the task mailing list is multiling19se at scify dot org.

Participants should submit results via email to the task coordinator with the subject line:
[multiling19][summary evaluation results][systemid]
where [systemid] is an identifier or description of the participating organization or system.
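
The subject line format can be produced with a small helper such as the one below. This is only a sketch: the function name is hypothetical, and it assumes that the square brackets around the system identifier are kept literally in the subject, which the instructions above do not state explicitly.

```python
def submission_subject(system_id: str) -> str:
    """Build the email subject line for a results submission.

    Assumption: the brackets around the system identifier are literal.
    """
    system_id = system_id.strip()
    if not system_id:
        raise ValueError("system_id must not be empty")
    return f"[multiling19][summary evaluation results][{system_id}]"


# "example-system" is a placeholder identifier, not a real participant.
print(submission_subject("example-system"))
```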

If the results are accompanied by a paper or technical report on the system, it should be submitted through the process described in the call for community task participation.