This task aims to examine how well automated systems can evaluate summaries written in different languages. It takes as input the summaries generated by automatic systems and by humans in the summarization tasks of MultiLing 2015, as well as in the single-document summarization tasks of 2015 and 2017.
This year we plan to use variations of system and gold-standard summaries as inputs, in order to identify challenges beyond informativeness (e.g., sentence ordering and coherence) and to understand how robust evaluation systems are with respect to these challenges.
The output should be a grade for each summary. Ideally, the automatic evaluation should correlate as strongly as possible with human judgment, so the evaluation will be based on measuring the correlation between estimated grades and human grades.
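As a rough illustration of the correlation-based evaluation described above, the sketch below compares a set of automatically estimated grades against human grades using Pearson and Spearman correlation. The grade values and the choice of these two coefficients are illustrative assumptions, not the official evaluation protocol.

```python
# Minimal sketch of correlation-based evaluation (assumed setup, not the
# official protocol): compare estimated summary grades with human grades.
from scipy.stats import pearsonr, spearmanr

# Hypothetical grades: one automatic estimate and one human grade per summary.
estimated_grades = [3.2, 4.1, 2.5, 4.8, 3.9]
human_grades = [3.0, 4.5, 2.0, 5.0, 3.5]

# Correlation between estimated and human grades; higher indicates closer
# agreement with human judgment.
pearson_r, _ = pearsonr(estimated_grades, human_grades)
spearman_r, _ = spearmanr(estimated_grades, human_grades)

print(f"Pearson correlation:  {pearson_r:.3f}")
print(f"Spearman correlation: {spearman_r:.3f}")
```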
Submission procedure, datasets and important dates will be announced once finalized.