The objective of the Headline Generation (HG) task is to explore some of the challenges highlighted by current state-of-the-art approaches to creating informative headlines for news articles: non-descriptive headlines, out-of-domain training data, and generating headlines from long documents that are not well represented by the head heuristic.

We propose to make available a large set of training data for headline generation and to create evaluation conditions that emphasize those challenges. Our data sets will draw from Wikinews as well as Wikipedia. The latter set will leverage data previously released for the MultiLing 2015 and 2017 tasks. For Wikinews, systems will attempt to generate news headlines; for Wikipedia, they will attempt to generate both the title and the main section headings. In each case, systems are given the respective documents with the headlines, or the title and section headings, removed. Both automatic and human evaluation will be performed on the data, which will be drawn from at least 8 languages.
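As an illustration of the Wikipedia setting described above, the following minimal sketch builds one task instance by removing the title and main section headings from a document and keeping them as generation targets. It assumes, purely for illustration, that main section headings appear as MediaWiki-style `== Heading ==` lines; the actual format of the released data may differ, and the names `TaskInstance` and `make_instance` are hypothetical.

```python
import re
from typing import List, NamedTuple


class TaskInstance(NamedTuple):
    """Hypothetical container for one headline-generation example."""
    source_text: str    # document with title and main headings removed
    targets: List[str]  # title first, then the main section headings


# Assumption: main section headings look like "== Heading ==" (exactly two
# equals signs on each side). Deeper headings such as "=== Sub ===" are
# treated as part of the document body.
_MAIN_HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$")


def make_instance(title: str, wikitext: str) -> TaskInstance:
    """Strip main section headings from the text and collect them as targets."""
    targets = [title]
    body_lines = []
    for line in wikitext.splitlines():
        match = _MAIN_HEADING.match(line)
        if match:
            targets.append(match.group(1))
        else:
            body_lines.append(line)
    return TaskInstance("\n".join(body_lines), targets)


example = make_instance(
    "Example article",
    "Intro text.\n== History ==\nSome history.\n"
    "=== Detail ===\nMore.\n== Usage ==\nUse it.",
)
```

Here `example.targets` holds the title followed by the two main headings, and `example.source_text` is what a system would receive as input.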

The manual evaluation will consist of pairwise comparisons of machine(-generated) summaries. Each evaluator will be presented with the human(-generated) summary and two machine summaries. The evaluation task is to read the human summary and judge whether one machine summary is significantly closer to the human summary in information content (e.g. system A > system B) or whether the two machine summaries contain a comparable quantity of the information in the human summary.
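The pairwise judgments described above could be aggregated into per-system scores in many ways; the task description does not specify one. The sketch below assumes a simple scheme in which a preferred system receives one point, a "comparable" verdict gives each system half a point, and the score is the fraction of points earned over all comparisons a system took part in. The verdict labels `"A"`, `"B"`, and `"tie"` and the function name `tally_pairwise` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One judgment: (system_a, system_b, verdict). The verdict is "A" if system A
# is judged closer to the human summary, "B" if system B is, and "tie" if the
# two are judged comparable. These labels are an assumption of this sketch.
Judgment = Tuple[str, str, str]


def tally_pairwise(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Return, per system, the share of points won across its comparisons."""
    points: Counter = Counter()
    comparisons: Counter = Counter()
    for sys_a, sys_b, verdict in judgments:
        if verdict not in {"A", "B", "tie"}:
            raise ValueError(f"unknown verdict: {verdict!r}")
        comparisons[sys_a] += 1
        comparisons[sys_b] += 1
        if verdict == "A":
            points[sys_a] += 1.0
        elif verdict == "B":
            points[sys_b] += 1.0
        else:  # comparable: split the point
            points[sys_a] += 0.5
            points[sys_b] += 0.5
    return {system: points[system] / comparisons[system] for system in comparisons}


scores = tally_pairwise([
    ("sys1", "sys2", "A"),
    ("sys1", "sys3", "tie"),
    ("sys2", "sys3", "B"),
])
```

A scheme like this ignores which opponents a system faced; a rating model that accounts for opponent strength would be a reasonable alternative when the comparisons are unevenly distributed.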

The automatic evaluation will be performed primarily by the HEVas system, which measures the quality of a headline in terms of both informativeness and readability. Secondary automatic evaluations may include other established automatic evaluation methods.