Below is an overview of the Workshop. Further information (roadmap, calls, per-task pages) is available through the sub-pages listed in the "Navigation" index on the right of this page.
MultiLing covers a variety of Natural Language Processing topics, focusing on the multilingual aspect of summarization:
- Multilingual summarization across genres and sources: Summarization has received increasing attention in recent years, driven mostly by the growing volume and redundancy of online information, but also by user-created content. Interest is growing in methods that can operate on a variety of languages and across different types of content and genres (news, social media, transcripts).
This research topic maps to several community tasks covering different genres and source types: multilingual single-document summarization [Giannakopoulos et al., 2015]; news headline generation (a new task in MultiLing 2017); summarization of user-supplied comments (the OnForumS task [Kabadjov et al., 2015]); and summarization of conversation transcripts (see also [Favre et al., 2015]). The tasks span a variety of real settings, each with its own requirements and intricacies, as in previous MultiLing endeavours [Giannakopoulos et al., 2011, Giannakopoulos, 2013, Elhadad et al., 2013, Giannakopoulos et al., 2015].
- Multilingual summary evaluation: Summary evaluation has remained an open question for several years, even though some existing methods correlate well with human judgement when used to compare systems. It is not obvious that these methods perform as well in a multilingual setting as they do for English; in fact, preliminary results have shown that several problems may arise in the multilingual setting [Giannakopoulos et al., 2011]. Similar challenges arise across different source types and genres. This part of the workshop aims to cover and discuss these research problems and their solutions.
The workshop will build on the results of a set of research community tasks, described in the following paragraphs.
Single-Document Summarization
Following the 2015 pilot task, the multilingual single-document summarization task is to generate a single-document summary for each of the given Wikipedia featured articles, in one of the roughly 40 languages provided. The training data will be the Single-Document Summarization Task data from MultiLing 2015, and a new dataset will be built from additional Wikipedia featured articles. The summaries will be evaluated with automatic methods, and participants will also be required to perform a limited amount of manual evaluation.
The manual evaluation will consist of pairwise comparisons of machine-generated summaries. Each evaluator will be shown the human-written summary and two machine summaries. After reading the human summary, the evaluator judges whether one machine summary is significantly closer to the information content of the human summary (e.g., system A > system B), or whether the two machine summaries capture a comparable amount of the human summary's information.
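As an illustration of how such pairwise judgments could be aggregated, the sketch below counts wins and ties per system. The `(system_a, system_b, verdict)` encoding and the `tally_pairwise` helper are hypothetical; the task description does not specify how judgments are recorded or combined into a ranking.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical encoding of one judgment: (system_a, system_b, verdict), where
# verdict is ">" (A closer to the human summary), "<" (B closer),
# or "=" (comparable information content).
Judgment = Tuple[str, str, str]


def tally_pairwise(judgments: Iterable[Judgment]) -> Tuple[Counter, Counter]:
    """Count wins and ties per system from pairwise judgments."""
    wins: Counter = Counter()
    ties: Counter = Counter()
    for a, b, verdict in judgments:
        if verdict == ">":
            wins[a] += 1
        elif verdict == "<":
            wins[b] += 1
        elif verdict == "=":
            # A tie is recorded for both systems in the pair.
            ties[a] += 1
            ties[b] += 1
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")
    return wins, ties
```

For example, the judgments `("sysA", "sysB", ">")`, `("sysA", "sysC", "=")` and `("sysB", "sysC", "<")` yield one win each for `sysA` and `sysC`, and one tie each for `sysA` and `sysC`.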
Headline Generation
The objective of the Headline Generation (HG) task is to explore some of the challenges highlighted by current state-of-the-art approaches to creating informative headlines for news articles: non-descriptive headlines, out-of-domain training data, and long documents that are not well represented by the head heuristic. We will make a large training set available for headline generation and create evaluation conditions that emphasize these challenges. We will also rerun the task under DUC 2004 conditions to obtain comparable results.
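To make the "head heuristic" concrete, here is a minimal baseline sketch, assuming the heuristic means deriving the headline from the beginning of the article: it takes the first sentence and truncates it to a byte budget. The function name `head_baseline`, the naive punctuation-based sentence split, and the 75-byte default are illustrative assumptions, not the task's official baseline or limits.

```python
import re


def head_baseline(article: str, max_bytes: int = 75) -> str:
    """Hypothetical head-heuristic baseline: return the article's first
    sentence, cut at a word boundary to at most `max_bytes` UTF-8 bytes."""
    text = " ".join(article.split())  # normalise all whitespace to single spaces
    # Naive sentence split: shortest prefix ending in ., ! or ? followed by space/end.
    match = re.match(r"(.+?[.!?])(\s|$)", text)
    first = match.group(1) if match else text
    if len(first.encode("utf-8")) <= max_bytes:
        return first
    out = ""
    for word in first.split(" "):
        candidate = f"{out} {word}" if out else word
        if len(candidate.encode("utf-8")) > max_bytes:
            break  # adding this word would exceed the byte budget
        out = candidate
    return out  # may be empty if the first word alone exceeds the budget
```

Counting bytes rather than characters keeps the budget language-independent in the sense of storage size, though for non-Latin scripts a byte budget admits fewer characters; a character or token budget is an equally plausible design choice.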
Summary Evaluation
This task examines how well automated systems can evaluate summaries in different languages. Its input is the set of summaries produced by automatic systems and humans in the Summarization Tasks of MultiLing 2015, as well as in the Single-Document Summarization tasks of 2015 and 2017 (once the latter is completed). The output should be a grade for each summary. Since the goal is automatic evaluation that correlates as strongly as possible with human judgement, submissions will be evaluated by measuring the correlation between the estimated grades and the human grades.
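A minimal sketch of the correlation step follows. The description does not say which coefficient is used, so both Pearson (linear) and Spearman (rank-based, with tied values sharing their average rank) are shown as assumptions; whether correlation is computed per summary or per system is also left open.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For instance, with hypothetical human grades `[4.0, 3.5, 2.0, 1.0]` and automatic scores `[0.61, 0.58, 0.40, 0.42]`, the two lowest-ranked summaries are swapped and `spearman` returns 0.8.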
Online Forum Summarization (OnForumS)
Following the successful OnForumS pilot at MultiLing 2015, we are running the task again in 2017 with a brand-new dataset. The OnForumS task investigates how the mass of comments found on news providers' web sites (e.g., The Guardian) can be summarized. We posit that a crucial first step towards that goal is to determine what comments link to, whether specific news snippets or comments by other users. Furthermore, each link may carry a set of labels capturing phenomena such as agreement and sentiment towards the comment target. Solving this labelled linking problem can enable the recognition of salience (e.g., the snippets or comments with the most links) and of relations between comments (e.g., agreement). As in the previous OnForumS run, evaluation will focus on how many of the links and labels are correctly identified.
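The sketch below illustrates both ideas under stated assumptions: a link is a hypothetical `(comment_id, target_id)` pair mapped to a tuple of labels such as `("agree", "positive")`; link precision/recall and label accuracy on correctly found links stand in for "how many links and labels were correctly identified" (the official OnForumS scoring may differ); and salience is approximated by counting incoming links per target.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (comment_id, target_id), where the target is a
# news-snippet id or another comment's id.
Link = Tuple[str, str]
Labels = Tuple[str, ...]  # e.g. ("agree", "positive")


def score_links(gold: Dict[Link, Labels], predicted: Dict[Link, Labels]) -> Dict[str, float]:
    """Link precision/recall, plus label accuracy over correctly found links."""
    correct = gold.keys() & predicted.keys()  # links present in both
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    label_hits = sum(1 for link in correct if gold[link] == predicted[link])
    label_acc = label_hits / len(correct) if correct else 0.0
    return {"link_precision": precision, "link_recall": recall, "label_accuracy": label_acc}


def most_linked(links: Iterable[Link], top: int = 3) -> List[Tuple[str, int]]:
    """Salience proxy: the targets that receive the most links."""
    return Counter(target for _, target in links).most_common(top)
```

With a gold set of three links and a prediction that recovers two of them (one with the wrong labels) plus one spurious link, precision and recall are both 2/3 and label accuracy is 1/2.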
Call Centre Conversation Summarization (CCCS)
The Call Centre Conversation Summarization (CCCS) task, first run as a pilot in 2015, consists of automatically generating summaries of spoken conversations in the form of textual synopses that describe the content of a conversation and could be used to browse a large database of recordings.
As in CCCS 2015, participants shall generate abstractive summaries from conversation transcripts that inform a reader about the main events of the conversations, such as the participants' objectives and how they are met.
As in CCCS 2015, evaluation will use ROUGE-like measures computed against human-written summaries and, if the funding we can secure for the task allows, will be complemented by manual evaluation.
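For readers unfamiliar with ROUGE-like measures, the sketch below computes a simplified ROUGE-N recall: the fraction of reference n-grams (with clipped counts) that also appear in the candidate summary. Lowercased whitespace tokenization and the absence of stemming or stopword removal are simplifying assumptions; the exact measures used for CCCS may differ.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list (empty if too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: List[str], n: int = 2) -> float:
    """Simplified ROUGE-N recall of a candidate against one or more references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped match: each reference n-gram counts at most as often
        # as it occurs in the candidate (missing keys count as 0).
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, the candidate "the customer wants a refund" against the reference "the customer asks for a refund" matches 4 of 6 reference unigrams (recall 0.667 for `n=1`) and 2 of 5 reference bigrams (recall 0.4 for `n=2`).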
Revision created 2965 days ago by George Giannakopoulos (Admin)