Call for Data Contributors - MultiLing 2013
(Please feel free to forward this message. Apologies for cross-postings.)
Overview
=======
MultiLing 2013 is a workshop, held within ACL 2013, which covers three subdomains of
Natural Language Processing, focused on the multilingual aspect of summarization.
The MultiLing 2013 workshop builds upon the Text Analysis Conference (TAC)
MultiLing Pilot task of 2011, where systems were asked to generate fluent, representative
summaries (around 250 words) for each of 10 predefined topics per language.
Each topic was described by 10 source documents.
The set of documents was in one of the following 7 languages: Arabic, Czech, English,
French, Greek, Hebrew and Hindi.
Based on the challenges revealed in MultiLing 2011, this year we also address the problems
of multilingual summary evaluation and data collection. This call asks for contributors for
the data collection process.
Workshop URL: http://multiling.iit.demokritos.gr/pages/view/662/multiling-2013
Scope of the workshop:
================
- Multilingual Summarization
- Multilingual Summarization Evaluation
- Multilingual Summarization Data Collection and Exploitation
The workshop will be based on a multilingual summarization task, as in the original MultiLing Pilot,
adding an evaluation task and presentations related to Data Collection and Exploitation.
Call for Contributors:
==============
This call requests your support and help as Contributors. The Contributors are meant to
help co-ordinate and realize the MultiLing data collection for different languages.
The data collection process involves translation across languages (from English to a selected
language) of news texts (from the WikiNews site, most probably), the generation of human
summaries in each language, as well as the evaluation of summaries (from humans and/or systems).
Each Contributor is expected to undertake a single language.
For details on the required effort see the section "Details concerning the Contributor effort" later in this document.
The Contributors are expected to co-author an overview paper, summarizing the data collection effort
for their language and the lessons learned. The paper will be reviewed, improved and submitted as part
of the ACL MultiLing 2013 Workshop proceedings. If the overview paper ends up being too long,
there may be several overview papers, covering different languages. The Contributors can request a slot in the
"Multilingual Summarization Data Collection and Exploitation" part of the workshop for a presentation.
The Contributors will also receive full visibility and will be acknowledged on the MultiLing site and in its communications.
How to apply to be a Contributor:
======================
Contact George Giannakopoulos via e-mail ( ggianna@iit.demokritos.gr ) describing the language you are
interested in and providing your contact information by March 1st, 2013.
Details concerning the Contributor effort:
============================
This effort is meant to be community-based, thus the cost of gathering data for a single language
will be split among several Contributors, if possible.
For existing languages (Arabic, Czech, English, French, Greek, Hebrew and Hindi) we mean to increase
the current corpus by 50% (i.e., add 5 more topics).
The effort for the languages already covered in TAC 2011 (Arabic, Czech, English, French, Greek, Hebrew
and Hindi) will be to:
- check the existing data (gold summaries, evaluations) for inconsistencies and problems, and correct them,
- translate 50 new source documents from English to the target language (10 documents per new topic),
- provide 3 human summaries for each of the 5 new topics,
- have 3 judges evaluate each of the automatic summaries that will be submitted (in TAC 2011 we had
30 to 80 summaries per language, depending on the language).
The effort for new languages (e.g., Chinese, Japanese, German, Italian, Spanish, Hungarian, Swedish, Turkish, ...) will be to:
- translate 150 source documents from English to the target language (10 documents per topic, for 15 topics),
- provide 3 human summaries for each of the 15 new topics,
- have 3 judges evaluate each of the automatic summaries that will be submitted (in TAC 2011 we evaluated 30 to 80 summaries per language, depending on the language).
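To get a rough feel for the totals implied by the two lists above, the per-language workload can be sketched as a small calculation. This is only an illustration: the function and field names are hypothetical, and the 30 to 80 range for automatic summaries is the TAC 2011 figure quoted above, assumed here as a guide for 2013, which may turn out differently.

```python
from dataclasses import dataclass

# Figures taken from the call: 10 source documents per topic,
# 3 human summaries per topic, 3 judges per automatic summary.
DOCS_PER_TOPIC = 10
SUMMARIES_PER_TOPIC = 3
JUDGES_PER_SUMMARY = 3


@dataclass(frozen=True)
class Workload:
    """Hypothetical container for the per-language effort estimate."""
    translations: int
    human_summaries: int
    min_judgments: int
    max_judgments: int


def estimate_workload(new_topics, min_system_summaries=30, max_system_summaries=80):
    """Estimate the effort for one language.

    The default range of automatic summaries (30 to 80) is the TAC 2011
    figure and is only an assumption for 2013.
    """
    return Workload(
        translations=new_topics * DOCS_PER_TOPIC,
        human_summaries=new_topics * SUMMARIES_PER_TOPIC,
        min_judgments=JUDGES_PER_SUMMARY * min_system_summaries,
        max_judgments=JUDGES_PER_SUMMARY * max_system_summaries,
    )


existing_language = estimate_workload(new_topics=5)   # languages already in TAC 2011
new_language = estimate_workload(new_topics=15)       # languages new to MultiLing

print(existing_language)
print(new_language)
```

Under these assumptions, an existing language needs 50 translations and 15 human summaries, a new language needs 150 translations and 45 human summaries, and either case involves roughly 90 to 240 individual summary judgments. The check of the existing data for existing languages is not counted here.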
The data collection process will start by mid-February and may need to be concluded by mid-April.
Please feel free to forward this call if you know someone who may want to help add a new language.