Download

You can download the sample data by clicking here


Online Forum Summarization (OnForumS), MultiLing 2015

README for the sample data release


The sample data consists of one news article from The Guardian and a selected set of readers' comments.

There are four files constituting the sample data release:
 1. 81043636.ofs.in.xml
 2. 81043636.ofs.out.xml
 3. 81043636.utf8.txt
 4. outputFormatOFS.txt

Participants will be expected to take file 1 as input and produce file 2 as output.
File 3 is provided as an auxiliary text version of the input and file 4 is a sketch
of the XML format with comments.

The test data to be handed out for the final evaluation will be formed of a set of
news articles, where for each article there will be a pair of files, one XML file like
file 1 above and one auxiliary text file like file 3 above.
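
The file pairing described above can be sketched in Python. This is only an
illustration: it assumes that the test files follow the same naming pattern as the
sample files (<id>.ofs.in.xml, <id>.utf8.txt, <id>.ofs.out.xml) and that all files
sit in one directory. The function name pair_article_files and the dictionary keys
are hypothetical and not part of the release.

```python
from pathlib import Path

# Suffixes taken from the sample file names; assumed to carry over to the test data.
IN_SUFFIX = ".ofs.in.xml"
TXT_SUFFIX = ".utf8.txt"
OUT_SUFFIX = ".ofs.out.xml"


def pair_article_files(data_dir):
    """Map each article ID to its input XML, auxiliary text, and output path."""
    data_dir = Path(data_dir)
    pairs = {}
    for xml_path in sorted(data_dir.glob("*" + IN_SUFFIX)):
        # Strip the input suffix to recover the article ID, e.g. "81043636".
        article_id = xml_path.name[: -len(IN_SUFFIX)]
        txt_path = data_dir / (article_id + TXT_SUFFIX)
        pairs[article_id] = {
            "input_xml": xml_path,
            # None if the auxiliary text file is missing.
            "aux_text": txt_path if txt_path.exists() else None,
            # Where a system would write its result for this article.
            "output_xml": data_dir / (article_id + OUT_SUFFIX),
        }
    return pairs
```

A system could loop over the returned dictionary, read each "input_xml" file and
write its result to the matching "output_xml" path.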

Please note that the links provided within file 2 are a non-exhaustive set,
obtained from pre-pilot crowdsourcing evaluations using CrowdFlower.

Information

For questions on OnForumS, please contact:

  • Mijail Kabadjov - University of Essex (OnForumS Task co-chair), malexa @ essex.ac.uk
  • Josef Steinberger - University of West Bohemia, Czech Republic (OnForumS Task co-chair), jstein @ kiv.zcu.cz

History