Online Forum Summarization (OnForumS), MultiLing 2015
README for the sample data release

The sample data consists of one news article from The Guardian and a select set of readers' comments.

The sample data release contains four files:
 1. 81043636.ofs.in.xml
 2. 81043636.ofs.out.xml
 3. 81043636.utf8.txt
 4. outputFormatOFS.txt

Participants will be expected to take file 1 as input and produce file 2 as output.
File 3 is provided as an auxiliary text version of the input and file 4 is a sketch
of the XML format with comments.
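
A minimal sketch for a first look at the input file, assuming only that it is
well-formed XML. It does not rely on any element names, since the actual format
is described in file 4 and not reproduced here. The helper name count_tags is
hypothetical and is not part of the release.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_path):
    """Return how often each element tag occurs in the XML file.

    Hypothetical helper for exploring the input; it makes no assumption
    about the OnForumS schema beyond well-formed XML.
    """
    root = ET.parse(xml_path).getroot()
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())
```

Calling count_tags("81043636.ofs.in.xml") in the directory holding the sample
files would list the element names to look up in outputFormatOFS.txt.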

The test data to be handed out for the final evaluation will be formed of a set of
news articles, where for each article there will be a pair of files, one XML file like
file 1 above and one auxiliary text file like file 3 above.
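
The pairing described above could be handled along these lines. This is only a
sketch: it assumes the test files will follow the naming of the sample release
(<id>.ofs.in.xml and <id>.utf8.txt), which this README does not guarantee. The
function name pair_input_files is hypothetical.

```python
from pathlib import Path

XML_SUFFIX = ".ofs.in.xml"   # suffix of file 1 in the sample release
TXT_SUFFIX = ".utf8.txt"     # suffix of file 3 in the sample release


def pair_input_files(data_dir):
    """Map each article ID to its (XML input path, auxiliary text path).

    Assumes the sample naming convention carries over to the test data.
    The text path is None when no matching auxiliary file is found.
    """
    data_dir = Path(data_dir)
    pairs = {}
    for xml_path in sorted(data_dir.glob("*" + XML_SUFFIX)):
        # Strip the suffix to recover the article ID, e.g. "81043636".
        article_id = xml_path.name[: -len(XML_SUFFIX)]
        txt_path = data_dir / (article_id + TXT_SUFFIX)
        pairs[article_id] = (xml_path, txt_path if txt_path.exists() else None)
    return pairs
```

Keying the result by article ID keeps each XML file next to its text version,
so a missing auxiliary file shows up as None instead of going unnoticed.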

Please note that the links provided within file 2 are a non-exhaustive set,
which resulted from pre-pilot crowdsourcing evaluations using CrowdFlower.

For questions on OnForumS, please contact:

  • Mijail Kabadjov - University of Essex (OnForumS Task co-chair), malexa @ essex.ac.uk
  • Josef Steinberger - University of West Bohemia, Czech Republic (OnForumS Task co-chair), jsteini @ kiv.zcu.cz
