OnForumS System Reports

  • OnForumS: A Shared Task on On-line Forum Summarisation
  • CIST: CIST System Report for SIGdial MultiLing 2015
  • JRC: Tackling the OnForumS Challenge
  • USFD_UNITN: Sheffield-Trento System for Sentiment and Argument Structure Enhanced Comment-to-Article Linking in the Online News Domain
  • UWB: UWB Participation in the Multiling’s OnForumS Task

OnForumS Gold Data Set

A gold data set has been compiled from the test data set and the input from the crowdsourcing evaluation, and it has now been released. Please get in touch with the organisers if you would like a copy of the data set.


OnForumS Evaluation (including P/R/F1 measures per link-label)

The evaluation spreadsheets have been updated to include Precision, Recall and F1 measures for every link-label per system run, macro-averaged over the full set of documents. Please get in touch with the organisers if you would like a copy of the spreadsheets.
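
For readers unfamiliar with macro-averaging, the following is a minimal sketch of how per-label Precision, Recall and F1 could be computed for each document and then averaged over documents. It is not the organisers' evaluation code: the link representation (comment id, target id, label), the label name "agree", the sample data and the convention of scoring an empty set as 0.0 are all assumptions made for illustration.

```python
def prf1(predicted, gold):
    """Precision, recall and F1 for two sets of links.

    Assumption: a measure with an empty denominator is scored as 0.0.
    """
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(docs, labels):
    """Average per-document P/R/F1 for each label.

    docs: list of (predicted, gold) pairs, one pair per document; each
    element is a set of (comment_id, target_id, label) triples
    (a hypothetical representation, not the OnForumS XML format).
    """
    scores = {}
    for label in labels:
        per_doc = []
        for predicted, gold in docs:
            pred_l = {link for link in predicted if link[2] == label}
            gold_l = {link for link in gold if link[2] == label}
            per_doc.append(prf1(pred_l, gold_l))
        n = len(per_doc)
        scores[label] = tuple(sum(s[i] for s in per_doc) / n for i in range(3))
    return scores


# Hypothetical data: two documents, one label.
docs = [
    ({("c1", "s1", "agree"), ("c2", "s3", "agree")}, {("c1", "s1", "agree")}),
    ({("c1", "s2", "agree")}, {("c1", "s2", "agree"), ("c3", "s2", "agree")}),
]
scores = macro_average(docs, ["agree"])
precision, recall, f1 = scores["agree"]
```

Because the averaging is done over documents, every document contributes equally to the final figure regardless of how many links it contains.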


OnForumS Submission package

In order to validate your submission, please use the following software package: onforums-submission-validation-0.2 (click the button 'Download this' at the top right corner).

Test Data release

The test data set for our evaluation campaign has been released (if you haven’t received a notification email, please get in touch).

System submissions are due by March 8th, 2015.


You can download the sample data, release 0.1, by going here and clicking the button 'Download this' at the top right corner (and just in case, the initial release is still here).

Online Forum Summarization (OnForumS), MultiLing 2015

README for the sample data release

The sample data is formed of one news article from The Guardian and a select set of readers' comments.

There are five files constituting the sample data release:
1. 81043636.ofs.in.xml
2. 81043636.ofs.out.xml
3. 81043636.utf8.txt
4. outputFormatOFS.txt
5. ofs.dtd

Participants will be expected to take file 1 as input and produce file 2 as output
by populating the section accordingly. File 3 is provided as an
auxiliary text version of the input, file 4 is a sketch of the XML format with
comments and file 5 is a DTD specification of the XML format. The text in file 1
is sentence-split and pre-tokenised (i.e., with spaces between tokens), whereas in
file 3 it is not.
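
As a small illustration of the difference between the two text versions, the sketch below splits a pre-tokenised sentence on whitespace and compares it with an untokenised one. Both sentences are made-up examples, not taken from the sample data, and the snippet makes no assumption about the XML structure of file 1.

```python
# Hypothetical sentence in the pre-tokenised style described for file 1:
# every token, including punctuation, is separated by a space.
tokenised = "The comments , however , were mixed ."
tokens = tokenised.split()

# The same sentence in the untokenised style described for file 3:
# splitting on whitespace leaves punctuation attached to words.
plain = "The comments, however, were mixed."
chunks = plain.split()
```

With the pre-tokenised text a plain whitespace split is enough to recover the tokens, whereas the auxiliary text would need a tokeniser of the participant's own choosing.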

The test data to be handed out for the final evaluation will be formed of a set of
news articles, where for each article there will be a pair of files, one XML file like
file 1 above and one auxiliary text file like file 3 above.

In addition to the data, participants will receive a validation program that they
can run over their outputs in order to make sure these conform with the OnForumS
format expectations (DTD + some specific checks, see * below for DTD validation).

Please note that the set of links provided within file 2 in order to illustrate the
task is a non-exhaustive set of links which was the result of pre-pilot crowdsourcing
evaluations using CrowdFlower.


* A Java DTD validator that can be used is the DOMValidator class at the following link:


Download the class, compile it and run it as follows (the "-cp ." below assumes the compiled class is in the current directory):

java -Xmx1000M -Xms1000M -cp . DOMValidator 81043636.ofs.out.xml


For questions on OnForumS, please contact:

Mijail Kabadjov - University of Essex: http://privatewww.essex.ac.uk/~malexa/

Josef Steinberger - University of West Bohemia: http://textmining.zcu.cz/?section=member&id=1