OnForumS Gold Data Set
A gold data set has been compiled from the test data set and the input from the crowdsourcing evaluation, and has now been released. It can be downloaded by going here and clicking the 'Download this' button at the top right corner.
OnForumS Evaluation (including P/R/F1 measures per link-label)
The evaluation spreadsheets have been updated to include Precision, Recall and F1 measures for every link-label per system run, macro-averaged over the full set of documents. The updated spreadsheets can be downloaded here.
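As an illustration of what "macro-averaged over the full set of documents" means, the following is a minimal sketch and not the official scoring code. The class `MacroPrf`, its `Counts` type and the per-document counts are hypothetical names introduced here. It also assumes that F1 is computed per document and then averaged, and that an undefined ratio counts as 0; the spreadsheets may follow different conventions.

```java
import java.util.List;

/** Hypothetical helper: macro-averages P/R/F1 for one link-label over documents. */
final class MacroPrf {

    /** Per-document counts for a single link-label (illustrative, not from the release). */
    static final class Counts {
        final int truePositives;
        final int falsePositives;
        final int falseNegatives;

        Counts(int truePositives, int falsePositives, int falseNegatives) {
            this.truePositives = truePositives;
            this.falsePositives = falsePositives;
            this.falseNegatives = falseNegatives;
        }
    }

    static double precision(Counts c) {
        int predicted = c.truePositives + c.falsePositives;
        // Assumption: precision is 0 when the system predicted no links.
        return predicted == 0 ? 0.0 : (double) c.truePositives / predicted;
    }

    static double recall(Counts c) {
        int gold = c.truePositives + c.falseNegatives;
        // Assumption: recall is 0 when the document has no gold links.
        return gold == 0 ? 0.0 : (double) c.truePositives / gold;
    }

    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    /** Returns {P, R, F1}, each averaged over documents with equal weight. */
    static double[] macroAverage(List<Counts> perDocument) {
        double sumP = 0.0;
        double sumR = 0.0;
        double sumF = 0.0;
        for (Counts c : perDocument) {
            double p = precision(c);
            double r = recall(c);
            sumP += p;
            sumR += r;
            sumF += f1(p, r);
        }
        int n = perDocument.size();
        if (n == 0) {
            return new double[] {0.0, 0.0, 0.0};
        }
        return new double[] {sumP / n, sumR / n, sumF / n};
    }
}
```

Because every document contributes equally, a short article with few links weighs as much as a long one, which is what distinguishes macro-averaging from pooling all counts first (micro-averaging).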
OnForumS Submission package
To validate your submission, please use the following software package: onforums-submission-validation-0.2 (click the 'Download this' button at the top right corner).
Test Data release
The test data set for our evaluation campaign has been released (if you haven’t received a notification email, please get in touch).
System submissions are due by March 8th, 2015.
Download
You can download the sample data, release 0.1, by going here and clicking the 'Download this' button at the top right corner. Just in case, the initial release is still available here.
Online Forum Summarization (OnForumS), MultiLing 2015
README for the sample data release
The sample data is formed of one news article from The Guardian and a select set of readers' comments.
There are five files constituting the sample data release:
1. 81043636.ofs.in.xml
2. 81043636.ofs.out.xml
3. 81043636.utf8.txt
4. outputFormatOFS.txt
5. ofs.dtd
Participants will be expected to take file 1 as input and produce file 2 as output
by populating the section accordingly. File 3 is provided as an
auxiliary text version of the input, file 4 is a sketch of the XML format with
comments and file 5 is a DTD specification of the XML format. The text in file 1
is sentence-split and pre-tokenised (i.e., with spaces between tokens), whereas in
file 3 it is not.
The test data to be handed out for the final evaluation will be formed of a set of
news articles, where for each article there will be a pair of files, one XML file like
file 1 above and one auxiliary text file like file 3 above.
In addition to the data, participants will receive a validation program that they
can run over their outputs to make sure these conform to the OnForumS format
expectations (the DTD plus some specific checks; see * below for DTD validation).
Please note that the links provided in file 2 to illustrate the task are a
non-exhaustive set, obtained from pre-pilot crowdsourcing evaluations using
CrowdFlower.
--
* A Java DTD validator that can be used is the DOMValidator class at the following link:
http://www.herongyang.com/XML/DTD-Validation-of-XML-with-DTD-Using-DOM.html
Download the class, compile it and run it as follows:
java -Xmx1000M -Xms1000M -cp . DOMValidator 81043636.ofs.out.xml
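For readers who prefer not to download a separate class, DTD validation can also be sketched with the JAXP classes that ship with the JDK. This is a minimal sketch, not the official OnForumS validation program: the class name `DtdCheck` and its method are hypothetical, and it assumes that the XML file carries a DOCTYPE declaration pointing at its DTD (for example ofs.dtd in the same directory). It covers the DTD check only, not the additional OnForumS-specific checks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates an XML document against the DTD named in its DOCTYPE. */
final class DtdCheck {

    /**
     * Parses the stream with DTD validation switched on and returns every
     * problem found; an empty list means the document is valid.
     * systemId is the base URI used to resolve a relative DTD reference
     * (may be null when the DTD is embedded in the document).
     */
    static List<String> validate(InputStream xml, String systemId) {
        final List<String> problems = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // enables DTD validation
        DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No validating XML parser available", e);
        }
        // Collect validity errors instead of letting the parser print or ignore them.
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning, line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                problems.add("error, line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: stop parsing
            }
        });
        try {
            builder.parse(xml, systemId);
        } catch (SAXException | IOException e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }
}
```

To check a file on disk, one could open it with `java.nio.file.Files.newInputStream` and pass the file's URI string as `systemId`, so that a relative reference to the DTD resolves against the file's own directory.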
Information
For questions on OnForumS, please contact:
Revision created 3231 days ago by Mijail A. Kabadjov