Test Data release
The test data set for our evaluation campaign has been released (if you haven’t received a notification email, please get in touch).
System submissions are due by March 8th, 2015.
Download
You can download the sample data, release 0.1, by going here and clicking the 'Download this' button in the top right corner (just in case, the initial release is still here).
Online Forum Summarization (OnForumS), MultiLing 2015
README for the sample data release
The sample data is formed of one news article from The Guardian and a select set of readers' comments.
There are five files constituting the sample data release:
1. 81043636.ofs.in.xml
2. 81043636.ofs.out.xml
3. 81043636.utf8.txt
4. outputFormatOFS.txt
5. ofs.dtd
Participants will be expected to take file 1 as input and produce file 2 as output
by populating the section accordingly. File 3 is provided as an
auxiliary text version of the input, file 4 is a sketch of the XML format with
comments and file 5 is a DTD specification of the XML format. The text in file 1
is sentence-split and pre-tokenised (i.e., with spaces between tokens), whereas in
file 3 it is not.
The test data to be handed out for the final evaluation will be formed of a set of
news articles, where for each article there will be a pair of files, one XML file like
file 1 above and one auxiliary text file like file 3 above.
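A minimal sketch of how the file pairs might be collected from a directory is given below. It assumes the test files keep the naming pattern of the sample (an `.ofs.in.xml` file and a `.utf8.txt` file sharing the same article id), which this README does not guarantee; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class InputPairs {

    // Suffixes copied from the sample release; assumed to carry over to the test data.
    static final String XML_SUFFIX = ".ofs.in.xml";
    static final String TXT_SUFFIX = ".utf8.txt";

    // Maps each article id to a two-element array {XML input, auxiliary text}.
    // Throws if an XML input has no matching text file in the same directory.
    public static Map<String, Path[]> pair(Path dir) throws IOException {
        Map<String, Path[]> pairs = new TreeMap<>();
        try (DirectoryStream<Path> xmlFiles = Files.newDirectoryStream(dir, "*" + XML_SUFFIX)) {
            for (Path xml : xmlFiles) {
                String name = xml.getFileName().toString();
                String id = name.substring(0, name.length() - XML_SUFFIX.length());
                Path txt = dir.resolve(id + TXT_SUFFIX);
                if (!Files.isRegularFile(txt)) {
                    throw new IOException("No auxiliary text file for article " + id);
                }
                pairs.put(id, new Path[] { xml, txt });
            }
        }
        return pairs;
    }
}
```

For the sample release, `InputPairs.pair(dir)` would map the id `81043636` to files 1 and 3.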
In addition to the data, participants will receive a validation program that they
can run over their outputs in order to make sure these conform with the OnForumS
format expectations (DTD + some specific checks, see * below for DTD validation).
Please note that the links provided in file 2 to illustrate the task are not
exhaustive: they were obtained from pre-pilot crowdsourcing evaluations using
CrowdFlower.
--
* A Java DTD validator that can be used is the DOMValidator class at the following link:
http://www.herongyang.com/XML/DTD-Validation-of-XML-with-DTD-Using-DOM.html
Download the class, compile it and run it as follows:
java -Xmx1000M -Xms1000M -cp . DOMValidator 81043636.ofs.out.xml
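For readers who prefer not to download the class, the same kind of check can be sketched with the standard JAXP API. This is an illustrative sketch and not the DOMValidator class from the link above; the class name `DtdCheck` is hypothetical. It assumes the file being checked declares its DTD in a DOCTYPE that the parser can resolve (for example, `ofs.dtd` sitting next to the XML file), and it covers only the DTD part of the checks, not the OnForumS-specific ones.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class DtdCheck {

    // Parses the file with DTD validation switched on and returns the
    // problems reported by the parser (an empty list means the file is valid).
    // Malformed XML is not collected: it is rethrown as a SAXParseException.
    public static List<String> validate(File xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DTD named in the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add("warning: " + describe(e)); }
            public void error(SAXParseException e) { problems.add("error: " + describe(e)); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(xml);
        return problems;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        List<String> problems = validate(new File(args[0]));
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.out.println(problems.isEmpty() ? "Valid" : "Invalid");
    }
}
```

Compile it with `javac DtdCheck.java` and run it in the same way as the command above, for example `java -cp . DtdCheck 81043636.ofs.out.xml`.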
Information
For questions on OnForumS, please contact:
Revision created 3201 days ago by Mijail A. Kabadjov