szte.io
Class WikiDocSet

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractSet<E>
          extended by java.util.HashSet<Document>
              extended by szte.io.WikiDocSet
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Document>, java.util.Collection<Document>, java.util.Set<Document>, DocumentSet

public class WikiDocSet
extends java.util.HashSet<Document>
implements DocumentSet

The reader and container class for the Wikipedia soccer corpus.

See Also:
Serialized Form

Constructor Summary
WikiDocSet()
           
 
Method Summary
 void readDocumentSet(java.lang.String file)
          The documents are listed in one file, separated by -DOCSTART- lines (extracted from the Wikipedia dump), and the gold-standard labels are in a txt file (in "documentid TAB label" format).
 
Methods inherited from class java.util.HashSet
add, clear, clone, contains, isEmpty, iterator, remove, size
 
Methods inherited from class java.util.AbstractSet
equals, hashCode, removeAll
 
Methods inherited from class java.util.AbstractCollection
addAll, containsAll, retainAll, toArray, toArray, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.Collection
add, addAll, clear, contains, containsAll, equals, hashCode, isEmpty, iterator, remove, removeAll, retainAll, size, toArray, toArray
 
Methods inherited from interface java.util.Set
addAll, containsAll, equals, hashCode, removeAll, retainAll, toArray, toArray
 

Constructor Detail

WikiDocSet

public WikiDocSet()
Method Detail

readDocumentSet

public void readDocumentSet(java.lang.String file)
The documents are listed in one file, separated by -DOCSTART- lines (extracted from the Wikipedia dump), and the gold-standard labels are in a txt file (in "documentid TAB label" format).

Specified by:
readDocumentSet in interface DocumentSet
Parameters:
file - consists of the path to the document (*.txt files will be read from this directory) and the path to the label XML file, separated by a |
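
Example:
The input conventions described above can be sketched as follows. This is only an illustration, not the code of WikiDocSet: the class WikiInputSketch and its methods are hypothetical, and the sketch assumes what the method description states, namely a single corpus file with -DOCSTART- separator lines and a tab-separated label file. The parameter description mentions a directory and an XML label file instead, so the actual reader may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of szte.io: it only illustrates the input formats.
final class WikiInputSketch {

    // Splits the combined argument "documentPath|labelPath" at the first '|'.
    static String[] splitArgument(String file) {
        int bar = file.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("expected <documentPath>|<labelPath>");
        }
        return new String[] { file.substring(0, bar), file.substring(bar + 1) };
    }

    // Groups corpus lines into documents; a line starting with -DOCSTART-
    // closes the current document and begins the next one (assumed layout).
    static List<String> splitDocuments(List<String> lines) {
        List<String> docs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("-DOCSTART-")) {
                flush(current, docs);
            } else {
                current.append(line).append('\n');
            }
        }
        flush(current, docs);
        return docs;
    }

    // Adds the buffered text as one document unless it is blank, then resets the buffer.
    private static void flush(StringBuilder current, List<String> docs) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            docs.add(text);
        }
        current.setLength(0);
    }

    // Parses "documentid TAB label" lines; lines without a tab are skipped.
    static Map<String, String> parseLabels(List<String> lines) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                labels.put(parts[0].trim(), parts[1].trim());
            }
        }
        return labels;
    }
}
```

With the real class, a call would presumably take the form new WikiDocSet().readDocumentSet(documentPath + "|" + labelPath), using the no-argument constructor and the combined path argument documented above.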