This package implements the multi-pass sieve coreference resolution system of Raghunathan et al. (EMNLP 2010).
Note that the current code in this package does not implement mention detection. All results reported here use gold mentions (just as in the paper). However, the DeterministicCorefAnnotator in StanfordCoreNLP implements a simple mention detection component, so this code can be used to perform coreference resolution on raw text.
Note that this code is already different from the system reported in the paper. After the EMNLP paper, two additional sieves were included. The current code gives slightly better scores than those in the paper.
----------------------------------------------------------------------------
                       MUC               B cubed             Pairwise
                  P     R     F1      P     R     F1      P     R     F1
----------------------------------------------------------------------------
ACE2004 dev    | 84.5  75.7  79.8  | 88.0  75.8  81.4  | 78.6  53.8  63.9
ACE2004 test   | 80.4  72.9  76.4  | 85.1  76.4  80.5  | 68.7  48.9  57.1
ACE2004 nwire  | 83.8  74.3  78.8  | 86.9  73.7  79.7  | 78.1  51.7  62.2
MUC6 test      | 90.5  69.0  78.3  | 90.5  62.5  73.9  | 89.3  56.1  68.9
----------------------------------------------------------------------------
This release is generally similar to the code used for EMNLP 2010,
with one additional sieve: relaxed exact string match.
Scores may also differ because of changes to the parser or NER.
Results:
----------------------------------------------------------------------------
                       MUC               B cubed             Pairwise
                  P     R     F1      P     R     F1      P     R     F1
----------------------------------------------------------------------------
ACE2004 dev    | 84.1  73.9  78.7  | 88.3  74.2  80.7  | 80.0  51.0  62.3
ACE2004 test   | 80.5  72.3  76.2  | 85.4  75.9  80.4  | 68.7  47.8  56.4
ACE2004 nwire  | 83.8  72.8  77.9  | 87.5  72.1  79.0  | 79.3  47.6  59.5
MUC6 test      | 90.3  68.9  78.2  | 90.5  62.3  73.8  | 89.4  55.5  68.5
----------------------------------------------------------------------------
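Each F1 column in the tables is the harmonic mean of the P and R columns next to it. The following is a minimal sketch of that arithmetic, not code from this package; the class and method names are made up for illustration.

```java
import java.util.Locale;

final class F1Sketch {
    /** Harmonic mean of precision and recall, both given in percent. */
    static double f1(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid 0/0 when both scores are zero
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // MUC scores on the MUC6 test set from the first results table: P = 90.5, R = 69.0
        System.out.println(String.format(Locale.ROOT, "F1 = %.1f", f1(90.5, 69.0)));
    }
}
```

For the MUC6 test row this reproduces the reported 78.3. Recomputing from the rounded P and R values can be off by 0.1 in other rows (ACE2004 dev under MUC gives about 79.9 against the reported 79.8), presumably because the reported F1 was computed from unrounded scores.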
  annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref

The required properties for dcoref are the following:
  dcoref.demonym
  dcoref.animate
  dcoref.inanimate
  dcoref.male
  dcoref.neutral
  dcoref.female
  dcoref.plural
  dcoref.singular
  sievePasses        // If omitted, default value will be used.
See StanfordCoreNLP for more details.
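The settings listed above can also be assembled in code rather than in a properties file. The sketch below uses only java.util.Properties; the resource directory and the ".txt" file names are hypothetical placeholders, and the class name is made up for illustration.

```java
import java.util.Properties;

final class DcorefPropsSketch {
    /** Builds pipeline properties; every path produced here is a hypothetical placeholder. */
    static Properties buildProps(String resourceDir) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        String[] keys = {"demonym", "animate", "inanimate", "male",
                         "neutral", "female", "plural", "singular"};
        for (String key : keys) {
            // Hypothetical file naming; point each key at the real resource file instead.
            props.setProperty("dcoref." + key, resourceDir + "/" + key + ".txt");
        }
        // sievePasses is deliberately left unset so that the default value is used.
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("/path/to/dcoref/resources");
        props.list(System.out);
        // With CoreNLP on the classpath, this object would then be passed to the pipeline,
        // e.g. new StanfordCoreNLP(props); the call is omitted to keep the sketch self-contained.
    }
}
```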
  java -Xmx8g edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props <properties file>

A sample properties file (coref.properties) is included in dcoref package. The properties file includes the following:
  annotators = pos, lemma, ner   // annotators needed for coreference resolution
  pos.model                      // For POS model
  ner.model.3class
  ner.model.7class               // For NER
  ner.model.MISCclass
  parser.model                   // For parser
  parser.maxlen = 100
  dcoref.demonym                 // The path for a file that includes a list of demonyms
  dcoref.animate                 // The list of animate/inanimate mentions (Ji and Lin, 2009)
  dcoref.inanimate
  dcoref.male                    // The list of male/neutral/female mentions (Bergsma and Lin, 2006)
  dcoref.neutral                 // Neutral means a mention that is usually referred to by 'it'
  dcoref.female
  dcoref.plural                  // The list of plural/singular mentions (Bergsma and Lin, 2006)
  dcoref.singular
  sievePasses                    // Sieve passes - each class is defined in dcoref/sievepasses/
  logFile                        // Path for log file for coref system evaluation
  ace2004 or mucfile             // Use either ace2004 or mucfile (not both)
                                 // ace2004: path for the directory containing ACE2004 files
                                 // mucfile: path for the MUC file

This system can process both ACE2004 and MUC6 corpora in their original formats. An example of each corpus format is given below.

MUC6:
  ...
  <s> By/IN proposing/VBG <COREF ID="13" TYPE="IDENT" REF="6" MIN="date"> a/DT meeting/NN date/NN</COREF> ,/,
  <COREF ID="14" TYPE="IDENT" REF="0"> <ORGANIZATION> Eastern/NNP</ORGANIZATION></COREF> moved/VBD one/CD step/NN closer/JJR toward/IN
  reopening/VBG current/JJ high-cost/JJ contract/NN agreements/NNS with/IN
  <COREF ID="15" TYPE="IDENT" REF="8" MIN="unions"><COREF ID="16" TYPE="IDENT" REF="14"> its/PRP$</COREF> unions/NNS</COREF> ./. </s>
  ...

ACE2004:
  ...
  <document DOCID="20001115_AFP_ARB.0212.eng">
  <entity ID="20001115_AFP_ARB.0212.eng-E1" TYPE="ORG" SUBTYPE="Educational" CLASS="SPC">
    <entity_mention ID="1-47" TYPE="NAM" LDCTYPE="NAM">
      <extent>
        <charseq START="475" END="506">the Globalization Studies Center</charseq>
      </extent>
      <head>
        <charseq START="479" END="506">Globalization Studies Center</charseq>
      </head>
    </entity_mention>
  ...
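To make the ACE2004 layout concrete, the sketch below reads the mention extents out of the fragment shown above with the JDK's DOM parser. It is an illustration only and not the reader used by this package: the class and method names are made up, and the fragment has been closed off by hand so that it is well-formed. Real corpus files contain further elements, and may carry a DOCTYPE declaration, which this sketch does not handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class AceMentionSketch {
    // The sample fragment, with the elided closing tags added by hand.
    static final String SAMPLE =
        "<document DOCID=\"20001115_AFP_ARB.0212.eng\">"
        + "<entity ID=\"20001115_AFP_ARB.0212.eng-E1\" TYPE=\"ORG\" SUBTYPE=\"Educational\" CLASS=\"SPC\">"
        + "<entity_mention ID=\"1-47\" TYPE=\"NAM\" LDCTYPE=\"NAM\">"
        + "<extent><charseq START=\"475\" END=\"506\">the Globalization Studies Center</charseq></extent>"
        + "<head><charseq START=\"479\" END=\"506\">Globalization Studies Center</charseq></head>"
        + "</entity_mention></entity></document>";

    /** Returns one "entityId mentionId [start,end] text" line per entity_mention extent. */
    static List<String> mentionExtents(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList mentions = doc.getElementsByTagName("entity_mention");
            for (int i = 0; i < mentions.getLength(); i++) {
                Element mention = (Element) mentions.item(i);
                // In the layout shown, each entity_mention is a direct child of its entity.
                Element entity = (Element) mention.getParentNode();
                Element extent = (Element) mention.getElementsByTagName("extent").item(0);
                Element charseq = (Element) extent.getElementsByTagName("charseq").item(0);
                out.add(entity.getAttribute("ID") + " " + mention.getAttribute("ID")
                    + " [" + charseq.getAttribute("START") + "," + charseq.getAttribute("END") + "] "
                    + charseq.getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse ACE2004 fragment", e);
        }
    }

    public static void main(String[] args) {
        for (String line : mentionExtents(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

All mentions that share the same enclosing entity ID belong to one coreference chain, which is why the sketch reports the entity ID alongside each mention.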