Description
The training data includes 100,834 documents from 2003-2006, with 19,261,118 annotated entities. The evaluation data includes 9,706 documents from 2007, with 187,080 annotated entities.
An empty line separates each document annotation. The first line of a document's annotation contains the NYT document id followed by the title. Each subsequent line refers to an entity, with the following tab-separated fields:
- entity index
- automatically inferred salience {0,1}
- mention count (from our coreference system)
- first mention's text
- byte offset start position for the first mention
- byte offset end position for the first mention
- MID (from our entity resolution system)