The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation.
The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate. Data sources include audio and image data in addition to pure text, and Arabic and Chine...
- Introduction and Background
Today’s global web of electronic information, most notably the World Wide Web, provides a resource of unbounded information-bearing potential.
- The research tasks were identified in general as the extraction of the entities, relations and events being discussed in the language.
- Whereas the MUC named entity task is to identify the words that name an entity, the corresponding ACE task is to identify the entity so named.
- The ACE research targets, namely entities, relations, and events, are represented in terms of their underlying attributes and constituents.
- The Automatic Content Extraction (ACE) program is a “technocentric” research effort, meaning that the emphasis is on developing core enabling technologies rather than solving the application needs that motivate the research.
- The ACE program attempts to take the task “off the page”, in the sense that the research objectives are defined in terms of the target objects rather than in terms of the words in the text.
- Annotation Tasks: There are three primary ACE annotation tasks corresponding to the three research tasks: Entity Detection and Tracking (EDT), Relation Detection and Characterization (RDC), and Event Detection and Characterization (VDC).
- In addition to multiple passes over all ACE data, an additional 5% to 10% of the data is completely re-annotated from scratch by different annotators.
- Rates of interannotator agreement for ACE named entities are comparable to rates shown in previous programs like MUC (NIST 1999).
- Under the ACE (NIST 2003) and DARPA TIDES (TIDES 2004) Programs, the Linguistic Data Consortium at the University of Pennsylvania develops annotation guidelines, corpora and other linguistic resources to support information extraction research (LDC 2004).
- LDC's ACE annotators tag broadcast transcripts, newswire and newspaper data in English, Chinese and Arabic, producing both training and test data for common research task evaluations.
- During RDC tagging, annotators identify relations that exist between the entities tagged during the EDT task.
- In VDC, annotators identify and characterize five types of events in which EDT entities participate.
- In future phases of ACE, annotators will identify additional event types and characterize relations between events.
- Particular challenges to annotators include the coreference of generic entities and the use of metonymy, characterization of GPEs, distinguishing certain relation types, and identifying implicit vs explicit relations.
- ACE evaluation requires meaningful and helpful scoring of entities, relations and events.
- If the output entity is mapped, the lesser of the values of the system (sys) entity and its corresponding reference (ref) entity is used.
- Entity_Value is discounted for errors in entity type, subtype and class.
- If the output relation is mapped, the lesser of the values of the system (sys) relation and its corresponding reference (ref) relation is used.
- Relation_Value is discounted for errors in relation type and subtype.
- For a system output argument to be reasonably considered to represent its corresponding reference argument, it must exhibit a reasonable overlap with the reference, in terms of Entity_Value.
- If the output event is mapped, the lesser of the values of the system (sys) event and its corresponding reference (ref) event is used.
- Event_Value is discounted for errors in event type and modality.
- Those event entity mentions that appear in these documents are used to compute Participant_Value.
- Table 1: List of corpora developed for and used to support ACE research
- While the ACE program is directed toward extraction of information from audio and image sources in addition to pure text, the research effort is restricted to information extraction from text.
- An ACE event can have a number of participants, and each participant is characterized by a role that it plays in the event
- The performance measure for all three tasks is formulated in terms of a synthetic application value, where value is accrued by correctly detecting the target objects and correctly recognizing their attributes, and where value is lost by falsely detecting target objects or incorrectly determining attributes of the target objects
- The mapping of system output mentions to reference mentions is chosen so as to maximize the total value of the mentions.
- If a system output entity is itself unmapped, all of its mentions are unmapped.
- The coreference discount is intended to reduce the penalty for mentions that are valid mentions of an entity but that are incorrectly associated at the entity level.
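
The value-based scoring summarized in the points above can be sketched in Python. This is a minimal illustration under stated assumptions, not the official ACE scorer: the discount weights in `ATTRIBUTE_DISCOUNTS`, the dictionary layout of an entity, and the function names `mapped_value` and `best_mapping` are all hypothetical, and a brute-force search stands in for whatever mapping procedure the evaluation actually uses. The sketch covers only the value accrued by mapped objects and leaves out the value lost on false detections.

```python
from itertools import permutations

# Hypothetical multiplicative discounts. The source names the attributes
# (type, subtype, class) but does not give the weights.
ATTRIBUTE_DISCOUNTS = {"type": 0.5, "subtype": 0.9, "cls": 0.75}


def mapped_value(sys_obj, ref_obj):
    """Value of a mapped pair: the lesser of the two values,
    discounted once for each attribute that disagrees."""
    value = min(sys_obj["value"], ref_obj["value"])
    for attr, discount in ATTRIBUTE_DISCOUNTS.items():
        if sys_obj[attr] != ref_obj[attr]:
            value *= discount
    return value


def best_mapping(sys_objs, ref_objs):
    """Brute-force one-to-one mapping that maximizes total value.
    Only practical for a handful of objects."""
    n_sys = len(sys_objs)
    # Pad with None so that any system object may stay unmapped.
    slots = list(range(len(ref_objs))) + [None] * n_sys
    best_total, best_assign = 0.0, (None,) * n_sys
    for assign in set(permutations(slots, n_sys)):
        total = sum(
            mapped_value(sys_objs[i], ref_objs[j])
            for i, j in enumerate(assign)
            if j is not None
        )
        if total > best_total:
            best_total, best_assign = total, assign
    return best_total, best_assign


# Illustrative data: two system entities scored against two reference entities.
sys_entities = [
    {"value": 1.0, "type": "PER", "subtype": "none", "cls": "specific"},
    {"value": 0.5, "type": "ORG", "subtype": "none", "cls": "specific"},
]
ref_entities = [
    {"value": 0.8, "type": "ORG", "subtype": "none", "cls": "specific"},
    {"value": 1.0, "type": "PER", "subtype": "none", "cls": "specific"},
]

total, assignment = best_mapping(sys_entities, ref_entities)
print(total, assignment)  # 1.5 (1, 0)
```

Here `assignment[i]` is the index of the reference entity mapped to system entity `i`, or `None` if that system entity is left unmapped. The same pattern would apply to relations and events by swapping in the attributes that the text says are discounted for each (type and subtype for relations, type and modality for events).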