General Architecture for Text Engineering (GATE) is a Java suite of natural language processing (NLP) tools for many tasks, including information extraction in many languages.[1] It was originally developed at the University of Sheffield beginning in 1995 and is now used worldwide by a wide community of scientists, companies, teachers and students.
As of May 28, 2011, 881 people were on the gate-users mailing list at SourceForge.net, and 111,932 downloads had been recorded since the project moved to SourceForge in 2005.[2] The paper "GATE: A framework and graphical development environment for robust NLP tools and applications"[3] has received over 2000 citations since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide,[4] include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady,[5] and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.[6]
The GATE community and its research have been involved in several European research projects, including Transitioning Applications to Ontologies, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb.
JAPE (Java Annotation Patterns Engine) transducers are used within GATE to manipulate annotations on text. Documentation is provided in the GATE User Guide.[8] A tutorial has also been written by Press Association Images.[9]
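To make the idea of manipulating annotations more concrete, the following is a minimal sketch in plain Java. It does not use GATE or JAPE: the class `AnnotationRuleSketch`, its `Annotation` type, the `TITLES` word list and the "Person" rule are all hypothetical names chosen for illustration. It only assumes the general model in which annotations are typed spans over character offsets, and a rule matches a pattern over existing annotations and then adds a new annotation over the matched span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: these classes are NOT part of GATE or JAPE.
class AnnotationRuleSketch {

    /** A stand-off annotation: a typed span of character offsets over the text. */
    static final class Annotation {
        final String type;
        final int start;
        final int end;

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        String coveredText(String text) {
            return text.substring(start, end);
        }
    }

    // Hypothetical trigger words for the toy rule below.
    private static final Set<String> TITLES = Set.of("Dr.", "Mr.", "Ms.");

    /** Earlier phase: mark each whitespace-delimited chunk as a "Token". */
    static List<Annotation> tokenize(String text) {
        List<Annotation> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            tokens.add(new Annotation("Token", m.start(), m.end()));
        }
        return tokens;
    }

    /**
     * Toy rule. Pattern side: a title Token followed by a capitalised Token.
     * Action side: add a "Person" annotation spanning both tokens.
     */
    static List<Annotation> findPersons(String text, List<Annotation> tokens) {
        List<Annotation> persons = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            Annotation first = tokens.get(i);
            Annotation second = tokens.get(i + 1);
            boolean isTitle = TITLES.contains(first.coveredText(text));
            boolean isCapitalised = Character.isUpperCase(text.charAt(second.start));
            if (isTitle && isCapitalised) {
                persons.add(new Annotation("Person", first.start, second.end));
                i++; // skip the token that the match just consumed
            }
        }
        return persons;
    }

    public static void main(String[] args) {
        String text = "Yesterday Dr. Smith met Mr. Jones";
        for (Annotation a : findPersons(text, tokenize(text))) {
            System.out.println(a.type + " [" + a.start + "," + a.end + "): " + a.coveredText(text));
        }
    }
}
```

For the sample sentence in `main`, the sketch produces two "Person" annotations, covering "Dr. Smith" and "Mr. Jones". Real JAPE grammars are written in their own rule language and operate on GATE's annotation sets; the GATE User Guide remains the reference for the actual syntax.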
GATE Developer
The screenshot shows the document viewer used to display a document and its annotations. The <a> hyperlink annotations from an HTML file are highlighted in pink. The list on the right is the annotation sets list, and the table at the bottom is the annotation list. In the center is the annotation editor window.
GATE Mímir
GATE generates vast quantities of information, including natural language text, semantic annotations, and ontological information. Sometimes the data itself is the end product of an application, but often the information would be more useful if it could be efficiently searched. GATE Mímir provides support for indexing and searching the linguistic and semantic information generated by such applications, and allows the information to be queried using arbitrary combinations of text, structural information, and SPARQL.
Pheme, a major EU project managed by the GATE group on early detection of false information in social media
References
[1] Languages mentioned on https://gate.ac.uk/gate/plugins/ include Arabic, Bulgarian, Cebuano, Chinese, French, German, Hindi, Italian, Romanian and Russian.