The only technology ready for the language of social media: robust and independent of capitalization
Our entity extraction service is built on our NaturalExtractor technology. It detects and extracts proper names (Mitt Romney, Justin Bieber, Puerto Rico, United Nations…), numeric entities (bank accounts, phone numbers…), and alphanumeric entities (car plates, web addresses…), and classifies them by type (person, company, city, street…).
NaturalExtractor recognizes and classifies entities by combining linguistic analysis (full parsing), alphanumeric pattern detection, and dictionaries (both user dictionaries and NaturalExtractor's own dictionaries, monolingual and multilingual).
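As a rough illustration of how pattern detection and dictionary lookup can work side by side, here is a minimal Python sketch. It is not NaturalExtractor's implementation or API: the function name extract_entities, the two regular expressions, and the tiny dictionary are all hypothetical, and the linguistic analysis (full parsing) step is left out entirely.

```python
import re

# Hypothetical, deliberately simple patterns for alphanumeric entities.
PATTERNS = {
    "phone_number": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "web_address": re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE),
}

# Hypothetical dictionary mapping known names to entity types.
DICTIONARY = {
    "Puerto Rico": "place",
    "United Nations": "organization",
}


def extract_entities(text):
    """Return (mention, type) pairs found by patterns, then by dictionary."""
    entities = []
    # Step 1: alphanumeric pattern detection.
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((match.group(), entity_type))
    # Step 2: exact dictionary lookup for proper names.
    for name, entity_type in DICTIONARY.items():
        if name in text:
            entities.append((name, entity_type))
    return entities
```

For example, calling extract_entities("The United Nations office can be reached at www.example.org") would report the web address through the pattern step and the organization through the dictionary step.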
The only technology that can distinguish between George Washington (person) and George Washington (avenue)
NaturalExtractor detects entities even when they are written in different forms (for example: 20:00, 20 hours, 20h, 8pm, 8 in the evening…). It also normalizes each entity to a standard form, so that all instances of the same entity are handled consistently (NYSE, New York Stock Exchange, NY Stock Exchange…).
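The idea of normalization can be sketched in a few lines of Python. The example below is only an assumption about what such a step might look like, not the product's actual logic: the names normalize_time and normalize_name, the 24-hour HH:MM target format, and the alias table are all hypothetical, and only the surface forms listed above are covered.

```python
import re

# Hypothetical alias table: every known variant maps to one canonical name.
ALIASES = {
    "nyse": "New York Stock Exchange",
    "ny stock exchange": "New York Stock Exchange",
    "new york stock exchange": "New York Stock Exchange",
}

# Matches forms such as "20:00", "20 hours", "20h", "8pm", "8 in the evening".
TIME_RE = re.compile(
    r"^(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*"
    r"(?P<suffix>hours|h|am|pm|in the morning|in the evening)?$",
    re.IGNORECASE,
)


def normalize_time(expression):
    """Return the time as 'HH:MM' (24-hour), or None if not recognized."""
    match = TIME_RE.match(expression.strip())
    if match is None:
        return None
    hour = int(match.group("hour"))
    minute = int(match.group("minute") or 0)
    suffix = (match.group("suffix") or "").lower()
    if suffix in ("pm", "in the evening") and hour < 12:
        hour += 12
    elif suffix in ("am", "in the morning") and hour == 12:
        hour = 0
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


def normalize_name(mention):
    """Return the canonical name for a known alias, else the mention itself."""
    return ALIASES.get(mention.strip().lower(), mention)
```

With this sketch, all five time expressions from the paragraph above map to the same string, and the three stock exchange variants map to the same canonical name.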
Thanks to our linguistic analysis technology, NaturalExtractor can also assign different types to the same entity depending on syntactic rules: “I live at Barack Obama” (place), “As Barack Obama said” (person). It can even detect entities that are not capitalized: “I am in new york”.
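To make the idea of context-dependent typing concrete, here is a small Python sketch. It replaces full syntactic parsing with a much cruder technique (checking a few cue words before and after the mention), so it only illustrates the principle. The function classify_mentions, the cue lists, and the name table are hypothetical and are not part of NaturalExtractor.

```python
import re

# Hypothetical dictionary of known names with a default type.
DEFAULT_TYPES = {"Barack Obama": "person", "New York": "place"}

# Hypothetical cue words standing in for real syntactic rules.
PLACE_CUES_BEFORE = ("live at", "live in", "am in", "am at")
PERSON_CUES_AFTER = ("said", "says")


def classify_mentions(sentence):
    """Return (canonical name, type) pairs, using nearby words as context."""
    results = []
    for name, default_type in DEFAULT_TYPES.items():
        # Case-insensitive search, so "new york" still matches "New York".
        match = re.search(re.escape(name), sentence, re.IGNORECASE)
        if match is None:
            continue
        before = sentence[:match.start()].lower().rstrip()
        after = sentence[match.end():].lower().lstrip()
        if before.endswith(PLACE_CUES_BEFORE):
            entity_type = "place"
        elif after.startswith(PERSON_CUES_AFTER):
            entity_type = "person"
        else:
            entity_type = default_type
        results.append((name, entity_type))
    return results
```

Under these assumptions, "I live at Barack Obama" yields a place, "As Barack Obama said" yields a person, and "I am in new york" still yields New York as a place despite the missing capitals.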