Architecture
SENTRA’s Intelligent News / Blogs / Forums / Social Media Analysis component.
At the heart of SENTRA’s Intelligent News/Blogs/Forums/Social Media Analysis component is a sentiment analysis engine. At a high level, the engine’s processing pipeline consists of several steps that convert normalized plain text into a semantic network, with concepts as nodes and the relations between those concepts as edges. Sentiment detection is then performed as logical inference over this semantic graph.
Each step of SENTRA’s transactional, NLP/AI-based sentiment analysis and extraction engine is described in more detail below:
1) Sentence detection
During this phase, raw text is split into separate sentences. Quotation blocks and direct-speech blocks are also detected.
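As an illustration of what this phase produces, here is a minimal sketch built on a naive regular-expression heuristic. It is not SENTRA’s detector: the `Sentence` record and `detect_sentences` function are hypothetical, and the heuristic ignores abbreviations such as "Mr." and quotes that span several sentences. It splits on terminal punctuation, optionally followed by a closing quote, and records the double-quoted spans in each sentence as direct speech.

```python
import re
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    quotes: list[str]  # double-quoted spans, treated as direct speech

# Hypothetical boundary rule: whitespace preceded by . ! or ? (optionally
# followed by a closing quote) and followed by an uppercase letter or an
# opening quote. Each lookbehind has a fixed width, as Python's re requires.
_BOUNDARY = re.compile(r'(?:(?<=[.!?])|(?<=[.!?]["”]))\s+(?=[A-Z"“])')
# A double-quoted span with straight or curly quotes.
_QUOTE = re.compile(r'["“]([^"”]+)["”]')

def detect_sentences(text: str) -> list[Sentence]:
    """Split raw text into sentences and collect the quoted spans in each."""
    chunks = _BOUNDARY.split(text.strip())
    return [Sentence(c, _QUOTE.findall(c)) for c in chunks if c]

for s in detect_sentences('Shares rose 5%. The CEO said "we expect strong growth." Analysts agreed.'):
    print(s)
```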
2) Syntax parsing
The syntax parser converts each sentence into a tree of constituents, where each constituent is a word or group of words that functions as a single unit within the hierarchical structure of the sentence.
The parser enriches each word in the sentence with linguistic information, such as part of speech, lemma, gender and noun number, that is used in later processing phases.
Parsing uses a boosted combination of grammar-based and probabilistic approaches. The probabilistic parser provides fast, high-quality coverage of common sentence structures, while extensive grammar rules handle complex, unusual sentences that are exceptions to the common structures.
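The kind of output this step describes, a constituent tree whose leaves carry linguistic features, can be pictured with the small data structure below. It is a sketch under assumptions: the `Token` and `Constituent` classes, their field names, and the tag labels are hypothetical and are not SENTRA’s internal format. The example tree is built by hand rather than by a parser.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

@dataclass
class Token:
    form: str
    pos: str                      # part of speech
    lemma: str
    number: Optional[str] = None  # noun number: "sg" or "pl"
    gender: Optional[str] = None

@dataclass
class Constituent:
    label: str                                    # phrase type: "S", "NP", "VP", ...
    children: list = field(default_factory=list)  # Tokens or Constituents

Node = Union[Token, Constituent]

def leaves(node: Node) -> list:
    """Return the tokens under a node, left to right."""
    if isinstance(node, Token):
        return [node]
    return [tok for child in node.children for tok in leaves(child)]

def bracketed(node: Node) -> str:
    """Render a subtree in bracket notation."""
    if isinstance(node, Token):
        return f"({node.pos} {node.form})"
    inner = " ".join(bracketed(c) for c in node.children)
    return f"({node.label} {inner})"

# "The new phones sell well", hand-built as an enriched parse might look.
tree = Constituent("S", [
    Constituent("NP", [Token("The", "DET", "the"),
                       Token("new", "ADJ", "new"),
                       Token("phones", "NOUN", "phone", number="pl")]),
    Constituent("VP", [Token("sell", "VERB", "sell"),
                       Token("well", "ADV", "well")]),
])
print(bracketed(tree))
print([t.lemma for t in leaves(tree)])
```

Later phases can then read features such as `lemma` or `number` directly from the leaves instead of re-analyzing surface words.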
3) Entity extraction and filter matching, using substantial knowledge bases and machine learning algorithms
Mentions of companies, products and people are detected at this step. Entity extraction also resolves ambiguities, for example between mentions of a person and a location, a person and a company, or even two people with the same name but different professions. Both the local context of the sentence and the global context of the article are used to determine the correct entity type and whether the mention matches a user-defined filter.
A sequence labeling (machine learning) approach and pattern matching over grammatical structures are combined and boosted to detect and classify mentions of user-defined filters.
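To make the sequence-labeling output concrete, the sketch below decodes BIO tags (B-ORG, I-PER, O, ...) into typed entity spans and then matches them against a user-defined filter that requires both an alias and an entity type. The tags are hard-coded here as if they came from a tagger; `decode_bio`, `match_filters` and the `FILTERS` table are hypothetical illustrations, not SENTRA’s API. The type check is a simplified stand-in for the disambiguation described above: "apple" tagged as a fruit would not match a company filter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    etype: str  # e.g. "ORG", "PER", "LOC"
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)

def decode_bio(tokens: list[str], tags: list[str]) -> list[Mention]:
    """Turn BIO tags into typed entity spans."""
    mentions, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        # Close the open span unless this tag continues it with the same type.
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            mentions.append(Mention(" ".join(tokens[start:i]), etype, start, i))
            start, etype = None, None
        # Open a new span on B-, or on a stray I- with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return mentions

# Hypothetical user-defined filters: filter name -> (entity type, aliases).
FILTERS = {"Apple Inc.": ("ORG", {"apple", "apple inc", "apple inc."})}

def match_filters(mentions, filters=FILTERS):
    """Return (filter name, mention) pairs where both type and alias match."""
    return [(name, m) for m in mentions
            for name, (etype, aliases) in filters.items()
            if m.etype == etype and m.text.lower() in aliases]

tokens = ["Apple", "hired", "Tim", "Cook", "in", "Cupertino"]
tags = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-LOC"]
print(match_filters(decode_bio(tokens, tags)))
```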
4) Sentiment detection
Granular sentiment of both short phrases and the overall sentence is calculated at this step.
SENTRA’s Sentiment Analysis Engine is built on the principles of compositional semantics, which allow the sentiment of a larger phrase to be calculated from the sentiment of its parts and the context in which it is used. The engine handles a variety of linguistic phenomena, such as conditional sentences, infinitives, gerunds and complex sentences with subordinate clauses.
The sentiment detector incorporates a rich knowledge base of a priori sentiment for words and phrases from a variety of domains, such as finance, economics, telecommunications, consumer electronics, retail, pharmacy and medicine. The structure of the knowledge base makes it easy to adapt and extend the engine to new domains. The knowledge base is continuously updated, using automatic tools to collect data together with manual review and tuning of the ‘knowledge’, to achieve high-quality results.
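The compositional principle can be illustrated with a toy bottom-up scorer. This is a minimal sketch under stated assumptions, not SENTRA’s rule set: the lexicon values, the operator word lists and the nested-tuple phrase format are all hypothetical. Leaf lemmas take their a priori score, negators and polarity-reversing verbs flip the score of their argument, intensifiers scale it, and other phrases sum their parts, clipped to [-1, 1].

```python
# Hypothetical a priori lexicon over lemmas (polarity in [-1, 1]).
LEXICON = {"good": 0.5, "growth": 0.4, "loss": -0.5, "weak": -0.6}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never", "no"}
REVERSERS = {"reduce", "prevent", "cut"}  # verbs that flip their object's polarity

def clip(x: float) -> float:
    return max(-1.0, min(1.0, x))

def compose(node) -> float:
    """Score a phrase bottom-up. A node is a lemma or a tuple of sub-phrases;
    a 2-tuple starting with an operator word applies it to the second part,
    and any other tuple sums its parts."""
    if isinstance(node, str):
        return LEXICON.get(node.lower(), 0.0)
    if len(node) == 2 and isinstance(node[0], str):
        op, arg = node[0].lower(), node[1]
        if op in NEGATORS or op in REVERSERS:
            return -compose(arg)
        if op in INTENSIFIERS:
            return clip(INTENSIFIERS[op] * compose(arg))
    return clip(sum(compose(part) for part in node))

print(compose(("reduce", "loss")))          # reducing a loss reads as positive
print(compose(("not", ("reduce", "loss")))) # negating that flips it back
```

The point of the design is that the score of "not reduce losses" is never looked up as a whole; it falls out of the scores of "loss", "reduce" and "not" and how they are nested.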
5) Construction of the sentence’s semantic graph
At this step, a semantic representation of the situation described in the sentence is constructed using SENTRA’s dependency parser. To do this, the syntactic tree is transformed into a graph of words and the dependencies between them, verb arguments and their roles are detected, and complex facts such as comparisons are extracted and added to the semantic graph. The semantic graph simplifies further analysis, because it lets later steps work with a highly generalized representation of situations (frames) instead of their many possible syntactic representations.
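The benefit of frames is easiest to see with an active and a passive sentence that describe the same situation. The sketch below assumes dependency edges labeled with Universal Dependencies relations (`nsubj`, `obj`, `nsubj:pass`, `obl:agent`); the `ROLE_OF` mapping and `to_frames` function are hypothetical simplifications that only capture agent and patient, not SENTRA’s role inventory. "Oracle acquired Sun" and "Sun was acquired by Oracle" collapse to the same frame.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Frame:
    predicate: str
    agent: Optional[str]
    patient: Optional[str]

# Hypothetical mapping from Universal Dependencies relations to semantic roles.
ROLE_OF = {"nsubj": "agent", "obj": "patient",
           "nsubj:pass": "patient", "obl:agent": "agent"}

def to_frames(edges):
    """Collapse (head, relation, dependent) lemma triples into role frames;
    relations without a role (aux:pass, case, ...) are ignored."""
    roles = {}
    for head, rel, dep in edges:
        if rel in ROLE_OF:
            roles.setdefault(head, {})[ROLE_OF[rel]] = dep
    return [Frame(pred, r.get("agent"), r.get("patient")) for pred, r in roles.items()]

active = [("acquire", "nsubj", "Oracle"), ("acquire", "obj", "Sun")]
passive = [("acquire", "nsubj:pass", "Sun"), ("acquire", "aux:pass", "be"),
           ("acquire", "obl:agent", "Oracle"), ("Oracle", "case", "by")]
print(to_frames(active) == to_frames(passive))
```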
6) Anaphora resolution
At this step, indirect references to entities are found in the text. SENTRA’s anaphora resolution engine is based on state-of-the-art rule-based anaphora resolution algorithms and is extended to support possessive pronouns, definite descriptions, quoted speech, cross-sentence references, appositions, etc. Like all the components described above, the anaphora resolver incorporates domain-specific lexicons and rules to achieve higher quality.
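As a rough picture of rule-based resolution, the sketch below links each pronoun, including possessives, to the most recent preceding entity mention whose type agrees with it, across sentence boundaries. This naive recency-plus-agreement heuristic is far simpler than the algorithms the engine is described as using, and the `PRONOUN_TYPES` table and `resolve` function are hypothetical. Entity types are assumed to have been assigned upstream.

```python
# Hypothetical agreement table: pronoun -> entity types it may refer to.
PRONOUN_TYPES = {
    "it": {"ORG", "PRODUCT"}, "its": {"ORG", "PRODUCT"},
    "he": {"PER:m"}, "him": {"PER:m"}, "his": {"PER:m"},
    "she": {"PER:f"}, "her": {"PER:f"},
}

def resolve(tokens):
    """tokens: (text, entity_type or None) pairs for a whole article.
    Returns {pronoun index: antecedent index}, choosing the most recent
    earlier mention whose type agrees with the pronoun."""
    links, history = {}, []  # history: indices of entity mentions seen so far
    for i, (text, etype) in enumerate(tokens):
        wanted = PRONOUN_TYPES.get(text.lower())
        if wanted:
            for j in reversed(history):
                if tokens[j][1] in wanted:
                    links[i] = j
                    break
        elif etype:
            history.append(i)
    return links

# "Samsung hired Anna. She said its profits rose."
tokens = [("Samsung", "ORG"), ("hired", None), ("Anna", "PER:f"), (".", None),
          ("She", None), ("said", None), ("its", None), ("profits", None),
          ("rose", None), (".", None)]
print(resolve(tokens))
```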
7) Object-to-sentiment association
This subsystem calculates and assigns sentiment to each mention of an entity in a sentence. It relies on the results of the sentiment detector and the anaphora resolver. It is implemented as a separate inference engine that also relies on the principle of compositional semantics, but uses a broader set of facts from the semantic graph, including word senses, semantic roles of verb arguments, etc.
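One way to see why semantic roles matter here: in "Samsung beat Apple" the same verb is good news for its agent and bad news for its patient. The sketch below assumes frames given as `(predicate, {role: mention})` pairs and a coreference map from mentions to entities; the `ROLE_EFFECT` table and `associate` function are hypothetical and much cruder than an inference engine, serving only to show role-dependent sentiment combined with anaphora links.

```python
from collections import defaultdict

# Hypothetical knowledge: how each predicate affects each semantic role.
ROLE_EFFECT = {
    "beat": {"agent": 1.0, "patient": -1.0},  # "A beat B": good for A, bad for B
    "criticize": {"patient": -1.0},
    "praise": {"patient": 1.0},
}

def associate(frames, coref):
    """frames: (predicate, {role: mention}) pairs; coref maps a mention
    (e.g. a pronoun) to its entity. Returns summed sentiment per entity."""
    scores = defaultdict(float)
    for predicate, roles in frames:
        for role, mention in roles.items():
            effect = ROLE_EFFECT.get(predicate, {}).get(role, 0.0)
            if effect:
                scores[coref.get(mention, mention)] += effect
    return dict(scores)

# "Samsung beat Apple. Analysts criticized it."  ("it" resolved to Apple)
frames = [("beat", {"agent": "Samsung", "patient": "Apple"}),
          ("criticize", {"agent": "Analysts", "patient": "it"})]
print(associate(frames, {"it": "Apple"}))
```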
8) Transaction generation
This step converts the internal, derived semantic knowledge representation into a format suitable for storage in a structured database (DB).
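A transaction record might look like one row per entity mention with its sentiment. The sketch below uses Python’s built-in `sqlite3` with an in-memory database purely for illustration; the `entity_sentiment` table, its columns and the `store_transactions` function are hypothetical, since the document does not specify the DB or schema. All rows for one document are committed together.

```python
import sqlite3

def store_transactions(conn, doc_id, records):
    """records: (sentence_index, entity, sentiment) tuples for one document."""
    with conn:  # commit all rows for the document together, or roll back
        conn.execute("""CREATE TABLE IF NOT EXISTS entity_sentiment (
                            doc_id TEXT, sentence_idx INTEGER,
                            entity TEXT, sentiment REAL)""")
        conn.executemany("INSERT INTO entity_sentiment VALUES (?, ?, ?, ?)",
                         [(doc_id, i, e, s) for i, e, s in records])

conn = sqlite3.connect(":memory:")
store_transactions(conn, "news-001", [(0, "Samsung", 1.0), (1, "Apple", -2.0)])
rows = conn.execute("SELECT entity, SUM(sentiment) FROM entity_sentiment "
                    "GROUP BY entity ORDER BY entity").fetchall()
print(rows)
```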




