How it Works

  • A User is a developer or a third-party application that uses InfoExtractor’s functionality via REST API to perform a task.
  • Control Server includes the main components of the system. It accepts tasks from client applications via REST API, creates tasks for the Processing Stations, distributes the workload among the available Processing Stations, and interacts with the InfoExtractor Console. Because the Control Server can interact with multiple Processing Stations, the system can process several tasks in parallel and maintain high performance.
  • Processing Station is a component of the system that accepts tasks from the Control Server and performs information extraction.
  • Information Extraction Module is a component that sets the rules for information extraction (including a set of available ontologies). A user may upload an Information Extraction Module via the InfoExtractor Console.
  • InfoExtractor Console is a component used for administering the Control Server. The InfoExtractor Console is used to manage user accounts and licenses and to configure and monitor both the entire system and each individual task.
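To illustrate the dispatching role of the Control Server described above, here is a minimal sketch that models a Control Server handing incoming tasks to the least-loaded Processing Station. The classes `ControlServer` and `ProcessingStation`, and the least-loaded assignment rule, are hypothetical assumptions chosen for illustration; this is not InfoExtractor's actual scheduling algorithm or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessingStation:
    """Hypothetical model of a Processing Station: it only tracks its queued tasks."""
    name: str
    queue: List[str] = field(default_factory=list)


class ControlServer:
    """Hypothetical model of the Control Server's workload-distribution role."""

    def __init__(self, stations: List[ProcessingStation]) -> None:
        if not stations:
            raise ValueError("at least one Processing Station is required")
        self.stations = stations

    def submit(self, task_id: str) -> str:
        # Assumed rule: give the task to the station with the fewest queued
        # tasks; min() returns the first station listed when there is a tie.
        target = min(self.stations, key=lambda s: len(s.queue))
        target.queue.append(task_id)
        return target.name

    def workload(self) -> Dict[str, int]:
        # Report how many tasks are queued on each station.
        return {s.name: len(s.queue) for s in self.stations}


if __name__ == "__main__":
    server = ControlServer([ProcessingStation("ps-1"), ProcessingStation("ps-2")])
    for task in ["doc-a", "doc-b", "doc-c"]:
        server.submit(task)
    print(server.workload())  # {'ps-1': 2, 'ps-2': 1}
```

Adding another `ProcessingStation` to the list spreads new tasks over more stations, which is the intuition behind the claim that multiple Processing Stations help the system keep up with its workload.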

Supported Languages

ABBYY InfoExtractor SDK analyses information in the following languages:

  • English
  • Russian

User Interface and Documentation Languages:

  • English
  • Russian

Input File Formats

  • RTF documents (*.rtf)
  • Microsoft Word 97-2003 documents (*.doc)
  • Microsoft Word documents (*.docx)
  • Microsoft Word XML documents (*.xml)
  • Microsoft Word macro-enabled documents (*.docm)
  • Plain text files (*.txt)
  • Web pages (*.html; *.htm)
  • Microsoft PowerPoint 97-2003 presentations (*.ppt; *.pps)
  • Microsoft PowerPoint presentations (*.pptx; *.ppsx)
  • Microsoft PowerPoint XML presentations (*.xml)
  • Microsoft PowerPoint macro-enabled presentations (*.pptm)
  • Microsoft Excel 97-2003 workbooks (*.xls)
  • Microsoft Excel workbooks (*.xlsx)
  • Microsoft Excel macro-enabled workbooks (*.xlsm)
  • Adobe InDesign Markup (IDML) documents (*.idml)
  • OpenDocument texts (*.odt)
  • OpenDocument presentations (*.odp)
  • OpenDocument spreadsheets (*.ods)
  • Adobe FrameMaker documents (*.mif)
  • Portable Document Format documents with a text layer (*.pdf)
  • Image files and image-only PDF documents (*.pdf, *.jpeg, *.jpg, *.bmp, *.gif, *.tif, *.tiff, *.png, *.djvu, *.dcx, *.dib, *.jb2, *.j2k, *.jpf, *.jpx, *.pcx, *.wdp)
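A client application may want to reject obviously unsupported files before sending them via REST API. The sketch below does this with a simple extension check built from the list above. The names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical and not part of the InfoExtractor SDK; the Control Server remains the authority on whether a file can actually be processed.

```python
from pathlib import Path

# Hypothetical client-side list, copied from the "Input File Formats" section.
# .pdf appears both as a text document (with a text layer) and as an image
# format, and .xml covers both Word and PowerPoint XML files, so an extension
# alone does not tell you how the file will be processed.
SUPPORTED_EXTENSIONS = frozenset({
    ".rtf", ".doc", ".docx", ".xml", ".docm", ".txt", ".html", ".htm",
    ".ppt", ".pps", ".pptx", ".ppsx", ".pptm",
    ".xls", ".xlsx", ".xlsm", ".idml", ".odt", ".odp", ".ods", ".mif",
    ".pdf", ".jpeg", ".jpg", ".bmp", ".gif", ".tif", ".tiff", ".png",
    ".djvu", ".dcx", ".dib", ".jb2", ".j2k", ".jpf", ".jpx", ".pcx", ".wdp",
})


def is_supported(filename: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["Report.DOCX", "scan.tiff", "archive.zip", "README"]:
        print(name, is_supported(name))
```

Checking only the last suffix keeps the helper simple; a file such as `notes.txt.zip` is treated as a `.zip` file and rejected.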