Sugarcube | Document Information Technologies

Municipalities and large companies (insurers, banks) may have to handle huge numbers of invoices from a handful of providers (phone, internet connection, electricity, gas, water, etc.). Manually entering all the invoice data into a computer is a tedious task that can now be alleviated! Building on sugarcube’s APIs, a French startup has designed a complete workflow that automatically retrieves all relevant billing information from PDF invoices in order to feed their customer databases. Want to know more about it? Contact us

Using sugarcube’s APIs, our team developed a standalone tool that extracts official advertisements from newspapers in PDF format. When a PDF file is dropped into a “hotfolder”, a background process seamlessly extracts all the official ads in the newspaper and generates structured XML content from them. The image alongside shows our tool recognizing ad locations and structures (colored lines and frames). Our solution now saves our customer a great deal of cumbersome work: manually spotting each ad in a newspaper and copying and pasting its fields into data files has become a thing of the past! Want to know more about it? Contact us
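As an illustration only, the hotfolder pattern described above can be sketched in a few lines of Python. This is not our tool: the names `scan_hotfolder` and `placeholder_extract` are hypothetical, and the ad-recognition step is replaced by a stand-in that returns a dummy XML string.

```python
from pathlib import Path
from typing import Callable, List


def scan_hotfolder(hotfolder: Path, outdir: Path,
                   extract: Callable[[Path], str]) -> List[Path]:
    """Convert every PDF in the hotfolder that has no XML output yet.

    Returns the list of XML files written during this pass.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(hotfolder.glob("*.pdf")):
        target = outdir / (pdf.stem + ".xml")
        if target.exists():
            continue  # already handled on an earlier pass
        target.write_text(extract(pdf), encoding="utf-8")
        written.append(target)
    return written


def placeholder_extract(pdf: Path) -> str:
    # Hypothetical stand-in: the real ad-detection logic is not shown here.
    return '<ads source="%s"/>\n' % pdf.name
```

A background process would simply call `scan_hotfolder` at regular intervals (for example every few seconds), so that each newly dropped PDF is picked up on the next pass.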

Getting data from PDF is far from straightforward. To ease the pain, sugarcube’s team developed a tool called Dexter, which converts any PDF file to OCD, an XML description of the PDF. Extracting and handling data from PDF has never been so easy. Starting from OCD, a partner developed a proprietary tool that spots and extracts specific tables of contents from PDF documents. Want to know more about it? Contact us

Getting data from PDF is far from straightforward. To ease the pain, sugarcube’s team developed a tool called Dexter, which converts any PDF file to OCD, an XML description of the PDF. Extracting and handling data from PDF has never been so easy. Starting from OCD, we developed an ad hoc solution to extract tabular data from a logging document covering 10 years for a US institute. The result was written to a CSV file containing about 25,000 records. Want to know more about it? Contact us


Task Target

  • Batch-process images to obtain optimized contrast, i.e., dark content on a white background.
  • Caution! We do not want to binarize the image, but to remove its background while keeping all text and image content dark and antialiased!
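One common way to meet such a target is a levels adjustment (linear contrast stretch) rather than a threshold. The sketch below is an assumption about how this could be done, not a description of our actual processing: it works on a grayscale image given as rows of 0 to 255 values, and the `black_point` and `white_point` defaults are arbitrary illustrative values that would need tuning per batch.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def stretch_levels(image: Image, black_point: int = 60,
                   white_point: int = 200) -> Image:
    """Push the background to white and dark content to black.

    Values at or above white_point become 255, values at or below
    black_point become 0, and everything in between is rescaled
    linearly, so antialiased edges keep their intermediate grays
    (unlike binarization, which leaves only two values).
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("expected 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in image:
        new_row = []
        for value in row:
            clipped = min(max(value, black_point), white_point)
            new_row.append(round((clipped - black_point) * 255 / span))
        result.append(new_row)
    return result
```

With the default points, a light gray background pixel of 230 becomes pure white, a dark stroke pixel of 10 becomes pure black, and edge pixels in between stay gray, which is what keeps the text antialiased.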