ImaproFX – Image Processing Framework

Grasping the essence of image processing and analysis algorithms can be difficult. To make such algorithms easier to understand, sugarcubeIT proposes ImaproFX, a JavaFX user interface that provides dynamic visual feedback through interactive controllers.


The real power of ImaproFX comes from its Java coding framework. Coders can create their own image analysis ribbons simply by extending and loading an abstract “ImaproRibbon” class.
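To give a feel for the extension pattern, here is a minimal sketch in plain Java. The actual ImaproRibbon API is not shown in this text, so `RibbonSketch`, its `title()` and `process()` methods, and the packed-ARGB pixel contract are all hypothetical stand-ins chosen for illustration. The real class may look quite different.

```java
/** Hypothetical stand-in for the framework's abstract ribbon class; the real API may differ. */
abstract class RibbonSketch {
    /** Name displayed for the ribbon (assumed). */
    abstract String title();

    /** Transforms packed ARGB pixels and returns a new array (assumed contract). */
    abstract int[] process(int[] argbPixels);
}

/** Example ribbon: inverts the colour channels and leaves alpha unchanged. */
final class InvertRibbon extends RibbonSketch {
    @Override
    String title() {
        return "Invert";
    }

    @Override
    int[] process(int[] argbPixels) {
        int[] out = new int[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            // Flip the R, G and B bits; the top 8 alpha bits are untouched.
            out[i] = argbPixels[i] ^ 0x00FFFFFF;
        }
        return out;
    }
}
```

The point of such a design is that each ribbon only has to describe its own transformation, while the user interface takes care of loading it and wiring up the controls.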

 

Scan Background Removal

Removing the paper background from scanned documents is not a trivial task. Here is how sugarcube addresses the problem.

Objectives

  • Remove paper background from scanned documents in order to create searchable PDF files composed of binary images together with transparent text layers
  • Binary images use black text on a white background in order to get:
    • a good reading experience
    • a compact file size
  • Develop an automatic process that can be applied to a batch of TIF files

Facts

  • 420,000 TIF images: scanned pages from the “Recueil des lois fédérales” from 1947 to 1998 (German, French and Italian versions)
  • a total amount of 6.8 TB (terabytes)
  • image resolution: 300 DPI, 24-bit RGB

Results

Here are some representative input samples from the corpus with their respective outputs:

How-to

  1. First, we take a scanned TIFF image from the Swiss “Bundesarchiv” (Federal Archives).
  2. Our algorithm then computes the mean background color for small image tiles.
  3. The resulting blocky effect is smoothed using bilinear interpolation.
  4. The algorithm subtracts the background image from the original one, resulting in a light but non-homogeneous background.
  5. A final dynamic gamma correction is applied to get rid of the remaining artefacts.
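
The steps above can be sketched roughly as follows. This is not sugarcube's implementation, only a minimal illustration under several assumptions: it works on a single 8-bit channel stored as `int[][]` (a colour image would be handled per channel), it uses the plain mean of each tile as the background estimate, it re-references the difference to white (255), and it replaces the "dynamic" gamma with a fixed `gamma` parameter. The class and method names are made up for this example.

```java
/** Illustrative sketch of steps 2 to 5 on one 8-bit channel; names and details are assumptions. */
final class BackgroundRemovalSketch {

    /** Step 2: mean value of each tile x tile block (edge tiles may be smaller). */
    static double[][] tileMeans(int[][] img, int tile) {
        int h = img.length, w = img[0].length;
        int rows = (h + tile - 1) / tile, cols = (w + tile - 1) / tile;
        double[][] means = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                long sum = 0;
                int count = 0;
                for (int y = r * tile; y < Math.min(h, (r + 1) * tile); y++) {
                    for (int x = c * tile; x < Math.min(w, (c + 1) * tile); x++) {
                        sum += img[y][x];
                        count++;
                    }
                }
                means[r][c] = (double) sum / count;
            }
        }
        return means;
    }

    /** Step 3: bilinear interpolation between tile centres, giving a full-size background. */
    static double[][] interpolate(double[][] means, int h, int w, int tile) {
        int rows = means.length, cols = means[0].length;
        double[][] bg = new double[h][w];
        for (int y = 0; y < h; y++) {
            // Position of this pixel in tile-centre coordinates, clamped at the borders.
            double fy = clamp((y + 0.5) / tile - 0.5, 0, rows - 1);
            int r0 = (int) fy, r1 = Math.min(r0 + 1, rows - 1);
            double ty = fy - r0;
            for (int x = 0; x < w; x++) {
                double fx = clamp((x + 0.5) / tile - 0.5, 0, cols - 1);
                int c0 = (int) fx, c1 = Math.min(c0 + 1, cols - 1);
                double tx = fx - c0;
                double top = means[r0][c0] * (1 - tx) + means[r0][c1] * tx;
                double bottom = means[r1][c0] * (1 - tx) + means[r1][c1] * tx;
                bg[y][x] = top * (1 - ty) + bottom * ty;
            }
        }
        return bg;
    }

    /** Steps 4 and 5: subtract the background, re-reference to white, apply a fixed gamma. */
    static int[][] subtractAndGamma(int[][] img, double[][] bg, double gamma) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Pixels at or above the background level map to white.
                double v = clamp(255.0 - (bg[y][x] - img[y][x]), 0, 255);
                out[y][x] = (int) Math.round(255.0 * Math.pow(v / 255.0, gamma));
            }
        }
        return out;
    }

    /** Runs steps 2 to 5 in sequence. */
    static int[][] removeBackground(int[][] img, int tile, double gamma) {
        double[][] means = tileMeans(img, tile);
        double[][] bg = interpolate(means, img.length, img[0].length, tile);
        return subtractAndGamma(img, bg, gamma);
    }

    private static double clamp(double v, double lo, double hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

Calling `BackgroundRemovalSketch.removeBackground(channel, 32, 1.0)` would process one channel with 32-pixel tiles and no gamma change. Both values are arbitrary here, and a real pipeline would presumably tune the tile size to the text size and derive the gamma from the image itself.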

Conclusion

Getting rid of the paper background in scanned documents is not a straightforward process. Our experience shows that the variability of scanned documents calls for an adaptive algorithm.

For instance, tuning a single binary threshold on grey-level images is clearly not a viable solution for such a heterogeneous corpus, which contains images subject to luminosity, contrast and hue defects.
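
The problem with a single threshold can be illustrated on synthetic data. The sketch below is a toy example, not taken from the corpus: it builds one scan line whose background fades from 220 to 120 while the text is always 80 grey levels darker, then searches all 256 global thresholds for the one with the fewest misclassified pixels. All names and numbers are assumptions chosen for the illustration.

```java
/** Toy demonstration: a fading background defeats any single global threshold. */
final class GlobalThresholdDemo {

    /**
     * Builds a synthetic scan line of n pixels (n > 1). The background drops linearly
     * by `fade` grey levels from left to right; every fourth pixel is text, 80 levels
     * darker than its local background. isTextOut receives the ground truth.
     */
    static int[] syntheticRow(int n, int fade, boolean[] isTextOut) {
        int[] row = new int[n];
        for (int i = 0; i < n; i++) {
            int background = 220 - (fade * i) / (n - 1);
            isTextOut[i] = (i % 4 == 0);
            row[i] = isTextOut[i] ? background - 80 : background;
        }
        return row;
    }

    /** Smallest number of misclassified pixels over all global thresholds 0..255. */
    static int bestGlobalErrors(int[] pixels, boolean[] isText) {
        int best = Integer.MAX_VALUE;
        for (int t = 0; t <= 255; t++) {
            int errors = 0;
            for (int i = 0; i < pixels.length; i++) {
                boolean predictedText = pixels[i] < t; // darker than t counts as text
                if (predictedText != isText[i]) {
                    errors++;
                }
            }
            best = Math.min(best, errors);
        }
        return best;
    }
}
```

With `fade = 100`, the text on the bright side (140) is lighter than the background on the dark side (about 120), so no threshold classifies every pixel correctly. With `fade = 0`, that is, once the background is uniform, any threshold between the text and background levels separates them perfectly.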