ABCs of High-Speed Document Scanners

Paper scanners are a critical element of document imaging or document management systems as they are the primary method of converting paper to a digitized representation. The quality of scanned documents can make or break a document management system. However, potential users often just accept the scanner that the selected vendor offers and do not include it as part of the vendor selection criteria. We believe that it is essential to evaluate the scanner you intend to use as an integral part of the overall imaging system prior to purchase.

High speed scanners will convert a letter sized document to digital representation in approximately 1-2 seconds at 200 dpi depending on the model of the scanner (very high speed scanners will scan more than 2 pages per second). Typically a mid- to high-speed scanner can be justified when more than 500 pages a day are to be scanned, but large installations will scan 10,000 or more pages a day. As these scanners assume a level of centralized scanning, a user may need to consider purchasing such a scanner when moving from a small individual filing system to a more centralized document imaging system.

The term "scanner" is used by many different industries to define their product. Even when the term is used for document scanners, it can mean different things: image scanner or OCR scanner, flatbed scanner or high-speed scanner with a paper transport, hand-held scanner, film scanner, etc.

Most document scanners, however, are composed of a few similar components: camera, image processor, illumination, paper transport, interface to host, and possibly an automatic feeder and a stacker. Some add further processes or features such as compression, Optical Character Recognition, grayscale or color processing, endorsers and double-feed detection. The objective of this paper is to describe these components with emphasis on high-speed document scanning.

The Camera:
To create an image of a document, one must convert the information on a page to a representation using small dots. In a black and white scanner the document is broken into 200 or 300 black and white dots per inch (known as dpi or dots/inch), which are in turn converted into on and off bits in the computer memory. In a grayscale scanner each 'dot' is given a value, usually one byte (8 bits), which can represent up to 256 different shades. In a color scanner each dot contains a red, a green and a blue value (RGB), each represented by a byte with up to 256 shades. This gives a potential of more than 16 million different colors and is known as 24-bit color.
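The bit depths above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names, the convention that 1 means a black dot, and the most-significant-bit-first packing are assumptions made for the example, not something a particular scanner prescribes.

```python
def shades(bits_per_channel):
    """Number of distinct values a channel of the given bit depth can hold."""
    return 2 ** bits_per_channel


def pack_bitonal(dots):
    """Pack a line of dots (1 = black, 0 = white) into bytes.

    Assumption for this sketch: 8 dots per byte, leftmost dot in the
    most significant bit, and a short final group padded with white.
    """
    packed = bytearray()
    for start in range(0, len(dots), 8):
        group = list(dots[start:start + 8])
        group += [0] * (8 - len(group))  # pad the last group to 8 dots
        value = 0
        for dot in group:
            value = (value << 1) | dot
        packed.append(value)
    return bytes(packed)


gray_shades = shades(8)        # one byte per dot
color_values = shades(8) ** 3  # one byte each for red, green and blue
line = pack_bitonal([1, 0, 0, 0, 0, 0, 0, 1, 1, 1])
```

Ten bitonal dots fit in two bytes here, whereas the same ten dots would need 10 bytes in grayscale and 30 bytes in 24-bit color.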

To capture these dots, the document is moved in front of the camera or, in the case of a flatbed scanner, the camera is moved across the page. The scanned lines are built up in a page memory, which represents the complete digital image of the page.

Most document scanners today use CCD (charge coupled device) sensors. A linear CCD is a line of sensors looking simultaneously at each line on the document. Contact CCDs, which are often used in low to mid-range scanners, are built with the illumination in the sensor and require that the paper moves right over the sensor, while regular CCDs require a lens to project the image of the document onto the sensor. Either way, the CCD sensor captures the light reflected from each spot on the document to determine its shade. The light creates an electrical charge, which is converted internally to a digital value whose form depends on the type of scanner. In a black and white scanner the value simply sets a bit on or off.

VisionShape, and some other vendors, combine both technologies in the same scanner, using a high speed contact CCD for the back side of the document. This significantly reduces the size of the transport compared to two complete optical paths used otherwise and allows for a simple upgrade.

Currently 2D arrays, which are used in video and still digital cameras to capture vertically and horizontally at the same time, do not really have enough resolution for document scanning, but this technology is moving fast and high-end still cameras are just around the corner.

Image Resolution:
Using linear sensors, the horizontal image resolution is determined by the number of "sensors" sampling the line. The maximum horizontal resolution therefore is controlled by the number of sensors in the CCD. Vertical resolution is controlled by how fast the paper moves past the CCD.

For example, if the linear CCD has 2,000 elements, it will convert a line 20 cm long to an image with a resolution of 10 points per mm (p/mm). Sensors used today usually have 1728, 2048, or 4096 pixels depending on the scanner and will drop the extra pixels when less than the maximum resolution is required. A 4K array (4096 pixels) allows the scanning of an A4 or letter (8 1/2 x 11 inch) sized page at 400 dots per inch or 16 points per mm (p/mm), but by dropping half of its pixels it captures the image at 200 dpi (8 p/mm). Most modern monochrome document scanners offer 8, 10, 12 and 16 p/mm on up to A3 formats using 5K arrays. Larger document scanners (A0, A1) use several arrays and stitch the images together.

Since resolution affects the image size in both directions (8x8, 12x12), an 8 p/mm (200 dpi) raw black and white image of an A4 page will take up nearly 0.5 MB of storage, while a 12 p/mm (300 dpi) raw image will take up a little over 1 MB.

From a viewing standpoint, the quality required depends on a combination of the number of shades captured and the resolution. While in principle the more points per mm the better, when scanning images in black and white for viewing only, 8 p/mm will do in most cases. On the other hand, for OCR applications, scanning at 12 p/mm is recommended, especially if the text is smaller than 12 points.
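The resolution and storage figures in this section can be reproduced with simple arithmetic. The sketch below assumes an A4 page of 210 x 297 mm and one bit per dot with each scan line padded to a whole byte; the function names are hypothetical.

```python
import math


def max_resolution(sensor_elements, line_width_mm):
    """Highest horizontal resolution (points per mm) a linear sensor can give."""
    return sensor_elements / line_width_mm


def raw_bitonal_bytes(width_mm, height_mm, points_per_mm):
    """Uncompressed size of a black and white page, one bit per dot."""
    columns = round(width_mm * points_per_mm)
    rows = round(height_mm * points_per_mm)
    bytes_per_row = math.ceil(columns / 8)  # pad each line to a whole byte
    return bytes_per_row * rows


example_resolution = max_resolution(2000, 200)  # 2,000 elements over 20 cm
a4_at_8 = raw_bitonal_bytes(210, 297, 8)        # about 200 dpi
a4_at_12 = raw_bitonal_bytes(210, 297, 12)      # about 300 dpi
```

Going from 8 to 12 p/mm multiplies the dot count by (12/8) squared, or 2.25, which is why the raw image more than doubles in size.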

When scanning using grayscale or color a lower resolution can be used for viewing, but effective OCR will require conversion to black and white and a higher resolution.

Most scanners offering higher than 600 dpi resolution do so by artificially increasing the number of dots using a technique called interpolation, which adds dots in between based on what is on each side. In digital cameras this is known as 'digital zoom.' Sometimes scanner manufacturers -- particularly Canon -- will quote a speed based on interpolation, so they will be scanning at, say, 150 dpi vertical resolution and interpolating this to 300 dpi while quoting the speed for 300 dpi. Because interpolated dots are estimated rather than scanned, this method can lose detail and reduce the legibility of small fonts. Potential users should ensure that the speeds being quoted apply to the optical resolution in order to gain a valid comparison. VisionShape quotes all its speeds at true optical resolution.
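To make the idea of interpolation concrete, here is a minimal Python sketch that roughly doubles the vertical resolution of a grayscale image by inserting an averaged line between each pair of scanned lines. Simple averaging is an assumption chosen for illustration; actual scanners may use other interpolation schemes.

```python
def interpolate_rows(rows):
    """Insert one averaged line between each pair of scanned lines."""
    if not rows:
        return []
    result = []
    for upper, lower in zip(rows, rows[1:]):
        result.append(upper)
        # The new line is estimated from its neighbours, not scanned.
        result.append([(a + b) // 2 for a, b in zip(upper, lower)])
    result.append(rows[-1])
    return result


scanned = [[0, 0], [255, 255]]      # a black line followed by a white line
doubled = interpolate_rows(scanned)
```

The inserted line is a mid-gray that was never on the page, which illustrates why interpolated resolution carries no more real detail than the optical scan.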

The high performance color scanner from Kodak scans at a maximum of 150 dpi in order to gain performance and reduce the size of images. In most cases this provides a perfectly legible image, but it can cause problems if the user wants to convert the image to text using OCR.

Image Enhancement, Gray Scale and Color:
As noted above, scanner cameras can provide more than bitonal (black or white) results. Grayscale or color images take up a lot more memory and storage space than black and white. Most imaging systems are not directly using grayscale images, although some are now beginning to use color for archiving. Grayscale has been used internally by OMR scanners for a number of years because the larger number of shades allows more processing choices. For example, a pencil mark scanned in black and white may not show up dark enough to be converted to a black pixel, but if scanned in 16 shades of gray (4 bits/pixel), it may be dark enough to appear in the image.

Filters, image enhancement and thresholding programs are used to convert grayscale images to bitonal, allowing better separation between background and actual data, light data and noise etc. A variety of thresholding algorithms are usually offered with scanners, but they act on a complete image. Thus, because most documents have a large degree of text, image enhancement is normally optimized for the curves found on characters. As a result, image enhancement can sometimes have an adverse effect on other applications.
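A global threshold, the simplest of these conversions, can be sketched in a few lines of Python. The example assumes 16 shades of gray with 0 as black and 15 as white, and the pixel values for paper, pencil and ink are made up for illustration.

```python
def threshold(gray, level):
    """Convert grayscale to bitonal: 1 (black) where a pixel is darker than level."""
    return [[1 if pixel < level else 0 for pixel in row] for row in gray]


# One scan line: white paper, a faint pencil mark, and printed ink.
scan_line = [[15, 9, 2]]

midpoint = threshold(scan_line, 8)   # pencil mark is lost
raised = threshold(scan_line, 12)    # pencil mark is kept
```

Because one level is applied to the whole image, raising it to keep the pencil mark will also turn any background darker than that level black.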

For example, when attempting edge enhancement (to better define actual data from background), the algorithm may tend to fill in gaps between bars of a barcode and make it impossible for the barcode to be read. Some applications -- particularly those that contain photographs -- require color or gray scale. One example is publishing, but little publishing software actually uses gray scale. Most publishing programs use pseudo-gray or half-toned images to represent gray level by using a pattern of dots in a screen.
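The half-toning mentioned above can be illustrated with ordered dithering, one common screening technique. The sketch assumes 8-bit grayscale input (0 = black, 255 = white) and a 2x2 Bayer matrix; publishing software typically uses larger screens, so treat this as a toy example.

```python
BAYER_2X2 = [[0, 2],
             [3, 1]]


def ordered_dither(gray):
    """Turn 8-bit grayscale into a bitonal dot pattern (1 = black)."""
    result = []
    for y, row in enumerate(gray):
        out_row = []
        for x, pixel in enumerate(row):
            # Each position in the 2x2 screen gets its own threshold:
            # 32, 160, 224 or 96 on the 0-255 scale.
            level = (BAYER_2X2[y % 2][x % 2] + 0.5) * 64
            out_row.append(1 if pixel < level else 0)
        result.append(out_row)
    return result


mid_gray = ordered_dither([[128, 128], [128, 128]])
```

A uniform mid-gray patch comes out as a checkerboard with half of its dots black, so the eye still perceives gray even though each dot is only black or white.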

The High Cost of Rescanning:
Despite high performance image processing, scanners tend to operate best in a narrow range of contrast. This means that for a certain setting of darkness (contrast) the image processor in the scanner tends to perform very well but once the range of background and darkness of the documents scanned varies too much, regardless of the amount of processing, the image will be too dark or too light. This results in a trial and error situation where the operator needs to change the darkness setting on the scanner when the image is too dark or too light.

Rescanning is expensive. The images must first be checked for quality (although there are some techniques to partially automate this), and in mixed batches each image must be looked at. Then the operator must extract the bad page from the stack and rescan it using changed settings. Sometimes it is necessary to do this 3 or 4 times.

New technologies try to alleviate this problem by either pre- or post-processing the images. Kofax introduced a technology called VRS (Virtual ReScan), which is supported by a few scanner manufacturers. With VRS, grayscale is captured from the scanner onto a proprietary Kofax board in the PC. Once there, doubtful images can be identified automatically from the histogram and individually processed (thresholded) to select a better level of contrast without rescanning. It requires a proprietary video interface cable and quite an expensive board.
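How a threshold might be chosen from a histogram can be sketched with Otsu's method, a well-known published technique. This is not Kofax's algorithm, which is proprietary; it is a stand-in that assumes 8-bit grayscale with 0 as black.

```python
def histogram(gray, levels=256):
    """Count how many pixels fall on each gray level."""
    counts = [0] * levels
    for row in gray:
        for pixel in row:
            counts[pixel] += 1
    return counts


def otsu_level(counts):
    """Pick the level that best separates dark from light pixels.

    Pixels at or below the returned level form the dark class.
    """
    total = sum(counts)
    sum_all = sum(level * n for level, n in enumerate(counts))
    best_level, best_score = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level, n in enumerate(counts):
        dark_count += n
        dark_sum += level * n
        light_count = total - dark_count
        if dark_count == 0:
            continue
        if light_count == 0:
            break
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        score = dark_count * light_count * (dark_mean - light_mean) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


# Two hypothetical pages: crisp ink on white, and faded ink on gray paper.
crisp = [[200] * 9 + [40] for _ in range(10)]
faded = [[230] * 9 + [150] for _ in range(10)]
crisp_level = otsu_level(histogram(crisp))
faded_level = otsu_level(histogram(faded))
```

Each page gets its own level from its own histogram, so the faded page is thresholded much higher than the crisp one without the operator touching the darkness setting.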

VisionShape and Kodak have developed alternative technologies known as NRS (NoReScan) in the case of VisionShape and Perfect Page in the case of Kodak. In these cases the scanner directly analyzes multiple levels of gray and selects the best level before thresholding, eliminating operator intervention. These solutions are designed to acquire a good quality, readable black and white image -- because black and white provides compact images that can be compressed in a lossless fashion.

Color scanning goes some way towards removing the need for these types of technologies. Because color provides multiple shades, it is easy for the eye to identify the different elements even on the most challenging document without rescanning. But color images are very large and color compression standards are lossy (see compression below).

Illumination:
For a CCD to see a dot on a page, the page must be illuminated with a bright light source. It is the reflection of that light which is captured and analyzed by the CCD. When paper is too thin, the light may go through the paper and reflect on a white background, creating character bleed-through. Black background scanners do not have this problem. However, a black background creates a black border in the image at the paper's edge. Most high-speed scanners use fluorescent lamps running at a very high frequency, so as not to create patterns. The light output of fluorescent lamps is strongest in the middle, fading out toward the ends. For this reason, the lamps are often wider than the maximum paper width and the paper is always centrally fed (see autofeeders below). Some scanners have other light sources, such as tungsten bulbs or LEDs, which allows them to align the paper to an edge.

Since the color of the light source determines which colors on the paper are visible or not to the scanner, white lamps will be neutral and see all colors on the paper. However, white fluorescent lamps tend to lose brightness faster than the green phosphor lamps, which are used in most high-speed scanners. When using green light sources, scanners tend to drop light green and light blue colors. When performing forms processing for OCR or OMR purposes, it is often better to remove the background of the form. By using a pastel blue or green color form with green lamps, this can be easily achieved. The traditional drop-out color is, however, red. Many health claim forms, questionnaires, or tax forms use red for the background of the form. Most scanner vendors thus sell special models or offer options for their scanners, such as "red-ink drop out scanners" which use red lamps for illumination.

And now color scanners:
In a forms processing application one of the most important justifications for color scanners is to drop a color. Forms processing software utilizes recognition technologies to eliminate key entry and this is often impaired by lines and boxes in the background. Using pastel colors or the right color of lamp will allow a bitonal scanner to drop a given color, but this is rather cumbersome when different colors of a form need to be scanned on the same machine and colors need to be matched to the scanner. Using a color image, software algorithms can eliminate specific colors from the image.
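Dropping a color in software can be sketched as follows. This minimal Python example assumes 24-bit RGB pixels held as tuples, a simple per-channel distance, and made-up color values for the form lines and the ink; the tolerance and threshold level are illustrative, not recommended settings.

```python
WHITE = (255, 255, 255)


def color_distance(a, b):
    """Sum of per-channel differences between two RGB values."""
    return sum(abs(x - y) for x, y in zip(a, b))


def drop_color(image, drop_rgb, tolerance=60):
    """Replace every pixel close to drop_rgb with white."""
    return [
        [WHITE if color_distance(pixel, drop_rgb) <= tolerance else pixel
         for pixel in row]
        for row in image
    ]


def to_bitonal(image, level=128):
    """Threshold on average brightness: 1 (black) for dark pixels."""
    return [[1 if sum(pixel) // 3 < level else 0 for pixel in row]
            for row in image]


form_red = (200, 30, 30)   # hypothetical color of the form's boxes
ink = (20, 20, 20)
row = [[form_red, ink, WHITE]]

without_dropout = to_bitonal(row)
with_dropout = to_bitonal(drop_color(row, form_red))
```

Since the color to drop is just a parameter, forms printed in different colors can be handled on the same machine by changing a setting rather than a lamp.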

Color scanners have other uses too. Separator pages can be color-coded, making them faster to insert. Colored graphs or printed presentations that need to be archived can be easily viewed. Key entry of certain documents such as airline tickets, which have red text on a pink background, can be improved. Colored endorsements, such as those on the backs of checks, can be easily discerned. And of course, a color image is more human friendly than a black and white one.

Most color scanners are low cost, high resolution, low-speed devices or very expensive fast production machines for forms. However mid-range color scanners are now appearing on the market, opening the door for non-form applications such as archiving and web publishing. As the number of applications increases, so will the number of scanners sold and the price will consequently decrease. Expect 2001 to be the year of the mid-range color scanner for many vendors.

Interfacing the Scanner:
Scanners need a certain degree of intelligence to perform their basic functions. Usually these functions are commanded by a PC host computer. A scanning sub-system is composed of a scanner, an interface card (to the host) and a scanning program which communicates the user's requirements to the scanner and generates the appropriate output. However, newer technologies have now appeared on the market, eliminating the requirement for proprietary interface cards. USB, for example, is now available in many lower-end scanners and in VisionShape's 90 PPM scanners.

A selection menu is available on the scanner for most options, such as resolution, contrast selection, paper format, whether the image is bitonal or half-toned, and whether an autofeeder is used or not. However, most scanning programs override these scanner choices and control the selection from the host.

Two types of data travel between the host computer and the scanner via the interface card: image data and control data. The control data, initiated by the host, selects the scanning options and tells the scanner when to start and stop scanning. Scanned image lines are then sent back to the host. This may require that a special interface card be present in the PC. Most modern scanners, however, provide a standard SCSI interface. Depending on the variant of SCSI used, this may imply that the image is compressed before it is transferred to the PC. Video/RS232 is a method that has the advantage of almost unlimited speed but requires management of the image, including compression, in the PC.

Most scanners support one type of interface, although Fujitsu now offers both video (through a special plug-in board) and SCSI. Capture applications have been developed assuming a certain interface. VisionShape supports all three interfaces (USB, SCSI and Video) on its tri-bus architecture.

Where to Compress:
A scanner interface card provides a hand-shake between the scanner and the computer, sending the command from the host and then receiving the image data from the scanner and placing it into the computer memory.

Raw image records are very large but contain a lot of irrelevant data (usually the white space between the text). Compression replaces each run of consecutive pixels of the same color with a count of those pixels rather than storing the pixels themselves. An algorithm calculates the number of consecutive black or white dots and stores that value. The more white space between text, the better the compression. On the other hand, a grayish background may compress very poorly depending on the algorithms used and the scanner's threshold settings.
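The counting step described above is run-length encoding, and can be sketched in Python as follows. The sketch assumes each line is a list of 0 (white) and 1 (black) dots and that runs alternate starting with white. Fax-style compression goes further by replacing the counts with short variable-length codes, which is not shown here.

```python
def run_lengths(line):
    """Encode a line of dots as alternating run lengths, white run first."""
    runs = []
    current, count = 0, 0
    for dot in line:
        if dot == current:
            count += 1
        else:
            runs.append(count)
            current, count = dot, 1
    runs.append(count)
    return runs


def decode_runs(runs):
    """Rebuild the original line of dots from its run lengths."""
    line, color = [], 0
    for count in runs:
        line.extend([color] * count)
        color = 1 - color
    return line


text_line = [0] * 10 + [1] * 3 + [0] * 7   # white margin, a stroke, white
noisy_line = [0, 1] * 10                   # speckled grayish background

text_runs = run_lengths(text_line)
noisy_runs = run_lengths(noisy_line)
```

Both lines hold twenty dots, but the clean line needs three counts while the speckled one needs twenty, which is why a noisy background compresses so poorly.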

In bitonal (black and white) images, compression can be one-dimensional in the horizontal direction (Group 3, as used by fax machines) or two-dimensional in both horizontal and vertical directions (Group 4). G3 and G4 refer to the compression method used and are standards set by the International Telegraph and Telephone Consultative Committee (CCITT), now the ITU-T. In addition to the compressed image, an image record usually contains a header that identifies the image parameters such as resolution, height, width and other relevant information. Most vendors use a standard called TIFF, which stands for Tagged Image File Format. A G4 TIFF image is thus an image compressed using the two-dimensional G4 algorithm with a header that complies with the TIFF convention. Several vendors sell systems that use proprietary header formats but still use G3 or G4 compression. G3 and G4 compression is lossless, which means that when the image is reconstructed all the original information is still there. A TIFF Group 4 textual image will normally be about 10% of the original size.

Grayscale or color is normally compressed using JPEG (Joint Photographic Experts Group). As the name implies, this compression was developed to reduce the size of photographs and so encodes the change of shades. JPEG is a lossy compression method because it deliberately normalizes subtle shadings into the same color. The user can select the amount of compression and therefore how many shades are lost -- in a document, a 10% loss factor will normally result in an image size 10% of the original. At that setting the loss will normally not remove meaningful data.

In the early days of imaging, special chips were required to compress a raw image into a compressed record at scanner speeds. However, with the advent of today's high-speed microprocessors, these chips are of little value for the compression and decompression of images. Simple interface cards such as Dunord or Xionics can transfer image data to the PC memory, where a compression software program compresses the image at the same speed as, if not faster than, most compression boards for a fraction of the cost. Most scanners offer the option to compress the image in the scanner. In effect, a complete compression board is integrated in the scanner. One of the advantages of this method is that it allows the use of a more standard communication interface such as slow SCSI or USB to transfer the data to the host PC.
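The effect of compressing in the scanner on the interface can be estimated with a rough calculation. The sketch below reuses figures quoted earlier in this paper (a 90 PPM scanner, about 0.5 MB for a raw bitonal page at 200 dpi, and Group 4 text images at about 10% of the original); the function name is hypothetical and real throughput depends on the documents scanned.

```python
def bytes_per_second(pages_per_minute, bytes_per_page):
    """Sustained data rate needed to keep up with the paper transport."""
    return pages_per_minute / 60 * bytes_per_page


RAW_PAGE_BYTES = 500_000   # about 0.5 MB raw bitonal page at 200 dpi
G4_RATIO = 0.10            # Group 4 text image at about 10% of the original

raw_rate = bytes_per_second(90, RAW_PAGE_BYTES)
compressed_rate = bytes_per_second(90, RAW_PAGE_BYTES * G4_RATIO)
```

Under these assumptions the interface must sustain about 750 KB per second for raw images but only about 75 KB per second when the scanner compresses first, which is what makes a slower standard interface workable.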

Daniel Borrey