Glossary of Terms



Ad Hoc Capture Capturing individual documents as needed or capturing documents for individual use, as opposed to batch, production-level capture.


ADF The Automatic Document Feeder on a scanner. The feeder holds multiple pages and automatically loads one page after another into the scanner. Sometimes called an auto feeder.


Annotation For scan operations, a text annotation is a string of characters permanently added to an image before the image is compressed and written to disk.


Aspect Ratio The size relationship between the width and height of an image. When images are resized, distortion will occur if the aspect ratio is not maintained.
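
As an illustration of maintaining the aspect ratio, the following minimal sketch scales the height by the same factor as the width. The function name and the example page size are assumptions made for this example only.

```python
def resize_keeping_aspect(width, height, target_width):
    """Return (new_width, new_height) scaled to target_width without distortion."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    # Apply the same scale factor to both dimensions so the ratio is unchanged.
    scale = target_width / width
    return target_width, round(height * scale)


# A 1700 x 2200 pixel page halved in width is also halved in height.
print(resize_keeping_aspect(1700, 2200, 850))  # (850, 1100)
```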


Auto-Length Detection A scanner feature that detects when it reaches the end of a page. If additional pages remain, scanning continues on the following page.


Automated Document Separation A technique that relies upon software, rather than separator sheets, to determine the break point between multiple documents scanned in the same batch. Typically results in significant savings, both for consumables no longer needed to print separator sheets, and for labour no longer needed to insert and remove them.


Automatic Discrimination A scanner’s ability to sense the difference between text and photos and adjust itself accordingly. When enabled, this feature allows the scanner to switch between line mode and halftone scanning in one pass.



Back-Office Capture Capturing and processing batches of documents in a mid- to high-volume, centralised, production-level environment, as opposed to a front-office environment where the documents originate.


Backfile Conversion The process of scanning and indexing a repository of documents, most commonly paper-based but including microfilm and microfiche, then storing them in a digital format.


Bar Code A method of portraying information as a sequence of machine-readable vertical lines of varying widths. The rectangular black bars and white spaces between them are known as elements. Predetermined groupings of elements form characters as defined for specific bar code types.


Batch A group of one or more documents processed in a single scan operation.


Batch Class A definition of all the settings for processing a batch, including the types of documents in the batch and the processing queues through which they will pass.


Bitonal An image made up of pixels that are either white or black (i.e., an image that includes no shades of grey or colour).
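
One simple way to produce a bitonal image from greyscale data is fixed thresholding. The sketch below is illustrative only: it assumes greyscale values from 0 (black) to 255 (white), a hypothetical threshold of 128, and the convention that 1 means black.

```python
def to_bitonal(grey_rows, threshold=128):
    """Map each greyscale pixel (0-255) to 1 (black) or 0 (white)."""
    # Pixels darker than the threshold become black; all others become white.
    return [[1 if pixel < threshold else 0 for pixel in row] for row in grey_rows]


print(to_bitonal([[0, 200], [130, 90]]))  # [[1, 0], [0, 1]]
```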


Bleed-Through Text or images printed on one side of a page that are visible, in whole or in part, on the other side of the page, often because the paper weight chosen was not heavy enough for the printer. Can cause an imaging system to mistakenly believe that the blank side of a page actually holds content.


Business Process Automation Business Process Automation, or BPA, is the process a business uses to contain costs. It consists of integrating applications, cutting labour wherever possible, and using software applications throughout the organization.



Cache Memory used for the temporary storage of images during scanning. Caching images to an accelerator board can significantly improve scanner performance. Sometimes called prescan cache.


Capture Forms Also known as data capture forms. Typical data capture forms have a fill-in-the-gaps format, often having boxes to enter details of company name, requisition number, etc. The data captured on the forms is then scanned or manually entered into a database.


Capture Software Software licences sold through resellers or directly to end users as a primary capture solution, as opposed to OEM or Point of Sale (POS) Software.


Character Reconstruction The process of rebuilding text characters after a line removal operation (e.g., removing the signature line of a form, typically by using colour dropout, so that only the signature remains).


Colour Dropout The removal of a specific colour from a scanned image. Typically used for pre-printed forms, so that the form itself does not appear in the scanned image, leaving only the information filled in by the user. This increases OCR effectiveness, and decreases data storage and bandwidth requirements.
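
The idea can be sketched as replacing every pixel close to the form's colour with white. This is a simplified illustration, not how any particular scanner implements dropout; the RGB tuple representation, the function name and the tolerance value are all assumptions.

```python
WHITE = (255, 255, 255)


def drop_colour(pixels, dropout, tolerance=40):
    """Replace pixels within `tolerance` of the dropout colour with white."""
    def is_close(pixel):
        # A pixel matches when every RGB channel is near the dropout colour.
        return all(abs(c - d) <= tolerance for c, d in zip(pixel, dropout))

    return [[WHITE if is_close(p) else p for p in row] for row in pixels]


# A red form line is removed; the dark ink filled in by the user is kept.
row = [[(225, 35, 28), (10, 10, 10)]]
print(drop_colour(row, dropout=(220, 30, 30)))
# [[(255, 255, 255), (10, 10, 10)]]
```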


Compression A software or hardware process that shrinks images so they occupy less storage space and can be transmitted faster. Generally, compression is accomplished by removing data that define blank spaces and other redundant information, and replacing them with a smaller symbolic code.
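
Run-length encoding is one of the simplest examples of replacing redundant data with a shorter code. The sketch below encodes one row of black (1) and white (0) pixels as (value, count) pairs. It is illustrative only; the compression schemes used by real imaging systems are more elaborate.

```python
def rle_encode(row):
    """Collapse a row of pixels into (value, run_length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Rebuild the original row from (value, run_length) pairs."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


row = [0] * 12 + [1] * 3 + [0] * 5
print(rle_encode(row))  # [(0, 12), (1, 3), (0, 5)]
```

Twenty pixels are stored as three pairs, and decoding the pairs restores the row exactly.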


Content Management Content Management, or CM, is a set of processes and technologies that support the evolutionary life cycle of digital information. This digital information is often referred to as content or, to be precise, digital content. Digital content may take the form of text (such as documents), multimedia files (such as audio or video files), or any other file type that follows a content life cycle requiring management.


Content Management Systems A Content Management System (CMS), such as a document management system (DMS), is a computer application used to manage the workflow needed to collaboratively create, edit, review, index, search, publish and archive various kinds of digital media and electronic text.



Data Capture Data capture is the identification and extraction of data from a scanned document, often to be sent to a workflow for routing and action as part of a business process.


Data Capture Forms Also known as simply capture forms. Typical data capture forms have a fill-in-the-gaps format, often having boxes to enter details of company name, requisition number, etc. The data captured on the forms is then scanned or manually entered into a database.


Data Entry Software Data Entry Software automates your document flow, which results in reduced processing costs, reduced processing time and improved accuracy. Automatic data entry software enables enormous savings in time and money when entering invoices and forms into a computer system.


Data Management Data management comprises all the disciplines related to managing data as a valuable resource.


Departmental Scanners Scanners that fall between workgroup scanners and production scanners in their pages-per-minute capacity. Departmental scanners typically can scan from 26 to 40 pages per minute.


Distributed Capture The process of scanning documents at geographically distributed locations (e.g., branch offices), then automatically transmitting their digital images via the Internet or a WAN to a central location for further processing, archiving or storage.


Document Capture The process of capturing a digital image of an entire document. Most frequently used for archival storage. (see also: data capture)


Document Imaging Document imaging transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document imaging system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device.


Document Management Document management controls the life cycle of documents in your organization — how they are created, reviewed, published, and consumed, and how they are ultimately disposed of or retained.


Document Management Software Document management software generally consists of a repository and client software designed for the storage and retrieval of documents in electronic formats, including TIFF images, PDF files and a variety of office document formats. Document management differs from content management in that it focuses on document formats, while content management systems are designed to handle many other types of information, including graphics, multimedia and so on.


Document Management Systems A document management system is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents.


Document Scanning Document scanning transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document scanning system includes a document scanner, multifunction peripheral, fax machine or other scanning device, plus software to handle the image produced by the device.


Document Scanning Software Document scanning software transforms paper documents into electronic images that can be used in computer-based business applications and archives.


DPI Dots Per Inch. See resolution.


Driver Software code that enables a computer to communicate with peripheral devices (e.g., scanner, printer).


Duplex Scanning Scanning both sides of a page simultaneously on a single pass through the scanner. Requires a scanner that supports this feature. Compare to: double-sided scanning.


Dynamic Scaling A set of display options that allow scaling factors to be determined by the size of the display window. For example, an image can be displayed so that the entire width of the image fits the width of the display. Or, it can be displayed so that the entire height fits the height of the display.
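
The two fitting options can be sketched as a scale factor derived from the window size. The function and parameter names below are hypothetical and serve only to illustrate the calculation.

```python
def fit_scale(image_w, image_h, window_w, window_h, mode="width"):
    """Return the scale factor that fits the image to the window width or height."""
    if mode == "width":
        return window_w / image_w
    if mode == "height":
        return window_h / image_h
    raise ValueError("mode must be 'width' or 'height'")


# A 2550 x 3300 pixel page shown in a 1275 x 825 pixel window.
print(fit_scale(2550, 3300, 1275, 825, "width"))   # 0.5
print(fit_scale(2550, 3300, 1275, 825, "height"))  # 0.25
```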



Edge Enhancement A digital processing filter that makes lines and characters in an image appear sharper, making them easier to read or understand.


Electronic Data Capture An Electronic Data Capture system is a computerized system designed for the collection of data in electronic format. Typically, EDC systems provide 1) a graphical user interface component for data entry, 2) a validation component to check user data, and 3) a reporting tool for analysis of the collected data.


Element The rectangular black bars, or spaces between the bars, that make up a bar code.


Endorser A scanner attachment that prints a text string on a page as it is scanned. Sometimes called an imprinter.


Endorser String A text string mechanically printed on a page as it is scanned. See annotation.


Engine Run-time software that enables the functionality integrated into an application through the use of a toolkit.


Enterprise Content Management Enterprise Content Management (ECM) is the strategies, methods and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools allow the management of an enterprise level organization’s information.



Fax Server – Fax over IP Increasingly, fax servers forgo telephone lines entirely in favour of “Fax over IP”, which sends faxes over an IP network such as the Internet.


Fax-Over-IP Enabling a fax machine or MFP to send and receive faxes via an internet connection rather than a phone line.


Fixed Scaling An option that allows images to be resized to a precise scale ratio.


Forms Processing The ability for software to accept scanned forms and extract data from the boxes and lines to populate databases. Software usually includes the ability to drop out the form so that recognition accuracy improves.


Freeform Data Capture Capturing data from unstructured documents.


Front Office Capture Capturing and processing documents as close as possible to where they originate, as opposed to in centralised, back-office environments.



GUI Graphical user interface. The icons, symbols and other graphical elements workers see on their computer or MFP screens that allow them to interact with the device.



High Availability A high availability system continues to operate even if one or more of its components fail. The term also refers to the percentage of time a computer system remains in continuous operation for mission-critical applications.
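
As a worked example of the percentage, availability can be computed from uptime and downtime, and a target percentage can be turned into a downtime allowance. The function names and the 99.9 per cent target are assumptions chosen for illustration.

```python
def availability_percent(uptime_hours, downtime_hours):
    """Percentage of the total period during which the system was running."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total period must be positive")
    return 100.0 * uptime_hours / total


def allowed_downtime_hours(target_percent, period_hours=365 * 24):
    """Downtime permitted over the period while still meeting the target."""
    return period_hours * (1 - target_percent / 100.0)


print(availability_percent(999, 1))               # 99.9
print(round(allowed_downtime_hours(99.9), 2))     # 8.76 hours per year
```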



ICR Intelligent Character Recognition. The ability of software to recognize and translate handwritten characters into machine-readable text.


Image The digitized representation of a picture, graphic, or document.


Import The process of loading existing electronic image files into the system or application.


Indexing The process of assigning descriptive data to a captured document (e.g., name, date, location, transaction number, customer ID), either by having a worker type in terms or by having software extract them automatically.


Information Management Information management (IM) is the collection and management of information from one or more sources and the distribution of that information to one or more audiences.


Input Management Input management is a term coined by a capture company in 2005 in an attempt to differentiate their products from those of their competitors. In fact, input management is merely another term for document capture and data capture from paper and electronic sources.


ISIS Image and Scanner Interface Specification. A driver used primarily with production scanners. First released in 1990 by Pixel Translations (now a part of EMC).



Jog position The position of one side or the other in a printer’s output tray for the printed copies. Jog positions typically are used for collating operations, and are not available on all printers.



Key from Image Refers to a data entry method in which operators manually key data from the scanned image of a document appearing on their computer screens, rather than from the paper original.



Landscape Page orientation where the page width exceeds the page length. See orientation.


Learn-by-Example A technique for improving the performance of an automated system by “teaching” it the correct results for a set of known documents, and then improving its knowledge further as new examples are encountered during actual use.


LOB Line of Business.



Multifunction Product (MFP) Office equipment that acts as a scanner, fax machine, printer and copier, often attached to a network and supporting a workgroup, department or branch office. Sometimes called a multifunction peripheral, multifunction printer, multifunction device (MFD) or all-in-one.



Network Load Balancing A Microsoft-developed technology that balances network traffic across a number of hosts, enhancing the scalability and availability of mission critical applications. It also provides high availability by detecting host failures and automatically redistributing traffic to operational hosts.


Noise Extraneous speckles on an image.



OCR OCR stands for optical character recognition. OCR transforms printed or hand-printed text into electronic data that can be used in a computer system. All OCR starts with an electronic image of the text, usually created with a document scanner.


OCR Software OCR software transforms printed or hand-printed text into electronic data that can be used in a computer system. (OCR stands for optical character recognition.)


ODBC Open Database Connectivity (ODBC) is a standard software API specification for using database management systems (DBMS). ODBC is independent of programming language, database system and operating system.


OEM Software Software that is licensed for distribution by other manufacturers for incorporation into their own products.


OMR Optical mark recognition. Recognizes pencil or pen marks in specific document positions (such as filled-in check boxes or circles) and translates the marks into computer-readable data.


Optical Character Recognition Optical Character Recognition software transforms printed or hand-printed text into electronic data that can be used in a computer system. (Optical character recognition is usually abbreviated as OCR.)


Optical Character Recognition Software Optical Character Recognition software transforms printed or hand-printed text into electronic data that can be used in a computer system. (Optical character recognition is usually abbreviated as OCR).


Orientation The relative direction of a page, either horizontal (called landscape) or vertical (called portrait).



Padding Images The process of adding white space from the end of the scanned image to the length of the selected paper size. This can only occur when using the ADF to feed pages into the scanner, and only with scanners that support this feature.


Panel The area on a scanner or MFP where instructions to control its operation are entered.


Panning The ability to move a zoomed image within the display window with the mouse.


Patch Code A pattern of horizontal black bars separated by spaces. Typically, a patch code is placed near the top centre of a paper document to be scanned and used as a document separator.
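
Using a patch code as a document separator can be sketched as splitting a batch of pages wherever a separator page is detected. In this illustration the pages are plain strings and the detection function is a stand-in. Real patch code detection works on the scanned image.

```python
def split_batch(pages, is_separator):
    """Group pages into documents, starting a new document at each separator."""
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = []  # the separator page itself is not kept
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


batch = ["PATCH", "inv1-p1", "inv1-p2", "PATCH", "inv2-p1"]
print(split_batch(batch, lambda page: page == "PATCH"))
# [['inv1-p1', 'inv1-p2'], ['inv2-p1']]
```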


Patch Code Trigger A patch code that when detected causes the application to perform a predefined action.


Point of Sale (POS) Software Packaged software that is licensed at the point of sale by a reseller or distributor. Sometimes called shrink-wrap software.


Portrait Page orientation where the page length exceeds the page width.


Production Scanners High volume scanners, typically capable of processing from 40 to 100+ pages per minute. See also workgroup scanners and departmental scanners.



Quiet Zone In a bar code, an unmarked area preceding the leading bar and following the last bar.



Rated Speed The theoretical maximum number of pages that can pass through a scanner each minute. Often lowered by real-world considerations such as paper jams, poorly scanned documents that need rescanning, document prep, loading time and maintenance.


Redaction A type of document annotation that conceals from view specific portions of sensitive documents.


Release Script Software needed to export documents and data from capture software to other programs (e.g., ECM, CRM, and ERP) for additional processing or storage. It’s called a release script because a computer programming language known as a scripting language is used to write it.


Remote Station Workstation at a remote site on which a Capture program and a Remote Synchronization Agent are installed.


Resolution The fineness or coarseness of an image as it is scanned, printed, or displayed. It is measured in dots per inch (DPI), typically from 200 to 400 DPI.
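
The relationship between DPI and image size is simple arithmetic: each dimension in inches is multiplied by the resolution. The sketch below assumes a letter-size page (8.5 x 11 inches) and an uncompressed bitonal image at one bit per pixel. The function name is hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


width_px, height_px = pixel_dimensions(8.5, 11, 300)
print(width_px, height_px)        # 2550 3300

# Uncompressed bitonal storage: one bit per pixel, eight pixels per byte.
print(width_px * height_px // 8)  # 1051875 bytes
```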



Saturation The ability to emphasize horizontal and/or vertical pixels when scaling an image for display or print.


Scan-Enable Adding scanning functionality to a software program that previously lacked this capability. A Document Scan Server (DSS) allows developers to add a “SCAN” button to numerous software programs. Workers using these scan-enabled programs no longer need to switch to a dedicated scan application in order to scan a document.


Scan-On-Demand A method of implementing a document imaging system in which paper documents are scanned only when they are needed and retrieved from storage. Compare to backfile conversion and day-forward processes.


Semi-Structured Document A document that includes known types of data whose position on the page is not known in advance. An example is an invoice. An invoice must include an amount and a due date, but since every company is free to design its own invoices, there’s no way to know in advance where this information will appear on the page.


SOA Service Oriented Architecture. SOA isn’t a product, technology or standard. Rather, it’s a style of software architecture that supports building applications out of services that are linked together. Services are components that perform a business process, such as extracting data from an invoice.


Straight Through Processing An automated workflow that a scanned document can pass through to completion without the need for manual intervention.


Structured Document A document for which both the type (number, letter, check mark, etc.) and location of data are known before scanning. For example, the data field for line 35 of IRD tax form IR-5, positioned on the lower right corner of the page, will always contain a number.


Synchronisation Process by which batch classes and other settings at the central site are downloaded to remote stations. If completed batches exist at the remote station, the batches are uploaded to the central site.



TIFF Tagged Image File Format, the primary format used by document imaging systems. Incorporates several forms of compression. Can store multiple page documents as a single file (as opposed to creating separate files for each page).


Transaction Capture Capturing information or data from documents and forms specifically to initiate, continue or conclude a business process.


Transformation The process of automatically and intelligently extracting, classifying, indexing and validating information from documents and forms.



Unstructured Documents A type of document for which, prior to scanning, both the type and location of the information it contains are unknown. Documents that cannot be identified as structured or semi-structured are assigned to this category. They could be virtually any type of document: correspondence, petitions, advertisements, manuals, brochures or annual reports.


Upload Process by which completed batches are transferred from the remote site to the central site.



Validation A process that raises the probability that captured data is correct. Although sometimes performed manually, faster and more accurate results are achieved when validation is automated. Frequently this is accomplished by comparing captured data to information contained in a database.
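
Automated validation against a database can be sketched as a lookup. In this illustration the "database" is an in-memory dictionary, and the field names and customer records are hypothetical.

```python
# Hypothetical reference data standing in for a real customer database.
KNOWN_CUSTOMERS = {"C-1001": "Acme Ltd", "C-1002": "Globex Ltd"}


def validate_record(record, customers):
    """Return a list of problems found in a captured record (empty if none)."""
    problems = []
    expected_name = customers.get(record.get("customer_id"))
    if expected_name is None:
        problems.append("unknown customer_id")
    elif expected_name != record.get("customer_name"):
        problems.append("customer_name does not match database")
    return problems


captured = {"customer_id": "C-1001", "customer_name": "Acme Limited"}
print(validate_record(captured, KNOWN_CUSTOMERS))
# ['customer_name does not match database']
```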


Verification A procedure used when captured information absolutely must be correct. Typically, two or more operators will key data from the same captured image.
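
A minimal sketch of double-key verification follows, assuming the results keyed by two operators are compared field by field and any disagreement is flagged for review. The field names and values are hypothetical.

```python
def fields_needing_review(first_pass, second_pass):
    """Return the names of fields on which two operators' entries disagree."""
    all_fields = sorted(set(first_pass) | set(second_pass))
    # A field missing from one pass counts as a disagreement.
    return [f for f in all_fields if first_pass.get(f) != second_pass.get(f)]


operator_a = {"invoice_no": "10045", "amount": "118.50"}
operator_b = {"invoice_no": "10045", "amount": "113.50"}
print(fields_needing_review(operator_a, operator_b))  # ['amount']
```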



Web Server A computer that runs an application which allows files to be transferred over the Internet to a client machine. Web server programs operate by accepting HTTP requests from the network, and providing an HTTP response to the requester. The HTTP response typically consists of an HTML document, but can be a raw text file, an image, or some other type of document.


Web Services Web services provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks. Web services are characterized by their great interoperability and extensibility, as well as their machine-processable descriptions thanks to the use of XML.


Workflow The automation of procedures for handling business processes. Workflow systems are usually based on electronic versions of documents. How documents are routed through departments in a company, which transactions have to be accomplished in which order, and what to do about exceptions and mistakes are all workflow concerns.


Workgroup Scanner A low volume scanner, typically capable of scanning 10 to 25 pages per minute. See also departmental scanner and production scanner.


Glossary compiled from a variety of sources, including Kofax, Microsoft TechNet and Wikipedia.