| A |
|
| Ad Hoc Capture |
Capturing individual documents as needed, or capturing documents for individual use, as opposed to batch, production-level capture. |
| ADF |
The Automatic Document Feeder on a scanner. The feeder holds multiple pages and automatically loads one page after another into the scanner. Sometimes called an auto feeder. |
| Annotation |
For scan operations, a text annotation is a string of characters permanently added to an image before the image is compressed and written to disk. Annotations typically are used with scanners that support endorsers so the scanned image matches the endorsed hard copy document. For print operations, an annotation can be any text or graphic printed on the output (printed page or output file). |
| Aspect Ratio |
The size relationship between the width and height of an image. When images are resized, distortion will occur if the aspect ratio is not maintained. |
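As a minimal sketch of maintaining the aspect ratio during a resize, the hypothetical helper below (the name `fit_within` and its parameters are assumptions for illustration, not part of any product mentioned here) applies the same scale factor to width and height:

```python
def fit_within(width, height, max_width, max_height):
    """Return the largest (w, h) that fits the box while keeping width/height constant."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Using one scale factor for both axes is what preserves the aspect ratio.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 2550 x 3300 pixel page (8.5 x 11 inches at 300 DPI) fitted into an 800 x 800 box.
print(fit_within(2550, 3300, 800, 800))  # (618, 800)
```

Scaling each axis by its own factor (for example, forcing the page to 800 x 800) is what produces the distortion described above.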
| Auto-Length Detection |
A scanner feature that detects when it reaches the end of a page. If additional pages remain, scanning continues on the following page. |
| Automated Document Separation |
A technique that relies upon software, rather than separator sheets, to determine the break point between multiple documents scanned in the same batch. Typically results in significant savings, both for consumables no longer needed to print separator sheets, and for labor no longer needed to insert and remove them. |
| Automatic Discrimination |
A scanner's ability to sense the difference between text and photos and adjust itself accordingly. When enabled, this feature allows the scanner to switch between line mode and half tone scanning in one pass. |
| B |
|
| Back-Office Capture |
Capturing and processing batches of documents in a mid- to high-volume, centralised, production-level environment, as opposed to a front-office environment where the documents originate. |
| Backfile Conversion |
The process of scanning and indexing a repository of documents, most commonly paper-based but including microfilm and microfiche, then storing them in a digital format. Typically done when an organization first begins using a digital imaging system so that past, as well as future, records all will be in the same digital format. |
| Ball-Point Pen Filtering |
A scanner's ability to sense the difference between printed text and ink. When enabled, the scanner uses an algorithm that detects the reflective light characteristics of ink and adjusts itself to produce a better-quality scanned image. |
| Bar Code |
A method of portraying information as a sequence of machine-readable vertical lines of varying widths. The rectangular black bars and white spaces between them are known as elements. Predetermined groupings of elements form characters as defined for specific bar code types. There are dozens of types of bar codes, with the UPC found on most items sold at retail probably the best known. In document imaging, bar codes typically are used to encode index data and/or separate documents. |
| Batch |
A group of one or more documents processed in a single scan operation. |
| Batch Class |
A definition of all the settings for processing a batch, including the types of documents in the batch and the processing queues through which they will pass. |
| Bitonal |
An image made up of pixels that either are white or black (i.e., an image that includes no shades of gray or color). |
| Black Border Cropping |
Removal of black border pixels from a scanned image. Cleans up an image and reduces the height and width by the size of the black border. Helps reduce the size of a document and the space needed to store it. |
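The idea can be sketched on a tiny bitonal image held as a list of rows. This is a simplified illustration that assumes the border consists of rows and columns that are entirely black; real scanner software has to cope with skewed and ragged borders, and the function name is hypothetical:

```python
def crop_black_border(image):
    """image: list of rows, 1 = black, 0 = white. Trim edge rows/columns that are fully black."""
    top, bottom = 0, len(image)
    while top < bottom and all(image[top]):
        top += 1
    while bottom > top and all(image[bottom - 1]):
        bottom -= 1
    if top == bottom:
        return []  # the whole image was border
    left, right = 0, len(image[0])
    while left < right and all(row[left] for row in image[top:bottom]):
        left += 1
    while right > left and all(row[right - 1] for row in image[top:bottom]):
        right -= 1
    # The result is smaller than the input by the size of the border.
    return [row[left:right] for row in image[top:bottom]]


page = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(crop_black_border(page))  # [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```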
| Black Border Removal |
Replacement of black border pixels on a scanned image with white pixels. Cleans up an image without changing its height and width. |
| Bleed-Through |
Text or images printed on one side of a page that are visible, in whole or in part, on the other side of the page, often because the paper weight chosen was not heavy enough for the printer. Can cause an imaging system to mistakenly believe that the blank side of a page actually holds content. |
| Business Process Automation |
Business Process Automation, or BPA, is the process a business uses to contain costs. It consists of integrating applications, cutting labor wherever possible, and using software applications throughout the organization.
Delivering Business Process Automation
There are four main techniques for delivering automation of a process.
- Extension of existing IT systems
As most IT systems are inherently automation engines in themselves, a valid option is to extend their functionality to enable the desired automation, creating customized linkages between the disparate application systems where needed. This approach means that the automation can be tailored specifically to the exact environment of the organization. On the downside, it can be time-consuming to find the necessary skills, either internally or in the marketplace.
- Purchase of a specialist BPA tool
Specialist companies are now bringing toolsets to market that are purpose-built for BPA. These companies tend to focus on different industry sectors, but their underlying approach tends to be similar: they attempt to provide the shortest route to automation by exploiting the user interface layer rather than going deeply into the application code or databases sitting behind it. They also simplify their own interface to the extent that these tools can be used directly by non-technical staff. The main advantage of these toolsets is therefore their speed of deployment; the drawback is that they bring yet another IT supplier into the organization.
- Purchase of a Business Process Management solution with BPA extensions
From the discussion below, it can be seen that a Business Process Management system is quite different from BPA; however, it is possible to build automation on the back of a BPM implementation. The actual tools to achieve this will vary, from writing custom application code to using specialist BPA tools as described above. The advantages and disadvantages of this approach are inextricably linked: the BPM implementation provides an architecture for all processes in the business to be mapped, but this in itself delays the automation of individual processes, so benefits may be lost in the meantime.
- Purchase of a Middleware solution
Business Process Automation (BPA) vs Business Process Management (BPM)
An area of discussion exists as to whether Business Process Automation is a distinct field of activity in its own right or merely a subset of a wider activity known as BPM. Given the similarity in terminology it is not surprising that most casual observers would believe them to be closely related if not identical. However, to experts in these areas they carry very distinct meanings, even if they are ultimately complementary concepts. To explain this further it is necessary to summarize the views of each camp:
The BPM camp asserts that before any process can be automated, it is necessary to define (often at a very strategic level or enterprise-wide) all of the business processes running inside an organization. From this the processes can be re-defined and where necessary optimized, including automation.
The BPA camp states that until a process is automated, there is no real value in analyzing and defining it. In this view, the cycle of business change is so rapid that there simply isn't time to define every process before choosing which ones to automate, and delivering immediate benefits creates more value.
There is no consensus on which view will prevail; however, both perspectives are complementary to some extent. Process improvement methodologies such as Lean manufacturing and Six Sigma appear to align well with the Business Process Automation view of the world, as they constantly look for incremental opportunities to make processes more efficient and reduce defects. These methodologies can also be used downstream of a BPM deployment. |
| C |
|
| Cache |
Memory used for the temporary storage of images during scanning. Caching images to an accelerator board can significantly improve scanner performance. Sometimes called prescan cache. |
| Capture |
The automated process of capturing images of documents from scanners, fax machines and multifunction products, as well as electronic documents from various sources; transforming them into accurate and valid information; and delivering the images and information into business applications and repositories to support business processes and archives. |
| Capture Forms |
Also known as data capture forms. Typical data capture forms have a fill-in-the-gaps format, often having boxes to enter details of company name, requisition number, etc. The data captured on the forms is then scanned or manually entered into a database. |
| Capture Software |
Software licences sold through resellers or directly to end users as a primary capture solution, as opposed to OEM or Point of Sale (POS) Software. |
| Central Site |
The location where Capture Controlling Software is installed on a server. Typically, the central site processes captured documents and data sent to it from remote sites.
|
| Character Reconstruction |
The process of rebuilding text characters after a line removal operation (e.g., removing the signature line of a form, typically by using color dropout, so that only the signature remains). Helps maintain the integrity of the image to improve readability and OCR recognition.
|
| Checksum |
An additional character added to a bar code providing a mathematical method for determining the integrity of the bar code. Not all bar code types support a checksum. Of the bar code types that do, some support an optional checksum while others support a mandatory checksum. |
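To make the idea concrete, here is a small sketch using UPC-A, chosen only for illustration because UPC is the bar code type named elsewhere in this glossary; other bar code types use different checksum rules. In UPC-A the twelfth digit is computed from the first eleven, and the function names are hypothetical:

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit from the first 11 digits."""
    if len(first_eleven) != 11 or any(ch not in "0123456789" for ch in first_eleven):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, 5th, ... digits are weighted by 3
    even_sum = sum(digits[1::2])  # 2nd, 4th, 6th, ... digits are weighted by 1
    total = 3 * odd_sum + even_sum
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """True when the 12th digit matches the checksum of the first 11."""
    if len(code) != 12 or any(ch not in "0123456789" for ch in code):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


print(upc_a_check_digit("03600029145"))  # 2
print(is_valid_upc_a("036000291452"))    # True
print(is_valid_upc_a("036000291453"))    # False: a misread digit is detected
```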
| Classification |
Automatically determining a document's type by examining its format and/or content. The process of sorting documents by type (e.g., invoice, purchase order, vacation request form) to determine which capture techniques and subsequent processing methods (if any) should be applied. Automated document classification utilizes two main categories of techniques: text-based, which analyzes the content of the document, and image-based, which evaluates its layout. Each category can be divided into self-learning (probabilistic) or manual (rules-based) techniques, giving four methods overall. No one method works best for all types of documents, so optimal results are achieved by applying a mix of techniques. |
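As a rough sketch of the manual (rules-based), text-based category only, the snippet below scores already-recognized text against hand-written keyword lists. The keywords, the `RULES` table and the `classify` function are assumptions made up for this illustration; production classifiers typically combine several techniques, as noted above:

```python
# Hypothetical keyword rules, one list per document type.
RULES = {
    "invoice": ("invoice", "amount due", "remit to"),
    "purchase order": ("purchase order", "po number", "ship to"),
    "vacation request form": ("vacation", "leave request", "days requested"),
}


def classify(text, rules=RULES, min_hits=1):
    """Return the document type with the most keyword hits, or None if nothing matches."""
    lowered = text.lower()
    scores = {doc_type: sum(kw in lowered for kw in keywords)
              for doc_type, keywords in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


print(classify("INVOICE #123. Amount due: $50. Remit to: Example Corp"))  # invoice
print(classify("Lunch menu for Friday"))                                  # None
```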
| COB |
Continuity Of Business. Companies develop COB plans to determine how, while recovering from a natural or man-made disaster, they can continue to operate and serve customers. High availability computing typically is an essential component of COB planning. |
| Color Dropout |
The removal of a specific color from a scanned image. Typically used for preprinted forms, so that the form itself does not appear in the scanned image, leaving only the information filled in by the user. This increases OCR effectiveness, and decreases data storage and bandwidth requirements. |
| Communication Software |
Communication software is used to provide remote access to systems and exchange files and real-time messages in text, audio and/or video formats between different computers or user IDs. This includes terminal emulators and file transfer programs. |
| Compression |
A software or hardware process that shrinks images so they occupy less storage space and can be transmitted faster. Generally, compression is accomplished by removing data that define blank spaces and other redundant information, and replacing them with a smaller symbolic code.
|
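One simple way to replace redundant data with a smaller code is run-length encoding, the idea behind schemes such as PackBits. The sketch below is illustrative only and assumes a scan line stored as a list of 0 (white) and 1 (black) pixels; the other formats named in this glossary use more elaborate methods:

```python
from itertools import groupby


def rle_encode(row):
    """Collapse each run of identical pixels into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original pixel row."""
    row = []
    for value, count in pairs:
        row.extend([value] * count)
    return row


scan_line = [0] * 12 + [1] * 3 + [0] * 9   # mostly blank space
encoded = rle_encode(scan_line)
print(encoded)                              # [(0, 12), (1, 3), (0, 9)]
print(rle_decode(encoded) == scan_line)     # True: nothing was lost
```

Here 24 pixels are stored as 3 pairs; the saving grows with the amount of blank space on the page.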
| Compression Format |
The algorithm used to compress an image. There are a wide variety of compression formats such as CCITT Group 4, LZW, JPEG, PackBits, PCX, and others.
|
| Content Management |
Content Management, or CM, is a set of processes and technologies that support the evolutionary life cycle of digital information. This digital information is often referred to as content or, to be precise, digital content. Digital content may take the form of text, such as documents, multimedia files, such as audio or video files, or any other file type which follows a content lifecycle which requires management.
As of May 2009, the world's digital content is estimated at 487 billion gigabytes, the equivalent of a stack of books stretching from Earth to Pluto ten times.
The process of content management
Content management practices and goals vary with mission. News organizations, e-commerce websites, and educational institutions all use content management, but in different ways. This leads to differences in terminology and in the names and number of steps in the process. Typically, though, the digital content life cycle consists of 6 primary phases: create, update, publish, translate, archive and retrieve. For example, an instance of digital content is created by one or more authors. Over time that content may be edited. One or more individuals may provide some editorial oversight thereby approving the content for publication. Publishing may take many forms. Publishing may be the act of pushing content out to others, or simply granting digital access rights to certain content to a particular person or group of persons. Later that content may be superseded by another form of content and thus retired or removed from use.
Content management is an inherently collaborative process. It often consists of the following basic roles and responsibilities:
- Creator - responsible for creating and editing content.
- Editor - responsible for tuning the content message and the style of delivery, including translation and localization.
- Publisher - responsible for releasing the content for use.
- Administrator - responsible for managing access permissions to folders and files, usually accomplished by assigning access rights to user groups or roles. Admins may also assist and support users in various ways.
- Consumer, viewer or guest - the person who reads or otherwise takes in content after it is published or shared.
A critical aspect of content management is the ability to manage versions of content as it evolves (see also version control). Authors and editors often need to restore older versions of edited products due to a process failure or an undesirable series of edits.
Another equally important aspect of content management involves the creation, maintenance, and application of review standards. Each member of the content creation and review process has a unique role and set of responsibilities in the development and/or publication of the content. Each review team member requires clear and concise review standards which must be maintained on an ongoing basis to ensure the long-term consistency and health of the knowledge base.
A content management system is a set of automated processes that may support the following features:
- Import and creation of documents and multimedia material
- Identification of all key users and their roles
- The ability to assign roles and responsibilities to different instances of content categories or types.
- Definition of workflow tasks often coupled with messaging so that content managers are alerted to changes in content.
- The ability to track and manage multiple versions of a single instance of content.
- The ability to publish the content to a repository to support access to the content. Increasingly, the repository is an inherent part of the system, and incorporates enterprise search and retrieval.
Content management systems take the following forms:
- a web content management system is software for web site management - which is often what is implicitly meant by this term
- the work of a newspaper editorial staff organization
- a workflow for article publication
- a document management system
- a single source content management system - where content is stored in chunks within a relational database
Implementations
A content management implementation must be able to manage content distribution and digital rights throughout the content life cycle. Content management systems usually work with Digital Rights Management (DRM) systems to control user access and digital rights. Here the read-only structures of DRM systems impose some limitations on content management implementations, because they do not allow protected content to be changed during its life cycle. Creating new content from managed (protected) content is a further issue, because it takes the protected content out of the management systems' control. A few content management implementations cover all of these issues.
|
| Content Management Systems |
A Content Management System (CMS), such as a document management system (DMS), is a computer application used to manage the workflow needed to collaboratively create, edit, review, index, search, publish and archive various kinds of digital media and electronic text.
Content Management Systems are frequently used for storing, controlling, versioning, and publishing industry-specific documentation such as news articles, operators' manuals, technical manuals, sales guides, and marketing brochures. The content managed may include computer files, image media, audio files, video files, electronic documents, and Web content. These concepts represent integrated and interdependent layers. There are various nomenclatures known in this area: Web Content Management, Digital Asset Management, Digital Records Management, Electronic Content Management and so on. The bottom line for these systems is managing content and publishing, with a workflow if required.
Types of CMS
There are six main categories of CMS, with their respective domains of use:
- Enterprise CMS (ECMS)
- Web CMS (WCMS)
- Document management system (DMS)
- Mobile CMS
- Component CMS
- Media content management system
|
| Continuous Scanning |
The ability to scan extra long images. Requires a scanner that supports this feature. Software like ImageControls supports continuous scanning for images up to 32K scan lines in length or width, which is about 9 feet at 300 DPI or 13 feet at 200 DPI. Sometimes called long scanning. |
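The lengths quoted above follow from dividing the number of scan lines by the resolution. A quick check, assuming "32K" means 32,768 scan lines (the function name is just for illustration):

```python
def max_scan_length_feet(scan_lines, dpi):
    """Longest image, in feet, that fits in the given number of scan lines."""
    inches = scan_lines / dpi
    return inches / 12


MAX_LINES = 32 * 1024  # assumption: 32K = 32,768 scan lines

for dpi in (300, 200):
    print(dpi, "DPI:", round(max_scan_length_feet(MAX_LINES, dpi), 1), "feet")
# 300 DPI: 9.1 feet
# 200 DPI: 13.7 feet
```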
| Contrast |
The range between the lightest and darkest shades in an image. A high-contrast image has fewer gray shades between black and white; a low-contrast image has more gray shades. Contrast determines how many gray shades are scanned and density determines the intensity of those shades. Contrast is sometimes called sensitivity. |
| D |
|
| Data Capture |
Data capture is the identification and extraction of data from a scanned document, often to be sent to a workflow for routing and action as part of a business process.
Data capture involves and is sometimes confused with optical character recognition (OCR). However, data capture software is more complex and valuable because it captures specific, targeted data – usually from a form – that is required to support a business process. In comparison, OCR is the basic conversion of any scanned alphanumeric information into a machine readable digital form.
Why Data Capture Alone is Not Enough
The basic capture of data from structured forms is a well-understood process. (A structured form is one where both the type of information and its location on the form are known in advance.) However, most companies also receive a large number of forms such as invoices from other organizations; the relevant data on these forms could be almost anywhere on the page.
In the case of invoices, data capture alone does not identify where the important pieces of information (vendor, address, items, prices, payment terms, and so on) are on the page. And it does not match the invoices with the corresponding purchase orders.
Also, data capture results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor OCR results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data.
|
| Data Capture Forms |
Also known as simply capture forms. Typical data capture forms have a fill-in-the-gaps format, often having boxes to enter details of company name, requisition number, etc. The data captured on the forms is then scanned or manually entered into a database. |
| Data Entry Software |
Data Entry Software automates your document flow, which results in reduced processing costs, reduced processing time and improved accuracy. Automatic data entry software enables enormous savings in time and money when entering invoices and forms into a computer system. This software also reduces human errors to a minimum and thus increases the data quality. By using data entry software for Automatic Document Capture (or Automatic Data Capture as it is also called), information from forms or invoices is automatically captured, interpreted and transferred into your computer system. The automatic data entry software does this in two ways: OCR (Optical Character Recognition) deals with machine print while ICR (Intelligent Character Recognition) allows data entry software to recognize handwritten characters. |
| Data Management |
Data management comprises all the disciplines related to managing data as a valuable resource. |
| Day Forward |
A document imaging implementation option in which only records received or created after a cut-off date are scanned and converted to a digital format. Under this option, typically there is no backfile conversion of currently stored documents, usually because of cost, volume (too many or too few) or logistical considerations. However, the user can elect to scan documents created before the cut-off date and add them to its digital depository whenever they are needed and pulled from storage. |
| Density |
The balance of light and dark shades in a scanned image. A high density setting produces a light image; a low density setting produces a dark image. Contrast determines how many gray shades are scanned and density determines the intensity of those shades. Density is sometimes called brightness or intensity. |
| Departmental Scanners |
Scanners that fall between workgroup scanners and production scanners in their pages-per-minute capacity. Departmental scanners typically can scan from 26 to 40 pages per minute. |
| Deshade |
The removal of unwanted shaded areas from an image. Removes speckles that make up a shaded area while preserving any text it contains. |
| Deskew |
The straightening of an image that appears crooked. Skewed images typically result when a misaligned document passes through a scanner. |
| Despeckle |
The removal of unwanted dots or speckles from an image. |
| Destreak |
The removal of unwanted streaks from an image. Each horizontal scan line of an image is processed to detect runs of black pixels. Any series of black pixels less than or equal to the specified streak width is removed. Although streak removal processes images horizontally, its primary purpose is to eliminate vertical streaks. |
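The procedure described above can be sketched directly. This is a minimal illustration, assuming a bitonal image stored as a list of rows with 1 for black and 0 for white; the function names are hypothetical:

```python
def destreak_row(row, max_streak_width):
    """Whiten every run of black pixels no wider than max_streak_width."""
    out = list(row)
    i, n = 0, len(row)
    while i < n:
        if row[i] == 1:
            j = i
            while j < n and row[j] == 1:   # find the end of this black run
                j += 1
            if j - i <= max_streak_width:  # narrow run: treat it as a streak
                for k in range(i, j):
                    out[k] = 0
            i = j
        else:
            i += 1
    return out


def destreak(image, max_streak_width):
    """Process each horizontal scan line independently."""
    return [destreak_row(row, max_streak_width) for row in image]


image = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1, 1, 1],   # the 4-pixel run is real content and is kept
    [0, 0, 1, 0, 0, 0, 0, 0],
]
for row in destreak(image, 2):
    print(row)
```

Column 2 holds a one-pixel-wide vertical streak. Because it appears as a short horizontal run in every row, processing the rows one at a time removes it, which is why a horizontal pass mainly eliminates vertical streaks.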
| Distributed Capture |
Capturing documents at geographically distributed locations and automatically transmitting them to a central location for further processing or archiving. The process of scanning documents at remote sites (a.k.a., branch offices), then transmitting their digital images via the Internet or WAN to a central location for additional processing or storage. This eliminates the need—and accompanying infrastructure and cost—to ship paper documents from place to place. |
| Dithering |
The process of simulating a color unavailable on a system or program’s palette by using two or more available colors that, when combined, approximate it. |
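The simplest case is simulating gray with only black and white. The sketch below uses ordered dithering with a 2x2 Bayer threshold matrix, which is one of several dithering methods and is picked here purely for illustration; it assumes a grayscale image stored as rows of values from 0 (black) to 255 (white):

```python
# 2x2 Bayer matrix: each position gets a different threshold level.
BAYER_2X2 = [[0, 2],
             [3, 1]]


def ordered_dither(gray_image):
    """Convert 0-255 gray values to pure black (0) or white (255) pixels."""
    out = []
    for y, row in enumerate(gray_image):
        out_row = []
        for x, value in enumerate(row):
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
            out_row.append(255 if value > threshold else 0)
        out.append(out_row)
    return out


mid_gray = [[128] * 4 for _ in range(2)]
for row in ordered_dither(mid_gray):
    print(row)
# [255, 0, 255, 0]
# [0, 255, 0, 255]
```

Half of the output pixels are white and half are black, so from a distance the area reads as the mid gray that the two-color palette cannot show directly.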
| Document Capture |
The process of capturing a digital image of an entire document. Most frequently used for archival storage. (see also: data capture) |
| Document Imaging |
Document imaging transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document imaging system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device. A successful document imaging system also includes image enhancement or image processing software such as VRS (VirtualReScan) to automatically improve the images produced by the scanner.
Why Document Imaging Alone is Not Enough
Companies that rely on a document driven business process must do more than simply convert paper documents to electronic images. For example, a company that wants to automate its invoice process must also:
- convert printed or hand-printed text from the invoice into electronic data that can be used in a computer system (using OCR software or ICR software);
- identify the important pieces of information (vendor, address, items, prices, payment terms, and so on);
- verify that the OCR software has converted the information correctly;
- match the invoices with the corresponding purchase orders; and
- deliver the validated information into the appropriate business system in the correct format.
|
| Document Imaging System |
A document imaging system transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document imaging system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device. A successful document imaging system also includes image enhancement or image processing software such as VRS (VirtualReScan) to automatically improve the images produced by the scanner.
Why Document Imaging Alone is Not Enough
Companies that rely on a document driven business process must do more than simply convert paper documents to electronic images. For example, a company that wants to automate its invoice process must also:
- convert printed or hand-printed text from the invoice into electronic data that can be used in a computer system (using OCR software or ICR software);
- identify the important pieces of information (vendor, address, items, prices, payment terms, and so on);
- verify that the OCR software has converted the information correctly;
- match the invoices with the corresponding purchase orders; and
- deliver the validated information into the appropriate business system in the correct format.
With a document imaging system alone, these important steps require time-consuming and costly manual effort. |
| Document Management |
Document management controls the life cycle of documents in your organization — how they are created, reviewed, published, and consumed, and how they are ultimately disposed of or retained. Although the term "management" implies top-down control of information, an effective document management system should reflect the culture of the organization using it. The tools you use for document management should be flexible, allowing you to tightly control documents' life cycles if that fits your enterprise's culture and goals, but also letting you implement a more loosely structured system if that better suits your enterprise. A well-designed document management system promotes finding and sharing information easily. It organizes content in a logical way, and makes it easy to standardize content creation and presentation across an enterprise. It promotes knowledge management and information mining. It helps your organization meet its legal responsibilities. It provides features at each stage of a document's life cycle, from template creation to document authoring, reviewing, publishing, auditing, and ultimately destroying or archiving.
The elements of a document management system
An effective solution specifies:
- What types of documents and other content can be created within an organization.
- What templates to use for each type of document.
- What metadata to provide for each type of document.
- Where to store documents at each stage of a document's life cycle.
- How to control access to a document at each stage of its life cycle.
- How to move documents within the organization as team members contribute to the documents' creation, review, approval, publication, and disposition.
- What policies to apply to documents so that document-related actions are audited, documents are retained or disposed of properly, and content important to the organization is protected.
- How documents are converted as they transition from one stage to another during their life cycles.
- How documents are treated as corporate records, which must be retained according to legal requirements and corporate guidelines.
|
| Document Management Software |
Document management software generally consists of a repository and client software designed for the storage and retrieval of documents in electronic formats, including TIFF images, PDF files and a variety of office document formats. Document management differs from content management in that it focuses on document formats, while content management systems are designed to handle many other types of information, including graphics, multimedia and so on. Software for document management is often seen as a way to avoid the cost and effort of maintaining paper documents in physical filing systems. This software is often superior to physical storage and manual retrieval because it enables immediate access to stored documents across departmental and geographic boundaries. Document management software can be used to replace filing cabinets as a long term archive, but it is increasingly used in conjunction with workflow software to drive real-time business processes that would otherwise require the physical processing of paper-based information.
Why Document Management Software is Not Enough
Because the files stored in a document management system are electronic, paper documents must be converted to an electronic format for storage in such a system. This is generally done with a document scanner and document capture software, which captures an electronic image of the document and enables the addition of index terms or keywords that make it easier to find and retrieve the document when it is needed. |
| Document Management Systems |
A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents. The term has some overlap with the concepts of content management systems and is often viewed as a component of enterprise content management (ECM) systems and related to digital asset management, document imaging, workflow systems and records management systems. |
| Document Scanning |
Document scanning transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document scanning system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device. A successful document scanning system also includes image enhancement or image processing software such as VRS (VirtualReScan) to automatically improve the images produced by the scanner.
Why Document Scanning Alone is Not Enough
Companies that rely on a document driven business process must do more than simply convert paper documents to electronic images. For example, a company that wants to automate its invoice process must also:
- convert printed or hand-printed text from the invoice into electronic data that can be used in a computer system (using OCR software or ICR software);
- identify the important pieces of information (vendor, address, items, prices, payment terms, and so on);
- verify that the OCR software has converted the information correctly;
- match the invoices with the corresponding purchase orders; and
- deliver the validated information into the appropriate business system in the correct format.
With document scanning alone, these important steps require time-consuming and costly manual effort.
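As a rough illustration of the purchase-order matching step listed above, the sketch below compares fields that are assumed to have been extracted from an invoice already. The field names (`po_number`, `vendor`, `total`), the status strings and the tolerance are hypothetical choices made for this example, not part of any particular capture product.

```python
def match_invoice_to_po(invoice, purchase_orders, tolerance=0.01):
    """Compare an extracted invoice with the purchase order it refers to.

    invoice: dict of fields extracted from the scanned invoice (hypothetical names).
    purchase_orders: dict mapping PO number -> dict with "vendor" and "total".
    Returns a status string describing the outcome.
    """
    po = purchase_orders.get(invoice.get("po_number"))
    if po is None:
        return "no_matching_po"
    if po["vendor"] != invoice.get("vendor"):
        return "vendor_mismatch"
    if abs(po["total"] - invoice.get("total", 0.0)) > tolerance:
        return "total_mismatch"
    return "matched"


# Illustrative data only.
purchase_orders = {"PO-1001": {"vendor": "Example Supplies", "total": 1250.00}}
invoice = {"po_number": "PO-1001", "vendor": "Example Supplies", "total": 1250.00}
status = match_invoice_to_po(invoice, purchase_orders)
```

Anything other than `"matched"` would typically be routed to an operator for review.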
|
| Document Scanning Software |
Document scanning software transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document scanning system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device. A successful document scanning system also includes image enhancement or image processing software such as VRS (VirtualReScan) to automatically improve the images produced by the scanner.
Why Document Scanning Software Alone is Not Enough
Companies that rely on a document-driven business process must do more than simply convert paper documents to electronic images. For example, a company that wants to automate its invoice process must also:
- convert printed or hand-printed text from the invoice into electronic data that can be used in a computer system (using OCR software or ICR software);
- identify the important pieces of information (vendor, address, items, prices, payment terms, and so on);
- verify that the OCR software has converted the information correctly;
- match the invoices with the corresponding purchase orders; and
- deliver the validated information into the appropriate business system in the correct format.
With document scanning software alone, these important steps require time-consuming and costly manual effort. |
| Document Scanning System |
A document scanning system transforms paper documents into electronic images that can be used in computer-based business applications and archives. A document scanning system includes a document scanner, multifunction peripheral (MFP), fax machine or other scanning device, plus software to handle the image produced by the device. A successful document scanning system also includes image enhancement or image processing software such as VRS (VirtualReScan) to automatically improve the images produced by the scanner.
Why a Document Scanning System Alone is Not Enough
Companies that rely on a document-driven business process must do more than simply convert paper documents to electronic images. For example, a company that wants to automate its invoice process must also:
- convert printed or hand-printed text from the invoice into electronic data that can be used in a computer system (using OCR software or ICR software);
- identify the important pieces of information (vendor, address, items, prices, payment terms, and so on);
- verify that the OCR software has converted the information correctly;
- match the invoices with the corresponding purchase orders; and
- deliver the validated information into the appropriate business system in the correct format.
With a document scanning system alone, these important steps require time-consuming and costly manual effort.
|
| Double-Sided Scanning |
A scanning process that scans one side of a page, flips the page, then scans the other side. Compare to: duplex scanning. |
| Download |
Process by which published batch classes and settings are transferred from the central site to a remote station. |
| DPI |
Dots Per Inch. See resolution.
|
| Driver |
Software code that enables a computer to communicate with peripheral devices (e.g., scanner, printer). |
| Duplex Scanning |
Scanning both sides of a page simultaneously on a single pass through the scanner. Requires a scanner that supports this feature. Compare to: double-sided scanning. |
| Dynamic Scaling |
A set of display options that allow scaling factors to be determined by the size of the display window. For example, an image can be displayed so that the entire width of the image fits the width of the display. Or, it can be displayed so that the entire height fits the height of the display. |
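The fit-to-width and fit-to-height options described above reduce to a ratio between window size and image size. The following is a minimal sketch; the function name, the mode names and the extra "best" mode are assumptions made for illustration.

```python
def fit_scale(image_w, image_h, window_w, window_h, mode="width"):
    """Return the scale factor that fits an image to a display window.

    mode "width":  the entire image width fits the window width
    mode "height": the entire image height fits the window height
    mode "best":   the entire image fits inside the window (assumed extra mode)
    """
    if min(image_w, image_h, window_w, window_h) <= 0:
        raise ValueError("dimensions must be positive")
    by_width = window_w / image_w
    by_height = window_h / image_h
    if mode == "width":
        return by_width
    if mode == "height":
        return by_height
    if mode == "best":
        return min(by_width, by_height)
    raise ValueError(f"unknown mode: {mode}")


# A 2000 x 1000 pixel image shown in a 1000 x 800 pixel window.
width_scale = fit_scale(2000, 1000, 1000, 800, "width")    # 0.5
height_scale = fit_scale(2000, 1000, 1000, 800, "height")  # 0.8
```

Because the same factor is applied to both dimensions, the aspect ratio of the image is preserved.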
| E |
|
| Edge Enhancement |
A digital processing filter that makes lines and characters in an image appear sharper, making them easier to read or understand. |
| eDocument |
An electronic document. Such documents might have been converted from paper to a digital format, or originally created in a digital format.
|
| Electronic Data Capture |
An Electronic Data Capture (EDC) system is a computerized system designed for the collection of data in electronic format. Typically, EDC systems provide 1) a graphical user interface component for data entry, 2) a validation component to check user data, and 3) a reporting tool for analysis of the collected data. |
| Element |
The rectangular black bars, or spaces between the bars, that make up a bar code.
|
| Endorser |
A scanner attachment that prints a text string on a page as it is scanned. Sometimes called an imprinter. |
| Endorser String |
A text string mechanically printed on a page as it is scanned. See annotation. |
| Engine |
Run-time software that enables the functionality integrated into an application through the use of a toolkit. |
| Enterprise Content Management |
Enterprise Content Management (ECM) is the strategies, methods and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools allow the management of an enterprise level organization's information.
Enterprise content management systems combine a wide variety of technologies and components, some of which can also be used as stand-alone systems without being incorporated into an enterprise-wide system.
The five Enterprise Content Management components and technologies
The components of the ECM model were first defined by AIIM as follows:
- capture
- manage
- store
- preserve
- deliver
The model includes in the "Manage" category five traditional application areas:
- document management (DM),
- collaboration (or collaborative software, groupware),
- web content management (WCM) (including web portals),
- records management (RM) (archive and filing management systems on long-term storage media), and
- workflow/business process management (BPM).
These "Manage" components connect capture, store, deliver and preserve and can be used in combination or separately. While document management, web content management, collaboration, workflow and business process management are more for the dynamic part of the life cycle of information, records management takes care of information which will no longer be changed. The utilization of the information is paramount throughout, whether through independent clients of the ECM system components, or by enabling existing applications that access the functionality of ECM services and the stored information. The integration of existing technologies makes it clear that enterprise content management is not a new product category, but an integrative force.
|
| F |
|
| Fax Server |
A fax server is designed to replace one or more fax machines with dedicated software/hardware that interfaces with other computer systems, such as email. A standard fax machine scans one or more sheets of paper and sends an image of those pages across telephone lines to another fax machine, which prints them. In comparison, a fax server can often enable a user to send a fax directly from their everyday office software and receive faxes to their email inbox. |
| Fax Server - Fax over IP |
Increasingly, fax servers forgo telephone lines entirely in favor of “Fax over IP”, which sends faxes over an IP network such as the Internet. A fax server using Fax over IP can save a company significant costs, as the company no longer needs to pay for multiple dedicated telephone lines and multiple physical fax machines. |
| Fax-Over-IP |
Enabling a fax machine or MFP to send and receive faxes via an internet connection rather than a phone line. |
| Fixed Scaling |
An option that allows images to be resized to a precise scale ratio. |
| Foot Pedal |
A mechanical lever on some scanners used to control their operation. For example, some foot pedals allow operators to start and stop the scan operation by pressing the pedal with their foot. |
| Forms Processing |
The ability of software to accept scanned forms and extract data from the boxes and lines to populate databases. Software usually includes the ability to drop out the form so that recognition accuracy improves. |
| Freeform Data Capture |
Capturing data from unstructured documents. |
| Front Office Capture |
Capturing and processing documents as close as possible to where they originate, as opposed to in centralised, back-office environments. |
| G |
|
| Grayscale Image |
An image that consists of white, black and shades of gray.
|
| GUI |
Graphical user interface. The icons, symbols and other graphical elements workers see on their computer or MFP screens that allow them to interact with the device. |
| H |
|
| High Availability |
A high availability system will continue to operate even if one or more of its components fail. The term also refers to the percentage of time a computer system remains in continuous operation for mission-critical applications, often expressed in “nines.” “Five nines” is a system available 99.999% of the time, which equates to about 5 minutes and 15 seconds of allowable downtime per year, or less than one second per day. “Four nines” is a system available 99.99% of the time, which equates to about 53 minutes of allowable downtime per year. |
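The downtime behind each "nines" figure follows from simple arithmetic: multiply the length of the period by the fraction of time the system is allowed to be unavailable. The sketch below assumes a 365-day year; the function and constant names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a 365-day year


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the downtime, in seconds, that a given availability allows per period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_seconds * (100 - availability_percent) / 100


five_nines_year = allowed_downtime_seconds(99.999)                  # about 315 s (5 min 15 s)
five_nines_day = allowed_downtime_seconds(99.999, SECONDS_PER_DAY)  # about 0.86 s
four_nines_year = allowed_downtime_seconds(99.99)                   # about 3154 s (52.6 min)
```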
| I |
|
| ICR |
Intelligent Character Recognition. The ability of software to recognize and translate handwritten characters into machine-readable text.
|
| Image |
The digitized representation of a picture, graphic, or document. |
| Import |
The process of loading existing electronic image files into the system or application. |
| Indexing |
The process of assigning descriptive data to a captured document (e.g., name, date, location, transaction number, customer ID), either by having a worker type in terms or by having software extract them automatically, so it can be located and retrieved easily at a later time. |
| Information Management |
Information management (IM) is the collection and management of information from one or more sources and the distribution of that information to one or more audiences. This sometimes involves those who have a stake in, or a right to that information. Management means the organization of and control over the structure, processing and delivery of information. Information management entails organizing, retrieving, acquiring and maintaining information. It is closely related to and overlapping with the practice of Data Management.
|
| Input Management |
Input management is a term coined by a capture company in 2005 in an attempt to differentiate their products from those of their competitors. In fact, input management is merely another term for document capture and data capture from paper and electronic sources.
Input management does not focus on the difficult problem of intelligently identifying the important pieces of information on the incoming documents and forms. For example, on an invoice, it does not intelligently identify the vendor, address, items, prices, payment terms, and so on, and it does not match the invoices with the corresponding purchase orders. |
| ISIS |
Image and Scanner Interface Specification. A driver used primarily with production scanners. First released in 1990 by Pixel Translations (now a part of EMC). |
| J |
|
| Jog Position |
The position of one side or the other in a printer's output tray for the printed copies. Jog positions typically are used for collating operations, and are not available on all printers. |
| K |
|
| Key from Image |
Refers to a data entry method in which operators manually key data from the scanned image of a document appearing on their computer screens, rather than from the paper original. |
| L |
|
| Landscape |
Page orientation where the page width exceeds the page length. See orientation.
|
| Learn-by-Example |
A technique for improving the performance of an automated system by "teaching" it the correct results for a set of known documents, and then improving its knowledge further as new examples are encountered during actual use. |
| LOB |
Line of Business. |
| M |
|
| MFP |
Multi-function peripheral. Office equipment that incorporates two or more functions (print, fax, scan, copy, etc.) in a single device. Also referred to as a multi-function device (MFD). |
| Multifunction Product (MFP) |
Office equipment that acts as a scanner, fax machine, printer and copier, often attached to a network and supporting a workgroup, department or branch office. Sometimes called a multifunction peripheral, multifunction printer, multifunction device (MFD) or all-in-one. |
| N |
|
| Network Load Balancing |
A Microsoft-developed technology that balances network traffic across a number of hosts, enhancing the scalability and availability of mission critical applications. It also provides high availability by detecting host failures and automatically redistributing traffic to operational hosts.
|
| Noise |
Extraneous speckles on an image. |
| O |
|
| OCR |
OCR stands for optical character recognition. OCR transforms printed or hand-printed text into electronic data that can be used in a computer system.
All OCR starts with an electronic image of the text, usually created with a document scanner. Some people think of this as an OCR scanner, but the OCR is actually performed by OCR software after scanning. The scanner only produces an image of the document, much like taking a picture of it.
The OCR software then examines the image of the scanned document; identifies each letter, number and punctuation mark; and produces equivalent text in a machine-readable digital form that can be used by a computer system.
Why OCR Alone is Not Enough
OCR is extremely accurate for machine-printed or typewritten text. A related technology, ICR (intelligent character recognition) can convert clearly written hand-printed text. But OCR alone is not enough when a company or government agency must deal with documents as part of a business process.
Companies that rely on a document-driven business process must do more than simply convert written text to digital text. For example, just doing OCR for invoices does not identify the important pieces of information (vendor, address, items, prices, payment terms, and so on). It does not verify that the OCR software has converted the information correctly. And it does not match the invoices with the corresponding purchase orders. With OCR alone, these important steps require time-consuming and costly manual effort.
Also, OCR results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor OCR results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data. |
| OCR Software |
OCR software transforms printed or hand-printed text into electronic data that can be used in a computer system. (OCR stands for optical character recognition.)
All OCR software starts with an electronic image of the text, usually created with a document scanner. Some people think of this as an OCR scanner, but the OCR is actually performed by optical character recognition software after scanning. The scanner only produces an image of the document, much like taking a picture of it.
The OCR software then examines the image of the scanned document; identifies each letter, number and punctuation mark; and produces equivalent text in a machine-readable digital form that can be used by a computer system.
Why OCR Software Alone is Not Enough
OCR software is extremely accurate for machine-printed or typewritten text. A related technology, ICR (intelligent character recognition) can convert clearly written hand-printed text. But OCR alone is not enough when a company or government agency must deal with documents as part of a business process.
Companies that rely on a document-driven business process must do more than simply convert written text to digital text. For example, just doing OCR for invoices does not identify the important pieces of information (vendor, address, items, prices, payment terms, and so on). It does not verify that the OCR software has converted the information correctly. And it does not match the invoices with the corresponding purchase orders. With OCR alone, these important steps require time-consuming and costly manual effort.
Also, OCR software results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor OCR results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data. |
| ODBC |
Open Database Connectivity (ODBC) is a standard software API specification for using database management systems (DBMS). ODBC is independent of programming language, database system and operating system. The goal of ODBC is to make it possible to access any data from any application, regardless of which DBMS handles the data. |
| OEM Software |
Software that is licensed for distribution by other manufacturers for incorporation into their own products. |
| OMR |
Optical mark recognition. Recognizes pencil or pen marks in specific document positions (such as filled-in check boxes or circles) and translates the marks into computer-readable data. |
| Optical Character Recognition |
Optical Character Recognition software transforms printed or hand-printed text into electronic data that can be used in a computer system. (Optical character recognition is usually abbreviated as OCR.) All optical character recognition software starts with an electronic image of the text, usually created with a document scanner. Some people think of this as an optical character recognition scanner, but the optical character recognition is actually performed by optical character recognition software after scanning. The scanner only produces an image of the document, much like taking a picture of it. The optical character recognition software then examines the image of the scanned document; identifies each letter, number and punctuation mark; and produces equivalent text in a machine-readable digital form that can be used by a computer system.
Why Optical Character Recognition Software Alone is Not Enough
Optical character recognition software is extremely accurate for machine-printed or typewritten text. A related technology, ICR (intelligent character recognition) can convert clearly written hand-printed text. But optical character recognition alone is not enough when a company or government agency must deal with documents as part of a business process. Companies that rely on a document-driven business process must do more than simply convert written text to digital text. For example, just doing optical character recognition for invoices does not identify the important pieces of information (vendor, address, items, prices, payment terms, and so on). It does not verify that the optical character recognition software has converted the information correctly. And it does not match the invoices with the corresponding purchase orders. With optical character recognition alone, these important steps require time-consuming and costly manual effort. Also, optical character recognition software results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor optical character recognition results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data. |
| Optical Character Recognition Software |
Optical Character Recognition software transforms printed or hand-printed text into electronic data that can be used in a computer system. (Optical character recognition is usually abbreviated as OCR.) All optical character recognition software starts with an electronic image of the text, usually created with a document scanner. Some people think of this as an optical character recognition scanner, but the optical character recognition is actually performed by optical character recognition software after scanning. The scanner only produces an image of the document, much like taking a picture of it. The optical character recognition software then examines the image of the scanned document; identifies each letter, number and punctuation mark; and produces equivalent text in a machine-readable digital form that can be used by a computer system.
Why Optical Character Recognition Software Alone is Not Enough
Optical character recognition software is extremely accurate for machine-printed or typewritten text. A related technology, ICR (intelligent character recognition) can convert clearly written hand-printed text. But optical character recognition alone is not enough when a company or government agency must deal with documents as part of a business process. Companies that rely on a document-driven business process must do more than simply convert written text to digital text. For example, just doing optical character recognition for invoices does not identify the important pieces of information (vendor, address, items, prices, payment terms, and so on). It does not verify that the optical character recognition software has converted the information correctly. And it does not match the invoices with the corresponding purchase orders. With optical character recognition alone, these important steps require time-consuming and costly manual effort. Also, optical character recognition software results depend on the image quality of the scanned documents. Documents that have colored or patterned backgrounds, that have been marked with highlighter pens, or that are crooked when scanned can yield poor optical character recognition results. Fixing these bad results means either adjusting the scanner settings and rescanning the document (perhaps multiple times) or manually keying in corrections to the electronic data. |
| Orientation |
The relative direction of a page, either horizontal (called landscape) or vertical (called portrait). |
| P |
|
| Padding Images |
The process of adding white space from the end of the scanned image to the length of the selected paper size. This can only occur when using the ADF to feed pages into the scanner, and only with scanners that support this feature. |
| Panel |
The area on a scanner or MFP where instructions to control its operation are entered. |
| Panning |
The ability to move a zoomed image within the display window with the mouse. |
| Patch Code |
A pattern of horizontal black bars separated by spaces. Typically, a patch code is placed near the top center of a paper document to be scanned and used as a document separator. |
| Patch Code Trigger |
A patch code that when detected causes the application to perform a predefined action. |
| Point of Sale (POS) Software |
Packaged software that is licensed at the point of sale by a reseller or distributor. Sometimes called shrink-wrap software. |
| Portrait |
Page orientation where the page length exceeds the page width. |
| Production Scanners |
High volume scanners, typically capable of processing from 40 to 100+ pages per minute. See also workgroup scanners and departmental scanners. |
| Q |
|
| Quiet Zone |
In a bar code, an unmarked area preceding the leading bar and following the last bar.
|
| R |
|
| Rated Speed |
The theoretical maximum number of pages that can pass through a scanner each minute. Often lowered by real-world considerations such as paper jams, poorly scanned documents that need rescanning, document prep, loading time and maintenance. |
| Redaction |
A type of document annotation that conceals from view specific portions of sensitive documents. |
| Release |
Process by which completed batches are transferred from the Capture program to another application. Once released, the batches are not available to the Capture program. Batches can be released only from the Capture Release module installed at the central site. |
| Release Script |
Software needed to export documents and data from capture software to other programs (e.g., ECM, CRM, ERP) for additional processing or storage. It’s called a release script because it is written in a scripting language (e.g., Visual Basic for Applications, Perl, Tcl). Scripting is a means of connecting existing software components to accomplish a new but related task. For that reason, scripts often are considered the glue that holds components together. In this case, they’re the glue that connects the capture component to whatever other software component (e.g., ECM, CRM, ERP) an organization might like to release captured documents and data to.
|
| Remote Site |
Site where a Capture remote station is located. |
| Remote Station |
Workstation at a remote site on which a Capture program and a Remote Synchronization Agent are installed. |
| Remote Synchronization Agent |
Remote station software used to synchronise Capture batch processing activity between the central site and the remote site. Includes software that runs in the browser and also a system tray application.
|
| Resolution |
The fineness or coarseness of an image as it is scanned, printed, or displayed. It is measured in dots per inch (DPI), typically from 200 to 400 DPI, although ImageControls supports lower and higher resolutions. A higher resolution results in a better image; however, more storage space is required for the high resolution image files. |
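The trade-off between resolution and storage can be made concrete by computing the uncompressed size of a scanned page. This is a sketch under simple assumptions: no compression, no file header, and pixel data packed without row padding. The function name and defaults are illustrative.

```python
import math


def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Estimate the raw size of a scanned page.

    width_in, height_in: page size in inches
    dpi: scan resolution in dots per inch
    bits_per_pixel: 1 for black and white, 8 for grayscale, 24 for color
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


# An 8.5 x 11 inch page scanned in black and white.
size_200 = uncompressed_size_bytes(8.5, 11, 200)  # 467500 bytes
size_400 = uncompressed_size_bytes(8.5, 11, 400)  # 1870000 bytes
```

Doubling the resolution quadruples the pixel count, so the raw image is four times larger.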
| S |
|
| Saturation |
The ability to emphasize horizontal and/or vertical pixels when scaling an image for display or print. |
| Scale to Gray |
A process that makes a scanned image more readable on a computer display. When a scaled image is displayed on a low resolution monitor (for example, VGA), some of the image data is lost causing slanted lines and text characters to appear jagged. The scale to gray feature causes blocks of pixels around the edges of lines and letters to be replaced with a representative pixel of a gray shade, thus improving the appearance of jagged lines and text.
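One common way to produce this effect is to average each block of black-and-white pixels into a single gray pixel when the image is reduced for display. The sketch below illustrates that idea only; it is not the algorithm of any specific product. It assumes the input is a list of rows where 1 means black and 0 means white, and it outputs gray values from 0 (black) to 255 (white).

```python
def scale_to_gray(bitonal, factor):
    """Shrink a black-and-white image by `factor`, averaging blocks into gray.

    bitonal: list of equal-length rows; 1 = black pixel, 0 = white pixel.
    factor: block size; each factor x factor block becomes one output pixel.
    Pixels that do not fill a complete block at the right or bottom are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if not bitonal:
        return []
    out_height = len(bitonal) // factor
    out_width = len(bitonal[0]) // factor
    result = []
    for by in range(out_height):
        row = []
        for bx in range(out_width):
            black = sum(
                bitonal[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            # More black pixels in the block give a darker gray.
            row.append(round(255 * (1 - black / (factor * factor))))
        result.append(row)
    return result


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
gray = scale_to_gray(image, 2)  # [[0, 255], [191, 255]]
```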
|
| Scan-Enable |
Adding scanning functionality to a software program that previously lacked this capability. A Document Scan Server (DSS) allows developers to add a “SCAN” button to numerous software programs. Workers using these scan-enabled programs no longer need to switch to a dedicated scan application in order to scan a document. |
| Scan-On-Demand |
A method of implementing a document imaging system where documents are scanned only when they’re needed and retrieved from storage. Compare to backfile and day forward processes. |
| Semi-Structured Document |
A document that includes known types of data whose position on the page is not known in advance. An invoice is an example: it must include an amount and a due date, but since every company is free to design its own invoices, there’s no way to know in advance where this information will appear on the page. (See also structured document and unstructured document.) |
| Separation |
Automatically determining where one document ends and the next begins. |
| Separator Sheets |
A sheet of paper manually inserted between documents before scanning that indicates the start of a new batch or document. Typically has a barcode or patch code printed on it. |
| Skew |
The result of feeding a document into the scanner at an angle, producing an image that is not square with the page.
|
| SOA |
Service Oriented Architecture. SOA isn’t a product, technology or standard. Rather, it’s a style of software architecture that supports building applications out of linked services. Services are components that perform a business process, such as extracting data from an invoice. Services are loosely coupled and do not depend on one another. Individual components can be changed without impacting the entire system. Web services are a way to implement components that make up a SOA. With web services, applications can call one another over a network in a secure fashion wherever they reside, whatever operating system they’re on, using whatever architecture is under the hood. SOA will help organizations keep up with an accelerating business environment by allowing them to look past technological limitations and instead focus on business needs and business processes.
|
| Speckle |
A group of black pixels of a defined height and width surrounded by white pixels (or white pixels surrounded by black pixels). |
| Straight Through Processing |
An automated workflow that a scanned document can pass through to completion without the need for manual intervention.
|
| Structured Document |
A document for which both the type (number, letter, check mark, etc.) and the location of data are known before scanning. For example, the data field for line 35 of IRD tax form IR-5, positioned on the lower right corner of the page, will always contain a number. (See also semi-structured document and unstructured document.)
|
| Synchronisation |
Process by which batch classes and other settings at the central site are downloaded to remote stations. If completed batches exist at the remote station, the batches are uploaded to the central site. |
| T |
|
| Thick Client |
Fully featured computer connected to a network. Can perform most processing functions on its own, and becomes a “client” of the server only when it needs to access programs or files not stored on its hard disk. |
| Thin Client |
A thin client is a computer with little or no processing power. Instead, thin clients rely on the central server to which they are connected for processing activities, and are used primarily to exchange data with the server. Thin clients frequently lack a hard disk, and thus are cheaper to buy and maintain than thick clients. |
| Thresholding |
When converting a pixel from grayscale to black and white, the threshold is the gray value that decides the result: a pixel with a value above the threshold is considered white, and a pixel with a value equal to or below it is considered black. |
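The rule above can be written in a few lines. This sketch assumes 8-bit gray values where 0 is black and 255 is white, and a default threshold of 128; both the scale and the default are assumptions made for illustration.

```python
BLACK = 0
WHITE = 255


def threshold_pixel(gray, threshold=128):
    """Return WHITE if the gray value is above the threshold, otherwise BLACK."""
    return WHITE if gray > threshold else BLACK


def threshold_image(gray_rows, threshold=128):
    """Apply the same threshold to every pixel of an image given as rows of gray values."""
    return [[threshold_pixel(p, threshold) for p in row] for row in gray_rows]


bitonal = threshold_image([[12, 128, 129], [200, 90, 255]])
# [[0, 0, 255], [255, 0, 255]]
```

A value equal to the threshold (128 here) becomes black, matching the "below or equal" rule in the definition.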
| TIFF |
Tagged Image File Format, the primary format used by document imaging systems. Incorporates several forms of compression. Can store multiple page documents as a single file (as opposed to creating separate files for each page). |
| Transaction Capture |
Capturing information or data from documents and forms specifically to initiate, continue or conclude a business process. |
| Transformation |
The process of automatically and intelligently extracting, classifying, indexing and validating information from documents and forms. |
| TWAIN |
A technology standard defining how images are acquired from a document scanner. Not an acronym; taken from a line of Rudyard Kipling’s poetry: “…and never the twain shall meet.” Twain is an archaic form of the word two. The term was chosen because, before the standard was first released (1992), there had been a great deal of difficulty getting two devices, the scanner and the PC, to communicate with each other. Unofficial acronyms include Technology Without An Interesting Name. |
| U |
|
| Unstructured Documents |
A type of document for which, prior to scanning, both the type and location of the information it contains are unknown. Documents that cannot be identified as structured or semi-structured are assigned to this category. They could be virtually any type of document: correspondence, petitions, advertisements, manuals, brochures or annual reports. Some estimates classify up to 80% of the paper in circulation at organizations as unstructured documents. |
| Upload |
Process by which completed batches are transferred from the remote site to the central site. |
| V |
|
| Validation |
A process that raises the probability that captured data is correct. Although sometimes performed manually, faster and more accurate results are achieved when validation is automated. Frequently this is accomplished by comparing captured data to information contained in a database. For example, the city and postcode from an order can be checked against a postal service database to be sure they match. If so, the order can continue to be processed. If not, the order can be flagged and presented to an operator for correction before further processing. Logical validation procedures also can be employed. For example, an IRD number consists of nine digits. If a captured IRD number contains more or fewer digits, or if a letter or symbol is detected as one of the characters, validation fails and the data must be reviewed and corrected. |
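Both kinds of check described above can be sketched as follows. This is an illustration under stated assumptions rather than a real capture product's logic. The in-memory `POSTAL_DB` dictionary stands in for a postal service database, all function names are hypothetical, and the IRD check covers only the nine-digit format rule given in the definition.

```python
# Hypothetical sample standing in for a postal service database: postcode -> city.
POSTAL_DB = {"6011": "Wellington", "1010": "Auckland"}


def city_matches_postcode(city: str, postcode: str, postal_db: dict) -> bool:
    """Database validation: the captured city must match the city on record."""
    known_city = postal_db.get(postcode.strip())
    return known_city is not None and known_city.casefold() == city.strip().casefold()


def has_nine_digits(captured: str, expected_length: int = 9) -> bool:
    """Logical validation: exactly nine characters, all of them ASCII digits."""
    return (
        len(captured) == expected_length
        and captured.isascii()
        and captured.isdigit()
    )


def validate_order(city: str, postcode: str, ird_number: str, postal_db=POSTAL_DB):
    """Return a list of failure reasons; an empty list means processing can continue."""
    failures = []
    if not city_matches_postcode(city, postcode, postal_db):
        failures.append("city/postcode mismatch")
    if not has_nine_digits(ird_number):
        failures.append("IRD number format")
    return failures


if __name__ == "__main__":
    print(validate_order("Wellington", "6011", "123456789"))  # []
    # The final character is the letter O, a typical recognition error.
    print(validate_order("Auckland", "6011", "12345678O"))
```

An order that returns a non-empty list would be flagged and presented to an operator for correction.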
| Verification |
A procedure used when captured information absolutely must be correct. Typically, two or more operators will key data from the same captured image. If their entries don’t match, verification fails and the data must be reviewed and corrected. |
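The comparison step in double-key verification can be sketched as below. It assumes each operator's entry is held as a dictionary of field names to keyed values. The function names and sample fields are hypothetical.

```python
def mismatched_fields(first: dict, second: dict) -> list:
    """Return the sorted names of fields that the two operators keyed differently."""
    all_fields = first.keys() | second.keys()
    return sorted(name for name in all_fields if first.get(name) != second.get(name))


def verification_passes(first: dict, second: dict) -> bool:
    """Verification passes only when every field matches exactly."""
    return not mismatched_fields(first, second)


if __name__ == "__main__":
    operator_a = {"invoice_no": "10482", "amount": "199.50"}
    operator_b = {"invoice_no": "10482", "amount": "199.80"}
    print(mismatched_fields(operator_a, operator_b))    # ['amount']
    print(verification_passes(operator_a, operator_b))  # False
```

Only the fields that differ need to go back to an operator for review.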
| Voting |
A technique used to improve OCR/ICR accuracy. Multiple OCR/ICR engines, each utilizing a different identification algorithm, independently evaluate an alpha-numeric character and cast a “vote” with their best guess as to its identity. If the voting is not unanimous, the character receiving the most “votes” wins. |
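The voting step can be sketched as a simple majority count. The engines themselves are not modeled here: the list of guesses stands in for their output, and the function name is hypothetical. The definition does not say how ties are resolved, so this sketch assumes that the first guess to reach the top count wins.

```python
from collections import Counter


def vote(guesses):
    """Return the character that received the most votes from the engines."""
    if not guesses:
        raise ValueError("at least one engine guess is required")
    # most_common() lists equal counts in first-seen order, so a tie goes
    # to the earliest engine's guess (an assumption, not part of the definition).
    winner, _count = Counter(guesses).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Three engines evaluate the same character; two read "8", one reads "B".
    print(vote(["8", "B", "8"]))  # 8
```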
| W |
|
| Web Server |
A computer that runs an application which allows files to be transferred over the Internet to a client machine. Web server programs operate by accepting HTTP requests from the network, and providing an HTTP response to the requester. The HTTP response typically consists of an HTML document, but can be a raw text file, an image, or some other type of document. |
| Web Services |
Web services provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks. Web services are characterized by their great interoperability and extensibility, as well as their machine-processable descriptions thanks to the use of XML. They can be combined in a loosely coupled way in order to achieve complex operations. Programs providing simple services can interact with each other in order to deliver sophisticated added-value services. |
| Wet Signature |
The use of a pen to sign a document by hand. Often a requirement to make the document legally binding. |
| White Noise |
Extraneous white speckles in a black portion of an image. Typically used to describe speckles in a black border. |
| White Noise Gap |
A tolerance value for the size of white speckles in a black border. |
| Workflow |
Automating the procedures for handling business processes. Workflow systems are usually based on electronic versions of documents. How documents are routed through departments in a company, which transactions have to be accomplished in which order, and what to do about exceptions and mistakes are all workflow concerns. Among the many tasks performed, workflow software generally schedules processing, routes documents automatically among users and tracks document status. |
| Workgroup Scanner |
A low-volume scanner, typically capable of scanning 10 to 25 pages per minute. See also departmental scanner and production scanner. |