
The capabilities of standard optical character recognition (OCR)-based software have increased dramatically in the past decade. Many don’t realize that semi- and unstructured forms processing has been in successful use for several years, and auto-classification is becoming more commonplace in the document processing market as the technology evolves. With conceptual classification methods, more powerful recognition engines and the development of “memory” technologies, automating the identification of, and data capture from, semi- and unstructured documents such as patient records, invoices and checks – previously not feasible – is now a proven and effective alternative to time-consuming and costly manual processes.
Traditionally, with proper template design, an organization could readily automate the extraction of data from static locations on structured documents such as credit applications or tax forms – eliminating the need for manual data entry. On these structured documents, the data is located in the same place on every form. Standard OCR solutions use a static template-based process, in which users draw zones on the form to define which data to extract and where it is located. Other elements on the form, such as company logos, field descriptions (e.g., ‘Street Address’) and lines, are excluded from processing. This is a solid method of collecting the data you need from consistent – and therefore structured – forms.
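The zone-based approach described above can be illustrated with a minimal sketch. It assumes the OCR engine has already returned each word with page coordinates; the Word and Zone types, the field names and the coordinates are all hypothetical and do not represent any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


@dataclass(frozen=True)
class Zone:
    name: str  # name of the field this zone captures
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x < self.right and self.top <= word.y < self.bottom


def extract_by_zones(words, zones):
    """Return {field name: text}, joining the words inside each zone in reading order."""
    result = {}
    for zone in zones:
        inside = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        result[zone.name] = " ".join(w.text for w in inside)
    return result


# Hypothetical template: the same zones apply to every copy of the form.
TEMPLATE = [
    Zone("applicant_name", 100, 50, 400, 70),
    Zone("street_address", 100, 80, 400, 100),
]

# Hypothetical OCR output; the printed labels at x < 100 fall outside every zone.
page = [
    Word("Name", 20, 55), Word("Jane", 110, 55), Word("Doe", 150, 55),
    Word("Street", 20, 85), Word("Address", 60, 85),
    Word("12", 110, 85), Word("Elm", 130, 85), Word("St", 165, 85),
]
fields = extract_by_zones(page, TEMPLATE)
```

Because the zones are fixed, this only works when every form places its data in the same location, which is exactly the limitation that semi- and unstructured documents expose.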
Frequently, however, an organization’s crucial data is received on semi- or unstructured documents such as patient records, remittance statements and more. These unsorted documents typically arrive in a central location such as a company mailroom, requiring an employee to manually sort and distribute them. And the common data you need to collect often appears in a different location on each of these documents from one vendor to the next. But advancements in enterprise content management (ECM) technology now enable the automation and streamlining of the classification, distribution and capture process for these semi- and unstructured documents. As a result, stacks of documents of all types can be scanned and automatically identified, sorted and routed to their appropriate destination. Document sets can also be created, and data captured and validated.
To maximize the value of available ECM technology and your own organizational resources – as well as save you time, money and frustration – be sure to select an automated solution that does not rely on manual processes or static templates to classify and capture data from your documents. Here are ten characteristics to consider when selecting a solution for automating the processing of your accounting and other business documents.
1. Document Image Classification
Look for a solution that uses a combination of document attributes, images, text or keywords and conceptual classification to identify documents. The solution should also be capable of training itself as it goes, ensuring bottlenecks do not occur when a new document type is encountered. To ensure ease of use, look for GUI-based handling and user-friendly wizards to guide you through the setup process – making it easy to effectively deploy document classification without any complex programming.
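As a rough illustration of the keyword portion of classification only, the sketch below scores OCR text against a keyword list per document type. It is a simplified stand-in, not the conceptual classification or self-training described above, and the document types and keywords are assumptions chosen for the example.

```python
# Hypothetical keyword lists; a real deployment would tune these per document type.
KEYWORDS = {
    "invoice": {"invoice", "amount due", "bill to"},
    "remittance": {"remittance", "payment advice"},
    "patient_record": {"patient", "diagnosis"},
}


def classify(text, keywords=KEYWORDS, min_hits=1):
    """Return the document type with the most keyword hits, or 'unknown'."""
    lowered = text.lower()
    # Plain substring matching: simple, but "patient" would also match "impatient".
    scores = {
        doc_type: sum(1 for term in terms if term in lowered)
        for doc_type, terms in keywords.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first type listed
    return best if scores[best] >= min_hits else "unknown"


invoice_type = classify("INVOICE #1001  Bill To: Acme  Amount Due: $50.00")
letter_type = classify("Dear Sir or Madam, thank you for your letter.")
```

Documents that come back as "unknown" are the ones a person would need to review, which is where a solution that learns new document types saves the most effort.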
2. Intelligent Data Collection
The most effective solutions use a combination of logic, keywords and business rules to locate and capture the correct data, no matter where it may be found within the document’s page(s). In addition, ask if the solution contains a “memory” technology that remembers where data was found on a particular document format – such as an invoice from a specific vendor. The next time that document format is received, the solution can go directly to the remembered location, which speeds processing.
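One way to picture the combination of keyword search and “memory” is the sketch below. It assumes the document is available as a list of OCR text lines and that the vendor is already known; the LocationMemory class, the pattern for the total and the sample invoices are hypothetical.

```python
import re

# Assumed wording for an invoice total, e.g. "Total due: $20.00".
TOTAL_PATTERN = re.compile(
    r"\btotal\b\s*(?:due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE
)


class LocationMemory:
    """Remembers, per vendor, the line on which the total was last found."""

    def __init__(self):
        self._line_by_vendor = {}

    def find_total(self, vendor, lines):
        # First try the line remembered from this vendor's previous document.
        remembered = self._line_by_vendor.get(vendor)
        if remembered is not None and remembered < len(lines):
            match = TOTAL_PATTERN.search(lines[remembered])
            if match:
                return match.group(1), "memory"
        # Otherwise fall back to searching the whole document by keyword.
        for index, line in enumerate(lines):
            match = TOTAL_PATTERN.search(line)
            if match:
                self._line_by_vendor[vendor] = index
                return match.group(1), "search"
        return None, "not_found"


memory = LocationMemory()
first = ["Acme Corp", "Invoice 1001", "Widgets 2 x 10.00", "Total due: $20.00"]
second = ["Acme Corp", "Invoice 1002", "Gears 1 x 5.50", "Total due: $5.50"]
first_result = memory.find_total("Acme", first)
second_result = memory.find_total("Acme", second)
```

The first invoice from a vendor requires a full search; the second is read from the remembered line, with the search kept as a fallback in case the vendor changes its layout.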
3. Single or Multiple Page Processing
Many unstructured documents, such as healthcare, mortgage or accounting documents, vary in length and often contain more than one page, so a solution that can handle multiple-page documents is essential. A complete solution treats a multiple-page document as one logical document throughout the entire process – from scan to identification to data capture to storage.
4. Detail Line Processing
You also need a solution that can extract line item details (e.g., unit price, quantity ordered/shipped, terms and part number). Don’t limit yourself to software that captures only summary-level information or requires the use of static templates, which are not feasible for semi- and unstructured documents.
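To make line item capture concrete, here is a minimal sketch that pulls detail lines out of OCR text with a pattern rather than a fixed zone. The assumed line layout (part number, description, quantity, unit price) and the sample invoice are hypothetical; real invoices vary far more than this.

```python
import re
from decimal import Decimal

# Assumed layout: part number, description, quantity, unit price, separated by spaces.
LINE_ITEM = re.compile(
    r"^(?P<part>[A-Z0-9-]+)\s+(?P<desc>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)


def parse_line_items(lines):
    """Return a list of line item dicts for every line matching the assumed layout."""
    items = []
    for line in lines:
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append({
                "part_number": match["part"],
                "description": match["desc"],
                "quantity": int(match["qty"]),
                "unit_price": Decimal(match["price"]),
            })
    return items


invoice_lines = [
    "Invoice 1001",
    "PN-100 Steel bracket 4 2.50",
    "PN-205 Hex bolt M8 100 0.12",
    "Total due: 22.00",
]
items = parse_line_items(invoice_lines)
```

The header and total lines are skipped because they do not fit the pattern, so the number of detail lines can differ from one invoice to the next without any template change.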
5. Data Consistency and Rule Compliance
Let the software work for you! You should be able to define precisely which data to capture and how to capture it, based on your specific business rules. Dates, for example, can be captured and consistently output in a DD/MM/YYYY format – or the format that you select. Or you can specify that a purchase order number must be a 12-character alphanumeric field that always begins with the digits ‘216.’ Your solution should be able to locate and output data based on your parameters. Taken further, the software should be able to cross-check captured data against your existing databases to ensure the information is accurate, valid and complete.
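The two rules mentioned above, a consistent date output format and a purchase order number pattern, can be sketched as follows. The list of accepted input date formats is an assumption, including the choice to read slash-separated dates as month first; the function names are hypothetical.

```python
import re
from datetime import datetime

# 12 alphanumeric characters in total, always beginning with "216".
PO_PATTERN = re.compile(r"216[A-Za-z0-9]{9}")

# Assumed input formats; slash-separated dates are read as month/day/year.
INPUT_DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_date(raw, output_format="%d/%m/%Y"):
    """Return the date in DD/MM/YYYY (or the chosen format), or None if unreadable."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime(output_format)
        except ValueError:
            continue
    return None


def is_valid_po(value):
    """Check the purchase order rule: 12 alphanumeric characters starting with 216."""
    return PO_PATTERN.fullmatch(value) is not None


us_style = normalize_date("03/04/2024")
iso_style = normalize_date("2024-03-04")
po_ok = is_valid_po("216ABC123456")
po_bad = is_valid_po("217ABC123456")
```

Values that fail either rule would typically be flagged for an operator to verify rather than passed downstream; a database cross-check would be a further step after these format checks.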
6. Remote Scanning, Validation and Verification Capabilities
If your organization has offices scattered across the country or around the world, look for a software solution that can remotely scan documents and/or validate data for distribution to a central office via the Internet. Remote capture reduces labor costs, eliminates shipping costs and greatly improves processing speed. At the same time, processing float time and the occurrence of lost documents are minimized.
7. Flexible Data Exchange
To avoid the custom or time-consuming programming that data transfers can require, look for a solution that can easily create a bridge between incompatible file formats. Best-of-breed document and data solutions contain technology to enable flexible data exchange with almost every accounting or ERP system (such as SAP®), back-end system or document management system. This eliminates the need for a file conversion program. You should also look for a solution that supports customizable file transfer scheduling, giving you the option to automate tasks for better efficiency – such as scheduling data transfers to run automatically by the hour, day, week or month, or continuously monitoring target directories for new data to process throughout the day or night.
8. Adaptability to Existing Accounting or Other Procedures
Your established business procedures should not need to be altered to accommodate your data capture and classification solution. A flexible solution will let you easily incorporate features such as flags and custom routing paths based on your existing business processes. For example, invoice totals that differ from the corresponding PO (which can be determined automatically by your existing database look-ups) can be automatically directed to an alternate routing path, such as to a supervisor for review prior to approval.
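The routing example above can be sketched in a few lines. The PO lookup table stands in for an existing database, and the route names, tolerance parameter and sample values are hypothetical.

```python
from decimal import Decimal

# Stand-in for a look-up against an existing purchase order database.
PO_TOTALS = {"216ABC123456": Decimal("150.00")}


def route_invoice(invoice_total, po_total, tolerance=Decimal("0.00")):
    """Choose a routing path by comparing the invoice total with the PO total."""
    if po_total is None:
        # No matching purchase order on file: a person should look at it.
        return "supervisor_review"
    if abs(invoice_total - po_total) > tolerance:
        return "supervisor_review"
    return "auto_approve"


matching = route_invoice(Decimal("150.00"), PO_TOTALS.get("216ABC123456"))
mismatch = route_invoice(Decimal("175.00"), PO_TOTALS.get("216ABC123456"))
unknown_po = route_invoice(Decimal("80.00"), PO_TOTALS.get("216ZZZ000000"))
```

Keeping the rule as a small, separate function means the routing logic can follow your existing approval procedure rather than forcing the procedure to change.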
9. All-in-One Solution
Both structured and unstructured forms can be found in a single enterprise. Few businesses receive one form type and not the other, so look for a single robust software solution that can easily automate the classification and data capture of both.
10. Growth and Longevity
Finally, look for adaptability. The solution you select should be flexible enough to grow along with the volume and application needs of your company. And it should keep up with emerging technology for years to come.
A solution that meets all of these criteria will simplify the automation of classification and data capture for all your document types – structured, semi-structured or unstructured. In turn, you’ll enjoy lower processing and labor costs and more accurate data, and you’ll be better positioned to accommodate future business needs.
Samuel L. Schrage is President of AnyDoc Software, Inc. He can be reached at info@AnyDocSoftware.com.
AnyDoc Software offers innovative document and data capture and classification solutions that have been the industry standard since 1991. Thousands of companies worldwide rely on AnyDoc solutions to eliminate millions of hours of manual data entry while improving productivity and accuracy. Any paper form or document including invoices, remittances and checks can be automatically processed with full data extraction without the need for manual keying. Clients include: Sony Pictures Entertainment, Circuit City, BlueCross BlueShield, the U.S. Census, LeasePlan, Zürcher Kantonalbank, Coop and more.
For more information, please visit www.AnyDocSoftware.com.