
Optical character recognition

From Wikipedia, the free encyclopedia.
Optical character recognition, usually abbreviated to OCR, involves computer systems designed to translate images of typewritten text (usually captured by a scanner) into machine-editable text, that is, to translate pictures of characters into a standard encoding scheme that represents them (such as ASCII or Unicode). OCR began as a field of research in artificial intelligence and machine vision; though academic research in the field continues, the focus has shifted to the implementation of proven techniques.

Table of contents
1 Optical versus digital character recognition
2 Training
3 Brief history of OCR
4 Typewritten OCR
5 Hand print OCR
6 Cursive OCR
7 Research areas
8 MICR
9 See also
10 External links

Optical versus digital character recognition

Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Since very few applications that use true optical techniques survive, the term optical character recognition has been broadened to cover digital character recognition as well.

Training

Early systems required "training" (essentially, the provision of known samples of each character) to read a specific font. "Intelligent" systems that can recognize most fonts with a high degree of accuracy are now common. Some systems can even reproduce formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.
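
The idea of training on known samples of each character can be illustrated with a minimal sketch. It assumes tiny 3x3 binary bitmaps and a simple pixel-difference (Hamming) comparison; the glyph shapes and the names TEMPLATES, hamming and classify are hypothetical and chosen only for illustration, not taken from any historical system.

```python
# Hypothetical "training" data: one known 3x3 sample per character.
# Real systems use far larger bitmaps and many samples per character.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, templates=TEMPLATES):
    """Return the label of the known sample closest to the glyph."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


noisy_t = ("111", "010", "000")  # a "T" with one pixel missing
print(classify(noisy_t))  # prints "T"
```

A system of this kind reads only the font it was given samples of, which is why a change of typeface required retraining.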

Brief history of OCR

In 1950, David Shepard, a cryptanalyst at AFSA, the forerunner of the United States National Security Agency (NSA), was asked by Frank Rowlett, who had broken the Japanese PURPLE diplomatic code, to work with Dr. Louis Tordella to recommend data automation procedures for the Agency. This included the problem of converting printed messages into machine language for computer processing. Shepard decided it must be possible to build a machine to do this and, with the help of his friend Harvey Cook, built "Gismo" in his attic during evenings and weekends. This was reported in the Washington Daily News on April 27, 1951 and in the New York Times on December 26, 1953, after his U.S. Patent Number 2,663,758 was issued.

Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the world's first several OCR systems used in commercial operation. Both Gismo and the later IMR systems used image analysis, as opposed to character matching, and could accept some font variation. Gismo, however, was limited to reasonably close vertical registration, whereas the commercial IMR scanners that followed analyzed characters anywhere in the scanned field, a practical necessity on real-world documents. The first commercial system was installed at Reader's Digest in 1955; many years later, Reader's Digest donated it to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, and many more systems were sold to other oil companies. Other systems sold by IMR during the late 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the U.S. Air Force, used for reading typewritten messages and transmitting them by teletype. IBM and others were later licensed on Shepard's OCR patents.

The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. Canada Post has been using OCR systems since 1971. OCR systems read the name and address of the addressee at the first mechanized sorting center and print a routing bar code on the envelope based on the postal code. After that, the letters need only be sorted at later centers by less expensive sorters that read only the bar code. To avoid interference with the human-readable address field, which can be located anywhere on the letter, a special ink is used that is clearly visible under UV light. This ink looks orange in normal lighting conditions. Envelopes marked with the machine-readable bar code may then be processed.

Typewritten OCR

While the accurate recognition of Latin-script typewritten text is now considered largely a solved problem, the recognition of hand printing and handwriting in general, and of printed versions of some other scripts (particularly those with a very large number of characters), is still the subject of active research.

Hand print OCR

Systems for recognizing hand-printed text on the fly have enjoyed commercial success in recent years. Among these are the input devices for the Palm Pilot and other personal digital assistants. The Apple Newton pioneered this technology. The algorithms used in these devices take advantage of the fact that the order, speed, and direction of individual line segments at input are known. Also, the user can be trained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that accuracy rate still translates to dozens of errors per page, making the technology useful only in very limited contexts.
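
The relationship between a per-character accuracy rate and the number of errors on a page is simple arithmetic, sketched below. The page length of 300 characters is an assumption made purely for illustration (roughly a filled-in hand-printed form); the text above does not give one, and the function name expected_errors is hypothetical.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of misread characters on one page.

    Assumes every character is misread independently with
    probability (1 - accuracy).
    """
    return chars_per_page * (1.0 - accuracy)


CHARS_PER_PAGE = 300  # assumed page length, not a figure from the article

for accuracy in (0.80, 0.90):
    errors = round(expected_errors(CHARS_PER_PAGE, accuracy))
    print(f"{accuracy:.0%} accuracy: about {errors} errors per page")
```

Under that assumption, even the upper end of the quoted range leaves about 30 characters per page to be found and corrected by hand.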

Cursive OCR

Recognition of cursive text is an active area of research, with recognition rates even lower than those of hand-printed text. Higher rates of recognition of general cursive script will likely not be possible without the use of contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading the amount line of a check (which is always a written-out number) is an example where using a smaller dictionary can greatly increase recognition rates. Knowledge of the grammar of the language being scanned can also help determine whether a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters simply do not contain enough information to recognize all handwritten cursive script accurately (that is, with better than 98% accuracy).
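
The benefit of a small dictionary, as in the check amount example, can be sketched as follows. The sketch assumes a recognizer has already produced a noisy character string for each word, and it simply snaps that string to the closest entry in a short list of number words using edit (Levenshtein) distance. The word list, the sample misreadings and the function names are hypothetical; production systems typically score whole-word shapes rather than correcting text after the fact.

```python
# Hypothetical small vocabulary for the amount line of a check.
AMOUNT_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "twenty", "thirty", "forty", "fifty", "hundred",
    "thousand", "and", "dollars",
]


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def snap_to_dictionary(raw, vocabulary=AMOUNT_WORDS):
    """Return the vocabulary word closest to the raw recognizer output."""
    return min(vocabulary, key=lambda word: edit_distance(raw, word))


raw_words = ["tvo", "hundrcd", "fifly"]  # made-up misreadings
print([snap_to_dictionary(w) for w in raw_words])
# prints ['two', 'hundred', 'fifty']
```

With only a few dozen candidate words, a misread letter rarely makes one candidate look like another, which is why a restricted vocabulary raises recognition rates so sharply.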

Research areas

A particularly difficult problem for both computers and humans is that of old church baptismal and marriage records, which contain mostly names; the pages may be damaged by age, water or fire, and the names may be obsolete or contain rare spellings. Computer image processing techniques can assist humans in reading extremely difficult texts such as the Archimedes Palimpsest or the Dead Sea Scrolls. Cooperative approaches in which computers assist humans and vice versa are an interesting area of research.

Character recognition has been an active area of computer science research since the late 1950s. It was initially perceived as an easy problem, but it turned out to be a much harder and more interesting one. It will be many decades, if ever, before computers are able to read all documents with the same accuracy as human beings.

MICR

One area where the accuracy and speed of computer input of character information exceed those of humans is magnetic ink character recognition (MICR), where error rates are around one read error for every 20,000 to 30,000 checks.

See also

External links


