OCR text recognition technology rose along with the popularity of scanners. Encouragingly, the technology is now quite mature and should continue to develop in the coming years. OCR stands for Optical Character Recognition, an important field in the research and application of automatic identification technology. For entering large quantities of printed text into electronic documents, OCR is an excellent choice: both its efficiency and its recognition rate are generally good enough to satisfy users. Of course, Chinese users can thank their ancestors: Chinese characters have complex, distinctive strokes, which makes them easy to tell apart. English is not so lucky.
1, scanning
After setting each scan parameter, select the scan mode, either "TWAIN scanning interface" or "Direct scan", and start scanning.
If you select "TWAIN scanning interface", the system performs two scans. The first, the "pre-scan", is used to determine the brightness of the scanned document. Based on the pre-scan result, the user adjusts the scan brightness, resolution and scanning range, and then performs the "final scan".
If you select "Direct scan", the system performs only one scan. If "Fixed" or "Automatic" is selected as the brightness option for the direct scan, the scanned image is displayed immediately after scanning. If "Manual adjustment" is selected, the "Select Brightness" dialog box appears after the scan is finished. You can adjust the brightness while observing how light or dark the image is and how good the scan quality is, until you are satisfied with the image.
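The two scan modes described above can be summarized as a small decision sketch. The names `ScanMode`, `Brightness` and `scan_steps` are hypothetical and used only for illustration; they are not part of any real OCR or TWAIN API, and the step labels simply restate the text.

```python
from enum import Enum


class ScanMode(Enum):
    TWAIN = "TWAIN scanning interface"
    DIRECT = "Direct scan"


class Brightness(Enum):
    FIXED = "Fixed"
    AUTOMATIC = "Automatic"
    MANUAL = "Manual adjustment"


def scan_steps(mode, brightness=Brightness.AUTOMATIC):
    """Return the sequence of steps the text describes for a scan mode.

    Illustrative model only; the brightness option applies to direct scan.
    """
    if mode is ScanMode.TWAIN:
        # Two scans: a pre-scan, user adjustments, then the final scan.
        return [
            "pre-scan",
            "adjust brightness, resolution and scanning range",
            "final scan",
        ]
    # Direct scan: a single scan.
    steps = ["scan"]
    if brightness is Brightness.MANUAL:
        # Manual adjustment opens the brightness dialog after scanning.
        steps.append("adjust brightness in the Select Brightness dialog")
    steps.append("display image")
    return steps


if __name__ == "__main__":
    for step in scan_steps(ScanMode.DIRECT, Brightness.MANUAL):
        print(step)
```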
2, recognition
Tilt correction
Because of how a document was printed and how the user handles it, the scanned image may be tilted by some angle; a small tilt in particular is hard to avoid during scanning. For a very small tilt (about 1-2 degrees), the OCR system adapts automatically and no processing is needed. When the tilt is larger than that but below about 10-15 degrees, perform tilt correction first and then run recognition. If the tilt exceeds 15 degrees, the image is distorted and rescanning is recommended. The OCR system offers two kinds of tilt correction, automatic and manual; automatic tilt correction is recommended.
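The thresholds above can be read as a simple decision rule. The following is a minimal sketch, not the OCR system's actual logic: the function names are hypothetical, the cutoffs of 2 and 15 degrees are one reading of the approximate ranges in the text, and the tilt is estimated from two points assumed to lie on the same text baseline.

```python
import math

# Assumed cutoffs, taken from the approximate ranges given in the text.
AUTO_ADAPT_MAX_DEGREES = 2.0
CORRECTABLE_MAX_DEGREES = 15.0


def estimate_tilt(baseline_start, baseline_end):
    """Angle in degrees of the line through two (x, y) baseline points."""
    dx = baseline_end[0] - baseline_start[0]
    dy = baseline_end[1] - baseline_start[1]
    return math.degrees(math.atan2(dy, dx))


def tilt_action(angle_degrees):
    """Choose what to do with a scanned image based on its tilt angle."""
    angle = abs(angle_degrees)
    if angle <= AUTO_ADAPT_MAX_DEGREES:
        return "recognize"  # small enough for the system to adapt
    if angle <= CORRECTABLE_MAX_DEGREES:
        return "correct"    # run tilt correction, then recognize
    return "rescan"         # too distorted, scan again


if __name__ == "__main__":
    angle = estimate_tilt((0, 0), (200, 20))
    print(round(angle, 1), tilt_action(angle))
```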
Layout analysis
Layout analysis divides the scanned image into area blocks. Each block marks a region of the image that contains text to be recognized. Layout analysis can be automatic or manual. Automatic analysis is recommended for images with a simple layout; for complex layouts such as newspapers and magazines, manual analysis is recommended so that no text to be recognized is missed.
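To give a feel for what automatic layout analysis does on a simple page, here is a minimal sketch of one common approach, a horizontal projection profile: rows containing ink are grouped into blocks, and blank rows separate the blocks. This is an illustration under simplifying assumptions (a binarized image given as a list of rows, 1 for ink and 0 for background), not the method of any particular OCR product. The name `split_into_blocks` and the `min_gap` parameter are hypothetical.

```python
def split_into_blocks(image, min_gap=1):
    """Return (top, bottom) row ranges, inclusive, of text blocks.

    A block ends once at least `min_gap` consecutive blank rows follow it.
    """
    blocks = []
    start = None  # first row of the block currently being built
    gap = 0       # consecutive blank rows seen since the last ink row
    for y, row in enumerate(image):
        if any(row):
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                blocks.append((start, y - gap))
                start = None
                gap = 0
    if start is not None:
        # Close the last block, excluding any trailing blank rows.
        blocks.append((start, len(image) - 1 - gap))
    return blocks


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [0, 0, 0],
        [1, 0, 1],
    ]
    print(split_into_blocks(page))
```

A larger `min_gap` merges blocks separated by only a few blank rows, which is roughly the kind of choice a user makes by hand when analyzing a complex layout manually.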
Recognition
After the image file has been processed as needed, for example with tilt correction and layout analysis, recognition can begin. (If the image contains only a single column of horizontal text and no other complicated content, it can be recognized without layout analysis.) Recognition is the core of the OCR system. To ensure correct recognition, follow the steps below.
(1) Choose the correct recognition font
Select the font according to the image to be recognized:
Simplified Chinese, multiple typefaces (printed) - common typefaces such as Song and Fangsong (imitation Song)
Traditional Chinese, multiple typefaces (printed) - common typefaces such as Song
Pure English (printed) - common English fonts
Handwriting - the writing must be neat and regular, not scribbled
(2) Recognition
Click the "Identify" command in the OCR system toolbar to start recognition. If an image that has already been recognized is recognized again, the system displays a dialog box asking whether to overwrite the existing recognition result.
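The overwrite prompt amounts to a simple guard around the stored result. Below is a minimal sketch of that behavior. The function `recognize` and its parameters are hypothetical: `run_ocr` stands in for the recognition engine, and `confirm_overwrite` stands in for the dialog box.

```python
def recognize(image_id, results, run_ocr, confirm_overwrite):
    """Recognize an image, asking before replacing an existing result.

    results:           dict mapping image ids to stored recognition text
    run_ocr:           callable(image_id) -> recognized text (placeholder)
    confirm_overwrite: callable(image_id) -> bool (placeholder for the dialog)
    """
    if image_id in results and not confirm_overwrite(image_id):
        # The user declined, so keep the existing result.
        return results[image_id]
    results[image_id] = run_ocr(image_id)
    return results[image_id]


if __name__ == "__main__":
    stored = {}
    recognize("page-1", stored, lambda i: "first pass", lambda i: True)
    # Second run on the same image, with the user declining to overwrite.
    print(recognize("page-1", stored, lambda i: "second pass", lambda i: False))
```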