What is OCR?
OCR stands for Optical Character Recognition. It is the process of identifying alphanumeric characters in an image. The following steps outline the procedure for OCR:
- Obtain image
- Perform pre-processing on the image
- Apply algorithm for character recognition
Images can be obtained using scanners or cameras. While scanners preserve the exact layout when used correctly, camera-based images tend to skew the dimensions and positions of characters and words due to parallax. These issues can be alleviated through pre-processing.
Pre-processing is mostly done so that computer systems have an easier time identifying characters in an image. A wide range of pre-processing algorithms can be applied depending on requirements. Some of these algorithms include:
- Line removal
- Layout analysis
Although pre-processing is mainly conducted to enhance the image, incorrectly applying such filters can compromise the validity of the data collected.
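To make the line removal idea concrete, here is a minimal sketch in Python. It assumes the image has already been binarised into rows of 0s and 1s, and the function name `remove_horizontal_lines` and the `min_fill` threshold are my own illustrative choices rather than part of any particular library.

```python
from typing import List

Image = List[List[int]]  # 1 = ink (foreground), 0 = background


def remove_horizontal_lines(image: Image, min_fill: float = 0.9) -> Image:
    """Blank out every row whose share of ink pixels is at least min_fill."""
    cleaned = []
    for row in image:
        fill = sum(row) / len(row) if row else 0.0
        if fill >= min_fill:
            cleaned.append([0] * len(row))  # treat the row as a ruled line
        else:
            cleaned.append(list(row))       # keep the row untouched
    return cleaned


page = [
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],  # a ruled line running across the page
    [0, 1, 1, 0, 1, 0],
]

cleaned = remove_horizontal_lines(page)

# A threshold that is too low also wipes out genuine character strokes.
over_filtered = remove_horizontal_lines(page, min_fill=0.5)
```

With the default threshold only the ruled line is erased. With `min_fill=0.5` the last row, which holds part of the text, is erased as well, which is the kind of data loss a badly tuned filter can cause.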
The next step following pre-processing is the actual character recognition. One of the most basic character recognition algorithms is pattern matching. Here, each character in the image is compared with stored samples (glyphs) on a pixel-by-pixel basis. However, this procedure breaks down when it comes to handwritten text, where no two instances of a character look exactly alike.
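Pattern matching is easy to sketch in a few lines of Python. The example below assumes every character has already been cropped and scaled to the same 3x5 grid. The `TEMPLATES` dictionary and the function names are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

Glyph = List[str]  # rows of '#' (ink) and '.' (background)

# Stored samples, one per known character (illustrative only).
TEMPLATES: Dict[str, Glyph] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(candidate: Glyph, template: Glyph) -> float:
    """Fraction of pixels that agree between two glyphs of the same size."""
    total = 0
    same = 0
    for cand_row, tmpl_row in zip(candidate, template):
        for cand_px, tmpl_px in zip(cand_row, tmpl_row):
            total += 1
            same += cand_px == tmpl_px
    return same / total


def recognise(candidate: Glyph) -> Tuple[str, float]:
    """Return the best matching label together with its score."""
    scored = [(label, match_score(candidate, glyph))
              for label, glyph in TEMPLATES.items()]
    return max(scored, key=lambda pair: pair[1])


# A "T" with one stray pixel of noise in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
label, score = recognise(noisy_t)
```

The score doubles as a rough confidence value: the noisy sample agrees with the "T" template on 14 of its 15 pixels, so it is still recognised, but every stray pixel lowers the score.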
To overcome the hurdles posed by handwritten text, algorithms can be designed to extract and match features rather than pixels. Feature extraction also reduces dimensionality, thus improving efficiency.
The confidence level is a metric that showcases how “optimistic” the algorithm is about its own prediction. This confidence level can be improved by using standard fonts and font sizes. Apart from these basic steps, OCR accuracy can be enhanced through application-specific optimizations.
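One simple kind of feature is the ink density of each zone in a grid laid over the character. The sketch below is my own illustration of that idea, not the method used by any specific OCR engine, and the name `zone_features` is hypothetical. It turns a 4x4 glyph (16 pixels) into just 4 numbers.

```python
from typing import List


def zone_features(glyph: List[str], grid: int = 2) -> List[float]:
    """Split the glyph into grid x grid zones and return the ink share of each.

    Assumes the glyph is at least `grid` pixels wide and tall.
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for zone_row in range(grid):
        r0 = zone_row * height // grid
        r1 = (zone_row + 1) * height // grid
        for zone_col in range(grid):
            c0 = zone_col * width // grid
            c1 = (zone_col + 1) * width // grid
            cells = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(ch == "#" for ch in cells) / len(cells))
    return features


neat_l = ["#...", "#...", "#...", "####"]
slanted_l = [".#..", "#...", "#...", "####"]  # top of the stroke leans right

neat_features = zone_features(neat_l)
slanted_features = zone_features(slanted_l)
```

The two samples differ pixel by pixel, yet both produce the same four zone densities, so a matcher working on features treats them as the same letter while also comparing 4 values instead of 16.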
OCR on Azure
OCR is a complex and tedious endeavour that requires extensive domain knowledge and experience. I come from a general Computer Science background and, as a result, had no formal training in OCR.
However, with the help of Azure’s Cognitive Services, OCR is within reach of newcomers to the field as well as novice programmers. The service uses a simple REST interface, which feels familiar and is easy to use.
OCR on Azure is made available as a sub-service of the Computer Vision API. As such, to implement Microsoft’s OCR service, one needs to obtain a key on Azure. Trial keys can be easily obtained for free to test the OCR service.
Similar to the LUIS service, Computer Vision is not readily available in every region yet. The following regions offer the Computer Vision service:
- East US
- East US 2
- South Central US
- West US
- West Central US
- West US 2
- North Europe
- West Europe
- East Asia
- Southeast Asia
- Brazil South
- Australia East
Unlike LUIS, the Computer Vision service is offered in a variety of pricing tiers:
0-1 mil – INR 66/1k transactions
1-5 mil – INR 52.8/1k transactions
>5 mil – INR 41.97/1k transactions
0-1 mil – INR 99.15/1k transactions
1-5 mil – INR 66.10/1k transactions
>5 mil – INR 42.97/1k transactions
S3 (10 transactions/s) – INR 165.25/1k transactions
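To get a feel for what these numbers mean on a monthly bill, here is a rough estimate in Python. It rests on two assumptions of mine that the pricing list does not confirm: that billing is graduated (each volume band is charged at its own rate) and that the first three rates listed above belong to one tier. Treat it as arithmetic practice rather than an official calculator.

```python
from decimal import Decimal

# (upper bound of the band in transactions, INR per 1,000 transactions)
# Rates copied from the first three bands above; None means "no upper bound".
BANDS = [
    (1_000_000, Decimal("66")),
    (5_000_000, Decimal("52.8")),
    (None, Decimal("41.97")),
]


def estimate_cost(transactions: int) -> Decimal:
    """Estimate the cost in INR, assuming graduated (per-band) billing."""
    cost = Decimal("0")
    lower = 0
    for upper, rate in BANDS:
        if transactions <= lower:
            break
        band_end = transactions if upper is None else min(transactions, upper)
        cost += Decimal(band_end - lower) / 1000 * rate
        lower = band_end
    return cost


two_million = estimate_cost(2_000_000)
```

Under these assumptions, 2 million transactions cost 1,000 x 66 for the first million plus 1,000 x 52.8 for the second, which comes to INR 118,800.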
The Computer Vision API differentiates between OCR for printed text and OCR for handwritten text. As such, a different nested route is implemented for each process. As discussed earlier, printed text is far simpler to analyse. This is mostly attributed to standard fonts and a clear distinction between background and foreground. However, more complicated imagery may push some examples of printed text into the handwritten OCR segment. Handwritten text poses immense problems due to its varied nature. Individuals tend to add their own flair to their writing based on environmental and mental factors, which leads to a diverse set of possibilities for every character in the English alphabet.
The REST API for printed text can be accessed by using the following URL:
The above URL takes data in the form of a POST request with the image in the body (binary data). Images can also be supplied through a URL. The response is in the form of a JSON object with the analysis and corresponding confidence score.
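The request described above can be put together with nothing but the Python standard library. In this sketch the endpoint is a placeholder that you would replace with the URL for your own region, and the helper name `build_ocr_request` is hypothetical. The key is passed in the `Ocp-Apim-Subscription-Key` header used by Azure Cognitive Services. Nothing is sent over the network until you call `urlopen` yourself.

```python
import json
import urllib.request
from typing import Optional

# Placeholder only: substitute the printed-text OCR URL for your region.
PLACEHOLDER_ENDPOINT = "https://example.invalid/ocr"


def build_ocr_request(endpoint: str,
                      subscription_key: str,
                      image_bytes: Optional[bytes] = None,
                      image_url: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one image."""
    if (image_bytes is None) == (image_url is None):
        raise ValueError("pass exactly one of image_bytes or image_url")
    if image_bytes is not None:
        # Raw binary image in the request body.
        body, content_type = image_bytes, "application/octet-stream"
    else:
        # Image referenced by URL, wrapped in a small JSON document.
        body = json.dumps({"url": image_url}).encode("utf-8")
        content_type = "application/json"
    headers = {
        "Content-Type": content_type,
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    return urllib.request.Request(endpoint, data=body, headers=headers,
                                  method="POST")


binary_request = build_ocr_request(PLACEHOLDER_ENDPOINT, "my-key",
                                   image_bytes=b"\x89PNG...")
url_request = build_ocr_request(PLACEHOLDER_ENDPOINT, "my-key",
                                image_url="https://example.invalid/scan.png")

# To send it for real:
#   with urllib.request.urlopen(binary_request) as response:
#       analysis = json.load(response)
```

Keeping the request building separate from the sending makes it easy to inspect or unit test the headers and body before spending any transactions.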
OCR for handwritten text is slightly different. Due to the complexity involved in handwritten text recognition, the request is accepted but not immediately processed. Instead, Azure sends a 202 response with an operation ID. The status of this operation needs to be polled repeatedly. Once the status indicates success, the output of the OCR operation can be retrieved. The URL for handwritten text is as follows:
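The accept-then-poll flow can be sketched as a small loop. To keep the example runnable without an Azure key, the HTTP call is hidden behind a `fetch_status` callable that you would implement yourself. The status strings "Succeeded" and "Failed" and the helper name `wait_for_result` are assumptions on my part, so check them against the responses you actually receive.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(fetch_status: Callable[[], Dict[str, Any]],
                    interval_s: float = 1.0,
                    max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll fetch_status() until the operation succeeds, fails or times out."""
    for attempt in range(max_attempts):
        payload = fetch_status()
        status = payload.get("status")
        if status == "Succeeded":   # assumed status value
            return payload
        if status == "Failed":      # assumed status value
            raise RuntimeError("OCR operation failed")
        if attempt < max_attempts - 1:
            sleep(interval_s)       # wait before asking again
    raise TimeoutError("OCR operation did not finish in time")


# Stand-in for the real status endpoint: replies canned for the demo.
canned = iter([
    {"status": "NotStarted"},
    {"status": "Running"},
    {"status": "Succeeded", "text": "hello"},
])
naps = []  # records the waits instead of actually sleeping

result = wait_for_result(lambda: next(canned), interval_s=0.5,
                         sleep=naps.append)
```

Capping the number of attempts and pausing between them keeps the loop from hammering the service, which matters because each tier limits the number of transactions per second.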
When compared to other cloud-based OCR services, Microsoft’s Computer Vision API does not offer anything out of the ordinary. That said, the Redmond giant’s services are fairly cost-effective and offer a simpler interface that does not scare beginners away.
Although IBM did offer an OCR service in the form of Watson, the American tech company has since closed it off under the banner of private beta. This leaves Google as Microsoft’s one true competitor.
In my experience, Microsoft’s Computer Vision service offers a more straightforward approach with slightly better accuracy on the sample set that I used.
If concepts such as computer vision and OCR interest you, have a look at the IoT courses that we have to offer. We showcase the possibilities of computer vision and what the future holds for such technologies with respect to IoT. Check out our IoT courses:
- Fundamentals of IoT – Level 1
- Working with Electronics in IoT – Level 2
- Cloud Robotics and Advanced IoT Architecture – Level 3
- Designing and Implementing IoT Solutions – Level 4
Let us know what tools and services you use when it comes to OCR in the comments section. We’d love to hear about services we may have missed.