MIOTECH

Optical Character Recognition (OCR)

OCR has long since moved beyond the walls of banks and financial institutions. The technology is now used wherever people work with documents. It converts printed text on paper or other media into digital format.

A scanned or photographed document remains just a picture: the text it shows cannot be modified in a text editor. The document may also have been scanned poorly and may contain traces of dust from the scanner. OCR is used when you need to extract the text from such a document, improving the image quality in the process.

At the preprocessing stage, the text in the scanned image is rotated to a horizontal position.

After that, it is cleaned of background, noise and artifacts. 
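
One common way to remove isolated specks such as scanner dust is a median filter. The sketch below is only an illustration under stated assumptions: the image is a plain list of rows of 0-255 grayscale values, the window is 3x3, and the function name `median_filter` is hypothetical. It is not necessarily the cleaning method used in any particular OCR product.

```python
from statistics import median


def median_filter(image):
    """Return a copy of a grayscale image with a 3x3 median filter applied.

    `image` is a list of rows, each a list of 0-255 intensities.
    Border pixels use only the neighbours that actually exist.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the pixel and its neighbours inside the image bounds.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


# A white page (255) with a single dark speck of "dust" in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter(page)
```

Because the speck is a single pixel surrounded by white, the median of every window is white and the speck disappears, while larger shapes such as letter strokes would mostly survive.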

It doesn’t matter if the image was scanned in color (24-bit RGB) or grayscale mode. It is converted to black and white (1-bit) during binarization. 

After this procedure, the resulting monochrome image can be represented as binary data consisting only of zeros and ones (where 0 is white and 1 is black).
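
Binarization can be sketched in a few lines. The example assumes a fixed global threshold of 128 and the widely used BT.601 luma weights for the color-to-gray step; both are illustrative choices, and real systems often pick the threshold adaptively. The function names are hypothetical.

```python
def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 grayscale intensities."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb
    ]


def binarize(gray, threshold=128):
    """Map intensities to 1 (black, ink) or 0 (white, background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# A 2x2 color scan: white, black, dark blue ink, pale yellow paper.
color_scan = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 0, 128), (255, 255, 200)],
]
bitmap = binarize(rgb_to_gray(color_scan))
print(bitmap)  # [[0, 1], [1, 0]]
```

The dark blue pixel becomes 1 and the pale yellow pixel becomes 0, so colored ink and tinted paper end up on the correct sides of the threshold.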

The next step is zoning: dividing the text into columns, rows, paragraphs and tables.
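
A simple way to find rows of text is a horizontal projection profile: count the black pixels in each pixel row and treat runs of non-empty rows as text lines. The sketch below assumes a binarized image stored as a list of rows of 0/1 values; `find_text_lines` is a hypothetical name, and real zoning of columns and tables needs considerably more logic.

```python
def find_text_lines(bitmap):
    """Return (first_row, last_row) pairs for each run of rows containing ink."""
    profile = [sum(row) for row in bitmap]  # black pixels per pixel row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, y - 1))   # a blank row ends the line
            start = None
    if start is not None:                  # text reaches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(find_text_lines(page))  # [(1, 2), (4, 4)]
```

Applying the same function to the transposed bitmap (`zip(*bitmap)`) gives vertical gaps, which is one possible starting point for separating columns.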

A letter recognized in isolation, such as “A” or “O”, may belong to either an English or a Russian word, because the Latin and Cyrillic glyphs look identical. Therefore, the document’s language is often determined at this stage.
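
The ambiguity is easy to see in code: the Latin and Cyrillic letters share a shape but are different characters. The sketch below assumes the language is already known and uses a small, non-exhaustive lookup table; `HOMOGLYPHS` and `resolve_glyph` are hypothetical names introduced only for illustration.

```python
import unicodedata

# Glyph shapes shared by the Latin and Cyrillic alphabets (a tiny sample).
HOMOGLYPHS = {
    "A": {"en": "\u0041", "ru": "\u0410"},
    "O": {"en": "\u004F", "ru": "\u041E"},
}


def resolve_glyph(shape, language):
    """Pick the character for a recognized shape, given the document language."""
    return HOMOGLYPHS[shape][language]


latin_a = resolve_glyph("A", "en")
cyrillic_a = resolve_glyph("A", "ru")

print(unicodedata.name(latin_a))     # LATIN CAPITAL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC CAPITAL LETTER A
print(latin_a == cyrillic_a)         # False
```

If the wrong alphabet is chosen, the text looks correct on screen but fails in search, spell checking and dictionary lookup.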

During character recognition, the simplest algorithms analyze the black-and-white image pixel by pixel and compare each character with a database of known fonts. The result of recognition is the character that matches most closely.
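
Pixel-by-pixel matching can be illustrated with a toy example. The sketch assumes that glyphs are already cut out and scaled to the same 3x3 size, and that the "font database" is a small dictionary of templates; the names `TEMPLATES`, `pixel_distance` and `recognize` are hypothetical.

```python
# A toy font database: each template is a 3x3 bitmap, '1' meaning black.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def pixel_distance(a, b):
    """Count the pixels in which two equally sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template differs in the fewest pixels."""
    return min(templates, key=lambda ch: pixel_distance(glyph, templates[ch]))


# A letter "T" with one extra black pixel left over from noise.
noisy_t = ["111", "010", "011"]
print(recognize(noisy_t))  # T
```

The noisy glyph differs from the "T" template in one pixel, from "I" in three and from "L" in five, so "T" is the closest match.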

More advanced algorithms break down each character into elements, such as lines, strokes, bends and angles between them, and analyze the joints of these elements. 

Connecting dictionaries at the post-processing stage increases the probability of correct character recognition and reduces the chance that non-existent words appear in the text. You can also set rules in these algorithms. For example, suppose you have a scanned envelope with a field for the postal code, which can contain only the digits 0 to 9. Knowing this, the algorithm will not output the letter “O” instead of the digit “0”, the letter “B” instead of “8” or the Latin letter “I” instead of “1”. Telephone numbers, car license plates and VAT numbers are just as easy to formalize.
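
Both ideas, field rules and dictionaries, can be sketched with the Python standard library. The substitution table covers only the three confusions named above, and the similarity cutoff of 0.8 is an arbitrary assumption; `fix_digit_field` and `fix_word` are hypothetical helper names.

```python
import difflib

# Letters that are easily confused with digits in a digits-only field.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "B": "8", "I": "1"})


def fix_digit_field(raw):
    """Replace look-alike letters with digits, then verify the result."""
    fixed = raw.translate(DIGIT_LOOKALIKES)
    if not (fixed.isascii() and fixed.isdigit()):
        raise ValueError(f"not a numeric field: {raw!r}")
    return fixed


def fix_word(word, dictionary):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.8)
    return matches[0] if matches else word


print(fix_digit_field("1O1B0I"))                           # 101801
print(fix_word("envel0pe", ["envelope", "postal", "code"]))  # envelope
```

A field that still contains non-digits after substitution raises an error instead of being silently accepted, so doubtful values can be sent for manual review.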

Today, thanks to machine learning, OCR algorithms can recognize complex fonts as well as handwritten text.

Completed projects

AlfaStrakhovanie
One of our company's first projects. We sped up the issuing of additional medical insurance by reducing the time it takes to process paper identity documents.
Megafon
Our largest telecommunications project: the full cycle of work with electronic and paper documents and the transition to an internal paperless workflow with employees. Megafon's SSC ranks among the top three in terms of efficiency.
Metalloinvest
It is no accident that the shared service center at Metalloinvest is called MKS (in Russian, the same abbreviation as the ISS, the International Space Station): the project turned out to be truly cosmic! Automatic posting, a contract designer, robots and much more.
Nornickel
Nornickel is a leader in digitalization within its industry. Contract management, EDMS, an electronic office, advanced analytics and many other solutions are part of the toolset that improves the company's efficiency.
Rosneft
Rosneft Group is our longstanding partner. The automated control center in Saratov and automatic posting in the accounting system are our shared achievements. The procurement site has now become one of the first recipients of our outsourced service for processing incoming documents.
Sportmaster
Our largest international project in clothing retail. Its distinctive feature was working with documents from international suppliers, including Chinese invoices.