pytesseract (Python-tesseract) is a Python wrapper for Google's Tesseract-OCR engine, used to extract text from images.
To use pytesseract, Tesseract needs to be installed on the system. Follow the steps below to install Tesseract on Windows.
1. Navigate to https://github.com/UB-Mannheim/tesseract/wiki and download the Tesseract installer for Windows.
2. Double-click the downloaded installer to begin the installation and select a language.
3. Click "Next" to continue the installation.
4. In the "License Agreement" window, click "I Agree".
5. In the "Choose Users" section, select "Install for anyone using this computer".
6. In the "Choose Components" section, click "Next".
7. In the "Choose Install Location" section, click "Next".
8. In the "Choose Install Menu Folder" window, click "Install".
9. In the "Installation Complete" window, click "Next".
10. Click "Finish" to complete the setup.
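Once the installer finishes, it can be worth confirming that the Tesseract binary actually runs before moving on. The sketch below is one possible check using only the Python standard library. The helper name `tesseract_version` is hypothetical, and it assumes the executable is either on the PATH or passed in as a full path.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing or non-executable binary.
        return None
    # Version info may land on stdout or stderr depending on the build,
    # so fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "Tesseract was not found")
```

If the installer did not add Tesseract to the PATH, the full path to `tesseract.exe` can be passed as `cmd` instead.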
Now that Tesseract is installed, let's install the required Python modules.
Install Pillow using pip.
pip install Pillow
Install pytesseract using pip.
pip install pytesseract
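To confirm that both packages landed in the active environment, a quick import check can help. This is a minimal sketch, not part of either library: `missing_modules` is a hypothetical helper built on the standard `importlib` module. Note that Pillow is imported under the name `PIL`.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported as "PIL"; pytesseract keeps its package name.
    missing = missing_modules(["PIL", "pytesseract"])
    print("Missing modules:", missing or "none")
```

An empty result suggests that both `pip install` commands targeted the same interpreter that will run the script.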
Now that all the required modules are in place, let's write Python code to read the text from the image below.
1. Create a Python file and import the required modules.
from PIL import Image
import pytesseract
2. Define the path to the Tesseract executable.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
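Hard-coding the path works when Tesseract sits in the default install location. A slightly more defensive variant, sketched below under that assumption, looks on the PATH first and only then falls back to the default Windows path. `resolve_tesseract_cmd` and `DEFAULT_WINDOWS_PATH` are hypothetical names introduced for illustration.

```python
import os
import shutil

# Default install location used by the Windows installer, as in the step above.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(name="tesseract", fallback=DEFAULT_WINDOWS_PATH):
    """Return a usable path to the executable, or None if none is found."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    if cmd is None:
        print("Tesseract executable not found")
    else:
        # With pytesseract imported, the result would be assigned like this:
        # pytesseract.pytesseract.tesseract_cmd = cmd
        print("Using", cmd)
```

Returning `None` instead of raising lets the calling script decide how to report a missing installation.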
3. Define the image path and open the image.
image_path = "sample.png"
img = Image.open(image_path)
4. Get the text from the image by running Tesseract OCR.
text = pytesseract.image_to_string(img)
print(text)
"If you don't read the newspapers, you are uninformed. If you do read them, you are misinformed."
5. Complete code snippet to extract text from images using pytesseract.
from PIL import Image
import pytesseract

# Path to the Tesseract executable installed earlier
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# List the languages available to this Tesseract installation
print(pytesseract.get_languages(config=''))

# Open the image and run OCR on it
image_path = "sample.png"
img = Image.open(image_path)
text = pytesseract.image_to_string(img)
print(text)
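Depending on the Tesseract version and the image, the string returned by `image_to_string` may carry trailing spaces, empty lines, or a form-feed character at the end of the page. The helper below is a small, hypothetical post-processing sketch (`clean_ocr_text` is not part of pytesseract) that tidies such output using plain string operations.

```python
def clean_ocr_text(text):
    """Drop form feeds, trailing spaces, and blank lines from OCR output."""
    lines = text.replace("\f", "").splitlines()
    stripped = (line.rstrip() for line in lines)
    # Keep only non-empty lines; leading indentation is preserved.
    return "\n".join(line for line in stripped if line)


if __name__ == "__main__":
    raw = "If you don't read the newspapers,  \n\nyou are uninformed.\n\f"
    print(clean_ocr_text(raw))
```

It could be applied as `print(clean_ocr_text(text))` in place of `print(text)`. Dropping blank lines also removes paragraph breaks, so this choice suits short snippets better than multi-paragraph pages.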