I am trying to use pytesseract in a Jupyter Notebook.
- Windows 10 x64
- Running Jupyter Notebook (Anaconda3, Python 3.6.1) with administrative privileges
- The working directory containing the TIFF file is on a different drive (Z:)
When I run the following code:
    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
    tessdata_dir_config = '--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'
    print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en', config = tessdata_dir_config))
I get the following error:
    TesseractError                            Traceback (most recent call last)
    <ipython-input-37-c1dcbc33cde4> in <module>()
         11 # tessdata_dir_config = '--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'
         12
    ---> 13 print(pytesseract.image_to_string(Image.open('Multi_page24bpp.tif'), lang='en'))
         14 # print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

    C:\Users\cpcho\AppData\Local\Continuum\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, boxes, config)
        123     if status:
        124         errors = get_errors(error_string)
    --> 125         raise TesseractError(status, errors)
        126     f = open(output_file_name, 'rb')
        127     try:

    TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\en.traineddata')
I found these two references helpful but I am missing something:
https://github.com/madmaze/pytesseract/issues/50
https://github.com/madmaze/pytesseract/issues/64
Thank you for your time on this!
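
For anyone reproducing this, here is a minimal sketch of two things that stand out in the code above and may be contributing, assuming the install path shown in the question. First, the path strings are not raw strings, so Python turns the `\t` in `\tesseract.exe` and `\tessdata` into a tab character. Second, Tesseract's English language data is named `eng` (file `eng.traineddata`), not `en`. The helper names `build_tesseract_settings` and `ocr` are hypothetical, introduced only for illustration, and whether these two changes resolve the error on this particular machine is untested.

```python
# Install location copied from the question; adjust it for your machine.
TESSERACT_DIR = r'C:\Program Files (x86)\Tesseract-OCR'


def build_tesseract_settings(install_dir=TESSERACT_DIR):
    """Hypothetical helper: build the executable path and the config string
    from raw strings, so no backslash sequence is read as an escape."""
    tesseract_cmd = install_dir + r'\tesseract.exe'
    config = '--tessdata-dir "{}"'.format(install_dir + r'\tessdata')
    return tesseract_cmd, config


# This is the value Python actually stores for the non-raw literal in the
# question: '\P' and '\T' are left alone, but '\t' becomes a tab character.
broken = 'C:\\Program Files (x86)\\Tesseract-OCR\tesseract.exe'
print('\t' in broken)  # True


def ocr(path, lang='eng'):
    """Hypothetical wrapper around pytesseract.image_to_string.
    Needs Pillow, pytesseract and a Tesseract install; imported lazily so the
    path-building part above can be checked without them."""
    from PIL import Image
    import pytesseract

    cmd, config = build_tesseract_settings()
    pytesseract.pytesseract.tesseract_cmd = cmd
    return pytesseract.image_to_string(Image.open(path), lang=lang, config=config)
```

With everything installed, `print(ocr('Multi_page24bpp.tif'))` would be the equivalent of the last line of the original snippet. The file name comes from the question; the call itself is not exercised here.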