When running the following:
import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()
doc.extract_text()
I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-5-57184997370d> in <module>()
----> 1 doc.extract_text()

/usr/local/lib/python2.7/dist-packages/doc2text/__init__.pyc in extract_text(self)
     89 for page in self.processed_pages:
     90 new = page
---> 91 text = new.extract_text()
     92 self.page_content.append(text)
     93 else:

/usr/local/lib/python2.7/dist-packages/doc2text/page.pyc in extract_text(self)
     36 def extract_text(self):
     37 temp_path = 'text_temp.png'
---> 38 cv2.imwrite(temp_path, self.image)
     39 self.text = pytesseract.image_to_string(Image.open(temp_path))
     40 os.remove(temp_path)

AttributeError: Page instance has no attribute 'image'
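
For anyone hitting the same error, here is a minimal, self-contained sketch of the failure mode. It does not use doc2text itself: `Page` and `extract_all` below are hypothetical stand-ins, and it assumes (unconfirmed) that the processing step left at least one page without an `image` attribute. It only shows how such pages could be detected and skipped instead of raising.

```python
class Page(object):
    """Hypothetical stand-in for a page object; not doc2text's real class."""

    def __init__(self, image=None):
        # Only set the attribute when an image was actually produced,
        # mimicking a processing step that skipped this page.
        if image is not None:
            self.image = image

    def extract_text(self):
        # Accessing self.image raises AttributeError if it was never set.
        return "text from %s" % (self.image,)


def extract_all(pages):
    """Return (texts, skipped_indices), skipping pages that have no image."""
    texts, skipped = [], []
    for index, page in enumerate(pages):
        if getattr(page, "image", None) is None:
            skipped.append(index)  # remember which pages were unusable
            continue
        texts.append(page.extract_text())
    return texts, skipped


texts, skipped = extract_all([Page("page0.png"), Page()])
print(texts, skipped)
```

The list of skipped indices shows which pages came out of processing without an image, which may help narrow down whether the problem is in reading the PDF or in the processing step.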