![]() Interpreter = PDFPageInterpreter(rsrcmgr, device) With TextConverter(rsrcmgr, retstr, codec=codec, '''Convert pdf content from a file path to text Test pdf file: #pip install pdfminer.sixįrom pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreterįrom nverter import TextConverter In 2020 the solutions above were not working for the particular pdf I was working with. As instructions for this would blow up this answer I put them on my personal blog. There is pdftotext which does basically the same but this assumes pdftotext in /usr/local/bin whereas I am using this in AWS lambda and wanted to use it from the current directory.ītw: For using this on lambda you need to put the binary and the dependency to libstdc++.so into your lambda function. Res = n(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE) SCRIPT_DIR = os.path.dirname(os.path.abspath(_file_))
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |