Go4Expert


sujan.dasmahapatra 31Jul2013 16:30

Reading a PDF file with pyPdf: no contents are captured. Please help
 
I am trying to read a PDF file using pyPdf and write its text to a text file, but it's not working. The content value in the code below is just u'\n\n\n\n\n'. The PDF file has 5 pages, so that is 5 newline characters, with a 'u' at the beginning. What is going wrong? Why are the contents not coming through? Any help is highly appreciated. Thanks, Sujan
Code:

#!/usr/bin/python
import pyPdf
import sys

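def getPDFContent(path):
    # Collect the extracted text of every page, one page per line.
    content = ""
    # Keep the file open while pyPdf reads the pages from it.
    with open(path, "rb") as p:
        pdf = pyPdf.PdfFileReader(p)
        for i in range(pdf.getNumPages()):
            content += pdf.getPage(i).extractText() + "\n"
    # Replace non-breaking spaces and collapse all runs of whitespace.
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content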

def main():
    # Drop non-ASCII characters before writing the text out.
    pdfl = getPDFContent("test.pdf").encode("ascii", "ignore")
    with open('test.txt', 'w') as f:
        f.write(pdfl)

if __name__ == "__main__":
    main()
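
To narrow this down, here is a minimal sketch that works on the per-page strings instead of calling pyPdf, so it runs without the PDF. The names normalize, empty_pages and page_texts are hypothetical and not part of pyPdf, and the five empty strings are only an assumption about what extractText() returned, based on the u'\n\n\n\n\n' value reported above.

```python
# Hypothetical helper names; none of this is part of pyPdf.

def normalize(text):
    """Same clean-up as in getPDFContent: swap non-breaking spaces, collapse whitespace."""
    return " ".join(text.replace(u"\xa0", " ").strip().split())

def empty_pages(page_texts):
    """Return the 0-based indexes of pages whose extracted text is blank."""
    return [i for i, text in enumerate(page_texts) if not text.strip()]

# Assumed result of extractText() for a 5-page PDF that yields no text.
page_texts = [u"", u"", u"", u"", u""]

# Build the content the same way the loop in getPDFContent does.
content = u"".join(text + u"\n" for text in page_texts)

print(repr(content))             # five newline characters and nothing else
print(empty_pages(page_texts))   # indexes of the pages that produced no text
print(repr(normalize(content)))  # the clean-up step leaves an empty string
```

If every page shows up as empty, the script itself is probably doing what it was written to do and the extraction step is returning nothing. One possible cause, though not confirmed for this file, is a PDF made of scanned images or one whose text encoding extractText() cannot decode.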


