I am trying to read a PDF file using pyPdf and write its text to a text file, but it is not working. The value of content in the code below is just u'\n\n\n\n\n'. The PDF file has 5 pages, so that is 5 newline characters, with a 'u' at the beginning. What is going wrong? Why is the text of the pages not coming through? Any help is highly appreciated.

Thanks,
Sujan

Code:

#!/usr/bin/python
import pyPdf
import sys

def getPDFContent(path):
    content = ""
    p = file(path, "rb")
    pdf = pyPdf.PdfFileReader(p)
    for i in range(0, pdf.getNumPages()):
        content += pdf.getPage(i).extractText() + "\n"
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

def main():
    f = open('test.txt', 'w')
    pdfl = getPDFContent("test.pdf").encode("ascii", "ignore")
    f.write(pdfl)
    f.close()

if __name__ == "__main__":
    main()
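One way to narrow this down is to look at what each page returns on its own. The following is only a diagnostic sketch, not part of the script above. It assumes Python 3, and the helper names `build_content`, `empty_pages` and `page_texts_from_pdf` are made up for illustration. The last helper also assumes the newer `pypdf` package (with `PdfReader` and `extract_text`) is installed in place of the old `pyPdf`. The first two helpers need no PDF at all and simply show that five empty page strings produce exactly the five newlines described above.

```python
def build_content(page_texts):
    """Mimic the loop in getPDFContent: append each page's text plus a newline."""
    content = ""
    for text in page_texts:
        content += text + "\n"
    return content


def empty_pages(page_texts):
    """Return the zero-based indices of pages whose text is empty or whitespace."""
    return [i for i, text in enumerate(page_texts) if not text.strip()]


def page_texts_from_pdf(path):
    """Return one string per page (assumes the modern pypdf package is installed)."""
    # Imported here so the two helpers above work without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield nothing for a page; normalise that to "".
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    # Simulate a 5-page PDF where extraction returned nothing for every page.
    simulated = [""] * 5
    print(repr(build_content(simulated)))  # '\n\n\n\n\n'
    print(empty_pages(simulated))          # [0, 1, 2, 3, 4]
```

If `empty_pages(page_texts_from_pdf("test.pdf"))` lists every page, the extractor found no text on any page, so the loop and the file writing are probably not at fault. One possible cause, which would need checking against the actual file, is a PDF made of scanned images with no text layer.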