Pdftotext

From Wikipedia, the free encyclopedia

pdftotext is an open-source command-line utility for converting PDF files to plain text files, i.e. extracting the text content of PDF files. It is freely available and included with many Linux distributions. It can also be installed as part of the Xpdf package on Mac OS X (fink install Xpdf) or Windows.

$ pdftotext file.pdf

This usage produces a text file with the same base name as the input file, here file.txt. A wildcard (*), as in $ pdftotext *.pdf, cannot be used to convert multiple files, because pdftotext accepts only one input file name; a second file name, if given, is taken as the output file. A loop in the shell is needed for batch conversions, as in

$ for f in *.pdf
> do
>   pdftotext "$f"
> done

for the bash shell.
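
The same loop can be wrapped in a reusable shell function. The following is only a sketch: the function name pdf_batch_to_text is hypothetical, and it assumes pdftotext is on the PATH and that input files end in a lowercase .pdf extension. It names each output file explicitly, skips the case where no PDF files match, and reports files that fail to convert.

```shell
# Hypothetical helper (not part of Xpdf): convert every *.pdf file in a
# directory to text. The directory defaults to the current one.
pdf_batch_to_text() {
    for f in "${1:-.}"/*.pdf; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # Name the output explicitly: same path with .pdf replaced by .txt.
        # Quoting keeps file names that contain spaces intact.
        pdftotext "$f" "${f%.pdf}.txt" || echo "failed: $f" >&2
    done
}
```

With the function defined, a call such as pdf_batch_to_text ~/papers (an example path) would convert each PDF file in that directory, writing the text file next to its source.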

The pdftotext program is part of a larger PDF-related package called Xpdf, which can be downloaded from foolabs.com.