Pdftotext
From Wikipedia, the free encyclopedia
pdftotext is an open-source command-line utility for converting PDF files to plain text files, i.e. extracting the text data they contain. It is freely available and included with many Linux distributions. It can also be installed as part of the Xpdf package on Mac OS X (fink install Xpdf) or on Windows.
$ pdftotext file.pdf
This usage produces a text file with the same base name as the input file and a .txt extension. Wildcards (*) cannot be used to convert multiple files, because pdftotext expects only one input file name; a command such as
$ pdftotext *.pdf
does not convert every matching file. A shell loop is needed for batch conversions, as in
$ for f in *.pdf
> do
>     pdftotext "$f"
> done
for the bash shell.
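The loop above can be wrapped in a small function that is a little more defensive. The following is only a minimal sketch: the names convert_all and PDFTOTEXT are illustrative and not part of pdftotext itself. It assumes only the documented calling form, in which pdftotext takes an input PDF and an optional output text file. It quotes file names so that names containing spaces work, skips the pattern when no PDF matches, and reports files that fail to convert.

```shell
#!/bin/sh
# Sketch: batch-convert every PDF in a directory, one pdftotext call per file.
# PDFTOTEXT and convert_all are hypothetical names chosen for this example.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

convert_all() {
    dir="${1:-.}"
    for f in "$dir"/*.pdf; do
        # If nothing matches, the pattern is left as a literal string; skip it.
        [ -e "$f" ] || continue
        # Pass an explicit output name: the input path with .pdf replaced by .txt.
        "$PDFTOTEXT" "$f" "${f%.pdf}.txt" || echo "failed: $f" >&2
    done
}
```

After the function has been defined in the current shell, running convert_all with a directory name (or with no argument, for the current directory) writes one .txt file next to each .pdf file.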
The pdftotext program is part of a larger PDF-related package called Xpdf, which can be downloaded from foolabs.com.