Reading PDF file using PDFBox API

In this example, we will show you how to read a PDF file in Java using the Apache PDFBox library. The example program has been tested, and its output is shown below.

Maven Dependency

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.7</version>
</dependency>

Example Program

package com.dineshkrish;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;
import org.apache.pdfbox.text.PDFTextStripper;

/**
 * Reads the text content of a PDF file using Apache PDFBox 2.0.x.
 *
 * @author Dinesh Krishnan
 */
public class PDFReader {

    public static String read(final File pdfFile) {
        String text = null;
        // try-with-resources closes the PDDocument (and the underlying file) when done
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper pdfTextStripper = new PDFTextStripper();
            // Order the extracted text by its position on the page
            pdfTextStripper.setSortByPosition(true);
            text = pdfTextStripper.getText(document);
        } catch (InvalidPasswordException e) {
            // The PDF is encrypted and no (or a wrong) password was supplied
            System.err.println("Password-protected PDF: " + e.getMessage());
        } catch (IOException e) {
            // File missing, unreadable, or not a valid PDF
            System.err.println("Could not read PDF: " + e.getMessage());
        }
        return text;
    }

    public static void main(String[] args) {
        final String fileName = "input.pdf";
        File pdfFile = new File(fileName);
        System.out.println(read(pdfFile));
    }
}
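PDFBox also provides PDFTextStripperByArea, which extracts text only from named rectangular regions of a single page. The sketch below shows how it can be used with PDFBox 2.0.x. The helper name textInRegion, the region name "region1" and the example rectangle are hypothetical. Region coordinates are in PDF units (1/72 inch) measured from the top-left corner of the page.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class PDFRegionReader {

    /**
     * Hypothetical helper: returns the text inside {@code region} on the page
     * at {@code pageIndex} (0-based). Coordinates are PDF units from the top-left.
     */
    public static String textInRegion(PDDocument document, int pageIndex, Rectangle2D region)
            throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        // Sort text by its position on the page (top to bottom, left to right)
        stripper.setSortByPosition(true);
        // Register a named region; several regions can be added to one stripper
        stripper.addRegion("region1", region);

        PDPage page = document.getPage(pageIndex);
        // Collect the text of every registered region on this page
        stripper.extractRegions(page);
        return stripper.getTextForRegion("region1");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical region: a 500 x 100 box starting 1 inch (72 units) from the top-left
        Rectangle2D region = new Rectangle2D.Double(72, 72, 500, 100);
        try (PDDocument document = PDDocument.load(new File("input.pdf"))) {
            System.out.println(textInRegion(document, 0, region));
        }
    }
}
```

This is useful for forms or invoices where the field you need always sits at the same place on the page; for the whole document, PDFTextStripper is simpler.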

Output

Oct 06, 2017 1:24:05 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font ABCDEE+Calibri are not implemented in PDFBox and will be ignored
“The real 
opportunity for 
success lies 
within the 
person and not 
in the job” 
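The INFO line at the top of the output is a log message from PDFBox, not text from the PDF. Its layout matches the default console format of java.util.logging, which suggests that PDFBox's Apache Commons Logging output was routed to java.util.logging in this run. Under that assumption, a minimal sketch for hiding such messages raises the level of the "org.apache.pdfbox" logger; the class name PdfBoxLogging and the method quietPdfBox are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PdfBoxLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // configured logger could be garbage-collected and lose its level setting.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Hypothetical helper: only SEVERE messages from PDFBox classes are logged. */
    public static void quietPdfBox() {
        // Child loggers such as org.apache.pdfbox.pdmodel.font.PDCIDFontType2
        // inherit this level because they have no level of their own.
        PDFBOX_LOGGER.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        quietPdfBox();
        System.out.println(Logger.getLogger("org.apache.pdfbox").getLevel()); // prints SEVERE
    }
}
```

Call quietPdfBox() once at startup, before loading documents. If your project binds Commons Logging to another backend (for example Log4j), configure the level for org.apache.pdfbox in that backend instead.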
