PDF Extraction Toolkit
PDF Extraction Toolkit #1
The PDF Extraction Toolkit (formerly PDF Analyser) is a Java framework built upon the PDFBox library for performing document analysis of PDF files and creating custom conversion methods to HTML and other formats. It is partly based on my PhD work and includes an algorithm for page segmentation. GraphWrap, a system for graph-based wrapping, or semi-automatic data extraction, from PDF files, is also included within the PDF Extraction Toolkit. The main toolkit (including GraphWrap) is released under the Apache licence, which allows it to be freely incorporated into proprietary software.
A GUI built upon the XMIllum library is also included, which lets you visualize the results of the document analysis process. An interactive graph visualization is also provided for viewing the graph structures created by the system and for interactively creating and testing graph-based wrappers on PDF documents. The GUI is released under the GPL licence. A screenshot of the GUI in action is shown below.

[Image: Vs4P58c.png]
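To illustrate the general idea behind graph-based wrapping described above, here is a minimal Java sketch. It is not GraphWrap's implementation or the toolkit's API. It assumes that text segments with bounding boxes are already available (for example from PDFBox text extraction) and uses made-up sample coordinates, with y growing downwards. All names (`NeighbourGraphSketch`, `Segment`, `buildGraph`, `extract`) are hypothetical. The sketch links each segment to its nearest neighbour to the right and below, then applies a one-edge "wrapper" that returns whatever lies in a given direction from a label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: NOT the PDF Extraction Toolkit / GraphWrap API.
public class NeighbourGraphSketch {

    // A text segment with its bounding box (assumption: y grows downwards).
    record Segment(String text, double x1, double y1, double x2, double y2) {}

    enum Relation { RIGHT, BELOW }

    // A directed, labelled edge between two neighbouring segments.
    record Edge(Segment from, Segment to, Relation relation) {}

    static boolean overlapsVertically(Segment a, Segment b) {
        return a.y1() < b.y2() && b.y1() < a.y2();
    }

    static boolean overlapsHorizontally(Segment a, Segment b) {
        return a.x1() < b.x2() && b.x1() < a.x2();
    }

    // Link each segment to its nearest right-hand and nearest lower neighbour.
    static List<Edge> buildGraph(List<Segment> segments) {
        List<Edge> edges = new ArrayList<>();
        for (Segment s : segments) {
            Segment right = null, below = null;
            for (Segment t : segments) {
                if (t == s) continue;
                if (t.x1() >= s.x2() && overlapsVertically(s, t)
                        && (right == null || t.x1() < right.x1())) {
                    right = t;
                }
                if (t.y1() >= s.y2() && overlapsHorizontally(s, t)
                        && (below == null || t.y1() < below.y1())) {
                    below = t;
                }
            }
            if (right != null) edges.add(new Edge(s, right, Relation.RIGHT));
            if (below != null) edges.add(new Edge(s, below, Relation.BELOW));
        }
        return edges;
    }

    // One-edge "wrapper": text of every segment reached from a segment whose
    // text equals `label` via an edge with the given relation.
    static List<String> extract(List<Edge> edges, String label, Relation relation) {
        List<String> values = new ArrayList<>();
        for (Edge e : edges) {
            if (e.relation() == relation && e.from().text().equals(label)) {
                values.add(e.to().text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Segment> page = List.of(
            new Segment("Invoice No.", 50, 100, 120, 112),
            new Segment("12345", 130, 100, 180, 112),
            new Segment("Date", 50, 120, 80, 132),
            new Segment("2024-01-01", 130, 120, 200, 132));
        List<Edge> graph = buildGraph(page);
        System.out.println(extract(graph, "Invoice No.", Relation.RIGHT));
    }
}
```

Running `main` prints `[12345]`. A real wrapper would match larger subgraph patterns rather than a single labelled edge; this sketch only shows how a page can be turned into a neighbourhood graph that such patterns could be matched against.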


RE: PDF Extraction Toolkit #2
If I'm not mistaken, this either is PDF-Analyzer 5.0 or offers the same functionality (PDF-Analyzer 5.0 has a paid/demo license).

Downloaded and will check it out.
Thank you.
[Image: AD83g1A.png]
