Facts: About

From FactPub

http://factpub.org/wiki/images/factpub_strategy.jpg

Read about this project: http://factpub.org/img/huang-ng.pdf

The PDF-to-text algorithm parses the structure of a PDF document into sections, headings, tables, and so on. It uses only information available in the current document and does not require a pre-trained model; instead, it relies on a number of unsupervised machine learning techniques and heuristics. Read the paper on how PDF is converted to text: http://factpub.org/img/pdf-text.pdf
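To give a feel for what "using only information from the current document" can mean, here is a minimal sketch of one possible heuristic. It is not the FactPub algorithm (see the paper linked above for that). It assumes each line of the PDF has already been extracted as a `(text, font_size)` pair, treats the font size that covers the most characters as body text, and labels short lines in a larger font as headings. The function names `label_lines` and `group_sections` and the 80-character cutoff are hypothetical choices made for illustration.

```python
from collections import Counter


def label_lines(lines, max_heading_chars=80):
    """Label each (text, font_size) pair as 'heading' or 'body'.

    Assumption for this sketch: the font size covering the most
    characters in the document is the body font. No trained model
    is used; the statistic comes from the document itself.
    """
    if not lines:
        return []
    # Weight each font size by character count so body text dominates.
    weights = Counter()
    for text, size in lines:
        weights[size] += len(text)
    body_size = weights.most_common(1)[0][0]

    labeled = []
    for text, size in lines:
        is_heading = size > body_size and len(text) < max_heading_chars
        labeled.append(("heading" if is_heading else "body", text))
    return labeled


def group_sections(labeled):
    """Group labeled lines into sections, each opened by a heading."""
    sections = []
    current = {"heading": None, "body": []}
    for label, text in labeled:
        if label == "heading":
            # Close the previous section if it holds anything.
            if current["heading"] is not None or current["body"]:
                sections.append(current)
            current = {"heading": text, "body": []}
        else:
            current["body"].append(text)
    if current["heading"] is not None or current["body"]:
        sections.append(current)
    return sections
```

For example, passing a list of extracted lines through `group_sections(label_lines(lines))` yields one dictionary per section, with the heading text and the body lines that follow it. A real system would combine several such signals (spacing, position, numbering) rather than font size alone.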

Found text that appears to be copyrighted? Please report it: http://factpub.org/html/DMCAreport.html