Full Stack Blog – Use LaTeX to generate a layout dataset

08 January 2022

Use LaTeX to generate a layout dataset

“Extracting Scientific Figures with Distantly Supervised Neural Networks”

The interesting idea here is to patch LaTeX sources in order to generate a document layout dataset.

For example:

If we have the LaTeX sources, we can inject a few LaTeX commands that set a distinct color for the title or any other part of the document. Once the document is compiled, we can render the PDF pages to images and process those images with OpenCV to find the bounding box of the title text. The approach is simple, but it lets us generate a large dataset.
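Here is a minimal sketch of the two ends of that pipeline, not the paper's actual implementation. The function names `patch_title_color` and `find_color_bbox` are hypothetical, and so are the choice of red as the marker color, the use of `\textcolor` from the `xcolor` package, and the BGR thresholds. Compiling the patched source and rendering the PDF pages to images (for example with a tool such as `pdftoppm`) is left out and assumed to happen in between.

```python
def patch_title_color(source: str, color: str = "red") -> str:
    """Wrap the argument of \\title{...} in \\textcolor and load xcolor.

    Assumption: the document uses a plain \\title{...} command
    (no optional argument) and has a \\begin{document} line.
    """
    marker = "\\title{"
    start = source.find(marker)
    if start == -1:
        raise ValueError("no \\title{...} command found")
    body_start = start + len(marker)

    # Walk forward to the brace that closes \title{, tracking nesting depth.
    depth = 1
    pos = body_start
    while pos < len(source) and depth > 0:
        ch = source[pos]
        if ch == "\\":
            pos += 2  # skip escaped characters such as \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        pos += 1
    if depth != 0:
        raise ValueError("unbalanced braces in \\title")
    body_end = pos - 1  # index of the closing brace

    title = source[body_start:body_end]
    patched = (
        source[:body_start]
        + "\\textcolor{" + color + "}{" + title + "}"
        + source[body_end:]
    )
    # Load xcolor at the end of the preamble so \textcolor is defined.
    return patched.replace(
        "\\begin{document}", "\\usepackage{xcolor}\n\\begin{document}", 1
    )


def find_color_bbox(image_path, lower_bgr=(0, 0, 200), upper_bgr=(80, 80, 255)):
    """Return (x, y, width, height) of all pixels in the given BGR range.

    The default thresholds are a guess for "mostly red" pixels and
    would need tuning for a real renderer with anti-aliasing.
    """
    # Imported here so the patching step above needs only the standard library.
    import cv2
    import numpy as np

    page = cv2.imread(image_path)  # OpenCV loads images in BGR channel order
    if page is None:
        raise FileNotFoundError(image_path)
    mask = cv2.inRange(
        page,
        np.array(lower_bgr, dtype=np.uint8),
        np.array(upper_bgr, dtype=np.uint8),
    )
    points = cv2.findNonZero(mask)
    if points is None:
        return None  # the marker color does not appear on this page
    return cv2.boundingRect(points)
```

The bounding box returned for each page, paired with a render of the unpatched document, would give one labeled example. Using a different color per document part (title, figure caption, and so on) would extend the same trick to several classes.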

paper link