{"id":23273,"date":"2024-02-18T10:37:39","date_gmt":"2024-02-18T08:37:39","guid":{"rendered":"https:\/\/www.wjst.de\/blog\/?p=23273"},"modified":"2024-02-19T19:22:09","modified_gmt":"2024-02-19T17:22:09","slug":"pro-tipp-next-level-ocr-of-academic-documents","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2024\/02\/pro-tipp-next-level-ocr-of-academic-documents\/","title":{"rendered":"Pro Tip: Next-level OCR of academic documents"},"content":{"rendered":"<p>Transcribing math documents into <a href=\"https:\/\/de.wikipedia.org\/wiki\/LaTeX\">LaTeX<\/a> involves a lot of typing, but there is now some support from Nougat, an OCR tool by <a href=\"https:\/\/facebookresearch.github.io\/nougat\/\">FB<\/a> (<a href=\"https:\/\/github.com\/facebookresearch\/nougat\">GitHub<\/a>):<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\r\npip install nougat-ocr\r\nnougat path\/to\/file.pdf -o output_directory\r\n<\/pre>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 06.04.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>Transcribing math documents into LaTeX involves a lot of typing, but there is now some support from Nougat, an OCR tool by FB (GitHub): pip install nougat-ocr nougat path\/to\/file.pdf -o output_directory &nbsp; CC-BY-NC Science Surf, accessed 
06.04.2026<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[2681,2870],"class_list":["post-23273","post","type-post","status-publish","format-standard","hentry","category-computer-software","tag-pdf","tag-latex"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/23273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=23273"}],"version-history":[{"count":1,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/23273\/revisions"}],"predecessor-version":[{"id":23274,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/23273\/revisions\/23274"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=23273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=23273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=23273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}