{"id":19116,"date":"2022-01-22T11:39:37","date_gmt":"2022-01-22T09:39:37","guid":{"rendered":"https:\/\/www.wjst.de\/blog\/?p=19116"},"modified":"2022-05-31T15:08:41","modified_gmt":"2022-05-31T13:08:41","slug":"personalized-pdfs","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2022\/01\/personalized-pdfs\/","title":{"rendered":"Personalized PDFs"},"content":{"rendered":"<p>Scientific publishers are now creating more and more dynamic PDFs. How do we know? I recently observed an unexpected loading delay of a <a href=\"https:\/\/www.tandfonline.com\/doi\/epub\/10.1080\/13571516.2021.1976051?needAccess=true\">PDF<\/a> from the <a href=\"https:\/\/de.wikipedia.org\/wiki\/Routledge_(Verlag)\">Routledge<\/a> \/ <a href=\"https:\/\/de.wikipedia.org\/wiki\/Taylor_%26_Francis\">Taylor &amp; Francis<\/a> group. At first I suspected some DDoS protection, but it is indeed a personalized document.<\/p>\n<p><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/x-3.jpg\" data-rel=\"key-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-19121\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/x-3.jpg\" alt=\"\" width=\"216\" height=\"231\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/x-3.jpg 621w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/x-3-468x500.jpg 468w\" sizes=\"auto, (max-width: 216px) 100vw, 216px\" \/><\/a><\/p>\n<p>All of these websites are contacted while the PDF is being created:<\/p>\n<p><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-22-um-07.16.24.jpg\" data-rel=\"key-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-19117\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-22-um-07.16.24.jpg\" alt=\"\" 
width=\"327\" height=\"453\" \/><\/a><\/p>\n<p>Scitrus.com seems to be part of a larger reference organizer network and links to scienceconnect.io. Alexametrics.com belongs to the soon-to-be-retired Alexa Internet \/ Amazon service. Snap.licdn.com forwards to px.ads.linkedin.com, the ad tracking server of LinkedIn, the business social network. Then we have Twitter ads, Cloudflare security and Google Analytics. All of these major players now know that my IP is interested in COVID-19 research. Did I ever agree to submit my IP and a time stamp when looking up a rather crude scientific paper?<\/p>\n<p>This is exactly what the German <a href=\"https:\/\/www.dfg.de\/download\/pdf\/foerderung\/programme\/lis\/datentracking_papier_en.pdf\">DFG already warned us about<\/a> last October:<\/p>\n<blockquote><p>For some time now, the major academic publishers have been fundamentally changing their business model with significant implications for research: aggregation and the reuse or resale of user traces have become relevant aspects of their business. Some publishers now explicitly regard themselves as information analysis specialists. Their business model is shifting from content provision to data analytics.<\/p><\/blockquote>\n<p><a href=\"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/iwp-2018-0021\/html\">Another paper<\/a> describes the situation as &#8220;Forced marriages and bastards&#8221;&#8230;<\/p>\n<p>My question is: will Taylor &amp; Francis do even more? The <a href=\"https:\/\/blog.didierstevens.com\/2008\/04\/09\/quickpost-about-the-physical-and-logical-structure-of-pdf-files\/\">structure of PDFs<\/a> allows embedding objects, including JavaScript. 
When examining &#8220;document.pdf&#8221; using <a href=\"https:\/\/blog.didierstevens.com\/programs\/pdf-tools\/\">pdf-parser<\/a>, I could not find any JavaScript or my current IP in clear text. However, I cannot exclude that the IP is stamped somewhere in the document in a chopped-up or encoded form. So I will have to try again later and do a bitwise comparison with the same PDF delivered on another day.<\/p>\n<p>The DFG document also notes that organisations might argue that such software allows the prosecution of shadow library users. While I doubt that this is legal, we already see targeted advertising: I received this <a href=\"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/cpr.13137\">PDF from Wiley<\/a> that included an Eppendorf ad.<\/p>\n<figure id=\"attachment_19139\" aria-describedby=\"caption-attachment-19139\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34.jpg\" data-rel=\"key-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-19139 size-medium\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34-620x649.jpg\" alt=\"\" width=\"620\" height=\"649\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34-620x649.jpg 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34-478x500.jpg 478w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34-768x804.jpg 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-09.27.34.jpg 1102w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><figcaption id=\"caption-attachment-19139\" class=\"wp-caption-text\">Screenshot 20.1.2022<\/figcaption><\/figure>\n<p>When I downloaded 
this document a second time using a different IP, however, it was identical. Blood\/Elsevier even lets you download only after watching a small slideshow&#8230;<\/p>\n<figure id=\"attachment_19148\" aria-describedby=\"caption-attachment-19148\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24.jpg\" data-rel=\"key-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-19148 size-medium\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24-620x384.jpg\" alt=\"\" width=\"620\" height=\"384\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24-620x384.jpg 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24-807x500.jpg 807w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24-768x476.jpg 768w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-01-30-um-13.57.24.jpg 1310w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><figcaption id=\"caption-attachment-19148\" class=\"wp-caption-text\">Screenshot 20.1.2022<\/figcaption><\/figure>\n<p><!--nextpage--><span style=\"text-decoration: underline;\">16.3.2022<\/span><\/p>\n<p>There is now a frightening paper about user tracking in science by <a href=\"https:\/\/twitter.com\/obibJournal\/status\/1503713830249144322\">@RenkeSiems<\/a>, published at <a href=\"https:\/\/www.o-bib.de\/bib\/article\/view\/5797\">https:\/\/www.o-bib.de<\/a>:<\/p>\n<blockquote><p>In recent years, the major science publishers have evolved away from publishing content providers to data analytics businesses. 
As platform companies, they generate high margins and use this capital to buy up alternative offers emerging from the science community and to expand into other business areas. The goal is to make themselves indispensable in all central processes of science control, so that we should see this as a vendor lock-in, just as it is known from the information sector. To this end, publishers have equipped their platforms with tools for comprehensive user tracking&#8230;<\/p><\/blockquote>\n<p><!--nextpage--><span style=\"text-decoration: underline;\">19.3.2022<\/span><\/p>\n<p>Finally, <a href=\"https:\/\/onlinelibrary.wiley.com\/doi\/epdf\/10.1111\/joim.13201\">I found a Wiley PDF<\/a> that differs when downloaded at the same time via different proxies. Text and layout are identical, and even the binary structure is identical when the PDF is downloaded twice within 5 seconds from the same IP. Using a different proxy, however, gives different strings at the end of the document.<\/p>\n<figure id=\"attachment_19742\" aria-describedby=\"caption-attachment-19742\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53.jpg\" rel=\"key\" data-rel=\"key-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-19742 size-medium\" src=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53-620x306.jpg\" alt=\"\" width=\"620\" height=\"306\" data-wp-editing=\"1\" srcset=\"https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53-620x306.jpg 620w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53-1013x500.jpg 1013w, https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53-768x379.jpg 768w, 
https:\/\/www.wjst.de\/blog\/wp-content\/uploads\/2022\/01\/Bildschirmfoto-2022-03-19-um-18.42.53.jpg 1139w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><figcaption id=\"caption-attachment-19742\" class=\"wp-caption-text\">Binary comparison of the same document<\/figcaption><\/figure>\n<p>So far I don&#8217;t know what these strings indicate. 32 characters are too short for anything executable, but long enough to store an identifier or a geolocation. <a href=\"https:\/\/www.boxentriq.com\/code-breaking\/frequency-analysis\">Frequency analysis<\/a> shows mainly e, d, c, a, f and digits, i.e. hex code. <a href=\"https:\/\/www.dein-ip-check.de\/suche\">Is it an IPv6 address<\/a>?<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\r\nGerman Proxy\r\n630d:577e:0c28:fe7c:3934:f08c:6ddb:a93e\r\na3f4:840c:9efe:87b9:af97:df5c:2bf4:d9e6\r\nJapanese Proxy\r\ndb53:6013:615e:902a:805a:815c:229d:20da\r\n7f46:ee57:fa4a:e3a9:33c2:c1db:37e3:ebb8\r\nDutch Proxy\r\nf2a7:8db4:dd89:2c4c:dfeb:d352:87e0:5c18\r\ne606:c2e1:383a:bc21:ef35:58a1:6c54:a991\r\n<\/pre>\n<p>The length would fit an IPv6 address (32 hex digits = 128 bits). But two such strings at the end of a PDF would equally fit the file identifier that the PDF specification places in the trailer (\/ID, an array of two 16-byte strings), which could simply be regenerated for every download. TBC.<\/p>\n<p><!--nextpage--><span style=\"text-decoration: underline;\">31.5.2022<\/span><\/p>\n<p>I learned only today that back in January @json_dirs had also identified a hash code injected by Elsevier. Interesting that &#8220;just&#8221; exiftool is sufficient for that.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"474\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">More fun publisher surveillance:<br \/>Elsevier embeds a hash in the PDF metadata that is *unique for each time a PDF is downloaded*, this is a diff  between metadata from two of the same paper. Combined with access timestamps, they can uniquely identify the source of any shared PDFs. 
<a href=\"https:\/\/t.co\/D9KxnXkMVu\">pic.twitter.com\/D9KxnXkMVu<\/a><\/p>\n<p>&mdash; jonny saunders (@json_dirs) <a href=\"https:\/\/twitter.com\/json_dirs\/status\/1486120144141123584?ref_src=twsrc%5Etfw\">January 25, 2022<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf, accessed 24.06.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>Scientific publishers are now creating more and more dynamic PDFs. How do we know? I recently observed an unexpected loading delay of a PDF from the Routledge \/ Taylor &amp; Francis group. At first I suspected some DDoS protection, but it is indeed a personalized document. All of these websites are contacted while &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2022\/01\/personalized-pdfs\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Personalized PDFs<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[3522,2681,2122,3722,3721],"class_list":["post-19116","post","type-post","status-publish","format-standard","hentry","category-computer-software","tag-dfg","tag-pdf","tag-business-model","tag-publisher","tag-science-integrity"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/19116","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=19116"}],"version-history":[{"count":19,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/19116\/revisions"}],"predecessor-version":[{"id":19970,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/19116\/revisions\/19970"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=19116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=19116"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/tags?post=19116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}