{"id":18061,"date":"2021-02-09T14:28:37","date_gmt":"2021-02-09T14:28:37","guid":{"rendered":"http:\/\/www.wjst.de\/blog\/?p=18061"},"modified":"2021-02-09T14:32:36","modified_gmt":"2021-02-09T14:32:36","slug":"how-to-scrape-a-website-with-r-ii-wysiwyg","status":"publish","type":"post","link":"https:\/\/www.wjst.de\/blog\/sciencesurf\/2021\/02\/how-to-scrape-a-website-with-r-ii-wysiwyg\/","title":{"rendered":"How to scrape a website with R II: WYSIWYG"},"content":{"rendered":"<p>Part II<\/p>\n<p>Although RSelenium can take a screenshot of the current browser window,<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\r\nlibrary(RSelenium)\r\n# connect to the Selenium server running in the Docker container\r\nremDr &lt;- remoteDriver(\r\n  remoteServerAddr = &quot;localhost&quot;,\r\n  port = 4444,\r\n  browserName = &quot;Chrome&quot;\r\n)\r\nremDr$open()\r\nremDr$navigate(&quot;http:\/\/google.com&quot;)\r\nremDr$screenshot(display = TRUE) # or save as PNG: remDr$screenshot(file = &quot;screen.png&quot;)\r\n<\/pre>\n<p>I found it extremely difficult to control a web browser running in a Docker container: looking up the DOM tree, injecting JavaScript, etc. involves a lot of guesswork.<\/p>\n<p>So we also need a VNC server in the Docker container, as found at <a href=\"https:\/\/github.com\/SeleniumHQ\/docker-selenium\">GitHub<\/a>.<\/p>\n<p>After starting the container in the terminal<\/p>\n<pre class=\"brush: php; title: ; notranslate\" title=\"\">\r\ndocker run -d -p 4444:4444 -p 5900:5900 -v \/dev\/shm:\/dev\/shm selenium\/standalone-chrome:4.0.0-beta-1-prerelease-20210207\r\n<\/pre>\n<p>we can watch live what&#8217;s going on at vnc:\/\/127.0.0.1:5900.<\/p>\n\n<p>&nbsp;<\/p>\n<div class=\"bottom-note\">\n  <span class=\"mod1\">CC-BY-NC Science Surf , accessed 03.05.2026<\/span>\n <\/div>","protected":false},"excerpt":{"rendered":"<p>Part II Although RSelenium can take a screenshot of the current browser window, library(RSelenium) remDr &lt;- remoteDriver( remoteServerAddr = &quot;localhost&quot;, port = 4444, browserName = &quot;Chrome&quot; ) 
remDr$open() remDr$navigate(&quot;http:\/\/google.com&quot;) remDr$screenshot(display = TRUE) # or save as PNG: remDr$screenshot(file = &quot;screen.png&quot;) I found it extremely difficult to control a web browser running in a Docker container: looking up the DOM tree, injecting JavaScript, etc. &hellip; <a href=\"https:\/\/www.wjst.de\/blog\/sciencesurf\/2021\/02\/how-to-scrape-a-website-with-r-ii-wysiwyg\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">How to scrape a website with R II: WYSIWYG<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[2893,3573,3624,3572,3570,3623],"class_list":["post-18061","post","type-post","status-publish","format-standard","hentry","category-computer-software","tag-r","tag-rcurl","tag-rselenium","tag-rjson","tag-rvest","tag-scrape"],"_links":{"self":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/18061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/comments?post=18061"}],"version-history":[{"count":4,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/18061\/revisions"}],"predecessor-version":[{"id":18065,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/posts\/18061\/revisions\/18065"}],"wp:attachment":[{"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/media?parent=18061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wjst.de\/blog\/wp-json\/wp\/v2\/categories?post=18061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w
jst.de\/blog\/wp-json\/wp\/v2\/tags?post=18061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}