Tag Archives: Rcurl

How to scrape a website with R II: WYSIWYG

9.02.2021 admin

Part II

Although RSelenium can take a screenshot of the current browser window

library(RSelenium)
# connect to the Selenium server listening on localhost:4444
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444,
  browserName = "chrome"
)
remDr$open()
remDr$navigate("http://google.com")
# show the screenshot in the viewer, or save it: remDr$screenshot(file = "screen.png")
remDr$screenshot(display = TRUE)

I found it extremely difficult to control a web browser running in a Docker container: looking up the DOM tree, injecting JavaScript, etc. involves a lot of guesswork.
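For illustration, a DOM lookup and a JavaScript injection through RSelenium look roughly like the sketch below. The function name inspect_page and the selector [name='q'] are assumptions of this example (the selector guesses at how the search box is named on the page), so treat it as a starting point rather than a tested recipe.

```r
# Hypothetical helper. remDr is an open RSelenium remoteDriver session,
# e.g. the one created with remoteDriver() and remDr$open() above.
inspect_page <- function(remDr) {
  # DOM lookup by CSS selector; findElement() raises an error if nothing matches
  box <- remDr$findElement(using = "css selector", value = "[name='q']")
  name_attr <- box$getElementAttribute("name")[[1]]

  # JavaScript injection; executeScript() returns its result wrapped in a list
  title <- remDr$executeScript("return document.title;", args = list())[[1]]
  n_links <- remDr$executeScript(
    "return document.querySelectorAll('a').length;",
    args = list()
  )[[1]]

  list(name = name_attr, title = title, links = n_links)
}
```

Calling inspect_page(remDr) after remDr$navigate() returns the three values as a list, but without a view of the browser there is no easy way to tell why a lookup failed.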

So we also need a VNC server in the Docker container, as found on GitHub.

After starting the container in the terminal with

docker run -d -p 4444:4444 -p 5900:5900 -v /dev/shm:/dev/shm selenium/standalone-chrome:4.0.0-beta-1-prerelease-20210207

we can watch what is going on live at vnc://127.0.0.1:5900.
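Before pointing RSelenium or a VNC viewer at the container, it can help to check that both published ports answer. The following is a minimal sketch in base R; the helper name port_open and the two-second timeout are assumptions of this example, not part of RSelenium or the Selenium image.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
port_open <- function(host, port, timeout = 2) {
  tryCatch(
    {
      con <- socketConnection(
        host = host, port = port, open = "r+",
        blocking = TRUE, timeout = timeout
      )
      close(con)
      TRUE
    },
    error = function(e) FALSE,   # connection refused or timed out
    warning = function(w) FALSE  # R also signals a warning on failure
  )
}

# The two ports published by the docker run command above
ports <- c(selenium = 4444, vnc = 5900)
status <- vapply(ports, function(p) port_open("127.0.0.1", p), logical(1))
print(status)
```

If both entries are TRUE, remDr$open() and the VNC viewer should be able to connect. The docker-selenium README documents a default VNC password for these images, which the viewer will ask for.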

CC-BY-NC Science Surf, accessed 27.06.2026


How to scrape a website with R I: Using a browser generated cookie

19.11.2020 admin

While there are quite a few SO examples out there on how to manage the login, here are the necessary steps whenever you need to log in manually and have to start with a browser cookie. First install the “EditThisCookie” plugin in Chrome and export the cookie …
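A minimal sketch of how an exported cookie could then be handed to RCurl, assuming the export is a JSON array of objects with name and value fields. The cookie values, the helper names cookie_header and fetch_with_cookie, and the use of jsonlite are assumptions of this example and are not taken from the post.

```r
# Hypothetical helper: build a "name=value; name2=value2" cookie string
# from a data frame with columns `name` and `value`
cookie_header <- function(cookies) {
  paste(paste0(cookies$name, "=", cookies$value), collapse = "; ")
}

# Made-up stand-in for the exported cookie file (placeholder values)
export <- '[{"name":"sessionid","value":"abc123"},
            {"name":"csrftoken","value":"xyz"}]'

# fromJSON() turns an array of same-shaped objects into a data frame
cookies <- jsonlite::fromJSON(export)
header <- cookie_header(cookies)

# Hypothetical wrapper: request a page with the cookie string attached,
# using the `cookie` curl option of RCurl::getURL()
fetch_with_cookie <- function(url, header) {
  RCurl::getURL(url, cookie = header)
}
```

A call such as fetch_with_cookie("https://example.org/members", header), with the URL being a placeholder, would send the request with the browser session's cookies attached.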
