Tag Archives: rvest

How to scrape a website with R II: WYSIWYG

Part II

Although Rselenium allows a screenshot of the current browser window

library(RSelenium)
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444,
  browserName = "Chrome"
)
remDr$open()
remDr$navigate("http://google.com")
remDr$screenshot(display = T) # remDr$screenshot(file="screen.jpg")

I found it extremely difficult to control a webbrowser running in a Docker container – looking up the DOM tree, injecting javascript etc is a lot of guess work.

So we need also a VNC server in the docker container as found at github.

After starting in the terminal

docker run -d -p 4444:4444 -p 5900:5900 -v /dev/shm:/dev/shm selenium/standalone-chrome:4.0.0-beta-1-prerelease-20210207

we can watch live at vnc://127.0.0.1:5900 what’s going on.

How to scrape a website with R I: Using a browser generated cookie

While there are quite some SO examples out there how to manage the login, here are the ncessary steps whenever you need to login in manually and have to start with a browser cookie. First install the “EditThisCookie” plugin in Chrome and export the cookie Continue reading How to scrape a website with R I: Using a browser generated cookie