Schlagwort-Archive: R

How to scrape a website with R II: WYSIWYG

Part II

Although Rselenium allows a screenshot of the current browser window

remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444,
  browserName = "Chrome"
remDr$screenshot(display = T) # remDr$screenshot(file="screen.jpg")

I found it extremely difficult to control a webbrowser running in a Docker container – looking up the DOM tree, injecting javascript etc is a lot of guess work.

So we need also a VNC server in the docker container as found at github.

After starting in the terminal

docker run -d -p 4444:4444 -p 5900:5900 -v /dev/shm:/dev/shm selenium/standalone-chrome:4.0.0-beta-1-prerelease-20210207

we can watch live at vnc:// what’s going on.

Breaking up long chains in R’s margrittr code and calling sub functions

Having several long and redundant chains in R ‘s margittr code, I have now figured out how to pipe into named and unnamed functions

f1 <- function(x) {
  x %>% count() %>% print()
f2 <- function(x) {
  x %>% tibble() %>% print()

So we have now two named functions with code blocks that can be inserted in an unnamed function whenever needed

iris %>%
  # do any select, mutate ....
  # before running both functions
  (function(x) {
    x %>% select(Species) %>% f1() %>% print()
    x %>% f2() %>% print()

How to scrape a website with R I: Using a browser generated cookie

While there are quite some SO examples out there how to manage the login, here are the ncessary steps whenever you need to login in manually and have to start with a browser cookie. First install the “EditThisCookie” plugin in Chrome and export the cookie How to scrape a website with R I: Using a browser generated cookie weiterlesen

Graphical display of outbreaks: Transmission trees

Let’s start with some examples from the literature, find out necessary elements, compare different versions and develop a R template for general use.

“Exploding Stars”

CDC Field Epidemiology Handbook, 2019, ISBN 9780190933692 p115

“Tetris” (2020)

“Clean Undeterministic”
“Clean Tree”

“Tree Addons”


“Simple ”


“star bus”



From R to Python

It’s bit confusing if you are having long-term experience with R but need some OpenCV Python code. What worked for me

  1. download and install Python 3.8.3.
  2. pip install opencv-python
  3. pip install opencv-contrib-python
  4. although Spyder or Jupyter is recommended for data science, I went for PyCharm
  5. install Atom and follow the video instructions
  6. take care, numerous non working introductions out there, stick to recent version

R leaflet – keep map in frame after closing info box

Although having some experience with leaflet before, it took me 5 hours to find out how to
(1) change the background color and
(2) re-center my map after the popup was panning the map  basically out view. Here is my solution

leaflet(options = leafletOptions(
 zoomControl = FALSE,
 minZoom=6, maxZoom=6,
 centerFixed = TRUE) ) %>%
addPolygons(data = ds,
 fillColor = ~pal(AnzahlFall),
 color="black", weight = 1,
 fillOpacity = 0.7,
 label = ~name_2,
 popup = ~www,
 popupOptions = popupOptions( autopan=FALSE, keepInView = TRUE ) ) %>%
 function(el, x) {
  var myMap = this;
  myMap.on('popupclose', function(e) { myMap.panTo( {lon: 10.26, lat: 51.1} ) });
  var e = document.getElementsByClassName('leaflet-container');
  e[0].style.backgroundColor = 'white';	    

A new animation of the famous HadCRUT4 climate dataset


Here is the R sample code (PPT aspect ratio is 6:4, Youtube wants 16:9) .

As ggplot2 animation packages have major difficulties to manipulate the single frames, I am combining here raw PNGs using ffmpeg.

# read_cru_hemi() modified from

list.of.packages <- c("ggplot2", "reshape", "stringr","RColorBrewer")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only = TRUE)

read_cru_hemi <- function(filename) {
  tab <- read.table(filename,fill=TRUE)
  nrows <- nrow(tab)
  hemi <- data.frame(
  hemi[,15:26][ hemi[,15:26]==0 ] <- c(NA)

url_dat <- ""
tempdat <- read_cru_hemi(url_dat)
tempmelt <- melt(tempdat[,c(1,3:14)],id="year")

colfunc <- colorRampPalette(c("grey","grey","red"))
FadeToGrey <- colfunc(2019-1850)

new_theme <- theme_classic() + theme(
  text = element_text(size=18, colour="grey"),
  axis.line = element_blank(), 
  axis.text = element_text(colour="grey"),
  axis.ticks = element_line(colour="grey"),
  axis.title.x = element_blank(),
  panel.border = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  legend.position = "none"

for(i in 1850:2019){
p <- ggplot(tempmelt[tempmelt$year %in% 1850:i,], aes(x=variable,y=value,color=as.factor(year),group=year)) + 
  geom_line() +
  scale_x_discrete( +
  scale_y_continuous( name="difference from baseline  [ oC ]", limits=c(-1,1) ) +
  annotate("text", x=11, y=1, label=i, size=7) +
  scale_color_manual( values=FadeToGrey[ 1:c(i-1849) ]  )
  fn <- paste("/Users/xxx/Desktop/X/",str_pad(i-1849, 3, pad = "0"),".png",sep="")
  ggsave(p, file=fn, width = 16, height = 9)

# not run
# ffmpeg -framerate 10 -i /Users/xxx/Desktop/X/%3d.png -r 5 -pix_fmt yuv420p -y /Users/xxx/Desktop/X/out.mp4


In comparison here is the original circular plot. Would require blue, green, yellow, red in the Color Ramp Palette…


Now it is only a minor step to the warming strips.

ggplot(tempdat, aes(x = year, y = 1, fill = annual))+
  scale_y_continuous(expand = c(0, 0))+
  scale_x_continuous(expand = c(0, 0))+
  scale_fill_gradientn(colors = rev(col_strip)) +
  guides(fill = guide_colorbar(barwidth = 1)) +
  theme( axis.ticks= element_blank(),
         axis.text = element_blank(),
         axis.title = element_blank()

tempmelt$variable <- as.numeric(str_replace(as.character(tempmelt$variable),"month.",""))
ggplot(tempmelt, aes(x = year, y = variable, z = value)) +
  geom_raster(aes(fill = value)) +
  scale_fill_gradientn(colors = rev(col_strip)) +
  scale_x_continuous(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0)) +
  theme( axis.ticks= element_blank(),
         axis.text = element_blank(),
         axis.title = element_blank()


Rstudio, knitr (Rmarkdown2) and bash

I couldn’t find any example online how to revise my R code getting the exif data from pictures

fn <- c("/usr/local/bin/exiftool /Users/wjst/Desktop/white.tif")
info <- system(fn,inter=TRUE,wait=TRUE)

when moving now to knitr. So here is what worked for me as a replacement including the parsing
of exiftool output.

```{r, engine='bash', echo=FALSE}
/usr/local/bin/exiftool /Users/wjst/Desktop/white.tif >/Users/wjst/Desktop/white.txt
```{r Exif, echo=FALSE}
fn <- '/Users/wjst/Desktop/white.txt'
info <- paste(readLines(fn))
info <- strsplit(info,"[:]{1}[ ]{1}")
info <- matrix(data=unlist(info), ncol = 2, byrow = TRUE)
info <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", info)
`r kable(info)`

Better figures

A recent paper identifies 10 rules for better pictures. As I have also given several lectures on that topic, I was excited what the authors think…
1. Know your audience. This is trivial as you never know your audience.
2. Identify your message. True and not true at the same time. True as it makes your findings more evident – not true if you are allowing a reader to find his own message.
3. Adapt the figure to the support medium. Trivial. May be very time consuming.
4. Captions are not optional. Absolutely true, I also suppport good captions – mini stories for those who can’t read the whole text.
5. Do not trust the defaults. Trivial. No one does.
6. Use color efficiently.  Not really,  avoid colors for those of us who are colorblind and to avoid expensive page charges.
7. Do not mislead the reader. Why should I?
8.  Avoid Chartjunk. Absolutely. Most frequent problem.
9. Message trumps beauty. Sure, form follows function.
10. Get the right tool. Maybe correct while the further recommendations look like a poor man’s effort to make his first graphic at zero cost: Gimp, Imagemagick, R…