Tag Archives: R

Data and methods available? Forget it!

Bergeat 2022

Data were available for 2 of 65 RCTs (3.1%) published before the ICMJE policy and for 2 of 65 RCTs (3.1%) published after the policy was issued (odds ratio, 1.00; 95% CI, 0.07-14.19; P > .99).

Danchev 2021

Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) deposited data, mostly because of embargo and regulatory approval.

Gabelica 2022 (visualization @ Nature)

Of 3556 analyzed articles, 3416 contained DAS. The most frequent DAS category (42%) indicated that the datasets are available on reasonable request. Among 1792 manuscripts in which DAS indicated that authors are willing to share their data, 1670 (93%) authors either did not respond or declined to share their data with us. Among 254 (14%) of 1792 authors who responded to our query for data sharing, only 122 (6.8%) provided the requested data.

The same issue applies also to software sharing  where less than 5% of all papers is depositing code.

Both issues took me many years of my scientific life. It is recognized by politics in Germany but also the most recent action plan looks  … ridiculous. Why not making data and software sharing mandatory at time of publication?

How to scrape a website with R II: WYSIWYG

Part II

Although Rselenium allows a screenshot of the current browser window

library(RSelenium)
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444,
  browserName = "Chrome"
)
remDr$open()
remDr$navigate("http://google.com")
remDr$screenshot(display = T) # remDr$screenshot(file="screen.jpg")

I found it extremely difficult to control a webbrowser running in a Docker container – looking up the DOM tree, injecting javascript etc is a lot of guess work.

So we need also a VNC server in the docker container as found at github.

After starting in the terminal

docker run -d -p 4444:4444 -p 5900:5900 -v /dev/shm:/dev/shm selenium/standalone-chrome:4.0.0-beta-1-prerelease-20210207

we can watch live at vnc://127.0.0.1:5900 what’s going on.

Breaking up long chains in R’s dplyr/magrittr code and calling sub functions

Having several long and redundant chains in R ‘s magittr code, I have now figured out how to pipe into named and unnamed functions

f1 <- function(x) {
  x %>% count() %>% print()
}
f2 <- function(x) {
  x %>% tibble() %>% print()
}

So we have now two named functions with code blocks that can be inserted in an unnamed function whenever needed

iris %>%
  # do any select, mutate ....
  # before running both functions
  (function(x) {
    x %>% select(Species) %>% f1() %>% print()
    x %>% f2() %>% print()
  })

We can make a variable global from inside a function using <<- assignment but I have not found a way to continue the dplyr chain with any returned variable. So we are trapped inside the function

iris %>%
  (function(x) {
    result <<- x %>% head(1)
  })

How to scrape a website with R I: Using a browser generated cookie

While there are quite some SO examples out there how to manage the login, here are the ncessary steps whenever you need to login in manually and have to start with a browser cookie. First install the “EditThisCookie” plugin in Chrome and export the cookie Continue reading How to scrape a website with R I: Using a browser generated cookie

Graphical display of outbreaks: Transmission trees

Let’s start with some examples from the literature, find out necessary elements, compare different versions and develop a R template for general use.

“Exploding Stars”

CDC Field Epidemiology Handbook, 2019, ISBN 9780190933692 p115

“Tetris”

https://doi.org/10.1016/S0140-6736(20)30154-9 (2020)

“Clean Undeterministic”
“Clean Tree”

https://academic.oup.com/mbe/article/34/4/997/2919386

“Tree Addons”

https://openres.ersjournals.com/content/4/2/00162-2017

“Railways”

https://www.ages.at/service/service-presse/pressemeldungen/epidemiologische-abklaerung-am-beispiel-covid-19/

“Simple ”

https://www.ncbi.nlm.nih.gov/pubmed/32003551

“Vertical”

“star bus”

https://en.wikipedia.org/wiki/Index_case

“Mathematical”

https://www.biorxiv.org/content/10.1101/142570v3.full

“Genetics”

https://wwwnc.cdc.gov/eid/article/26/9/20-1798_article

From R to Python

It’s bit confusing if you are having long-term experience with R but need some OpenCV Python code. What worked for me

  1. download and install Python 3.8.3.
  2. pip install opencv-python
  3. pip install opencv-contrib-python
  4. although Spyder or Jupyter is recommended for data science, I went for PyCharm
  5. install Atom and follow the video instructions
  6. take care, numerous non working introductions out there, stick to recent version

R leaflet – keep map in frame after closing info box

Although having some experience with leaflet before, it took me 5 hours to find out how to
(1) change the background color and
(2) re-center my map after the popup was panning the map  basically out view. Here is my solution

leaflet(options = leafletOptions(
 zoomControl = FALSE,
 minZoom=6, maxZoom=6,
 centerFixed = TRUE) ) %>%
addPolygons(data = ds,
 fillColor = ~pal(AnzahlFall),
 color="black", weight = 1,
 fillOpacity = 0.7,
 label = ~name_2,
 popup = ~www,
 popupOptions = popupOptions( autopan=FALSE, keepInView = TRUE ) ) %>%
htmlwidgets::onRender("
 function(el, x) {
  var myMap = this;
  myMap.dragging.disable();
  myMap.on('popupclose', function(e) { myMap.panTo( {lon: 10.26, lat: 51.1} ) });
  var e = document.getElementsByClassName('leaflet-container');
  e[0].style.backgroundColor = 'white';	    
}")

A new animation of the famous HadCRUT4 climate dataset

Download

Here is the R sample code (PPT aspect ratio is 6:4, Youtube wants 16:9) .

As ggplot2 animation packages have major difficulties to manipulate the single frames, I am combining here raw PNGs using ffmpeg.

# read_cru_hemi() modified from https://mccartneytaylor.com/plotting-climate-change-on-a-spider-graph-using-r

list.of.packages <- c("ggplot2", "reshape", "stringr","RColorBrewer")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only = TRUE)

read_cru_hemi <- function(filename) {
  tab <- read.table(filename,fill=TRUE)
  nrows <- nrow(tab)
  hemi <- data.frame(
    year=tab[seq(1,nrows,2),1],
    annual=tab[seq(1,nrows,2),14],
    month=array(tab[seq(1,nrows,2),2:13]),
    cover=array(tab[seq(2,nrows,2),2:13])
  )
  hemi[,15:26][ hemi[,15:26]==0 ] <- c(NA)
  return(hemi)
}

url_dat <- "https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT4-gl.dat"
tempdat <- read_cru_hemi(url_dat)
tempmelt <- melt(tempdat[,c(1,3:14)],id="year")

colfunc <- colorRampPalette(c("grey","grey","red"))
FadeToGrey <- colfunc(2019-1850)

new_theme <- theme_classic() + theme(
  text = element_text(size=18, colour="grey"),
  axis.line = element_blank(), 
  axis.text = element_text(colour="grey"),
  axis.ticks = element_line(colour="grey"),
  axis.title.x = element_blank(),
  panel.border = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  legend.position = "none"
)
theme_set(new_theme)

for(i in 1850:2019){
p <- ggplot(tempmelt[tempmelt$year %in% 1850:i,], aes(x=variable,y=value,color=as.factor(year),group=year)) + 
  geom_line() +
  scale_x_discrete( labels=month.abb) +
  scale_y_continuous( name="difference from baseline  [ oC ]", limits=c(-1,1) ) +
  annotate("text", x=11, y=1, label=i, size=7) +
  scale_color_manual( values=FadeToGrey[ 1:c(i-1849) ]  )
  fn <- paste("/Users/xxx/Desktop/X/",str_pad(i-1849, 3, pad = "0"),".png",sep="")
  ggsave(p, file=fn, width = 16, height = 9)
}

# not run
# ffmpeg -framerate 10 -i /Users/xxx/Desktop/X/%3d.png -r 5 -pix_fmt yuv420p -y /Users/xxx/Desktop/X/out.mp4

 

In comparison here is the original circular plot. Would require blue, green, yellow, red in the Color Ramp Palette…

 

Now it is only a minor step to the warming strips.

ggplot(tempdat, aes(x = year, y = 1, fill = annual))+
  geom_tile()+
  scale_y_continuous(expand = c(0, 0))+
  scale_x_continuous(expand = c(0, 0))+
  scale_fill_gradientn(colors = rev(col_strip)) +
  guides(fill = guide_colorbar(barwidth = 1)) +
  theme( axis.ticks= element_blank(),
         axis.text = element_blank(),
         axis.title = element_blank()
  )

tempmelt$variable <- as.numeric(str_replace(as.character(tempmelt$variable),"month.",""))
ggplot(tempmelt, aes(x = year, y = variable, z = value)) +
  geom_raster(aes(fill = value)) +
  scale_fill_gradientn(colors = rev(col_strip)) +
  scale_x_continuous(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0)) +
  theme( axis.ticks= element_blank(),
         axis.text = element_blank(),
         axis.title = element_blank()
  )

 

Rstudio, knitr (Rmarkdown2) and bash

I couldn’t find any example online how to revise my R code getting the exif data from pictures

fn <- c("/usr/local/bin/exiftool /Users/wjst/Desktop/white.tif")
info <- system(fn,inter=TRUE,wait=TRUE)

when moving now to knitr. So here is what worked for me as a replacement including the parsing
of exiftool output.

```{r, engine='bash', echo=FALSE}
/usr/local/bin/exiftool /Users/wjst/Desktop/white.tif >/Users/wjst/Desktop/white.txt
```
```{r Exif, echo=FALSE}
fn <- '/Users/wjst/Desktop/white.txt'
info <- paste(readLines(fn))
info <- strsplit(info,"[:]{1}[ ]{1}")
info <- matrix(data=unlist(info), ncol = 2, byrow = TRUE)
info <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", info)
```
*Exif*
`r kable(info)`

Better figures

A recent paper identifies 10 rules for better pictures. As I have also given several lectures on that topic, I was excited what the authors think…
1. Know your audience. This is trivial as you never know your audience.
2. Identify your message. True and not true at the same time. True as it makes your findings more evident – not true if you are allowing a reader to find his own message.
3. Adapt the figure to the support medium. Trivial. May be very time consuming.
4. Captions are not optional. Absolutely true, I also suppport good captions – mini stories for those who can’t read the whole text.
5. Do not trust the defaults. Trivial. No one does.
6. Use color efficiently.  Not really,  avoid colors for those of us who are colorblind and to avoid expensive page charges.
7. Do not mislead the reader. Why should I?
8.  Avoid Chartjunk. Absolutely. Most frequent problem.
9. Message trumps beauty. Sure, form follows function.
10. Get the right tool. Maybe correct while the further recommendations look like a poor man’s effort to make his first graphic at zero cost: Gimp, Imagemagick, R…