Tag Archives: Computer + Software

Biomedical journal search – a small gift

Pubmed often leads you to dead ends – journal citations without a link to the journal’s home page. Google sometimes helps but is always time-consuming. As a little gift for you, I have written a small bookmarklet that scans the Regensburg library files – just left-click on the link below and drag it to your browser toolbar. With a single mouse click you can then locate the journal’s homepage.


Another approach – the LibraryLookup Bookmarklet Generator.

Playing with your browser

Some blog authors are nuts about protecting their web sites from file copying. There are many ways to protect your site – but only one really good one (publish nothing). I often see small javascripts that disable right-clicking, while javascript.about.com has a much simpler solution:

<body oncontextmenu="return false;">

Please try a right click now…
If a web author fools you in such a way, what can you do? tech-recipes has the answer: of course, we can use javascript to turn it back on.

When visiting the offending website, type something like the following into the URL bar of your browser (this clears the handler and re-enables the context menu):

javascript:void(document.oncontextmenu=null)
Happy browsing, yea, yea.


How do you recognize photoshopped pictures? This will be a routine task in the future editorial process (BTW, I already recognized a faked gel picture where the edges and density of the bands looked somewhat artificial). However, with ever-increasing technical capacities we will probably need tamper-proof, watermarked pictures from professional scanning and digitizing equipment.
In the meantime, check Wikipedia and the links there. I believe that the majority of faked pictures could have been detected by splitting up color channels and looking at non-continuous transitions of hue (“Farbton”), saturation (“Sättigung”) and brightness (“Helligkeit”) or grey value. This will even work with scanned figures, although I would recommend checking the original computer files (which may always be electronically stamped by previous publishers). Don’t miss the website of the mp3 developers.
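To make the channel-splitting idea concrete, here is a minimal sketch of the principle in Python, using only the standard library’s colorsys module. It is not a forensic tool: the scanline data and the hue-jump threshold are my own invented illustrations, and real detection works on whole images, not single rows.

```python
import colorsys

def hue_jumps(scanline, threshold=0.25):
    """Flag positions where hue changes abruptly between neighbouring
    pixels -- a crude indicator of a pasted-in region.
    scanline: list of (r, g, b) tuples with components in 0..255.
    threshold: arbitrary hue-difference cutoff (hue is on a 0..1 scale)."""
    hues = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0]
            for r, g, b in scanline]
    jumps = []
    for i in range(1, len(hues)):
        # Hue is circular, so take the shorter way around the color wheel.
        d = abs(hues[i] - hues[i - 1])
        d = min(d, 1 - d)
        if d > threshold:
            jumps.append(i)
    return jumps

# A reddish strip with one green pixel spliced in:
row = [(200, 30, 30)] * 5 + [(30, 200, 30)] + [(200, 30, 30)] * 5
print(hue_jumps(row))  # → [5, 6]: the transitions into and out of the splice
```

A smooth gradient produces small hue differences everywhere; a pasted region shows up as isolated, large jumps.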




Here is another example of how to recognize a Photoshop spoof:

set the hue to a low setting, the saturation to a higher setting, and mess with the light and look for blotches of color that don’t follow the rest of the image

22-2-07: The JBC has now adopted an explicit policy:

“No specific feature within an image may be enhanced, obscured, moved, removed, or introduced. The groupings of images from different parts of the same gel, or from different gels, fields or exposures must be made explicit by the arrangement of the figure (e.g. using dividing lines) and in the text of the figure legend. Adjustments of brightness, contrast, or color balance are acceptable if and as long as they do not obscure or eliminate any information present in the original. Nonlinear adjustments (e.g. changes to gamma settings) must be disclosed in the figure legend.”

11-9-07: Hany Farid has developed tools to detect digital tampering.

I am the guy with the hammer, and everything is a nail

We could think in more general terms about this quote from an interesting benchmarking paper. (I remember a DFG referee saying that I am no expert on the applied study. Vice versa, we all see studies where everything is treated with a hammer.)
With a lot of data on my desk, however, I am more interested in the technical conclusions of the paper and feel quite comfortable with their opinion that commercial RDBMSs are not always the best choice. These RDBMSs include more and more features, and missing features are added in add-on packages from third-party vendors. With these ever-increasing features comes ever more useless overhead, with a penalty for performance.
A redesign of special databases like those used in genetic epidemiology and bioinformatics therefore seems inevitable. Some may have already noticed my preference for SQLite, HDF5 and NetCDF.

  • Do we really need client-server mode?
  • Can’t 90% of all tasks be done on presorted arrays (or materialized views)?
  • Why can’t processes run completely in virtual memory without disk I/O?
  • Is there any chance to compile to machine code for better performance?
  • Why not order tasks by priority, with those having minimum latency running first?
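Some of these questions can be explored with SQLite itself. Here is a sketch using Python’s built-in sqlite3 module: no client-server roundtrips, a purely in-memory database, and an index standing in for a presorted array. The table name and toy data are my own invention for illustration.

```python
import sqlite3

# SQLite is just a library call -- no server process -- and ":memory:"
# keeps the whole database in RAM, so there is no disk I/O at all.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE genotypes (snp TEXT, subject INTEGER, call TEXT)")
con.executemany("INSERT INTO genotypes VALUES (?, ?, ?)",
                [("rs123", 1, "AA"), ("rs123", 2, "AG"), ("rs456", 1, "GG")])

# An index plays the role of a presorted array for lookups.
con.execute("CREATE INDEX idx_snp ON genotypes (snp)")
rows = con.execute(
    "SELECT subject, call FROM genotypes WHERE snp = ? ORDER BY subject",
    ("rs123",)).fetchall()
print(rows)  # → [(1, 'AA'), (2, 'AG')]
con.close()
```

Nothing here needs a database administrator, a network socket or a disk file – which is exactly the point for single-user analysis workloads.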

Yea, yea.

Scientific spam

I am sending a trackback here to David, who asked whether a new series of spam emails may be

nothing more than an intricate social engineering endeavour and that I’ve been duped into responding in this way.

I also found myself answering an email in which a 15-year-old asked (after having smoked a few cigarettes) whether she will now have an increased lung cancer risk. Only 10 minutes later, at the coffee machine, I heard that a dozen people had just answered exactly the same email. Think of it as an April Fool’s joke, yea, yea.

Online statistical analysis

There are many occasions where I quickly need a calculator, for example when reviewing a paper. My favorite links are statpages and VassarStats; for confidence intervals I use Poisson confidence intervals.
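For a quick back-of-the-envelope check away from those pages, the normal approximation to a Poisson confidence interval fits in a few lines of Python. Note this is only an approximation for larger counts; the linked calculators use exact methods.

```python
import math

def poisson_ci_approx(k):
    """Approximate 95% confidence interval for a Poisson count k,
    using the normal approximation k +/- 1.96 * sqrt(k).
    Only reasonable for larger counts (say k > 20 or so)."""
    z = 1.959964  # ~97.5th percentile of the standard normal
    half = z * math.sqrt(k)
    return max(0.0, k - half), k + half

lo, hi = poisson_ci_approx(100)
print(round(lo, 1), round(hi, 1))  # → 80.4 119.6
```

So 100 observed cases are compatible with an expectation anywhere between roughly 80 and 120 – useful for a quick sanity check while reviewing.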

Clear science

If you ever need to safely dispose of a computer, here are the instructions: download an ISO image of DBAN, burn it, and boot your computer from the CD – that’s it. But take care: in Germany at least, you need to store research data for 10 years.



While waiting for genomewide SNP data to be re-partitioned into LD blocks, I found this page with some neat programming tricks. It is part of the MIT dissertation of Ben Fry about computational information design. Pages 74 ff. have a history of redesigning the widely used Haploview program.

The design of these diagrams was first developed manually to work out the details, but in the interest of seeing them implemented, it was clear that HaploView needed to be modified directly in order to demonstrate the improvements in practice. Images of the redesigned version are seen on this page and the page following. The redesigned version was eventually used as the base for a subsequent ‘version 2.0’ of the program, which has since been released to the public and is distributed as one of the analysis tools for the HapMap [www.hapmap.org] project.


Open culture podcasts

As a frequent traveller I like podcasts. Here is a quick link to Open Culture, which has a huge university podcast collection including many foreign-language selections (Boston College, Bowdoin College, Collège de France, Duke University Law School, Harvard University, Haverford College – Classic Texts, Johns Hopkins, Northwestern University, Ohio State, Princeton University, Stanford University, Swarthmore College, University of California (the best collection), The University of Chicago, The University of Glasgow, The University of Pennsylvania, The University of Virginia, The University of Wisconsin-Madison, Vanderbilt University, Yale University and École normale supérieure). If you don’t like proprietary formats, you need to find the good and the bad apples.

How to detect your own CNVs

How to detect copy number variation (CNV) in your own genotype chip data can be found in a companion paper to the recent Nature publication.
In the previous Nature paper the authors explained their algorithm as being based on k-means and PAM (partitioning around medoids) clustering, but it seems quite different. They call genotypes with DM (which already seems to be superseded by BRLMM, see a comparison at the Broad and the AFFX whitepaper), then adjust heterozygote ratios by Gaussian mixture clustering, normalize and reduce noise before(!) merging NspI and StyI arrays. The software is at Genome Science, Tokyo. Yea, yea.
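To illustrate the clustering idea (this is emphatically not the authors’ code), here is a toy one-dimensional k-means in Python: intensity ratios are split into two clusters, one around the normal two-copy value and one around a candidate duplication. The ratios and starting centers are invented for the example.

```python
def kmeans_1d(values, centers, iterations=20):
    """Tiny one-dimensional k-means: assign each value to the nearest
    center, then move each center to the mean of its cluster."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # A cluster that ends up empty keeps its old center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Invented heterozygote intensity ratios: most sit near 0.5 (two copies),
# a few near 1.0 (a candidate duplication). Starting centers are guesses.
ratios = [0.48, 0.50, 0.52, 0.49, 0.97, 1.01, 0.99]
centers, clusters = kmeans_1d(ratios, centers=[0.4, 1.1])
print(centers)  # one center settles near 0.5, the other near 1.0
```

Gaussian mixture clustering, as in the paper, additionally models the variance of each cluster and gives soft assignments, but the intuition is the same.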

New R packages for SNP studies

The December R newsletter reports several brand-new Bioconductor packages useful for SNP studies:


Number cruncher

In a recent blog post I described high-resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling, many people nowadays stick to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large datasets; details are in a technical report that I will upload later today. In the meantime you can already check some software tools for working with these large datasets. This is what I know so far:

  • David Duffy has recompiled his sibpair program |link
  • Geron(R) has something under development |link
  • Jochen Hampe and colleagues offer Genomizer |link
  • Franz Rüschendorf developed Alohomora |link
  • I remember SNPGWA, a development at Wake Forest University |no link yet
  • there will be a R-Bioconductor package by Rob Scharpf |no link yet
  • R library GenABEL by Yurii Aulchenko |link
  • R library SNPassoc by Juan González |link


A technical report on how to work with large SNP datasets is now also available in my paper section. Alternatives to what I am suggesting in this paper have been set out by an anonymous reviewer:

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned – it is column oriented and probably also has “unlimited” columns as long as each database is < 1GB or so.
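One concrete way around the column-count limits the reviewer mentions (SQLite defaults to 2000 columns per table) is not to model one column per SNP at all, but to pack each subject’s genotypes into a single string per row. This sketch with Python’s built-in sqlite3 module is my own invention, not from the technical report:

```python
import sqlite3

# Instead of one column per SNP, store each subject's calls as one
# packed string, two characters per SNP, plus a small index table
# mapping rsid to position.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snp_index (pos INTEGER, rsid TEXT)")
con.execute("CREATE TABLE calls (subject INTEGER, genotypes TEXT)")

snps = ["rs1", "rs2", "rs3"]
con.executemany("INSERT INTO snp_index VALUES (?, ?)", list(enumerate(snps)))
con.execute("INSERT INTO calls VALUES (1, 'AAGGCT')")  # rs1=AA, rs2=GG, rs3=CT

def genotype(con, subject, rsid):
    """Look up one call by slicing the packed string with SUBSTR
    (SQLite's substr() is 1-indexed)."""
    (pos,) = con.execute("SELECT pos FROM snp_index WHERE rsid = ?",
                         (rsid,)).fetchone()
    (call,) = con.execute(
        "SELECT substr(genotypes, ?, 2) FROM calls WHERE subject = ?",
        (2 * pos + 1, subject)).fetchone()
    return call

print(genotype(con, 1, "rs2"))  # → GG
```

With 500,000 SNPs this is one million characters per row – well within SQLite’s row-size limits – and it sidesteps the column limit entirely, at the price of doing the slicing yourself.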


-moblog- Last week I asked an author for some additional information that was not available in his online supplement. He responded immediately – I saw the email arriving in Thunderbird – but when I wanted to read it a couple of hours later I couldn’t find it, neither in the inbox nor in the spam or trash folders. Bugtraq has identical user reports – which makes me believe that ActiveSync also sometimes drops items from the to-do list, mainly the important ones. Computers are only a higher-order system for managing chaos, but there is no reason to believe in their impeccability. Yea, yea.