Category Archives: Software

Scientific spam

12.01.2007 admin

I am sending here a trackback to David who asked if a new series of spam email may be

nothing more than an intricate social engineering endeavour and that Iâ€™ve been duped into responding in this way.

I found me also answering an email where a 15 year old asked (after having smoked a few cigarettes) if she will now have an increased lung cancer risk. Only 10 minutes later at the coffee machine I heard that a dozen people had just answered exactly the same email. Think of it like an April Fool’s joke, yea, yea.

CC-BY-NC Science Surf accessed 09.01.2026

Software

Online statistical analysis

11.01.2007 admin

There are many occasions where I quickly need a calculator for example when reviewing a paper. My favorite links are statpages and VassarStats; for confidence intervals I use Poisson confidence intervals.

CC-BY-NC Science Surf accessed 09.01.2026

Software

Clear science

10.01.2007 admin

If you ever need to safely dispose a computer, here is an instruction: download an ISO image of DBAN, burn it and boot your computer from CD, thatÂ´s it. Take care, at least in Germany, you need to store research data for 10 years.

CC-BY-NC Science Surf accessed 09.01.2026

Software

R-type and E-type

9.01.2007 admin

This afternoon I attended a talk at MPI of Yurii Aulchenko about his R package genABEL. He uses a nice trick to convert genotypes into 8 bit (as there are is no 2 bit format 00,01,10,11) which saves quite a lot of memory.

And here is the E type that I have seen a couple of months ago:

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

3D LD

15.12.2006 admin 1 Comment

While waiting for genomewide SNP data to be re-partioned into LD blocks I found this page with some neat progamming tricks. It is part of the dissertation of Ben Fry / MIT about computational information design. Page 74 ff has a history of redesigning the widely used haploview pogram.

The design of these diagrams was first developed manually to work out
the details, but in interest of seeing them implemented, it was clear that
HaploView needed to be modified directly in order to demonstrate the
improvements in practice. Images of the redesigned version are seen
on this page and the page following. The redesigned version was even-
tually used as the base for a subsequence â€˜version 2.0â€™ of the program,
which has since been released to the public and is distributed as one of
the analysis tools for the HapMap [www.hapmap.org] project.

CC-BY-NC Science Surf accessed 09.01.2026

Philosophy, Software

Open culture podcasts

13.12.2006 admin

As a frequent traveller I like podcasts. Here is a quick link to Open culture that have a huge university podcast collection including many foreign language selections (Boston College, Bowdoin College, Collège de France, Duke University Law School, Harvard University, Haverford College – Classic Texts, Johns Hopkins, Northwestern University, Ohio State, Princeton University, Stanford University, Swathmore College, University of California (the best collection), The University of Chicago, The University of Glasgow, The University of Pennsylvania, The University of Virginia, The University of Wisconsin-Madison, Vanderbilt University, Yale University and Ecole normale supérieure). If you donÂ´t like proprietary formats you need to find the good and the bad apples.

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

How to detect your own CNVs

10.12.2006 admin

How to detect copy number variation (CNV) in your own genotype chip data, can be found in a companion paper of the recent Nature publication.
In the previous Nature paper the authors explained their algorithm to be based on k-means and PAM (partitioning around medoid) clustering, but it seems quite different. They call genotypes with DM (which seems to be already obsolete by the BRLMM, see a comparison at Broad and the AFFX whitepaper), then adjust heterocygote ratios by Gaussian mixture clustering, normalize and reduce noise before! merging NspI and StyI arrays. The software is at Genome Science, Tokyo. Yea, yea.

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

New LD measure

6.12.2006 admin

There is a new way to calculate LD that may overcome the limitations of D’ and R^2 that are not easily generalizable to multiallelic markers (or haplotypes) and depend on the distribution of SNPs (or haplotypes).
The paper is at BMC, the sources at the authors’ website. I have slightly modified the program to allow input and output file names on the command line before compiling it. Use at your own risk, yea, yea.

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

New R packages for SNP studies

5.12.2006 admin

The December R newsletter reports several brandnew bioconductor packages useful for SNP studies:

biomaRt: query Ensembl
beadArray: analyze Illumina arrays
RSNPper: query SNPper

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

Number cruncher

29.11.2006 admin

In a recent blog I described high resolution SNP datasets that are available on the net. To work with these datasets you will probably need to upgrade your hardware and software. For data handling many people stick nowadays to commercial SQL databases that have plugins for PD software.
My recommendation is to save that money and store the data in a special format that may be more useful for these large dataset; details are in a technical report that I will upload later this day. In the meantime you can already check some software tools to work with these large datasets. This is what I know so far

David Duffy has recompiled his sibpair program |link
Geron(R) has something under development |link
Jochen Hampe and colleagues offer Genomizer |link
Franz Rüschendorf developed Alohomora |link
I renember about SNPGWA, a development at Wake Forest University |no link yet
there will be a R-Bioconductor package by Rob Scharpf |no link yet
R library GenABEL by Yurii Aulchenko |link
R library SNPassoc by Juan GonzÃ¡lez |link

Addendum

A technical report how to work with large SNP dataset is now also available at my paper section. Alternatives to what I am suggesting in this paper, have been set out by an anonmyous reviewer

For R users, if SQLite limits are reached, hdf5 (http://hdf.ncsa.uiuc.edu/HDF5/) may be one way forward for really huge table structures since there is an R interface already available. PostgreSQL column limit depends on data type with a maximum of 1600 for simple types. MySQL with the BerkeleyDB backend may be like SQLite with no obvious column count limit. Metakit is not mentioned â€“ it is column oriented and probably also has â€œunlimitedâ€ columns as long as each database is < 1GB or so.

CC-BY-NC Science Surf accessed 09.01.2026

Joke, Software

email@nirvana.info

25.11.2006 admin

-moblog- Last week I asked an author for some additional information that was not available in his online supplement. He responded immediately, I saw the email arriving in Thunderbird, but when I wanted to read it a couple of hours later I couldn’t find it – neither in in the inbox, spam nor trash folder. Bugtraq has identical user reports – which makes me believe that also Activesync sometimes drops items form the todo list, mainly the important ones. Computer are only a higher ordering system for managing the chaos but there is no reason to believe in impeccability. Yea, yea.

CC-BY-NC Science Surf accessed 09.01.2026

Joke, Software

Calibrate!

22.11.2006 admin

This is a quick link to Eye of Science, a website with impressing micro photographs. Calibrate your monitor first at sriker, then goto Eye of Science. Oliver Meckes is quoting Albert Einstein

People should be ashamed to use the wonders of science and technology if they donâ€™t know any more about it than a cow knows about the botany of the grass it relishes in eating.

CC-BY-NC Science Surf accessed 09.01.2026

Software

Some privacy…

4.11.2006 admin 2 Comments

Every click leaves many traces in the internet. To enjoy at least some privacy, I recommend to install the CookieCuller, that will destroy all cookies (except some protected cokkies) when closing your browser. A slightly higher level of privacy may be obtained by using TORPARK, that is now even available in a standalone USB stick version form www.torrify.com. Even by using TORPARK you are still identified by your network card – SMAC is the ultimate solution, yea, yea.

Addendum

Science writes:

As you browse the Internet, many Web sites such as Google’s record a string of tex–the cookie–representing the identity of your computer. And when you use Google, its servers keep track not only of what you search for but also where you go next. People add new entries to this record at the rate of 200 million Web searches per day. This electronic record is key to Google’s business model: Most of its $1 billion annual revenue comes from Internet advertising targeted to individuals.

Another tip – disable also Flash super cookies in the online applet.

CC-BY-NC Science Surf accessed 09.01.2026

Software

Hidden feature at PUBMED

1.11.2006 admin

No, it is not really a hidden feature – but keep your mouse on “links” at the right part of the citation, wait for the drop-down, select “Link-out” and there is a good chance to jump directly to the publisher site, yea, yea.

Addendum

The nodalpoint blog” writes about MEDIE, a new PUBMED parser:

…is an â€œintelligentâ€ semantic search engine that retrieves biomedical correlations from over 14 million articles in MEDLINE. You can find abstracts and sentences in MEDLINE by specifying the semantics of correlations; for example, What activates tumour suppressor protein p53? So just how useful is MEDIE and is it at the cutting edge?

CC-BY-NC Science Surf accessed 09.01.2026

Genetics, Software

SQL injection and retrovirus infection

11.09.2006 admin

What is similiar to SQL injection (in webforms) and retrovirus infection (of mammals)? I think there is always a vulnerable situation (tainted variable by unexpected data entry – or double strand break and ligation) that allows foreign code to be inserted. What is different? Retrovirus insertion is probably position specific while SQL injection can even determine its own target. Yea, yea.

CC-BY-NC Science Surf accessed 09.01.2026