This seems to be a question without many good answers. “The origins of genome architecture” (Sinauer, 2007) has a nice chapter on “Genes in pieces” covering:
- the spliceosome fission-fusion model: Did a single ancestral spliceosome diverge into different lineages, or is there an endogenous origin within a single species?
- the introns-early vs. introns-late debate: Were introns present in the first prokaryotic cell or not?
Proposed mechanisms of intron gain center on AGGT tetramer duplications that result in new splice sites (AG|gt … ag|GT), with the lowercase segment being the new intron. (Something I found funny when I first worked with genomic sequences around 1994 is the “codon ignorance” and “domain ignorance” of introns, while on the other hand they allow for alternative splicing.) Another mechanism of intron gain may be transposable elements, although retrotransposons cannot deliver introns, as these are spliced out. More likely is the ectopic reintegration of already released introns. Another mechanism (not mentioned) could be random mutations activating cryptic splice sites.
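To make the AGGT idea concrete, here is a toy sketch in Python. It assumes the simplest reading of the model: a tandem duplication of a segment containing AGGT creates two AGGT sites, and everything from the first "gt" to the second "ag" is treated as the new intron. The sequence, the duplicated coordinates and the function names are made up for illustration, and real splicing also needs signals (branch point and so on) that this ignores.

```python
def tandem_duplicate(seq, start, end):
    """Return seq with the segment seq[start:end] duplicated in tandem."""
    return seq[:end] + seq[start:end] + seq[end:]


def splice_between_aggt(seq):
    """Treat the first two AGGT sites as AG|gt ... ag|GT and remove the
    segment in between. Returns (spliced sequence, removed intron)."""
    first = seq.find("AGGT")
    second = seq.find("AGGT", first + 1)
    if first == -1 or second == -1:
        return seq, ""           # fewer than two sites: nothing to splice
    donor = first + 2            # intron starts at the "GT" of the first site
    acceptor = second + 2        # intron ends after the "AG" of the second site
    return seq[:donor] + seq[acceptor:], seq[donor:acceptor]


# Hypothetical toy sequence with a single AGGT tetramer.
original = "ATGGCAAGGTCCTGA"

# Duplicate an arbitrary segment that contains the AGGT.
duplicated = tandem_duplicate(original, 4, 12)

mrna, intron = splice_between_aggt(duplicated)
print(duplicated)
print(intron.lower())
print(mrna == original)
```

In this toy case the removed segment starts with gt and ends with ag, and the spliced product is identical to the sequence before the duplication, which is why such a duplication could be tolerated without changing the message.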
The key question remains: why do we really need introns in the human genome? The book provides no answer to that. Is it about effective re-use of code? They certainly come at an enormous cost to the individual, whose cells produce thousands of mRNAs per cell cycle with a considerable energetic investment in nucleotides (p. 251).
Transcripts elongate at rates of approx. 20-40 bp per second in eukaryotic cells … so an average human gene, with approx. 1.3 kb of total coding DNA interspersed among eight introns with average lengths of 4.8 kb, requires approx. 22 minutes for complete transcription (not including the untranslated end sequences), about thirty times the time required for an intron-free allele.
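The numbers are easy to check. The short Python sketch below redoes the arithmetic from the figures quoted above. The 30 bp/s midpoint is my assumption (the text only gives the 20-40 bp/s range), and the function and variable names are just for illustration.

```python
def transcription_minutes(total_bp, rate_bp_per_s):
    """Time in minutes to transcribe total_bp at a constant elongation rate."""
    return total_bp / rate_bp_per_s / 60


coding_bp = 1300    # approx. 1.3 kb of coding DNA
n_introns = 8       # eight introns
intron_bp = 4800    # average intron length of 4.8 kb

# Full gene length, untranslated end sequences not included.
gene_bp = coding_bp + n_introns * intron_bp

# Assumed midpoint of the quoted 20-40 bp/s range.
minutes_mid = transcription_minutes(gene_bp, 30)

# Bounds for the quoted range: slowest rate first, fastest second.
minutes_slow = transcription_minutes(gene_bp, 20)
minutes_fast = transcription_minutes(gene_bp, 40)

# The rate cancels out here, so this is just the ratio of lengths.
slowdown = gene_bp / coding_bp

print(gene_bp, round(minutes_mid, 1), round(slowdown, 1))
```

With these inputs the gene is 39.7 kb long, which takes about 22 minutes at 30 bp/s (roughly 17 to 33 minutes across the quoted range), and the intron-containing gene is about 30 times longer than the coding sequence alone.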
This estimate makes me wonder whether introns are just a slow-down mechanism. Narrowing the hourglass?