… permanently leading to errors: the plus/minus strand orientation and the consecutive sequence / allele designation of SNPs. Only recently I came across a neat way to define strand direction, in a tech note by Illumina that deserves a larger audience. It introduces a TOP/BOTTOM strand concept in place of PLUS/MINUS.
Consider the case where one allele is A and the other is C or G. Here the sequence direction is defined as TOP, allele 1 is A, and allele 2 is C or G. The same logic applies when one allele is T and the other is C or G: the direction is BOTTOM, allele 1 = T, and allele 2 = C or G. If the SNP is an [A/T] or a [C/G], however, these rules do not apply.
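The rule for the unambiguous case is small enough to write down as code. The following is only a sketch of my reading of it, not Illumina's implementation; the function name `designate_unambiguous` is made up for illustration. It takes T as allele 1 for the BOTTOM case, which is what the A/T ambiguity argument in the tech note relies on.

```python
# The two pairings that cannot be resolved from the SNP alone.
AMBIGUOUS_PAIRS = (frozenset("AT"), frozenset("CG"))


def designate_unambiguous(base1, base2):
    """Return (strand, allele1, allele2), or None for an [A/T] or [C/G] SNP.

    Hypothetical helper: a sketch of the TOP/BOTTOM rule as described
    in the text, not code taken from Illumina.
    """
    pair = frozenset((base1.upper(), base2.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError("expected two different bases out of A, C, G, T")
    if pair in AMBIGUOUS_PAIRS:
        return None  # needs the flanking sequence to decide
    if "A" in pair:
        strand, allele1 = "TOP", "A"
    else:
        # The only remaining unambiguous pairs are T/C and T/G.
        strand, allele1 = "BOTTOM", "T"
    (allele2,) = pair - {allele1}
    return strand, allele1, allele2


print(designate_unambiguous("G", "A"))  # ('TOP', 'A', 'G')
print(designate_unambiguous("C", "T"))  # ('BOTTOM', 'T', 'C')
print(designate_unambiguous("A", "T"))  # None
```

The order in which the two bases are passed does not matter, which is the point of the scheme: the designation depends only on which bases are present.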
For the [A/T] SNP, the presence of an A would indicate TOP and the presence of a T would indicate BOTTOM. But, both the A and the T would be designated as allele 1 resulting in ambiguity. It would be similarly confusing to attempt to designate strand and allele for the [C/G] SNP. These [A/T] and [C/G] pairings are considered ambiguous for the purpose of determining strand and allele based on the SNP alone. Illumina employs a ‘sequence walking’ technique to designate strand and allele for [A/T] and [C/G] SNPs. For this sequence walking method, the actual SNP is considered to be position ‘n’. The sequences immediately before and after the SNP are ‘n-1’ and ‘n+1’, respectively. Similarly, two base pairs before the SNP is ‘n-2’ and two base pairs after the SNP is ‘n+2’, etc. Using this method, sequence walking continues until an unambiguous* pairing is present […] To designate strand, when the A or T in the first unambiguous pair is on the 5′ side of the SNP, then the sequence is designated TOP. When the A or T in the first unambiguous pair is on the 3′ side of the SNP, then the sequence is designated BOTTOM. To designate allele for an [A/T] SNP, when the strand is TOP then allele 1 = A and allele 2 = T. When the strand is BOTTOM, then allele 1 = T and allele 2 = A.
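The sequence walking described in the quote can be sketched in a few lines. This is my own illustration under stated assumptions, not Illumina's code: the names `walk_strand` and `designate_at_snp` are hypothetical, both flanks are assumed to be written 5′ to 3′ as they appear in the given sequence, and a pair is treated as unambiguous when exactly one of its two bases is A or T (my reading of the footnoted definition, which the excerpt elides). Only the [A/T] allele rule is implemented, because that is the only one the excerpt spells out.

```python
WEAK = {"A", "T"}    # the bases whose side decides the strand
STRONG = {"C", "G"}


def walk_strand(five_prime_flank, three_prime_flank):
    """Walk outward from the SNP and return 'TOP', 'BOTTOM', or None.

    Hypothetical helper. None means no unambiguous pair was found
    within the flanking sequence that was supplied.
    """
    left = five_prime_flank.upper()
    right = three_prime_flank.upper()
    for i in range(1, min(len(left), len(right)) + 1):
        before = left[-i]      # position n-i
        after = right[i - 1]   # position n+i
        if before in WEAK and after in STRONG:
            return "TOP"       # A or T sits on the 5' side
        if before in STRONG and after in WEAK:
            return "BOTTOM"    # A or T sits on the 3' side
        # Otherwise the pair is ambiguous: keep walking.
    return None


def designate_at_snp(five_prime_flank, three_prime_flank):
    """Return (strand, allele1, allele2) for an [A/T] SNP, or None."""
    strand = walk_strand(five_prime_flank, three_prime_flank)
    if strand is None:
        return None
    return (strand, "A", "T") if strand == "TOP" else (strand, "T", "A")


# ACGTA[A/T]TGCCA and its reverse complement TGGCA[T/A]TACGT
print(designate_at_snp("ACGTA", "TGCCA"))  # ('TOP', 'A', 'T')
print(designate_at_snp("TGGCA", "TACGT"))  # ('BOTTOM', 'T', 'A')
```

In this example the same SNP read from the opposite strand comes out as BOTTOM, and the first pair (A and T at n-1 and n+1) is skipped on both strands because it is ambiguous.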
The only point that I do not understand: the 5′ end carries a terminal phosphate group and the 3′ end a terminal hydroxyl group, so where is the hen and where is the egg if I have an unknown sequence?