OLGenie: Software for Overlapping Genes
We are thrilled to share our new study of overlapping genes using OLGenie, out today in Molecular Biology and Evolution. It was a privilege to work with wonderful co-authors Xinzhu (April) Wei and Zachary Ardern.
Briefly, overlapping genes (OLGs) occur when a single stretch of nucleotides (ACGT…) encodes two distinct proteins in different reading frames. A given segment can be read in 6 reading frames: 3 on the forward strand and 3 on the reverse strand. See our Fig 1A.
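To make the six reading frames concrete, here is a minimal Python sketch (not part of OLGenie; the function names are hypothetical) that splits a DNA segment into complete codons for frames +1 to +3 on the forward strand and -1 to -3 on the reverse complement. It assumes an uppercase ACGT-only sequence.

```python
from typing import Dict, List

# Base-pairing map used to build the reverse (antisense) strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq: str) -> Dict[str, List[str]]:
    """Return the complete codons of all six reading frames.

    Keys are '+1', '+2', '+3' (forward strand, offsets 0-2) and
    '-1', '-2', '-3' (reverse complement, offsets 0-2). Trailing
    bases that do not fill a whole codon are dropped.
    """
    frames: Dict[str, List[str]] = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        for sign, strand in (("+", seq), ("-", rc)):
            s = strand[offset:]
            frames[f"{sign}{offset + 1}"] = [s[i:i + 3] for i in range(0, len(s) - 2, 3)]
    return frames


frames = reading_frames("ATGGCATTA")
print(frames["+1"])  # ['ATG', 'GCA', 'TTA']
print(frames["-1"])  # ['TAA', 'TGC', 'CAT']
```

An OLG occurs when two of these frames both encode functional proteins over the same stretch of sequence.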
OLGenie is our software for finding OLGs by detecting purifying selection (constraint) in one reading frame (gene 2) while controlling for another (gene 1). To do this, it modifies the Wei-Zhang method, which was designed for OLGs, so that dN/dS can be estimated quickly.
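Controlling for gene 1 rests on classifying each possible single-nucleotide change by its effect in both frames at once: nonsynonymous (N) or synonymous (S) in gene 1, then N or S in gene 2. The sketch below is a hypothetical illustration, not OLGenie's code. It assumes a same-strand (sense-sense) overlap described by each gene's frame offset and uses the standard genetic code; antisense overlaps would also require reverse-complementing one frame.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effect(seq: str, frame_offset: int, pos: int, new_base: str) -> str:
    """Return 'N' or 'S' for mutating seq[pos] to new_base in the frame at frame_offset."""
    start = frame_offset + 3 * ((pos - frame_offset) // 3)  # first base of the affected codon
    if pos < frame_offset or start + 3 > len(seq):
        raise ValueError("position is not inside a complete codon of this frame")
    codon = seq[start:start + 3]
    i = pos - start
    mutant = codon[:i] + new_base + codon[i + 1:]
    return "S" if CODE[codon] == CODE[mutant] else "N"


def classify(seq: str, gene1_offset: int, gene2_offset: int, pos: int, new_base: str) -> str:
    """Two-letter class (NN, NS, SN, or SS): gene 1 effect first, gene 2 effect second."""
    return effect(seq, gene1_offset, pos, new_base) + effect(seq, gene2_offset, pos, new_base)


# Hypothetical overlap: gene 1 in offset 0, gene 2 in offset 1.
print(classify("ATGGCATTA", 0, 1, 5, "G"))  # SN (GCA->GCG is Ala->Ala; CAT->CGT is His->Arg)
```

Tallying these classes over all sites and observed changes yields the category-specific rates that the method compares.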
We estimate rates like dNS, where NS = nonsynonymous (N) in gene 1 but synonymous (S) in gene 2 (see our Fig 1B). Because dNN/dNS considers only N sites in gene 1, dNN/dNS < 1 (i.e., dNS > dNN) indicates that the N changes observed in gene 1 are disproportionately S in gene 2, suggesting constraint on gene 2.
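As a rough numerical illustration of the ratio, the sketch below converts differences per site in the NN and NS categories into distances with a Jukes-Cantor correction and divides them. This is a simplified stand-in, not OLGenie's Wei-Zhang-based estimator, and all counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from a proportion of differing sites p (requires p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dnn_over_dns(diffs_nn: float, sites_nn: float, diffs_ns: float, sites_ns: float) -> float:
    """Ratio of JC-corrected distances for NN and NS sites (gene 1 class, then gene 2 class)."""
    d_nn = jukes_cantor(diffs_nn / sites_nn)
    d_ns = jukes_cantor(diffs_ns / sites_ns)
    return d_nn / d_ns


# Hypothetical counts: 5 differences over 500 NN sites, 20 over 400 NS sites.
ratio = dnn_over_dns(5, 500, 20, 400)
print(round(ratio, 3))  # below 1: changes at gene 1 N sites are enriched for gene 2 S
```

Restricting both categories to N sites in gene 1 is what lets the comparison isolate selection on gene 2.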
Benchmarking with simulated sequences & biological controls shows high accuracy & low false-positive rates, especially for subsets of data with low dN/dS (see our Fig 3B). We also detect constraint in HIV-1’s putative antisense protein (asp) gene, corroborated by new lab evidence.
We hope OLGenie will be used both to study known OLGs and to predict new OLGs during genome annotation. We are super grateful for the support and interest of our colleagues and friends, and we sincerely welcome feedback on the method, which is available on GitHub.