Motivation: Identifying genomic variants is an essential step for connecting genotype and phenotype. The usual approach consists of statistical inference of variants from alignments of sequencing reads. State-of-the-art variant callers can resolve a wide range of different variant types with high accuracy. However, they require that all read alignments be available from the beginning of variant calling and be sorted by coordinates. Sorting is computationally expensive, both memory- and speed-wise, and the resulting pipelines suffer from storing and retrieving large alignments files from external memory. Therefore, there is interest in developing methods for resource-efficient variant calling.
Results: We present Ococo, the first program capable of inferring variants in a real-time, as read alignments are fed in. Ococo inputs unsorted alignments from a stream and infers single-nucleotide variants, together with a genomic consensus, using statistics stored in compact several-bit counters. Ococo provides a fast and memory-efficient alternative to the usual variant calling. It is particularly advantageous when reads are sequenced or mapped progressively, or when available computational resources are at a premium.