Last week, we released the initial structural variant (SV) callset for the Genome Aggregation Database (gnomAD), which included nearly half a million distinct SVs discovered across ~15,000 human whole-genome sequences.
@RyanLCollins13 @bioggrimes @brent_p Also, you can require higher overlap as a function of SV size and put bounds on breakpoint distance. For instance, in practice we require something like 80% reciprocal overlap for CNVs > 5 kb, and ±300 bp between breakpoints for CNVs < 5 kb.
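A minimal sketch of that size-dependent rule in Python, assuming BED-style 0-based half-open coordinates and that only calls of the same type (e.g. DEL vs DEL) are compared. The names `CNV`, `reciprocal_overlap`, and `same_cnv` are hypothetical, and deciding the size class by the larger of the two calls is an assumption, since the thread does not say which call's size applies:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int   # 0-based, half-open (BED convention)
    end: int
    svtype: str  # e.g. "DEL" or "DUP"

    @property
    def size(self) -> int:
        return self.end - self.start


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap length divided by the length of each call; return the smaller fraction."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / a.size, overlap / b.size)


def same_cnv(a: CNV, b: CNV, size_cutoff: int = 5_000,
             min_ro: float = 0.8, max_bp_dist: int = 300) -> bool:
    """Hypothetical matcher: reciprocal overlap for large CNVs, breakpoint distance for small ones."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    # Assumption: classify the pair by the larger call's size.
    if max(a.size, b.size) > size_cutoff:
        return reciprocal_overlap(a, b) >= min_ro
    # Small CNVs: both breakpoints must lie within max_bp_dist of each other.
    return (abs(a.start - b.start) <= max_bp_dist
            and abs(a.end - b.end) <= max_bp_dist)
```

For example, two 10 kb deletions offset by 500 bp share 95% reciprocal overlap and match, while two ~1 kb deletions whose start positions differ by 400 bp do not, even though they overlap heavily.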
@RyanLCollins13 @bioggrimes @brent_p We're working on some methods to formalize this, but a simple bedtools intersect with 50% reciprocal overlap seems to work well for CNVs. The exact overlap parameters don't seem to matter much. Other SV classes (insertions, inversions, complex SVs, etc.) need to be treated differently.
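For readers without bedtools at hand, the 50% reciprocal-overlap criterion (what `bedtools intersect -f 0.5 -r` enforces: the overlap must cover at least half of each interval) can be sketched in plain Python. This is a naive all-pairs comparison per chromosome, not how bedtools implements it, and the function name `reciprocal_matches` is hypothetical:

```python
from collections import defaultdict


def reciprocal_matches(calls_a, calls_b, min_frac=0.5):
    """Return (a, b) pairs whose overlap covers >= min_frac of BOTH intervals.

    Intervals are (chrom, start, end) tuples in 0-based, half-open BED coordinates.
    """
    # Index the second callset by chromosome so we only compare like with like.
    by_chrom = defaultdict(list)
    for iv in calls_b:
        by_chrom[iv[0]].append(iv)

    pairs = []
    for chrom, a_start, a_end in calls_a:
        for _, b_start, b_end in by_chrom[chrom]:
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap <= 0:
                continue  # no shared bases
            # Reciprocal: the overlap must be large relative to each call.
            if (overlap >= min_frac * (a_end - a_start)
                    and overlap >= min_frac * (b_end - b_start)):
                pairs.append(((chrom, a_start, a_end), (chrom, b_start, b_end)))
    return pairs
```

The quadratic scan is fine for illustration; for genome-scale callsets, a sorted sweep or interval tree (as bedtools uses) is the practical choice.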