Published September 2023
Conference Paper

Genetic Site-Aware Three-Dimensional Layout Algorithm for Variation Graphs

  • 1. California Institute of Technology
  • 2. University of California, Santa Cruz

Abstract

Sequence graph visualization is an increasingly important tool for genome sequence analysis, useful both in sequence assembly and in pan-genomic applications. Substantial work has advanced such visualization, notably UCSC's Sequence Tube Maps and IGGE (the Interactive Graph Genomic Explorer). However, the visual presentation of graph topology varies in both quality and methodology. PanGraphViewer provides a node-based method for traversing pangenome graphs, but its lack of a linear traversal makes it less intuitive. Sequence Tube Maps, designed specifically for sequence graphs, provides an intuitive layout but depends on haplotype information to infer topology and is not designed for full-chromosome visualization. IGGE, in its current state, uses Graphviz's implementation of the sfdp algorithm to lay out nodes and edges. While sfdp is computationally efficient, the layouts it generates are often difficult for humans to read. It is therefore necessary to develop a method that visualizes sequence graphs with a focus on the readability of genome structure, using the topology of genetic sites. This poster outlines a graph layout method that uses the novel concept of bundles in variation graphs to produce a visualization focused on the adjacency relationships between alleles in complex graphs. First, the graph is decomposed into snarls using Paten's snarl decomposition as implemented in vg. For each snarl, a smaller subgraph is constructed, limiting the search space for the bundle-finding algorithm. Each subgraph is temporarily and reversibly projected into a directed acyclic graph and linearized via a topological sort to determine a major axis, which sets the orientation of the subgraph layout. Bundles, which act as zones of variation, are then found in the linearized subgraph.
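The abstract does not spell out how cycles are broken during the reversible projection or how positions along the major axis are derived from the topological order. As a minimal sketch, assuming that back edges found by a depth-first search are set aside (so they can be restored after layout) and that a node's position along the major axis is its longest-path layer in the resulting DAG, the projection and linearization step might look like the Python below. The function names `split_back_edges` and `major_axis_layers` are hypothetical; this is not the authors' implementation, and it does not cover snarl decomposition or bundle finding.

```python
from collections import defaultdict, deque


def split_back_edges(nodes, edges):
    """Classify every edge as a DAG edge or a back edge using an iterative DFS.

    Dropping the back edges leaves an acyclic digraph; returning them
    separately keeps the projection reversible (assumed cycle-breaking rule).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    dag, back = [], []
    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(adj[root]))]
        while stack:
            u, successors = stack[-1]
            for v in successors:
                if color[v] == GRAY:       # v is on the DFS stack: edge closes a cycle
                    back.append((u, v))
                    continue
                dag.append((u, v))
                if color[v] == WHITE:      # descend into v; u's iterator resumes later
                    color[v] = GRAY
                    stack.append((v, iter(adj[v])))
                    break
            else:                          # every successor of u has been examined
                color[u] = BLACK
                stack.pop()
    return dag, back


def major_axis_layers(nodes, edges):
    """Topologically sort the DAG projection (Kahn's algorithm), then give each
    node a layer = length of the longest path from a source (hypothetical
    choice for the major-axis position)."""
    dag, back = split_back_edges(nodes, edges)
    indegree = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in dag:
        succ[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    layer = {n: 0 for n in nodes}
    for u in order:                        # all predecessors of u are already final
        for v in succ[u]:
            layer[v] = max(layer[v], layer[u] + 1)
    return order, layer, back


# A toy bubble s -> {a, b} -> t with a cycle-closing edge t -> s.
nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("t", "s")]
order, layer, back = major_axis_layers(nodes, edges)
print(order)  # ['s', 'a', 'b', 't']
print(layer)  # {'s': 0, 'a': 1, 'b': 1, 't': 2}
print(back)   # [('t', 's')]
```

Using the longest-path layer rather than the raw topological index places the two alternative alleles `a` and `b` at the same position along the axis, which is what lets them be grouped side by side in the layout.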
The bundles determine the positioning of nodes in groups representing alternative alleles along the major axis of each subgraph. This layout process is applied recursively, in a LIFO (last-in, first-out) manner, through the entire snarl decomposition tree, and the graph is then stitched back together. The resulting layout provides a graphical linearization that builds on the one presented in UCSC's Sequence Tube Maps, combining its intuitive structure with richer topological information. Our layout algorithm exploits the three-dimensional medium to reduce edge crossings by arranging alternative alleles in two-dimensional planes perpendicular to the chromosome's major axis. This improves graph readability by taking advantage of our natural ability to trace paths in three-dimensional space, which gives our proposed traversal method a clear advantage over two-dimensional visualization methods such as UCSC's Sequence Tube Maps. End users can also readily use our method through the frontend of IGGE, a web-based 3D graph visualization tool.
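To make the idea of placing alternative alleles in planes perpendicular to the major axis concrete, here is a minimal sketch under stated assumptions: each node already has an integer layer index along the major axis (for instance, from a topological sort of the projected subgraph), and all nodes sharing a layer are treated as alternative alleles. The hypothetical function `place_in_3d` and its `spacing` and `radius` parameters are illustrative only; the abstract does not say how positions within each plane are chosen, and here they are simply spread evenly on a circle.

```python
import math
from collections import defaultdict


def place_in_3d(layer, spacing=1.0, radius=1.0):
    """Map major-axis layers to 3D coordinates (illustrative sketch).

    x is the position along the major axis. Nodes that share a layer are
    spread evenly on a circle in the y-z plane, i.e. a plane perpendicular
    to the major axis; a node alone in its layer stays on the axis.
    """
    by_layer = defaultdict(list)
    for node, k in layer.items():
        by_layer[k].append(node)
    coords = {}
    for k, group in by_layer.items():
        x = k * spacing
        if len(group) == 1:
            coords[group[0]] = (x, 0.0, 0.0)
            continue
        for i, node in enumerate(group):
            theta = 2 * math.pi * i / len(group)  # equal angular spacing
            coords[node] = (x, radius * math.cos(theta), radius * math.sin(theta))
    return coords


# A bubble with flanking nodes s and t and two alleles a and b.
coords = place_in_3d({"s": 0, "a": 1, "b": 1, "t": 2})
print({n: tuple(round(c, 6) for c in p) for n, p in coords.items()})
# {'s': (0.0, 0.0, 0.0), 'a': (1.0, 1.0, 0.0), 'b': (1.0, -1.0, 0.0), 't': (2.0, 0.0, 0.0)}
```

In this simplified case the edges from `s` to the alleles lie between x = 0 and x = 1, and the edges from the alleles to `t` lie between x = 1 and x = 2, so straight-line edges meet only at shared endpoints. Adding a third allele at the same layer would place it 120 degrees around the axis instead of forcing it into a single 2D row.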

Copyright and License

© 2023 Owner/Author(s).

Files

3584371.3613053.pdf (364.2 kB, md5:2533814d1ba192dc6aab3c293113b98a)