Hello TigerGraph community! I wanted to share what I’m 99% certain is a bug that I found in the A* implementation in the GDS Library, which can cause errors on directed graphs.
The bug is located at line 130: FROM Start:s -(e_type_set:e)-> :t. This pattern goes in the wrong direction.
The line is executed inside a loop after the algorithm has found the target vertex and starts to generate the optimal path by backtracking through earlier visited vertices. If we consider the first iteration, then Start is the target vertex and t is the second-to-last vertex along the optimal path. However, since the optimal path by definition runs from the source to the target vertex, an edge matching the pattern above might not exist. When this happens, the algorithm fails.
To fix the bug, the direction of the pattern must be reversed. I found this less straightforward than it appears, because the V1 syntax enforces traversal from left to right (e.g. FROM Start:s <-(e_type_set:e)- :t does not work).
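To make the direction issue concrete, here is a rough, non-runnable sketch (tmp is a hypothetical dummy vertex set; the other identifiers come from the original query or from the fix below):

# Original line 130: follows edges LEAVING the vertex we backtrack from, but the
# edge on the optimal path points INTO that vertex, so on a directed graph the
# pattern may match nothing and the SELECT comes back empty:
tmp = SELECT t FROM Start:s -(e_type_set:e)-> :t;

# What we conceptually want, which V1 syntax rejects:
# tmp = SELECT t FROM Start:s <-(e_type_set:e)- :t;

# Workaround used in the fix below: traverse from the predecessor side and filter
# on the current vertex, so the left-to-right traversal matches the edge direction:
tmp = SELECT m FROM next_node:n -(e_type_set:e)-> :m WHERE m == @@current_node.get(0);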
I have come up with a solution that seems to fix it; at least it produces the correct results in the case where I discovered the bug. I'm new to GSQL, so I'm sure there is a more elegant way, but I will provide it anyway. Replace lines 125-146 with this:
# @@current_node: current node to traverse back from
# @@next_node: the node that @@current_node is set to in the next iteration (current_node.@min_dist_heap.top().v)
# @@display_node_set: accumulation of current_node
IF @@or_valid_path_exists THEN
    ListAccum<VERTEX> @@current_node;
    @@current_node += target_vertex;
    ListAccum<VERTEX> @@next_node;
    WHILE @@current_node.get(0) != source_vertex DO
        # Get @@next_node from the heap of @@current_node
        current_node = {@@current_node};
        __ = SELECT n
             FROM current_node:n
             POST-ACCUM
                 @@next_node += n.@min_dist_heap.top().v;
        # Add the edge from @@next_node to @@current_node to the list of edges (@@display_edge_set)
        next_node = {@@next_node};
        __ = SELECT n
             FROM next_node:n -(e_type_set:e)-> :m
             WHERE m == @@current_node.get(0)
             ACCUM
                 @@display_edge_set += e,
                 CASE weight_type
                     WHEN "INT" THEN
                         @@sum_total_dist += e.getAttr(weight_attribute, "INT")
                     WHEN "FLOAT" THEN
                         @@sum_total_dist += e.getAttr(weight_attribute, "FLOAT")
                     WHEN "DOUBLE" THEN
                         @@sum_total_dist += e.getAttr(weight_attribute, "DOUBLE")
                 END;
        # Add @@next_node to the list of vertices (@@display_node_set)
        @@display_node_set += @@next_node.get(0);
        # Update @@current_node by setting it equal to @@next_node
        @@current_node.clear();
        @@current_node += @@next_node;
        # Clear @@next_node to make it ready for the next iteration
        @@next_node.clear();
    END;
    PRINT @@sum_total_dist;
    PRINT @@display_edge_set.size() AS hop; # Not related to the bug, but I found the variable hop to be unnecessary
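For completeness, an invocation of the patched query might look roughly like the following. The vertex type, edge type, and attribute names are placeholders for a road-map schema, and the latitude/longitude parameters and the exact parameter order are assumptions about the installed tg_astar signature, so check the query header before running it:

RUN QUERY tg_astar(("1001", "Junction"), ("2042", "Junction"), ["ROAD_SEGMENT"], "DOUBLE", "distance", "latitude", "longitude")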
I have a couple of additional comments regarding the A* implementation in TigerGraph.
- It appears that the implementation is only suitable for the specific use case where vertices are coordinates with latitude and longitude as attributes, and edges have an attribute holding the distance in 10^-6 meters between vertices. The Haversine Formula (written out after this list for reference) seems to be the only option for the heuristic function. While this works well for my use case, since I work with a road map, it may not be suitable for other use cases. It's important to make this clear in the documentation to avoid confusion.
- There seems to be a discrepancy between the time complexity of A* as stated in the TigerGraph documentation and on Wikipedia. Wikipedia gives the worst-case time complexity as O(E), while the TigerGraph documentation states O(V^2). The difference is significant for sparse graphs, where E is far smaller than V^2; for example, a road network with a million vertices and an average degree of about 3 has roughly 3 x 10^6 edges, not 10^12. Many real-world problems are sparse in exactly this way.
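For reference, the standard Haversine great-circle distance between two points with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ on a sphere of radius $r$, which is what the heuristic appears to compute, is:

$$ d = 2r \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right) $$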
I hope someone finds this post helpful! Please correct me if you find any errors! Thank you for taking the time to read and provide feedback.