Cycle Detection for Undirected Graph

Hello TG Team,

I am looking to find cycles in my undirected graph, while currently the only implemented cycle detection algorithm - Rocha-Thatte is for directed graph. I believe the RT algorithm can also run on my undirected graph, it’s just there would be many circularly identical sequences to deal with.

So is there a better way to do cycle detection on undirected graph? Or are there papers that you can recommend for implementing the cycle detection algorithm in undirected graph?

Thanks

1 Like

Hey! Welcome to the community!

Looking through the implementation of RT algorithm in TigerGraph, we can see that a detected cycle will only be added to the result if its ending vertex has the lowest internal ID among all vertexes in the path. Therefore, circularly identical sequences will not appear many times in the result. In undirected graph, a simple cycle will only be added twice: starting from the vertex with the lowest internal ID, and going either way of the cycle. We can take out the duplicates once we receive the result from TigerGraph’s algorithm.

It is true that there would probably be many circularly identical sequences actively expanding themselves during the execution, but the time complexity is still O(E*k), where E is the number of edges and k is the number of iterations (max length of cycle you are finding). If it becomes a concern for you, one optimization I would suggest for the algorithm is to keep track of the vertex with the lowest internal ID in each path and discard a sequence as soon as it tries to add a vertex with lower internal ID than the starting vertex.

Feel free to reach out if you have any other TigerGraph or GSQL questions!

3 Likes

Hello Jim. Thanks for your timely reply!

I tried the algorithm (tg_cycle_detection.gsql) and it used so much memory (about 80% of my 96G RAM), thus I had to stop the process midway. I already did my best to prune the initial Active vertex set but to no prevail. Do you have any suggestions on reducing memory usage? I am willing to sacrifice some cpu time.

Thanks

1 Like

Hello Luyi. Thanks for following up!

The high memory consumption could be inherent to the problem of finding all cycles. If we consider a complete graph (all edges are present) of n vertex, then each permutation of n (i.e. 3, 1, 2, 4, n, n-1 …) would correspond to a distinct, valid simple cycle, so the size of the answer should be n*n!, which is too much for memory. According the paper of RT algorithm (section 3.1.3 to 3.1.4, complexity analysis for different graphs), the working memory grows exponentially with number of iterations and is only manageable for graph with very low degree or highly sparse and random graph.

For your problem, a quick fix would be to limit the number of iterations. Besides, you can consider if there is a workaround specific to your problem (i.e. is it enough to have the count of cycles or some of the cycles instead of all the cycles).

Enjoy your adventure with TigerGraph and graph algorithms!

3 Likes

Hello Jim,

I was implementing your suggested modification:

… one optimization I would suggest for the algorithm is to keep track of the vertex with the lowest internal ID in each path and discard a sequence as soon as it tries to add a vertex with lower internal ID than the starting vertex…

I was struggling to keep track of per sequence information, such as the lowest internal ID of a sequence, since nesting accumulators inside ListAccum other than ListAccum is impossible. Thus I decided to use GroupbyAccum and created sequence id via gsql_uuid_v4(). However, I was unable to run t.@new_info += (gsql_uuid_v4() -> min((seq_info.min_v, t)), seq_info.seq+[t]) , because the function min() cannot take a set of vertices as input.

Here is my code modified around tg_cycle_detection.gsql:

CREATE OR REPLACE QUERY cycle_detection (SET<STRING> v_type, SET<STRING> e_type, INT min_depth = 2, INT max_depth = 10, INT n_batch = 1, BOOL print_accum = TRUE, STRING file_path = "")  SYNTAX V1 {
  ListAccum<ListAccum<VERTEX>> @curr_list, @new_list;
  ListAccum<ListAccum<VERTEX>> @@cycles_list;
  GroupByAccum<STRING seq_id, VERTEX min_v, ListAccum<VERTEX> seq> @curr_info, @new_info;
  
  FILE f (file_path);
  
  Active = {v_type};
  Active = SELECT s FROM Active:s WHERE s.outdegree(e_type) > 1;
  
  FOREACH i IN RANGE[0, n_batch-1] DO
      # initialization and Early Pruning for batch i
      Active = SELECT s FROM Active:s 
                              WHERE getvid(s) % n_batch == i
                                ACCUM s.@curr_info = (gsql_uuid_v4() -> s, [s]);
  
      WHILE Active.size() > 0 LIMIT max_depth DO 
          Active = SELECT t 
                   FROM Active:s -(e_type:e)- :t 
                   WHERE t.outdegree(e_type) > 1 
                   ACCUM 
                       FOREACH seq_info IN s.@curr_info DO
                           IF t == seq_info.seq.get(0) AND seq_info.seq.size() >= min_depth THEN  
                               IF getvid(seq_info.min_v) > getvid(t) THEN  # if it has the minimal label in the list, report 
                                   IF print_accum THEN 
                                       @@cycles_list += seq_info.seq
                                   END,
                                   IF file_path != "" THEN 
                                       f.println(seq_info.seq) 
                                   END
                               END
                           ELSE IF 
                               seq_info.seq.contains(t) == FALSE 
                               
                           THEN   
                               t.@new_info += (gsql_uuid_v4() -> min((seq_info.min_v, t)), seq_info.seq+[t])   // store sequences in @newList to avoid conflict with @currList
                           END
                      END
                  POST-ACCUM s.@curr_info.clear();
          Active = SELECT t FROM Active:t    
                  POST-ACCUM t.@curr_info = t.@new_info,
                             t.@new_info.clear()
                  HAVING t.@curr_info.size() > 0;  # IF receive no sequences, deactivate it;
      END;
      _t = SELECT s FROM ANY:s
              WHERE s.@curr_info.size() > 0 OR s.@new_info.size() > 0 
              POST-ACCUM s.@curr_info.clear(), 
                         s.@new_info.clear();
           
  END;
  
  IF print_accum THEN
      PRINT @@cycles_list.size() as n_cycles;
      PRINT @@cycles_list as cycles;
  END;  
}

So how should I track the per sequence information?

Thanks

1 Like

Hello Luyi,

Sorry for the confusion. It might not be necessary to explicitly keep track of the vertex with lowest internal ID. Since a detected cycle will only be added into the answer if its last vertex has the lowest ID, and the last vertex is the same as the first vertex, the lowest ID of a sequence should always be getvid(sequence.get(0)).

We can change line 41 of the original code into:
ELSE IF sequence.contains(t) == FALSE AND getvid(sequence.get(0))<getvid(t) THEN
and the optimization is in place.

As a reminder, if there is a complete subgraph of more than 20 vertex, the working memory will quickly grow out of control even with this optimization.

Feel free to reach out if you have any other questions!

3 Likes