Could Someone Give Me Advice on Optimizing Graph Queries for Large Datasets?

Hello there,

I am working with TigerGraph on a project that involves processing a large dataset, and I am running into performance problems. Specifically, my queries take longer than expected, and I am looking for advice on how to optimize them.

The graph contains millions of nodes and edges, and I use a mix of vertex and edge attributes to filter and traverse it.

The queries often combine multiple filtering conditions with traversals that span several hops.

I have made sure to use the appropriate indexes, but performance is still poor, especially as the dataset grows.

What are some common techniques for reducing the execution time of graph queries, especially on large datasets with complex traversals?

How can I scale my graph efficiently without running into performance bottlenecks? Are there specific configurations or setups that improve performance?

I have also read this document, https://info.tigergraph.com/hubfs/Collateral/TigerGraph-Rise-Future-Graph-WP-Devops, which helped a lot.

What tools or methods do you recommend for identifying and addressing performance issues in TigerGraph?

Thank you in advance for your help.

Hi @samzy,

I have a few questions:

  • Which TigerGraph version are you using (e.g., 3.10.1 or 3.9.3)?
  • What are the rough runtimes of the queries you mentioned (e.g., 3 minutes, 30 minutes…)?
  • What is your cluster setup, i.e., how many partitions and replicas does your cluster have?
  • Are you using TG Cloud, TG on-prem, or TG in Docker?

Here are some optimization options you could look into:

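Measuring the execution time of each part of a GSQL query:

// Record the time just before the part you want to measure
INT curr_part_start_time = timestamp();
some_vertex_set = SELECT s FROM ...;
// Record the time right after that part finishes
INT curr_part_end_time = timestamp();
// Elapsed time for this part (the _ms name assumes timestamp() returns milliseconds)
INT curr_part_time_ms = curr_part_end_time - curr_part_start_time;
PRINT curr_part_time_ms;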
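For the multi-hop, multi-filter queries described in the question, one way to apply this is to time each hop separately so the slowest step stands out. The sketch below is illustrative only. It assumes a hypothetical schema: a `Person` vertex type with an INT `age` attribute, a `Knows` edge type, and a graph named `MyGraph`. It also uses classic GSQL edge syntax and reuses `timestamp()` from the snippet above. The query name `profile_two_hop` and its parameters are likewise made up for the example.

```gsql
// Hypothetical schema: Person vertices (INT attribute "age") joined by
// "Knows" edges in a graph named MyGraph. Rename to match your own graph.
CREATE QUERY profile_two_hop(VERTEX<Person> seed, INT min_age) FOR GRAPH MyGraph {
  INT t0 = 0;
  INT t1 = 0;
  INT t2 = 0;

  Start = {seed};

  // timestamp() is carried over from the snippet above; substitute the
  // time function available in your TigerGraph version if it differs.
  t0 = timestamp();

  // Hop 1: neighbors of the seed that pass the attribute filter
  Hop1 = SELECT t FROM Start:s -(Knows:e)-> Person:t
         WHERE t.age >= min_age;
  t1 = timestamp();

  // Hop 2: expand from the filtered hop-1 set with the same filter
  Hop2 = SELECT t FROM Hop1:s -(Knows:e)-> Person:t
         WHERE t.age >= min_age;
  t2 = timestamp();

  // Per-hop elapsed time and frontier size
  PRINT t1 - t0 AS hop1_time, t2 - t1 AS hop2_time;
  PRINT Hop1.size() AS hop1_size, Hop2.size() AS hop2_size;
}
```

Printing each hop's result-set size next to its time can show whether a hop is slow because its frontier grows quickly. That in turn suggests where applying a filter earlier would help most.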

I hope this helps!

Best,
Supawish Limprasert (Jim)
Solution Engineer, TigerGraph