Hi @samzy,
I have a couple of questions:
- May I ask which TigerGraph version you are currently using? E.g. 3.10.1 or 3.9.3?
- What are the rough runtimes of the queries you are mentioning? E.g. 3 minutes, 30 minutes…?
- What is your cluster setup? What are your cluster's partition number and replica number?
- Are you currently using TG Cloud? TG On-prem? Or TG in Docker?
Here is a list of some of the optimization options you could look more into:
- Follow the Best Practices:
- E.g. hard-batching, avoiding hub nodes, using fewer container-type accumulators, …
- Best Practice videos can be found here:
- GSQL SCHEMA DESIGN & QUERY WRITING BEST PRACTICES: Part 2: Query Writing Best Practices
- Graph Gurus 33: GSQL Writing Best Practices - Part 2 Design Optimal Traversal Plan
- Graph Gurus 36: GSQL Writing Best Practices - Part 3 Memory Optimization
- Graph Gurus 38: GSQL Writing Best Practices - Part 4 Parallelization And Preprocessing
- Graph Gurus 40: GSQL Writing Best Practices - Part 5 Data Structure
- Try using a distributed-mode query (a query created with the DISTRIBUTED keyword), or, if you are already using one, try turning it into a single-server (non-distributed) query. A minimal distributed-query sketch is included after this list.
- Try to modify the part(s) or SELECT statements that take a considerably longer amount of time compared to the other parts. This can be especially helpful if there are a lot of SELECT statements.
- Use batching for queries when applicable. While this can result in longer runtimes, it will use less memory per batch (i.e. leave more free memory). A small batching sketch follows this list.
- Consider moving the subquery logic into the main query, especially if the subquery contains SELECT statements and is called in ACCUM/POST-ACCUM clauses. A before/after sketch is shown after this list.
- Small subqueries that are called outside of ACCUM clauses are generally fine.
- Try not to use too many SELECT statements in your query, especially inside loop iterations (if you can avoid it).
- E.g. if you have a loop with 10k iterations and each iteration runs something like 3 SELECT statements, the query could end up executing 30k SELECT statements. Each SELECT statement has a bit of overhead, so in such a case it can be beneficial to invert the logic so that the iterations are done inside the SELECT statements instead (see the last sketch after this list).
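To illustrate the distributed-mode point above, here is a minimal sketch of a query created with the DISTRIBUTED keyword. The graph name MyGraph, vertex type Person, and edge type Knows are placeholder names, not from your schema:

CREATE DISTRIBUTED QUERY friend_count_demo() FOR GRAPH MyGraph {
  // per-vertex accumulator counting each Person's neighbors over Knows edges
  SumAccum<INT> @friend_count;
  start = {Person.*};
  result = SELECT s
           FROM start:s -(Knows:e)-> Person:t
           ACCUM s.@friend_count += 1;
  PRINT result[result.@friend_count];
}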
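Here is a minimal sketch of the batching idea, again with placeholder names (MyGraph, Person, Knows). It splits the Person vertices into num_batches groups by internal vertex ID, so the heavier traversal only holds one batch worth of intermediate data at a time:

CREATE QUERY batched_demo(INT num_batches = 10) FOR GRAPH MyGraph {
  SumAccum<INT> @neighbor_count;
  FOREACH batch_id IN RANGE[0, num_batches - 1] DO
    start = {Person.*};
    // keep only the vertices that fall into the current batch
    batch = SELECT s FROM start:s
            WHERE getvid(s) % num_batches == batch_id;
    // the traversal now only processes one batch per iteration
    result = SELECT s FROM batch:s -(Knows:e)-> Person:t
             ACCUM s.@neighbor_count += 1;
  END;
  PRINT "done";
}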
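For the subquery point, here is a hedged before/after sketch. neighbor_degree_subquery, Person, and Knows are hypothetical names; the assumption is that the subquery only computes something simple (here, the out-degree of the target vertex), so the same value can be accumulated directly in the main query without the per-call overhead:

CREATE QUERY inline_subquery_demo() FOR GRAPH MyGraph {
  SumAccum<INT> @score;
  start = {Person.*};
  // Before (slower): one subquery call per traversed edge
  // result = SELECT s FROM start:s -(Knows:e)-> Person:t
  //          ACCUM s.@score += neighbor_degree_subquery(t);

  // After (faster): the same value computed directly in the main query
  result = SELECT s FROM start:s -(Knows:e)-> Person:t
           ACCUM s.@score += t.outdegree("Knows");
  PRINT result[result.@score];
}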
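Finally, a sketch of inverting a loop of SELECT statements into a single SELECT. The assumption here is a Person vertex with an INT attribute community_id (a placeholder name); instead of looping over every community ID and running one filtered SELECT per community, one pass over all vertices fills a MapAccum keyed by that ID:

CREATE QUERY per_community_count_demo() FOR GRAPH MyGraph {
  // one global map: community_id -> number of Person vertices in it
  MapAccum<INT, SumAccum<INT>> @@count_per_community;
  start = {Person.*};
  // a single SELECT replaces a FOREACH over all community ids,
  // each of which would otherwise run its own filtered SELECT
  result = SELECT s FROM start:s
           ACCUM @@count_per_community += (s.community_id -> 1);
  PRINT @@count_per_community;
}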
Measuring the time of each portion of your GSQL code:
- Using datetime functions (already available) - precise up to 1 second: How to time the graph algorithm execution - #3 by Szilard_Barany
- Using the timestamp() function (only for TG 3.10 and above) - precise up to 1 millisecond:
INT curr_part_start_time = timestamp();
some_vertex_set = SELECT s FROM ...
INT curr_part_end_time = timestamp();
INT curr_part_time_ms = curr_part_end_time - curr_part_start_time;
PRINT curr_part_time_ms;
I hope this helps to some degree!
Best,
Supawish Limprasert (Jim)
Solution Engineer, TigerGraph