Hi,
Please help me speed up my query!
I am looking into a blockchain use case.
I uploaded 25M transactions to a graph.
This is my data schema:
ADD VERTEX Wallet (PRIMARY_ID address INT) WITH primary_id_as_attribute="TRUE";
ADD VERTEX Tx (PRIMARY_ID tx_id INT, block_timestamp DATETIME, value FLOAT) WITH primary_id_as_attribute="TRUE";
ADD DIRECTED EDGE TxIn (FROM Tx, TO Wallet) WITH REVERSE_EDGE="TxIn_Reversed";
ADD DIRECTED EDGE TxOut (FROM Wallet, TO Tx) WITH REVERSE_EDGE="TxOut_Reversed";
I cannot model a transaction as an edge between two wallets because wallet X could send assets to wallet Y multiple times.
Then, given a source wallet, I would like to find all wallets that received transactions from this source.
Then, for each such neighbor, I would like to iterate over the Cartesian product of its incoming × outgoing transactions.
For simplicity, assume I just want to count the pairs in the Cartesian product for each neighbor; e.g., a neighbor with 20 incoming and 50 outgoing transactions contributes 20 × 50 = 1000 pairs. (In practice, I need to perform a sophisticated computation per pair.)
This is my query:
SumAccum<INT> @fast_liquidation;
Sources = {source};
Neighbors = SELECT w
FROM Sources:x - (TxOut>) - (Tx) - (TxIn>) - Wallet:w;
InTxs = SELECT t1
FROM Neighbors:w - (TxIn_Reversed>) - Tx:t1;
Pairs = SELECT w
FROM InTxs:t1 - (TxIn>) - Neighbors:w - (TxOut>) - Tx:t2
ACCUM w.@fast_liquidation += 1;
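For the counting-only case, I believe the same per-neighbor numbers could be computed from degrees alone, without materializing the pairs (a sketch, assuming the built-in outdegree() vertex function; in production I still need the per-pair computation, so this is only a baseline for comparison):

```
SumAccum<INT> @fast_liquidation;
Sources = {source};
// |incoming txs| x |outgoing txs| per neighbor, without enumerating pairs
Neighbors = SELECT w
            FROM Sources:x - (TxOut>) - (Tx) - (TxIn>) - Wallet:w
            POST-ACCUM w.@fast_liquidation += w.outdegree("TxIn_Reversed") * w.outdegree("TxOut");
```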
I installed it and ran it on a few vertices.
I expect the running time to be roughly linear in the sum of the Cartesian product sizes over all neighbors, i.e., the sum of @fast_liquidation over all results.
However, in my examples, this sum is only 10000, while the running time is 5s.
The first two SELECT statements run in 700ms.
The last SELECT statement runs in 4500ms.
I use TigerGraph in the cloud, using a machine with 64GB RAM.
The ONLY thing the machine is doing at the moment is running my query.
I do not execute queries in parallel.
This is the only non-negligible graph hosted on that machine.
My graph is very small. The data in the source CSV files is less than 2GB.
Therefore, computationally, I expect the DB to hold everything in RAM
and complete the computation in under 1ms.
If I remove the ACCUM w.@fast_liquidation += 1
operation, the query still runs in 2900ms.
5s is unusable performance for my production environment, and that is before I add the more sophisticated logic!
Looking forward to any advice / insight!
Monitoring insight:
Looking into the Admin Portal,
I see that during query execution the CPU reaches 99%, while before and after execution it is at roughly 3%.
RAM usage is stable at 12GB before, during and after execution.