Performance issues with a simple graph traversal and self join

Hi,
Please help me to speed up my query :slight_smile: !

I am working on a blockchain use case.
I loaded 25M transactions into the graph.

This is my data schema

        ADD VERTEX Wallet (PRIMARY_ID address INT) WITH primary_id_as_attribute="TRUE";
        ADD VERTEX Tx (PRIMARY_ID tx_id INT, block_timestamp DATETIME, value FLOAT) WITH primary_id_as_attribute="TRUE";
        ADD DIRECTED EDGE TxIn (FROM Tx, TO Wallet) WITH REVERSE_EDGE="TxIn_Reversed";
        ADD DIRECTED EDGE TxOut (FROM Wallet, TO Tx) WITH REVERSE_EDGE="TxOut_Reversed";

I cannot model a transaction as an edge between two wallets because X could send Y assets multiple times.

Then, given a source wallet, I would like to find all wallets that received transactions from this source.

Then, for each such neighbor, I would like to iterate over the Cartesian product of its incoming × outgoing transactions.
For simplicity, assume I would like to count the pairs in the Cartesian product for each neighbor (in practice, I need to perform a more sophisticated computation).

This is my query:

    SumAccum<INT> @fast_liquidation;
    Sources = {source};

    Neighbors = SELECT w
              FROM Sources:x - (TxOut>) - (Tx) - (TxIn>) - Wallet:w;

    InTxs = SELECT t1
              FROM Neighbors:w - (TxIn_Reversed>) - Tx:t1;

    Pairs = SELECT w
             FROM InTxs:t1 - (TxIn>) - Neighbors:w - (TxOut>) - Tx:t2
             ACCUM w.@fast_liquidation += 1;

I installed it and ran it on a few vertices.

I expect the running time to be roughly linear in the sum of the Cartesian product sizes over all neighbors,
i.e. the sum of @fast_liquidation across all results.
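Spelling that out, the amount of work I expect (just my own back-of-the-envelope estimate) is roughly

    \sum_{w \in \mathrm{Neighbors}} |\mathrm{In}(w)| \cdot |\mathrm{Out}(w)|

which is exactly the quantity that the final sum of @fast_liquidation counts.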

However, in my examples, this sum is only 10,000, while the running time is 5s.
The first two SELECT statements run in 700ms.
The last SELECT statement runs in 4500ms.

I use TigerGraph in the cloud, on a machine with 64GB of RAM.
The ONLY thing the machine is doing at the moment is running my query.
I do not execute queries in parallel.
This is the only non-negligible graph hosted on that machine.

My graph is very small. The data in the source CSV files is less than 2GB.
Therefore, computationally, I expect the DB to hold everything in RAM,
and complete the computation in under 1ms.
If I remove the ACCUM w.@fast_liquidation += 1 operation, the query still runs in 2900ms.

A 5s runtime is unusable for my production environment, and that is even before I add the more sophisticated logic!

Looking forward to any advice / insight!

Monitoring insight:
Looking at the Admin Portal, I see that during query execution the CPU reaches 99%, while before and after execution it sits at roughly 3%.
RAM usage is stable at 12GB before, during and after execution.

Hi there!

We actually support edge discriminators now: Defining a Graph Schema :: GSQL Language Reference
So you should be able to use direct Wallet-to-Wallet edges with the timestamp as a discriminator attribute.
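For example, a rough sketch of what such an edge could look like (the edge and attribute names here are just illustrative):

ADD DIRECTED EDGE Transfer (FROM Wallet, TO Wallet, DISCRIMINATOR(block_timestamp DATETIME), value FLOAT) WITH REVERSE_EDGE="Transfer_Reversed";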

Besides that point, if we don't want all the in-out pairs and instead just want the count, then this V1 version should be really fast (make sure that Syntax V1 is specified):

SumAccum<INT> @fast_liquidation;
Sources = {source};

transactions = 
  SELECT t FROM Sources:s - (TxOut) - Tx:t;
wallets =
  SELECT t FROM transactions:s -(TxIn)- Wallet:t
  POST-ACCUM t.@fast_liquidation += t.outdegree("TxIn_Reversed") * t.outdegree("TxOut");
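
For reference, the syntax version goes in the query header, e.g. something along these lines (query name and graph name are placeholders):

CREATE QUERY count_liquidation_pairs(VERTEX<Wallet> source) FOR GRAPH MyGraph SYNTAX v1 {
  // body of the query above goes here
}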

Hey Wyatt_Joyner
The count is not useful in my case.
I will try the discriminators and let you know the results.

Hey @Wyatt_Joyner ,

  1. I tried it, and it did not improve performance enough.
  2. Can you explain WHY the query performs so poorly?

I did come up with a minor optimization

Here is my simplified Schema

        ADD VERTEX Wallet (PRIMARY_ID address INT) WITH primary_id_as_attribute="TRUE";
        ADD DIRECTED EDGE TxOut (FROM Wallet, TO Wallet, DISCRIMINATOR(tx_id INT), block_timestamp DATETIME, value FLOAT) WITH REVERSE_EDGE="TxIn";

The optimized query

    SumAccum<INT> @fast_liquidation;
    Sources = {source};

    Neighbors = SELECT w
              FROM Sources:x - (TxOut>) - Wallet:w;
    
    Sources1 = SELECT s
            FROM Neighbors:w - (TxIn>:t1) - Wallet:s;
            
    Pairs = SELECT w
             FROM Sources1 - (TxOut>:t1) - Neighbors:w - (TxOut>:t2) - Wallet
             ACCUM w.@fast_liquidation += 1;
                  
    PRINT Neighbors;

Hey Assaf,

Hm, how exactly does this POST-ACCUM t.@fast_liquidation += t.outdegree("TxIn_Reversed") * t.outdegree("TxOut"); differ from the calculation you posted? Do you plan on performing a more complex operation for each pair of input and output transactions? Because it should result in the same sum.

Also, can you share how many CPU cores/threads you provisioned? And are you installing the query or running it in interpreted mode?

Yes, I need to perform further processing on these pairs.
I would rather not share the logic, and it is not relevant here.

My question is quite basic, though: is my schema/query obviously inefficient?

If the system were implemented efficiently, the expected COMPUTATION time (ignoring constant overhead) should be just 1-10ms, not a few seconds.

Hey there,

This is the V2 version which uses conjunctive pattern-matching.

SumAccum<INT> @fast_liquidation;
Sources = {source};

Neighbors = 
  SELECT w
  FROM Sources -(TxOut>)- Wallet:w -(TxOut>)- Wallet:t1,
  Wallet:w -(TxIn>)- Wallet:t2
  PER (w, t1, t2)
  ACCUM w.@fast_liquidation += 1;

And this is the V1 version which might compile to more efficient C++:

SumAccum<INT> @fast_liquidation;
MapAccum<EDGE, MaxAccum<VERTEX>> @in_txns;
Sources = {source};

neighbors = 
  SELECT w FROM Sources -(TxOut)- Wallet:w;

neighbors =
  SELECT s FROM neighbors:s -(TxIn:e)- Wallet:t
  ACCUM s.@in_txns += (e -> t);

neighbors =
  SELECT s FROM neighbors:s -(TxOut)- Wallet:t
  ACCUM
    FOREACH (in_edge, in_wallet) IN s.@in_txns DO
      s.@fast_liquidation += 1
    END;

@Wyatt_Joyner TYVM
It improves performance enough to make it feasible for my production use case!

The query I need slightly differs from what you sent.
I need to iterate over all pairs of transactions (t1, t2) per neighbor w.
Since my graph can have multiple edges between two nodes,
this is not equivalent to iterating over pairs of neighbors.

Neighbors = 
      SELECT w
      FROM Sources -(TxOut>)- Wallet:w -(TxOut>:t2)- Wallet,
                                 Wallet:w -(TxIn>:t1)- Wallet
      PER (w, t1, t2)
      ACCUM w.@fast_liquidation += 1;

In order to measure performance, I first run this query as-is and get a measurement M1.
Then, I duplicate the computation of Neighbors 5 times and measure again (call it M2).
The difference (M2 - M1), divided by 5, is a lower bound on the running time of the computation itself.
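
For reference, the duplication is literally just repeating the same block (a rough sketch of the idea, not the exact query I ran; the pair counts get inflated, but only the timing matters here):

    Neighbors2 =
          SELECT w
          FROM Sources -(TxOut>)- Wallet:w -(TxOut>:t2)- Wallet,
                                     Wallet:w -(TxIn>:t1)- Wallet
          PER (w, t1, t2)
          ACCUM w.@fast_liquidation += 1;
    // ... Neighbors3 through Neighbors6 are identical copies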

Bottom line: I get 70ms per computation, which brings TigerGraph back to the table :slight_smile: !!!

This is still high for my simple use case, and I find it hard to believe that my heavy use case (which includes traversal to depth 5) will hold up.
I'll keep you posted on whether I manage to implement the heavy computation as well.

Wonderful! Does the V1 version have similar performance?

Regards,

Wyatt