Difference between ACCUM and POST_ACCUM

jimwu · May 3, 2021, 4:39pm

What is the difference between the two GSQL queries below?

Start = SELECT s FROM Start:s
POST-ACCUM
do-something;

Start = SELECT s FROM Start:s
ACCUM
do-something;

Jon_Herke · May 3, 2021, 6:13pm

ACCUM vs. POST-ACCUM

The ACCUM and POST-ACCUM clauses are computed in stages: within a SELECT-FROM-WHERE query block, the ACCUM clause is executed first, followed by the POST-ACCUM clause.

ACCUM executes its statement(s) once for each matched edge (or path) of the FROM clause pattern. It also executes those statements in parallel across all the matches.

POST-ACCUM executes its statement(s) once for each involved vertex. Note that each statement within the POST-ACCUM clause can refer to either source vertices or target vertices but not both.
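As a rough illustration of "once per matched edge" versus "once per involved vertex", here is a minimal Python sketch. It is not GSQL and does not use any TigerGraph API; the toy graph, the `out_degree` dictionary (standing in for a per-vertex sum accumulator) and the `labels` result are all made up for the example.

```python
from collections import defaultdict

# Hypothetical toy graph: directed edges as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

# Stands in for a per-vertex sum accumulator (an assumption of this sketch).
out_degree = defaultdict(int)

# ACCUM: the body runs once for every matched edge.
accum_runs = 0
for s, t in edges:
    out_degree[s] += 1
    accum_runs += 1

# POST-ACCUM: the body runs once for every distinct vertex involved,
# here the source vertices, and sees the finished accumulator value.
post_accum_runs = 0
labels = {}
for v in sorted(out_degree):
    labels[v] = "hub" if out_degree[v] > 1 else "leaf"
    post_accum_runs += 1

print(accum_runs, post_accum_runs, labels)
```

Four edges give four ACCUM executions, while the three distinct source vertices give three POST-ACCUM executions.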

Source: https://www.tigergraph.com/blog/accumulator-101/

jimwu · May 3, 2021, 6:44pm

Thanks Jon!

I understand how this works when ACCUM and POST-ACCUM are used in the same SELECT statement. What I am asking is: what is the difference when the SELECT statement has only ACCUM or only POST-ACCUM?

Richard_Henderson · May 4, 2021, 4:42pm

No practical difference as far as I know. In that situation it is all effectively a POST-ACCUM, i.e. a vertex-attached computation.

jimwu · May 13, 2021, 12:33pm

Thanks for checking @Richard_Henderson!

Based on the explanation in the docs, is the intention that I use ACCUM for accumulation over edges and POST-ACCUM for accumulation over vertices?

Richard_Henderson · May 13, 2021, 4:10pm

Yes, exactly so, though we cannot attach accumulators to edges directly.
This is rarely a problem, and there are some slightly ugly workarounds that can be used if you really want to do that.
This is a side-effect of the rather clever mapping between the BSP (bulk synchronous parallel) way of working, which is a bit like map-reduce without the shuffle, and the structure of the graph. Accumulators only have a new value after the ACCUM phase of operation is complete.

The way it logically works is:

ACCUM: Send a bunch of accumulation messages along each edge to the target vertices.
Invisible synchronise step: Accumulate the messages into a new value for each affected accumulator.
POST-ACCUM: Operate on the new values of the accumulators in the vertex set.

One cute trick: the old value of the accumulator is available using the ' (apostrophe) operator (see here: https://docs.tigergraph.com/dev/gsql-ref/querying/operators-functions-and-expressions#previous-value-of-accumulator)
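The three logical steps, and the idea of a previous value, can be sketched in plain Python. This is only a simulation under stated assumptions, not how TigerGraph is implemented: `run_select_block`, `accum`, `post_accum` and the toy graph are hypothetical names, and a single sum-style accumulator per vertex is assumed. The `old` argument loosely plays the role of the previous-value operator.

```python
from collections import defaultdict


def run_select_block(edges, state, accum, post_accum):
    """Simulate one SELECT block over a single sum-style accumulator."""
    # ACCUM: one message per edge, computed from the values held at the
    # start of the block; nothing is written to `state` in this phase.
    inbox = defaultdict(list)
    for s, t in edges:
        inbox[t].append(accum(s, t, state))

    # Invisible synchronise step: fold the messages into new values,
    # remembering the old ones first.
    previous = dict(state)
    for v, messages in inbox.items():
        state[v] = state.get(v, 0) + sum(messages)

    # POST-ACCUM: once per affected vertex, with new and previous values.
    return {v: post_accum(v, state[v], previous.get(v, 0)) for v in inbox}


edges = [("a", "b"), ("b", "c"), ("a", "c")]
state = {"a": 1, "b": 1, "c": 1}

# Each edge sends its source's current value; POST-ACCUM reports the change.
deltas = run_select_block(
    edges,
    state,
    accum=lambda s, t, st: st[s],
    post_accum=lambda v, new, old: new - old,
)
print(state)   # the b -> c message carried b's old value 1, not the updated 2
print(deltas)
```

Because messages are only folded in during the synchronise step, the order in which edges are visited does not change the result, which is what makes running the ACCUM phase in parallel safe in this model.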

andrew.s.hannigan · August 2, 2021, 4:14pm

If I’m understanding correctly, there would be an important difference between ACCUM and POST-ACCUM in the presence of multi-edges. For instance in a graph with two nodes and say 100 directed edges from node a to b, the ACCUM clause would be executed 100 times, once for each edge. But the POST-ACCUM clause would be executed once since there is only one matched vertex. Is this a correct interpretation?

Jon_Herke · August 6, 2021, 7:16pm

@andrew.s.hannigan This image might help with understanding: [image not preserved]

Richard_Henderson · August 9, 2021, 2:48pm

Yes and no :).
The ACCUM clause is indeed run independently and in parallel across all outgoing edges.

The minor point is that you can’t have multiple edges of the same type between the same two vertices; but if you had a hundred edges from A to a hundred other vertices, your reasoning would be exactly correct.
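The fan-out case can be counted with a few lines of Python. Again this is a sketch rather than GSQL: the vertex names are invented, and the counts assume the per-edge and per-vertex execution rules described earlier in the thread.

```python
# Hypothetical fan-out: vertex "A" has one edge to each of 100 distinct targets.
edges = [("A", f"v{i}") for i in range(100)]

# ACCUM runs once per matched edge.
accum_runs = len(edges)

# A POST-ACCUM statement runs once per distinct vertex of the alias it uses.
source_runs = len({s for s, _ in edges})  # statement written against the source
target_runs = len({t for _, t in edges})  # statement written against the target

print(accum_runs, source_runs, target_runs)
```

So a POST-ACCUM statement written against the source runs once here, while one written against the targets runs a hundred times, the same number as the ACCUM clause.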