GSQL Query - Limit on Each Hop

juguo · May 6, 2024, 9:22am

Does GSQL support a limit on each hop? For example, in a 2-hop pattern A - B - C, can we ensure that for each vertex in A, at most 100 edges are traversed to B-type vertices?

I previously used Gremlin to write graph queries, where this can be achieved simply with in/out().limit(100).

Jim_Limprasert · May 9, 2024, 5:41pm

Hi @juguo ,

That is a good question! The short answer is that GSQL does not currently support limiting the B vertices reached from each A vertex in an A - B traversal, at least not directly.

However, there are several ways to achieve the same result with existing GSQL features, although because of the extra storage they require, they won’t be very performant on large graphs.

Here is one way to do it:

Separate each hop into its own SELECT statement. For a 2-hop pattern, that means one SELECT statement for the A - B traversal and another for the B - C traversal.
Set up a HeapAccum on each A vertex so that it contains only the top B vertices for that A vertex.
To limit the number of B-type vertices each A vertex can reach, do the A - B traversal and add the B vertices to the HeapAccum of vertex A.
Collect all the B vertices from the heaps into a new vertex set (call it selected_B).
Traverse from selected_B to C.
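As a rough sketch of the steps above (not a tested query): the graph name MyGraph, the edge types AB_EDGE and BC_EDGE, and the FLOAT attribute score on B are all hypothetical placeholders, and the ranking by score is just one possible choice of "top".

```gsql
CREATE QUERY limit_per_hop() FOR GRAPH MyGraph {
  // Tuple stored in each A vertex's heap: a B neighbor plus the value used for ranking.
  // "score" is a hypothetical attribute on B vertices.
  TYPEDEF TUPLE<VERTEX b, FLOAT score> B_entry;

  // Per-vertex heap that keeps at most 100 entries, highest score first.
  HeapAccum<B_entry>(100, score DESC) @top_b;
  // Global set that collects every B vertex that survived in some heap.
  SetAccum<VERTEX> @@selected;

  start = {A.*};

  // Hop 1: each A vertex pushes its B neighbors into its own bounded heap,
  // then copies the surviving entries into the global set.
  visited_A = SELECT s
              FROM start:s -(AB_EDGE:e)-> B:t
              ACCUM s.@top_b += B_entry(t, t.score)
              POST-ACCUM
                FOREACH entry IN s.@top_b DO
                  @@selected += entry.b
                END;

  // Turn the collected B vertices into a vertex set.
  selected_B = {@@selected};

  // Hop 2: traverse only from the selected B vertices to C.
  result_C = SELECT t
             FROM selected_B:s -(BC_EDGE:e)-> C:t;

  PRINT result_C;
}
```

The per-vertex heaps and the global set are the extra storage mentioned earlier, which is why this gets expensive when there are many A vertices.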

If you only need a global limit on the A - B traversal, e.g. selecting the top 100 B vertices overall (regardless of how many come from each individual A vertex), it is much simpler.
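For that global case, a fragment along these lines should be enough. It assumes a vertex set start that holds the A vertices, and the edge type AB_EDGE and the attribute score are hypothetical placeholders:

```gsql
// Keep only the 100 B vertices with the highest (hypothetical) score,
// no matter which A vertex they were reached from.
top_B = SELECT t
        FROM start:s -(AB_EDGE:e)-> B:t
        ORDER BY t.score DESC
        LIMIT 100;
```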

Please let us know if you want to see some GSQL code example of this method!

Best,
Supawish Limprasert (Jim)

Jim_Limprasert · May 10, 2024, 3:59pm

Hi @juguo ,

There’s actually a simpler way to do this, as I learned from another Solution Engineer on our team. Sorry for the complicated answer earlier.

We DO support limit on each hop. You can utilize the SAMPLE clause to select a random sample from edges!

You can specify things like “sample 25% of the edges” or “sample 100 edges”. Note that the sampling is currently non-deterministic. The approach I described above is considerably more complex, but it lets you do things like “explore the top 100 B vertices from each A vertex based on some attribute”.

Please see this page for more details!

In your case, I’d recommend splitting the query into multiple SELECT statements, one per hop, like this:

sampled_B = SELECT t
    FROM A:s -(<edge-type>:e)- B:t
    SAMPLE 100 EDGE WHEN s.outdegree() >= 1;

result_C = SELECT t
    FROM sampled_B:s -(<edge-type-2>:e)- C:t;
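The percentage form mentioned above would look much the same. This is only a sketch with the same placeholder edge type. As far as I understand the SAMPLE grammar, it expects the EDGE (or TARGET) keyword after the sample size:

```gsql
// Sample roughly 25% of the edges per A vertex instead of a fixed count.
// <edge-type> is a placeholder for your actual edge type.
sampled_B = SELECT t
    FROM A:s -(<edge-type>:e)- B:t
    SAMPLE 25% EDGE WHEN s.outdegree() >= 1;
```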

I hope this helps!

Best,
Supawish Limprasert (Jim)

juguo · May 11, 2024, 3:03am

Thank you Jim for the detailed answers! The sample clause solves the problem.