Does GSQL support limiting on each hop? For example, in a 2-hop pattern A - B - C, can we limit each vertex in A to at most 100 traversed edges to B-type vertices?

I previously used Gremlin to write graph queries, where this can be achieved simply with `in/out().limit(100)`.

That is a good question! The short answer is that GSQL does not currently support limiting the B vertices for each A vertex in the A - B traversal, at least not directly.

However, there are probably several ways to achieve the same result with existing GSQL features, although because of the extra storage they require, they won't perform well on large graphs.

Here is one way to do it:

1. Put each hop in its own SELECT statement. For a 2-hop pattern, use one SELECT statement for the A - B traversal and another for the B - C traversal.
2. Set up a HeapAccum so that it contains only the top B vertices for each A vertex.
3. To limit the number of B vertices each A vertex can have, do the A - B traversal and add each B vertex to the HeapAccum of its A vertex.
4. Collect all the B vertices left in those heaps into a new vertex set (say, `selected_B`).
5. Traverse from `selected_B` to C.
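The steps above can be sketched in GSQL roughly as follows. This is a minimal, untested sketch under assumptions that are not in the original answer: the graph is called `MyGraph`, the edge types are named `a_b` and `b_c`, and each B vertex has a hypothetical DOUBLE attribute `score` that decides which B vertices count as the "top" ones. It uses the classic single-hop `-(edge:e)->` pattern style.

```gsql
CREATE QUERY top_b_per_a() FOR GRAPH MyGraph {
  // Hypothetical schema: vertex types A, B, C; edge types a_b (A-B) and b_c (B-C);
  // B has a DOUBLE attribute named score that is used for ranking.
  TYPEDEF TUPLE<VERTEX v, DOUBLE score> BScore;

  // Each A vertex keeps only its 100 highest-scoring B neighbors
  HeapAccum<BScore>(100, score DESC) @top_b;
  // Union of every B vertex that survived in at least one A vertex's heap
  SetAccum<VERTEX> @@selected_b;

  start = {A.*};

  // Hop 1 (A - B): offer every B neighbor to the heap of its source A vertex
  a_set = SELECT s
          FROM start:s -(a_b:e)-> B:t
          ACCUM s.@top_b += BScore(t, t.score);

  // Gather the surviving B vertices from all heaps into one global set
  a_set = SELECT s
          FROM a_set:s
          POST-ACCUM
            FOREACH item IN s.@top_b DO
              @@selected_b += item.v
            END;

  // Turn the collected vertices into a vertex set (selected_B in the steps above)
  selected_b = {@@selected_b};

  // Hop 2 (B - C): traverse only from the selected B vertices
  result_c = SELECT t
             FROM selected_b:s -(b_c:e)-> C:t;

  PRINT result_c;
}
```

Note that the first hop still visits every A - B edge; the HeapAccum only limits which B vertices are kept afterwards, which is where the extra storage and cost on large graphs come from. If `a_b` is directed from B to A in your schema, the first pattern would need to be adjusted.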

If you only need the top 100 B vertices overall for the A - B traversal (regardless of how many of them come from each individual A vertex), it is much simpler.
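For that global variant, a minimal sketch can rank the B vertices reached in the first hop with ORDER BY and keep 100 of them with LIMIT. The names `MyGraph`, `a_b`, `b_c` and the `score` attribute on B are hypothetical placeholders, not part of the original answer.

```gsql
CREATE QUERY top_b_overall() FOR GRAPH MyGraph {
  // Hypothetical schema: vertex types A, B, C; edge types a_b (A-B) and b_c (B-C);
  // B has a DOUBLE attribute named score that is used for ranking.
  start = {A.*};

  // Hop 1 (A - B): keep at most 100 B vertices in total, ranked by score
  selected_b = SELECT t
               FROM start:s -(a_b:e)-> B:t
               ORDER BY t.score DESC
               LIMIT 100;

  // Hop 2 (B - C): traverse only from those 100 B vertices
  result_c = SELECT t
             FROM selected_b:s -(b_c:e)-> C:t;

  PRINT result_c;
}
```

Without the ORDER BY, LIMIT 100 would presumably keep an arbitrary 100 B vertices rather than the highest-scoring ones.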

Please let us know if you want to see a GSQL code example of this method!

There's actually a simpler way to do this, as I learned from another Solution Engineer on our team. Sorry for the complicated answer earlier.

We DO support limiting each hop: you can use the SAMPLE clause to select a random sample of edges!

You can specify things like "sample 25% of the edges" or "sample 100 edges". Note that sampling is currently non-deterministic. The approach I described earlier is considerably more complex, but it lets you do things like "explore the top 100 B vertices from each A vertex based on some attribute".
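As a minimal sketch of the percentage form, sampling a quarter of each A vertex's A - B edges could look like the query below. The names `MyGraph` and `a_b` are hypothetical, and the threshold of 4 edges is an arbitrary choice for illustration. It follows the SAMPLE clause grammar `SAMPLE <n>[%] EDGE WHEN <condition>` as we understand it, so please check the SAMPLE documentation for your TigerGraph version.

```gsql
CREATE QUERY sample_a_b() FOR GRAPH MyGraph {
  // Hypothetical schema: vertex types A and B joined by edge type a_b
  start = {A.*};

  // Randomly sample about 25% of the a_b edges of each A vertex
  // that has at least 4 a_b edges (the WHEN condition)
  sampled_b = SELECT t
              FROM start:s -(a_b:e)-> B:t
              SAMPLE 25% EDGE WHEN s.outdegree("a_b") >= 4;

  PRINT sampled_b;
}
```

Because sampling is non-deterministic, running this query twice can return different B vertices.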

Please see this page for more details!

In your case, I'd recommend splitting the query into multiple SELECT statements, each covering one hop, like this:

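```gsql
// Hop 1 (A - B): sample at most 100 edges from each A vertex
sampled_B = SELECT t
            FROM A:s -(<edge-type>:e)- B:t
            SAMPLE 100 EDGE WHEN s.outdegree() >= 1;
// Hop 2 (B - C): traverse only from the sampled B vertices
result_C = SELECT t
           FROM sampled_B:s -(<edge-type-2>:e)- C:t;
```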