Multiple hops in old syntax (syntax v1)

Maatdeamon · July 1, 2020, 1:46am

Hi,

I wonder how in the classic GSQL syntax, one could express the following:

SELECT s
            FROM Person:s-(LIKES>) -:msg - (TWEETED_BY>)-Organization:t

That is, I want to find all the person who have liked msg(s) that were twitted by an Organization.

Can someone show me how to write that in the old syntax v1.

Richard_Henderson · July 1, 2020, 7:15am

Sure, it looks like this (apologies for any syntax typos).

Given some organizations (I’ll assume all of them for simplicity as you didn’t specify in the code above), and given reverse edges (since we are traversing from Organization backwards to Message, and then backwards from Message to Person). You’d need those in SYNTAX V2 also.

It moves step by step as follows:

myOrgs = Organization.*;

myMsg = SELECT msg FROM myOrgs -(reverse_TWEETED_BY)-> Message:msg;

myPeople = select s from myMsg -(reverse_LIKES)-> Person:s;

Maatdeamon · July 1, 2020, 9:16am

Thank you so much and i appreciate the answer, as it really help me understand the spirit behind GSQL. I’m coming from a SPARQL background.

Few questions here:

I though about that solution but i was not sure for a couple of reasons:

Let assume that we do not have reverse link in the graph ? would we be stuck then ?
My understanding so far, was that one of the advantage of “Native” Property Graph over Solution such as Triplestore which rely on Joins, was that property graphs with the adjacency list, have nodes act like micro indices, which enable to follow the links quickly … To be fair, this is kind of true in your answer above, but i am surprised that the only way to solve the type of query that my query above represents, is to go backward.

Here the type of query is query where you want to match the start node of your traversal, provided that the entire traversal meet a certain condition. This could be 1 to n hope away. In other words, those are queries where the start nodes correspond to what we want, based, on what they connects to several hops away.

Is there not a way to express those type of query in a forward fashion, using let say accumulator or something else ?

The reason for this is, let say there are a lot of organisation that produces a lot of messages that are not liked, while there are few persons, that like messages, let alone messages that are twitted by Organization. Clearly the reverse approach has to prune a lot of data, but if we were to have a way to navigate forward, then, in the spirit of following the links, there would be much less pruning and the query would be way faster.

Hence, i wonder if with the classic GSQL, i can control how the graph is traversed, and decide to either go forward or backward as above. If so, could you show what would be an equivalent of write this query in a forward fashion using the classic Syntax.

Maatdeamon · July 1, 2020, 11:14am

Here is the Sparql version that implies a lot of join

?p a Person .
?p likes ?msg .
?msg a Message .
?msg twitted_by ?Org .
?Org a Organization .

Here is the Gremlin Version, which seem to do the traversal in a forward fashion as i imagine the advantage would be

g.V().hasLabel("Person").where(
        out('likes').hasLabel('Message').
        out('twitted_by').hasLabel('Organization')
    )

I’m simply trying to understand the difference as there is not traversal in the reverse direction.

Maatdeamon · July 1, 2020, 1:40pm

In addition,

I wonder what is the complexity of intersect ? Is it like a join ? How is it implemented ? is there a documentation to it ?

Richard_Henderson · July 1, 2020, 2:26pm

OK. So it is easy to reverse the sense of the traversal. I think you already have an intuition about how this works.

Generally, graphs are quite physical in the way they work, edges are outward, and physically bound to their source vertices. This is why you need a reverse edge so you can traverse. We could hide the fact that is happening, but the result is identical. To traverse along an edge from a vertex, there needs to be an edge expressed in that direction. We do allow a generic selector in the edge using an underscore (meaning any edge that works) which does mean less typing, but the structures still have to be there.

So (1) Yes, you’d be stuck.

(2) I understand your surprise, it is a little different to the logical world of relational, or indeed the higher declarative abstraction which SParQL presents. Nevertheless, in implementation, there will be indexes and reverse indexes as proxies for edges and reverse edges.

(3) Forward query would like the following, though I’m not sure that is what you are looking for in the original question. If your primary restriction is one organisation, then the pruning happens immediately at the start. Perhaps I confused things by picking all organisations at the start.

Given a single organisation as a parameter (my_query_org), the initial vertex seed set would simply be:

myOrgs = {my_query_org};

For reference:

myPeople = People.*;

myMsg = SELECT msg FROM myPeople -(TWEETED_BY)-> Message:msg;

myOrgs = select o from myMsg -(LIKES)-> Organisation:o;

Maatdeamon · July 1, 2020, 3:36pm

Hi,

1 & 2 ) I understand 1 and 2. And thanks for the answer.

3 ) The code is quite not right. First, no it is not a single organisation, it any organization that has Tweeted a msg and that has been liked by any person. Unfortunately you got the relation backward there.

The relation are, person-(Like)->msg and msg -(TwittedBy)->Org.

I think the Gremlin code summarises it Well

g.V().hasLabel("Person").where(
        out('likes').hasLabel('Message').
        out('twitted_by').hasLabel('Organization')
    )

The Key in Gremlin is the Where step which has they define it, is a filter.

In the same way, your new syntax v2 express it quite easily

SELECT s
            FROM Person:s-(LIKES>) -:msg - (TWEETED_BY>)-Organization:t

The thing is, i’m a trying to understand the nuts and bolts of GSQL, hence i want to know the classic GSQL syntax for it, and understand, how under the hood, Syntax v2, implement/operate the query. There might be a time, where i won’t be happy with the default traversal generated by Syntax V2.

To me that is not quite equivalent to

myOrgs = Organization.*;

myMsg = SELECT msg FROM myOrgs -(reverse_TWEETED_BY)-> Message:msg;

myPeople = select s from myMsg -(reverse_LIKES)-> Person:s;

As a matter if fact syntax v2 does not require the existence of a reverse edge here. Hence there must be a way this is decomposted and executed at run time that could be expressed in the classic GSQL… no ?

Maatdeamon · July 1, 2020, 3:59pm

One way i was thinking about doing it, is accumulate/propagate the path/node as you move forward.

So what i want are the persons that have liked a messages tweeted by any organisation. What i have is a directed graph where, relation are

person-(Like)->msg and msg -(TwittedBy)->Org.

Hence to navigate forward that is according to the direction of the edges available my idea would be to accumulate and propagate the persons in msg and then in Org. Then some how ultimately, merge all the accumulator for each org node in a set (potentially global accumulator), and then print the result which is that set.

Not sure yet but something akin to

people = People.*;

msgs = SELECT p FROM people -(LIKES)-> Message:msg;

ACCUM msg@peop += p

orgs = select o from m:myMsg -(TWITTED_BY)-> o:Org;
ACCUM org@peop += m.@peop

And then somehow, merge all the org@peop within a global accumulator and return the result.

Sorry, just tried quickly on top of my head for the sake of the discussion

Please let me know, if what is happing under the hood, would be closer to that. Indeed, i don’t see how v2 syntax, can decide to increase the number of index behind the scene just to be able to do a chain of match.

Maatdeamon · July 3, 2020, 12:33pm

@Richard_Henderson Am I completely off with this … ?

Richard_Henderson · July 3, 2020, 4:54pm

So, V2 will automatically select the reverse edge . V1 doesn’t, but it is a minor point, you just need to remember to specify it manually.

The reverse route is correct I think. You want to know who has liked any message from any organisation. Doing it the other way round feels odd. Gremlin works DFS (depth first) and is very very slow as a result.

In any case, if you DO want to go the other way, then I’ll assume there are messages that are not sent by an organisation. What V2 does is compose a map accumulator, and attach it to vertices as it moves along. We can do slightly better by attaching a SetAccum <Person> @myPeople . Then when you get to Organisation you can add them up. Which is pretty similar to your thought I think.

In code:

SetAccum <Person> @myPeople;
SetAccum <Person> @@resultsPeople;

allPeople = Person.*;

allMessages = SELECT m  FROM Person:p-(LIKES)->msg:m
ACCUM m.@myPeople += p;

allOrganisations = select o from allMessages:m-(TWEETED_BY)->Organization:o
ACCUM @@resultsPeople += m.@myPeople;

PRINT @@resultsPeople;

Maatdeamon · July 3, 2020, 5:06pm

Thanks that clarify it all.

And yes i understand that passing message is probably slower than chasing pointer in this case. So having the reverse edge makes it faster. I just wanted to understand the behind the scene business. And the answer just does that.

Many thanks