Working with subgraphs

Maatdeamon · July 5, 2020, 11:28am

Hi, quick question, even though I have some intuition about it:

What is the envisioned way to work with subgraphs in GSQL?

What I mean is: let's say I have a very large graph, select a subset of it, and would like to run a centrality algorithm on that subgraph. What would be the GSQL approach to handle this?

PS: So far, I am thinking of doing it via a query parameter, but the overall structure of the program is not yet clear to me.

Maatdeamon · July 7, 2020, 7:40am

@Richard_Henderson, any ideas on this?

Richard_Henderson · July 7, 2020, 1:01pm

Having reviewed the documentation again, it still looks like the only way to pass subgraphs is with a SET parameter. A SET can also be returned.

I would expect this to be somewhat less efficient than using an inline vertex seed set (without calling another query), but that will very much depend on what you are doing in the nested function and on the size of the subset. If the subset isn’t huge, then it should be okay.

I think we would need to look at the specific case to see if that was the best approach.
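
To make the SET-parameter idea concrete, here is a minimal sketch in Python rather than GSQL. It only illustrates the logic: a routine receives the full edge list plus a set of vertex ids, and computes degree centrality on the subgraph induced by that set. The function name, the edge-list representation, and the choice of degree centrality are all assumptions made for illustration, not TigerGraph APIs.

```python
def induced_degree_centrality(edges, subset):
    """Degree centrality on the subgraph induced by `subset`.

    edges  : iterable of (source, target) pairs for the full graph
    subset : set of vertex ids, playing the role of the SET parameter
    """
    subset = set(subset)
    degree = {v: 0 for v in subset}
    for src, tgt in edges:
        # Keep an edge only when both endpoints belong to the subset.
        if src in subset and tgt in subset:
            degree[src] += 1
            degree[tgt] += 1
    # Normalize by the largest possible number of neighbours in the subset.
    denom = max(len(subset) - 1, 1)
    return {v: d / denom for v, d in degree.items()}


# Hypothetical toy graph; only a, b and c are passed in as the subset.
full_graph = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("d", "e")]
scores = induced_degree_centrality(full_graph, {"a", "b", "c"})
print(scores)
```

In this toy example the subset forms a triangle, so each of the three vertices gets a score of 1.0, and the edges touching d and e are ignored.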

vic · June 14, 2022, 7:45am

Hello

I’m trying to tackle a similar problem. I have a graph whose edges have a timestamp attribute, and I’m trying to query a subgraph by selecting only the vertices connected by edges whose timestamp falls within a certain date range.

Then I would like to run PageRank only on the queried vertices and edges. This matters because, even within my vertex subset, some of the edges might fall outside my selected date range.

@Richard_Henderson, what do you think is the best way to achieve this, given that I’m planning to tweak the PageRank code a little so that it takes the custom input of vertices and edges into account?

So far my ideas were:

1. Tag the resulting vertices and edges with a specific attribute, then change the PageRank code to select only vertices and edges with that attribute set to true.
2. Add a custom vertex type and edge type to the schema, copy the resulting vertices and edges as those custom types, then run PageRank on those types.
3. Model the edges as vertices. This would give me more freedom to query, but I would have to change the PageRank algorithm (or any other algorithm I’m planning to use) substantially. It also doesn’t feel very elegant, given that you create a whole bunch of nodes with fixed degree 2 that should in fact be edges…

Thanks for any help…

Richard_Henderson · August 1, 2022, 10:03am

Sorry for being late to this.

Changing the PageRank algorithm to do this should be straightforward: it basically means adding a filter on the edge traversal. This assumes you have already been able to express those edges, i.e. you aren’t adding multiple edges of the same type between the same two vertices (albeit on different dates).

This is one of the features of the way we do things: you can change the algorithms to match what you need.

The PageRank algorithms are relatively easy to understand.
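
As a rough illustration of what a filter on the edge traversal amounts to, here is a minimal Python sketch of textbook PageRank that skips any edge whose timestamp is outside a given window. This is not the TigerGraph library code and not GSQL. The function name, the (source, target, timestamp) edge representation, the integer timestamps, and the dangling-vertex handling are all assumptions made for the example.

```python
def pagerank_in_window(edges, start, end, damping=0.85, iterations=50):
    """PageRank over edges whose timestamp lies in [start, end].

    edges: iterable of (source, target, timestamp) triples.
    """
    # The edge filter: edges outside the window are never traversed.
    active = [(s, t) for s, t, ts in edges if start <= ts <= end]
    vertices = sorted({v for edge in active for v in edge})
    if not vertices:
        return {}
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for s, _ in active:
        out_degree[s] += 1
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edge is spread uniformly.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for s, t in active:
            nxt[t] += damping * rank[s] / out_degree[s]
        rank = nxt
    return rank


# Hypothetical data: timestamps are plain integers for simplicity.
edges = [
    ("a", "b", 10), ("b", "c", 20), ("c", "a", 30),
    ("a", "c", 99),  # outside the window below, so it is skipped
    ("d", "a", 99),  # also outside, so "d" never enters the subgraph
]
ranks = pagerank_in_window(edges, start=0, end=50)
print(ranks)
```

Here only the three-edge cycle a, b, c survives the filter, so the three vertices end up with equal rank and d does not appear in the result. In a real GSQL query the same condition would sit in the edge-traversal step of the algorithm instead of a pre-filtered list.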

I do sometimes use custom edge types, but usually only for the “latest” edge, as that is often the one most frequently needed. Pre-processing before every query sounds a little heavy, but it might be worth it in order to use the algorithm code without modification and to avoid any performance penalty associated with extra filtering or edge iteration. A key point here is that a single routine in TigerGraph cannot read its own writes, which means the pre-processing would have to run as its own query before the algorithm.

Let me know how you get on.