Installing TigerGraph algorithms on Windows to use in GraphStudio - WCC, Jaccard, Louvain etc.

Jon_Herke · February 22, 2022, 2:07pm

Just verifying: is this the link you’re referring to? https://github.com/tigergraph/gsql-graph-algorithms/tree/master/algorithms/Similarity/jaccard And when you click on tg_jaccard_nbor_ap.gsql, you get a 404 (page doesn’t exist)? I can confirm that I get the same error.

Forwarding this on to the Graph Data Science team to get their input on this thread.

AnuroopAjmera · February 23, 2022, 2:41am

Yes, this is the link

AnuroopAjmera · February 28, 2022, 3:57pm

The tg_wcc algorithm is not working in my case. The weakly connected components have been assigned different community/result attribute values. There could be something wrong at my end, but I am not able to figure it out.

Could anyone please help?

I am available for a zoom call today as well as tomorrow but please consider this on top priority.

My email ID: anuroop.ajmera@gmail.com

@Parker_Erickson @Mohamed_Zrouga @Jon_Herke @Szilard_Barany @Dan_Barkus @Vladimir_Slesarev @Bruno

Jon_Herke · February 28, 2022, 7:55pm

@AnuroopAjmera “tg_wcc algorithm is not working in my case”

Can you elaborate on what isn’t working as expected? What steps did you take? What errors are you receiving?

AnuroopAjmera · March 1, 2022, 9:17am

Hi Jon,
It’s assigning different community IDs to nodes which should be in same community as per source data. I checked this using pivot in excel.

Looks like something wrong at my end.

Could someone please help on priority? I am available to connect over a meeting (zoom call?) - anuroop.ajmera@gmail.com

Regards
Anuroop Ajmera

@Jon_Herke @Parker_Erickson @Mohamed_Zrouga @Szilard_Barany @Dan_Barkus @Vladimir_Slesarev @Bruno

AnuroopAjmera · March 2, 2022, 4:38pm

The WCC algorithm is not working in my case because, unlike in Neo4j, the TigerGraph version expects the nodes to be similar, whereas in my case I want to find communities of nodes based on other nodes (properties).
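
To make the point about "communities based on other nodes" concrete, here is a minimal plain-Python sketch of weakly connected components using union-find. It is not the tg_wcc implementation, and the vertex names (`ip:A`, `ev:1`, and so on) are hypothetical toy data. It only illustrates that two nodes end up with the same component ID when the connecting nodes and edges are part of the run, and with different IDs when they are left out.

```python
def weakly_connected_components(vertices, edges):
    """Return {vertex: component_id}, treating every edge as undirected."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as root so component ids are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for src, dst in edges:
        union(src, dst)
    return {v: find(v) for v in vertices}


# Hypothetical toy data: two IPs share one event, a third IP has its own event.
vertices = ["ip:A", "ip:B", "ip:C", "ev:1", "ev:2"]
edges = [("ip:A", "ev:1"), ("ip:B", "ev:1"), ("ip:C", "ev:2")]

# Run over both vertex kinds: ip:A and ip:B are linked through ev:1.
all_types = weakly_connected_components(vertices, edges)

# Run over the IP vertices only: no edges remain, so every IP is alone.
ip_only = weakly_connected_components(["ip:A", "ip:B", "ip:C"], [])
```

In the first run `ip:A` and `ip:B` share a component and `ip:C` does not. In the second run each IP gets its own component, which resembles the symptom described here. Whether that is the actual cause in this graph is an assumption that would need checking against the vertex and edge types passed to the algorithm.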

@Jon_Herke @Parker_Erickson @Mohamed_Zrouga @Szilard_Barany @Dan_Barkus @Vladimir_Slesarev @Bruno

AnuroopAjmera · March 6, 2022, 10:28am

The current conclusion on my TigerGraph DS project is given below. Work is still in progress, so the final conclusion will be shared later.

(1) tg_wcc is not working: unlike in Neo4j, the WCC algorithm in TigerGraph expects nodes to be similar, but in my case I want to find communities of nodes based on other nodes (properties).
(2) tg_jaccard_nbor_ap_batch has a code issue: the code has a bug, which is now resolved locally with the help of @Szilard_Barany.
(3) tg_louvain is not showing visualizations: Louvain is not showing any visualization of communities like the ones shown in TigerGraph videos. The “Explore Graph” section shows some nodes having a common cid, but Louvain’s output is not showing the same.
(4) Absence of a weighted degree centrality algorithm in TigerGraph. Details: Weighted degree centrality algorithm
(5) Not able to load a 30 MB file. Details: Data load getting stuck for 16 MB file
(6) Changes done in GSQL are not reflected in TigerGraph GraphStudio. Details: How to get GSQL changes reflected in GraphStudio

@Jon_Herke @Parker_Erickson @Mohamed_Zrouga @Szilard_Barany @Dan_Barkus @Vladimir_Slesarev @Bruno @Pawan_Mall

AnuroopAjmera · March 10, 2022, 9:47am

I want to apply min-max normalization, but I am getting an error when computing max() - min() in the query below. Can someone please suggest what the issue is?

GSQL > BEGIN
GSQL > CREATE QUERY degree_cent_res() FOR GRAPH NetworkAnomalyDetection SYNTAX v2 {
GSQL > SELECT s.SourceAddress, (max(s.degree) - min(s.degree)) as risk_score INTO T
GSQL > FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
GSQL > GROUP BY s.SourceAddress
GSQL > ORDER BY risk_score DESC
GSQL > LIMIT 100;
GSQL > PRINT T;
GSQL > }
GSQL > END
Index -1 out of bounds for length 2
Failed to create queries: [degree_cent_res].

@Szilard_Barany @Dan_Barkus @Parker_Erickson @Jon_Herke @Renchu_Song @Mohamed_Zrouga @Bruno @Pawan_Mall @Vladimir_Slesarev @markmegerian

AnuroopAjmera · March 10, 2022, 5:37pm

can anyone please suggest?

@Szilard_Barany @Dan_Barkus @Parker_Erickson @Jon_Herke @Renchu_Song @Mohamed_Zrouga @Bruno @Pawan_Mall @Vladimir_Slesarev @markmegerian

markmegerian · March 10, 2022, 7:57pm

I have a few observations and questions for you:

(1) The SQL-like syntax is not appropriate in this case; you should use the regular graph traversal syntax. The main reason is that you can only include base attributes in the SELECT list, and you are trying to use the degree function.
(2) The degree function has parentheses after it: degree().
(3) Will you have multiple SourceIP vertices with the same SourceAddress value? If so, then you can summarize using a MapAccum, like this:


  MapAccum<STRING, MinAccum<INT>> @@minDegree;
  MapAccum<STRING, MaxAccum<INT>> @@maxDegree;

  S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
      ACCUM @@minDegree += (s.SourceAddress -> s.degree()),
            @@maxDegree += (s.SourceAddress -> s.degree());

If not (i.e. each SourceIP has a unique SourceAddress), then it’s even simpler: you can just use a vertex-attached accumulator, like this:

  MinAccum<INT> @minDegree;
  MaxAccum<INT> @maxDegree, @riskScore;
  
  S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)-  SessionEvent:se 
            ACCUM s.@minDegree += s.degree(),
                          s.@maxDegree += s.degree()
           POST-ACCUM  s.@riskScore += s.@maxDegree - s.@minDegree;
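
As a rough illustration of what the keyed min/max accumulation computes, here is a plain-Python sketch that emulates it with dictionaries. It is not GSQL and does not model the engine; the function name `min_max_by_key` and the sample addresses are hypothetical.

```python
def min_max_by_key(rows):
    """rows: iterable of (key, value) pairs. Returns (min_by_key, max_by_key)."""
    min_by_key, max_by_key = {}, {}
    for key, value in rows:
        # Each update keeps the smallest / largest value seen so far for the key.
        min_by_key[key] = min(value, min_by_key.get(key, value))
        max_by_key[key] = max(value, max_by_key.get(key, value))
    return min_by_key, max_by_key


# Hypothetical data: one address appears on two vertices, the other on one.
rows = [("10.0.0.1", 3), ("10.0.0.1", 7), ("10.0.0.2", 5)]
mins, maxs = min_max_by_key(rows)

# Spread (max - min) per address.
spread = {key: maxs[key] - mins[key] for key in mins}
```

Note that a key seen with only one value has a spread of 0. So if every SourceAddress is unique, max minus min per vertex would be 0 everywhere, and a score that compares vertices with each other would need the min and max taken across the whole graph instead.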

AnuroopAjmera · March 11, 2022, 9:39am

@markmegerian
Here I am not intending to use the degree function. I have added an attribute named “degree” to the vertex and stored the output of the degree centrality algorithm in that attribute.
Now I want to apply min-max normalisation, and I am facing the index-out-of-bounds issue mentioned above.

In this case, can’t I use simple SQL syntax? Kindly suggest.

@Szilard_Barany @Dan_Barkus @Parker_Erickson @Jon_Herke @Renchu_Song @Mohamed_Zrouga @Bruno @Pawan_Mall @Vladimir_Slesarev @markmegerian

AnuroopAjmera · March 11, 2022, 10:37am

@markmegerian
I am getting the error "no viable alternative at input ‘s.degree()\n s’" at the line s.@maxDegree += s.degree() in the code below:

CREATE QUERY degree_centrality_res() FOR GRAPH NetworkAnomalyDetection {
MinAccum<INT> @minDegree;
MaxAccum<INT> @maxDegree, @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM s.@minDegree += s.degree(),
s.@maxDegree += s.degree()
POST-ACCUM s.@riskScore += s.@maxDegree - s.@minDegree;
}

@Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev @markmegerian

markmegerian · March 11, 2022, 2:21pm

Try specifying SYNTAX V2:

CREATE QUERY degree_centrality_res() FOR GRAPH NetworkAnomalyDetection SYNTAX V2 {

AnuroopAjmera · March 11, 2022, 3:25pm

@markmegerian

This works fine when degree is written without parentheses, since it is an attribute as mentioned earlier, but it does not give the desired results.

Actually, I have to first calculate max and min of degree across the graph and then
I have to calculate risk score using max-min normalisation as below -

riskScore += (1 - (s.degree - s.@minDegree) / (s.@maxDegree - s.@minDegree));

Could you kindly suggest?

@Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev

AnuroopAjmera · March 15, 2022, 5:46am

Could someone please help?

I have to first calculate max and min of degree (node attribute) across the graph and then
calculate risk score per node using max-min normalisation as below -

riskScore += (1 - (s.degree - s.@minDegree) / (s.@maxDegree - s.@minDegree));
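
For reference, the arithmetic of this formula can be sketched in plain Python as below. This runs outside TigerGraph and only shows the intended numbers; the function name `risk_scores`, the sample degrees, and the handling of the case where all degrees are equal are assumptions made for illustration.

```python
def risk_scores(degree_by_vertex):
    """Return {vertex: 1 - (degree - min) / (max - min)} over all vertices."""
    lo = min(degree_by_vertex.values())
    hi = max(degree_by_vertex.values())
    span = hi - lo
    scores = {}
    for vertex, degree in degree_by_vertex.items():
        if span == 0:
            # Assumption: when every degree is equal, treat the ratio as 0
            # instead of dividing by zero.
            scores[vertex] = 1.0
        else:
            # True (floating point) division, so scores fall between 0 and 1.
            scores[vertex] = 1.0 - (degree - lo) / span
    return scores


# Hypothetical degrees for three vertices.
degrees = {"ip:A": 2, "ip:B": 6, "ip:C": 10}
scores = risk_scores(degrees)
```

With these inputs the lowest-degree vertex scores 1.0, the highest scores 0.0, and the middle one scores 0.5. The min and max have to be known for the whole graph before any single score can be computed, which is why two passes (or an aggregate step followed by a per-vertex step) are needed.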

@markmegerian @Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev

AnuroopAjmera · March 15, 2022, 5:47am

This has to be done on output of degree centrality algorithm which I have saved in “degree” attribute on each SourceIP node.

@markmegerian @Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev

AnuroopAjmera · March 15, 2022, 12:02pm

@markmegerian

I am able to calculate min and max correctly at graph level, but the risk score calculation at node level is not working correctly. I am not sure what kind of accumulator @riskScore should be defined as, or what the issue with the calculation is.

CREATE QUERY degree_centrality_res() FOR GRAPH NetworkAnomalyDetection SYNTAX V2{
MinAccum @@minDegree;
MaxAccum @@maxDegree;
SumAccum @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree;
T = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM
s.@riskScore += (1 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));
PRINT @@minDegree;
PRINT @@maxDegree;
PRINT T[T.@riskScore];
}

@Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev

markmegerian · March 15, 2022, 1:53pm

You said it’s not working correctly, but what is the problem?
Also, since you are calculating the risk score at the node level, why do you need to hop to the SessionEvent? This is going to cause an issue when a SourceIP has multiple edges, because it will accumulate each time. Here is a version that removes that hop. Does this work?

CREATE QUERY degree_centrality_res() FOR GRAPH NetworkAnomalyDetection SYNTAX V2{
MinAccum<INT> @@minDegree;
MaxAccum<INT> @@maxDegree;
SumAccum<DOUBLE>  @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree;
T = SELECT s FROM SourceIP:s 
ACCUM
s.@riskScore += (1.0  - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));
PRINT @@minDegree;
PRINT @@maxDegree;
PRINT T[T.@riskScore];
}
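
To illustrate the "accumulate each time" point, here is a small plain-Python sketch. It does not model the GSQL engine; it only mimics adding a per-vertex value once per matched edge versus once per vertex. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def accumulate_per_edge(edges, score_of):
    """Add the source's score once for every edge, like a sum over edge matches."""
    total = defaultdict(float)
    for src, _dst in edges:
        total[src] += score_of[src]
    return dict(total)


def accumulate_per_vertex(edges, score_of):
    """Add the source's score exactly once per distinct source vertex."""
    sources = {src for src, _dst in edges}
    return {src: score_of[src] for src in sources}


# Hypothetical data: ip:A has three session events, ip:B has one.
edges = [("ip:A", "ev:1"), ("ip:A", "ev:2"), ("ip:A", "ev:3"), ("ip:B", "ev:4")]
score_of = {"ip:A": 0.5, "ip:B": 0.25}

per_edge = accumulate_per_edge(edges, score_of)
per_vertex = accumulate_per_vertex(edges, score_of)
```

Here `ip:A` ends up with 1.5 in the per-edge version and 0.5 in the per-vertex version. An accumulator that keeps a maximum instead of a sum would not show this inflation, since taking the maximum of the same value several times leaves it unchanged.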

A few other comments/ questions:

(1) On your first query, you also have the hop to SessionEvent, but you aren’t using any attribute from it. Is that intentional? Is the idea that you have only calculated the degree attribute for SourceIP vertices that have an edge to SessionEvent?
(2) Using degree as an attribute name may work, but I would avoid using a reserved word or function name, to avoid confusion or perhaps creating a syntax error.
(3) If you must have the connection to SessionEvent on both queries for some reason, you can change the 2nd query to use a POST-ACCUM to only calculate it once per vertex, like this:

T = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
POST-ACCUM
s.@riskScore += (1 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));

Finally, you could do this all in one step:

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree
POST-ACCUM
s.@riskScore += (1 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));
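
One thing worth checking, offered as an assumption and not a confirmed diagnosis: the snippets above write `1.0` in one place and `1` in others, and the min/max accumulators are integers. If degree is also stored as an integer and the division truncates, the ratio would collapse to 0 for every vertex except the one with the maximum degree. The plain-Python sketch below contrasts the two behaviours; `//` stands in for truncating division and says nothing definitive about how GSQL evaluates the expression.

```python
def normalized(degree, lo, hi, truncating):
    """Return 1 - (degree - lo) / (hi - lo) with either kind of division."""
    if truncating:
        # Integer division: the fractional part of the ratio is discarded.
        ratio = (degree - lo) // (hi - lo)
    else:
        # Floating point division: the ratio keeps its fractional part.
        ratio = (degree - lo) / (hi - lo)
    return 1.0 - ratio


# Hypothetical graph-wide minimum and maximum degree.
lo, hi = 2, 10
truncated = [normalized(d, lo, hi, True) for d in (2, 6, 10)]
fractional = [normalized(d, lo, hi, False) for d in (2, 6, 10)]
```

With truncation, the degrees 2 and 6 both score 1.0 and only 10 scores 0.0, while true division gives 1.0, 0.5 and 0.0. If the printed scores only ever show 0 and 1, the attribute and accumulator types would be the first thing to look at.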

AnuroopAjmera · March 15, 2022, 5:39pm

Thanks so much @markmegerian

The 3 versions below are working for me.

Note the accumulator used for @riskScore in each version.

CREATE QUERY degree_centrality_res() FOR GRAPH NetworkAnomalyDetection SYNTAX V2{
MinAccum @@minDegree;
MaxAccum @@maxDegree;
MaxAccum @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree;
T = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM
s.@riskScore += (1 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));
PRINT @@minDegree;
PRINT @@maxDegree;
PRINT T[T.degree, T.@riskScore];
}

CREATE QUERY degree_centrality_res_v2() FOR GRAPH NetworkAnomalyDetection SYNTAX V2{
MinAccum @@minDegree;
MaxAccum @@maxDegree;
SumAccum @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree
POST-ACCUM
s.@riskScore += (1 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));

PRINT @@minDegree;
PRINT @@maxDegree;
PRINT S[S.degree, S.@riskScore];
}

CREATE QUERY degree_centrality_res_v3() FOR GRAPH NetworkAnomalyDetection SYNTAX V2{
MinAccum @@minDegree;
MaxAccum @@maxDegree;
SumAccum @riskScore;

S = SELECT s FROM SourceIP:s -(HAS_SESSION_EVENT>:e)- SessionEvent:se
ACCUM @@minDegree += s.degree,
@@maxDegree += s.degree;
T = SELECT s FROM SourceIP:s
ACCUM
s.@riskScore += (1.0 - (s.degree - @@minDegree) / (@@maxDegree - @@minDegree));
PRINT @@minDegree;
PRINT @@maxDegree;
PRINT T[T.degree, T.@riskScore];
}

@Szilard_Barany @Parker_Erickson @Renchu_Song @Mohamed_Zrouga @Pawan_Mall @Vladimir_Slesarev

markmegerian · March 15, 2022, 6:07pm

Great! I like version 2 the best.