Installing TigerGraph algorithms on Windows for use in GraphStudio - WCC, Jaccard, Louvain, etc.

Hi, I am working on a project and have successfully loaded data into my schema. Now I need to apply TigerGraph algorithms such as WCC, Jaccard, and Louvain, but I am not sure how to install them.

Around 30 minutes into the video “Fundamentals of the Data Science Library”, the presenter shows how to install algorithms from the TigerGraph CLI. However, after I downloaded the CLI, it closes within a second of being opened.

TigerGraph page mentioned in the video: GitHub - TigerGraph-DevLabs/TigerGraph-CLI

TigerGraph CLI link for Windows: https://tigertool.tigergraph.com/dl/windows/tgcli.exe

Any help will be greatly appreciated.

Regards
Anuroop

video link

Can anyone please advise? I have been waiting for help for 19 hours.

I have started missing my deadlines now.

Hi Anuroop,

I’m sorry you’re having trouble with the TigerGraph CLI. We’ll look into the issues with the CLI running on Windows.

In the meantime, can you try the GSQL installation method for the Graph algorithm?
You can find the repository of GSQL algorithms here. To install, you simply paste the Query body into the GSQL terminal as defined in the README of each algo, or outlined like this:

GSQL > BEGIN
# Paste <K Nearest Neighbors Algorithm> code after BEGIN command
GSQL > END 
GSQL > INSTALL QUERY <K Nearest Neighbors Algorithm>
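
As a rough sketch of the same steps in script form, the Python helper below wraps a query body with USE GRAPH and INSTALL QUERY, so the text can be saved to a .gsql file and run from the GSQL shell. The function name build_install_script and the regular expression that finds the query name are illustrative assumptions, not part of any TigerGraph tool.

```python
import re

# Matches the header of a GSQL query definition and captures the query name.
_QUERY_NAME = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:DISTRIBUTED\s+)?QUERY\s+(\w+)",
    re.IGNORECASE,
)


def build_install_script(query_body: str, graph_name: str) -> str:
    """Return GSQL text that selects a graph, creates the query, and installs it.

    Hypothetical helper: it only assembles text and does not talk to a server.
    """
    match = _QUERY_NAME.search(query_body)
    if match is None:
        raise ValueError("query_body does not contain a CREATE QUERY statement")
    query_name = match.group(1)
    return (
        f"USE GRAPH {graph_name}\n"
        f"{query_body.strip()}\n"
        f"INSTALL QUERY {query_name}\n"
    )
```

Writing the returned text to a file keeps the graph selection, the query definition, and the install step together in one script.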

Let me know if this works for you, and I’ll look into the CLI issues on Windows.

Thanks,
Dan

Hi Dan,

I tried to install 4 algos -

  1. Two algos were created successfully, but…
GSQL > USE GRAPH NetworkAnomalyDetection
Using graph 'NetworkAnomalyDetection'
GSQL > @tg_wcc.gsql
Successfully created queries: [tg_wcc].
GSQL > @tg_jaccard_nbor_ap_batch.gsql
Successfully created queries: [tg_jaccard_nbor_ap_batch].

When I tried to use them in the “Write Queries” section of GraphStudio, I got the error below for both algos:
Semantic Check Fails: The graph null does not exist!

  2. I got the warnings below for the Louvain algo:
GSQL > @tg_louvain.gsql
Warning in query tg_louvain (WARN-5): line 72, col 25
The comparison '-t.@max_best_move.weight==t.@sum_cc_weight' may lead to
unexpected behavior because it involves equality test between float/double
numeric values. We suggest to do such comparison with an error margin, e.g.
'abs((-t.@max_best_move.weight) - (t.@sum_cc_weight)) < epsilon', where epsilon
is a very small positive value of your choice, such as 0.0001.
Warning in query tg_louvain (WARN-5): line 148, col 29
The comparison '-s.@max_best_move.weight==s.@sum_cc_weight' may lead to
unexpected behavior because it involves equality test between float/double
numeric values. We suggest to do such comparison with an error margin, e.g.
'abs((-s.@max_best_move.weight) - (s.@sum_cc_weight)) < epsilon', where epsilon
is a very small positive value of your choice, such as 0.0001.
Successfully created queries: [tg_louvain].
  3. The fastRP algo threw the error below and failed:
GSQL > @tg_fastRP.gsql

Type Check Error in query tg_fastRP (TYP-158): line 91, col 53
's.fastrp_embedding' indicates no valid vertex type.
Possible reasons:

- The expression refers to a primary_id, which is not directly
usable in the query body. To use primary_id, declare it as an
attribute. E.g "CREATE VERTEX Person (PRIMARY_ID ssn string, ssn string, age
int)"
- The expression has misspelled an attribute, or a vertex name

**Failed to create queries: [tg_fastRP].**

@Dan_Barkus could you please advise on this as a priority?
@Jon_Herke

Best Regards
Anuroop Ajmera
+91-8007211700

@AnuroopAjmera, thanks for raising this issue. The tgcli Windows binaries are being updated (roadmap: next 2 weeks). In the meantime, to load a GSQL script you can simply use pyTigerGraph as follows:

import pyTigerGraph as tg 

# The keyword for the account name is `username`, not `user`.
conn = tg.TigerGraphConnection(host="https://<host>.i.tgcloud.io", username="tigergraph", password="tigergraph")

gsql_query ="""
CREATE QUERY tg_pagerank_wt (STRING v_type, STRING e_type, STRING wt_attr,
 FLOAT max_change=0.001, INT max_iter=25, FLOAT damping=0.85, INT top_k = 100,
 BOOL print_accum = TRUE, STRING result_attr =  "", STRING file_path = "",
 BOOL display_edges = FALSE) {
/*
 Compute the pageRank score for each vertex in the GRAPH
 In each iteration, compute a score for each vertex:
     score = (1-damping) + damping*sum(received scores FROM its neighbors).
 The pageRank algorithm stops when either of the following is true:
 a) it reaches max_iter iterations;
 b) the max score change for any vertex compared to the last iteration <= max_change.
 v_type: vertex types to traverse          print_accum: print JSON output
 e_type: edge types to traverse            result_attr: INT attr to store results to
 wt_attr: attribute for edge weights
 max_iter: max #iterations                 file_path: file to write CSV output to
 top_k: #top scores to output              display_edges: output edges for visualization
 max_change: max allowed change between iterations to achieve convergence
 damping: importance of traversal vs. random teleport

 This query supports only taking in a single edge for the time being (8/13/2020).
*/
TYPEDEF TUPLE<VERTEX Vertex_ID, FLOAT score> Vertex_Score;
HeapAccum<Vertex_Score>(top_k, score DESC) @@top_scores_heap;
MaxAccum<FLOAT> @@max_diff = 9999;    # max score change in an iteration
SumAccum<FLOAT> @sum_recvd_score = 0; # sum of scores each vertex receives FROM neighbors
SumAccum<FLOAT> @sum_score = 1;           # initial score for every vertex is 1.
SetAccum<EDGE> @@edge_set;             # list of all edges, if display is needed
SumAccum<FLOAT> @sum_total_wt;
FILE f (file_path);

Start = {v_type};
 # Calculate the total weight for each vertex
Start = SELECT s                
        FROM Start:s -(e_type:e) -> v_type:t
        ACCUM s.@sum_total_wt += e.getAttr(wt_attr, "FLOAT"); 
            
# PageRank iterations	
# Start with all vertices of specified type(s)
WHILE @@max_diff > max_change LIMIT max_iter DO
    @@max_diff = 0;
    V = SELECT s
	FROM Start:s -(e_type:e)-> v_type:t
	ACCUM t.@sum_recvd_score += s.@sum_score * e.getAttr(wt_attr, "FLOAT")/s.@sum_total_wt
	POST-ACCUM s.@sum_score = (1.0-damping) + damping * s.@sum_recvd_score,
		   s.@sum_recvd_score = 0,
		   @@max_diff += abs(s.@sum_score - s.@sum_score');
END; # END WHILE loop
# Output
IF file_path != "" THEN
    f.println("Vertex_ID", "PageRank");
END;

V = SELECT s 
    FROM Start:s
    POST-ACCUM 
        IF result_attr != "" THEN 
            s.setAttr(result_attr, s.@sum_score) 
        END,
   
	IF file_path != "" THEN 
            f.println(s, s.@sum_score) 
        END,
   
	IF print_accum THEN 
            @@top_scores_heap += Vertex_Score(s, s.@sum_score) 
        END;
	
IF print_accum THEN
    PRINT @@top_scores_heap;
    IF display_edges THEN
        PRINT Start[Start.@sum_score];
	Start = SELECT s
		FROM Start:s -(e_type:e)-> v_type:t
		ACCUM @@edge_set += e;
	PRINT @@edge_set;
    END;
END;
}

"""
# Send USE GRAPH and the query in a single call so the graph context
# applies to the CREATE QUERY statement.
res = conn.gsql("USE GRAPH <graphname>\n" + gsql_query)

print(res)
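
To make the iteration inside the query above easier to follow, here is a small pure-Python sketch of the same update rule, score = (1-damping) + damping*sum(received scores), with each edge passing on a share of its source's score in proportion to its weight. It is a simplification and not the library's implementation: it assumes positive edge weights, updates every vertex on each pass, and the function name weighted_pagerank is made up for illustration.

```python
def weighted_pagerank(edges, damping=0.85, max_change=0.001, max_iter=25):
    """Illustrative weighted PageRank on a list of (source, target, weight) edges.

    Returns a dict mapping each vertex to its score. Assumes all weights are
    positive, so every source vertex has a non-zero total outgoing weight.
    """
    vertices = {v for s, t, _ in edges for v in (s, t)}
    if not vertices:
        return {}

    # Total outgoing weight per vertex (the @sum_total_wt step in the query).
    total_wt = {v: 0.0 for v in vertices}
    for s, _, w in edges:
        total_wt[s] += w

    score = {v: 1.0 for v in vertices}  # initial score for every vertex is 1
    for _ in range(max_iter):
        received = {v: 0.0 for v in vertices}
        for s, t, w in edges:
            received[t] += score[s] * w / total_wt[s]
        new_score = {v: (1.0 - damping) + damping * received[v] for v in vertices}
        max_diff = max(abs(new_score[v] - score[v]) for v in vertices)
        score = new_score
        if max_diff <= max_change:  # converged
            break
    return score
```

For example, weighted_pagerank([("a", "b", 2.0), ("c", "b", 1.0), ("b", "a", 1.0)]) scores b above c, because c has no incoming edges.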

Feel free to use any query from the Graph Algo Library; it should work the same way!
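
If the algorithm is already saved as a .gsql file, a sketch like the one below avoids pasting the whole query into a Python string. It reads the file and sends it together with USE GRAPH in one request. The helper name run_gsql_file is hypothetical, and it assumes only that the connection object has a gsql(text) method, as the pyTigerGraph connection used above does. Whether separate gsql() calls share a graph context may depend on your pyTigerGraph version, so sending both in one call is the cautious choice.

```python
from pathlib import Path


def run_gsql_file(conn, path, graph_name):
    """Send one .gsql file to the server, selecting the graph in the same request.

    `conn` is any object with a gsql(str) method, for example a
    pyTigerGraph TigerGraphConnection. This helper is illustrative only.
    """
    query_text = Path(path).read_text(encoding="utf-8")
    script = f"USE GRAPH {graph_name}\n{query_text}"
    return conn.gsql(script)


# Example use against a real server (assumes pyTigerGraph is installed and
# the credentials match your installation):
#
#   import pyTigerGraph as tg
#   conn = tg.TigerGraphConnection(host="https://<host>.i.tgcloud.io",
#                                  username="tigergraph", password="tigergraph")
#   print(run_gsql_file(conn, "tg_wcc.gsql", "<graphname>"))
```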

@AnuroopAjmera, also for fastRP, be sure to follow the instructions in the algorithm's GitHub README to install the C++ code that the query depends on. Additional instructions and background on deploying UDFs are here

Thanks Zrouga for your response!
2 quick questions -

  1. Where should I run the above statements? In a Jupyter notebook?
  2. In the conn statement, what should I use for “host”, since I am working on my Windows laptop and not in the cloud? Should I use localhost?

Regards

Why won’t the GSQL code given in “tg_fastRP.gsql” work? Why do developers have to do so many things to use the algorithms?

Is TigerGraph a half-baked product?

I don’t know how to code in C++; what should I do?

Regards
Anuroop

@Mohamed_Zrouga @Pawan_Mall

Hi Zrouga,
I used the statement below to connect for the PageRank algo:

conn = tg.TigerGraphConnection(host="localhost", restppPort=9000, gsPort=14240, graphname="NetworkAnomalyDetection", username="tigergraph", password="tigergraph")

With this I could create the algo and see it in TigerGraph, but running it throws the error below, the same one I got for Jaccard and WCC when I installed them directly. So what is the point of using pyTigerGraph?

Semantic Check Fails: The graph null does not exist!

@Vladimir_Slesarev @Pawan_Mall

I used the pyTigerGraph approach but got the same error for fastRP, as shown below:

Type Check Error in query tg_fastRP (TYP-158): line 92, col 53
's.fastrp_embedding' indicates no valid vertex type.
Possible reasons:

- The expression refers to a primary_id, which is not directly
usable in the query body. To use primary_id, declare it as an
attribute. E.g "CREATE VERTEX Person (PRIMARY_ID ssn string, ssn string, age
int)"
- The expression has misspelled an attribute, or a vertex name

Failed to create queries: [tg_fastRP].

@AnuroopAjmera I’ve reached out to the Graph Data Science team at TigerGraph to help with the issue you’re running into. The FastRP algorithm does use a User Defined Function (C++). It could be that the UDF wasn’t installed and so you are having issues with the FastRP implementation. I will have them reach out shortly.

@AnuroopAjmera - the reason you are getting this error is that FastRP assumes your schema has a vertex attribute of type LIST named fastrp_embedding. This lets you store the embeddings on each vertex in the graph, rather than returning them over the REST API or writing them to a file. This is described in the documentation here: https://github.com/tigergraph/gsql-graph-algorithms/tree/master/algorithms/GraphML/Embeddings/FastRP#running-queries. If you do not want to modify your schema, you need to remove this part of the query, specifically lines 90-93:

IF result_attr != "" THEN
		storeEmbeddings = SELECT s FROM verts:s POST-ACCUM s.fastrp_embedding = s.@final_embedding_list;
END;
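
If you would rather not edit the file by hand, the removal can be sketched in Python as below. This assumes the block in your copy of tg_fastRP.gsql looks like the snippet above. The helper name strip_store_embeddings and the regular expression are illustrative, so check the output before installing it.

```python
import re

# Matches the IF block that writes embeddings to the fastrp_embedding attribute.
_STORE_BLOCK = re.compile(
    r'[ \t]*IF\s+result_attr\s*!=\s*""\s*THEN\s+storeEmbeddings\b.*?END;[ \t]*\n?',
    re.DOTALL,
)


def strip_store_embeddings(query_text: str) -> str:
    """Return query_text without the first block that assigns s.fastrp_embedding.

    Hypothetical helper: if the block is not found, the text is returned unchanged.
    """
    return _STORE_BLOCK.sub("", query_text, count=1)
```

With that block removed, the query no longer refers to the fastrp_embedding attribute, which is the reference the TYP-158 error points at.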

Hope this helps, let us know if you have more questions!

@Dan_Barkus @Jon_Herke @Pawan_Mall @Vladimir_Slesarev @Parker_Erickson

Can someone advise on this issue?

@Dan_Barkus @Jon_Herke @Pawan_Mall @Vladimir_Slesarev @Parker_Erickson

Can someone advise on this issue with the Louvain algo?

Thanks @Parker_Erickson , I will try out this solution.

Are we saying that C++ does not need to be dealt with on the user's (my) end?

@Pawan_Mall

Hmm, your Louvain query seems to have been created successfully. You need to issue the INSTALL QUERY command to compile the algorithm, and then you should be able to run it (ignore the float/double comparison warning in interpreted mode).
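
For context on the WARN-5 message quoted earlier in the thread, here is a tiny Python illustration of why exact equality on floating-point values can surprise you, and what a comparison with an error margin looks like. The epsilon value 0.0001 is the one the warning text itself suggests, and the function name nearly_equal is made up for the example.

```python
def nearly_equal(a: float, b: float, epsilon: float = 0.0001) -> bool:
    """Compare two floats with an error margin instead of exact equality."""
    return abs(a - b) < epsilon


total = 0.1 + 0.2
print(total == 0.3)              # → False: exact equality fails due to rounding
print(nearly_equal(total, 0.3))  # → True: the difference is far below epsilon
```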

If you encounter graph name errors, simply add the graph name (FOR GRAPH my_graph1) to the query after the parameter definition, like so:

CREATE QUERY tg_louvain(SET<STRING> v_type, SET<STRING> e_type, STRING wt_attr = "weight", INT max_iter = 10, STRING result_attr = "cid", STRING file_path = "", BOOL print_info = FALSE) FOR GRAPH my_graph1 {
/*
louvain community detection algorithm
add keyword DISTRIBUTED for cluster environment …
}
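
If you have several algorithm files to patch, the same edit can be sketched in Python. The helper below finds the end of the parameter list in the CREATE QUERY header and inserts FOR GRAPH right after it. The name add_for_graph is hypothetical, and the parsing is deliberately simple: it assumes the header is the first CREATE QUERY in the text and that only double-quoted strings appear in the parameter defaults.

```python
import re

# Matches the start of a query header up to the opening parenthesis.
_HEADER = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?(?:DISTRIBUTED\s+)?QUERY\s+\w+\s*\(",
    re.IGNORECASE,
)


def add_for_graph(gsql_text: str, graph_name: str) -> str:
    """Insert 'FOR GRAPH <graph_name>' after the parameter list, if it is missing."""
    match = _HEADER.search(gsql_text)
    if match is None:
        raise ValueError("no CREATE QUERY header found")

    # Walk from the opening parenthesis to its matching close,
    # skipping parentheses that appear inside double-quoted strings.
    depth = 0
    in_string = False
    close_index = None
    for i in range(match.end() - 1, len(gsql_text)):
        ch = gsql_text[i]
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    close_index = i
                    break
    if close_index is None:
        raise ValueError("unbalanced parentheses in CREATE QUERY header")

    rest = gsql_text[close_index + 1:]
    if re.match(r"\s*FOR\s+GRAPH\b", rest, re.IGNORECASE):
        return gsql_text  # already has a graph clause; leave it alone
    return gsql_text[:close_index + 1] + f" FOR GRAPH {graph_name}" + rest
```

Running it twice on the same text leaves the second result unchanged, since the helper skips headers that already carry a FOR GRAPH clause.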