Page-rank algorithm has bug

tg_pagerank

Below algorithm works incorrectly if you have sink nodes.

Sample Graph
image

In tg_algorithm you are going to return 1 for both page 0 & page 1, because you are not considering sink nodes(node with zero outward edges)

As per Wikipedia https://en.wikipedia.org/wiki/PageRank

If a page has no links to other pages, it becomes a sink and therefore terminates the random surfing process. If the random surfer arrives at a sink page, it picks another [URL](https://en.wikipedia.org/wiki/Uniform_Resource_Locator) at random and continues surfing again.

When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly among all other pages.
CREATE QUERY tg_pagerank (STRING v_type="Person", STRING e_type="friend",
 FLOAT max_change=0.001, INT max_iter=5, FLOAT damping=0.85, INT top_k = 100,
 BOOL print_accum = TRUE, STRING result_attr =  "", STRING file_path = "",
 BOOL display_edges = FALSE) SYNTAX V1 {

/*
 Compute the pageRank score for each vertex in the GRAPH
 In each iteration, compute a score for each vertex:
     score = (1-damping) + damping*sum(received scores FROM its neighbors).
 The pageRank algorithm stops when either of the following is true:
 a) it reaches max_iter iterations;
 b) the max score change for any vertex compared to the last iteration <= max_change.
 v_type: vertex types to traverse          print_accum: print JSON output
 e_type: edge types to traverse            result_attr: INT attr to store results to
 max_iter; max #iterations                 file_path: file to write CSV output to
 top_k: #top scores to output              display_edges: output edges for visualization
 max_change: max allowed change between iterations to achieve convergence
 damping: importance of traversal vs. random teleport

 This query supports only taking in a single edge for the time being (8/13/2020).
*/
TYPEDEF TUPLE<VERTEX Vertex_ID, FLOAT score> Vertex_Score;
HeapAccum<Vertex_Score>(top_k, score DESC) @@top_scores_heap;
MaxAccum<FLOAT> @@max_diff = 9999;    # max score change in an iteration
SumAccum<FLOAT> @sum_recvd_score = 0; # sum of scores each vertex receives FROM neighbors
SumAccum<FLOAT> @sum_score = 1;           # initial score for every vertex is 1.
SetAccum<EDGE> @@edge_set;             # list of all edges, if display is needed
FILE f (file_path);

# PageRank iterations	
Start = {v_type};                     # Start with all vertices of specified type(s)
uint start=udf_zetta_start_timer();
WHILE @@max_diff > max_change 
    LIMIT max_iter DO
        @@max_diff = 0;
    V = SELECT s
	FROM Start:s -(e_type:e)- v_type:t
	ACCUM 
            t.@sum_recvd_score += s.@sum_score/(s.outdegree(e_type)) 
	POST-ACCUM 
            s.@sum_score = (1.0-damping) + damping * s.@sum_recvd_score,
	    s.@sum_recvd_score = 0,
	    @@max_diff += abs(s.@sum_score - s.@sum_score');
END; # END WHILE loop
print ("TG - Page Rank Computation "+to_string(udf_zetta_elapsed_time(start)));
# Output
IF file_path != "" THEN
    f.println("Vertex_ID", "PageRank");
END;
V = SELECT s 
    FROM Start:s
    POST-ACCUM 
        IF result_attr != "" THEN 
            s.setAttr(result_attr, s.@sum_score) 
        END,
   
	IF file_path != "" THEN 
            f.println(s, s.@sum_score) 
        END,
   
	IF print_accum THEN 
            @@top_scores_heap += Vertex_Score(s, s.@sum_score) 
        END;

IF print_accum THEN
    PRINT @@top_scores_heap;
    IF display_edges THEN
        PRINT Start[Start.@sum_score];
	Start = SELECT s
	        FROM Start:s -(e_type:e)- v_type:t
	        ACCUM @@edge_set += e;
        PRINT @@edge_set;
    END;
END;
   
  File F("/tmp/tg_out");
   v= SELECT v1 from Start:v1
     ACCUM F.PRINTLN(v1.@sum_score,v1.id);
}**strong text**```