My instance details are below:
- Instance type: r4.4xlarge (AWS EC2: 16 vCPUs, 122 GiB RAM)

Disk usage of the graph store (gstore):
tigergraph@ip-10-0-3-115:~/tigergraph/data/gstore$ du -sh *
16G 0
Below are the partition size, IDS size, vertex count, etc.:
tigergraph@ip-10-0-3-115:~/tigergraph/data$ gstatusgraph
=== graph ===
[GRAPH ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore):
[m1 ] Partition size: 5.7GiB, IDS size: 59MiB, Vertex count: 8300040, Edge count: 344764532, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information
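For a rough sense of graph density, the counts above work out to about 41.5 edges per vertex. A minimal Python sketch of that arithmetic follows. The WARN line says these counts are approximate, and the output does not state whether undirected edges are counted once or twice internally, so treat the figure as indicative only. An average also hides skew: a few very high-degree vertices can dominate query cost.

```python
# Figures copied from the gstatusgraph output above (approximate, per the WARN line).
vertex_count = 8_300_040
edge_count = 344_764_532

# Average number of edges per vertex, taking the reported edge count as-is.
avg_degree = edge_count / vertex_count
print(f"average edges per vertex: {avg_degree:.1f}")
```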
Right now, our queries (an RFM algorithm, entity resolution, etc.) take too long to complete. Any query that contains FOREACH-style iteration is especially likely to time out (the RESTPP timeout is set to 1200 seconds, i.e. 20 minutes).
What configuration would you suggest to improve query performance?
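A common reason two-hop entity-resolution queries slow down is that the number of matched paths grows with the square of the degree of each shared attribute vertex. In the resolve_profile query, every ordered pair of customers attached to the same email, contact, address, or dummy vertex yields a path, and each surviving candidate then becomes an entry the POST-ACCUM FOREACH loops over. The minimal Python sketch that follows (a model, not GSQL) counts those paths, assuming each customer links to a shared vertex by exactly one edge. The degree lists are hypothetical, not measured from this graph, and two_hop_pairs is an illustrative name.

```python
def two_hop_pairs(hub_degrees):
    """Count (s, tt) paths a customer -> shared vertex -> customer pattern enumerates.

    Assumes one edge per customer/shared-vertex pair, so a shared vertex with
    d customer neighbours yields d * d paths, of which d * (d - 1) survive a
    filter such as s.account_no != tt.account_no.
    """
    enumerated = sum(d * d for d in hub_degrees)
    kept = sum(d * (d - 1) for d in hub_degrees)
    return enumerated, kept

# Hypothetical degree distributions (illustrative only).
many_small = [3] * 1_000_000   # one million shared vertices, 3 customers each
one_big_hub = [50_000]         # e.g. a placeholder value shared by 50,000 customers

print(two_hop_pairs(many_small))   # (9000000, 6000000)
print(two_hop_pairs(one_big_hub))  # (2500000000, 2499950000)
```

A single heavily shared value can therefore cost hundreds of times more than a million small groups combined, which is why checking for placeholder emails, phone numbers, or addresses is usually worthwhile before tuning configuration.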
Below is my entity resolution query for similar_profile:
CREATE QUERY resolve_profile(FLOAT threshold = 0.6) FOR GRAPH clientC_retail SYNTAX V2 {
  // Per customer: candidate account_no -> accumulated similarity score
  MapAccum<STRING, SumAccum<FLOAT>> @id_list;

  start = SELECT s FROM AM_CUSTOMER:s;

  // Two hops: customer s -> shared vertex t (email, contact, address or dummy) -> customer tt
  proxy_profiles = SELECT s
    FROM start:s-((customer_email|customer_contact|customer_address|dummy_edge):e)-:t-((customer_email|customer_contact|customer_address|dummy_edge):ee)-AM_CUSTOMER:tt
    WHERE s.account_no != tt.account_no
    ACCUM
      // Every path contributes the weight of the edge between t and tt
      s.@id_list += (tt.account_no -> ee.weight),
      // Paths through a dummy_edge also earn 0.2 each for a matching first name, surname and date of birth
      IF ee.type == "dummy_edge" THEN
        IF trim(s.first_name) != "" AND trim(tt.first_name) != "" AND s.first_name == tt.first_name THEN
          s.@id_list += (tt.account_no -> 0.2)
        END,
        IF trim(s.surname) != "" AND trim(tt.surname) != "" AND s.surname == tt.surname THEN
          s.@id_list += (tt.account_no -> 0.2)
        END,
        IF s.dob == tt.dob THEN
          s.@id_list += (tt.account_no -> 0.2)
        END
      END
    POST-ACCUM
      // Create a same_customer edge for each candidate whose total score reaches the threshold
      FOREACH (neighbor, score) IN s.@id_list DO
        IF score >= threshold THEN
          INSERT INTO same_customer VALUES (s.account_no, neighbor, score)
        END
      END;
}
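To sanity-check which pairs end up above the threshold, the ACCUM/POST-ACCUM scoring in resolve_profile can be mirrored outside TigerGraph. The following is a minimal Python sketch, assuming a hypothetical representation in which each customer is a dict and each matched path is a (tt, edge type of ee, weight of ee) tuple. str.strip stands in for GSQL's trim, and score_candidates is an illustrative name, not part of any TigerGraph API.

```python
from collections import defaultdict


def score_candidates(s, paths, threshold=0.6):
    """Model of resolve_profile's scoring for one source customer s.

    s and each tt are dicts with account_no, first_name, surname and dob.
    paths holds one (tt, ee_type, ee_weight) tuple per matched
    s -e- t -ee- tt path (hypothetical representation).
    """
    id_list = defaultdict(float)  # stands in for MapAccum<STRING, SumAccum<FLOAT>>
    for tt, ee_type, ee_weight in paths:
        if s["account_no"] == tt["account_no"]:
            continue  # mirrors the WHERE clause that drops self-pairs
        key = tt["account_no"]
        id_list[key] += ee_weight
        if ee_type == "dummy_edge":
            for field in ("first_name", "surname"):
                if s[field].strip() and tt[field].strip() and s[field] == tt[field]:
                    id_list[key] += 0.2
            if s["dob"] == tt["dob"]:
                id_list[key] += 0.2
    # POST-ACCUM: keep only candidates whose total reaches the threshold
    return {k: v for k, v in id_list.items() if v >= threshold}


a = {"account_no": "A", "first_name": "Ann", "surname": "Lee", "dob": "1990-01-01"}
b = {"account_no": "B", "first_name": "Ann", "surname": "Kim", "dob": "1990-01-01"}
c = {"account_no": "C", "first_name": "Bob", "surname": "Lee", "dob": "1985-05-05"}
paths = [(b, "customer_email", 0.5), (b, "dummy_edge", 0.0), (c, "customer_address", 0.3)]
print(score_candidates(a, paths))
```

Because scores add up per path, a pair of customers linked through several shared vertices accumulates several edge weights. High-degree shared vertices therefore influence the resulting scores as well as the runtime.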