Instance configuration

My instance details are below:

  1. Instance type: r4.4xlarge

  2. Data size on disk:

     tigergraph@ip-10-0-3-115:~/tigergraph/data/gstore$ du -sh *
     16G 0

Below are my partition size, IDS size, vertex count, etc.:

tigergraph@ip-10-0-3-115:~/tigergraph/data$ gstatusgraph
=== graph ===
[GRAPH  ] Graph was loaded (/home/tigergraph/tigergraph/data/gstore):
[m1     ] Partition size: 5.7GiB, IDS size: 59MiB, Vertex count: 8300040, Edge count: 344764532, NumOfDeletedVertices: 0 NumOfSkippedVertices: 0
[WARN   ] Above vertex and edge counts are for internal use which show approximate topology size of the local graph partition. Use DML to get the correct graph topology information

Right now we run an RFM algorithm, entity resolution, and similar queries, all of which take too long to complete. If any of these algorithms contains a "foreach" loop, the query is much more likely to time out. (The RESTPP timeout is set to 1200 seconds, i.e. 20 minutes.)

What configuration would you suggest to improve query performance?

Below is my entity resolution query for similar_profile:


CREATE QUERY resolve_profile(FLOAT threshold = 0.6) FOR GRAPH clientC_retail SYNTAX V2 {
    // Per-vertex map: neighbor account_no -> accumulated similarity score
    MapAccum<STRING, SumAccum<FLOAT>> @id_list;
    MapAccum<STRING, ListAccum<STRING>> @debug;
    SetAccum<EDGE> @@edges;
    MapAccum<STRING, FLOAT> @@same_profiles;
    SumAccum<FLOAT> @score;

    // Seed set: every AM_CUSTOMER vertex
    #start = {profile_id};
    #start = {AM_CUSTOMER.*};
    start = SELECT s FROM AM_CUSTOMER:s;
    #PRINT start.size();
    all_profiles = {AM_CUSTOMER.*};
    #start = select s from start:s accum s.@id_list+=s.account_no;

    // Two-hop pattern: customer -> shared attribute vertex -> other customer
    proxy_profiles = SELECT s FROM start:s-((customer_email|customer_contact|customer_address|dummy_edge):e)-:t-((customer_email|customer_contact|customer_address|dummy_edge):ee)-AM_CUSTOMER:tt
        WHERE s.account_no != tt.account_no
        ACCUM
            IF s.account_no != tt.account_no THEN
                // Add the weight of the second-hop edge for every matched path
                s.@id_list += (tt.account_no -> ee.weight),
                #s.@debug += (tt.account_no->ee.type+"-"+to_string(ee.weight)+"er"),
                CASE ee.type
                    WHEN "dummy_edge" THEN
                        // On dummy_edge paths, add 0.2 per matching attribute
                        IF trim(s.first_name) != "" AND trim(tt.first_name) != "" AND (s.first_name == tt.first_name) THEN
                            s.@id_list += (tt.account_no -> 0.2)
                            #s.@debug += (tt.account_no->ee.type+"-"+to_string(0.2) +"io"+ s.account_no +"-"+ tt.account_no)
                        END,
                        IF trim(s.surname) != "" AND trim(tt.surname) != "" AND (s.surname == tt.surname) THEN
                            s.@id_list += (tt.account_no -> 0.2)
                            #s.@debug += (tt.account_no->ee.type+"-"+to_string(0.2) +"io"+ s.account_no +"-"+ tt.account_no)
                        END,
                        IF s.dob == tt.dob THEN
                            s.@id_list += (tt.account_no -> 0.2)
                            #s.@debug += (tt.account_no->ee.type+"-"+to_string(0.2) +"io"+ s.account_no +"-"+ tt.account_no)
                        END
                END
            END
        POST-ACCUM
            // Insert a same_customer edge for every neighbor at or above the threshold
            FOREACH (neighbor, score) IN s.@id_list DO
                IF score >= threshold THEN
                    INSERT INTO same_customer VALUES (s.account_no, neighbor, score)
                END
            END;
    #print @@id_list;

    #PRINT proxy_profiles;
}
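
For anyone who does not read GSQL, the scoring this query is meant to perform can be sketched roughly in plain Python. This is only an illustration of the logic, not TigerGraph code: `score_pairs`, `MATCH_BONUS`, and the shapes of `paths` and `profiles` are hypothetical names I made up for the sketch, and it assumes each matched two-hop path is available as a tuple `(s, tt, edge_type, weight)` keyed by account_no.

```python
from collections import defaultdict

# Bonus added per matching attribute on a dummy_edge path (0.2 in the query).
MATCH_BONUS = 0.2


def score_pairs(paths, profiles, threshold=0.6):
    """Sum a similarity score per (s, tt) pair and keep pairs >= threshold.

    paths:    iterable of (s, tt, edge_type, weight), one per matched two-hop path
    profiles: dict account_no -> {"first_name": ..., "surname": ..., "dob": ...}
    """
    scores = defaultdict(float)
    for s, tt, edge_type, weight in paths:
        if s == tt:
            continue  # same check as s.account_no != tt.account_no
        scores[(s, tt)] += weight  # weight of the second-hop edge
        if edge_type == "dummy_edge":
            a, b = profiles[s], profiles[tt]
            # Names count only when both are non-blank and exactly equal.
            for field in ("first_name", "surname"):
                if a[field].strip() and b[field].strip() and a[field] == b[field]:
                    scores[(s, tt)] += MATCH_BONUS
            # dob is compared as-is, with no blank check, as in the query.
            if a["dob"] == b["dob"]:
                scores[(s, tt)] += MATCH_BONUS
    # Equivalent of the POST-ACCUM FOREACH: keep only pairs at or above the threshold.
    return {pair: sc for pair, sc in scores.items() if sc >= threshold}
```

The point of the sketch is that the work grows with the number of matched two-hop paths, not with the number of customers, which is where I suspect the time goes.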