Query optimization

Hi team, below is my query for resolving similar profiles. It works perfectly when we run it on 4-5 customers,
but running it on 385,000 customers times out (we set the RESTPP timeout to 1200 seconds).

  1. Is there any way to optimize this logic?
  2. If we also want to consider customer attributes (“date_of_birth” and “first_name”), with these weights:
    date_of_birth: 0.2
    first_name: 0.2
    how can we add that to this logic?

CREATE DISTRIBUTED QUERY test(float threshold = 0.6) FOR GRAPH clientC_retail syntax V2{ 
                /* Write query logic here */ 
                //start = {Profile.*};
                MapAccum<STRING, SetAccum<STRING>> @@id_list;
                SetAccum<EDGE> @@edges;
                MapAccum<STRING, FLOAT> @@same_profiles;
                SumAccum<float> @score;
                start = {AM_CUSTOMER.*};
                all_profiles = {AM_CUSTOMER.*};
                
                proxy_profiles = SELECT s FROM start:s-((customer_email|customer_contact|customer_address):e)-:t-((customer_email|customer_contact|customer_address):ee)-AM_CUSTOMER:tt 
                where s.account_no!=tt.account_no accum 
                 IF s.account_no!=tt.account_no  THEN
                 @@id_list += (s.account_no->tt.account_no)
                 END;
                #print @@id_list;
  
                FOREACH (root_customer,childs) IN @@id_list DO
                  related_profiles = select s from all_profiles:s where s.account_no in childs;
                  
                  related_profiles = SELECT s
                         FROM related_profiles:s-((customer_email|customer_contact|customer_address):e)-:t-((customer_email|customer_contact|customer_address):ee)-AM_CUSTOMER:tt 
                         where s!=tt AND tt.account_no!="0"
                         ACCUM CASE ee.type
                            WHEN "customer_email" THEN @@same_profiles += (tt.account_no->0.6)
                            WHEN "customer_contact" THEN @@same_profiles += (tt.account_no->0.6) 
                            WHEN "customer_address" THEN @@same_profiles += (tt.account_no->0.3) 
                            #WHEN "profile_address" THEN s.@same_profiles += (tt->0.1) 
                            #WHEN "profile_device" THEN s.@same_profiles += (tt->0.5) 
                         end;
                   #PRINT related_profiles;
                   #PRINT @@same_profiles.get(root_customer);
                   #PRINT @@same_profiles.get(root_customer) >= threshold;
                   related_profiles = select s from related_profiles:s
                                POST-ACCUM
                                if @@same_profiles.get(root_customer) >= threshold then
                                insert into same_customer values(root_customer, s, @@same_profiles.get(root_customer))
                                end;
                                
                END;

          }

Please advise.

Thanks

You are making this far more complicated than it needs to be by putting everything in a global accumulator. Try this:

CREATE DISTRIBUTED QUERY test(float threshold = 0.6) FOR GRAPH clientC_retail syntax V2{ 
                MapAccum<STRING, FLOAT> @score;
                start = {AM_CUSTOMER.*};
                all_profiles = {AM_CUSTOMER.*};
                
                proxy_profiles = SELECT s 
                  FROM start:s-((customer_email|customer_contact|customer_address):e)-:t-((customer_email|customer_contact|customer_address):ee)-AM_CUSTOMER:tt 
                  where s.account_no!=tt.account_no 
                  ACCUM CASE ee.type
                            WHEN "customer_email" THEN s.@score += (tt.account_no->0.6)
                            WHEN "customer_contact" THEN s.@score += (tt.account_no->0.6) 
                            WHEN "customer_address" THEN s.@score += (tt.account_no->0.3) 
                        END
                  POST-ACCUM
                            FOREACH (x,y) IN s.@score DO
                               if y >= threshold then
                                insert into same_customer values(x, s, y)
                               end
                            END;
}

Thanks @markmegerian, this works perfectly. My question now is how to handle attributes.
Suppose we want to run the same logic on “dob”, “first_name” and “last_name” too.
How can we do that? The two-hop traversal in the solution above only works when the shared value is a vertex type:

proxy_profiles = SELECT s
FROM start:s-((customer_email|customer_contact|customer_address):e)-:t-((customer_email|customer_contact|customer_address):ee)-AM_CUSTOMER:tt
where s.account_no!=tt.account_no

What I have in mind is this:
we create a new edge (say, dummy_edge) from AM_CUSTOMER to AM_CUSTOMER,
connect every customer to every other customer,
and then use that “dummy_edge” in the logic.

But I don’t think this is an efficient solution.
What do you suggest?

This is not quite as elegant as the above, but it should work.

SetAccum<VERTEX<AM_CUSTOMER>> @@pps;
  proxy_profiles = SELECT s 
                  FROM start:s-((customer_email|customer_contact|customer_address):e)-:t-((customer_email|customer_contact|customer_address):ee)-AM_CUSTOMER:tt 
                  where s.account_no!=tt.account_no 
                  ACCUM CASE ee.type
                            WHEN "customer_email" THEN s.@score += (tt.account_no->0.6)
                            WHEN "customer_contact" THEN s.@score += (tt.account_no->0.6) 
                            WHEN "customer_address" THEN s.@score += (tt.account_no->0.3) 
                        END
                  POST-ACCUM
                             @@pps += s;

  FOREACH pp IN @@pps DO
  
     x = SELECT a FROM AM_CUSTOMER:a
            WHERE a.firstname == pp.firstname and a.lastname == pp.lastname
         ACCUM  pp.@score += (a.account_no -> 0.2);
  END;

P2 = SELECT p FROM proxy_profiles:p
                     ACCUM 
                            FOREACH (x,y) IN p.@score DO
                               if y >= threshold then
                                insert into same_customer values(x, p, y)
                               end
                            END;

Thanks @markmegerian, this works.

In the returned results, how can we traverse the same_customer edge
to get the unification numbers, i.e. how many customers are the same?

Now that you have the same_customer edge, can’t you just traverse it?

@markmegerian Actually, I got stuck in the traversal of this edge. Please help

I posted a separate question that reproduces the problem with a different example:

How to traverse same edge iteratively between multiple vertices - GSQL / Queries - TigerGraph
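
For the simpler part of the question (how many customers each customer was matched to), a one-hop traversal over same_customer may be enough. The following is a minimal, untested sketch. The query name count_same_customers and the accumulator @same_count are made up for illustration, and it assumes same_customer is an undirected edge between AM_CUSTOMER vertices. Grouping customers that are linked through several hops is a different problem and is not covered here.

```gsql
CREATE QUERY count_same_customers() FOR GRAPH clientC_retail SYNTAX V2 {
  // Hypothetical accumulator: number of same_customer links per customer.
  SumAccum<INT> @same_count;

  start = {AM_CUSTOMER.*};

  // Assumption: same_customer is undirected.
  // For a directed edge the pattern would be -(same_customer>:e)- instead.
  linked = SELECT s
           FROM start:s -(same_customer:e)- AM_CUSTOMER:t
           ACCUM s.@same_count += 1;

  // Customers that have at least one match, then the per-customer counts.
  PRINT linked.size() AS customers_with_matches;
  PRINT linked[linked.account_no, linked.@same_count];
}
```

Customers with no same_customer edge do not appear in linked, so the first PRINT counts only customers that were unified with at least one other.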

Hi @markmegerian,
when I started typing this solution into my actual TG Studio,
I found that VERTEX-type accumulators store only primary IDs.
So the problem is that we can’t access
pp.firstname, pp.lastname or pp.@score
inside the FOREACH loop.

Is there any way to declare the accumulator so that it stores the complete vertex data, including its attributes?
SetAccum<VERTEX<AM_CUSTOMER>> @@pps;
Internally it is stored like this: {“MA02727”, “MA561625”…}
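
One possible way around this is to avoid reading attributes from the loop variable at all and do the comparison inside SELECT statements, where the vertex alias exposes its attributes and accumulators. The fragment below is an untested sketch meant to sit inside the query body. It assumes the attributes are named firstname and lastname as in the reply above, @@by_name is a hypothetical helper accumulator, and 0.2 is the weight from the original question. If only the vertices in @@pps should be scored, a vertex set can be seeded from the accumulator (pp_set = {@@pps};) and used in place of start.

```gsql
// "first|last" -> account numbers of every customer with that name (hypothetical helper)
MapAccum<STRING, SetAccum<STRING>> @@by_name;
// Same per-vertex score map as in the reply above
MapAccum<STRING, FLOAT> @score;

start = {AM_CUSTOMER.*};

// Pass 1: bucket every customer by name.
buckets = SELECT a FROM start:a
          ACCUM @@by_name += (a.firstname + "|" + a.lastname -> a.account_no);

// Pass 2: a is bound by the SELECT, so its attributes and @score are accessible.
scored = SELECT a FROM start:a
         POST-ACCUM
           FOREACH other IN @@by_name.get(a.firstname + "|" + a.lastname) DO
             IF other != a.account_no THEN
               a.@score += (other -> 0.2)
             END
           END;
```

This replaces one SELECT per customer with two passes over AM_CUSTOMER. Customers with empty names would all land in the same bucket, so filtering those out with a WHERE clause is probably needed before using it on real data.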