Query loading continuously

pkr2 · March 9, 2022, 2:44pm

@markmegerian
Thanks for your all support, I got your concern that updating relationships and deleting edges constantly is not a good practice, But for the demo sake I need to do, to persist the changes and convey my messages in the demo, and become a problem solver in every criteria, but in production not going to this one constantly
actually I am doing some demos for my client, I want to proposed this Tiger Graph to my client for our next project or next product.
Right now I saw a weird behavior in my TG cloud Account, I have one GSQL which one is working perfect

CREATE QUERY segment_calculate(INT xParams, INT yParams) FOR GRAPH supply_chain { 
  /* Write query logic here */ 
  SumAccum<DOUBLE> @@total_margin;
  SumAccum<DOUBLE>@d_margin_penetration;
  
  ListAccum<DOUBLE> @@margin_penetration; 
  SumAccum<INT>@lesser_then_margin;
  SumAccum<DOUBLE>@yAxis;
  
  SetAccum<VERTEX<customer>> @product_customers;
  SetAccum<VERTEX<customer>> @@total_customers;
  
  ListAccum<DOUBLE> @@customer_penetration; 
  SumAccum<INT>@lesser_then;
  SumAccum<DOUBLE>@xAxis;
  
  SumAccum<string>@customer_group;
  SumAccum<string>@margin_group;
  SumAccum<string>@segment_group;
  start = {Product.*};
  
  products_purchased = select s from start:s-(customer_product:e)-customer:t ACCUM s.@product_customers+=t, @@total_customers+=t
  post-ACCUM(s) @@customer_penetration +=  s.d_customer_penetration;
  
  
  products_purchased = select s from products_purchased:s accum  @@total_margin+=s.d_margin
  POST-ACCUM(s) s.d_margin_penetration=s.d_margin/@@total_margin,
  s.@d_margin_penetration=s.d_margin/@@total_margin,
  @@margin_penetration+=s.d_margin/@@total_margin;
  
  PRINT @@total_margin;
  

# Calculate Percentile
  products_purchased = select s from products_purchased:s accum 
  FOREACH attr in @@customer_penetration DO
            IF s.d_customer_penetration > attr THEN s.@lesser_then += 1
            END
       end, 
  FOREACH attr in @@margin_penetration DO
            IF s.@d_margin_penetration > attr THEN s.@lesser_then_margin += 1
            END
       end 
  
  post-ACCUM(s) s.@xAxis=s.@lesser_then*1.0/@@customer_penetration.size()*100,
  s.@yAxis=s.@lesser_then_margin*1.0/@@margin_penetration.size()*100;
  
   z = select t from products_purchased:s-(popularity_index:pi)-Popularity_Index:t ACCUM delete(pi);
  z = select t from products_purchased:s-(margin_index:mi)-Margin_Index:t ACCUM delete(mi);
  z = select t from products_purchased:s-(product_segment:ps)-Segment:t ACCUM delete(ps);
  
  
**here I use divide and Conquer strategy I just use only above GSQL **
**and remove below part the QUERY and it would take almost **
**1 second to execute it successfully but when I use below code with above code it continuously loading and gives **
**The query didn't finish because it exceeded the query timeout threshold (60 seconds). 
Please check GSE log for license expiration and RESTPP/GPE log with request id (131081.RESTPP_1_1.1646799783472.N) for details. 
Try increase RESTPP.Factory.DefaultQueryTimeoutSec or add header GSQL-TIMEOUT to override default system timeout. 
even though Products are not that much in the database Total Products in DB : 9871,  and only 110 products which one having complete data to traverse these edges 
Whats wrong in below code ? why this taking longer time even though data is not that much, and after two , three time failure one time comes this will complete execute in 1 second**

   products_purchased = select s from products_purchased:s ACCUM CASE WHEN s.@xAxis > xParams THEN 
  
  s.@customer_group += "High",
  INSERT INTO popularity_index (FROM, TO)
VALUES (s.id, "High")
  END,
  
  
  CASE WHEN s.@xAxis < xParams THEN 
  s.@customer_group += "Low",
  INSERT INTO popularity_index (FROM, TO)
VALUES (s.id, "Low")
  END,
  
  
  CASE WHEN s.@yAxis > yParams THEN 
  
  s.@margin_group += "High", 
  INSERT INTO margin_index (FROM, TO)
VALUES (s.id, "High")
  END,
  
  
  CASE WHEN s.@yAxis < yParams THEN 
  
  s.@margin_group += "Low", 
  INSERT INTO margin_index (FROM, TO)
VALUES (s.id, "Low")
  
  END;
  
  result = select s from products_purchased:s ACCUM CASE WHEN s.@margin_group == "High" and s.@customer_group == "High" THEN 
  s.@segment_group += "High Margin Popular Products",
  INSERT INTO product_segment (FROM, TO)
VALUES (s.id, "High Margin Popular Products")
  END,
  
  CASE WHEN s.@margin_group == "High" and s.@customer_group == "Low" THEN
  s.@segment_group += "High Margin Low Products",
  INSERT INTO product_segment (FROM, TO)
VALUES (s.id, "High Margin Low Products")
  END,
  
  CASE WHEN s.@margin_group == "Low" and s.@customer_group == "High" THEN
  s.@segment_group += "Low Margin Popular Products",
  INSERT INTO product_segment (FROM, TO)
VALUES (s.id, "Low Margin Popular Products")
  END,
  
  CASE WHEN s.@margin_group == "Low" and s.@customer_group == "Low" THEN
  s.@segment_group += "Need Improvement",
  INSERT INTO product_segment (FROM, TO)
VALUES (s.id, "Need Improvement")
  END;
  
  PRINT result; 
  
}

I want to persist the result that’s why we as a team doing this , so that we can explore it into the explore section, and when I removed all those insert edges from GSQL code and its working perfect and in effective time, but making edges in that much frequently manner, It seems its not that easily possible in this . If this is the Issue , If I will make the LOADING JOB and insert millions of data into the GRAPH that time ?

pkr2 · March 9, 2022, 4:23pm

hi I run a separate Query and create multiple edges and this will giving the response in proper time in their first execution, and after second execution it will take too much time third time execution giving me this one :The query didn’t finish because it exceeded the query timeout threshold (60 seconds). Please check GSE log for license expiration and RESTPP/GPE log with request id (65541.RESTPP_1_1.1646843255951.N) for details. Try increase RESTPP.Factory.DefaultQueryTimeoutSec or add header GSQL-TIMEOUT to override default system timeout.

CREATE QUERY ucase(INT xParams) FOR GRAPH supply_chain syntax v2{ 
  /* Write query logic here */ 
   start = {Product.*};
  
   z = select t from start:s-(popularity_index:pi)-Popularity_Index:t ACCUM delete(pi);
  z = select t from start:s-(margin_index:mi)-Margin_Index:t ACCUM delete(mi);
  
  products_purchased = select s from start:s ORDER BY s.id ASC LIMIT 5000;
  #PRINT products_purchased;
  
  products_purchased = select s from products_purchased:s ACCUM 
  INSERT INTO popularity_index (FROM, TO)
VALUES (s.id, "High"),
  INSERT INTO margin_index (FROM, TO)
VALUES (s.id, "High");
  
}

markmegerian · March 9, 2022, 4:56pm

Here are a couple of ideas

1 - what are the attributes of the popularity_index and the margin_index? does it make a difference if you omit the ( FROM , TO) on the insert statement and explicitly provide all attributes?

2 - I am wondering about using the STRING value “High” for all of the inserts. I would suggest getting an actual vertex and using that in the insert - something like this

Vertex<Popularity_Index> pi;
P = {Popularity_Index.*};
P1 = SELECT p FROM P:p WHERE p.indexVal == “High”
POST-ACCUM pi = p;

products_purchased = select s from products_purchased:s ACCUM
INSERT INTO popularity_index (FROM, TO)
VALUES (s, pi);

Notice in the above example, we are using the vertex itself on the insert, not a string or attribute.

pkr2 · March 9, 2022, 6:25pm

@markmegerian
Still same issue coming
The query didn’t finish because it exceeded the query timeout threshold (60 seconds). Please check GSE log for license expiration and RESTPP/GPE log with request id (16777217.RESTPP_1_1.1646850909238.N) for details. Try increase RESTPP.Factory.DefaultQueryTimeoutSec or add header GSQL-TIMEOUT to override default system timeout.

Popularity_Index vertex having id as attribute which is not set as true in my schema , I set it to as true

Margin_Index vertex having id as attribute which is not set as true in my schema , I set it to as true

and I made the query which you assist in me

CREATE QUERY ucase(INT xParams) FOR GRAPH supply_chain syntax v2{ 
  /* Write query logic here */ 
   start = {Product.*};
  Vertex<Popularity_Index> pi;
P = {Popularity_Index.*};
PK = SELECT p FROM P:p WHERE p.id == "High"
POST-ACCUM pi = p;
    
   z = select t from start:s-(popularity_index:pi_indx)-Popularity_Index:t ACCUM delete(pi_indx);
  z = select t from start:s-(margin_index:mar_indx)-Margin_Index:t ACCUM delete(mar_indx);
  

  products_purchased = select s from start:s ORDER BY s.id ASC LIMIT 120;
  #PRINT products_purchased;
  
  products_purchased = select s from products_purchased:s ACCUM 
  INSERT INTO popularity_index (FROM, TO)
VALUES (s.id, pi);
  #INSERT INTO margin_index (FROM, TO)
#VALUES (s.id, M1);
  
}

It seems while inserting it would take too long time,
1st time executed in one second, 2nd time executed in one second,
3rd time 60 seconds error
How we will debug this one

pkr2 · March 11, 2022, 5:01pm

hi @markmegerian can you please help me in multi edge traversal for one single vertex

I want all products
[
{“id”: 1, “name”: “Wings”,
“@product_family”: “CardLocks”,
“@product_type”: “Electronics”,
“@product_detail1”: “Classic”,
“@product_detail2”: “W/O Reader”
},
{“id”: 2, “name”: “h-Wings”,
“@product_family”: “OpenerLocks”,
“@product_type”: “ECR”,
“@product_detail1”: “Classic”,
“@product_detail2”: “No Reader”
}]

product_family contains: id and product_family_name
product_type conatins: id and product_type_name
product_detail1: id and product_detail
product_detail2: id and product_detail

Edge contains No attributes

markmegerian · March 11, 2022, 6:54pm

If you always have a 1:1 relationship between product and family, family and type, type and detail, etc. then you can just have a simple ACCUM like this

SYNTAX V2

MaxAccum< STRING > @product_family, @product_type, @product_detail1, @product_detail2;
ALL_PRODS = SELECT p FROM Product:p -(product_belongs_family>)- product_family:pf  -(family_type>) - product_type:pt - (type_detail_1>) - product_detail_1:pd1 -(detail_1_detail_2>)- product_detail_2:pd2
   ACCUM p.@product_family += pf.product_family_name, 
                 p.@product_type += pt.product_type_name, 
                 p.@product_detail1 += pd1.product_detail, 
                 p.@product_detail2 += pd2.product_detail;

PRINT ALL_PRODS;

Note that this uses MaxAccum since the 1:1 relationship only has one value for the accumulator. If you have a 1:many relationship, such as a type that can have multiple details, then you could easily change the detail to a SetAccum.

Since I try to avoid hard-coded limits, the one suggestion I would have is to just have details connected to type, and not enforce the detail 1 and detail 2 arrangement.

pkr2 · March 16, 2022, 2:01pm

hi @markmegerian
above solutions worked for me, Thanks
right now I have one question just to built the optimized schema or correct schema so our queries work very well on this scenario

Requirement is I want to store like this:- Every Profile having their own calendar of 365 days, means 365 edges for particular and their corresponding status so we can get the what is the status of today and yesterday, previous month, and so on of particular profile, Right Now every vertex having “id” attribute and calendar vertex having calender_id, day, date, fis_week_id, month, year, fis_week_epoch, festival, weekend. So do i need to make more one vertex in between of Profile and calender? what is the better approach to achieve this?
Direct to calender seems wrong me in this case

markmegerian · March 16, 2022, 2:44pm

I think I would use a slightly different approach. There is a very nice DATETIME attribute, which will give you all of the functions you need, like year( ), month( ) , datetime_to_epoch, datetime_add, datetime_sub, etc. This is very powerful when doing date arithmetic. If you had it broken out by day and date, month and year as separate attributes, the date arithmetic would be too complicated.

Lets say you have a Profile vertex and its connected to an Event vertex, and the Event vertex has a DATETIME attribute called eventDate. You would easily get events in the past year for a specific Profile like this:

E = SELECT e FROM startprofile -(_>) Event:e
WHERE e.eventDate >= datetime_sub(now(), INTERVAL 1 YEAR);

I don’t see the benefit of having 365 “days” all created as a separate vertex.

pkr2 · March 16, 2022, 3:53pm

hi @markmegerian thanks for your reply

Now its perfect? this one is right now ?
I don’t see the benefit of having 365 “days” all created as a separate vertex.
Because of our use case we need to store last one year of status of a particular profile

pkr2 · March 21, 2022, 2:36pm

@markmegerian

this one is giving me this warning , but as I checked the GSQL runs well, can you please guide how to solve this (to get rid from this error)?

xParams, yParams is Double from parameter
CREATE QUERY segment_calculate(DOUBLE xParams, DOUBLE yParams, STRING pmline)

markmegerian · March 21, 2022, 2:57pm

That is just a warning, so you can decide for yourself if you want to address it.

Its telling you that any comparisons between DOUBLE, where there is either == or >= comparison operator, is inherently inexact. So you could change it to purely < or > , you could just ignore the warning, or you could actually put in the suggested fix, where you declare FLOAT epsilon = 0.0001 and use it as suggested in the message.

pkr2 · March 23, 2022, 2:16pm

hi @markmegerian
can you please let me know is this one is the good approach to do this?
product-(product_category)-category
product-(product_subcategory)-subcategory
product-(product_department)-department

markmegerian · March 24, 2022, 2:33am

Looks good. Its worth mentioning that the main reason to separate a particular value into its own vertex is to optimize your query efficiency. In the relational DB world, you could leave department, category and subcategory as columns in the product table, and just create all combinations of indexes.

Its far better in a knowledge graph to isolate the low cardinality values into their own vertex. So in your case, when you want to query for all the products with a certain category, or department, it will perform extremely well, even on a large data set, because you will follow the edges from department to product.

For this reason, I would always advise against a query like this

Prods = SELECT p FROM product:p -(product_department>)- department:d
WHERE d.deptName == “deptXYX”

Since this is not taking full advantage of the graph structure. Something like this is much better

Dept = SELECT d FROM department:d WHERE d.deptName == “deptXYZ”
Prods = SELECT p FROM Dept -(_>)- product:p;