GSQL type inference buggy

luyilun32661 · September 7, 2021, 10:00am

Hi TG team,

When I run the following query:

SetAccum<VERTEX<Claim>> @claims;
  SetAccum<STRING> @vins;
  INT n_secs;
  n_secs = n_days*24*60*60+1;
  
  _t0 = SELECT s FROM Claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
              ACCUM s.@vins += t.vin;
  
  _t1 = SELECT c1 FROM Claim:c1 - (<CAR_ASSOCIATE_WITH.CAR_ASSOCIATE_WITH>) - Claim:c2
            ACCUM 
              IF c1 != c2 AND count(c1.@vins INTERSECT c2.@vins) > 1 
                THEN
                  c1.@claims += c2 END
            POST-ACCUM c1.@claims += c1;

It gave me this error message:

Error: Left expression ‘@@gsqlpe_src_attrib_install_map__3__rval__vins.get ( c1 )’ is a set expression of vertex type , but right expression ‘c2.@vins’ is not a set expression of vertex. A set operator cannot be applied to such combination.

But I have defined @vins as a SetAccum<STRING>, not a set of vertices. Is there a workaround? Thanks!

Leo_Shestakov · September 7, 2021, 2:25pm

Hello,

This certainly seems to be unintended behavior (the error message exposes internal code). I was able to reproduce the error in another query with a different schema, and I will report it to the team while I look for a workaround.

Leo_Shestakov · September 7, 2021, 6:55pm

Hi again,

I have worked out that the error comes from an incompatibility between the INTERSECT operator and GSQL Syntax v2 (which you must be using, given your multi-hop SELECT statement).

Since all Syntax v2 statements are internally broken up into multiple v1 statements, this is probably causing difficulties for the INTERSECT operator since it works with vertices from different v1 statements that are internally separated.

As a workaround, you can use a DML-sub FOREACH loop to mimic the INTERSECT operator, like this:

  SetAccum<VERTEX<Claim>> @claims;
  SetAccum<STRING> @vins;
  SumAccum<INT> @count;
  
  _t0 = SELECT s FROM Claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
              ACCUM s.@vins += t.vin;
            
  _t1 = SELECT c1 FROM Claim:c1 - (<CAR_ASSOCIATE_WITH.CAR_ASSOCIATE_WITH>) - Claim:c2
            ACCUM IF c1 != c2 THEN 
                    FOREACH vin IN c2.@vins DO
                      IF c1.@vins.contains(vin) THEN c1.@count += 1 END,
                      IF c1.@count > 1 THEN c1.@claims += c2 END
                    END
                  END, c1.@count = 0
            POST-ACCUM c1.@claims += c1;

Note: the vertex-attached @count is reset after every successful use.
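
To make the intent of the loop concrete, here is a minimal Python sketch (not GSQL) of the same counting logic, checked against a plain set intersection. The function name `shares_more_than_one` and the sample VINs are made up for illustration, and a local counter stands in for the vertex-attached @count; it models the logic only and says nothing about how GSQL evaluates accumulators.

```python
def shares_more_than_one(c1_vins, c2_vins):
    """Return True if the two sets share more than one element.

    Mirrors the FOREACH idea: walk one set, test membership in the
    other, and stop as soon as the count of matches exceeds 1.
    """
    count = 0  # local stand-in for the per-vertex counter (assumption)
    for vin in c2_vins:
        if vin in c1_vins:
            count += 1
            if count > 1:
                return True
    return False


# Hypothetical sample data, for illustration only.
a = {"VIN1", "VIN2", "VIN3"}
b = {"VIN2", "VIN3", "VIN9"}
c = {"VIN3", "VIN7"}

# The loop agrees with the direct set intersection it replaces.
assert shares_more_than_one(a, b) == (len(a & b) > 1)
assert shares_more_than_one(a, c) == (len(a & c) > 1)
```

Here `a` and `b` share two VINs, so the pair qualifies, while `a` and `c` share only one and do not.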

luyilun32661 · September 8, 2021, 2:57am

Hi Ishestakov,

Thanks for the reply. It turns out that my actual use case is a little more complicated than my previous simplified example. The actual code looks something like this (note that it throws the same error):

  SetAccum<VERTEX<Claim>> @claims;
  SetAccum<STRING> @vins, @plates, @ids, @names;
  INT n_secs;

  n_secs = n_days*24*60*60+1;
 
  _t0 = SELECT s FROM claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
              ACCUM s.@vins += t.vin,
                    s.@plates += t.plate_no;

  _t1 = SELECT c1 FROM claim:c1 - (<CAR_ASSOCIATE_WITH.CAR_ASSOCIATE_WITH>) - claim:c2
            ACCUM 
              IF (count(c1.@plates INTERSECT c2.@plates) > 1 OR count(c1.@vins INTERSECT c2.@vins) > 1) AND
                 c1 != c2 THEN
                  c1.@claims += c2 END
            POST-ACCUM c1.@claims += c1;

I then modified it to:

 SetAccum<VERTEX<Claim>> @claims;
 SetAccum<STRING> @vins, @plates, @ids, @names;
 SumAccum<INT> @vin_cnt, @plate_cnt;
 INT n_secs;
 n_secs = n_days*24*60*60+1;
     
_t0 = SELECT s FROM claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
                  ACCUM s.@vins += t.vin,
                        s.@plates += t.plate_no;

_t1 = SELECT c2 FROM claim:c1 - (<CAR_ASSOCIATE_WITH.CAR_ASSOCIATE_WITH>) - claim:c2
            ACCUM IF c1 != c2 AND abs(datetime_diff(c1.accident_time, c2.accident_time)) < n_secs THEN 
                    FOREACH vin IN c2.@vins DO
                      FOREACH plate IN c2.@plates DO
                        IF c1.@vins.contains(vin) THEN c1.@vin_cnt += 1 END,
                        IF c1.@plates.contains(plate) THEN c1.@plate_cnt += 1 END,
                        IF c1.@vin_cnt > 1 OR c1.@plate_cnt > 1 THEN c1.@claims += c2, BREAK END
                      END
                    END
                  END, c1.@vin_cnt = 0, c1.@plate_cnt = 0
            POST-ACCUM c1.@claims += c1;

(P.S. I added a BREAK inside the FOREACH loop to stop iterating once the condition is satisfied.)

But now it throws the error:

an undefined variable plate in the current scope

which seems to be another Syntax v2 incompatibility.
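
Independently of the compile error, nesting the two loops changes what is counted: the vin check runs once per (vin, plate) pair, so a single shared vin is counted once for every plate, and vice versa. The Python sketch below (not GSQL) models this, ignoring the early BREAK; the function names and sample values are hypothetical.

```python
def nested_counts(c1_vins, c1_plates, c2_vins, c2_plates):
    """Count matches the way two nested loops would."""
    vin_cnt = plate_cnt = 0
    for vin in c2_vins:
        for plate in c2_plates:
            if vin in c1_vins:
                vin_cnt += 1    # repeated once for every plate
            if plate in c1_plates:
                plate_cnt += 1  # repeated once for every vin
    return vin_cnt, plate_cnt


def separate_counts(c1_vins, c1_plates, c2_vins, c2_plates):
    """Count matches with two independent loops, one per attribute."""
    vin_cnt = sum(1 for vin in c2_vins if vin in c1_vins)
    plate_cnt = sum(1 for plate in c2_plates if plate in c1_plates)
    return vin_cnt, plate_cnt


# Hypothetical data: exactly one shared vin and one shared plate.
c1_vins, c1_plates = {"V1"}, {"P1"}
c2_vins, c2_plates = {"V1", "V2"}, {"P1", "P2", "P3"}

print(nested_counts(c1_vins, c1_plates, c2_vins, c2_plates))    # (3, 2)
print(separate_counts(c1_vins, c1_plates, c2_vins, c2_plates))  # (1, 1)
```

With nested loops, the "more than one in common" test would pass here even though only one vin and one plate are shared, so two sibling loops are probably what is wanted.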

So eventually I gave up on the multi-hop syntax and switched to single-hop, although using single hops for pairwise comparison has always been less intuitive to me.

SetAccum<VERTEX<Claim>> @claims;
SetAccum<STRING> @vins, @plates;
GroupByAccum<VERTEX<Claim> claim, DATETIME accident_time, SetAccum<STRING> vin, SetAccum<STRING> plate> @claim_info;
INT n_secs;

n_secs = n_days*24*60*60+1;
  
_t0 = SELECT t FROM claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
              ACCUM     
                    s.@vins += t.vin,
                    s.@plates += t.plate_no,
                    t.@claim_info += (s, s.accident_time -> t.vin, t.plate_no);

 
_t1 = SELECT t FROM claim:s - (REV_CAR_ASSOCIATE_WITH>) - Car:t
              ACCUM 
                  FOREACH info IN t.@claim_info DO
                       IF s != info.claim AND
                      
                         abs(datetime_diff(s.accident_time, info.accident_time)) < n_secs AND
                       
                         (count(info.vin INTERSECT s.@vins) > 1 OR count(info.plate INTERSECT s.@plates) > 1) THEN
                         
                            s.@claims += info.claim
                      END
                  END
              POST-ACCUM s.@claims += s;

This approach has to store the info from each Claim vertex on the Car vertices it connects to, and then traverse from Car to Claim again to do the pairwise comparison. However, the pairwise comparison seems to involve plenty of redundancy, much like the jaccard_batch implementation.
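
A rough Python sketch (not GSQL) of where that redundancy comes from, assuming the comparison runs once per (claim, car) edge against every other claim stored on that car. The mapping `car_to_claims` and the function `count_comparisons` are hypothetical names used only for illustration.

```python
from itertools import permutations


def count_comparisons(car_to_claims):
    """Return (total comparisons, distinct ordered claim pairs).

    Each car compares every claim attached to it with every other
    claim on the same car, so two claims that share several cars are
    compared once per shared car.
    """
    total = 0
    distinct = set()
    for claims in car_to_claims.values():
        for s, other in permutations(claims, 2):
            total += 1
            distinct.add((s, other))
    return total, len(distinct)


# Hypothetical graph: claims c1 and c2 share both cars.
car_to_claims = {
    "carA": ["c1", "c2", "c3"],
    "carB": ["c1", "c2"],
}

total, distinct = count_comparisons(car_to_claims)
print(total, distinct)  # 8 6
```

In this toy graph, 8 comparisons are performed for only 6 distinct ordered pairs, and since the check is symmetric, only 3 unordered pairs actually need to be evaluated.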