Hello TG Team,
One of my use case is determining which insurance claims belong to the same accident.
A car accident will likely correspond to multiple insurance claims since each party has his/her own insurance policy. In my data, there only exists claims but not accidents. Hence, I want to generate car accident data from insurance claim data, via GSQL.
My graph has the following schema:
Car - (ASSOCIATE_WITH) -> Claim
Claim vertex has attribute accident_time.
So initially I have claims associated with cars as such:
claim1: {car1, car2, car3}
claim2: {car2, car3}
claim3: {car1, car3}
claim4: {car4, car5}
claim5: {car5, car6}
I used the following query to perform pairwise comparison between claims, and store the set of claims which should belong to the same accident in a vertex attached set accumulator. The query checked the time gap between two claims and whether two claims have common associated cars.
SetAccum<VERTEX<Claim>> @claims;
_t1 = SELECT c1 FROM Claim:c1 - (<ASSOCIATE_WITH.ASSOCIATE_WITH>) - Claim:c2
WHERE datetime_diff(c1.accident_time, c2.accident_time) <= n_days*24*60*60
AND count(c1.neighbors("REV_ASSOCIATE_WITH") INTERSECT c2.neighbors("REV_ASSOCIATE_WITH")) > 0
AND c1 != c2
ACCUM c1.@claims += c2
POST-ACCUM c1.@claims += c1;
Printing out @claims for each claim, you would have:
claim1: {claim1, claim2, claim3}
claim2: {claim1, claim2, claim3}
**claim3: {claim1, claim2, claim3} **
claim4: {claim4, claim5}
claim5: {claim4, claim5}
Next I need to perform pairwise comparison between claims again. I will compare the @claims of the two claims. If the @claims are the same, then the two claims belong to the same accident with a generated unique accident id.
The expected output is:
accident_id 1: {claim1, claim2, claim3}
accident_id 2: {claim4, claim5}
I first tried to implement it using SELECT…ACCUM… , but failed since the value of the accumulator cannot accessed during accumulation. Hence, I had to stick to for loop, but it is too slow for my use case, since I have several millions of insurance claims to compare. So is there a way to solve my problem using the SELECT … ACCUM clause with which I can take advantage of its parallelism?
Thanks!