Performance help

We are evaluating TigerGraph. Our workflow does not allow us to use the LOAD method for loading data. Multiple data feeds come in and are batch-processed by a Hadoop job into a large number of entities and relationships, which are then added, updated, or deleted across multiple graph databases.

We currently use GSQL UPDATE/INSERT statements to insert and update entities and relationships. Ingest performance is poor at the moment, and we would like to improve it.

Below are examples of the (simplified) queries we use to upsert and delete entities and to insert relationships.

BEGIN
  CREATE QUERY upsertEntity(STRING uri, STRING typeUri, STRING typeLabel, STRING label) FOR GRAPH TG_3
  {
    SumAccum<INT> @@count = 0;
    S = {Entity.*};
    existing = SELECT s FROM S:s WHERE s.uri == uri
      ACCUM @@count += 1;
    IF @@count > 0 THEN
      UPDATE e FROM S:e
      SET e.typeUri = typeUri,
          e.typeLabel = typeLabel,
          e.label = label
      WHERE e.uri == uri;
    ELSE 
      INSERT INTO Entity (PRIMARY_ID, typeUri, typeLabel, label) VALUES (uri, typeUri, typeLabel, label);
    END;
  }
END

BEGIN
  CREATE QUERY deleteEntityById(STRING id) FOR GRAPH TG_3
  {
    S = {Entity.*};
    DELETE s FROM S:s WHERE s.uri == id;
  }
END

BEGIN
  CREATE QUERY related(STRING entity, STRING relatedEntity, STRING predicateUri, STRING predicateLabel) FOR GRAPH TG_3 {
    INSERT INTO Relationship VALUES (entity, relatedEntity, predicateUri, predicateLabel);
  }
END

We would appreciate any recommendations, insights, or pointers on how to improve ingest performance. Thanks in advance.

Did you try the Kafka loader?

https://docs.tigergraph.com/dev/data-loader-guides/kafka-loader-user-guide

Also, do you only need to upsert graph elements (vertices and edges)? If so, you can use the REST loader.
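As a rough illustration, here is a minimal sketch that assumes the built-in upsert endpoint `POST /graph/{graph_name}` on the usual RESTPP port 9000. It builds a single JSON body for a whole batch of Hadoop output instead of calling `upsertEntity` or `related` once per record. The `Entity` attributes and graph `TG_3` come from the queries in the question; the edge attribute names `predicateUri`/`predicateLabel`, the row tuple shapes, and the helper names are hypothetical, and authentication (a RESTPP token, if enabled) is left out.

```python
import json
import urllib.request
from typing import Dict, Iterable, Tuple

# Hypothetical row shapes, assumed to come out of the Hadoop job:
#   entity row:   (uri, typeUri, typeLabel, label)
#   relation row: (entity, relatedEntity, predicateUri, predicateLabel)
EntityRow = Tuple[str, str, str, str]
RelationRow = Tuple[str, str, str, str]


def build_upsert_payload(entities: Iterable[EntityRow],
                         relations: Iterable[RelationRow]) -> Dict:
    """Build one upsert body covering a whole batch of vertices and edges."""
    vertices: Dict = {}
    for uri, type_uri, type_label, label in entities:
        # Vertex type -> primary ID -> attribute -> {"value": ...}
        vertices.setdefault("Entity", {})[uri] = {
            "typeUri": {"value": type_uri},
            "typeLabel": {"value": type_label},
            "label": {"value": label},
        }

    edges: Dict = {}
    for src, tgt, pred_uri, pred_label in relations:
        # Source type -> source ID -> edge type -> target type -> target ID
        targets = (edges.setdefault("Entity", {})
                        .setdefault(src, {})
                        .setdefault("Relationship", {})
                        .setdefault("Entity", {}))
        # Assumed attribute names; a later row for the same (src, tgt)
        # pair replaces an earlier one within this batch.
        targets[tgt] = {
            "predicateUri": {"value": pred_uri},
            "predicateLabel": {"value": pred_label},
        }

    return {"vertices": vertices, "edges": edges}


def make_upsert_request(host: str, graph: str,
                        payload: Dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST; 9000 is assumed to be the RESTPP port."""
    return urllib.request.Request(
        url=f"http://{host}:9000/graph/{graph}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req)`. Because upserting a vertex whose primary ID already exists overwrites its attributes, a batch like this also makes the scan-then-UPDATE logic in `upsertEntity` unnecessary; the batch size per request is something to tune by measurement.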

Each loading job has a corresponding REST endpoint. See
https://docs.tigergraph.com/dev/restpp-api/built-in-endpoints#run-a-loading-job
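A minimal sketch of driving a loading job over REST, assuming a loading job named `load_entities` (hypothetical) on graph `TG_3` that declares a file variable `f1` and maps its four columns to `Entity` (uri, typeUri, typeLabel, label). The batch is posted as the request body to `POST /ddl/{graph_name}`, with the job name in `tag` and the file variable in `filename`; port 9000 is assumed to be the RESTPP port.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Iterable, Sequence


def to_csv_body(rows: Iterable[Sequence[str]]) -> bytes:
    """Serialize a batch of rows as comma-separated lines, one per record.

    csv.writer double-quotes fields that contain a comma; the loading job
    is assumed to be configured to parse such quoted fields.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def make_loading_job_request(host: str, graph: str, job: str, file_var: str,
                             rows: Iterable[Sequence[str]]) -> urllib.request.Request:
    """Prepare (but do not send) a POST that feeds `rows` to loading job `job`."""
    query = urllib.parse.urlencode({"tag": job, "filename": file_var})
    return urllib.request.Request(
        url=f"http://{host}:9000/ddl/{graph}?{query}",
        data=to_csv_body(rows),
        method="POST",
    )
```

Posting many thousands of lines per request, rather than one installed-query call per entity, is where the throughput gain would be expected to come from.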

I would recommend that you contact us so that we can do a PoC to help you evaluate.
