We are trying out TigerGraph. Our workflow does not allow us to use the LOAD method for loading data. We have multiple data feeds coming in, which Hadoop jobs batch-process into a large number of entities and relationships that are then added to, updated in, or deleted from multiple graph databases.
We are currently using GSQL UPDATE/INSERT statements to insert and update entities and relationships. We would like to improve ingest performance, which is not very good at the moment.
Below are simplified examples of the queries we use to insert, update, and delete entities and relationships.
BEGIN
CREATE QUERY upsertEntity(STRING uri, STRING typeUri, STRING typeLabel, STRING label) FOR GRAPH TG_3
{
  SumAccum<INT> @@count = 0;
  S = {Entity.*};
  // Count the existing Entity vertices whose uri matches.
  existing = SELECT s FROM S:s WHERE s.uri == uri
             ACCUM @@count += 1;
  IF @@count > 0 THEN
    // The entity already exists: overwrite its attributes.
    UPDATE e FROM S:e
    SET e.typeUri = typeUri,
        e.typeLabel = typeLabel,
        e.label = label
    WHERE e.uri == uri;
  ELSE
    // No match: create the entity.
    INSERT INTO Entity (PRIMARY_ID, typeUri, typeLabel, label) VALUES (uri, typeUri, typeLabel, label);
  END;
}
END
BEGIN
CREATE QUERY deleteEntityById(STRING id) FOR GRAPH TG_3
{
  S = {Entity.*};
  DELETE s FROM S:s WHERE s.uri == id;
}
END
BEGIN
CREATE QUERY related(STRING entity, STRING relatedEntity, STRING predicateUri, STRING predicateLabel) FOR GRAPH TG_3
{
  INSERT INTO Relationship VALUES (entity, relatedEntity, predicateUri, predicateLabel);
}
END
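As a point of comparison, the records handled by the queries above could in principle be grouped into one JSON payload per batch for TigerGraph's built-in REST++ upsert endpoint (POST /graph/{graph_name}), rather than being sent one query call at a time. The Python below is only a minimal sketch of building such a payload and has not been tried against our setup. The helper name build_upsert_payload and the record dictionaries are hypothetical, the edge attribute names predicateUri and predicateLabel are assumed to match the query parameters, and Relationship is assumed to connect Entity to Entity. Whether this is actually faster for our workload is an open question.

```python
import json


def build_upsert_payload(entities, relationships):
    """Group entity and relationship records into one upsert payload.

    Hypothetical helper: the record keys mirror the parameters of the
    upsertEntity and related queries above.
    """
    # vertices: vertex type -> vertex id -> attribute -> {"value": ...}
    vertices = {"Entity": {}}
    for e in entities:
        vertices["Entity"][e["uri"]] = {
            "typeUri": {"value": e["typeUri"]},
            "typeLabel": {"value": e["typeLabel"]},
            "label": {"value": e["label"]},
        }

    # edges: source type -> source id -> edge type -> target type
    #        -> target id -> attribute -> {"value": ...}
    edges = {"Entity": {}}
    for r in relationships:
        by_source = edges["Entity"].setdefault(r["entity"], {})
        by_target = by_source.setdefault("Relationship", {}).setdefault("Entity", {})
        by_target[r["relatedEntity"]] = {
            # Assumed attribute names; the real schema may differ.
            "predicateUri": {"value": r["predicateUri"]},
            "predicateLabel": {"value": r["predicateLabel"]},
        }

    return {"vertices": vertices, "edges": edges}


# Example batch with made-up records.
payload = build_upsert_payload(
    [
        {"uri": "e1", "typeUri": "t:person", "typeLabel": "Person", "label": "First"},
        {"uri": "e2", "typeUri": "t:person", "typeLabel": "Person", "label": "Second"},
    ],
    [
        {"entity": "e1", "relatedEntity": "e2",
         "predicateUri": "p:knows", "predicateLabel": "knows"},
    ],
)
body = json.dumps(payload)  # request body for one POST per batch
```

The sketch stops at serializing the request body; sending it, authentication, and choosing a batch size are left out because they depend on the deployment.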
We would appreciate any recommendations or insights on how to improve ingest performance. Thanks in advance.