Continuous stream of data and purging old data

Dear TigerGraph Team,

I want to build an application with TigerGraph that continuously streams in data. To keep storage bounded, we need to delete old vertices from the graph.

We have a job running in parallel that calls a query to delete these vertices. After the query has executed, the vertices are deleted and no longer shown in the graph, but the amount of memory the GSE uses does not change at all.

My question is:

Is there a way to release this RAM so that TigerGraph can reuse it for new data, something like an “optimize table” command?

Thanks,

Felix

Hi Felix,

When vertices are deleted, TigerGraph doesn’t release the memory back to the operating system; it keeps it reserved. The next time you load data, TigerGraph reuses this reserved memory.

Hope this answers your question.

Thanks,

Kevin

Thanks for the answer, Kevin,

I will look into that. In my last few tests, I monitored RAM usage through the GraphStudio Admin Portal, and even after deleting a large portion of our data, the GSE RAM usage kept growing.

I am currently preparing my system for a test run of several days of continuous operation.

If you are interested I can share the results here with you.

Regards

Felix

Hi Felix,

If you want to see immediate results, you can try this:

After deleting vertices, restart the GSE (gadmin restart gse), and the memory should be released.

Thanks,

Kevin

Hello Kevin,

[Screenshot: restart_gse.png]

Restarting the GSE did not release its memory (purple line in the screenshot).

The dips show the periods while the GSE was restarting.

The restart released only the memory held by the GPE (green line).

Hi Felix,

Could you please provide us with the query being used to delete the vertices?

Also, what version of TigerGraph are you currently using?

Thanks,

Kevin

Hello Kevin,

CREATE DISTRIBUTED QUERY deleteOldVertices(UINT queryTimestamp, UINT timeRangeInSeconds) FOR GRAPH monet SYNTAX v2 {
  Start = {ANY};

  SelectedVertices = SELECT s FROM Start:s
                     WHERE s._timestamp <= (queryTimestamp - timeRangeInSeconds)
                     POST_ACCUM DELETE(s);

  PRINT SelectedVertices.size() AS numVertices;
}

This is the query I am running to delete all vertices whose “_timestamp” falls outside the given time range (that is, at or before queryTimestamp - timeRangeInSeconds). We are using TigerGraph version 2.5.

Thanks
Felix

Hi Felix,

The v2 syntax supports only read-only queries; DML is not supported in v2 syntax.

I would suggest using v1 syntax for delete/insert/update queries.
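Following that suggestion, a sketch of the same deletion query in v1 syntax might look like the block below. The only change is dropping the SYNTAX v2 clause, since v1 is the default when no SYNTAX is specified; the WHERE condition and DELETE are kept exactly as in Felix's query. The INSTALL/RUN lines and their parameter values are hypothetical examples, and this sketch has not been verified against 2.5.

```gsql
// Same query as above, without "SYNTAX v2": v1 is the default syntax,
// and v1 allows DML such as DELETE inside a SELECT block.
CREATE DISTRIBUTED QUERY deleteOldVertices(UINT queryTimestamp, UINT timeRangeInSeconds) FOR GRAPH monet {
  Start = {ANY};

  // Select every vertex whose _timestamp is at or before the cutoff and delete it.
  // Assumes queryTimestamp >= timeRangeInSeconds, since both parameters are UINT.
  SelectedVertices = SELECT s FROM Start:s
                     WHERE s._timestamp <= (queryTimestamp - timeRangeInSeconds)
                     POST_ACCUM DELETE(s);

  PRINT SelectedVertices.size() AS numVertices;
}

// Hypothetical usage: install once, then run with example values
// (delete everything at or before one day prior to the given Unix timestamp).
INSTALL QUERY deleteOldVertices
RUN QUERY deleteOldVertices(1577836800, 86400)
```

Once installed, the parallel purge job can keep calling the same query with a new queryTimestamp on each run.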