How to model time series data

shyshiva · June 11, 2024, 5:12pm

What is the best way to model time series data in TigerGraph? I would like to store order data as a time series. Lookups should be fast, including searches by time range. Is this possible? Can a TigerGraph cluster scale to support a growing dataset?

Jon_Herke · June 17, 2024, 10:54pm

@shyshiva The most common way to represent time in a TigerGraph model is with Time Trees. Below I’ve included details on how to create Time Trees, along with a community blog post on the topic.

Modeling Time in TigerGraph Using Time Trees

Time Trees are a structured approach to modeling temporal data in graphs. They represent time hierarchically, often by creating nodes for different time units (such as years, months, days, hours), and linking them in a tree-like structure. This method is particularly useful in TigerGraph for organizing and querying large volumes of temporal data efficiently.

A blog by a community member on Time Trees with TigerGraph

Steps to Create and Use Time Trees in TigerGraph:

Create Time Nodes:

Start by creating nodes for each time unit you want to include in your model. Commonly, this includes years, months, days, and possibly hours or minutes, depending on your data’s granularity.

Example:

CREATE VERTEX Year (PRIMARY_ID year INT, name STRING) WITH primary_id_as_attribute="true";
CREATE VERTEX Month (PRIMARY_ID month_id STRING, name STRING, month INT, year INT) WITH primary_id_as_attribute="true";
CREATE VERTEX Day (PRIMARY_ID day_id STRING, name STRING, day INT, month INT, year INT) WITH primary_id_as_attribute="true";

Establish Hierarchical Relationships:
- Connect these nodes to form a hierarchical tree. For instance, each year node connects to 12 month nodes, and each month node connects to the days in that month.
- Example:
```
CREATE UNDIRECTED EDGE YEAR_HAS_MONTH (FROM Year, TO Month);
CREATE UNDIRECTED EDGE MONTH_HAS_DAY (FROM Month, TO Day);
```

Populate the Time Tree:

Insert data into your time nodes to cover the relevant periods. This could be done programmatically or using batch inserts if you’re covering a large span of time.

Example:

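// Run these statements inside a GSQL query body
INSERT INTO Year (PRIMARY_ID, name) VALUES (2023, "2023");
INSERT INTO Month (PRIMARY_ID, name, month, year) VALUES ("2023-06", "June", 6, 2023);
INSERT INTO Day (PRIMARY_ID, name, day, month, year) VALUES ("2023-06-12", "June 12", 12, 6, 2023);
INSERT INTO YEAR_HAS_MONTH (FROM, TO) VALUES (2023 Year, "2023-06" Month);
INSERT INTO MONTH_HAS_DAY (FROM, TO) VALUES ("2023-06" Month, "2023-06-12" Day);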
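
For a large span of time, writing the inserts by hand is impractical. Here is a minimal Python sketch of the "programmatic" option: it generates the Year, Month, and Day rows plus the two edge lists for a date range, ready to be written to CSV files for a loading job. The function name `build_time_tree`, the English month names, and the row layouts are assumptions chosen to match the schema above, not part of any TigerGraph API.

```python
from datetime import date, timedelta

# English month names are hard-coded so the output does not depend on locale.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def build_time_tree(start: date, end: date) -> dict:
    """Return vertex and edge rows covering every day in [start, end]."""
    years, months, days = {}, {}, {}
    year_month_edges, month_day_edges = set(), set()

    current = start
    while current <= end:
        month_id = current.strftime("%Y-%m")   # e.g. "2023-06"
        day_id = current.isoformat()           # e.g. "2023-06-12"
        month_name = MONTH_NAMES[current.month - 1]

        # Row layouts mirror the Year, Month, and Day vertex definitions.
        years[current.year] = (current.year, str(current.year))
        months[month_id] = (month_id, month_name, current.month, current.year)
        days[day_id] = (day_id, f"{month_name} {current.day}",
                        current.day, current.month, current.year)

        # Edge rows are (from_id, to_id) pairs.
        year_month_edges.add((current.year, month_id))
        month_day_edges.add((month_id, day_id))
        current += timedelta(days=1)

    return {
        "Year": sorted(years.values()),
        "Month": sorted(months.values()),
        "Day": sorted(days.values()),
        "YEAR_HAS_MONTH": sorted(year_month_edges),
        "MONTH_HAS_DAY": sorted(month_day_edges),
    }
```

Calling `build_time_tree(date(2023, 6, 1), date(2023, 7, 2))` yields one Year row, two Month rows, and 32 Day rows; each list can be dumped with the `csv` module and loaded in bulk.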

Link Your Data to the Time Tree:

Attach your domain-specific data (e.g., events, transactions) to the appropriate time nodes. This can be done by creating edges from your main entities to the relevant time units.

Example:

CREATE UNDIRECTED EDGE EVENT_OCCURED_ON (FROM Event, TO Day);

// Example of linking an event to a specific day (run inside a GSQL query body;
// assumes an Event vertex type with name and date attributes)
INSERT INTO Event (PRIMARY_ID, name, date) VALUES (1, "Event A", "2023-06-12");
INSERT INTO EVENT_OCCURED_ON (FROM, TO) VALUES (1 Event, "2023-06-12" Day);
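
When events arrive with full timestamps, the Day id has to be derived before the edge can be created. A small Python sketch of that step, assuming ISO 8601 timestamps and the "YYYY-MM" / "YYYY-MM-DD" id formats used above (the helper names `time_tree_keys` and `event_day_edge` are hypothetical):

```python
from datetime import datetime


def time_tree_keys(timestamp: str) -> dict:
    """Map an ISO 8601 timestamp to the ids of its Year, Month, and Day nodes."""
    ts = datetime.fromisoformat(timestamp)
    return {
        "year": ts.year,
        "month_id": ts.strftime("%Y-%m"),
        "day_id": ts.strftime("%Y-%m-%d"),
    }


def event_day_edge(event_id, timestamp: str) -> tuple:
    """Build the (event_id, day_id) pair for an EVENT_OCCURED_ON edge row."""
    return (event_id, time_tree_keys(timestamp)["day_id"])
```

For example, `event_day_edge(1, "2023-06-12T17:12:00")` pairs event 1 with the Day node "2023-06-12".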

Querying Time-Based Data:
- With your time tree in place, you can efficiently perform temporal queries. For example, you can quickly find all events that occurred in a specific month or year by traversing the time tree.
- Example Query:
```
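// Find all events in June 2023
USE GRAPH <your_graph_name>
INTERPRET QUERY () SYNTAX v2 {
  // The edges are undirected, so the pattern uses -(EDGE)- rather than ->
  events = SELECT e
           FROM Event:e -(EVENT_OCCURED_ON)- Day:d -(MONTH_HAS_DAY)- Month:m
           WHERE m.month_id == "2023-06";
  PRINT events;
}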
```
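
Arbitrary time ranges rarely line up with month boundaries. One way to use the tree for them is to split the range into whole months, traversed from Month nodes, and leftover days, traversed from Day nodes directly. The Python sketch below shows only that decomposition; `cover_range` is a hypothetical helper, and the resulting ids would be passed to a query as starting vertices.

```python
import calendar
from datetime import date, timedelta


def cover_range(start: date, end: date) -> tuple:
    """Split [start, end] into whole-month ids and leftover day ids."""
    month_ids, day_ids = [], []
    current = start
    while current <= end:
        last_day = calendar.monthrange(current.year, current.month)[1]
        month_end = date(current.year, current.month, last_day)
        if current.day == 1 and month_end <= end:
            # The whole month lies inside the range: one Month node covers it.
            month_ids.append(current.strftime("%Y-%m"))
            current = month_end + timedelta(days=1)
        else:
            # Partial month: fall back to the individual Day node.
            day_ids.append(current.isoformat())
            current += timedelta(days=1)
    return month_ids, day_ids
```

For the range 2023-05-30 through 2023-07-02, this produces one Month id ("2023-06") and four Day ids, so the query starts from five time nodes instead of 34.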

sellynotty · July 30, 2024, 11:58am

Hi,

I think that to model time series data in TigerGraph you can use a vertex to represent each data point and edges to connect sequential points. Store the timestamp and relevant attributes on the vertex. For fast searches by time range, create a secondary index on the timestamp attribute. TigerGraph clusters scale horizontally to support a growing dataset, which helps maintain performance as data volume increases.
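
As a rough illustration of this layout, the Python sketch below sorts data points by timestamp and emits one edge row per consecutive pair. The `(point_id, timestamp)` input shape and the function name `chain_points` are assumptions for the example; the output pairs would be loaded as edges between data-point vertices.

```python
from datetime import datetime


def chain_points(points: list) -> list:
    """Return (earlier_id, later_id) edge rows linking points in time order.

    Each point is a (point_id, iso_timestamp) tuple.
    """
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p[1]))
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])]
```

Three orders timestamped out of sequence, for instance, come back as two edges linking them in chronological order.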

Thanks