How do you explain that GSQL supports dynamic schema change?

Hello,

I’ve seen that you said GSQL is schema-based with the capability of dynamic schema change. But from my perspective, “dynamic” should mean self-adaptive, that is to say, the schema should change automatically as I add data with a different schema.

As mentioned in GSQL 101, you declared a VERTEX schema with

CREATE VERTEX person (PRIMARY_ID name STRING, name STRING, age INT, gender STRING, state STRING)

and person.csv is confined to four columns in order to fit the corresponding VERTEX schema.

name,gender,age,state

Now, what if I want to add a person like

Jason,male,22,tx,teacher

It is obvious that “Jason” is incompatible with person.csv and the VERTEX schema. In my mind, the schema should automatically add a new attribute such as “job STRING”, and person.csv should be adjusted accordingly. But it cannot do that. I want to know how you tackle this case.

Other languages like Cypher, SPARQL, etc. are schemaless and do not run into this case.

So I am eager to know how you define “dynamic”, and how “dynamic schema change” is actually reflected in practice.

Hi Frank,

Very good observation.

Most NoSQL or NewSQL databases claim they are schemaless, or label-based. That means, when new data comes in, they can just add the necessary labels at the entity level, and at the attribute/property level they tag data with a property label, so that key-value pairs are inserted on the fly. The key serves as the metadata. The advantage is that the number of attributes for the same entity (vertex/edge) can differ, so the schema for the same entity type can be dynamic.

That being said, what’s the disadvantage of this approach? Extra tag space is needed for each value added, and extra compute time is needed to interpret each key-value pair. Type checking is also weak: every value is stored as a string so that all values fit into the same map.

On the other hand, what do traditional relational databases do? Fixed schema! Each vertex/edge type has a predefined schema, so tags are not needed and strong type checking is enforced at data insertion time. It’s well known that strong type checking brings cleaner data, better compression for storage, and less interpretation time.

What does TigerGraph do? A trade-off. We require the user to define a fixed schema at the entity level (vertex/edge type), so that we get the benefits of strong type checking and a better compression ratio. However, we allow schema evolution. That means, as the application evolves, we allow the user to change the schema over time with a schema-change job. See the reference doc: https://doc.tigergraph.com/GSQL-Language-Reference-Part-1---Defining-Graphs-and-Loading-Data.html#GSQLLanguageReferencePart1-DefiningGraphsandLoadingData-ModifyingaGraphSchema

This schema-change job can run online, meaning you can evolve your schema while queries and data insertion are running. We handle the concurrency and interleaving semantics. Although it’s less flexible, it’s designed for a cleaner data store and a high-performance DB.

Hope this answers your question.
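
For the person example above, such a schema-change job might look like the sketch below. It assumes person was created as a global vertex type, as in GSQL 101, so a global job is used. The job name add_job_attr and the new attribute job are made up for illustration.

```gsql
# Hypothetical job name; adds one new attribute to the existing person vertex type
CREATE GLOBAL SCHEMA_CHANGE JOB add_job_attr {
  ALTER VERTEX person ADD ATTRIBUTE (job STRING);
}

# Apply the change to the running database
RUN GLOBAL SCHEMA_CHANGE JOB add_job_attr
```

After the job runs, the loading job for person.csv would presumably also need to map the fifth column (“teacher” in the Jason row) to the new job attribute.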

Best regards,

Mingxi

Hi Mingxi,

Thanks for your explanation.

What a “trade-off”! It’s really a good way to keep strong type checking and change the schema at the same time. But it still seems a little bit clumsy, and I think it’s not truly dynamic. After all, it needs manual operations like typing ALTER | ADD | DROP.

And the efficiency of concurrency might be a big issue.

Originally, I thought you meant dynamic in the sense of self-adaptation and automation. If so, that would be really awesome and astonishing.

Looking forward to seeing your full paper on GSQL. Given the power of graph analytics, that will be a big leap in the field of graph query languages.

Best regards,

Frank

Frank,

We have a white paper here

Regarding the efficiency of schema change, we see it complete in seconds, if not under a second, so it’s totally practical for big data.

We haven’t had time to submit to an academic publication. Feel free to post your questions here as you use GSQL to unleash the power of graph data.

Mingxi

Hi Frank,

Thanks for your great points! Although TigerGraph is a schema-based graph database, if your data is really dynamic, you can still handle it with a generic schema. Here is how to do it:

In the schema, there will be only one vertex type V and one edge type E, going from V to V.

The schema looks like:

The vertex type and edge type each contain a string attribute named label, which annotates the type of a vertex or edge instance. In this way, you can store as many different types of vertex and edge instances as you want.

They also contain a map<string, string> attribute named property, which can hold as many entries as your data requires. It is like a document database: each individual vertex and edge instance can even have different entries in this property attribute.
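
In GSQL, a generic schema of this kind might be declared roughly as follows. This is a sketch rather than the exact definition from the screenshot: the primary id name id and the graph name generic_graph are assumptions.

```gsql
# One generic vertex type: label names the kind of entity, property holds its attributes
CREATE VERTEX V (PRIMARY_ID id STRING, label STRING, property MAP<STRING, STRING>)

# One generic directed edge type from V to V with the same two attributes
CREATE DIRECTED EDGE E (FROM V, TO V, label STRING, property MAP<STRING, STRING>)

# Hypothetical graph name
CREATE GRAPH generic_graph (V, E)
```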

Then, in GSQL queries, you can implement your business logic with conditions on these attributes. Here is an example of getting all companies that a person, Tom, has worked with since 2017:

And its execution result is shown as:
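
A query along these lines could be written as in the sketch below; it is not the exact query from the screenshot. The graph name generic_graph, the label values "work_for" and "company", and the property key "since" are all assumed names, and the sketch relies on the MAP attribute function get() and the built-in str_to_int() being available in your GSQL version.

```gsql
# Sketch only: label values and the "since" property key are assumptions
CREATE QUERY companies_since(VERTEX<V> p, INT min_year) FOR GRAPH generic_graph {
  # Start from the given person vertex
  Start = {p};

  # Follow work_for edges to company vertices, filtering on the edge's property map
  Companies = SELECT t
              FROM Start:s -(E:e)-> V:t
              WHERE e.label == "work_for"
                AND t.label == "company"
                AND str_to_int(e.property.get("since")) >= min_year;

  PRINT Companies;
}
```

Once installed with INSTALL QUERY companies_since, it could be called as RUN QUERY companies_since("Tom", 2017), assuming Tom's vertex id is "Tom".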

You may notice that the two edges and two company vertices have different entries in their property attribute.

This leads to the conclusion that schemaless (document-database style) is a special case of TigerGraph’s schema-based graph definition. You can do it; you just sacrifice some performance.

Btw, all the screenshots above are from our GraphStudio UI, which is delivered as part of TigerGraph’s product. It supports the full graph development life cycle with browser-based drag and drop. It will be really helpful for users who want to get hands-on with TigerGraph in a short time.

Hi Mingxi,

I have some questions. Does GSQL support all graph analytics algorithms or just part of them? Are these algorithms all built-in? Can users customize any graph algorithms they like?

Best regards,

Frank

Frank,

GSQL is Turing-complete and can express any graph algorithm. We have built-in graph algorithms, which will be released. All of them are written in GSQL, so users can customize them.

Mingxi