New to graph db. Please help. Thanks!
I’m pretty sure you need a schema. If you want a schemaless Graph DB, I think you have to go elsewhere…
A TigerGraph (TG) 101 tutorial is available, but I found that it glosses over some things that can bite you later.
(I may not be allowed to say this here, but… Neo4J is a lot easier to get up and running for a total graph DB beginner. No schema is needed and you can add attributes immediately, but the downside of this flexibility is that the DB is slower. However, if your future graph DB gets big, Neo4J will probably not be able to handle the load/size that TG can.
It’s a bit like a compiled vs. an interpreted language. The latter is a lot easier to use but runs slower, especially on bigger problems.)
Thanks a lot. Then I’ll probably let Neo4J draw a schema for me first.
If you make a typo in an attribute name in Neo4J, you will have created a new attribute without intending to. TG is safer from that standpoint (the classic interpreted vs. compiled language trade-off: flexibility vs. safety).
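A minimal sketch of that typo trap, using a hypothetical Person node and property names made up for illustration: Neo4J accepts the misspelled key without complaint and stores it as a new property.

```cypher
// Create a node with the intended property name (hypothetical data).
CREATE (:Person {name: 'Bob', email: 'bob@example.com'});

// Typo: "emial" instead of "email". Neo4J does not complain;
// it silently adds a brand-new property key to the node.
MATCH (p:Person {name: 'Bob'})
SET p.emial = 'bob@work.example.com';

// List every property key the database knows about;
// the misspelled "emial" now shows up alongside "email".
CALL db.propertyKeys();
```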
Another trap with Neo4J is that you can accidentally create multiple identical nodes (vertices) and/or edges. (You can avoid some of this by declaring a uniqueness constraint on a node attribute, which is backed by a unique index.)
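To illustrate the duplicate trap (hypothetical Person nodes again): nothing stops the same CREATE from running twice, for example when an import script is re-run.

```cypher
// Running the same CREATE twice creates two separate nodes
// with identical labels and properties.
CREATE (:Person {name: 'Alice'});
CREATE (:Person {name: 'Alice'});

// On a database with no earlier Alice nodes, this returns 2, not 1.
MATCH (p:Person {name: 'Alice'})
RETURN count(p) AS alices;
```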
I also found the Neo4J training materials and documentation better. (It is an older product, so that’s not entirely surprising. Also, it’s just less complex.)
But if you know ahead of time that you will be using a very large graph DB (millions or billions of nodes), chasing nodes many hops away, or needing very fast execution, etc., TG could be a better choice. In that case, it may be better to suffer with the rigor, complexity, and sophistication of GSQL to eventually gain execution efficiency.
Mind you, Neo4J’s language, Cypher, is different from TG’s GSQL, so if you learn one, you’ll run into stumbling blocks when you switch over to the other because their syntax is quite different.
[added] I’m writing some personal notes on the two languages. What I will say at this point is that Cypher presents a lower “cognitive load”, so you can better focus on the problem at hand. GSQL presents a higher “cognitive load”, which is the price you pay for the safety and efficiency (although some of GSQL’s language constructs are better than Cypher’s; Cypher sometimes has no equivalent but has workarounds, e.g. GSQL has FOREACH, while Cypher’s FOREACH is limited to updating clauses such as CREATE, MERGE, and SET.
Of course, my opinions on GSQL may change, as I’ve just started struggling with it… and I’ve spent a lot more time with Neo4J’s Cypher.)
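To illustrate the FOREACH point on the Cypher side, a small sketch with made-up Person names: Cypher’s FOREACH only wraps updating clauses, and UNWIND is the usual workaround when you need to read or return something per list element.

```cypher
// Cypher's FOREACH may only contain updating clauses
// (CREATE, MERGE, SET, REMOVE, DELETE, DETACH DELETE, FOREACH).
FOREACH (name IN ['Alice', 'Bob', 'Carol'] |
  MERGE (:Person {name: name})
);

// To read or return data per element, UNWIND turns the list into rows
// that later clauses such as MATCH and RETURN can work on.
UNWIND ['Alice', 'Bob', 'Carol'] AS name
MATCH (p:Person {name: name})
RETURN p.name AS name;
```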
Thank you very much for the detailed comparison. What I assume Neo4J can do is import the CSVs and draw a schema based on them with one or two Cypher statements. As a beginner, I just want to see an initial version of the schema and then modify it as I understand more. I think AWS Neptune does not require a schema before loading data.
Cypher is very flexible (which can get you into trouble!). You do not have to declare anything ahead of time. BUT if you make a typo, you will create a bit of a mess that you will need to clean up. (You can also take a snapshot of the DB and return to it if you do make a mess.) The advantage of TG is that if you blunder with a typo, it will stop you. (The typical trade-off of safety/efficiency vs. flexibility.)
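One way to clean up after that kind of typo, sketched with a hypothetical misspelled key emial that should have been email:

```cypher
// For every Person carrying the misspelled key, keep any existing
// "email" value (coalesce only falls back to "emial" when email is null),
// then remove the misspelled key from the node.
MATCH (p:Person)
WHERE p.emial IS NOT NULL
SET p.email = coalesce(p.email, p.emial)
REMOVE p.emial;
```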
Note: when you create a node with a new node Label in Cypher (e.g. Person), it will create that new node Label on the fly. You can also easily rename node Labels. Another great feature is that a node can have multiple Labels, which is like multiple inheritance. (Cypher Labels correspond to TG’s vertex types.)
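A short sketch of labels in practice; the Person, Employee, and Staff labels are hypothetical. "Renaming" a label is done by adding the new one and removing the old one.

```cypher
// A label that doesn't exist yet is created on the fly,
// and a node can carry several labels at once.
CREATE (:Person:Employee {name: 'Carol'});

// The node can be found through either label.
MATCH (e:Employee) RETURN e.name;

// Rename Employee to Staff: add the new label, then drop the old one.
MATCH (n:Employee)
SET n:Staff
REMOVE n:Employee;
```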
With Cypher, it is useful to create a uniqueness constraint on a field to prevent accidentally creating duplicate nodes. Cypher also has MERGE, which matches a pattern if it already exists and creates it if it doesn’t. There are also procedures and functions in the APOC library that can help clean up messes… (TG argues that APOC shows a weakness of the Cypher language, as you can’t easily add to the language. APOC is written in Java, which is also what Neo4J is written in.)
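A sketch of the constraint-plus-MERGE approach, assuming Neo4J 4.4 or later (older versions use a different CREATE CONSTRAINT syntax) and hypothetical Person/KNOWS names:

```cypher
// Uniqueness constraint on Person.name (4.4+/5.x syntax).
// Creating it fails if duplicate names already exist, so clean those up first.
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS UNIQUE;

// MERGE matches the whole pattern if it exists, otherwise creates it,
// so running this twice still leaves a single Alice node.
MERGE (a:Person {name: 'Alice'})
ON CREATE SET a.created = timestamp()
ON MATCH SET a.lastSeen = timestamp();

// Same idea for relationships: re-running this adds no duplicate KNOWS edge.
// (If either node is missing, MATCH finds nothing and nothing is created.)
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
MERGE (a)-[:KNOWS]->(b);
```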
In terms of relationships (edges), Neo4J relationships are ALWAYS directed, but you can write undirected queries in Cypher. OTOH, you don’t have to declare the reverse relationship. That is:
- Neo4J uses -> and <- in an intuitive and flexible way to indicate the direction of relationships/edges. You can write any of:
  * (FROM_NODE)-[edge]->(TO_NODE)
  * (TO_NODE)<-[edge]-(FROM_NODE)
  * or even (TO_NODE1)<-[edge1]-(FROM_NODE)-[edge2]->(TO_NODE2)
  * (FROM_NODE1)-[edge1]->(TO_NODE)<-[edge2]-(FROM_NODE2)
  * (NODE1)-[edge1]-(NODE)-[edge2]-(NODE2) if you don’t care about direction
- (I believe this to be true) TG reads left to right, and the -> doesn’t actually indicate the direction. You can replace it with a - (which is probably less confusing). If in TG you have:
  * FROM_NODE-(edge)->TO_NODE
  * then TO_NODE-(rev_edge)-FROM_NODE
  is how the reverse relationship must be written, where rev_edge must be declared in the schema as an add-on to the declaration of edge. You can’t use <-.
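For the Neo4J side of this comparison, a sketch using a hypothetical FOLLOWS relationship between Person nodes shows the arrow variants in actual queries:

```cypher
// Both patterns match the same stored FOLLOWS relationships;
// only the reading direction of the pattern differs.
MATCH (a:Person)-[:FOLLOWS]->(b:Person) RETURN a.name, b.name;
MATCH (b:Person)<-[:FOLLOWS]-(a:Person) RETURN a.name, b.name;

// Two arrows pointing into the same middle node:
// pairs of people who both follow the same person c.
MATCH (a:Person)-[:FOLLOWS]->(c:Person)<-[:FOLLOWS]-(b:Person)
RETURN a.name, b.name, c.name;

// Undirected pattern: ignores direction, so each FOLLOWS relationship
// is returned twice (once per orientation of a and b).
MATCH (a:Person)-[:FOLLOWS]-(b:Person) RETURN a.name, b.name;

// Relationships are always stored with a direction,
// so CREATE requires an arrow.
CREATE (:Person {name: 'Dan'})-[:FOLLOWS]->(:Person {name: 'Eve'});
```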
My recommendation is to do a small-scale import of your data to make sure you understand what’s going on before doing a big import. (This is true for any DB!)
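A cheap first check before any import, assuming a hypothetical people.csv placed in Neo4J’s import directory (where file:/// URLs resolve by default): LOAD CSV can simply return a few parsed rows without writing anything.

```cypher
// Peek at the first few rows as Neo4J parses them.
// This query only reads the file; nothing is written to the database.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
RETURN row
LIMIT 5;
```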
Neo4J’s APOC library can also import JSON, but I don’t know how it works (yet).
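For reference, a commonly used APOC procedure for this is apoc.load.json. The sketch below assumes APOC is installed, that local file access is enabled in its configuration (apoc.import.file.enabled=true), and a hypothetical people.json file; treat it as a starting point rather than a tested recipe.

```cypher
// Assumes a hypothetical people.json shaped like:
// {"people": [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 29}]}
CALL apoc.load.json('file:///people.json') YIELD value
// Turn the "people" array into one row per person.
UNWIND value.people AS person
MERGE (p:Person {name: person.name})
SET p.age = person.age;
```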
The Neo4J community board is very active and supportive.
Neo4J is easy to install on Linux and Mac (and to a lesser degree on Windows.)
I found importing CSV with Cypher fairly easy.
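As a sketch of how simple a CSV import can be, assuming a hypothetical people.csv with name, age, and city columns:

```cypher
// Hypothetical people.csv with header: name,age,city
// Assumes every row has a name and a city: MERGE fails on null property values.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {name: row.name})
// CSV values arrive as strings, so convert numbers explicitly.
SET p.age = toInteger(row.age)
MERGE (c:City {name: row.city})
MERGE (p)-[:LIVES_IN]->(c);
```

Using MERGE rather than CREATE means re-running the import does not duplicate nodes or relationships.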
What is (probably) true is that if you have huge amounts of data, Cypher will take much longer than TG.
[added]
You can do complex things with the Cypher import. In fact, I used a CSV file with the Cypher import to delete existing nodes based on the data in the CSV file.
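A sketch of that CSV-driven delete, assuming a hypothetical to_delete.csv with a name column; DETACH DELETE removes each matched node together with its relationships:

```cypher
// Hypothetical to_delete.csv with a single header column: name
// Rows whose name matches no Person are simply skipped by MATCH.
LOAD CSV WITH HEADERS FROM 'file:///to_delete.csv' AS row
MATCH (p:Person {name: row.name})
DETACH DELETE p;
```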
BTW, in Neo4J, after you’ve imported the data, you can just call:
call db.schema.visualization()
(The older call db.schema() no longer works.)