My dataset doesn’t have an attribute or combination of attributes that lends itself to being a PRIMARY_ID. Since one is required, what do you suggest I use as the primary ID? I looked it up, and I don’t see an equivalent of SQL’s auto-increment ID.
I am not aware of any auto-generated/auto-incremented ID or sequence-like feature in TigerGraph.
But there should be something that differentiates the nodes/vertices; otherwise, how could you create edges/relationships among them? 32 identical red dots cannot be turned into a (meaningful) graph.
If nothing else, the concatenation of all attributes into a big string should provide a unique value, I would think.
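A minimal Python sketch of that idea, assuming each record is a list of strings. The separator matters: with a plain join, ("ab", "c") and ("a", "bc") both become "abc". The "|" separator and the `composite_key` name are illustrative assumptions, and the separator must never occur inside a field.

```python
def composite_key(fields, sep="|"):
    """Join all attribute values into one string key.

    Raises ValueError if a field contains the separator, because then two
    different records could produce the same key.
    """
    for value in fields:
        if sep in value:
            raise ValueError(f"field {value!r} contains separator {sep!r}")
    return sep.join(fields)


# Without a separator, different records collide:
print("".join(["ab", "c"]) == "".join(["a", "bc"]))  # True
# With a separator, they stay distinct:
print(composite_key(["ab", "c"]))  # ab|c
print(composite_key(["a", "bc"]))  # a|bc
```

Note that this only yields unique keys if the records themselves are distinct; two identical rows still produce the same key.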
Thanks!
I am trying to load the netflow and host event dataset from Los Alamos National Lab. Below is an example of a network transaction:
epoch_time,duration,src_device,dst_device,protocol,src_port,dst_port,src_packets,dst_packets,src_bytes,dst_bytes
118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018
As far as I can see, nothing in the header guarantees uniqueness. If I concatenate all columns, I may get a unique identifier, but only by chance, not by design.
I think
gsql_concat($0,"|",$2,":",$5,"-",$3,":",$6)
i.e.
epoch_time|src_device:src_port-dst_device:dst_port
would effectively be a unique ID, as at any one time there can be only one communication/transmission between two devices on a given pair of ports.
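To make the positional placeholders explicit: $0 is epoch_time, $2 src_device, $3 dst_device, $5 src_port and $6 dst_port. The Python below is only an illustration outside TigerGraph (not the loading job itself); it builds the same key from the sample row so the mapping can be checked. The `flow_key` name is hypothetical.

```python
import csv
import io

HEADER = ("epoch_time,duration,src_device,dst_device,protocol,src_port,"
          "dst_port,src_packets,dst_packets,src_bytes,dst_bytes")
SAMPLE = "118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018"


def flow_key(row):
    """Python equivalent of gsql_concat($0,"|",$2,":",$5,"-",$3,":",$6)."""
    return f"{row[0]}|{row[2]}:{row[5]}-{row[3]}:{row[6]}"


columns = [name.strip() for name in HEADER.split(",")]
row = next(csv.reader(io.StringIO(SAMPLE)))

# Show which column each positional placeholder refers to.
for i in (0, 2, 5, 3, 6):
    print(f"${i} = {columns[i]}")

print(flow_key(row))  # 118781|Comp364445:Port05507-Comp547245:Port46272
```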
But I suppose you would not load the entire row into a single vertex. I see at least device, transmission and protocol vertex types.
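One way to picture that split, sketched in plain Python rather than GSQL: each row yields two Device vertices, one Protocol vertex and one Transmission vertex keyed by the flow key, plus edges linking them. The vertex and edge type names (Device, Transmission, Protocol, SENT, RECEIVED_BY, USES) are hypothetical, and vertices are upserted by ID.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # (vertex_type, vertex_id) -> attribute dict
    vertices: dict = field(default_factory=dict)
    # set of (from_vertex, edge_type, to_vertex); a set, so repeats collapse
    edges: set = field(default_factory=set)

    def upsert_vertex(self, vtype, vid, **attrs):
        """Create the vertex if missing, otherwise update its attributes."""
        self.vertices.setdefault((vtype, vid), {}).update(attrs)
        return (vtype, vid)


def load_row(g, row):
    """Split one netflow row into Device, Transmission and Protocol vertices."""
    (epoch, duration, src, dst, proto, sport, dport,
     spkts, dpkts, sbytes, dbytes) = row
    tx_id = f"{epoch}|{src}:{sport}-{dst}:{dport}"
    src_v = g.upsert_vertex("Device", src)
    dst_v = g.upsert_vertex("Device", dst)
    proto_v = g.upsert_vertex("Protocol", proto)
    tx_v = g.upsert_vertex(
        "Transmission", tx_id,
        epoch_time=int(epoch), duration=int(duration),
        src_port=sport, dst_port=dport,
        src_packets=int(spkts), dst_packets=int(dpkts),
        src_bytes=int(sbytes), dst_bytes=int(dbytes),
    )
    g.edges.update({
        (src_v, "SENT", tx_v),
        (tx_v, "RECEIVED_BY", dst_v),
        (tx_v, "USES", proto_v),
    })


SAMPLE = "118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018"
g = Graph()
load_row(g, SAMPLE.split(","))
print(len(g.vertices), len(g.edges))  # 4 3
```

Loading the same row a second time in this sketch leaves the counts at 4 and 3, because the Transmission ID repeats; fully duplicated rows are silently merged under this design.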
As a small test, I loaded 10k of the dataset’s 3 billion rows, using the concatenation of all 11 columns as the key. There are already 24 duplicate rows.
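Counting exact duplicates before loading can help decide whether they matter. A small sketch using Python’s `collections.Counter`; the first data row is the sample from above and the third row is invented for illustration. Holding every row in memory like this suits a sample such as the 10k-row test, not the full 3 billion rows.

```python
import csv
import io
from collections import Counter


def count_duplicates(lines):
    """Return {row_tuple: occurrences} for rows that appear more than once."""
    counts = Counter(tuple(row) for row in csv.reader(lines))
    return {row: n for row, n in counts.items() if n > 1}


# The first row appears twice; the last row is made up.
data = io.StringIO(
    "118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018\n"
    "118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018\n"
    "118782,1,Comp000001,Comp000002,6,Port00001,Port00002,1,1,60,60\n"
)
dups = count_duplicates(data)
# Distinct duplicated rows, and extra copies beyond the first occurrence.
print(len(dups), sum(n - 1 for n in dups.values()))  # 1 1
```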
What I am trying to do at the moment is reconstruct this exercise in TigerGraph: https://datasets.trovares.com/cyber/LANL/index.html. As far as I understand their script, they load each row into a vertex. I have to admit I am very new to graph databases, so I don’t yet know the best way to design a schema.
Also, I am not a network security expert, so I don’t know whether the same event appearing multiple times has any significance for detecting a security breach. For now, I just want to load all records as they are.
I have now read the tutorial more closely, and you are right: they use only devices as vertices, and all other information goes into the edges. Thanks very much for your advice, @Szilard_Barany.
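A rough sketch of that device-centric model in plain Python (not GSQL, and not the tutorial’s actual code): devices are the only vertices, and each netflow row becomes one source-to-destination edge carrying the remaining columns as attributes. Keeping edges in a list, as in a multigraph, retains repeated identical flows. The names `build_device_graph` and `out_edges` are hypothetical.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["epoch_time", "duration", "src_device", "dst_device", "protocol",
           "src_port", "dst_port", "src_packets", "dst_packets",
           "src_bytes", "dst_bytes"]


def build_device_graph(lines):
    """Devices become vertices; every row becomes a src->dst edge."""
    devices = set()
    out_edges = defaultdict(list)  # src_device -> list of (dst_device, attrs)
    for values in csv.reader(lines):
        rec = dict(zip(COLUMNS, values))
        src, dst = rec.pop("src_device"), rec.pop("dst_device")
        devices.update((src, dst))
        out_edges[src].append((dst, rec))  # duplicate rows are kept
    return devices, out_edges


row = "118781,5580,Comp364445,Comp547245,17,Port05507,Port46272,0,755065,0,1042329018\n"
devices, out_edges = build_device_graph(io.StringIO(row * 2))
print(len(devices), len(out_edges["Comp364445"]))  # 2 2
```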