I am working with a netflow event dataset in CSV format, with the following columns:

```
epoch_time,duration,src_device,dst_device,protocol,src_port,dst_port,src_packets,dst_packets,src_bytes,dst_bytes
```
I created a `device` vertex type for src_device/dst_device.
I then created a directed edge type from device to device, with the remaining columns as edge attributes.
My test CSV has 10k rows, and I know it contains 24 duplicates, which leaves 9,976 unique records.
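For reference, a minimal Python sketch of how the row and duplicate counts can be checked against the raw file. It assumes, as the loading job's `HEADER="false"` implies, that the file has no header row (pass `has_header=True` otherwise), and it treats two rows as duplicates only if every field matches exactly. `count_records` is a hypothetical helper, not part of any TigerGraph tooling.

```python
import csv
import io
from typing import Iterable, Tuple


def count_records(lines: Iterable[str], has_header: bool = False) -> Tuple[int, int]:
    """Return (total data rows, distinct data rows) for netflow CSV text."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # drop the column-name row
    # Blank lines come back as empty lists; skip them.
    rows = [tuple(row) for row in reader if row]
    # A row is a duplicate only if all eleven fields match exactly.
    return len(rows), len(set(rows))


if __name__ == "__main__":
    sample = io.StringIO(
        "1000,5,dev1,dev2,6,1234,80,3,2,300,200\n"
        "1000,5,dev1,dev2,6,1234,80,3,2,300,200\n"  # exact duplicate
        "1001,7,dev1,dev3,17,5353,53,1,1,80,120\n"
    )
    print(count_records(sample))  # (3, 2)
```

On the real file, something like `with open("netflow.csv", newline="") as f: print(count_records(f))` should report 10,000 total and 9,976 distinct rows (the file name here is hypothetical).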
I then loaded the CSV file. I got 1,471 vertices, which makes sense: the dataset has 1,471 unique devices. What surprises me is that I only got 1,490 edges. I expected at least 9,976 edges, one for each unique netflow record between devices. Below is my GSQL:
```
create vertex device(PRIMARY_ID id String)

create directed edge netflow (
  from device, to device,
  epoch_time UINT, duration UINT, protocol UINT, src_port String, dst_port String,
  src_packets UINT, dst_packets UINT, src_bytes UINT, dst_bytes UINT)

create loading job load_devices for graph host_event_netflow
{
  define filename in_file;
  load in_file
    to vertex device values ($2),
    to vertex device values ($3),
    to edge netflow values ($2, $3, $0, $1, $4, $5, $6, $7, $8, $9, $10)
    using HEADER="false", SEPARATOR=",";
}
```
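For comparison with the loaded counts, a minimal Python sketch that counts the distinct device IDs and the distinct ordered (src_device, dst_device) pairs in the raw file. It assumes the column order listed above (src_device at index 2, dst_device at index 3) and no header row by default; the helper name `count_devices_and_pairs` is hypothetical. The first number can be compared with the 1,471 vertices and the second with the 1,490 edges.

```python
import csv
import io
from typing import Iterable, Tuple

SRC_COL, DST_COL = 2, 3  # positions of src_device and dst_device in each row


def count_devices_and_pairs(lines: Iterable[str], has_header: bool = False) -> Tuple[int, int]:
    """Return (distinct device IDs, distinct ordered src->dst device pairs)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # drop the column-name row
    devices = set()
    pairs = set()
    for row in reader:
        if not row:  # skip blank lines
            continue
        src, dst = row[SRC_COL], row[DST_COL]
        devices.update((src, dst))
        pairs.add((src, dst))  # direction matters: (a, b) != (b, a)
    return len(devices), len(pairs)


if __name__ == "__main__":
    sample = io.StringIO(
        "1000,5,dev1,dev2,6,1234,80,3,2,300,200\n"
        "1005,2,dev1,dev2,6,1240,80,1,1,60,60\n"   # same pair, different flow
        "1010,4,dev2,dev1,6,80,1234,2,2,100,100\n"  # reverse direction
    )
    print(count_devices_and_pairs(sample))  # (2, 2)
```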
What did I do wrong?