CSV Loading Multiple Vertex Types

I read through the CSV loader documentation, and saw a great example of it being used in GSQL 101. But it left some questions open for me. For example, say I have two vertex types which do have edges between them.

Maybe: Person -> Car or something like that. How can you, or can you, use the CSV loader for that inside of GSQL? Or does this have to be manually performed with the GUI?

A small example or pseudo code of the CSV headers used would be extremely helpful - if this is possible.

Here is an example about users watching videos:

CREATE VERTEX user (
    PRIMARY_ID username STRING,
    username STRING,
    communityID INT,
    country STRING
)

CREATE VERTEX video (
    PRIMARY_ID videoID INT,
    videoID INT
)

CREATE UNDIRECTED EDGE watched (FROM user, TO video, percentWatched INT, commented BOOL, liked BOOL)

Loading:

CREATE GRAPH youtube (*)
USE GRAPH youtube

CREATE LOADING JOB loader FOR GRAPH youtube {
    DEFINE FILENAME videoInfo ="~/tigergraph/1.csv";
    DEFINE FILENAME userInfoAndConnections ="~/tigergraph/2.csv";

    LOAD videoInfo TO VERTEX video VALUES ($"id", $0, _, $"location") USING header="true", separator=",";
    LOAD userInfoAndConnections TO VERTEX user VALUES ($"user", $0) USING header="true", separator=",";
    LOAD userInfoAndConnections TO EDGE watched VALUES ($"user", $"video", $"%", $"comment", $"like") USING header="true", separator=",";
}
RUN LOADING JOB loader
1 Like

So, here’s a hacky but handy way to find out what GS is doing.
First, do the thing you want in graph studio.
Then export the solution to your local drive.
Then expand it and look inside, you’ll find all sorts of things, but one of them contains all the loading job definitions for whatever crafty idea you have had.
It also has the schema definition scripts which is what we usually use it for.

1 Like

Thanks for the example and the advice. I don’t have access to the GUI (running TG headless on a VM because my OS is unsupported). I’ll just have to tinker with it I guess, but in general it looks possible to do at least. For now thats the major concern.

1 Like

I’m having another try at this now. I was able to load in a variety of vertices to a loading job no trouble whatsoever. But I cannot load edges and am finding the documentation hard to unpack(specifically: https://docs.tigergraph.com/dev/gsql-ref/ddl-and-loading/creating-a-loading-job#more-complex-attribute-expressions under the section Destination Clause).

here’s what I am trying:
Schema:

CREATE DIRECTED EDGE edgeName (FROM VertexA, TO VertexB | FROM VertexB, TO VertexB)

Job:

...
LOAD edgeName TO EDGE step VALUES ($0, $1) USING header="false", separator=",";

When I run this I get the following error:

Semantic Check Fails: The edge type edgeName is composed of edge pairs, the loading statement should specify FROM vertex

When I try to make it a CSV with 4 columns(type_a,ID_a,type_b,ID_b) to denote the vertex types I get:

Semantic Check Fails: values() index number is 4 while edge 'edgeName' has 2 columns!

If I try to make it a triple because only one vertex type is ambiguous I get:

Semantic Check Fails: values() index number is 3 while edge 'edgeName' has 2 columns!

Not sure if this is a bug or I am missing something obvious? Running TG 3.15

Made some progress.

If I specify the vertex type, like:

LOAD edgeFile TO EDGE edgeName VALUES ( $0 vertexA, $1 vertexB) USING header="false", separator=",";

the job will load.

But I haven’t stumbled on the syntax for edges that can connect to multiple kinds of vertices yet. IE:

LOAD edgeFile TO EDGE edgeName VALUES ( $0 (vertexA|vertexB), $1 vertexB) USING header="false", separator=",";

is not valid.

I was able to make a work around by creating loading jobs via the vertex types of the edges. Not ideal, but, it’s off the ground. I’d still love to hear of other solutions/suggestions to this problem.

Found a better way thanks to @Lenny_Chen

For a CSV with the following format:

typeA,ID,typeB,ID
typeB,ID,typeC,ID
typeA,ID,typeB,ID
LOAD edgeFile TO EDGE edgeName VALUES ( $1 $0,  $3 $2) USING header="false", separator=",";

The key there is where and how the commas are used.