What is the best option to run loading jobs?

porscheme · February 23, 2022, 6:11pm

What are the best options for loading 100 GB of CSV data?

Run it from a notebook
GraphStudio
Anything else?

Bruno · February 23, 2022, 6:24pm

It depends on where the data is located. If you want to load it with GraphStudio, you first need to upload it to the server. I use Python for all my uploads, and it can be very fast if you configure it properly. It should not run on the server, though; you need a separate "jump" machine for it.
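Bruno doesn't say what "configured properly" means. One common knob for bulk uploads is batch size, so here is a minimal sketch, assuming that batching is what you want: it streams a large CSV and yields it as smaller CSV chunks, each repeating the header row. How each chunk is sent to the server is left out; `upload_chunk` in the usage line below is a hypothetical placeholder, not a pyTigerGraph function, and the default `rows_per_batch` is an arbitrary starting value.

```python
import csv
import io
from typing import Iterable, Iterator, List


def _to_csv(header: List[str], rows: List[List[str]]) -> str:
    # Re-serialize one batch, header first, so each chunk loads on its own.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def csv_batches(lines: Iterable[str], rows_per_batch: int = 100_000) -> Iterator[str]:
    """Yield a CSV stream as chunks of at most rows_per_batch data rows.

    rows_per_batch is a hypothetical tuning parameter; tune it by experiment.
    """
    if rows_per_batch <= 0:
        raise ValueError("rows_per_batch must be positive")
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to upload
    batch: List[List[str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_batch:
            yield _to_csv(header, batch)
            batch = []
    if batch:
        yield _to_csv(header, batch)  # final, possibly short, batch
```

Typical use: `with open("big.csv", newline="") as f: for chunk in csv_batches(f): upload_chunk(chunk)`. The file is read lazily, so memory use is bounded by the batch size rather than the 100 GB total.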

porscheme · February 23, 2022, 6:58pm

All CSV data files are in a K8s volume mounted on the VMs.
Therefore, all CSV data files are available locally to TigerGraph.
Our notebook client runs on cheap hardware.

Question

Once we submit the job, what role does the notebook client play?
Can we submit the job from the notebook client and then disconnect?

Bruno · February 24, 2022, 9:11am

Then you can create a loading job, check that the files are readable by the server, start the job, and disconnect the notebook client. Once started, the job runs in the background (it's not driven by the REST API!).
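Those steps could be sketched in Python as follows. This is a minimal sketch under assumptions: the graph `MyGraph`, vertex type `Person`, job name `load_people` and file path are hypothetical; the file is assumed to have a header row and comma separators; and each column is assumed to map, in order, to a vertex attribute. `unreadable` only reports what the OS user running the script can read, so it is meaningful only when run on the server as the user TigerGraph runs under. The returned text is a GSQL script that could be handed to the gsql client.

```python
import os
from typing import List


def unreadable(paths: List[str]) -> List[str]:
    # A path counts as usable only if it is a regular file this process can read.
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]


def loading_job_script(graph: str, job: str, path: str, vertex: str, n_cols: int) -> str:
    """Build a GSQL script that defines and then runs a one-file loading job."""
    values = ", ".join(f"${i}" for i in range(n_cols))  # "$0, $1, ..."
    return (
        f"USE GRAPH {graph}\n"
        f"CREATE LOADING JOB {job} FOR GRAPH {graph} {{\n"
        f'  DEFINE FILENAME f1 = "{path}";\n'
        f"  LOAD f1 TO VERTEX {vertex} VALUES ({values}) "
        f'USING SEPARATOR=",", HEADER="true";\n'
        f"}}\n"
        f"RUN LOADING JOB {job}\n"
    )
```

Checking `unreadable([...])` first and only then generating and running the script avoids starting a job that fails partway through on a missing or permission-denied file.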

Robert_Hardaway · February 24, 2022, 2:47pm

In addition to running loading jobs from a pyTigerGraph notebook or GraphStudio, you can use the GSQL client, either locally or remotely (as long as the target files are accessible on the server).
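For a scheduled workflow, one option is a small Python wrapper around the local gsql client. This sketch assumes the `gsql` binary is on the PATH of the user running it and that it accepts a script file as its argument (for example a file containing `CREATE LOADING JOB` and `RUN LOADING JOB` statements, called `load_people.gsql` here purely for illustration). Options for the remote client are not shown.

```python
import subprocess
from typing import List, Optional


def gsql_command(script_path: str, gsql_bin: str = "gsql") -> List[str]:
    # Assumption: the local gsql client runs a script file passed as its argument.
    return [gsql_bin, script_path]


def run_gsql(script_path: str, gsql_bin: str = "gsql", timeout: Optional[float] = None) -> int:
    """Run a GSQL script with the local client and return its exit status."""
    result = subprocess.run(
        gsql_command(script_path, gsql_bin),
        capture_output=True,
        text=True,
        timeout=timeout,  # None means wait as long as the job takes
    )
    # Echo the client's output so cron can redirect it into a log file.
    print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="")
    return result.returncode
```

A crontab entry such as `0 2 * * * /usr/bin/python3 /home/tigergraph/run_load.py >> /tmp/run_load.log 2>&1` (hypothetical paths, with `run_load.py` calling `run_gsql("load_people.gsql")`) would run it every day at 02:00. Reading the logged output is safer than relying on the exit status alone.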

achmdirfand · February 25, 2022, 4:38am

After a loading job runs, where is the data stored? It is not in the local data source.

porscheme · February 25, 2022, 5:42pm

@Robert_Hardaway thanks for the reply.

I would love this option of being able to use gsql.
We could launch workflows from a cron job that uses gsql.

Does TG have any sample cron jobs?

Mohamed_Zrouga · March 1, 2022, 5:16pm

@porscheme you can create the query and install it; after that, it is exposed as a REST endpoint that you can call.

You can then schedule a bash script with crontab that calls that endpoint with curl.
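The same idea in Python instead of bash and curl, using only the standard library. This is a sketch under assumptions: REST++ is reachable at `http://localhost:9000` (the default REST++ port in TigerGraph 3.x; other versions or a proxy may use a different base URL), `MyGraph` and `my_query` are hypothetical names, and a bearer token is only needed when authentication is enabled.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def query_url(base: str, graph: str, query: str,
              params: Optional[Dict[str, Any]] = None) -> str:
    # Installed queries are served under /query/<graph>/<query>.
    url = f"{base.rstrip('/')}/query/{urllib.parse.quote(graph)}/{urllib.parse.quote(query)}"
    if params:
        url += "?" + urllib.parse.urlencode(params)  # query parameters go in the URL
    return url


def run_installed_query(base: str, graph: str, query: str,
                        params: Optional[Dict[str, Any]] = None,
                        token: Optional[str] = None,
                        timeout: float = 3600.0) -> Dict[str, Any]:
    """Call an installed query with GET and return the parsed JSON response."""
    req = urllib.request.Request(query_url(base, graph, query, params))
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A cron-scheduled script would then call something like `run_installed_query("http://localhost:9000", "MyGraph", "my_query", {"k": 5})`, where `k` stands for whatever parameters the query declares.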

minhntn · October 16, 2023, 9:21am

Could you share more about this?
How do I use crontab to run queries in TG (if I have 3 queries and want to run them sequentially)?
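To run three queries one after another, scheduling one script that calls them in order is usually simpler than three separate cron entries, because cron starts each entry at its scheduled time without waiting for another entry to finish. The sketch below assumes `call` is any function that runs one installed query and returns its parsed JSON response (for instance a small urllib wrapper around `/query/<graph>/<query>`), and that a failed query is reported as `"error": true` in that response. Both are assumptions, and the query names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence


def run_in_order(queries: Sequence[str],
                 call: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each query only after the previous one has returned successfully."""
    results: List[Dict[str, Any]] = []
    for name in queries:
        resp = call(name)  # blocks until this query's response arrives
        # Assumption: failures are flagged by an "error" field set to true.
        if resp.get("error"):
            raise RuntimeError(f"query {name!r} failed: {resp.get('message', '')}")
        results.append(resp)
    return results
```

Because `run_in_order` raises on the first failure, query 3 never runs after a failed query 2, and the uncaught exception makes the script exit with a non-zero status that shows up in cron's mail or log. A single crontab line then schedules the whole sequence.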