What is the best option to run load jobs?

What are the best options to load 100 GB of data in CSV?

  • Run using Notebook
  • Studio
  • Anything else?

It depends on where the data is located. If you want to load using Studio, you first need to upload it to the server. I use Python for all my uploads; it can be very fast if you configure it properly. Also, it should not run on the server; you need a “jump” machine for it.

  • All CSV data files are in a K8S volume mounted on the VMs
  • Therefore, all CSV data files are available locally to TigerGraph
  • Our notebook client runs on cheap hardware

Question

  • Once we submit the job, what role does the notebook client play?
  • Can we submit the job using the notebook client and then disconnect?

Then you can create a loading job, check that the files are readable by the server, start the job, and disconnect the notebook client. Once started, the job runs in the background (it’s not REST API driven!).
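For illustration, here is a minimal sketch of the "create a loading job and start it" step driven from a notebook. The graph name, job name, vertex type, column count, and file path are all hypothetical placeholders, and the sketch assumes the CSV path is one the TigerGraph server itself can read. The submit helper assumes pyTigerGraph is installed and uses its `TigerGraphConnection(...).gsql(...)` call to send the text.

```python
def build_loading_job_gsql(graph, job, vertex, csv_path, num_columns):
    """Return GSQL text that defines a loading job and then runs it.

    All names passed in are placeholders; adapt the LOAD clause to your schema.
    """
    # Map CSV columns positionally: $0, $1, ... (assumes one attribute per column).
    columns = ", ".join(f"${i}" for i in range(num_columns))
    return "\n".join([
        f"USE GRAPH {graph}",
        f"CREATE LOADING JOB {job} FOR GRAPH {graph} {{",
        # The path must be readable by the server, not by the notebook client.
        f'  DEFINE FILENAME f1 = "{csv_path}";',
        f'  LOAD f1 TO VERTEX {vertex} VALUES ({columns}) USING SEPARATOR=",", HEADER="true";',
        "}",
        f"RUN LOADING JOB {job}",
    ])


def submit(gsql_text, host, graph, username, password):
    """Send the GSQL text to the server (requires pyTigerGraph and a live server)."""
    import pyTigerGraph as tg  # imported here so the builder works without it

    conn = tg.TigerGraphConnection(
        host=host, graphname=graph, username=username, password=password
    )
    return conn.gsql(gsql_text)


# Hypothetical usage from a notebook cell:
# text = build_loading_job_gsql("MyGraph", "load_person", "Person", "/data/person.csv", 2)
# print(submit(text, "https://tg-host", "MyGraph", "tigergraph", "<password>"))
```

Keeping the GSQL text in a separate builder function makes it easy to inspect the job definition before anything is sent to the server.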

In addition to running load jobs via a pyTigerGraph notebook and GraphStudio, you can use the gsql client, either locally or remotely (as long as the target files are accessible on the server).

After the loading job runs, where is the data stored? It is not in the local data source.

@Robert_Hardaway thanks for the reply.

  • I would love this option, being able to use gsql
  • We could launch workflows using a cronjob that uses gsql

Does TG have any sample cronjob?

@porscheme you can create the query and install it; after that, you get an exposed REST endpoint to call that query.

After that, you can crontab a bash script to curl that query.
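As a rough sketch of the same idea in Python instead of bash and curl: the script below builds the URL for an installed query and calls it. The host, port 9000, graph name, query name, parameter, and script path are hypothetical, and the `/query/<graph>/<query>` path and bearer-token header are assumptions about your REST++ setup that you should check against your own deployment.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, graph, query_name, params=None):
    """Build the endpoint URL for an installed query (assumed path layout)."""
    url = f"{base_url.rstrip('/')}/query/{graph}/{query_name}"
    if params:
        # Query parameters are passed as a URL-encoded query string.
        url += "?" + urllib.parse.urlencode(params)
    return url


def call_query(url, token=None, timeout=60):
    """GET the endpoint and return the parsed JSON response."""
    # The bearer token is only needed if authentication is enabled (assumption).
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical usage inside the scheduled script:
# url = build_query_url("http://tg-host:9000", "MyGraph", "my_query", {"limit": 10})
# print(call_query(url))
#
# Hypothetical crontab entry to run it every day at 02:00:
# 0 2 * * * /usr/bin/python3 /opt/scripts/run_query.py
```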