Is there a way to use PyTigerGraph conn method to load parquet from S3?

gsimeone · June 21, 2022, 9:43am

Hello -

I have a working script that loads data from S3 into our graph using GSQL.
I want to run it from Databricks using the conn method conn.uploadFile, but it doesn’t seem to work.

Here are my steps:

Step 1: create loading job string

load_job = f'''
use graph {graph}
drop job {test_load_vertices}
drop data_source {data_source}
create data_source S3 {data_source} for graph {graph}

set {data_source} = "/home/ubuntu/s3.config"

CREATE LOADING JOB {test_load_vertices} FOR GRAPH {graph} {{
       DEFINE FILENAME MyDataSource;
       LOAD MyDataSource TO VERTEX identity VALUES($"id", $"name", $"countries") USING JSON_FILE = "true";
}}
'''

Step 2: load it using the conn class

conn.gsql(load_job)

The above runs successfully.

Step 3: use the conn.uploadFile() function.

conn.uploadFile(filePath="s3://path/to/test_file.parquet", fileTag='MyDataSource', jobName=test_load_vertices, timeout=600000)

This doesn’t produce any output, and nothing happens on the Load Data page in GraphStudio.
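
To make Step 3 less silent, here is a minimal diagnostic sketch. It only wraps the same conn.uploadFile call and prints whatever comes back. The helper names is_s3_uri and upload_and_report are hypothetical (they are not part of pyTigerGraph). Whether uploadFile can read an s3:// URI at all is an open question here, not something this sketch settles.

```python
from urllib.parse import urlparse


def is_s3_uri(path: str) -> bool:
    """Return True when path looks like an S3 URI such as s3://bucket/key."""
    return urlparse(path).scheme.lower() in {"s3", "s3a", "s3n"}


def upload_and_report(conn, file_path, file_tag, job_name, timeout=600000):
    """Call conn.uploadFile and print its return value instead of discarding it.

    conn is assumed to be an existing TigerGraphConnection; the keyword
    arguments are the same ones used in Step 3.
    """
    if is_s3_uri(file_path):
        # Flag the case under question: an S3 URI rather than a local file path.
        print(f"note: {file_path} is an S3 URI, not a local file path")
    result = conn.uploadFile(
        filePath=file_path,
        fileTag=file_tag,
        jobName=job_name,
        timeout=timeout,
    )
    print(f"uploadFile returned: {result!r}")
    return result
```

Calling upload_and_report(conn, "s3://path/to/test_file.parquet", "MyDataSource", test_load_vertices) would at least show whether the call returns anything.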