How to load the Cora data into the ML Workbench?

Hi there,

The data ingestion tutorial for the ML Workbench is missing the files needed to ingest the data into TigerGraph DB. I don’t know where to download schema.gsql, load.gsql, nodes.csv, and edges.csv.

BTW, the current script seems to load only the graph structure, while the GNN demo indicates that the Cora graph loaded from TigerGraph DB includes the node features. Could you explain how to ingest the node features into the TigerGraph graph?

Hey @dozee, thank you for trying out the ML Workbench! You may have an older version of the workbench that didn’t include the data. The examples repository (mlworkbench-docs/tutorials/basics at main · TigerGraph-DevLabs/mlworkbench-docs · GitHub) has a data directory where you can access both the raw .csv files and the .gsql scripts.

Hope this helps, feel free to follow up with any other questions!

Hi @Parker_Erickson, the new tutorial works! Thanks! One more question: the Cora PyG graph object loaded by the given example script doesn’t seem to have the node feature information (the node feature tensor is all zeros). Do you know how to ingest the feature info into a TigerGraph graph instance and convert it to a PyG graph object? Thanks!

Hmmm, what code are you using to ingest the features? The Cora feature vectors are very sparse, so when you preview the data it might look as though they are all zero. Could you run data.x.sum() or something along those lines to verify that the feature vectors are truly all zero?
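For anyone who wants a slightly more detailed check than a single sum, here is a minimal sketch in plain Python. The helper name `summarize_features` and the toy matrix are made up for illustration; they are not part of the ML Workbench or PyG. It counts non-zero entries and all-zero rows, which makes it easier to tell "sparse" apart from "empty".

```python
def summarize_features(rows):
    """Summarize a 2-D feature matrix given as a list of rows of numbers.

    Hypothetical helper for illustration only (not a library function).
    """
    total = 0.0        # sum of all entries, like data.x.sum()
    nonzero = 0        # number of non-zero entries
    zero_rows = 0      # number of rows that are entirely zero
    n_entries = 0      # total number of entries seen
    for row in rows:
        row_nonzero = sum(1 for v in row if v != 0)
        nonzero += row_nonzero
        total += sum(row)
        n_entries += len(row)
        if row_nonzero == 0:
            zero_rows += 1
    density = nonzero / n_entries if n_entries else 0.0
    return {"sum": total, "nonzero": nonzero,
            "zero_rows": zero_rows, "density": density}


# Toy stand-in for a sparse feature matrix (NOT real Cora data).
features = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
]
stats = summarize_features(features)
print(stats)
```

With a real PyG object, and assuming data.x is a 2-D tensor, something like summarize_features(data.x.tolist()) should work. A non-zero count above zero means the features were ingested, even if a preview of the first few rows shows only zeros.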

Hi Parker,

I have figured out the problem. The PyG graph object exported from TigerGraph does include the feature info that originally came from nodes.csv. Thanks for your quick debugging suggestion.
