Loading job syntax for a MAP<STRING, STRING> type attribute

conn.gsql('''
USE GRAPH {0}
CREATE LOADING JOB file_load_job FOR GRAPH {0} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource TO VERTEX vertex_name_map VALUES($0, $0, $2, _) USING SEPARATOR=",", HEADER="true", EOL="\n";
}}'''.format(graphName))

Error:
Semantic Check Fails: Attribute 'attributes' is MAP<K, V> type, while the corresponding loading column is not a key-value pair, please use "($x -> $y)".
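For reference, the semantic checker is asking for an explicit key-to-value pair in the VALUES clause. A minimal sketch of what such a loading job could look like, assuming the map key sits in column $1 and the value in column $2 (the column positions and graph name here are illustrative, not taken from the post):

```python
# Sketch: build a GSQL loading-job string using the "($key -> $value)" pair
# syntax required for MAP<K, V> attributes. Column positions are assumptions.
graph_name = "MyGraph"  # placeholder graph name

loading_job = '''
USE GRAPH {0}
CREATE LOADING JOB file_load_job FOR GRAPH {0} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource TO VERTEX vertex_name_map
    VALUES($0, $0, ($1 -> $2), _)
    USING SEPARATOR=",", HEADER="true", EOL="\\n";
}}'''.format(graph_name)

# In practice this string would be passed to conn.gsql(loading_job).
print(loading_job)
```

When run against a real graph, `vertex_name_map` would need a schema whose third attribute is MAP<STRING, STRING>; adjust the column indices to match the actual file layout.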

How can data mapping be done in GSQL for a loading job targeting a vertex with a MAP<K, V> attribute, using JSON data?

Hi @svattiku

I don't think there's an easy way to do data mapping with a MAP<K, V> attribute and JSON data right now.

If you just want to extract specific JSON fields and you know the field names, I would recommend the following section in our documentation on loading JSON data into specific fields.

Please let us know if you have any further questions!

Could you please share the schema definition of vertex_name_map?
We can't tell you the correct way to load into your vertex without knowing its structure.


I have rephrased the problem above in more detail; I hope it gives more clarity on the issue I am facing.

Problem statement: load JSON-type data as a string, dict/JSON, or MAP<STRING, STRING> into TigerGraph, and query the data using the TigerGraph functions parse_json_object or parse_json_array, or any alternatives.

Vertex Definition:
CREATE VERTEX example_TestMap(PRIMARY_ID id STRING, mapdata STRING, name STRING, mapdata_2 STRING) WITH STATS="OUTDEGREE_BY_EDGETYPE", PRIMARY_ID_AS_ATTRIBUTE="true"

Data Source:
{
  "company": {
    "name": "TechCorp",
    "employees": [
      {
        "id": 1,
        "name": "John Doe",
        "position": "Software Engineer",
        "skills": ["Java", "JavaScript"],
        "projects": [
          {
            "id": "proj-001",
            "name": "SmartApp"
          }
        ]
      }
    ]
  }
}

Create a CSV file using Python:

small_json_example = {
    "company": {
        "name": "TechCorp",
        "employees": [
            {
                "id": 1,
                "name": "John Doe",
                "position": "Software Engineer",
                "skills": ["Java", "JavaScript"],
                "projects": [
                    {
                        "id": "proj-001",
                        "name": "SmartApp"
                    }
                ]
            }
        ]
    }
}

Save the data as a CSV with sep='\t':

small_data_dict = json.dumps(small_json_example)
dkgSignalInterface_df["json_column"] = small_data_dict  # loading as a JSON string
dkgSignalInterface_df["json_column_2"] = small_data_dict  # loading as a dict/JSON
dkgSignalInterface_df["json_column"] = dkgSignalInterface_df["json_column"].apply(json.loads)
dkgSignalInterface_df[["json_column", "json_column_2"]].head(2)
Output:

   json_column                                        json_column_2
0  {'company': {'name': 'TechCorp', 'employees': …    {"company": {"name": "TechCorp", "employees": …
1  {'company': {'name': 'TechCorp', 'employees': …    {"company": {"name": "TechCorp", "employees": …

# saved data as CSV with tab-separated values

dkgSignalInterface_df.to_csv('/Workspace/Users/saisarath.vattikuti@intel.com/MLAMA-Datasets/dkgSignalInterface.tsv', sep='\t', header=True, index=False)
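As a side note on the quoting that shows up later in this thread: when a field containing double quotes is written to CSV/TSV, the writer wraps the whole field in quotes and doubles every embedded quote (RFC 4180-style escaping). A small stdlib-only sketch of that round trip (no pandas; the values are illustrative):

```python
import csv
import io
import json

# Build the JSON string once, then write it as a single TSV field.
record = {"company": {"name": "TechCorp"}}
json_str = json.dumps(record)

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
writer.writerow(["id", "json_column"])
writer.writerow(["v1", json_str])

raw = buf.getvalue()
# On disk the embedded quotes are doubled: "" instead of "
print(raw)

# Reading it back with csv.reader undoes the doubling, so json.loads works again.
row = list(csv.reader(io.StringIO(raw), delimiter="\t"))[1]
restored = json.loads(row[1])
print(restored["company"]["name"])  # TechCorp
```

The escaping is lossless as long as whatever reads the file applies the same CSV rules; problems start when a loader treats the escaped bytes as the literal field value.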

Loading job (Python):

conn.gsql('''
USE GRAPH {0}
CREATE LOADING JOB file_load_job_dkgSignalInterfaceTest_5 FOR GRAPH {0} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource TO VERTEX dkgSignalInterface_TestMap VALUES($0, $2, $3, $0) USING SEPARATOR="\t", HEADER="true", EOL="\n";
}}'''.format(graphName))

Loading job output:

results = conn.runLoadingJobWithFile(file_path, fileTag='MyDataSource', jobName='file_load_job_dkgSignalInterfaceTest_5', sep="\t")
print(results)

[{'sourceFileName': 'Online_POST', 'statistics': {'sourceFileName': 'Online_POST', 'parsingStatistics': {'fileLevel': {'validLine': 10}, 'objectLevel': {'vertex': [{'typeName': 'dkgSignalInterface_TestMap', 'validObject': 10}], 'edge': [], 'deleteVertex': [], 'deleteEdge': []}}}}]

In TigerGraph, the data is stored as:

Example data (json_column_2):

"{""company"": {""name"": ""TechCorp"", ""employees"": [{""id"": 1, ""name"": ""John Doe"", ""position"": ""Software Engineer"", ""skills"": [""Java"", ""JavaScript""], ""projects"": [{""id"": ""proj-001"", ""name"": ""SmartApp""}]}]}}"

"{""company"": {""name"": ""TechCorp"", ""employees"": [{""id"": 1, ""name"": ""John Doe"", ""position"": ""Software Engineer"", ""skills"": [""Java"", ""JavaScript""], ""projects"": [{""id"": ""proj-001"", ""name"": ""SmartApp""}]}]}}"
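The stored value looks like the CSV-escaped form leaked into the attribute: every quote was doubled by to_csv, and the loader kept the doubled quotes literally, which is why parse_json_object later rejects the string. A quick illustration (stdlib only; the blunt `replace` below is just a diagnostic, since it would also corrupt legitimate doubled quotes inside values; the cleaner fix is to configure quote handling in the loading job itself, e.g. GSQL's USING QUOTE option, if available in your version):

```python
import json

# What the attribute appears to hold: the CSV-escaped form with doubled quotes.
stored = '{""company"": {""name"": ""TechCorp""}}'

# As-is, it is not valid JSON.
try:
    json.loads(stored)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(parsed_ok)  # False

# Collapsing the doubled quotes recovers a parseable JSON string.
repaired = stored.replace('""', '"')
print(json.loads(repaired)["company"]["name"])  # TechCorp
```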

Read queries in TigerGraph:

CREATE QUERY searchAttributes_test_example_post() FOR GRAPH DKG_SPEC_LOAD_TEST {
  ListAccum<STRING> @@mapexample1;
  ListAccum<STRING> @@mapexample2;

  JSONOBJECT jsstring;
  JSONARRAY jsarray;
  INIT = {dkgSignalInterface_TestMap.*};
  // Collect each vertex's mapdata attribute
  X = SELECT v FROM INIT:v
      ACCUM @@mapexample1 += v.mapdata;

  FOREACH item IN @@mapexample1 DO
    // jsstring = parse_json_object(item);
    PRINT item;
    BREAK;
  END;
  // PRINT @@portf;
}
Output:
[
  {
    "item": ""{""company"": {""name"": ""TechCorp"", ""employees"": [{""id"": 1, ""name"": ""John Doe"", ""position"": ""Software Engineer"", ""skills"": [""Java"", ""JavaScript""], ""projects"": [{""id"": ""proj-001"", ""name"": ""SmartApp""}]}]}}""
  }
]

Error while trying to parse the data

CREATE QUERY searchAttributes_test_example_post() FOR GRAPH DKG_SPEC_LOAD_TEST {
  ListAccum<STRING> @@mapexample1;
  ListAccum<STRING> @@mapexample2;

  JSONOBJECT jsstring;
  JSONARRAY jsarray;
  INIT = {dkgSignalInterface_TestMap.*};
  X = SELECT v FROM INIT:v
      ACCUM @@mapexample1 += v.mapdata;
  // PRINT @@mapexample1;

  FOREACH item IN @@mapexample1 DO
    jsstring = parse_json_object(item);
    PRINT item;
    BREAK;
  END;
  // PRINT @@portf;
}
The failing line is jsstring = parse_json_object(item);

Output:
:frowning_face:
Runtime Error: "{""company"": {""name"": ""TechCorp"", ""employees"": [{""id"": 1, ""name"": ""John Doe"", ""position"": ""Software Engineer"", ""skills"": [""Java"", ""JavaScript""], ""projects"": [{""id"": ""proj-001"", ""name"": ""SmartApp""}]}]}}" cannot be parsed as a json object.

I need help resolving this issue.

Questions:
1) How can JSON-structured data be loaded and queried? Are there any best known practices for loading the data into TigerGraph and querying it with the parse_json_object / parse_json_array functions?
2) I faced issues loading the file into TigerGraph as a CSV.
3) I faced issues loading the JSON/dict type as a column in a CSV or TSV.
4) Is there any BKM (best known method) for issues 2 and 3 above?

Any Help will be much Appreciated :smile:

Thanks,
Sai Sarath Vattikuti.

Hi @svattiku ,

Something like:

"{""company"": {""name"": ""TechCorp"", ""employees"": [{""id"": 1, ""name"": ""John Doe"", ""position"": ""Software Engineer"", ""skills"": [""Java"", ""JavaScript""], ""projects"": [{""id"": ""proj-001"", ""name"": ""SmartApp""}]}]}}"

is not valid JSON.

Please make sure that the JSON you are using is in a valid format. Note that you should only use the straight double-quote character `"..."` when specifying a string in JSON, and NOT typographic quotes `“...”` (i.e., don't use opening and closing curly quotes). It's a very minor visual difference. I don't think we support parsing JSON with `“...”` instead of the standard `"..."`.
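The difference is easy to demonstrate in Python: straight quotes parse, while typographic (curly) quotes raise a JSONDecodeError (illustrative snippet):

```python
import json

valid = '{"name": "TechCorp"}'   # straight double quotes: valid JSON
curly = '{“name”: “TechCorp”}'   # typographic quotes: not valid JSON

print(json.loads(valid)["name"])  # TechCorp

try:
    json.loads(curly)
    curly_ok = True
except json.JSONDecodeError:
    curly_ok = False
print(curly_ok)  # False
```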

As for how to load JSON data, there are several ways to do it, and each has its own limitations:
- One would be to follow the instructions here, but note that you'd have to specify the specific fields you want from the JSON.
- Another would be to load it as a STRING and then use the parse_json_object(…) function, which I believe is what you are trying to do right now. This is probably the best approach if you want to preserve the nested data, though it requires some pre-processing to ensure the string is valid JSON.
- Note that we currently don't support a native JSON data type as a vertex or edge attribute (as of version 3.9.3).

Based on this discussion, you should make sure that the JSON string is valid before loading the data. Once that is done, please let us know if you encounter any further problems.

Best,
Supawish Limprasert (Jim)