Problems with GSQL Query

I am new to TigerGraph and GSQL. The database is running on an Azure VM with 8 vCPUs and 32GB memory. I have a dataset of about 250,000 edges and 300,000 nodes. All vertex attributes are indexed. I have several problems/questions, but first let me show the query and example response.

Here’s the query:

CREATE QUERY downstream(VERTEX<MaterialPlantVertex> starting_vertex) FOR GRAPH SimpleMassBalance { 
  
  vertices = {starting_vertex};
  
  SetAccum<EDGE> @@edges;
  
  WHILE vertices.size() > 0 DO
    vertices = SELECT t FROM vertices:s -( MaterialPlantEdge  >:e)- :t
    ACCUM @@edges += e;

    PRINT @@edges;
    PRINT vertices;
  END;
}

Here’s an example response:

[
  {
    "@@edges": [
      {
        "attributes": {
          "Quantity": -335.29599,
          "Type": "Production",
          "UomCode": "kg",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A001 | 10001056",
        "from_type": "MaterialPlantVertex",
        "to_id": "A010 | 10096038",
        "to_type": "MaterialPlantVertex"
      }
    ]
  },
  {
    "vertices": [
      {
        "attributes": {
          "BusinessCode": "27304",
          "BusinessDescription": "27304_ADHESIVES",
          "MaterialCode": "10096038",
          "MaterialDescription": "10096038_ADCOTE 812/BASE BULK",
          "OffsetQuantity": 7220,
          "PerformanceCenterCode": "25718",
          "PerformanceCenterDescription": "25718_LAM POLYURETHANE",
          "PlantCode": "A010",
          "PlantDescription": "A010_West Alexandria AFP",
          "ValueCenterCode": "16762",
          "ValueCenterDescription": "16762_LAMINATING ADHESIVES"
        },
        "v_id": "A010 | 10096038",
        "v_type": "MaterialPlantVertex"
      }
    ]
  },
  {
    "@@edges": [
      {
        "attributes": {
          "Quantity": 1850,
          "Type": "STO",
          "UomCode": "KG",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A010 | 10096038",
        "from_type": "MaterialPlantVertex",
        "to_id": "A051 | 11135745",
        "to_type": "MaterialPlantVertex"
      },
      {
        "attributes": {
          "Quantity": 10034,
          "Type": "309_TF tfr ps.mat.to mat",
          "UomCode": "kg",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A010 | 10096038",
        "from_type": "MaterialPlantVertex",
        "to_id": "C466 | 99048702",
        "to_type": "MaterialPlantVertex"
      },
      {
        "attributes": {
          "Quantity": -335.29599,
          "Type": "Production",
          "UomCode": "kg",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A001 | 10001056",
        "from_type": "MaterialPlantVertex",
        "to_id": "A010 | 10096038",
        "to_type": "MaterialPlantVertex"
      }
    ]
  },
  {
    "vertices": [
      {
        "attributes": {
          "BusinessCode": "16979",
          "BusinessDescription": "16979_CONSTRUCTION CHEMICALS",
          "MaterialCode": "11135745",
          "MaterialDescription": "11135745_PRIMAL EC-2019R AF/BULK",
          "OffsetQuantity": 62648,
          "PerformanceCenterCode": "17014",
          "PerformanceCenterDescription": "17014_CONSTRUCTION",
          "PlantCode": "A051",
          "PlantDescription": "A051_Queretaro Coatings",
          "ValueCenterCode": "17009",
          "ValueCenterDescription": "17009_CONSTRUCTION"
        },
        "v_id": "A051 | 11135745",
        "v_type": "MaterialPlantVertex"
      },
      {
        "attributes": {
          "BusinessCode": "NA",
          "BusinessDescription": "NA",
          "MaterialCode": "99048702",
          "MaterialDescription": "99048702_SHPOL NF-1159M2 THDR 210KG",
          "OffsetQuantity": 8111,
          "PerformanceCenterCode": "NA",
          "PerformanceCenterDescription": "NA",
          "PlantCode": "C466",
          "PlantDescription": "C466_Guangzhou Systems House",
          "ValueCenterCode": "NA",
          "ValueCenterDescription": "NA"
        },
        "v_id": "C466 | 99048702",
        "v_type": "MaterialPlantVertex"
      }
    ]
  },
  {
    "@@edges": [
      {
        "attributes": {
          "Quantity": 1850,
          "Type": "STO",
          "UomCode": "KG",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A010 | 10096038",
        "from_type": "MaterialPlantVertex",
        "to_id": "A051 | 11135745",
        "to_type": "MaterialPlantVertex"
      },
      {
        "attributes": {
          "Quantity": 10034,
          "Type": "309_TF tfr ps.mat.to mat",
          "UomCode": "kg",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A010 | 10096038",
        "from_type": "MaterialPlantVertex",
        "to_id": "C466 | 99048702",
        "to_type": "MaterialPlantVertex"
      },
      {
        "attributes": {
          "Quantity": -335.29599,
          "Type": "Production",
          "UomCode": "kg",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A001 | 10001056",
        "from_type": "MaterialPlantVertex",
        "to_id": "A010 | 10096038",
        "to_type": "MaterialPlantVertex"
      }
    ]
  },
  {
    "vertices": []
  }
]

Problems:

  • The query is timing out (16+ seconds) when result set is greater than ~10 nodes/edges. Is there something wrong with my query?
  • The same edges are listed more than once. Search for A001 | 10001056 in the output
  • The resultant JSON is in a very strange (to me) format. Is it possible to just have an array of vertices and an array of edges?
  • Having the @ character in property names of JSON is not a good practice. Is it possible to remove them in the JSON?

I cannot comment on the performance, but you can easily increase your timeout. You can also add the DISTRIBUTED keyword to your query (CREATE DISTRIBUTED QUERY) if you are running in a clustered environment.

You can pick any name you want when you print the output, so if you want to remove the @ symbol, you could use PRINT @@edges as myEdges;

I don’t see any duplication problem. You are printing the @@edges inside the WHILE loop. Move it outside the loop if you only want to see the final list. Or maybe I didnt understand your question.

You can control the output, and how its nested in JSON. You can also pick a subset of the attributes like this

PRINT vertices[vertices.BusinessCode];

But many people call the query as an API from python or a different language, then process the json in their program.

2 Likes

Hi @bnsmith,

  • I suspect that the query is timing out due to the PRINT statement being inside the WHILE loop. Since @@edges is a global accumulator, it will maintain the values added to it over each iteration of the WHILE loop.
    In your example @@edges is printing the entirety of its contents each iteration most likely leading to the timeout due to sheer data volume. You can set a custom timeout with the Run Configuration option in GraphStudio or with the GSQL-Timeout header if this is the intended functionality of your query.
  • Above is also why A001 appears more than once. You can see that there are more than one instance of @@edges in the output.
  • If you’re looking for the accumulator to just store the names of vertices and edges, then you can just add e.from_id or e.to_id to your accumulator (you’ll need to change the accumulator to accept strings). You can do the same with vertices with v.id.
  • You can print any value with an alias by using PRINT @@edges AS Edges
1 Like

Hey @bnsmith.

So, without knowing more about the specifics of your dataset, it’s not possible to speak specifically about why your query is timing out.

However, there are two common possibilities that come to mind:

  • First, if your MaterialPlantVertex -MaterialPlantEdge-> MaterialPlantVertex topology is cyclical anywhere, it’s be possible to select a starting_vertex that causes the query to time out.
  • Perhaps a bit more nuanced, and again depending upon the topology: If your structure is mostly tree-like and there is a sufficiently large tree, it may be possible to choose a starting_vertex that essentially causes the entire graph to be loaded into the results. Dependent upon the average number of branches n, the result size could potentially grow by a factor of n at each iteration.

Does the query run to completion if you add a maxIterations parameter to the check on the WHILE loop?

Additionally: I want to echo what @markmegerian said about using PRINT inside the WHILE loop. I recommend waiting until the loop has run to completion before executing the PRINT statement.

Thank you everyone! These suggestions are very helpful! I’ve made some changes and things seem to be working better. I haven’t tested performance yet, but it seems better.

Here’s the new query (trying to include attributes of vertex):

CREATE QUERY bns2(STRING plant_code, STRING material_code) FOR GRAPH SimpleMassBalance { 
  
  vertices = {MaterialPlantVertex.*};
  
  SetAccum<VERTEX<MaterialPlantVertex>> @@vertices;
  SetAccum<EDGE> @@edges;
  
  WHILE vertices.size() > 0 DO
    vertices = SELECT t FROM vertices:s -( MaterialPlantEdge  >:e)- :t
                WHERE s.PlantCode == plant_code AND s.MaterialCode == material_code
    ACCUM 
      @@vertices += t,
      @@edges += e;
  
    # PRINT vertices;
  END;
  
  PRINT @@vertices as DownstreamVertices;
  PRINT @@edges as DownstreamEdges;
}

This gives me the following result:

[
  {
    "DownstreamVertices": [
      "H013 | 3337367"
    ]
  },
  {
    "DownstreamEdges": [
      {
        "attributes": {
          "Quantity": 1222,
          "Type": "Trade",
          "UomCode": "KG",
          "UomDescription": "KG"
        },
        "directed": true,
        "e_type": "MaterialPlantEdge",
        "from_id": "A001 | 10001047",
        "from_type": "MaterialPlantVertex",
        "to_id": "H013 | 3337367",
        "to_type": "MaterialPlantVertex"
      }
    ]
  }
]

Notice I’m only getting the IDs from the vertices. How do I get a similar output as edges i.e. v_type, v_id, and attributes?

Also, I’m only getting the target vertex in array of vertices. The source is not included. I have a feeling this is GSQL 101, but I’m struggling here. :laughing:

I would like to add that Indexing every attribute might not be a good use of indexing.

To access the attributes try “. {attribute name}” that you are interested in a PRINT statement. Please note if you didn’t mark the ID as an attribute you will not be able to view it

I would also recommend reaching out to training@tigergraph.com and asking for some time with our trainers. Currently, it is free, and spending 15 hours in training will actually return the time 10 fold with a better understanding of the basics.

1 Like

Thanks for the response Carl. I will look into training for the team. Agree on the indexing. This was just done for testing. It’s not a long-term design.

Hi, Bnsmith,

Have you resolved your question?

Yes. Thank you Carl! The team has learned a ton about GSQL in the past couple weeks. We still have a long way to go, but we have some of these simpler concepts down now.

2 Likes