How can I speed up the query when there are a lot of nodes as input?

wei · September 23, 2021, 8:08am

I have 500,000 company names as input; each name is the ID of a Company vertex.
Then I did some tests.

QUERY1

CREATE QUERY test1(Set<vertex<Company>> name) FOR GRAPH TEMP{
    start = {name};
    print start;
}

QUERY2

CREATE QUERY test2(Set<string> name) FOR GRAPH TEMP{
    start = to_vertex_set(name, "Company");
    print start;
}

QUERY3

# all names were written in a file
CREATE QUERY test3(string input_file) FOR GRAPH TEMP{
    SetAccum<vertex<Company>> @@start;
    @@start = { LOADACCUM(input_file, $0, ",", false) };
    start = {@@start};
    print start;
}

QUERY1 takes about 14s;
QUERY2 takes about 14s;
QUERY3 takes about 7s;
All three still take too long to complete. Why are QUERY1 and QUERY2 slower than QUERY3, and how can I speed up the query?

Xinyu_Chang · September 23, 2021, 3:47pm

Hi Wei,

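These are pretty much all the ways in GSQL to start from a set of vertices.

Queries 1 and 2 are more expensive because their input is passed as a query parameter and travels over the network, while query 3 reads its input from a file on disk. In this case, reading from disk is more efficient than going over the network.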

I think one possible optimization comes from looking at your use case: why do you need to start from 500,000 vertices? What do you do after finding all of them?

Is it possible to find all of them based on some filtering condition?
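
To illustrate what a filter-based start set might look like, here is a minimal sketch. It assumes, purely for illustration, that Company has a STRING attribute named country; that attribute and the query name test4 are hypothetical and do not come from the schema discussed in this thread. The idea is that only one small parameter is passed in, and the start set is built inside the engine rather than from 500,000 IDs.

```gsql
CREATE QUERY test4(STRING country) FOR GRAPH TEMP {
    // Seed with every Company vertex already stored in the graph.
    all_companies = {Company.*};

    // Keep only the vertices that satisfy the filter.
    // "country" is a hypothetical attribute used for illustration.
    start = SELECT c
            FROM all_companies:c
            WHERE c.country == country;

    PRINT start;
}
```

Whether this helps depends on whether the 500,000 companies can actually be described by such a condition; if they are an arbitrary list, one of the three approaches above is still needed.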

Thanks.

wei · September 24, 2021, 2:53am

Thanks for your reply. I did this just as a test.