How to check if any item in list attribute matches?

Let’s say I have a vertex attribute called names that’s a list of strings. I’d like to check the existence of a string pattern in a WHERE clause. For example, let’s say I wanted to go out one hop and look for vertices that contain a name that starts with bob. Could be bob smith, bob jones or whatever. Furthermore there could be some names without bob in them – all I want to check is that there’s at least one bob.

It would be nice if I could do something like this:

hop = SELECT v
      FROM   Start:u-((_|_>):e)-ANY:v
      WHERE  ANY(n in v.names WHERE n LIKE 'bob%')

Is there a way to do this in GSQL?

One way is like this. I haven’t run this (sorry) but hopefully it give you a clue:

Firstly is the match itself. To get a pattern match you need a User defined function. Right now it is tricky to get UDF’s into cloud. Nevertheless they work great. A UDF I use for this is at the bottom of this post.

If you don’t want to go down the UDF path, then you need to modify your schema and data to take advantage of the LIKE clause. So you need to explode the list of names into a Names vertex, and link them. Then your code will look a little like:

hop = SELECT v
      FROM   Start:u-((_|_>):e)-ANY:v-()-Name:n
      WHERE  n.fullname LIKE "bob%"

Here’s the UDF stuff:

The header goes at the top of ExprFunctions.hpp, the code somewhere near the bottom. Check out the UDF docs if you wish to go down this path. If you are on cloud, well, it’s extremely hacky to get UDF’s in there right now. Maybe avoid the UDF solution for that.

#include <boost/regex.hpp>

// regex match, this one returns a boolean
  // based on code for tokenizer

  inline bool str_regex_match (string toCheck, string regEx) {
    bool res = false;
    // only performs the check when the input is valid
    try {
      res = boost::regex_match(toCheck, boost::regex(regEx));
    } catch (boost::regex_error err) {
      res = false;
    }
    return res;
  }

After that you’ll need to loop through the names in an ACCUM or POST-ACCUM clause.

In principle you could write a UDF that process a list to avoid the loop. I’ll leave that as an exercise :slight_smile:

1 Like

Thanks this is useful. I ended up using a vertex-attached accumulator to record whether a matching name exists, and filtered on that later in the query.

However I do think GSQL should consider adding builtin functions to do this kind of filtering. For example on vertices you can do the following: v.neighbors("worksFor").filter(worksFor.startYear >= yr)

It would be useful if you could do the same thing on a bag/set/list. Then the size of the result could be examined to determine if any element remains.

I agree. I may write the aforesaid UDF to make it so, if I can work it out. UDF’s on collections are a newness for me though, so it may take a little time. Maybe others in the community can help.

It will still need to be idiomatically appropriate (so based on Accumulators etc so we benefit from the distributed parallelisation), but should be equivalent to the Gremlin’y example you gave.

There is a UDF topic which I monitor, and we have roadmap to make the whole installation of UDF’s much easier, driven by cloud, though that will not be in the short-term.

1 Like

And here it is:

http://dev.tigergraph.com/forum/t/my-library-of-udfs/390/2?u=rik

1 Like