I want to use a split function inside a query to parse strings into a list of elements.
In the reference guide, the split function is listed under “Loading a LIST or SET Attribute” and “Loading a MAP Attribute”, which is a different usage from parsing a string inside a query.
I would like to do something like this:
ListAccum<STRING> @@words;
ACCUM @@words += SPLIT("tom,dick,harry", ",");
I was pleased to see some string functions in the documentation but it’s a very limited list and doesn’t have “split”. How do people typically handle this and can we create our own custom functions?
Thank you.
Hi again George,
Yes, you can write your own custom functions in this file: <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp.
You can return a ListAccum from a function in that file and add it to your @@words accumulator as you did above. I’ll include an example below.
Here is a user-defined function that does what you’re looking for:
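inline ListAccum<string> string_split (string str, string delimiter) {
  ListAccum<string> newList;
  // An empty delimiter would match at position 0 forever, so return the input as one element.
  if (delimiter.empty()) {
    newList += str;
    return newList;
  }
  size_t pos = 0;
  std::string token;
  // Peel off one token per delimiter found, consuming str from the front.
  while ((pos = str.find(delimiter)) != std::string::npos) {
    token = str.substr(0, pos);
    newList += token;
    str.erase(0, pos + delimiter.length());
  }
  // Whatever remains after the last delimiter is the final token.
  newList += str;
  return newList;
}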
To use this in your query, you can simply do
@@words += string_split("tom,dick,harry", ",");
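If you want to check the splitting logic before touching ExprFunctions.hpp, here is a minimal standalone sketch of the same loop. It assumes std::vector<std::string> as a stand-in for ListAccum<string>, and the name split_for_test is made up for illustration; neither is part of TigerGraph.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the UDF: std::vector replaces ListAccum<string> only so the
// logic can be compiled and tested outside TigerGraph (assumption of this sketch).
inline std::vector<std::string> split_for_test(std::string str,
                                               const std::string& delimiter) {
  std::vector<std::string> parts;
  // find("") matches at position 0, so an empty delimiter would never advance.
  if (delimiter.empty()) {
    parts.push_back(str);
    return parts;
  }
  std::size_t pos = 0;
  while ((pos = str.find(delimiter)) != std::string::npos) {
    parts.push_back(str.substr(0, pos));        // token before the delimiter
    str.erase(0, pos + delimiter.length());     // drop token and delimiter
  }
  parts.push_back(str);                         // remainder is the last token
  assert(!parts.empty());                       // always at least one element
  return parts;
}
```

With this sketch, split_for_test("tom,dick,harry", ",") yields three elements (tom, dick, harry), and consecutive or trailing delimiters produce empty strings rather than being skipped.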
Thanks,
Kevin
Thank you Kevin, it worked perfectly, and I’m glad to see we can create custom functions like that. I have two questions:
- Is it possible to deploy functions like this to the cloud account and if so, how?
- I would think there would be a large repository of custom functions like this from internal and user contributions. Is there such a resource?
Thank you.
OK, thank you for clearing this up. By implementing your sample split function I learned that custom functions are not written in GSQL but in what looks like C++. Can you confirm this is C++, and are there guidelines on what can and cannot be done when writing these custom functions? I searched for “custom function” on the TG website and didn’t find anything. For example, in SQL Server we can write CLR functions in C#, and there is a guideline explaining what can be done, all the constraints, and how to deploy them.
Hello Kevin,
This is very useful information that I could not get from the TigerGraph documentation. The documentation says:
Users can define their own expression functions in C++ in <tigergraph.root.dir>/dev/gdk/gsql/src/QueryUdf/ExprFunctions.hpp. Only bool, int, float, double, and string (NOT std::string) are allowed as the return value type and the function argument type. However, any C++ type is allowed inside a function body. Once defined, the new functions will be added into GSQL automatically next time GSQL is executed.
But it seems I can return a list such as a ListAccum. That is great! Can I also pass any of the GSQL container types, such as set and map, as arguments? Do I declare them on the C++ side just as they are declared on the GSQL side?
Will the following UDF declaration be acceptable?
inline ListAccum string_split (ListAccum input) {
  ListAccum<INT> newList;
  // code ...
  return newList;
}
Kumar
inline bool cache_vector(ListAccum p) {
  // ... code
  return true;
}
gives me a compilation error when I run ./compile in the UDF directory. Please suggest how a container type can be passed to and returned from a UDF.
Thanks
Kumar
Hi Kumar,
You can pass in a string as a parameter for your UDF, as I did in the earlier example.
Use a FOREACH statement in the GSQL code to iterate through the ListAccum and call the function once for each string.
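A rough sketch of that pattern on the C++ side, assuming a made-up function normalize_token: the UDF takes and returns a single string, and the loop over the list stays in the query. normalize_all is only a standalone stand-in for the GSQL loop, using std::vector in place of a ListAccum so the sketch can be compiled outside TigerGraph.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical per-element UDF: trims surrounding whitespace and lowercases.
// Inside ExprFunctions.hpp the parameter and return type would be written with
// that file's own `string` type; std::string is used here to compile standalone.
inline std::string normalize_token(std::string token) {
  auto not_space = [](unsigned char c) { return !std::isspace(c); };
  // Trim leading whitespace.
  token.erase(token.begin(),
              std::find_if(token.begin(), token.end(), not_space));
  // Trim trailing whitespace.
  token.erase(std::find_if(token.rbegin(), token.rend(), not_space).base(),
              token.end());
  // Lowercase in place.
  std::transform(token.begin(), token.end(), token.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return token;
}

// Stand-in for the query-side loop (not a UDF): calls the scalar function
// once per element, which is what the GSQL iteration would do.
inline std::vector<std::string> normalize_all(const std::vector<std::string>& items) {
  std::vector<std::string> out;
  for (const std::string& item : items) {
    out.push_back(normalize_token(item));
  }
  assert(out.size() == items.size());  // one output per input element
  return out;
}
```

Keeping the function scalar means its signature uses only plain types, and the container never has to cross the UDF boundary as an argument.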
Thanks,
Kevin
Okay, thanks, will do. So the return value of a UDF can be a container, but the arguments cannot?