I have an installed query in TigerGraph Server for a graph, and it takes a datetime input. I want to use this query's output as a Spark DataFrame. We are using PySpark, and the read command looks something like this:
spark.read.format("jdbc").option(connection_details…).option("dbtable", "query my_query(inp_ts=_____)").load()
I want to pass a datetime, say 2023-07-09 00:00:00, to the query. How should I do that? I have tried with and without quotes (single and double) but keep getting strange errors like java.lang.IndexOutOfBoundsException.
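For concreteness, the fullest variant I tried looks roughly like this (host, credentials, and graph name are placeholders, and the TigerGraph JDBC driver jar is assumed to be on the Spark classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tg-query-read").getOrCreate()

# One variant I tried: inline the datetime directly in the dbtable string.
# Host, port, credentials, and graph name below are placeholders.
df = (
    spark.read.format("jdbc")
    .option("driver", "com.tigergraph.jdbc.Driver")
    .option("url", "jdbc:tg:http://my-tg-host:14240")
    .option("user", "tigergraph")
    .option("password", "********")
    .option("graph", "MyGraph")
    .option("dbtable", "query my_query(inp_ts=2023-07-09 00:00:00)")
    .load()
)
```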
Also, I see that the JDBC read isn't fetching the schema of the data, so all columns of the DataFrame come back as strings.
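As a stopgap I am casting columns by hand after the load, along these lines (the column names here are made up):

```python
from pyspark.sql import functions as F

# df is the DataFrame from the JDBC read above; every column arrives as a
# string, so cast each one to its real type manually.
typed_df = (
    df.withColumn("event_ts", F.to_timestamp("event_ts", "yyyy-MM-dd HH:mm:ss"))
      .withColumn("score", F.col("score").cast("double"))
)
```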
@interesting_ideas I’m not certain if this blog post will be helpful, but would you mind taking a look at it to see if it addresses your needs? If not, I’ll try to dig deeper into the issue.
https://medium.datadriveninvestor.com/an-introduction-to-pyspark-and-tigergraph-9c3396835bc2
Hey Jon, the article doesn't answer my question. I've seen you post this link many times in reply to Spark questions, but it's clearly not enough documentation for PySpark/Spark.
@interesting_ideas Your formatting looks correct. The default format for a DATETIME parameter is "%Y-%m-%d %H:%M:%S", passed as a STRING value. I'll ask the team that manages the connector if they have additional insights they can add.
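For example, a plain-Python way to build the parameter string in that format (reusing the my_query and inp_ts names from your post):

```python
from datetime import datetime

# Render the timestamp in the connector's default DATETIME format,
# then splice it into the dbtable string.
inp_ts = datetime(2023, 7, 9).strftime("%Y-%m-%d %H:%M:%S")
dbtable = f"query my_query(inp_ts={inp_ts})"
# dbtable == "query my_query(inp_ts=2023-07-09 00:00:00)"
```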
not enough documentation for PySpark/Spark
I'll let the documentation team know about adding more information on the Spark connector. This README.md covers a lot of the current functionality, including CRUD operations.
I've also included some additional material on PySpark/Spark below:
How to Use Spark with TigerGraph:
How to Use PySpark with TigerGraph:
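And to give a taste of the CRUD operations the README covers, a vertex upsert from a DataFrame looks roughly like this (the Person vertex, graph name, and connection details are illustrative; treat the README as the authoritative reference for option names):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tg-write-demo").getOrCreate()

# A toy DataFrame whose columns line up with a hypothetical Person vertex.
people = spark.createDataFrame([("p1", "Alice"), ("p2", "Bob")], ["id", "name"])

# Upsert the rows as Person vertices. Host, credentials, and graph name
# are placeholders.
(
    people.write.format("jdbc")
    .option("driver", "com.tigergraph.jdbc.Driver")
    .option("url", "jdbc:tg:http://my-tg-host:14240")
    .option("user", "tigergraph")
    .option("password", "********")
    .option("graph", "MyGraph")
    .option("dbtable", "vertex Person")
    .option("batchsize", "100")
    .mode("append")
    .save()
)
```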