How to pass a string or datetime to query in dbtable option in pyspark read

interesting_ideas · July 13, 2023, 6:57am

I have an installed query in TG Server for a graph, which takes a datetime input. I want to use this query's output as a Spark DataFrame. We are using PySpark, and the read command will be something like this:
spark.read.format("jdbc").option(connection_details…).option("dbtable", "query my_query(inp_ts=_____)").load()

I want to pass a datetime, say 2023-07-09 00:00:00, to the query. How should I do that? I tried with and without quotes (single and double), but I get strange errors like Java lang Out of Range Index.

Also, I see that the JDBC read isn't fetching the schema of the data, and all columns of the DataFrame are strings.
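One generic workaround for string-typed columns is to cast them explicitly after loading. This is independent of the connector and does not explain why the schema is missing. The sketch below only builds the cast expressions; the column names and target types are hypothetical, and `df` stands for the DataFrame returned by the read.

```python
def cast_expressions(column_types):
    """Return Spark SQL expressions that cast each named column to its target type."""
    return [
        f"CAST(`{name}` AS {dtype}) AS `{name}`"
        for name, dtype in column_types.items()
    ]


# Hypothetical column names and types; replace with the query's real output columns.
exprs = cast_expressions({"event_ts": "timestamp", "score": "double"})
print(exprs)

# Usage with an already loaded DataFrame (not executed here):
# typed_df = df.selectExpr(*exprs)
```

Note that `selectExpr` returns only the columns listed, so any column that should stay a string needs its own entry (for example with type `string`).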

Jon_Herke · July 13, 2023, 6:02pm

@interesting_ideas I’m not certain if this blog post will be helpful, but would you mind taking a look at it to see if it addresses your needs? If not, I’ll try to dig deeper into the issue.

https://medium.datadriveninvestor.com/an-introduction-to-pyspark-and-tigergraph-9c3396835bc2

interesting_ideas · July 13, 2023, 6:40pm

Hey Jon, the article doesn’t answer my question. I have seen you post this many times in reply to questions about Spark, but it is clearly not enough documentation for PySpark/Spark.

Jon_Herke · July 18, 2023, 7:48pm

@interesting_ideas Your formatting looks correct. The default format for a DATETIME parameter is "%Y-%m-%d %H:%M:%S", passed as a STRING value. I’ll ask the team that manages the connector if they have any additional insights.
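A minimal sketch of producing a value in that format from a Python datetime and placing it in the dbtable string. `build_dbtable` is a hypothetical helper, not part of the connector, and the thread does not confirm whether the connector expects the value quoted or unquoted, so treat the unquoted form here as an assumption.

```python
from datetime import datetime

# Format quoted in the thread for DATETIME parameters.
TG_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_dbtable(query_name, **params):
    """Build a 'query name(key=value, ...)' string (hypothetical helper)."""
    rendered = []
    for key, value in params.items():
        if isinstance(value, datetime):
            # Render datetimes in the format the query parameter expects.
            value = value.strftime(TG_DATETIME_FORMAT)
        rendered.append(f"{key}={value}")
    return f"query {query_name}({', '.join(rendered)})"


dbtable = build_dbtable("my_query", inp_ts=datetime(2023, 7, 9))
print(dbtable)  # query my_query(inp_ts=2023-07-09 00:00:00)

# Usage (not executed here); connection_options is a placeholder dict of JDBC settings:
# df = (spark.read.format("jdbc")
#       .options(**connection_options)
#       .option("dbtable", dbtable)
#       .load())
```

Building the string in one place makes it easy to try the quoted and unquoted variants against the server without touching the rest of the read call.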

not enough documentation for PySpark/Spark

I’ll ask the documentation team to add more information about the Spark connector. This README.md covers a lot of the current functionality, including CRUD operations.

I’ve also included some additional material regarding PySpark/Spark.

How to Use Spark with TigerGraph:

How to Use PySpark with TigerGraph: