I keep hitting the same error when I try to read data back from TG. If the read were executed on the executors, the data could be stored in a distributed way, but it looks like the query is only being executed on the driver (the trace below fails inside JDBCRDD$.resolveTable, which Spark runs on the driver to resolve the schema), which would explain why it cannot handle a huge dataset. If that is the case, what is the point of executing a read through the JDBC driver, other than getting an automatic conversion from a map to a DataFrame? Please advise.
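For context, here is roughly how the read is set up (a minimal sketch, assuming the option names from the TigerGraph JDBC driver's Spark examples; the host, credentials, graph name, and query are placeholders, not my actual values):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("tg-read-sketch").getOrCreate()

// Read the result of an installed GSQL query through the TigerGraph JDBC driver.
// Spark resolves the schema on the driver before any executor work starts.
val df = spark.read
  .format("jdbc")
  .option("driver", "com.tigergraph.jdbc.Driver")
  .option("url", "jdbc:tg:http://tg-host:14240")       // placeholder host/port
  .option("username", "tigergraph")                    // placeholder credentials
  .option("password", "********")
  .option("graph", "MyGraph")                          // placeholder graph name
  .option("dbtable", "query my_installed_query()")     // placeholder installed query
  .load()

df.show()

Calling load() is what produces the stack trace below.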
java.lang.IllegalArgumentException: HTTP entity too large to be buffered in memory
at org.shaded.apache.http.util.Args.check(Args.java:36)
at org.shaded.apache.http.util.EntityUtils.toString(EntityUtils.java:206)
at org.shaded.apache.http.util.EntityUtils.toString(EntityUtils.java:308)
at com.tigergraph.jdbc.restpp.driver.RestppResponse.&lt;init&gt;(RestppResponse.java:59)
at com.tigergraph.jdbc.restpp.RestppConnection.executeQuery(RestppConnection.java:651)
at com.tigergraph.jdbc.restpp.RestppPreparedStatement.execute(RestppPreparedStatement.java:95)
at com.tigergraph.jdbc.restpp.RestppPreparedStatement.executeQuery(RestppPreparedStatement.java:70)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:317)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)