We have deployed the Databricks RDB Loader (version 4.2.1) against a Databricks cluster (DBR 9.1 LTS). Both are up and running and talking to each other: we can see that the manifest
table has been created correctly, and we can see queries being submitted to the cluster in the Spark UI.
However, once the manifest has been created, the RDB Loader runs
SHOW columns in hive_metastore.snowplow_schema.events
which fails with the following error, as I would expect given that the events table does not exist yet:
24/08/2022 08:29:45.0697+0000 [ERROR] com.snowplowanalytics.snowplow.rdbloader: Transaction aborted. Sleeping for 30 seconds for the first time. Caught exception: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found for 'SHOW COLUMNS': hive_metastore.snowplow_schema.events; line 1 pos 0;
This error makes sense to me, and I've confirmed in a Databricks notebook that the events table does not exist. What I don't understand is why the RDB Loader isn't creating the table first.
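For reference, this is roughly the check we ran in a notebook (the schema name is ours; the second statement fails with the same AnalysisException as in the loader logs):

```sql
-- The schema exists and contains the manifest table the loader created...
SHOW TABLES IN hive_metastore.snowplow_schema;

-- ...but the events table the loader tries to inspect is not there
SHOW COLUMNS IN hive_metastore.snowplow_schema.events;
```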
Any ideas on how we can resolve this one?
Thanks!
Config
{
  "region": "eu-west-1",
  "messageQueue": ${?SQS_QUEUE_NAME},
  "storage": {
    "type": "databricks",
    "host": ${DATABRICKS_HOST_NAME},
    "password": ${databricks_access_token},
    "catalog": "hive_metastore",
    "schema": ${SCHEMA_NAME},
    "port": 443,
    "httpPath": "path",
    "userAgent": "snowplow-rdbloader-oss",
    "loadAuthMethod": {
      "type": "NoCreds"
    }
  },
  "monitoring": {
    "metrics": {
      "stdout": {},
      "period": "5 minutes"
    }
  },
  "retries": {
    "backoff": "30 seconds",
    "strategy": "EXPONENTIAL",
    "attempts": 3,
    "cumulativeBound": "1 hour"
  },
  "readyCheck": {
    "backoff": "15 seconds",
    "strategy": "CONSTANT"
  },
  "initRetries": {
    "backoff": "30 seconds",
    "strategy": "EXPONENTIAL",
    "attempts": 3,
    "cumulativeBound": "1 hour"
  },
  "retryQueue": {
    "period": "30 minutes",
    "size": 64,
    "maxAttempts": 3,
    "interval": "5 seconds"
  },
  "timeouts": {
    "loading": "1 hour",
    "nonLoading": "10 minutes",
    "sqsVisibility": "5 minutes"
  }
}
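For completeness, the `${...}` placeholders in the config are substituted from environment variables when the loader starts. The values below are illustrative only, not our real ones:

```shell
# Illustrative values only; the real hostname, token and queue name are redacted
export SQS_QUEUE_NAME="queue.fifo"
export DATABRICKS_HOST_NAME="dbc-XXXXXXXX.cloud.databricks.com"
export databricks_access_token="dapiXXXXXXXX"
export SCHEMA_NAME="snowplow_schema"
```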
Relevant Logs
[INFO] com.snowplowanalytics.snowplow.rdbloader: RDB Loader 4.2.1 has started. Listening queue.fifo
[INFO] HikariPool-1 - Starting...
[INFO] HikariPool-1 - Driver does not support get/set network timeout for connections. ([Databricks][JDBC](10220) Driver does not support this optional feature.)
[INFO] HikariPool-1 - Start completed.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.databricks.client.jdbc42.internal.io.netty.util.internal.ReflectionUtil (file:/app/snowplow-databricks-loader-4.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of com.databricks.client.jdbc42.internal.io.netty.util.internal.ReflectionUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Manifest: The manifest table has been created
[INFO] FolderMonitoring: Configuration for monitoring.folders hasn't been provided - monitoring is disabled
[INFO] DataDiscovery: Received a new message
[INFO] DataDiscovery: Total 1 messages received, 0 loaded
[INFO] DataDiscovery: New data discovery at run=2022-08-23-12-05-00-61fec39a-6165-4504-8ffb-bb68cc37537d with following shredded types:
* iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-*-* WIDEROW
* iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-*-* WIDEROW
[INFO] Load: Loading transaction for s3://bucket/snowplow-stream/transformed/archive/run=2022-08-23-12-05-00-61fec39a-6165-4504-8ffb-bb68cc37537d/ has started
[INFO] Load: Loading s3://bucket/snowplow-stream/transformed/archive/run=2022-08-23-12-05-00-61fec39a-6165-4504-8ffb-bb68cc37537d/
[ERROR] com.snowplowanalytics.snowplow.rdbloader: Transaction aborted. Sleeping for 30 seconds for the first time. Caught exception: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Table or view not found for 'SHOW COLUMNS': hive_metastore.snowplow_schema.events; line 1 pos 0;
'ShowColumns
+- 'UnresolvedTableOrView [hive_metastore, snowplow_schema, events], SHOW COLUMNS, true
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:1019)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:759)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:112)
    at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:47)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:56)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:737)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:722)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:771)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: Table or view not found for 'SHOW COLUMNS': hive_metastore.snowplow_schema.events; line 1 pos 0;
'ShowColumns
+- 'UnresolvedTableOrView [hive_metastore, snowplow_schema, events], SHOW COLUMNS, true
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$2(CheckAnalysis.scala:116)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$2$adapted(CheckAnalysis.scala:99)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:262)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:261)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:261)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:261)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:99)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:96)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:96)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:191)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:248)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:347)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:245)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:96)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:180)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:180)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:97)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:86)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$2(SparkExecuteStatementOperation.scala:848)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$1(SparkExecuteStatementOperation.scala:842)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:831)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.compileQuery(SparkExecuteStatementOperation.scala:842)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:896)
    ... 16 more
, Query: SHOW columns in hive_metastore.snowplow_schema.events.