Mohammad Gufran Jahangir, September 30, 2025

Here’s a simple, line-by-line summary of the Databricks serverless compute limitations (as of 09/29/2025):


General limitations

  • No Scala or R support → only Python + SQL.
  • Only Spark Connect APIs work → no Spark RDD APIs.
  • No JAR libraries allowed.
  • All workspace users can use serverless compute.
  • Notebook tags not supported → use budget policies instead.
  • ANSI SQL is default → turn off with spark.sql.ansi.enabled=false.
  • No Databricks Container Services.
  • Query timeout: max 9,000 sec (2.5 hrs) per query in notebooks (configurable). Jobs don’t have this limit.
  • Must use Unity Catalog to connect to external data.
  • UDFs cannot access the internet → no CREATE FUNCTION (External).
  • Max row size = 128 MB when creating DataFrame from local data.
  • No Spark UI → use Query Profile instead.
  • No Spark logs → only client-side logs available.
  • Cross-workspace access allowed only in same region, and if no IP ACL/PrivateLink.
  • No global temp views → use session temp views or tables.
  • No Maven coordinates.
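Two of the points above (ANSI mode and temp views) translate directly into SQL. A minimal sketch — the table and view names are illustrative, not from the docs:

```sql
-- Revert to legacy (non-ANSI) SQL semantics for the current session
SET spark.sql.ansi.enabled = false;

-- Global temp views are unavailable; use a session-scoped temp view instead
CREATE OR REPLACE TEMPORARY VIEW recent_orders AS
SELECT * FROM orders WHERE order_date >= current_date() - INTERVAL 7 DAYS;
```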

Streaming limitations

  • Only Trigger.AvailableNow supported → no default/time-based triggers.
  • All other Standard access mode streaming limits apply.
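In PySpark, the only accepted trigger maps to `trigger(availableNow=True)`. A minimal sketch, assuming a Databricks notebook where `spark` is already provided; the table and checkpoint names are illustrative:

```python
# Sketch: on serverless, Trigger.AvailableNow is the only supported trigger.
# It processes all data available at start time, then stops the stream.
def run_available_now_stream(spark, source_table: str, target_table: str,
                             checkpoint: str):
    """Start a one-shot stream; time-based triggers are rejected on serverless."""
    return (
        spark.readStream.table(source_table)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)  # no processingTime/continuous triggers
        .toTable(target_table)
    )
```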

Machine learning limitations

  • No Spark MLlib.
  • No GPU support.

Notebook limitations

  • Notebook-scoped libraries not cached between sessions.
  • Sharing TEMP tables/views across users not supported.
  • No autocomplete / Variable Explorer for DataFrames.
  • Default save format = .ipynb → if saved in source format, metadata may break.

Job limitations

  • Driver size is fixed (cannot resize).
  • Task logs not isolated → logs from multiple tasks mixed together.
  • No task libraries for notebook tasks → use notebook-scoped libraries.

Compute-specific limitations

  • Not supported: compute policies, init scripts, compute-scoped libraries, instance pools, event logs.
  • Most Spark configs blocked → only supported list allowed.
  • No environment variables → use widgets for parameters.
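Since environment variables are blocked, parameters move into widgets. A minimal sketch, assuming a Databricks notebook where `dbutils` is provided; the widget name `env` is illustrative:

```python
# Sketch: read a notebook parameter from a widget instead of os.environ.
def get_environment(dbutils, default: str = "dev") -> str:
    """Return the 'env' parameter, creating the widget if it doesn't exist."""
    dbutils.widgets.text("env", default)   # create the widget with a default
    return dbutils.widgets.get("env")      # read its current value
```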

Caching limitations

  • No cache/un-cache APIs:
    • df.cache(), df.persist(), df.unpersist()
    • spark.catalog.cacheTable(), uncacheTable(), clearCache()
    • SQL: CACHE TABLE, UNCACHE TABLE, REFRESH TABLE, CLEAR CACHE
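Since these caching APIs raise errors on serverless, one common workaround is to materialize intermediate results as tables instead. A sketch; the table names are illustrative:

```sql
-- Instead of CACHE TABLE, persist the intermediate result as a real table
CREATE OR REPLACE TABLE tmp_daily_totals AS
SELECT order_date, SUM(amount) AS total
FROM orders
GROUP BY order_date;

-- Drop it once the "cache" is no longer needed (in place of UNCACHE TABLE)
DROP TABLE IF EXISTS tmp_daily_totals;
```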

Hive limitations

  • No Hive SerDe tables, and no LOAD DATA into them.
  • Supported file types: Avro, CSV, Delta, JSON, Kafka, ORC, Parquet, Text, XML, BinaryFile.
  • No Hive variables (${env:var}, ${configName}, etc.).
    • Instead use: DECLARE VARIABLE, SET VARIABLE, session variables, or IDENTIFIER.
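The replacement pattern for Hive variables looks like this. A minimal sketch; the variable and table names are illustrative:

```sql
-- Hive-style ${env:var} substitution is unavailable; declare a session variable
DECLARE VARIABLE target_table STRING DEFAULT 'orders';
SET VARIABLE target_table = 'orders_2025';

-- IDENTIFIER() lets the variable stand in for an object name
SELECT COUNT(*) FROM IDENTIFIER(target_table);
```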

Supported data sources

Write/Update/Delete: CSV, JSON, AVRO, DELTA, KAFKA, PARQUET, ORC, TEXT, UNITY_CATALOG, BINARYFILE, XML, SIMPLESCAN, ICEBERG.

Read: All above plus: MySQL, PostgreSQL, SQL Server, Redshift, Snowflake, Synapse (SQLDW), BigQuery, Oracle, Salesforce, Salesforce Data Cloud, Teradata, Workday RaaS, MongoDB, Databricks.


👉 In short: Serverless is Python + SQL only, works best with Unity Catalog + Delta/Parquet data, but has no JARs, no caching, no Spark UI, no MLlib/GPUs, and strict config limits.


Category: guest