Mohammad Gufran Jahangir August 17, 2025 0

What you’ll learn

  • What Unity Catalog functions are and why to use them
  • The difference between scalar and table functions
  • How to create, test, grant, and use functions from notebooks or SQL Warehouses
  • A minimal permissions model (EXECUTE)
  • Tips for versioning, governance, and troubleshooting

Prereqs: Unity Catalog enabled, a catalog/schema you can write to, and a SQL Warehouse or notebook attached to a UC‑enabled cluster.


Quick definitions

  • Function (UDF): Reusable logic you register in Unity Catalog so everyone can call it. Lives at catalog.schema.function_name.
  • Scalar function: Takes columns/values → returns one value per input row.
  • Table function: Takes parameters → returns a result set (rows) that you can SELECT * FROM.

Create your first scalar function (SQL)

Mask personally identifiable info (PII) like email addresses.

USE CATALOG demo;
USE SCHEMA util;

CREATE OR REPLACE FUNCTION demo.util.mask_email(email STRING)
RETURNS STRING
RETURN regexp_replace(email, '(^.).*(@.*$)', '$1***$2');

Use it:

SELECT email, demo.util.mask_email(email) AS masked
FROM demo.curated.customers
LIMIT 5;

Why UC? Once registered, the same function is available from notebooks, jobs, and SQL Warehouses with consistent governance.


Python scalar function (optional)

If you prefer Python for specialized logic:

CREATE OR REPLACE FUNCTION demo.util.slugify(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import re
s = s or ''
s = re.sub(r'\s+', '-', s.strip().lower())
s = re.sub(r'[^a-z0-9\-]', '', s)
return s
$$;

Call it:

SELECT demo.util.slugify('Hello Databricks!') AS slug;

Python scalar UDFs require supported runtimes/warehouses. If your workspace blocks Python UDFs, stick with SQL UDFs.


Create a table function (SQL)

Generate a small date dimension on the fly.

CREATE OR REPLACE FUNCTION demo.util.date_range(start DATE, stop DATE)
RETURNS TABLE (d DATE)
RETURN SELECT sequence(start, stop, interval 1 day) AS d;  -- returns an array column

Flatten the array to rows when querying:

SELECT explode(d) AS d
FROM demo.util.date_range('2025-01-01', '2025-01-07');

Another example: split a CSV string to rows.

CREATE OR REPLACE FUNCTION demo.util.split_csv(s STRING)
RETURNS TABLE (item STRING)
RETURN SELECT trim(x) FROM (SELECT explode(split(s, ',')) AS x);

SELECT * FROM demo.util.split_csv('a, b, c');

Permissions that matter (EXECUTE)

To let a team call functions in a schema:

-- allow browsing the container
GRANT USE CATALOG ON CATALOG demo TO `analysts`;
GRANT USE SCHEMA  ON SCHEMA  demo.util TO `analysts`;

-- allow calling all current & future functions in this schema
GRANT EXECUTE ON SCHEMA demo.util TO `analysts`;

Grant EXECUTE on a single function instead:

GRANT EXECUTE ON FUNCTION demo.util.mask_email TO `analysts`;

Ownership: The object owner has full rights and can delegate with MANAGE.


Testing & troubleshooting

  • 3‑part names: Always reference catalog.schema.function unless your editor has those selected.
  • Function not found: Check USE CATALOG/SCHEMA, spelling, and privileges.
  • Type errors: Cast inputs (CAST(x AS …)) or simplify expressions until they compile.
  • Version safely: Use CREATE OR REPLACE FUNCTION to update logic; keep a v2_… copy if you need a staged rollout.

Patterns you’ll reuse

1) Column masking (SQL UDF + column mask)

CREATE OR REPLACE FUNCTION demo.util.mask_phone(p STRING)
RETURNS STRING
RETURN concat('(***)***-', right(regexp_replace(p,'\D',''), 4));

ALTER TABLE demo.curated.customers
ALTER COLUMN phone SET MASK demo.util.mask_phone;

2) Row‑level security with a predicate function

CREATE OR REPLACE FUNCTION demo.util.is_my_region(rgn STRING)
RETURNS BOOLEAN
RETURN rgn = current_user_region();  -- your own resolver

ALTER TABLE demo.curated.orders
SET ROW FILTER demo.util.is_my_region ON (region);

3) Reusable parsing helper

CREATE OR REPLACE FUNCTION demo.util.json_get(s STRING, path STRING)
RETURNS STRING
RETURN get_json_object(s, path);

Cheat‑sheet

  • Create scalar (SQL): CREATE OR REPLACE FUNCTION cat.sch.fn(arg TYPE, ...) RETURNS TYPE RETURN <sql-expression>;
  • Create scalar (Python): CREATE OR REPLACE FUNCTION cat.sch.fn(arg TYPE, ...) RETURNS TYPE LANGUAGE PYTHON AS $$ <python code> $$;
  • Create table function: CREATE OR REPLACE FUNCTION cat.sch.fn(args...) RETURNS TABLE (col TYPE, ...) RETURN <select-statement>;
  • Call: SELECT cat.sch.fn(col1, ...) (scalar) or SELECT * FROM cat.sch.fn(args) (table).
  • Grant EXECUTE: on the schema to cover all functions, or on a single function.

Wrap‑up

Unity Catalog functions let you centralize business logic—masking, parsing, and reusable transforms—so every team can invoke the same audited implementation from SQL or notebooks. Start with a small util schema, add a few helpers, and grant EXECUTE to your analyst/engineer groups.

Category: 
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments