🧠 How to Run Your First Notebook in Databricks: From Beginner to Advanced
Whether you’re a data analyst, engineer, scientist, or just a curious tech enthusiast, Databricks offers a powerful platform to explore big data and AI effortlessly using notebooks. In this guide, we’ll walk you through the complete journey — from setting up your environment to mastering advanced notebook workflows in Databricks.

🔰 Table of Contents
- What is a Databricks Notebook?
- Step 1: Set Up Your Databricks Workspace
- Step 2: Create a New Cluster
- Step 3: Create and Open a Notebook
- Step 4: Write Your First Code (Beginner)
- Step 5: Attach Notebook to Cluster
- Step 6: Execute Code Cells
- Step 7: Use Magic Commands and Shortcuts
- Step 8: Import Data into Your Notebook
- Step 9: Visualize Data (Charts/Graphs)
- Step 10: Schedule Notebooks with Jobs
- Step 11: Version Control with Git
- Step 12: Best Practices for Notebook Development
- Final Thoughts
🔎 What is a Databricks Notebook?
A Databricks notebook is a collaborative, interactive environment where you can write code in Python, SQL, R, or Scala and see results immediately. It’s similar to a Jupyter Notebook but powered by Apache Spark under the hood, which makes it well suited to data analytics and machine learning workflows.
✅ Step 1: Set Up Your Databricks Workspace
You can get started with either:
- Databricks Community Edition (Free): https://community.cloud.databricks.com
- Paid cloud workspaces on Azure or AWS
Create an Account:
- Sign up → Verify Email → Log in
⚙️ Step 2: Create a New Cluster
A cluster is a set of compute resources on which your Spark code runs.
How to create:
- Go to Compute tab
- Click Create Cluster
- Choose:
  - A cluster name
  - A runtime version (an LTS release such as 14.3 LTS is a reliable choice)
  - Autoscaling settings and a node type (pick a small node type for testing)
🔄 Wait for the cluster to reach the Running state before attaching a notebook.
📓 Step 3: Create and Open a Notebook
- Go to Workspace
- Click Create → Notebook
- Name your notebook (e.g., MyFirstNotebook)
- Select the default language: Python / SQL / Scala / R
✍️ Step 4: Write Your First Code (Beginner)
Try running a simple Python print statement:
print("Hello, Databricks!")
Or a simple SQL query:
%sql
SELECT "Hello from SQL" AS greeting;
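Both cells should return in a second or two once the notebook is attached to a running cluster (see Step 5). As a quick sanity check that Spark itself is available, you could also run a tiny DataFrame command like this sketch:

```python
# Create a small DataFrame on the cluster and render it as a table in the notebook
display(spark.range(5))
```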
🔗 Step 5: Attach Notebook to Cluster
- At the top of the notebook, click Detached → select your running cluster
- Your code will now execute on that cluster
▶️ Step 6: Execute Code Cells
- Press Shift + Enter or click the Run icon on each cell
- You can add more cells using the + sign
- Check the output immediately below each cell
🧙 Step 7: Use Magic Commands and Shortcuts
Databricks supports magic commands to switch between languages or shell access:
| Command | Purpose |
|---|---|
| %python | Run cell as Python |
| %sql | Run cell as SQL |
| %fs | File system commands |
| %sh | Shell commands |
| %md | Markdown rendering |
Example:
%fs ls /databricks-datasets
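The %fs magic also has a Python-side counterpart, dbutils.fs, which is handy when you want to use the file listing in code. A minimal sketch:

```python
# Python equivalent of `%fs ls`: list the built-in sample datasets
files = dbutils.fs.ls("/databricks-datasets")
for f in files[:5]:   # print the first few entries
    print(f.path)
```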
Useful Shortcuts:
- Esc + A → Insert cell above
- Esc + B → Insert cell below
- Ctrl + / → Comment/Uncomment
📁 Step 8: Import Data into Your Notebook
Option 1: Upload a file manually
- Click the Data tab → Add Data → Upload File
- Access it using:
df = spark.read.csv("/FileStore/tables/mydata.csv", header=True)
Option 2: Use built-in datasets
df = spark.read.csv("/databricks-datasets/airlines/part-00000", header=True)
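If you also want Spark to guess the column types, a slightly fuller read might look like the sketch below (the FileStore path is just a placeholder for wherever your upload landed):

```python
# Minimal sketch of a CSV read with explicit options; adjust the path to your file
df = (spark.read
      .option("header", True)       # first row holds the column names
      .option("inferSchema", True)  # let Spark infer column types
      .csv("/FileStore/tables/mydata.csv"))

df.printSchema()        # check what Spark inferred
display(df.limit(10))   # preview the first rows in the notebook
```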
📊 Step 9: Visualize Data (Charts/Graphs)
- Run a DataFrame cell
- Click on the “+Visualization” icon
- Choose from:
- Bar Chart
- Line Chart
- Pie Chart
- Scatter Plot
- Map
You can also use:
display(df.groupBy("Year").count())
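If you prefer code-driven charts over the built-in visualization picker, you can pull a small aggregated result back to the driver and plot it with matplotlib, which is preinstalled on standard Databricks runtimes. A sketch, assuming your DataFrame has a Year column as in the example above:

```python
import matplotlib.pyplot as plt

# Aggregate in Spark, then bring the (small) result to the driver as pandas
counts = df.groupBy("Year").count().orderBy("Year").toPandas()

plt.figure(figsize=(8, 4))
plt.bar(counts["Year"], counts["count"])
plt.xlabel("Year")
plt.ylabel("Number of records")
plt.title("Records per year")
plt.show()
```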
⏰ Step 10: Schedule Notebooks with Jobs
- Go to Workflows → Jobs
- Click Create Job
- Select:
  - Notebook path
  - Cluster
  - Schedule (e.g., daily at 8 AM)
- Monitor job history from the dashboard
This is great for ETL or automated reports.
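Jobs are configured through the UI, but it often helps to parameterize the notebook itself so a scheduled run can pass in values. A small sketch using widgets (the parameter name run_date is just an example, not a required convention):

```python
# Define a text widget with a default; a scheduled Job can override this value
dbutils.widgets.text("run_date", "2024-01-01")

# Read the parameter inside the notebook and use it in your logic
run_date = dbutils.widgets.get("run_date")
print(f"Processing data for {run_date}")
```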
🧬 Step 11: Version Control with Git
Integrate your notebook with GitHub or Azure DevOps.
Steps:
- Click on Revision History
- Link to a Git provider
- Clone repo and commit directly from the notebook interface
💡 Step 12: Best Practices for Notebook Development
| Practice | Description |
|---|---|
| Use markdown cells | Document your code clearly |
| Modularize code | Use functions or %run sub-notebooks |
| Avoid hardcoding | Use widgets or config files |
| Use checkpoints | Export versions or use Git |
| Cache dataframes | For repeated heavy queries |
| Clean up resources | Stop unused clusters |
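A couple of these practices in one small sketch (the function name and dataset path are illustrative, not a prescribed pattern):

```python
# Modularize: wrap reusable logic in a function instead of repeating it across cells
def load_flights(path):
    """Load the airlines sample CSV with a header row and inferred types."""
    return spark.read.csv(path, header=True, inferSchema=True)

flights = load_flights("/databricks-datasets/airlines/part-00000")

# Cache a DataFrame you will query repeatedly, then release it when you are done
flights.cache()
print(flights.count())   # the first action materializes the cache
flights.unpersist()      # free executor memory once finished
```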
🧾 Final Thoughts
Databricks notebooks combine the power of big data with the simplicity of notebooks — making them a go-to tool for everyone from analysts to AI engineers. Once you’ve mastered your first notebook, explore Delta Lake, MLflow, and Unity Catalog to expand your skills even further.
📌 Bonus: Sample Notebook Flow
| Step | Code | Description |
|---|---|---|
| Load CSV | spark.read.csv() | Load data from FileStore |
| Clean Data | df.dropna() | Remove nulls |
| Transform | df.withColumn() | Add calculated columns |
| Analyze | df.groupBy().agg() | Perform aggregation |
| Visualize | display() | Generate charts |
| Schedule | Jobs | Automate daily run |
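Stitched together, that flow might look roughly like the sketch below. The column names (Year, ArrDelay) are illustrative and assume the airlines sample dataset; adjust them to whatever your data actually contains.

```python
from pyspark.sql import functions as F

# Load CSV from the built-in sample datasets
df = spark.read.csv("/databricks-datasets/airlines/part-00000",
                    header=True, inferSchema=True)

# Clean: drop rows with missing values
df = df.dropna()

# Transform: add a calculated column (assumes an ArrDelay column)
df = df.withColumn("is_delayed", F.col("ArrDelay").cast("int") > 15)

# Analyze: aggregate average delay per year (assumes a Year column)
summary = df.groupBy("Year").agg(F.avg(F.col("ArrDelay").cast("int")).alias("avg_delay"))

# Visualize: render the result as an interactive table/chart
display(summary)
```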
🚀 Ready to launch your first project?
Try building a notebook that:
- Loads a dataset
- Cleans & filters it
- Displays visualizations
- Runs on a schedule
Happy Notebooks! 🧪