Mohammad Gufran Jahangir, February 10, 2025

Introduction

The error MLFLOW001 – MLflow experiment not found occurs when MLflow cannot resolve the experiment that a run or model refers to. The experiment ID or path may be incorrect, the experiment may never have been created, or it may have been deleted. Any attempt to track runs or log models against that experiment then fails. This guide covers the common causes and how to fix each one.


Symptoms

  • Error: MLFLOW001: Experiment not found when trying to log a run or retrieve an experiment.
  • The experiment list is empty when running mlflow.search_experiments(). The older mlflow.list_experiments() was removed in MLflow 2.0.
  • MLflow UI does not display the expected experiment.
  • Cannot create or retrieve experiments by name or ID.

Common Causes and Fixes

1. Incorrect Experiment ID or Path

Symptoms:

  • MLFLOW001 occurs when trying to access an experiment using the wrong ID or path.

Causes:

  • The experiment ID or path is incorrect or does not exist.
  • The experiment was renamed, deleted, or not created in the specified location.

Fix:
Verify available experiments:

import mlflow

# mlflow.list_experiments() was removed in MLflow 2.0; search_experiments() replaces it
experiments = mlflow.search_experiments()
for exp in experiments:
    print(f"ID: {exp.experiment_id}, Name: {exp.name}, Location: {exp.artifact_location}")

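Ensure you are using the correct experiment ID or path. When you call mlflow.set_experiment() with a name, it creates the experiment if it does not exist. A typo in the path therefore silently starts a new, empty experiment instead of raising an error. Passing experiment_id instead fails immediately if no such experiment exists:

mlflow.set_experiment("/Shared/my_experiment")  # Full workspace path of the existing experiment
mlflow.set_experiment(experiment_id="1234567890")  # Hypothetical ID; raises an MlflowException if it does not exist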
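A small typo in a long workspace path is a common cause of this error. The sketch below resolves an experiment by ID or name and suggests similarly spelled names when nothing matches. It assumes an MlflowClient-like object whose search_experiments() returns objects with .experiment_id and .name. The helper names find_experiment and suggest_names are hypothetical. The difflib similarity check is a swapped-in heuristic, not an MLflow feature.

```python
import difflib
from typing import Any, List, Optional


def find_experiment(client: Any, name_or_id: str) -> Optional[Any]:
    """Return the active experiment whose ID or name equals `name_or_id`, else None.

    `client` is assumed to behave like mlflow.tracking.MlflowClient:
    search_experiments() returns objects with .experiment_id and .name.
    """
    for exp in client.search_experiments():  # active (non-deleted) experiments by default
        if name_or_id in (exp.experiment_id, exp.name):
            return exp
    return None


def suggest_names(client: Any, wanted: str, n: int = 3) -> List[str]:
    """Return up to `n` existing experiment names that closely resemble `wanted`."""
    names = [exp.name for exp in client.search_experiments()]
    # difflib similarity is a heuristic for spotting typos, not an MLflow feature
    return difflib.get_close_matches(wanted, names, n=n, cutoff=0.6)
```

With a real tracking server, you would pass MlflowClient() from mlflow.tracking as the client. Only the results returned by a single search_experiments() call are examined, so experiments beyond the first page of results in very large workspaces may be missed.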

2. Experiment Deleted or Not Created

Symptoms:

  • MLFLOW001 appears when trying to retrieve an experiment that no longer exists.
  • mlflow.get_experiment_by_name("experiment_name") returns None.

Causes:

  • The experiment was deleted manually or by another user.
  • Experiment creation failed due to permission issues.

Fix:
Check if the experiment exists in the MLflow UI:

  • Go to Databricks UI → Experiments → Search for the experiment.

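Recreate the experiment if it does not exist. On Databricks, the experiment name must be an absolute workspace path. A custom DBFS artifact location is given as a dbfs:/ URI rather than as the /dbfs/ mount path:

mlflow.create_experiment("/Shared/my_new_experiment", artifact_location="dbfs:/mlflow/artifacts/my_new_experiment")
mlflow.set_experiment("/Shared/my_new_experiment")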
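Before recreating an experiment, check whether it was only soft-deleted. In the open-source MLflow tracking stores, a deleted experiment stays in the store with a deleted lifecycle stage. Creating a new experiment with the same name is rejected until the old one is restored or permanently deleted. Behavior on managed services may differ.

The sketch below looks for a deleted experiment by name and restores it. It assumes an MlflowClient-like object providing search_experiments(view_type=...) and restore_experiment(). The helper name restore_if_deleted is hypothetical. The view_type parameter exists only so the function can be exercised without MLflow installed.

```python
from typing import Any, Optional


def restore_if_deleted(client: Any, name: str, view_type: Any = None) -> Optional[str]:
    """Restore a soft-deleted experiment called `name`; return its ID, or None.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    `view_type` defaults to mlflow.entities.ViewType.DELETED_ONLY.
    """
    if view_type is None:
        from mlflow.entities import ViewType  # imported lazily, only when needed

        view_type = ViewType.DELETED_ONLY
    for exp in client.search_experiments(view_type=view_type):
        if exp.name == name:
            client.restore_experiment(exp.experiment_id)
            return exp.experiment_id
    return None  # nothing deleted under that name; creating it is the right fix
```

A None result means no deleted experiment with that name was found, so recreating it with mlflow.create_experiment() is the appropriate next step.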

3. Incorrect Artifact Location or Missing Permissions

Symptoms:

  • Cannot log models or artifacts; MLFLOW001 appears.
  • Access denied or missing permissions for the artifact storage location.

Causes:

  • The artifact storage path is incorrect or you lack permission to access it.
  • Cloud storage configuration is incorrect (S3, ADLS, GCS).

Fix:
Check the artifact location in the MLflow experiment:

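import mlflow

experiment = mlflow.get_experiment_by_name("my_experiment")  # Returns None if no experiment has this name
print(f"Artifact location: {experiment.artifact_location}" if experiment else "Experiment not found")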

Ensure the artifact storage path is accessible:

  • For AWS S3: Check IAM role permissions (s3:PutObject and s3:GetObject, plus s3:ListBucket for listing artifacts).
  • For Azure ADLS: Ensure Storage Blob Data Contributor access is assigned.
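The scheme of the artifact location shows which storage system's permissions to check. The sketch below maps the URI scheme of experiment.artifact_location to the storage system behind it. The mapping table and the storage_backend name are illustrative assumptions, and only a few common schemes are covered.

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to the storage whose permissions need checking.
_BACKENDS = {
    "s3": "AWS S3 (check IAM role permissions)",
    "abfss": "Azure ADLS Gen2 (check Storage Blob Data Contributor)",
    "wasbs": "Azure Blob Storage (check Storage Blob Data Contributor)",
    "gs": "Google Cloud Storage (check the service account's bucket roles)",
    "dbfs": "Databricks File System (DBFS)",
    "mlflow-artifacts": "MLflow server proxy (check the server's own storage credentials)",
    "file": "local filesystem (check that the directory exists and is writable)",
}


def storage_backend(artifact_location: str) -> str:
    """Describe the storage system behind an MLflow artifact location URI."""
    scheme = urlparse(artifact_location).scheme.lower()
    if scheme == "":
        scheme = "file"  # a bare path such as ./mlruns is a local path
    return _BACKENDS.get(scheme, f"unknown scheme '{scheme}'")
```

For example, passing the artifact_location printed above tells you whether to look at IAM roles, Azure role assignments, or local directory permissions.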

4. MLflow Tracking Server Misconfiguration

Symptoms:

  • The MLFLOW001 error occurs when running MLflow with a custom tracking server.
  • Cannot connect to the specified tracking URI.

Causes:

  • The MLflow tracking server is not reachable.
  • Incorrect tracking URI configuration.

Fix:
Check the tracking URI configuration:

print(mlflow.get_tracking_uri())

Set the correct tracking URI:

mlflow.set_tracking_uri("http://<your-tracking-server>:5000")

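Ensure the MLflow tracking server is running and accessible. By default, mlflow server listens only on 127.0.0.1:5000, so clients on other machines cannot connect unless you pass --host. The server has no authentication by default, so bind it to all interfaces only on a trusted network:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000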
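To confirm that the server is reachable from the client machine, you can request the /health endpoint exposed by the open-source mlflow server. It answers HTTP 200 with the body OK. The sketch below uses only the Python standard library. The name tracking_server_healthy is hypothetical, and the check applies only to HTTP(S) tracking URIs, not to file:, sqlite:, or databricks URIs.

```python
import urllib.request


def tracking_server_healthy(tracking_uri: str, timeout: float = 5.0) -> bool:
    """Return True if GET <tracking_uri>/health answers with HTTP 200.

    Only meaningful for HTTP(S) tracking servers started with `mlflow server`.
    """
    if not tracking_uri.lower().startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP(S) tracking URI: {tracking_uri!r}")
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError (refused, DNS failure, 4xx/5xx) and timeouts all subclass OSError
        return False
```

A typical call is tracking_server_healthy(mlflow.get_tracking_uri()). A False result points to a network, firewall, or --host problem rather than a missing experiment.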

5. Workspace Permissions or Access Issues

Symptoms:

  • Some users cannot access specific experiments, while others can.
  • MLFLOW001 error appears only for restricted users.

Causes:

  • Workspace permissions restrict access to the experiment.

Fix:
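Check and modify the experiment's permissions:

  • In the Databricks UI, open the experiment and use its permissions dialog (the Share or Permissions button, depending on the UI version). Workspace admins can also review access control settings from the admin settings.
  • Grant Can Read, Can Edit, or Can Manage as required.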
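Restricted users may see a not-found error rather than an access error, so it helps to inspect the error code MLflow attaches to the exception. MlflowException carries an error_code string such as RESOURCE_DOES_NOT_EXIST or PERMISSION_DENIED. The sketch below maps those codes to hints. The hint texts and the name explain_mlflow_error are hypothetical.

```python
from typing import Dict

# Hypothetical hints keyed by the error_code string that MlflowException carries.
_HINTS: Dict[str, str] = {
    "RESOURCE_DOES_NOT_EXIST": (
        "No experiment matches this ID or name for the current user: "
        "check for typos, deletion, or missing permissions."
    ),
    "PERMISSION_DENIED": "The experiment exists, but the current user lacks access to it.",
}


def explain_mlflow_error(exc: BaseException) -> str:
    """Turn an exception raised by an MLflow call into a troubleshooting hint."""
    code = getattr(exc, "error_code", None)  # set on MlflowException, absent elsewhere
    if code is None:
        return f"No MLflow error code on {type(exc).__name__}"
    return _HINTS.get(code, f"Unhandled MLflow error code: {code}")


# Usage with a real client (assumes MLflow is installed; the ID is hypothetical):
#   from mlflow.tracking import MlflowClient
#   from mlflow.exceptions import MlflowException
#   try:
#       MlflowClient().get_experiment("1234567890")
#   except MlflowException as exc:
#       print(explain_mlflow_error(exc))
```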

Step-by-Step Troubleshooting Guide

  1. List all available experiments:
       mlflow.search_experiments()
  2. Check the experiment by name or ID:
       experiment = mlflow.get_experiment_by_name("my_experiment")
       print(experiment)  # None means no experiment with this name exists
       # or by ID: mlflow.get_experiment("<experiment_id>")
  3. Verify the artifact location and access permissions:
       print(f"Artifact location: {experiment.artifact_location}")  # only if step 2 found the experiment
  4. Check the MLflow tracking URI:
       print(mlflow.get_tracking_uri())
  5. Ensure workspace permissions are correct:
    • Check the experiment's permissions in the Databricks UI.
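Steps 1 to 4 can be collected into a single report, which is handy to paste into a support ticket. The sketch below assumes an MlflowClient-like client providing search_experiments() and get_experiment_by_name(). The name run_checklist is hypothetical, and step 5 still has to be checked in the UI.

```python
from typing import Any, Dict


def run_checklist(client: Any, name: str, tracking_uri: str) -> Dict[str, Any]:
    """Gather the answers to checklist steps 1-4 in one dictionary.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    """
    experiments = list(client.search_experiments())  # step 1: active experiments
    experiment = client.get_experiment_by_name(name)  # step 2: None if unknown
    found = experiment is not None
    return {
        "tracking_uri": tracking_uri,  # step 4
        "active_experiment_count": len(experiments),
        "found": found,
        "experiment_id": experiment.experiment_id if found else None,
        "artifact_location": experiment.artifact_location if found else None,  # step 3
        "lifecycle_stage": experiment.lifecycle_stage if found else None,
    }
```

With MLflow installed, a call might look like run_checklist(MlflowClient(), "/Shared/my_experiment", mlflow.get_tracking_uri()).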

Best Practices for Avoiding MLFLOW001 Errors

Use Experiment Names and Paths Consistently

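  • Define each experiment path once (for example, in a shared config or constant) and reuse it, rather than mixing different paths or IDs for the same experiment.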

Monitor Artifact Storage Permissions

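  • Confirm that every user and job identity that logs runs keeps read and write access to the artifact store, especially after IAM or storage configuration changes.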

Set the Correct Tracking URI for Custom Tracking Servers

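  • Point every client at the correct endpoint for external MLflow servers, for example by setting the MLFLOW_TRACKING_URI environment variable consistently.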

Regularly Back Up Experiment Metadata

  • Back up experiment metadata so that accidentally deleted experiments can be identified and recreated.
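As a starting point for backups, the sketch below writes basic metadata (ID, name, artifact location, tags) for active experiments to a JSON file. It assumes an MlflowClient-like object and does not copy runs or artifacts. The name export_experiment_metadata is hypothetical. For full migrations including runs, the separate mlflow-export-import project is designed for that purpose.

```python
import json
from typing import Any, Dict, List


def export_experiment_metadata(client: Any, path: str) -> int:
    """Write basic metadata of active experiments to `path` as JSON; return the count.

    `client` is assumed to behave like mlflow.tracking.MlflowClient, whose
    experiments expose .experiment_id, .name, .artifact_location and .tags.
    """
    records: List[Dict[str, Any]] = []
    for exp in client.search_experiments():  # active experiments by default
        records.append(
            {
                "experiment_id": exp.experiment_id,
                "name": exp.name,
                "artifact_location": exp.artifact_location,
                "tags": dict(exp.tags),
            }
        )
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, sort_keys=True)
    return len(records)
```

Running this on a schedule keeps a record of which names, IDs, and artifact locations existed, which makes it easier to recreate or restore an experiment after an accidental deletion.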

Conclusion

The MLFLOW001 – MLflow experiment not found error can occur for several reasons, including incorrect experiment paths, missing permissions, or misconfigured tracking servers. By following the troubleshooting steps and applying best practices, you can ensure smooth experiment tracking and artifact management in MLflow.
