
MLflow Tracing + Assessments tutorial


MLflow Tracing & Feedback with tool calling agents

This short tutorial shows how MLflow captures detailed traces of a LangChain tool-calling agent as it solves mathematical problems, and how feedback about the agent's responses is stored alongside those traces. Feedback, logged with MLflow's log_feedback API, gives you a per-trace quality signal for measuring and improving the agent.

%pip install --upgrade "mlflow>=2.21.0" databricks-langchain langchain-community langchain databricks-ai-bridge
dbutils.library.restartPython()

Define Mathematical Tools with MLflow Tracing

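In this section we define two mathematical tools (addition and multiplication) and wrap them as LangChain Tool objects. The functions carry no explicit tracing decorator; once LangChain autologging is enabled later in the tutorial, MLflow records each tool call, with its inputs and outputs, as a span of type TOOL.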

def add_numbers(input_str: str) -> str:
    """
    Adds two numbers provided as a space-separated string.
    Example input: "2 3"
    """
    a_str, b_str = input_str.split()
    result = int(a_str) + int(b_str)
    return str(result)

def multiply_numbers(input_str: str) -> str:
    """
    Multiplies two numbers provided as a space-separated string.
    Example input: "4 5"
    """
    a_str, b_str = input_str.split()
    result = int(a_str) * int(b_str)
    return str(result)

# Wrap these functions as LangChain Tools
from langchain.agents import Tool

tools = [
    # The functions parse their inputs with int(), so the descriptions ask for integers only.
    Tool(name="Addition", func=add_numbers, description="Add two numbers. Input format: 'a b', where a and b are integers"),
    Tool(name="Multiplication", func=multiply_numbers, description="Multiply two numbers. Input format: 'a b', where a and b are integers")
]
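
Before handing the tools to an agent, it can help to call the underlying functions directly. The sketch below repeats the two functions in condensed form so that it runs on its own, then chains them in the order an agent would need for "2 + 3 * 4". It is only a sanity check, not part of the original notebook.

```python
def add_numbers(input_str: str) -> str:
    """Adds two integers provided as a space-separated string."""
    a_str, b_str = input_str.split()
    return str(int(a_str) + int(b_str))


def multiply_numbers(input_str: str) -> str:
    """Multiplies two integers provided as a space-separated string."""
    a_str, b_str = input_str.split()
    return str(int(a_str) * int(b_str))


# Multiplication first, then feed the product into the addition tool.
product = multiply_numbers("3 4")
total = add_numbers(f"2 {product}")
print(product, total)
```

Because both functions parse with int(), an input such as "2.5 3" raises a ValueError rather than returning a result.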

Define the LangChain Agent using a Real LLM

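We now configure a tool-calling agent using the databricks-langchain package. The agent calls the "databricks-meta-llama-3-3-70b-instruct" model serving endpoint through ChatDatabricks and uses a zero-shot ReAct strategy for reasoning. LangChain emits a deprecation warning for initialize_agent and recommends LangGraph for new agents; initialize_agent is used here to keep the example short.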

Note: Replace the endpoint parameter with your actual Databricks endpoint.

from langchain.agents import initialize_agent
from databricks_langchain import ChatDatabricks

llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    temperature=0.1,
)

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", handle_parsing_errors=True)

Scenario 1: LLM Reasoning with MLflow Tracing and Feedback - Simple mathematical problem

In this scenario, the agent is expected to correctly solve a simple mathematical problem. The correct reasoning should produce the answer "14" for the expression "2 + 3 * 4" (multiplication first).

Submit the problem to the agent and obtain its answer

All operations that the agent performs to solve the problem are captured in an MLflow Trace.

import mlflow

# Enable MLflow Tracing for LangChain
mlflow.langchain.autolog()

problem = "What is 2 + 3 * 4?"
# Chain.run is deprecated; invoke returns a dict whose "output" key holds the final answer
answer = agent.invoke({"input": problem})["output"]
Trace(request_id=tr-33be136cbb5f4b218b59b416576f1f01)

Assess the correctness of the agent's answer

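The agent's answer is checked with a simple substring test for the expected value, and the resulting correctness feedback (True = correct, False = incorrect) is attached to the Trace as an MLflow Assessment. The example records the source as LLM_JUDGE with an illustrative source_id; because the check here is plain code, AssessmentSourceType.CODE may describe the source more accurately.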

from mlflow.entities import AssessmentSource, AssessmentSourceType

trace_id = mlflow.get_last_active_trace().info.request_id
if "14" in answer:
    mlflow.log_feedback(
        trace_id=trace_id,
        name="correctness",
        value=True,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id="my-llm-judge-version-1"
        ),
        rationale="The answer 14 is correct for the given expression."
    )
    print(f"Logged correct feedback for trace {trace_id}")
else:
    mlflow.log_feedback(
        trace_id=trace_id,
        name="correctness",
        value=False,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id="my-llm-judge-version-1"
        ),
        rationale=f"The answer {answer} is incorrect. The correct answer is 14"
    )
    print(f"Logged incorrect feedback for trace {trace_id}")
Logged correct feedback for trace tr-33be136cbb5f4b218b59b416576f1f01
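
The substring test above would also accept an answer such as "114". A slightly stricter check extracts the numbers from the answer and compares them numerically. The helper below is a hypothetical sketch, not an MLflow API; its name and its handling of thousands separators are assumptions made for illustration.

```python
import re


def contains_number(answer: str, expected: str) -> bool:
    """Return True if `expected` appears in `answer` as a standalone number."""
    # Drop thousands separators such as the commas in "14,706,125".
    cleaned = re.sub(r"(?<=\d),(?=\d{3})", "", answer)
    # Collect every integer or decimal token in the text.
    tokens = re.findall(r"\d+(?:\.\d+)?", cleaned)
    target = float(expected)
    return any(float(token) == target for token in tokens)


print(contains_number("The answer is 14.", "14"))
print(contains_number("The answer is 114", "14"))
```

In the notebook, the condition `"14" in answer` could be swapped for `contains_number(answer, "14")` without changing the rest of the cell.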

View feedback on the Trace

All feedback on Traces can be accessed using the Trace.info.assessments Python property.

mlflow.MlflowClient().get_trace(trace_id).info.assessments
[Assessment(trace_id='tr-33be136cbb5f4b218b59b416576f1f01', name='correctness', source=AssessmentSource(source_type='LLM_JUDGE', source_id='my-llm-judge-version-1'), create_time_ms=1742939348994, last_update_time_ms=1742939348994, expectation=None, feedback=Feedback(value=True), rationale='The answer 14 is correct for the given expression.', metadata=None, error=None, span_id=None, _assessment_id='a-27caf5992c7c42f1a9c3c276d5539304')]
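
The raw list is verbose, so a small summary can be easier to read. The sketch below relies only on the attribute names visible in the output above (name, feedback.value, rationale). It uses stand-in objects in place of real Assessment instances so that it runs without a tracking server. The helper name is hypothetical, and the attribute layout may differ in other MLflow versions.

```python
from types import SimpleNamespace


def summarize_assessments(assessments):
    """Return a (name, value, rationale) tuple for each feedback assessment."""
    rows = []
    for assessment in assessments:
        feedback = getattr(assessment, "feedback", None)
        if feedback is None:
            continue  # skip entries that carry no feedback value
        rows.append((assessment.name, feedback.value, assessment.rationale))
    return rows


# Hypothetical stand-in for one entry of Trace.info.assessments.
sample = [
    SimpleNamespace(
        name="correctness",
        feedback=SimpleNamespace(value=True),
        rationale="The answer 14 is correct for the given expression.",
    )
]
rows = summarize_assessments(sample)
print(rows)
```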

Scenario 2: LLM Reasoning with MLflow Tracing and Feedback - Complex mathematical problem

In this scenario, the agent attempts to correctly solve a more complex mathematical problem. The correct reasoning should produce the answer "14706125" for the expression "((124 + 76) + (9 * 5)) ^ 3 = ?"
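
The expected value can be confirmed with plain Python, reading "^" as exponentiation (an assumption that the stated answer supports). Since the agent has only addition and multiplication tools, the cube is written here as two multiplications, which is one way the agent could reach the result.

```python
# Evaluate the inner expression first: (124 + 76) + (9 * 5)
base = (124 + 76) + (9 * 5)

# Cube via repeated multiplication, mirroring what the two tools allow.
squared = base * base
cubed = squared * base

# In Python, ** is exponentiation; ^ would be bitwise XOR.
assert cubed == base ** 3
print(base, squared, cubed)
```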

Submit the problem to the agent and obtain its answer

All operations that the agent performs to solve the problem are captured in an MLflow Trace.

import mlflow

# Enable MLflow Tracing for LangChain
mlflow.langchain.autolog()

problem = "((124 + 76) + (9 * 5)) ^ 3 = ?"
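# Chain.run is deprecated; invoke returns a dict whose "output" key holds the final answer
answer = agent.invoke({"input": problem})["output"]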
Trace(request_id=tr-34d83d32df0b42498378ea03cc7db7b9)

Assess the correctness of the agent's answer

The agent's answer is checked with a simple substring test for the expected value, and the resulting correctness feedback (True = correct, False = incorrect) is attached to the Trace as an MLflow Assessment.

from mlflow.entities import AssessmentSource, AssessmentSourceType

trace_id = mlflow.get_last_active_trace().info.request_id
if "14706125" in answer:
    mlflow.log_feedback(
        trace_id=trace_id,
        name="correctness",
        value=True,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id="my-llm-judge-version-1"
        ),
        rationale="The answer 14706125 is correct for the given expression."
    )
    print(f"Logged correct feedback for trace {trace_id}")
else:
    mlflow.log_feedback(
        trace_id=trace_id,
        name="correctness",
        value=False,
        source=AssessmentSource(
            source_type=AssessmentSourceType.LLM_JUDGE,
            source_id="my-llm-judge-version-1"
        ),
        rationale=f"The answer {answer} is incorrect. The correct answer is 14706125"
    )
    print(f"Logged incorrect feedback for trace {trace_id}")
Logged correct feedback for trace tr-34d83d32df0b42498378ea03cc7db7b9
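
The two branches of the assessment cell differ only in the feedback value and rationale, so they can be folded into one helper that builds those fields. The function below is a hypothetical refactoring sketch. It keeps the same substring test and rationale wording as the notebook, and leaves the actual mlflow.log_feedback call as a comment so the snippet runs without MLflow installed.

```python
def correctness_feedback(answer: str, expected: str) -> dict:
    """Build the name, value and rationale fields for a correctness check."""
    is_correct = expected in answer
    if is_correct:
        rationale = f"The answer {expected} is correct for the given expression."
    else:
        rationale = f"The answer {answer} is incorrect. The correct answer is {expected}"
    return {"name": "correctness", "value": is_correct, "rationale": rationale}


fields = correctness_feedback("The final answer is 14706125.", "14706125")
print(fields)

# In the notebook, the fields could then be passed on in a single call:
# mlflow.log_feedback(trace_id=trace_id, source=source, **fields)
```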

View feedback on the Trace

All feedback on Traces can be accessed using the Trace.info.assessments Python property.

mlflow.MlflowClient().get_trace(trace_id).info.assessments
[Assessment(trace_id='tr-34d83d32df0b42498378ea03cc7db7b9', name='correctness', source=AssessmentSource(source_type='LLM_JUDGE', source_id='my-llm-judge-version-1'), create_time_ms=1742939358539, last_update_time_ms=1742939358539, expectation=None, feedback=Feedback(value=True), rationale='The answer 14706125 is correct for the given expression.', metadata=None, error=None, span_id=None, _assessment_id='a-4a6e936dc8ad471e8308fba802d477a4')]

Conclusion

In this tutorial, we:

  • Defined two mathematical tools (Addition and Multiplication) and wrapped them as LangChain tools so that MLflow autologging traces each call.
  • Configured a LangChain tool-calling agent backed by the "databricks-meta-llama-3-3-70b-instruct" endpoint via ChatDatabricks.
  • Demonstrated two scenarios: a simple problem for which the agent is likely to produce the correct result (14), and a more complex problem that the agent may find harder to solve.
  • Logged feedback about the correctness of the agent's answer using MLflow’s mlflow.log_feedback API for both scenarios.

This setup provides end-to-end observability and evaluation for your LLM-based agents, allowing you to track and improve performance over time.
