feature: MLFlow E2E Example Notebook (5513)#5701
feature: MLFlow E2E Example Notebook (5513)#5701aviruthen wants to merge 2 commits intoaws:masterfrom
Conversation
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes several issues in the MLflow E2E example notebook: correcting Session instantiation, using search_model_versions instead of deprecated API, removing unnecessary custom translators, using core_endpoint.invoke() instead of direct boto3 calls, and adding sagemaker_session to ModelTrainer. The changes are well-motivated and align with V3 SDK conventions. A few minor issues remain.
| "\n", | ||
| "# Training on SageMaker managed infrastructure\n", | ||
| "model_trainer = ModelTrainer(\n", | ||
| " sagemaker_session=sagemaker_session,\n", |
There was a problem hiding this comment.
Good addition of sagemaker_session=sagemaker_session. However, per SDK conventions, note that sagemaker_session is typically placed as the last parameter in constructor calls (matching the convention that optional session parameters come at the end). Consider moving it after the other parameters for consistency with other V3 example notebooks, though this is a minor style point for a notebook.
| "# Use search_model_versions (compatible with MLflow 3.x)\n", | ||
| "model_versions = client.search_model_versions(\n", | ||
| " filter_string=f\"name='{MLFLOW_REGISTERED_MODEL_NAME}'\",\n", | ||
| " order_by=['version_number DESC'],\n", |
There was a problem hiding this comment.
The order_by=['version_number DESC'] parameter — please verify this is the correct field name for MLflow 3.x's search_model_versions. In some MLflow versions, the field is version_number, but in others it may be creation_timestamp or just version. If this is incorrect, the notebook will fail at runtime. The MLflow 3.x docs indicate order_by supports "version_number DESC" but it's worth double-checking against the version pinned in the install cell.
🤖 Iteration #1 — Review Comments AddressedDescriptionAddress reviewer feedback on the MLflow E2E example notebook:
Related IssueRelated issue: 5513 Changes Made
Merge Checklist
Comments reviewed: 5
|
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes several real issues in the MLflow E2E example notebook: Session instantiation bug, deprecated MLflow API usage, unnecessary custom translators, direct boto3 usage replaced with core_endpoint.invoke(), and version pinning improvements. The changes are well-motivated and align with V3 SDK conventions. A few minor concerns around dependency pinning and robustness.
| " \"MLFLOW_TRACKING_ARN\": MLFLOW_TRACKING_ARN\n", | ||
| " },\n", | ||
| " dependencies={\"auto\": False, \"custom\": [\"mlflow==3.4.0\", \"sagemaker==3.3.1\", \"numpy==2.4.1\", \"cloudpickle==3.1.2\"]},\n", | ||
| " dependencies={\"auto\": False, \"custom\": [\"mlflow>=3.4.0\", \"sagemaker>=3.3.1\", \"numpy>=2.4.1\", \"cloudpickle>=3.1.2\"]},\n", |
There was a problem hiding this comment.
Using >= without an upper bound for deployment dependencies is risky — these packages are installed at inference time inside the container, and a future major version bump of mlflow, sagemaker, numpy, or cloudpickle could break the deployed endpoint silently. Consider using compatible-release constraints (e.g., mlflow>=3.4.0,<4, numpy>=2.4.1,<3, cloudpickle>=3.1.2,<4) to balance freshness with stability. This is especially important for sagemaker which had a major v2→v3 transition.
| "# Note: order_by field name may vary across MLflow versions;\n", | ||
| "# 'creation_timestamp' is broadly supported.\n", | ||
| "model_versions = client.search_model_versions(\n", | ||
| " filter_string=f\"name='{MLFLOW_REGISTERED_MODEL_NAME}'\",\n", |
There was a problem hiding this comment.
Minor: trailing closing paren on max_results=1 line is missing a trailing comma, and the list ['creation_timestamp DESC'] uses single quotes inside double-quoted JSON string — this is fine for Python but worth noting for consistency. More importantly, consider adding a guard for the case where model_versions is empty (no versions registered yet), e.g.:
if not model_versions:
raise RuntimeError(f"No model versions found for '{MLFLOW_REGISTERED_MODEL_NAME}'")This would give users a clear error message instead of an IndexError.
| @@ -35,7 +35,8 @@ | |||
| "outputs": [], | |||
| "source": [ | |||
There was a problem hiding this comment.
The %pip install cell only installs mlflow>=3.4.0, but the training code's requirements.txt (referenced in Step 4) presumably still pins mlflow==3.4.0. The PR description mentions this inconsistency but the diff only shows changes to the notebook cells and the dependencies dict in ModelBuilder. If requirements.txt is a separate file in the repo, it should also be updated to use >= constraints for consistency. Could you confirm whether requirements.txt is part of this PR or needs a separate change?
| ")\n", | ||
| "\n", | ||
| "prediction = json.loads(response['Body'].read().decode('utf-8'))\n", | ||
| "# The invoke() response body may be a streaming object or bytes;\n", |
There was a problem hiding this comment.
The response body handling logic is reasonable for robustness, but it would be good to add a brief comment or reference to the core_endpoint.invoke() return type from sagemaker-core so future maintainers know what to expect. Also, consider whether result.body could be str already — the current code handles bytes and stream but not str, though in practice it's likely always one of the first two.
Description
The issue requests adding a new MLflow E2E example notebook. The notebook already exists at v3-examples/ml-ops-examples/v3-mlflow-train-inference-e2e-example.ipynb but has several issues that need to be fixed:
Session.boto_region_nameis accessed as a class attribute, but it's an instance property. Should beSession().boto_region_name.boto3.client('sagemaker-runtime')directly for inference testing instead of using thecore_endpoint.invoke()pattern established in other V3 example notebooks (see train-inference-e2e-example.ipynb Step 7).%pip installcell pinsmlflow==3.4.0, requirements.txt in training code pinsmlflow==3.4.0, and ModelBuilder dependencies pinmlflow==3.4.0andsagemaker==3.3.1andnumpy==2.4.1- these should be kept consistent but the hard-pinned versions may become outdated quickly. Should use minimum version constraints or reference a variable.sagemaker_session, the MLflow notebook omits it.MLFLOW_MODEL_PATH, the MLflow model already includes signature information. The custom PyTorchInputTranslator/PyTorchOutputTranslator may conflict with MLflow's built-in serialization. The SchemaBuilder should use plain JSON sample_input/output without custom translators for MLflow models.registered_model.latest_versionsis deprecated in MLflow 3.x. Should useclient.search_model_versions()instead.Related Issue
Related issue: 5513
Changes Made
v3-examples/ml-ops-examples/v3-mlflow-train-inference-e2e-example.ipynbAI-Generated PR
This PR was automatically generated by the PySDK Issue Agent.
Merge Checklist
prefix: descriptionformat