Skip to content

Conversation

@weiyuanyue
Copy link
Contributor

@weiyuanyue weiyuanyue commented Dec 15, 2025

Summary

This PR resolves a critical production-breaking bug that blocked all FoundryLocal model downloads in AI Dev Gallery after upgrading to Foundry Local v0.8.x+. We migrate from fragile custom HTTP API calls to the official Microsoft.AI.Foundry.Local.WinML SDK (v0.8.2.1), restoring full functionality and establishing a resilient foundation for future compatibility.

Impact: Users can now reliably download, prepare, and use FoundryLocal models. The system is significantly more robust against upstream API changes.


Background & Root Cause

In Jul 2025, Foundry Local changed the internal format of a critical Name field in its HTTP API response. While this change was handled internally by Foundry Local, it was not communicated externally, causing silent incompatibility with AIDG's direct HTTP-based integration.

After upgrading to Foundry Local v0.8.x:

  • All model download requests failed silently due to field format mismatch
  • Downstream workflows (model preparation → chat/inference) were completely blocked
  • Users experienced inability to download or use any FoundryLocal models

Business Impact:

  • Disrupted critical developer workflows
  • Increased support burden and user frustration
  • Eroded trust in AIDG's stability and reliability

Solution: SDK Migration

To eliminate this entire class of failures and future-proof the integration, we migrate to the official SDK:

Key Benefits:

  1. API Stability: SDK shields us from low-level HTTP API format changes
  2. Versioned Support: Official, stable integration points maintained by Foundry Local team
  3. Simplified Architecture: Cleaner code with built-in concurrency and error handling
  4. Direct Model Access: No web service dependency—models run directly via SDK

Technical Changes

1. Core Architecture Refactor

FoundryClient.cs (297 lines changed)

  • Before: Custom HTTP client with manual JSON parsing, regex-based progress tracking, and fragile blob storage path resolution
  • After: Official SDK usage via FoundryLocalManager, ICatalog, and IModel interfaces
  • Key Improvements:
    • Thread-safe model preparation with semaphore-based locking
    • Eliminated ~150 lines of error-prone HTTP/JSON handling code
    • Proper async/await patterns throughout
    • Implements IDisposable for proper resource cleanup

FoundryLocalModelProvider.cs (138 lines changed)

  • Before: Direct dependency on OpenAI SDK with custom service URL construction
  • After: SDK-based model lifecycle management (catalog → download → prepare → use)
  • Key Changes:
    • Introduced EnsureModelReadyAsync() to prevent deadlocks in synchronous contexts
    • Changed model identification from unstable Name field to stable Alias field
    • Added proper model state management (_preparedModels dictionary)
    • Integrated telemetry for download success/failure tracking

FoundryLocalChatClientAdapter.cs (119 new lines)

  • Purpose: Bridge between FoundryLocal SDK's native chat client and Microsoft.Extensions.AI.IChatClient interface
  • Key Features:
    • Direct SDK model access (no web service needed)
    • Proper streaming response handling
    • Complete ChatOptions parameter mapping (temperature, top-p, frequency penalty, etc.)
    • Critical Fix: Sets MaxTokens default (1024) to prevent empty model outputs

2. Dependency Updates

Directory.Packages.props

  • ✅ Added: Microsoft.AI.Foundry.Local.WinML (v0.8.2.1)
  • ⬆️ Upgraded: Microsoft.ML.OnnxRuntimeGenAI.Managed & .WinML (v0.10.1 → v0.11.4)

nuget.config

  • Added ORT package source: https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json
  • Configured package source mapping to route *Foundry* packages to ORT feed

3. Build Configuration

ExcludeExtraLibs.props (35 new lines)

  • Resolves APPX1101 duplicate DLL errors by excluding conflicting ONNX Runtime libraries
  • Removes CUDA provider libraries on x64 (except TRT dependencies)
  • Removes QNN provider libraries on ARM64
  • Adopted from official Foundry Local SDK build patterns

Directory.Build.props

  • Temporarily suppressed IDisposableAnalyzers warnings (IDISP001, IDISP003, IDISP017)
  • Note: 237+ analyzer violations introduced by transitive dependency will be addressed in follow-up PR

4. Telemetry & Observability

FoundryLocalDownloadEvent.cs (55 new lines)

  • Logs all download attempts with model alias, success/failure status, and error messages

5. Data Model Simplification

FoundryCatalogModel.cs (89 lines reduced)

  • Removed manual JSON serialization attributes
  • Removed unused fields (Tag, ProviderType, PromptTemplate, Uri)
  • Cleaner structure: Name, DisplayName, Alias, FileSizeMb, License, ModelId, Runtime

Deleted Files:

  • FoundryServiceManager.cs (81 lines)—replaced by SDK's FoundryLocalManager
  • FoundryJsonContext.cs (18 lines)—no longer needed with SDK
  • Utils.cs (40 lines)—functionality absorbed into SDK

Migration Logic & Business Flow

Before (HTTP-based)

User clicks download
  ↓
Custom HTTP POST to /openai/download
  ↓
Manual SSE stream parsing with regex
  ↓
Fragile blob storage path resolution via separate HTTP call
  ↓
Manual JSON deserialization (breaks on format changes)
  ↓
Hope it worked 🤞

After (SDK-based)

User clicks download
  ↓
SDK: catalog.GetModelAsync(alias)
  ↓
SDK: model.DownloadAsync(progress callback)
  ↓
SDK: model.LoadAsync() [prepare for use]
  ↓
Store in _preparedModels dictionary
  ↓
Ready for inference ✅

Inference Flow

User starts chat
  ↓
GetIChatClient(url) → extract alias
  ↓
Check _preparedModels[alias] (must be prepared beforehand)
  ↓
model.GetChatClientAsync() [SDK native client]
  ↓
Wrap in FoundryLocalChatClientAdapter
  ↓
Stream responses directly via SDK (no web service)

Known Limitations & Follow-up Work

IDisposableAnalyzers Build Warnings

  • Root Cause: Microsoft.AI.Foundry.Local.WinML (v0.8.2.1) includes IDisposableAnalyzers (v4.0.8) as a transitive dependency
  • Impact: 237+ analyzer violations across codebase related to improper IDisposable pattern usage
  • Temporary Solution: Suppressed IDISP001, IDISP003, IDISP017 in Directory.Build.props and project files
  • Planned Remediation: Dedicated follow-up PR to address all violations project-wide

Breaking Changes

None for end users. Internal API changes only:

  • Model URL format: fl://<Name>fl://<Alias> (transparent to users)
  • Internal service architecture: HTTP client → SDK (transparent to users)

Functional Testing

Model Catalog & Discovery

  • Verify model catalog listing works correctly
  • Confirm all models display correct DisplayName, Alias, and FileSizeMb
  • Validate models are grouped by Alias properly (multiple variants per alias)
  • Test cached models are correctly identified and marked as downloaded
  • Verify GetAllModelsInCatalog() returns full catalog including non-downloaded models

Model Download Flow

  • Test model download with progress reporting (0% → 100%)
  • Verify progress callback fires at reasonable intervals during download
  • Test download cancellation via CancellationToken works correctly
  • Confirm already-downloaded models skip re-download and return success immediately
  • Verify download failure scenarios log telemetry with error messages
  • Test multiple concurrent downloads are handled gracefully (no race conditions)
  • Verify network interruption during download is handled with proper error messaging

Model Preparation & Loading

  • Test EnsureModelReadyAsync() successfully prepares models for first use
  • Verify model preparation succeeds without deadlocks in UI thread contexts
  • Confirm already-prepared models skip re-preparation (idempotent behavior)
  • Test concurrent EnsureModelReadyAsync() calls for same model are handled safely (semaphore lock)
  • Verify GetPreparedModel() returns null for unprepared models
  • Verify GetPreparedModel() returns valid IModel for prepared models
  • Test model loading (LoadAsync) completes successfully on x64
  • Test model loading (LoadAsync) completes successfully on ARM64

Chat Inference & Streaming

  • Verify GetIChatClient() throws appropriate exception when model not prepared
  • Test chat inference streaming works end-to-end with proper response chunks
  • Verify MaxTokens parameter properly limits output length
  • Test ChatOptions parameters (temperature, top-p, frequency penalty) are correctly applied
  • Verify streaming handles model-generated stop conditions gracefully
  • Test empty or null message inputs are handled with appropriate errors
  • Verify long conversation histories are processed without truncation issues

Telemetry & Observability

  • Verify telemetry events fire correctly for download success
  • Verify telemetry events fire correctly for download failures with error details
  • Confirm FoundryLocalDownloadEvent includes correct ModelAlias, Success, and ErrorMessage
  • Verify failed downloads log with LogLevel.Critical as expected

Code Generation (TODO: Next PR)

  • Verify GetIChatClientString() returns valid, compilable C# code
  • Confirm generated code includes proper SDK initialization pattern
  • Verify generated code uses correct model Alias (not obsolete Name)
  • Confirm generated code includes necessary using statements

Regression Testing

Multi-Provider Compatibility

  • Verify existing non-FoundryLocal model providers (OpenAI, Ollama, etc.) remain unaffected
  • Test switching between FoundryLocal and other providers works seamlessly
  • Verify model picker UI correctly displays all provider types

UI/UX Flows

  • Verify FoundryLocal model picker view displays correctly
  • Test download progress UI updates smoothly (no flickering or freezing)
  • Verify model preparation status is shown to user when model not ready
  • Test error messages are displayed to user on download/preparation failures
  • Verify model details (size, license, description) are rendered correctly

Project Generation(TODO: Next PR)

  • Test project generation with FoundryLocal models produces correct code samples
  • Verify generated projects reference correct NuGet packages (Microsoft.AI.Foundry.Local.WinML, Microsoft.Extensions.AI)
  • Confirm generated projects compile without errors
  • Verify NugetPackageReferences property returns correct package list

Platform-Specific Validation

  • Windows x64
  • Windows ARM64 QNN
  • Intel NPU
  • Verify no APPX1101 duplicate DLL errors occur on either platform

Edge Cases & Error Handling

Service Availability

  • Verify IsAvailable() returns false when FoundryLocal not installed/initialized
  • Test graceful degradation when FoundryLocal service fails to start
  • Verify appropriate user messaging when SDK initialization fails

Invalid Inputs

  • Test GetIChatClient() with invalid URL format throws clear exception
  • Verify DownloadModel() with non-FoundryCatalogModel returns false
  • Test EnsureModelReadyAsync() with non-existent alias throws clear exception

Resource Management

  • Verify FoundryClient.Dispose() is called properly (no resource leaks)
  • Verify _prepareLock semaphore is disposed correctly
  • Confirm models managed by SDK singleton—no manual disposal attempted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants