Introduction to Microsoft Foundry Local and Its Supported Models
Product Owners | December 31, 2025
Microsoft Foundry Local Overview
Microsoft Foundry Local is an on-device AI inference solution that lets developers run popular open-source generative AI models (such as large language models, or LLMs) directly on a Windows PC, a Mac, or Windows Server. In other words, it is the software you use to run and interact with a trained AI model, such as Microsoft's Phi family or OpenAI's gpt-oss, entirely on your own hardware.
What’s the difference between local and online inference?
- Data Privacy and Security: Since the AI inference runs locally, your sensitive data is processed on your device and doesn't need to be sent to the cloud.
- Offline Operation: It allows your AI applications to function even when there is no internet connection or in environments with limited connectivity.
- Cost Predictability: Inference is run on your local hardware, which can help reduce or eliminate cloud computing costs.
- Open-Source Access: It provides a catalog of popular, optimized open-source models that developers can pull and integrate into their apps.
How Microsoft Foundry Local Works
Foundry Local acts as a streamlined and optimized layer between the developer's application and the local hardware's AI acceleration capabilities.
Model Catalog and Management
Foundry Local maintains a catalog of popular open-source generative AI models that have been optimized for on-device use.
Developers use the Foundry Local CLI (Command-Line Interface) or SDK (Software Development Kit) to easily browse, download, and manage these models. Once downloaded, the models are stored in a local cache on the system.
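As a minimal sketch of this flow, the snippet below uses the foundry-local-sdk Python package; the same operations are available from the CLI through commands such as `foundry model list` and `foundry model download`. The method names and the `phi-4-mini` alias are assumptions based on the SDK's documented surface, so verify them against the current reference:

```python
# Sketch: browsing and downloading catalog models with foundry-local-sdk.
# Method names and the model alias are assumptions; check them against
# the SDK reference for your installed version.
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager()  # attaches to (or starts) the local service

# Browse the catalog of optimized models.
for model in manager.list_catalog_models():
    print(model.alias, model.id)

# Download one model into the local on-disk cache.
manager.download_model("phi-4-mini")

# Confirm what is now cached locally.
print(manager.list_cached_models())
```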
Local language model performance depends primarily on a capable GPU (or other AI accelerator) with enough VRAM to hold the model in memory.
Foundry Local handles loading the model into system or video RAM and communicating with it through a standardized, OpenAI-compatible interface: your requests are passed to the model, and its replies are passed back through Foundry Local.
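Because that interface is OpenAI-compatible, any OpenAI client can talk to the local endpoint. The sketch below assumes the openai and foundry-local-sdk Python packages are installed; the `phi-4-mini` alias is illustrative:

```python
# Sketch: sending a chat request through Foundry Local's
# OpenAI-compatible endpoint. The alias "phi-4-mini" is illustrative.
import openai
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"
manager = FoundryLocalManager(alias)  # starts the service and loads the model

client = openai.OpenAI(
    base_url=manager.endpoint,  # local endpoint, e.g. http://localhost:<port>/v1
    api_key=manager.api_key,    # placeholder key; requests never leave the device
)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Summarize what local inference means."}],
)
print(response.choices[0].message.content)
```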
Foundry Local uses quantized models and runtime optimizations (such as INT4/INT8 weight compression) to make efficient use of your device's memory and compute resources, maximizing speed while keeping the memory footprint small.
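A rough back-of-the-envelope calculation shows why this matters: weight memory scales with bits per parameter, so INT4 cuts a model's footprint to a quarter of FP16. The sketch below counts weights only and ignores KV cache and runtime overhead:

```python
# Back-of-the-envelope memory footprint for model weights at different
# precisions. Real-world usage is higher (KV cache, activations, overhead).
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    return num_params * bits_per_weight / 8 / 1e9

params = 7e9  # a 7B-parameter model, e.g. Mistral 7B
for bits, label in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB")
# FP16: ~14.0 GB, INT8: ~7.0 GB, INT4: ~3.5 GB
```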
In essence, Foundry Local helps to simplify the complex process of getting an optimized, high-performance AI model running locally and privately within a Windows environment.
Supported Models in Foundry Local
Foundry Local maintains a catalog (https://www.foundrylocal.ai/models) of popular open-source models optimized for on-device use, including the following:
| Model Name | General Description |
|---|---|
| Microsoft Phi 4 | Available in standard and mini variants; typically used for on-device chatbot and data-analysis tasks. |
| Microsoft Phi 4 Reasoning | Available in standard and mini variants; excels at complex, multi-step logical tasks such as math, coding, and scientific inference. |
| OpenAI GPT OSS 20B | Designed for agentic workflows, offering many of the capabilities of OpenAI's o3-series reasoning models, including reasoning, chatbot, and data-analysis tasks. Currently available only on NVIDIA GPUs. |
| Mistral 7B Instruct | A capable instruction-tuned language model for on-device chatbot and data-analysis tasks. |
| Qwen2.5 | Excels at text, image, video, and audio summarization, as well as complex tasks like reasoning and coding. |
| DeepSeek R1 | A complex reasoning model ideal for coding, math, science, and data analysis. |