---
title: v1.74.7-stable
slug: v1-74-7
date: 2025-07-19T10:00:00
authors:
hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.74.7-stable.patch.1
```

</TabItem>
<TabItem value="pip" label="Pip">

```shell
pip install litellm==1.74.7.post2
```

</TabItem>
</Tabs>
## Key Highlights

- **Vector Stores** - Support for Vertex RAG Engine, PG Vector, OpenAI & Azure OpenAI Vector Stores.
- **Bulk Editing Users** - Edit multiple users at once on the UI.
- **Health Check Improvements** - Prevent unnecessary pod restarts during high traffic.
- **New LLM Providers** - Added Moonshot AI and Vercel v0 provider support.
## Vector Stores API
<Image img={require('../../img/release_notes/vector_stores.png')} />
This release introduces support for using VertexAI RAG Engine, PG Vector, Bedrock Knowledge Bases, and OpenAI Vector Stores with LiteLLM.
This is ideal for use cases requiring external knowledge sources with LLMs.
This brings the following benefits for LiteLLM users:

**Proxy Admin Benefits:**

- Fine-grained access control: determine which Keys and Teams can access specific Vector Stores
- Complete usage tracking and monitoring across all vector store operations

**Developer Benefits:**

- Simple, unified interface for querying vector stores and using them with LLM API requests
- Consistent API experience across all supported vector store providers
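
As a minimal sketch, a vector store can be queried through the proxy's OpenAI-compatible search route. The base URL, virtual key `sk-1234`, and store ID `vs_123` below are placeholders:

```shell
# Query a vector store via the proxy's OpenAI-compatible search endpoint.
# Placeholders: proxy URL, virtual key, and vector store ID.
curl -X POST "http://localhost:4000/v1/vector_stores/vs_123/search" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'
```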
## Bulk Editing Users
<Image img={require('../../img/bulk_edit_graphic.png')} />
v1.74.7-stable introduces bulk editing users on the UI. This is useful for:

- adding all existing users to a default team (useful for controlling access / tracking spend by team)
- controlling personal model access for existing users
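
The UI flow is backed by the new `/user/bulk_update` endpoint (see Management Endpoints / UI below). Here is a hypothetical sketch of calling it directly; the request body is illustrative, so verify field names against the proxy's Swagger docs:

```shell
# Hypothetical request to the new /user/bulk_update endpoint; the exact
# schema may differ - check the Swagger docs shipped with the proxy.
curl -X POST "http://localhost:4000/user/bulk_update" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "user_updates": [
      {"user_id": "user-1", "team_id": "default-team"},
      {"user_id": "user-2", "team_id": "default-team"}
    ]
  }'
```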
## Health Check Server
<Image alt="Separate Health App Architecture" img={require('../../img/separate_health_app_architecture.png')} style={{ borderRadius: '8px', marginBottom: '1em', maxWidth: '100%' }} />
This release brings reliability improvements that prevent unnecessary pod restarts during high traffic. Previously, when the main LiteLLM app was busy serving traffic, health endpoints would time out even though the pods were healthy.

Starting with this release, you can run the health endpoints on an isolated process with a dedicated port. This keeps liveness and readiness probes responsive even when the main LiteLLM app is under heavy load.
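
A minimal sketch of what this looks like in a deployment, assuming environment variables named `SEPARATE_HEALTH_APP` and `SEPARATE_HEALTH_PORT` (check the health-check docs for the exact names your version supports):

```shell
# Run the proxy with health endpoints served from an isolated process.
# The two SEPARATE_HEALTH_* env var names are assumptions - verify in the docs.
docker run \
  -e STORE_MODEL_IN_DB=True \
  -e SEPARATE_HEALTH_APP="1" \
  -e SEPARATE_HEALTH_PORT="4001" \
  -p 4000:4000 \
  -p 4001:4001 \
  ghcr.io/berriai/litellm:v1.74.7-stable.patch.1
```

With a setup like this, Kubernetes liveness/readiness probes would point at port 4001, so they stay responsive independently of traffic on port 4000.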
## New Models / Updated Models

#### Pricing / Context Window Updates
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
| -------- | ----- | -------------- | ------------------- | -------------------- |
| Azure AI | `azure_ai/grok-3` | 131k | $3.30 | $16.50 |
| Azure AI | `azure_ai/global/grok-3` | 131k | $3.00 | $15.00 |
| Azure AI | `azure_ai/global/grok-3-mini` | 131k | $0.25 | $1.27 |
| Azure AI | `azure_ai/grok-3-mini` | 131k | $0.275 | $1.38 |
| Azure AI | `azure_ai/jais-30b-chat` | 8k | $3200 | $9710 |
| Groq | `groq/moonshotai-kimi-k2-instruct` | 131k | $1.00 | $3.00 |
| AI21 | `jamba-large-1.7` | 256k | $2.00 | $8.00 |
| AI21 | `jamba-mini-1.7` | 256k | $0.20 | $0.40 |
| Together.ai | `together_ai/moonshotai/Kimi-K2-Instruct` | 131k | $1.00 | $3.00 |
| v0 | `v0/v0-1.0-md` | 128k | $3.00 | $15.00 |
| v0 | `v0/v0-1.5-md` | 128k | $3.00 | $15.00 |
| v0 | `v0/v0-1.5-lg` | 512k | $15.00 | $75.00 |
| Moonshot | `moonshot/moonshot-v1-8k` | 8k | $0.20 | $2.00 |
| Moonshot | `moonshot/moonshot-v1-32k` | 32k | $1.00 | $3.00 |
| Moonshot | `moonshot/moonshot-v1-128k` | 131k | $2.00 | $5.00 |
| Moonshot | `moonshot/moonshot-v1-auto` | 131k | $2.00 | $5.00 |
| Moonshot | `moonshot/kimi-k2-0711-preview` | 131k | $0.60 | $2.50 |
| Moonshot | `moonshot/moonshot-v1-32k-0430` | 32k | $1.00 | $3.00 |
| Moonshot | `moonshot/moonshot-v1-128k-0430` | 131k | $2.00 | $5.00 |
| Moonshot | `moonshot/moonshot-v1-8k-0430` | 8k | $0.20 | $2.00 |
| Moonshot | `moonshot/kimi-latest` | 131k | $2.00 | $5.00 |
| Moonshot | `moonshot/kimi-latest-8k` | 8k | $0.20 | $2.00 |
| Moonshot | `moonshot/kimi-latest-32k` | 32k | $1.00 | $3.00 |
| Moonshot | `moonshot/kimi-latest-128k` | 131k | $2.00 | $5.00 |
| Moonshot | `moonshot/kimi-thinking-preview` | 131k | $30.00 | $30.00 |
| Moonshot | `moonshot/moonshot-v1-8k-vision-preview` | 8k | $0.20 | $2.00 |
| Moonshot | `moonshot/moonshot-v1-32k-vision-preview` | 32k | $1.00 | $3.00 |
| Moonshot | `moonshot/moonshot-v1-128k-vision-preview` | 131k | $2.00 | $5.00 |
#### Features

- **🆕 Moonshot API (Kimi)**
    - New LLM API integration for accessing Kimi models - [PR #12592](https://github.com/BerriAI/litellm/pull/12592), Get Started (see the usage sketch after this list)
- **🆕 v0 Provider**
    - New provider integration for v0.dev - [PR #12751](https://github.com/BerriAI/litellm/pull/12751), Get Started
- **OpenAI**
    - Use OpenAI DeepResearch models with `litellm.completion` (`/chat/completions`) - [PR #12627](https://github.com/BerriAI/litellm/pull/12627)
    - Add `input_fidelity` parameter for OpenAI image generation - [PR #12662](https://github.com/BerriAI/litellm/pull/12662), Get Started
- **Azure OpenAI**
- **Anthropic**
    - Tool cache control support - [PR #12668](https://github.com/BerriAI/litellm/pull/12668)
- **Bedrock**
    - Claude 4 `/invoke` route support - [PR #12599](https://github.com/BerriAI/litellm/pull/12599), Get Started
    - Application inference profile tool choice support - [PR #12599](https://github.com/BerriAI/litellm/pull/12599)
- **Gemini**
- **VertexAI**
    - Added Vertex AI RAG Engine support (use with the OpenAI-compatible `/vector_stores` API) - [PR #12752](https://github.com/BerriAI/litellm/pull/12752), Get Started
- **vLLM**
    - Added support for using Rerank endpoints with vLLM - [PR #12738](https://github.com/BerriAI/litellm/pull/12738), Get Started
- **AI21**
    - Added `ai21/jamba-1.7` model family pricing - [PR #12593](https://github.com/BerriAI/litellm/pull/12593), Get Started
- **Together.ai**
    - New model: `together_ai/moonshotai/Kimi-K2-Instruct` - [PR #12645](https://github.com/BerriAI/litellm/pull/12645), Get Started
- **Groq**
    - Add `groq/moonshotai-kimi-k2-instruct` model configuration - [PR #12648](https://github.com/BerriAI/litellm/pull/12648), Get Started
- **Github Copilot**
    - Change system prompts to assistant prompts for GH Copilot - [PR #12742](https://github.com/BerriAI/litellm/pull/12742), Get Started
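
As a quick illustration, the new Moonshot models are called through the proxy like any other provider once configured. A minimal sketch using a model name from the pricing table above (base URL and key are placeholders):

```shell
# Call a Moonshot (Kimi) model through the proxy's /chat/completions route.
curl -X POST "http://localhost:4000/chat/completions" \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshot/moonshot-v1-8k",
    "messages": [{"role": "user", "content": "Say hello from LiteLLM."}]
  }'
```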
#### Bugs

- **Anthropic**
    - Fix streaming + response_format + tools bug - [PR #12463](https://github.com/BerriAI/litellm/pull/12463)
- **XAI**
    - grok-4 does not support the `stop` param - [PR #12646](https://github.com/BerriAI/litellm/pull/12646)
- **AWS**
    - Role chaining with web authentication for AWS Bedrock - [PR #12607](https://github.com/BerriAI/litellm/pull/12607)
- **VertexAI**
    - Add project_id to cached credentials - [PR #12661](https://github.com/BerriAI/litellm/pull/12661)
- **Bedrock**
    - Fix bedrock nova micro and nova lite context window info - [PR #12619](https://github.com/BerriAI/litellm/pull/12619)
## LLM API Endpoints

#### Features

- **/chat/completions**
    - Include tool calls in output of `trim_messages` - [PR #11517](https://github.com/BerriAI/litellm/pull/11517)
- **/v1/vector_stores**
    - New OpenAI-compatible vector store endpoints - [PR #12699](https://github.com/BerriAI/litellm/pull/12699), Get Started
    - Vector store search endpoint - [PR #12749](https://github.com/BerriAI/litellm/pull/12749), Get Started
    - Support for using PG Vector as a vector store - [PR #12667](https://github.com/BerriAI/litellm/pull/12667), Get Started
- **/streamGenerateContent**
    - Non-gemini model support - [PR #12647](https://github.com/BerriAI/litellm/pull/12647)

#### Bugs

- **/vector_stores**
    - Knowledge Base call returning an error when passed as `tools` - [PR #12628](https://github.com/BerriAI/litellm/pull/12628)
## MCP Gateway

#### Features

- **Access Groups**
- **Namespacing**
- **Gateway Features**
    - Allow using MCPs with all LLM APIs (VertexAI, Gemini, Groq, etc.) when using `/responses` - [PR #12546](https://github.com/BerriAI/litellm/pull/12546)

#### Bugs

- Fix to update object permission on update/delete key/team - [PR #12701](https://github.com/BerriAI/litellm/pull/12701)
- Include `/mcp` in list of available routes on proxy - [PR #12612](https://github.com/BerriAI/litellm/pull/12612)
## Management Endpoints / UI

#### Features

- **Keys**
    - Regenerate Key state management improvements - [PR #12729](https://github.com/BerriAI/litellm/pull/12729)
- **Models**
- **Usage Page**
    - Fix Y-axis labels overlap on Spend per Tag chart - [PR #12754](https://github.com/BerriAI/litellm/pull/12754)
- **Teams**
- **Users**
    - New `/user/bulk_update` endpoint - [PR #12720](https://github.com/BerriAI/litellm/pull/12720)
- **Logs Page**
    - Add `end_user` filter on UI Logs Page - [PR #12663](https://github.com/BerriAI/litellm/pull/12663)
- **MCP Servers**
    - Copy MCP Server name functionality - [PR #12760](https://github.com/BerriAI/litellm/pull/12760)
- **Vector Stores**
- **General**
    - Add copy-on-click for all IDs (Key, Team, Organization, MCP Server) - [PR #12615](https://github.com/BerriAI/litellm/pull/12615)
- **SCIM**
    - Add `GET /ServiceProviderConfig` endpoint - [PR #12664](https://github.com/BerriAI/litellm/pull/12664)

#### Bugs

- **Teams**
## Logging / Guardrail Integrations

#### Features

- **Google Cloud Model Armor**
    - New guardrails integration - [PR #12492](https://github.com/BerriAI/litellm/pull/12492)
- **Bedrock Guardrails**
    - Allow disabling exception on 'BLOCKED' action - [PR #12693](https://github.com/BerriAI/litellm/pull/12693)
- **Guardrails AI**
    - Support `llmOutput`-based guardrails as pre-call hooks - [PR #12674](https://github.com/BerriAI/litellm/pull/12674)
- **DataDog LLM Observability**
    - Add support for tracking the correct span type based on the LLM endpoint used - [PR #12652](https://github.com/BerriAI/litellm/pull/12652)
- **Custom Logging**
    - Allow reading custom logger python scripts from S3 or GCS Bucket - [PR #12623](https://github.com/BerriAI/litellm/pull/12623)

#### Bugs

- **General Logging**
    - StandardLoggingPayload on cache_hits should track custom llm provider - [PR #12652](https://github.com/BerriAI/litellm/pull/12652)
- **S3 Buckets**
    - S3 v2 log uploader crashes when used with guardrails - [PR #12733](https://github.com/BerriAI/litellm/pull/12733)
## Performance / Loadbalancing / Reliability improvements

#### Features

- **Health Checks**
- **Caching**
    - Add Azure Blob cache support - [PR #12587](https://github.com/BerriAI/litellm/pull/12587)
- **Router**
    - Handle ZeroDivisionError with zero completion tokens in lowest_latency strategy - [PR #12734](https://github.com/BerriAI/litellm/pull/12734)

#### Bugs

- **Database**
- **Cache**
    - Fix redis caching for embedding response models - [PR #12750](https://github.com/BerriAI/litellm/pull/12750)
## Helm Chart

- DB Migration Hook: refactor to support `use_prisma_migrate` for the helm hook - PR
- Add envVars and extraEnvVars support to Helm migrations job - [PR #12591](https://github.com/BerriAI/litellm/pull/12591)
## General Proxy Improvements

#### Features

- **Control Plane + Data Plane Architecture**
    - Control Plane + Data Plane support - [PR #12601](https://github.com/BerriAI/litellm/pull/12601)
- **Proxy CLI**
    - Add `keys import` command to CLI - [PR #12620](https://github.com/BerriAI/litellm/pull/12620)
- **Swagger Documentation**
    - Add swagger docs for LiteLLM `/chat/completions`, `/embeddings`, `/responses` - [PR #12618](https://github.com/BerriAI/litellm/pull/12618)
- **Dependencies**
    - Loosen `rich` version from `==13.7.1` to `>=13.7.1` - [PR #12704](https://github.com/BerriAI/litellm/pull/12704)

#### Bugs

- Fix verbose log being enabled by default - [PR #12596](https://github.com/BerriAI/litellm/pull/12596)
- Add support for disabling callbacks in request body - [PR #12762](https://github.com/BerriAI/litellm/pull/12762)
- Handle circular references in spend tracking metadata JSON serialization - [PR #12643](https://github.com/BerriAI/litellm/pull/12643)
## New Contributors
- @AntonioKL made their first contribution in https://github.com/BerriAI/litellm/pull/12591
- @marcelodiaz558 made their first contribution in https://github.com/BerriAI/litellm/pull/12541
- @dmcaulay made their first contribution in https://github.com/BerriAI/litellm/pull/12463
- @demoray made their first contribution in https://github.com/BerriAI/litellm/pull/12587
- @staeiou made their first contribution in https://github.com/BerriAI/litellm/pull/12631
- @stefanc-ai2 made their first contribution in https://github.com/BerriAI/litellm/pull/12622
- @RichardoC made their first contribution in https://github.com/BerriAI/litellm/pull/12607
- @yeahyung made their first contribution in https://github.com/BerriAI/litellm/pull/11795
- @mnguyen96 made their first contribution in https://github.com/BerriAI/litellm/pull/12619
- @rgambee made their first contribution in https://github.com/BerriAI/litellm/pull/11517
- @jvanmelckebeke made their first contribution in https://github.com/BerriAI/litellm/pull/12725
- @jlaurendi made their first contribution in https://github.com/BerriAI/litellm/pull/12704
- @doublerr made their first contribution in https://github.com/BerriAI/litellm/pull/12661