
---
title: "v1.75.5-stable - Redis latency improvements"
slug: "v1-75-5"
date: 2025-08-10T10:00:00
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```bash
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.75.5.rc.1
```

</TabItem>

<TabItem value="pip" label="Pip">

```bash
pip install litellm==1.75.5.post1
```

</TabItem>
</Tabs>

## Key Highlights

- **Redis - Latency Improvements** - Reduces P99 latency by 50% with Redis enabled.
- **Responses API Session Management** - Support for managing Responses API sessions with images.
- **Oracle Cloud Infrastructure** - New LLM provider for calling models on Oracle Cloud Infrastructure.
- **Digital Ocean's Gradient AI** - New LLM provider for calling models on Digital Ocean's Gradient AI platform.

## Risk of Upgrade

If you build the proxy from the pip package, you should hold off on upgrading. This version makes `prisma migrate deploy` our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual `prisma generate` step.

Users of our Docker image are not affected by this change.
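If you do upgrade a pip-based deployment, the flow looks roughly like the sketch below. The schema path is an assumption about where `schema.prisma` lands in your install; adjust it for your environment.

```bash
# Sketch only: upgrading a pip-based proxy deployment to this release.
pip install litellm==1.75.5.post1

# New in this release: generate the Prisma client manually.
# The schema path below is an assumption - point it at the schema.prisma
# shipped inside your installed litellm package.
prisma generate --schema /path/to/site-packages/litellm/proxy/schema.prisma

# Start the proxy as usual; LiteLLM now applies migrations via
# `prisma migrate deploy` instead of resetting the DB.
litellm --config /path/to/config.yaml
```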


## Redis Latency Improvements

<Image img={require('../../img/release_notes/faster_caching_calls.png')} style={{width: '100%', display: 'block', margin: '2rem auto'}} />


This release adds in-memory caching for Redis requests, enabling faster response times under high traffic. LiteLLM instances now check their in-memory cache for a hit before checking Redis. On cache hits, this reduces caching-related latency for LLM API calls from 100ms to sub-1ms.
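For context, a rough sketch of how Redis caching is typically enabled on the proxy, following the standard LiteLLM caching config shape; the hostname and credentials below are placeholders:

```bash
# Sketch: enable Redis caching on the proxy. Hostnames/credentials are placeholders.
cat > config.yaml <<'EOF'
litellm_settings:
  cache: true
  cache_params:
    type: redis   # connection details read from REDIS_HOST / REDIS_PORT / REDIS_PASSWORD
EOF

docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e STORE_MODEL_IN_DB=True \
  -e REDIS_HOST=my-redis.internal \
  -e REDIS_PORT=6379 \
  -e REDIS_PASSWORD=my-password \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.75.5.rc.1 \
  --config /app/config.yaml
```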


## Responses API Session Management w/ Images

<Image img={require('../../img/release_notes/responses_api_session_mgt_images.jpg')} style={{width: '100%', display: 'block', margin: '2rem auto'}} />


LiteLLM now supports session management for Responses API requests with images. This is great for use cases like chatbots that use the Responses API to track the state of a conversation. LiteLLM session management works across ALL LLM APIs (including Anthropic, Bedrock, OpenAI, etc.), and works by storing the request and response content in an S3 bucket you specify.
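Below is a rough sketch of what a multi-turn session against the proxy can look like, using the OpenAI-style Responses API request shape. The model name, image URL, API key, and response ID are placeholders.

```bash
# Sketch: first turn sends text + an image; values below are placeholders.
curl http://localhost:4000/v1/responses \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": [{
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": "https://example.com/receipt.png"}
      ]
    }]
  }'

# Second turn: reference the previous response id so LiteLLM can rebuild the
# conversation state (stored in the S3 bucket you configure).
curl http://localhost:4000/v1/responses \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "previous_response_id": "resp_abc123",
    "input": "Summarize the totals from that image."
  }'
```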


## New Models / Updated Models

### New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
| -------- | ----- | -------------- | ------------------- | -------------------- |
| Bedrock | `bedrock/us.anthropic.claude-opus-4-1-20250805-v1:0` | 200k | $15.00 | $75.00 |
| Bedrock | `bedrock/openai.gpt-oss-20b-1:0` | 200k | $0.07 | $0.30 |
| Bedrock | `bedrock/openai.gpt-oss-120b-1:0` | 200k | $0.15 | $0.60 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/glm-4p5` | 128k | $0.55 | $2.19 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/glm-4p5-air` | 128k | $0.22 | $0.88 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/gpt-oss-120b` | 131072 | $0.15 | $0.60 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/gpt-oss-20b` | 131072 | $0.05 | $0.20 |
| Groq | `groq/openai/gpt-oss-20b` | 131072 | $0.10 | $0.50 |
| Groq | `groq/openai/gpt-oss-120b` | 131072 | $0.15 | $0.75 |
| OpenAI | `openai/gpt-5` | 400k | $1.25 | $10.00 |
| OpenAI | `openai/gpt-5-2025-08-07` | 400k | $1.25 | $10.00 |
| OpenAI | `openai/gpt-5-mini` | 400k | $0.25 | $2.00 |
| OpenAI | `openai/gpt-5-mini-2025-08-07` | 400k | $0.25 | $2.00 |
| OpenAI | `openai/gpt-5-nano` | 400k | $0.05 | $0.40 |
| OpenAI | `openai/gpt-5-nano-2025-08-07` | 400k | $0.05 | $0.40 |
| OpenAI | `openai/gpt-5-chat` | 400k | $1.25 | $10.00 |
| OpenAI | `openai/gpt-5-chat-latest` | 400k | $1.25 | $10.00 |
| Azure | `azure/gpt-5` | 400k | $1.25 | $10.00 |
| Azure | `azure/gpt-5-2025-08-07` | 400k | $1.25 | $10.00 |
| Azure | `azure/gpt-5-mini` | 400k | $0.25 | $2.00 |
| Azure | `azure/gpt-5-mini-2025-08-07` | 400k | $0.25 | $2.00 |
| Azure | `azure/gpt-5-nano-2025-08-07` | 400k | $0.05 | $0.40 |
| Azure | `azure/gpt-5-nano` | 400k | $0.05 | $0.40 |
| Azure | `azure/gpt-5-chat` | 400k | $1.25 | $10.00 |
| Azure | `azure/gpt-5-chat-latest` | 400k | $1.25 | $10.00 |

#### Features

#### Bugs

- **OpenAI**
    - Add `service_tier` and `safety_identifier` as supported Responses API params - PR #13258
    - Correct pricing for web search on 4o-mini - PR #13269
- **Mistral**
    - Handle `$id` and `$schema` fields when calling Mistral - PR #13389

## LLM API Endpoints

#### Features

- **/responses**
    - Responses API session handling w/ support for images - PR #13347
    - Fix failure when the input contains a `ResponseReasoningItem` - PR #13465
    - Support custom tools - PR #13418

#### Bugs

- **/chat/completions**
    - Fix `completion_token_details` usage object missing text tokens - PR #13234
    - (SDK) Handle tool being a pydantic object - PR #13274
    - Include cost in streaming usage object - PR #13418
    - Exclude none fields on /chat/completions - allows usage with n8n - PR #13320
- **/responses**
    - Transform function calls in responses for non-OpenAI models (Gemini/Anthropic) - PR #13260
    - Fix unsupported operand error with model groups - PR #13293
    - Responses API session management for streaming responses - PR #13396
- **/v1/messages**
    - Add LiteLLM Claude Code count tokens support - PR #13261
- **/vector_stores**
    - Fix create/search vector store errors - PR #13285

## MCP Gateway

#### Features

#### Bugs

- Fix auth on UI for bearer token servers - PR #13312
- Allow access groups on MCP tool retrieval - PR #13425

## Management Endpoints / UI

#### Features

- **Teams**
    - Add team deletion check for teams with keys - PR #12953
- **Models**
    - Add ability to set model alias per key/team - PR #13276
    - New button to reload model pricing from the model cost map - PR #13464, PR #13470
- **Keys**
    - Make the team field required when creating service account keys - PR #13302
    - Gray out key-based logging settings for non-enterprise users - prevents confusion about whether key-based logging is supported - PR #13431
- **Navbar**
    - Add logo customization for the LiteLLM admin UI - PR #12958
- **Logs**
    - Add token breakdowns on the Logs + Session pages - PR #13357
- **Usage**
    - Ensure the Usage page loads when the DB has large entries - PR #13400
- **Test Key Page**
    - Allow uploading images for /chat/completions and /responses - PR #13445
- **MCP**
    - Add auth tokens to local storage auth - PR #13473

#### Bugs

- **Custom Root Path**
    - Fix login route when SSO is enabled - PR #13267
- **Customers/End-users**
    - Allow calling /v1/models when the end user is over budget - allows model listing to work in OpenWebUI when a customer is over budget - PR #13320
- **Teams**
    - Remove the user's team membership when the user is removed from the team - PR #13433
- **Errors**
    - Bubble up network errors to the user on the Logging and Alerts page - PR #13427
- **Model Hub**
    - Show pricing for Azure models when the base model is set - PR #13418

## Logging / Guardrail Integrations

#### Features

- **Bedrock Guardrails**
    - Redact sensitive information in Bedrock Guardrails error messages - PR #13356
- **Standard Logging Payload**
    - Fix "can't register atexit" bug - PR #13436

#### Bugs

- **Braintrust**
    - Allow setting the Braintrust callback base URL - PR #13368
- **OTEL**

## Performance / Loadbalancing / Reliability improvements

#### Features

- **Team-BYOK models**
- **Caching**
    - GCP IAM auth support for caching - PR #13275
- **Latency**
    - Reduce P99 latency w/ Redis enabled by 50% - only update model usage if TPM/RPM limits are set - PR #13362

## General Proxy Improvements

#### Features

- **Models**
    - Support `/v1/models/{model_id}` retrieval - PR #13268
- **Multi-instance**
    - Ensure `disable_llm_api_endpoints` works - PR #13278
- **Logs**
- **Helm**

#### Bugs

- **Non-root image**
    - Fix non-root image for migration - PR #13379
- **Get Routes**
    - Load GET routes when using fastapi-offline - PR #13466
- **Health checks**
    - Generate unique trace IDs for Langfuse health checks - PR #13468
- **Swagger**
    - Allow using Swagger for /chat/completions - PR #13469
- **Auth**
    - Fix JWT access not working with model access groups - PR #13474

## New Contributors

## Full Changelog