---
title: "v1.75.5-stable - Redis latency improvements"
slug: "v1-75-5"
date: 2025-08-10T10:00:00
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
  - name: Ishaan Jaffer
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg

hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

```shell showLineNumbers title="docker run litellm"
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.75.5.rc.1
```

</TabItem>

<TabItem value="pip" label="Pip">

```shell showLineNumbers title="pip install litellm"
pip install litellm==1.75.5.post1
```

</TabItem>
</Tabs>

---

## Key Highlights

- **Redis - Latency Improvements** - Reduces P99 latency by 50% with Redis enabled.
- **Responses API Session Management** - Support for managing Responses API sessions with images.
- **Oracle Cloud Infrastructure** - New LLM provider for calling models on Oracle Cloud Infrastructure.
- **DigitalOcean's Gradient AI** - New LLM provider for calling models on DigitalOcean's Gradient AI platform.

### Risk of Upgrade

If you build the proxy from the pip package, you should hold off on upgrading. This version makes `prisma migrate deploy` our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual `prisma generate` step before starting the proxy.

Users of our Docker image are **not** affected by this change.

---

## Redis Latency Improvements

<Image
    img={require('../../img/release_notes/faster_caching_calls.png')}
    style={{width: '100%', display: 'block', margin: '2rem auto'}}
/>

<br/>

This release adds in-memory caching for Redis requests, enabling faster response times under high traffic. LiteLLM instances now check their in-memory cache for a hit before checking Redis. On cache hits, this reduces caching-related latency for LLM API calls from 100ms to sub-1ms.
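
The lookup order is what matters here: a local in-memory cache absorbs repeat hits before any network round trip to Redis happens. Below is a minimal sketch of the idea - illustrative only, not LiteLLM's internal implementation - assuming a local Redis server and the `redis` Python client:

```python showLineNumbers title="two-tier cache lookup (illustrative)"
import time
import redis

r = redis.Redis(host="localhost", port=6379)
local_cache = {}   # key -> (value, expiry timestamp)
LOCAL_TTL = 5.0    # keep local entries briefly to bound staleness

def get_cached(key: str):
    # 1. Check the in-memory cache first: a dict lookup, sub-1ms, no network hop
    hit = local_cache.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]
    # 2. Fall back to Redis: a network round trip, the slow path this release avoids
    value = r.get(key)
    if value is not None:
        local_cache[key] = (value, time.time() + LOCAL_TTL)
    return value
```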

---

## Responses API Session Management w/ Images

<Image
    img={require('../../img/release_notes/responses_api_session_mgt_images.jpg')}
    style={{width: '100%', display: 'block', margin: '2rem auto'}}
/>

<br/>

LiteLLM now supports session management for Responses API requests with images. This is great for use cases like chatbots that use the Responses API to track the state of a conversation. LiteLLM session management works across **ALL** LLM APIs (including Anthropic, Bedrock, OpenAI, etc.) by storing the request and response content in an S3 bucket you specify.
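
For example, a chatbot can chain turns (including image inputs) by passing `previous_response_id`, and LiteLLM rehydrates the earlier turns from the configured bucket. A rough usage sketch with the OpenAI SDK pointed at a LiteLLM proxy - the proxy URL, key, and model alias below are placeholders:

```python showLineNumbers title="chaining Responses API turns with an image (sketch)"
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")  # placeholder proxy URL/key

# First turn: send text plus an image
first = client.responses.create(
    model="bedrock-claude",  # any model alias configured on your proxy
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What's in this image?"},
            {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
        ],
    }],
)

# Follow-up turn: LiteLLM loads the prior request/response (incl. the image)
# from your S3 bucket and forwards the full conversation to the provider.
follow_up = client.responses.create(
    model="bedrock-claude",
    input="Describe it in one sentence.",
    previous_response_id=first.id,
)
print(follow_up.output_text)
```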

---

## New Models / Updated Models

#### New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
| ----------- | -------------------------------------- | -------------- | ------------------- | -------------------- |
| Bedrock | `bedrock/us.anthropic.claude-opus-4-1-20250805-v1:0` | 200k | 15 | 75 |
| Bedrock | `bedrock/openai.gpt-oss-20b-1:0` | 200k | 0.07 | 0.3 |
| Bedrock | `bedrock/openai.gpt-oss-120b-1:0` | 200k | 0.15 | 0.6 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/glm-4p5` | 128k | 0.55 | 2.19 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/glm-4p5-air` | 128k | 0.22 | 0.88 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/gpt-oss-120b` | 131k | 0.15 | 0.6 |
| Fireworks AI | `fireworks_ai/accounts/fireworks/models/gpt-oss-20b` | 131k | 0.05 | 0.2 |
| Groq | `groq/openai/gpt-oss-20b` | 131k | 0.1 | 0.5 |
| Groq | `groq/openai/gpt-oss-120b` | 131k | 0.15 | 0.75 |
| OpenAI | `openai/gpt-5` | 400k | 1.25 | 10 |
| OpenAI | `openai/gpt-5-2025-08-07` | 400k | 1.25 | 10 |
| OpenAI | `openai/gpt-5-mini` | 400k | 0.25 | 2 |
| OpenAI | `openai/gpt-5-mini-2025-08-07` | 400k | 0.25 | 2 |
| OpenAI | `openai/gpt-5-nano` | 400k | 0.05 | 0.4 |
| OpenAI | `openai/gpt-5-nano-2025-08-07` | 400k | 0.05 | 0.4 |
| OpenAI | `openai/gpt-5-chat` | 400k | 1.25 | 10 |
| OpenAI | `openai/gpt-5-chat-latest` | 400k | 1.25 | 10 |
| Azure | `azure/gpt-5` | 400k | 1.25 | 10 |
| Azure | `azure/gpt-5-2025-08-07` | 400k | 1.25 | 10 |
| Azure | `azure/gpt-5-mini` | 400k | 0.25 | 2 |
| Azure | `azure/gpt-5-mini-2025-08-07` | 400k | 0.25 | 2 |
| Azure | `azure/gpt-5-nano-2025-08-07` | 400k | 0.05 | 0.4 |
| Azure | `azure/gpt-5-nano` | 400k | 0.05 | 0.4 |
| Azure | `azure/gpt-5-chat` | 400k | 1.25 | 10 |
| Azure | `azure/gpt-5-chat-latest` | 400k | 1.25 | 10 |

#### Features

- **[OCI](../../docs/providers/oci)**
    - New LLM provider - [PR #13206](https://github.com/BerriAI/litellm/pull/13206)
- **[JinaAI](../../docs/providers/jina_ai)**
    - Support multimodal embedding models - [PR #13181](https://github.com/BerriAI/litellm/pull/13181)
- **GPT-5 ([OpenAI](../../docs/providers/openai)/[Azure](../../docs/providers/azure))**
    - Support `drop_params` for `temperature` - [PR #13390](https://github.com/BerriAI/litellm/pull/13390) (see the sketch after this list)
    - Map `max_tokens` to `max_completion_tokens` - [PR #13390](https://github.com/BerriAI/litellm/pull/13390)
- **[Anthropic](../../docs/providers/anthropic)**
    - Add claude-opus-4-1 to the model cost map - [PR #13384](https://github.com/BerriAI/litellm/pull/13384)
- **[OpenRouter](../../docs/providers/openrouter)**
    - Add gpt-oss to the model cost map - [PR #13442](https://github.com/BerriAI/litellm/pull/13442)
- **[Cerebras](../../docs/providers/cerebras)**
    - Add gpt-oss to the model cost map - [PR #13442](https://github.com/BerriAI/litellm/pull/13442)
- **[Azure](../../docs/providers/azure)**
    - Support drop params for `temperature` on o-series models - [PR #13353](https://github.com/BerriAI/litellm/pull/13353)
- **[GradientAI](../../docs/providers/gradient_ai)**
    - New LLM provider - [PR #12169](https://github.com/BerriAI/litellm/pull/12169)
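
To illustrate the GPT-5 handling above: with `drop_params` enabled, LiteLLM drops `temperature` (which GPT-5 only accepts at its default value) instead of erroring, and translates `max_tokens` to `max_completion_tokens`. A minimal sketch with the LiteLLM Python SDK:

```python showLineNumbers title="gpt-5 with drop_params (sketch)"
import litellm

# assumes OPENAI_API_KEY is set in the environment
litellm.drop_params = True  # drop provider-unsupported params instead of raising

response = litellm.completion(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0.2,   # dropped: gpt-5 only accepts the default temperature
    max_tokens=100,    # mapped to max_completion_tokens for gpt-5
)
print(response.choices[0].message.content)
```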

#### Bugs

- **[OpenAI](../../docs/providers/openai)**
    - Add `service_tier` and `safety_identifier` as supported Responses API params - [PR #13258](https://github.com/BerriAI/litellm/pull/13258)
    - Correct pricing for web search on 4o-mini - [PR #13269](https://github.com/BerriAI/litellm/pull/13269)
- **[Mistral](../../docs/providers/mistral)**
    - Handle `$id` and `$schema` fields when calling Mistral - [PR #13389](https://github.com/BerriAI/litellm/pull/13389)

---

## LLM API Endpoints

#### Features

- `/responses`
    - Responses API session handling w/ support for images - [PR #13347](https://github.com/BerriAI/litellm/pull/13347)
    - Fix failure when the input contains a `ResponseReasoningItem` - [PR #13465](https://github.com/BerriAI/litellm/pull/13465)
    - Support custom tools - [PR #13418](https://github.com/BerriAI/litellm/pull/13418)

#### Bugs

- `/chat/completions`
    - Fix `completion_token_details` usage object missing `text` tokens - [PR #13234](https://github.com/BerriAI/litellm/pull/13234)
    - (SDK) Handle tool being a pydantic object - [PR #13274](https://github.com/BerriAI/litellm/pull/13274)
    - Include cost in streaming usage object - [PR #13418](https://github.com/BerriAI/litellm/pull/13418)
    - Exclude none fields on /chat/completions - allows usage with n8n - [PR #13320](https://github.com/BerriAI/litellm/pull/13320)
- `/responses`
    - Transform function calls in responses for non-OpenAI models (Gemini/Anthropic) - [PR #13260](https://github.com/BerriAI/litellm/pull/13260)
    - Fix unsupported operand error with model groups - [PR #13293](https://github.com/BerriAI/litellm/pull/13293)
    - Responses API session management for streaming responses - [PR #13396](https://github.com/BerriAI/litellm/pull/13396)
- `/v1/messages`
    - Add Claude Code `count_tokens` support - [PR #13261](https://github.com/BerriAI/litellm/pull/13261)
- `/vector_stores`
    - Fix create/search vector store errors - [PR #13285](https://github.com/BerriAI/litellm/pull/13285)

---

## [MCP Gateway](../../docs/mcp)

#### Features

- Add route check for internal users - [PR #13350](https://github.com/BerriAI/litellm/pull/13350)
- MCP Guardrails - docs - [PR #13392](https://github.com/BerriAI/litellm/pull/13392)

#### Bugs

- Fix auth on UI for bearer token servers - [PR #13312](https://github.com/BerriAI/litellm/pull/13312)
- Allow access groups on MCP tool retrieval - [PR #13425](https://github.com/BerriAI/litellm/pull/13425)

---

## Management Endpoints / UI

#### Features

- **Teams**
    - Add team deletion check for teams with keys - [PR #12953](https://github.com/BerriAI/litellm/pull/12953)
- **Models**
    - Add ability to set a model alias per key/team - [PR #13276](https://github.com/BerriAI/litellm/pull/13276) (see the sketch after this list)
    - New button to reload model pricing from the model cost map - [PR #13464](https://github.com/BerriAI/litellm/pull/13464), [PR #13470](https://github.com/BerriAI/litellm/pull/13470)
- **Keys**
    - Make `team` field required when creating service account keys - [PR #13302](https://github.com/BerriAI/litellm/pull/13302)
    - Gray out key-based logging settings for non-enterprise users - prevents confusion about whether logging overall is supported - [PR #13431](https://github.com/BerriAI/litellm/pull/13431)
- **Navbar**
    - Add logo customization for the LiteLLM admin UI - [PR #12958](https://github.com/BerriAI/litellm/pull/12958)
- **Logs**
    - Add token breakdowns on the Logs + Session pages - [PR #13357](https://github.com/BerriAI/litellm/pull/13357)
- **Usage**
    - Ensure the Usage page loads when the DB has large entries - [PR #13400](https://github.com/BerriAI/litellm/pull/13400)
- **Test Key Page**
    - Allow uploading images for /chat/completions and /responses - [PR #13445](https://github.com/BerriAI/litellm/pull/13445)
- **MCP**
    - Add auth tokens to local storage auth - [PR #13473](https://github.com/BerriAI/litellm/pull/13473)
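
As an example of the model-alias feature flagged above, a key can be minted whose callers use a friendly alias that the proxy resolves to an underlying model. A hedged sketch against the proxy's `/key/generate` endpoint - the URL and admin key are placeholders, and you should verify the alias field name against your LiteLLM version:

```python showLineNumbers title="key with a model alias (sketch)"
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",         # placeholder proxy URL
    headers={"Authorization": "Bearer sk-1234"},  # placeholder admin key
    json={
        "models": ["gpt-5-mini"],
        "aliases": {"my-fast-model": "gpt-5-mini"},  # callers request "my-fast-model"
    },
)
print(resp.json()["key"])
```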

#### Bugs

- **Custom Root Path**
    - Fix login route when SSO is enabled - [PR #13267](https://github.com/BerriAI/litellm/pull/13267)
- **Customers/End-users**
    - Allow calling /v1/models when an end user is over budget - lets model listing work on OpenWebUI when the customer is over budget - [PR #13320](https://github.com/BerriAI/litellm/pull/13320)
- **Teams**
    - Remove user-team membership when a user is removed from a team - [PR #13433](https://github.com/BerriAI/litellm/pull/13433)
- **Errors**
    - Bubble up network errors to the user on the Logging and Alerts page - [PR #13427](https://github.com/BerriAI/litellm/pull/13427)
- **Model Hub**
    - Show pricing for Azure models when the base model is set - [PR #13418](https://github.com/BerriAI/litellm/pull/13418)

---

## Logging / Guardrail Integrations

#### Features

- **Bedrock Guardrails**
    - Redact sensitive information in Bedrock Guardrails error messages - [PR #13356](https://github.com/BerriAI/litellm/pull/13356)
- **Braintrust**
    - Allow setting the Braintrust callback base URL - [PR #13368](https://github.com/BerriAI/litellm/pull/13368)
- **OTEL**
    - Track pre_call hook latency - [PR #13362](https://github.com/BerriAI/litellm/pull/13362)

#### Bugs

- **Standard Logging Payload**
    - Fix `can't register atexit` bug - [PR #13436](https://github.com/BerriAI/litellm/pull/13436)

---

## Performance / Load Balancing / Reliability Improvements

#### Features

- **Team-BYOK models**
    - Add wildcard model support - [PR #13278](https://github.com/BerriAI/litellm/pull/13278)
- **Caching**
    - GCP IAM auth support for caching - [PR #13275](https://github.com/BerriAI/litellm/pull/13275)
- **Latency**
    - Reduce P99 latency with Redis enabled by 50% - only update model usage if TPM/RPM limits are set - [PR #13362](https://github.com/BerriAI/litellm/pull/13362)

---

## General Proxy Improvements

#### Features

- **Models**
    - Support /v1/models/\{model_id\} retrieval - [PR #13268](https://github.com/BerriAI/litellm/pull/13268) (see the sketch after this list)
- **Multi-instance**
    - Ensure disable_llm_api_endpoints works - [PR #13278](https://github.com/BerriAI/litellm/pull/13278)
- **Logs**
    - Suppress APScheduler logs - [PR #13299](https://github.com/BerriAI/litellm/pull/13299)
- **Helm**
    - Add labels to the migrations job template - [PR #13343](https://github.com/BerriAI/litellm/pull/13343) s/o [@unique-jakub](https://github.com/unique-jakub)
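
To show the model-retrieval route referenced in the list above: since the proxy is OpenAI-compatible, the standard SDK call works. A small sketch, with placeholder proxy URL and key:

```python showLineNumbers title="retrieve a single model (sketch)"
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")  # placeholder proxy URL/key

# GET /v1/models/{model_id} - fetch one model instead of listing all of them
model = client.models.retrieve("gpt-5")
print(model.id)
```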

#### Bugs

- **Non-root image**
    - Fix non-root image for migrations - [PR #13379](https://github.com/BerriAI/litellm/pull/13379)
- **GET Routes**
    - Load GET routes when using fastapi-offline - [PR #13466](https://github.com/BerriAI/litellm/pull/13466)
- **Health checks**
    - Generate unique trace IDs for Langfuse health checks - [PR #13468](https://github.com/BerriAI/litellm/pull/13468)
- **Swagger**
    - Allow using Swagger for /chat/completions - [PR #13469](https://github.com/BerriAI/litellm/pull/13469)
- **Auth**
    - Fix JWT access not working with model access groups - [PR #13474](https://github.com/BerriAI/litellm/pull/13474)

---

## New Contributors

* @bbartels made their first contribution in https://github.com/BerriAI/litellm/pull/13244
* @breno-aumo made their first contribution in https://github.com/BerriAI/litellm/pull/13206
* @pascalwhoop made their first contribution in https://github.com/BerriAI/litellm/pull/13122
* @ZPerling made their first contribution in https://github.com/BerriAI/litellm/pull/13045
* @zjx20 made their first contribution in https://github.com/BerriAI/litellm/pull/13181
* @edwarddamato made their first contribution in https://github.com/BerriAI/litellm/pull/13368
* @msannan2 made their first contribution in https://github.com/BerriAI/litellm/pull/12169

## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.74.15-stable...v1.75.5-stable.rc-draft)**