Bedrock Runtime: InvokeModelWithResponseStream, exceeded maximum number of attem...

November 27, 2025 21:51 Status: processing

🚨 Error message

InvokeModelWithResponseStream: operation error Bedrock Runtime: InvokeModelWithResponseStream, exceeded maximum number of attempts, 3, https response error StatusCode: 429, RequestID: 96f78182-b0e4-4e57-991d-79639de48165, ThrottlingException: Too many requests, please wait before trying again. (request id: 20251127215017529953871hSy7pGA2) (request id: 20251127215013419007644vXFKl4BB) (request id: 20251127215013410983617jXN9N6vx) (request id: 20251127215002807406249KbPEBQcW)

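🤖 AI solution

Error analysis

1. What the error says

Original: ThrottlingException: Too many requests, please wait before trying again.
Meaning: The service is throttling the caller. Too many requests were sent, so the client must wait before retrying.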

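2. Cause analysis

This is an API throttling error from the AWS Bedrock service. The root causes are:

Too many requests were sent to the Bedrock API within a short period

The requests exceeded the service-side rate limit (the per-minute or per-second request quota)

The SDK's automatic retries were exhausted: the call still failed after the maximum of 3 attempts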

3. Solution

python
import json
import time

import boto3
from botocore.exceptions import ClientError


def invoke_bedrock_with_retry(prompt, max_retries=5, base_delay=1):
    """
    Call Bedrock with exponential-backoff retries when the request is throttled.
    """
    bedrock_runtime = boto3.client('bedrock-runtime')

    for attempt in range(max_retries):
        try:
            response = bedrock_runtime.invoke_model_with_response_stream(
                modelId='anthropic.claude-3-sonnet-20240229-v1:0',
                body=json.dumps({
                    # Anthropic Claude models on Bedrock require this field
                    "anthropic_version": "bedrock-2023-05-31",
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 1000
                })
            )
            return response

        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt < max_retries - 1:
                    # Exponential backoff: the wait doubles on each attempt (1, 2, 4, ... seconds)
                    delay = base_delay * (2 ** attempt)
                    print(f"Request throttled; retrying in {delay} seconds...")
                    time.sleep(delay)
                    continue
                else:
                    raise Exception(f"Still failing after {max_retries} attempts") from e
            else:
                # Not a throttling error: re-raise unchanged
                raise


# Usage example
try:
    response = invoke_bedrock_with_retry("your prompt")
    # Process the response...
except Exception as e:
    print(f"Call failed: {e}")
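
As an alternative to a hand-written loop, boto3 clients accept a retry configuration of their own. The following is a minimal sketch, assuming botocore's `Config` object with its `retries` option. The attempt count and the helper name `make_bedrock_client` are illustrative choices, not values taken from the error above. The `adaptive` mode is described in the botocore documentation as adding client-side rate limiting on top of the standard retry behavior, and whether it suits a given workload is something to verify.

```python
import boto3
from botocore.config import Config


def make_bedrock_client(total_attempts=6):
    """Create a Bedrock Runtime client with a larger built-in retry budget.

    `make_bedrock_client` and the default of 6 attempts are hypothetical,
    chosen only for illustration.
    """
    retry_config = Config(
        retries={
            # Total attempts, counting the initial call
            "total_max_attempts": total_attempts,
            # One of botocore's retry modes: "legacy", "standard", "adaptive"
            "mode": "adaptive",
        }
    )
    return boto3.client("bedrock-runtime", config=retry_config)
```

With this in place the SDK backs off and retries throttled calls itself, so an outer retry loop can be shorter or removed.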

4. Prevention

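Request quota management:

Monitor quota usage for the AWS Bedrock service

Put requests through a client-side queue to cap the number of concurrent requests

Use a token bucket algorithm to smooth the request rate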
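
The token bucket idea can be sketched as follows. This is a minimal, thread-safe illustration, not a production rate limiter. The class name `TokenBucket`, its parameters, and the rate values in the usage note are assumptions made for the example. The real limit depends on the Bedrock quota of the account and model.

```python
import threading
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens the bucket can hold
        self._tokens = float(capacity)   # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self):
        """Take one token if available; return True on success, False otherwise."""
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False

    def acquire(self):
        """Block until one token is available, then take it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1.0:
                    self._tokens -= 1.0
                    return
                # Time until the missing fraction of a token has been refilled
                wait = (1.0 - self._tokens) / self.rate
            time.sleep(wait)
```

A caller would create one shared bucket, for example `bucket = TokenBucket(rate=2, capacity=4)`, and call `bucket.acquire()` immediately before each Bedrock request so that bursts are spread out instead of hitting the service all at once.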

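⚡ Retry strategy tuning:

Use exponential backoff so that retries do not pile up into a retry storm

Set a reasonable maximum number of retries (3 to 5 is suggested)

Add random jitter so that clients do not all retry at the same moment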
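
The three points above can be combined in one small helper. This is a sketch under stated assumptions: it uses the "full jitter" variant (a uniformly random wait between zero and the exponential ceiling), which is one common choice among several. The names `backoff_delay` and `retry_with_backoff` and the default values are hypothetical.

```python
import random
import time


def backoff_delay(attempt, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Full-jitter delay: a random value between 0 and the capped exponential ceiling."""
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng() * ceiling


def retry_with_backoff(call, is_retryable, max_attempts=5, sleep=time.sleep):
    """Run call(); on a retryable exception, wait a jittered delay and try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors or once the attempts are used up.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
```

For Bedrock, `is_retryable` could check that the exception is a botocore `ClientError` whose error code is `ThrottlingException`, as the solution code does.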

Recommended monitoring tool: AWS CloudWatch can monitor Bedrock throttling metrics and quota usage.
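
As a rough illustration of such monitoring, the sketch below reads throttle counts with the CloudWatch `get_metric_statistics` call. It assumes the `AWS/Bedrock` namespace, the `InvocationThrottles` metric, and the `ModelId` dimension, so confirm these against the metrics visible in your own account. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def throttle_query(model_id, minutes=60, now=None):
    """Build get_metric_statistics arguments for per-minute throttle counts."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",            # assumed namespace for Bedrock metrics
        "MetricName": "InvocationThrottles",   # assumed name of the throttle metric
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,                          # one datapoint per minute
        "Statistics": ["Sum"],
    }


def fetch_throttle_counts(model_id, minutes=60):
    """Return (timestamp, throttled request count) pairs, oldest first."""
    import boto3  # imported here so throttle_query can be used without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    result = cloudwatch.get_metric_statistics(**throttle_query(model_id, minutes))
    points = sorted(result["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Sum"]) for p in points]
```

A sustained non-zero count suggests the request rate is regularly above the quota, in which case client-side rate limiting or a quota increase is worth considering.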
