OpenViking Multi-Model Support: Use Claude, GPT-4, Qwen, and Ollama Together
Problem
I was building an AI agent with OpenViking and wanted to switch between GPT-4o for complex reasoning and DeepSeek for cost-sensitive operations. But OpenViking’s documentation showed only Volcengine (Doubao) and OpenAI configurations.
I needed to know: Can I use Claude? Can I run local models? How do I configure different providers?
What I Found
OpenViking supports three VLM provider types: volcengine, openai, and litellm. The litellm provider is the key - it unlocks access to Claude, DeepSeek, Gemini, Qwen, Ollama, vLLM, and 100+ other models.
Here’s the basic structure in ov.conf:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "your-model-name",
    "api_key": "your-api-key"
  }
}
```
How to Configure Each Provider
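Before going provider by provider, a quick sanity check on the vlm section can catch typos early. The following is a minimal sketch, not part of OpenViking: the helper name `check_vlm_section` is hypothetical, and it assumes only what is stated above (ov.conf is JSON, and `provider` is one of volcengine, openai, or litellm).

```python
import json

# Provider types named in this article; OpenViking itself may accept others.
SUPPORTED_PROVIDERS = {"volcengine", "openai", "litellm"}


def check_vlm_section(config: dict) -> list:
    """Return a list of problems found in config["vlm"] (empty if none)."""
    vlm = config.get("vlm")
    if not isinstance(vlm, dict):
        return ["missing 'vlm' section"]
    problems = []
    if vlm.get("provider") not in SUPPORTED_PROVIDERS:
        problems.append(f"unknown provider: {vlm.get('provider')!r}")
    if not vlm.get("model"):
        problems.append("missing 'model'")
    return problems


# Same shape as the basic structure shown above.
sample = json.loads(
    '{"vlm": {"provider": "litellm", "model": "your-model-name", "api_key": "your-api-key"}}'
)
print(check_vlm_section(sample))  # prints []
```

To check a real file, load it with `json.load` and pass the resulting dict to the helper.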
OpenAI (GPT-4o)
This is the simplest configuration if you’re already using OpenAI:
```json
{
  "vlm": {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-openai-api-key",
    "api_base": "https://api.openai.com/v1"
  }
}
```
I tested this first and it worked immediately.
Anthropic (Claude)
To use Claude, switch the provider to litellm:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "claude-3-5-sonnet-20240620",
    "api_key": "your-anthropic-api-key"
  }
}
```
LiteLLM automatically detects that this is an Anthropic model from the model name and routes it to the correct API.
DeepSeek
DeepSeek is a cost-effective alternative for many tasks:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "deepseek-chat",
    "api_key": "your-deepseek-api-key"
  }
}
```
I tested DeepSeek for simple extraction tasks and it worked well at a fraction of the cost.
Qwen (DashScope)
For Qwen models through Alibaba’s DashScope:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "dashscope/qwen-turbo",
    "api_key": "your-dashscope-api-key",
    "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1"
  }
}
```
Note the dashscope/ prefix and the custom api_base - these are required for Qwen.
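Since a missing api_base is easy to overlook, a small lint can flag it before OpenViking starts. This is only a sketch built on the requirement stated above (models with the dashscope/ prefix need an api_base); `missing_api_base` is a hypothetical helper, not an OpenViking or LiteLLM function.

```python
def missing_api_base(vlm: dict) -> bool:
    """Return True if a dashscope/ model is configured without an api_base."""
    model = vlm.get("model", "")
    return model.startswith("dashscope/") and not vlm.get("api_base")


good = {
    "provider": "litellm",
    "model": "dashscope/qwen-turbo",
    "api_key": "your-dashscope-api-key",
    "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}
# Same settings with api_base dropped, to simulate the mistake.
bad = {k: v for k, v in good.items() if k != "api_base"}

print(missing_api_base(good))  # prints False
print(missing_api_base(bad))   # prints True
```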
Local Models with Ollama
For privacy-sensitive data or offline use, I configured Ollama:
```bash
# Start Ollama server
ollama serve

# Pull a model
ollama pull llama3.1
```
Then configure OpenViking:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "ollama/llama3.1",
    "api_base": "http://localhost:11434"
  }
}
```
No API key needed for local models.
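Because this setup depends on a local server being up, it can help to probe Ollama from Python before pointing OpenViking at it. The sketch below uses only the standard library and Ollama's /api/tags endpoint, which lists locally pulled models; the function name `ollama_is_running` and the two-second timeout are my own choices for illustration.

```python
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to <base_url>/api/tags succeeds."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

If this reports that Ollama is not reachable, start the server with `ollama serve` and try again.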
vLLM for Production
For production deployments with self-hosted models:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "hosted_vllm/llama-3.1-8b",
    "api_base": "http://localhost:8000/v1"
  }
}
```
Model Selection Strategy
I created a decision table for when to use each model:
| Use Case | Model | Why |
|---|---|---|
| Complex reasoning | Claude 3.5 Sonnet | Best for nuanced analysis |
| Production semantic processing | GPT-4o | Reliable, well-tested |
| Cost-sensitive operations | DeepSeek | 10x cheaper than GPT-4 |
| Chinese language tasks | Qwen-turbo | Optimized for Chinese |
| Privacy-sensitive data | Ollama (local) | Data never leaves your machine |
| High-volume processing | vLLM | No API rate limits |
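The table can also be kept as data, so a script can pick the vlm settings for a given kind of task. This is a sketch rather than an OpenViking feature: the use-case keys and the `vlm_for` helper are hypothetical, the model names and api_base values are the ones used in the configurations in this post, and API keys are left out on purpose.

```python
# Use case -> vlm settings, mirroring the decision table.
VLM_BY_USE_CASE = {
    "complex_reasoning": {"provider": "litellm", "model": "claude-3-5-sonnet-20240620"},
    "production": {"provider": "openai", "model": "gpt-4o"},
    "cost_sensitive": {"provider": "litellm", "model": "deepseek-chat"},
    "chinese": {
        "provider": "litellm",
        "model": "dashscope/qwen-turbo",
        "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    },
    "privacy": {
        "provider": "litellm",
        "model": "ollama/llama3.1",
        "api_base": "http://localhost:11434",
    },
    "high_volume": {
        "provider": "litellm",
        "model": "hosted_vllm/llama-3.1-8b",
        "api_base": "http://localhost:8000/v1",
    },
}


def vlm_for(use_case: str, default: str = "production") -> dict:
    """Return a copy of the settings for a use case, or the default's settings."""
    return dict(VLM_BY_USE_CASE.get(use_case, VLM_BY_USE_CASE[default]))


print(vlm_for("privacy")["model"])  # prints ollama/llama3.1
```

Returning a copy keeps callers from accidentally mutating the shared table when they add an api_key.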
How LiteLLM Works
I was curious why LiteLLM can handle so many providers with minimal configuration. The answer is automatic model detection.
When I specify claude-3-5-sonnet-20240620, LiteLLM:
- Recognizes this is an Anthropic model
- Routes to Anthropic’s API
- Formats the request in Anthropic’s format
- Returns a standardized response
The same happens for DeepSeek, Gemini, and others. No manual routing needed.
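As a mental model only, the detection step can be pictured as a pattern lookup on the model name. The sketch below is not LiteLLM's actual code (LiteLLM maintains a much larger registry of models and providers); the four patterns are the examples used in this post, and `guess_provider` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Illustrative patterns only; the first match wins.
ROUTES = [
    ("claude-*", "Anthropic"),
    ("deepseek-*", "DeepSeek"),
    ("ollama/*", "Local"),
    ("gpt-*", "OpenAI"),
]


def guess_provider(model: str) -> str:
    """Return the provider label for the first matching pattern, else 'unknown'."""
    for pattern, provider in ROUTES:
        if fnmatchcase(model, pattern):
            return provider
    return "unknown"


print(guess_provider("claude-3-5-sonnet-20240620"))  # prints Anthropic
print(guess_provider("ollama/llama3.1"))             # prints Local
```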
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  OpenViking  │────▶│   LiteLLM    │────▶│   Provider   │
│   Request    │     │    Router    │     │     API      │
└──────────────┘     └──────────────┘     └──────────────┘
                            │
                            ▼
                     Model Detection
                     claude-*   -> Anthropic
                     deepseek-* -> DeepSeek
                     ollama/*   -> Local
                     gpt-*      -> OpenAI
```
Embedding Models
OpenViking also supports multiple embedding providers:
```json
{
  "embedding": {
    "dense": {
      "provider": "openai",
      "model": "text-embedding-3-large",
      "dimension": 3072,
      "api_key": "your-openai-key"
    }
  }
}
```
Supported embedding providers:
- volcengine: Doubao embeddings
- openai: text-embedding-3-large/small
- jina: Jina embeddings
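When swapping between the large and small OpenAI embedding models, it is easy to leave a stale dimension value behind. A minimal sketch of a check against the models' default output sizes (3072 for text-embedding-3-large, 1536 for text-embedding-3-small) follows. `uses_default_dimension` is a hypothetical helper and the table covers only these two models. Since OpenAI also allows requesting shortened embeddings, a mismatch is better treated as a warning than as an error.

```python
from typing import Optional

# Default output sizes of the two OpenAI models mentioned above.
DEFAULT_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def uses_default_dimension(dense: dict) -> Optional[bool]:
    """True/False if the model is in the table, None if it is not listed."""
    expected = DEFAULT_DIMS.get(dense.get("model"))
    if expected is None:
        return None
    return dense.get("dimension") == expected


dense = {"provider": "openai", "model": "text-embedding-3-large", "dimension": 3072}
print(uses_default_dimension(dense))  # prints True
```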
Full Configuration Example
Here’s my complete ov.conf for a multi-model setup:
```json
{
  "storage": {
    "workspace": "/home/user/openviking_workspace"
  },
  "embedding": {
    "dense": {
      "provider": "openai",
      "model": "text-embedding-3-large",
      "dimension": 3072,
      "api_key": "your-openai-key"
    }
  },
  "vlm": {
    "provider": "litellm",
    "model": "claude-3-5-sonnet-20240620",
    "api_key": "your-anthropic-key"
  }
}
```
Switching Models at Runtime
I sometimes need to switch models without editing the config file by hand. Here’s how I do it:
```python
import json
from typing import Optional


def switch_model(provider: str, model: str, api_key: Optional[str] = None) -> None:
    """Rewrite the vlm section of ov.conf in place."""
    with open('ov.conf', 'r') as f:
        config = json.load(f)

    config['vlm']['provider'] = provider
    config['vlm']['model'] = model
    if api_key:
        # Only overwrite the key when a new one is given; otherwise the old key stays.
        config['vlm']['api_key'] = api_key

    with open('ov.conf', 'w') as f:
        json.dump(config, f, indent=2)


# Switch to DeepSeek for cost savings
switch_model('litellm', 'deepseek-chat', 'your-deepseek-key')

# Switch to local model for privacy
switch_model('litellm', 'ollama/llama3.1')
```
Common Issues
Issue 1: Model Not Found
When I used gpt-4o with the litellm provider:
```
Error: Model gpt-4o not found
```
The fix: Use the correct model identifier. For LiteLLM with OpenAI, just use the model name directly:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "gpt-4o",
    "api_key": "your-openai-key"
  }
}
```
Or use the OpenAI provider directly:
```json
{
  "vlm": {
    "provider": "openai",
    "model": "gpt-4o",
    "api_key": "your-openai-key"
  }
}
```
Issue 2: Ollama Connection Refused
When I tried to use Ollama without starting the server:
```
Error: Connection refused to localhost:11434
```
The fix: Make sure Ollama is running:
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not, start it
ollama serve
```
Issue 3: Qwen API Base Missing
When I configured Qwen without the custom API base:
```
Error: Invalid API endpoint
```
The fix: Always include the DashScope API base:
```json
{
  "vlm": {
    "provider": "litellm",
    "model": "dashscope/qwen-turbo",
    "api_key": "your-key",
    "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1"
  }
}
```
Summary
In this post, I showed how to configure OpenViking to work with multiple LLM providers through LiteLLM. The key point is that LiteLLM acts as a unified interface - you specify the model name, and it handles the provider-specific formatting automatically.
I covered configurations for OpenAI, Claude, DeepSeek, Qwen, Ollama, and vLLM. Each has its use case: GPT-4o for production, Claude for complex reasoning, DeepSeek for cost savings, Qwen for Chinese tasks, and Ollama for privacy.
This flexibility helps protect your agent infrastructure against provider lock-in, cost changes, and availability issues. Switching models means changing only a few fields in your config file.
Final Words + More Resources
My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to contact me, you can reach me by email.
If you found this article useful, don’t forget to support me by starring the repo on GitHub!