How to Use MiniMax Image-01 for Text-to-Image and Image-to-Image Generation

Mar 30, 2026

Purpose

This post shows how to use MiniMax’s image-01 model for text-to-image generation, image-to-image transformation, and aspect ratio control.

Problem

I tried generating images with MiniMax’s older models and got this error:

Error: Model 'M2' has been deprecated. Please use 'M2.7' or 'image-01' instead.

I was using code from an old tutorial:

# This code NO longer works
result = minimax.image.generate(
    model="M2",  # Deprecated!
    prompt="A sunset over mountains"
)

MiniMax deprecated the M2, M2.1, M2.5, and VL-01 models in the 2026.3.28 release, so I needed to switch to image-01.

Environment

OpenClaw 2026.3.28 or later
MiniMax API key configured
Python 3.10+ or Node.js 18+

Solution

MiniMax image-01 supports two generation modes, plus a setting that applies to both:

Text-to-Image: Describe what you want
Image-to-Image: Transform an existing image
Aspect Ratio Control: Set the output shape with the aspect_ratio parameter

The overall flow looks like this:

┌──────────────────┐     ┌──────────────┐     ┌─────────────┐
│  Input Source    │────▶│  image-01    │────▶│   Output    │
│                  │     │   Model      │     │   Image     │
├──────────────────┤     ├──────────────┤     ├─────────────┤
│ Text Description │     │              │     │ 16:9, 1:1,  │
│ OR               │     │ aspect_ratio │     │ 4:3, etc.   │
│ Source Image     │     │   parameter  │     │             │
└──────────────────┘     └──────────────┘     └─────────────┘

Text-to-Image Generation

I tested text-to-image first. Here’s the working code:

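import os

from minimax import MiniMaxClient

# Read the key from the environment; Python does not expand a literal "${MINIMAX_API_KEY}" string
client = MiniMaxClient(api_key=os.environ["MINIMAX_API_KEY"])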

# Generate image from text description
result = client.image.generate(
    model="image-01",
    prompt="A cyberpunk cat with neon collar in a futuristic city at night, highly detailed, 4k quality",
    aspect_ratio="16:9"  # Cinematic widescreen format
)

# Result contains image URL
print(f"Generated image: {result.image_url}")

When I ran this:

Generated image: https://cdn.minimax.com/images/abc123.png

The image matched my prompt. The cyberpunk cat had a glowing neon collar and stood in a rain-soaked futuristic cityscape.
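Since the result is only a URL, a follow-up step is usually needed to save the file locally. Here is a minimal sketch using only the Python standard library. `filename_from_url` and `download_image` are hypothetical helper names, and the sketch assumes the URL can be fetched without extra authentication headers, which I have not confirmed.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url: str, fallback: str = "image.png") -> str:
    """Return the last path segment of the URL, or a fallback name."""
    # urlparse drops the query string from .path, so "?sig=..." is ignored
    name = PurePosixPath(urlparse(url).path).name
    return name or fallback


def download_image(url: str, out_dir: str = ".") -> Path:
    """Fetch the image bytes and write them into out_dir (hypothetical helper)."""
    target = Path(out_dir) / filename_from_url(url)
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

With the result from above, a call would look like `download_image(result.image_url, "outputs")`. The output directory has to exist beforehand.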

Aspect Ratio Options

I tested different aspect ratios:

# Square format - good for profile pictures
result = client.image.generate(
    model="image-01",
    prompt="Portrait of a woman, soft lighting, professional",
    aspect_ratio="1:1"
)

# Portrait - good for phone wallpapers
result = client.image.generate(
    model="image-01",
    prompt="Mountain landscape with lake reflection",
    aspect_ratio="9:16"
)

# Landscape - good for presentations
result = client.image.generate(
    model="image-01",
    prompt="Team meeting in modern office",
    aspect_ratio="4:3"
)

# Cinematic - good for video thumbnails
result = client.image.generate(
    model="image-01",
    prompt="Action scene with explosions",
    aspect_ratio="16:9"
)

I compared the outputs:

| Aspect Ratio | Dimensions (px) | Best Use Case             |
|--------------|-----------------|---------------------------|
| 1:1          | 1024x1024       | Profile pictures, icons   |
| 4:3          | 1024x768        | Presentations, blogs      |
| 16:9         | 1024x576        | Video thumbnails, banners |
| 9:16         | 576x1024        | Phone wallpapers, stories |
| 21:9         | 1024x439        | Cinematic posters         |
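The dimensions in the table are consistent with fixing the longer side at 1024 pixels and rounding the shorter side. A minimal sketch of that arithmetic follows. `dimensions_for` is a hypothetical helper, and the 1024-pixel rule is inferred from the table above rather than taken from MiniMax documentation.

```python
def dimensions_for(aspect_ratio: str, long_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) for a ratio string such as "16:9"."""
    width_part, height_part = (int(part) for part in aspect_ratio.split(":"))
    if width_part <= 0 or height_part <= 0:
        raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")
    if width_part >= height_part:
        # Landscape or square: width is the long side
        return long_side, round(long_side * height_part / width_part)
    # Portrait: height is the long side
    return round(long_side * width_part / height_part), long_side


print(dimensions_for("21:9"))  # (1024, 439)
```

This is handy for reserving layout space before the image comes back, but the actual output size should still be checked on the returned file.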

Image-to-Image Transformation

Next, I tested image-to-image by restyling an existing image:

# Transform existing image with new style
result = client.image.generate(
    model="image-01",
    source_image="original_cat.png",  # My source image
    prompt="Same cat but in watercolor painting style, artistic, soft colors",
    aspect_ratio="1:1"  # Keep square format
)

I noticed a key difference from text-to-image:

┌─────────────────────────────────────────────────────────────┐
│                    Mode Comparison                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  TEXT-TO-IMAGE:                                             │
│  prompt only → model creates from scratch                   │
│  example: "A cyberpunk cat..."                              │
│                                                              │
│  IMAGE-TO-IMAGE:                                            │
│  source_image + prompt → model transforms existing          │
│  example: original_cat.png + "watercolor style"             │
│                                                              │
│  KEY: Keep prompts simple for image-to-image                │
│       Complex prompts can distort the source too much       │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Mistake I Made

At first, I used overly complex prompts for image-to-image:

# WRONG: Too complex for image-to-image
result = client.image.generate(
    model="image-01",
    source_image="my_cat.png",
    prompt="Transform into a cyberpunk cat with neon collar, add futuristic city background, rain effects, dramatic lighting, 4k quality, artstation trending"
)

The result distorted my original cat too much. The model tried to add too many new elements.

The fix was to keep prompts focused on transformation:

# CORRECT: Focus on style transformation only
result = client.image.generate(
    model="image-01",
    source_image="my_cat.png",
    prompt="Watercolor painting style, soft artistic brushstrokes"
)

This preserved the original cat’s pose while applying the watercolor effect.
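To catch overloaded prompts before sending them, a rough check can count how many comma-separated instructions a prompt contains. This is only a crude heuristic of my own, not something the model or API provides. `count_prompt_clauses`, `is_focused_prompt`, and the limit of 3 are all assumptions chosen for illustration.

```python
def count_prompt_clauses(prompt: str) -> int:
    """Count non-empty comma-separated clauses in a prompt."""
    return sum(1 for part in prompt.split(",") if part.strip())


def is_focused_prompt(prompt: str, max_clauses: int = 3) -> bool:
    """Heuristic: treat a prompt as focused if it has few clauses (assumed limit)."""
    return count_prompt_clauses(prompt) <= max_clauses


overloaded = (
    "Transform into a cyberpunk cat with neon collar, add futuristic city "
    "background, rain effects, dramatic lighting, 4k quality, artstation trending"
)
focused = "Watercolor painting style, soft artistic brushstrokes"

print(count_prompt_clauses(overloaded), is_focused_prompt(overloaded))  # 6 False
print(count_prompt_clauses(focused), is_focused_prompt(focused))        # 2 True
```

A clause count says nothing about what each clause asks for, so this works as a warning in a script, not as a quality guarantee.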

OpenClaw Configuration

I integrated image-01 into my OpenClaw config:

minimax:
  api_key: ${MINIMAX_API_KEY}

  image:
    model: image-01
    default_aspect_ratio: "16:9"
    timeout_seconds: 30

  chat:
    model: M2.7  # Use M2.7 for text generation
    temperature: 0.7

Important: I use M2.7 for chat/text generation and image-01 for image generation. They serve different purposes and are not interchangeable.

Common Issues

Issue 1: Using Deprecated Models

When I tried older models:

Error: Model 'M2.5' not found. Available models: M2.7, image-01

The fix was simple:

# WRONG
model = "M2.5"

# CORRECT: pick the replacement that matches the task
image_model = "image-01"  # For images
chat_model = "M2.7"       # For text/chat
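If old model names are scattered across scripts or configs, a small lookup can map them to the current ones in a single place. The sketch below is plain Python and does not call the MiniMax SDK. `resolve_model` is a hypothetical helper, and the model names are the ones listed in this post.

```python
# Model names as listed in this post
DEPRECATED_MODELS = frozenset({"M2", "M2.1", "M2.5", "VL-01"})
CURRENT_MODELS = {"image": "image-01", "chat": "M2.7"}


def resolve_model(name: str, task: str) -> str:
    """Swap a deprecated model name for the current model of the given task."""
    if task not in CURRENT_MODELS:
        raise ValueError(f"unknown task: {task!r}")
    if name in DEPRECATED_MODELS:
        return CURRENT_MODELS[task]
    return name


print(resolve_model("M2.5", "image"))  # image-01
print(resolve_model("M2", "chat"))     # M2.7
```

Names that are not on the deprecated list pass through unchanged, so the helper does not hide typos in model names.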

Issue 2: Missing Aspect Ratio

I forgot aspect ratio once:

# No aspect_ratio specified
result = client.image.generate(
    model="image-01",
    prompt="A landscape scene"
)

I got a 1:1 square image unexpectedly. The fix:

# Always specify aspect_ratio for predictable output
result = client.image.generate(
    model="image-01",
    prompt="A landscape scene with mountains",
    aspect_ratio="16:9"  # Widescreen for landscapes
)
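One way to avoid forgetting the parameter is to build the request arguments in one place with an explicit default. This is a minimal sketch, assuming the keyword names used throughout this post (`model`, `prompt`, `aspect_ratio`, `source_image`). `build_image_request` is a hypothetical helper, and the ratio list only covers the values from my table, so it may be incomplete.

```python
from typing import Optional

# Ratios taken from the comparison table in this post (may not be exhaustive)
SUPPORTED_RATIOS = ("1:1", "4:3", "16:9", "9:16", "21:9")


def build_image_request(
    prompt: str,
    aspect_ratio: str = "16:9",
    source_image: Optional[str] = None,
) -> dict:
    """Return keyword arguments for an image-01 request with an explicit ratio."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    request = {"model": "image-01", "prompt": prompt, "aspect_ratio": aspect_ratio}
    if source_image is not None:
        # Only image-to-image requests carry a source image
        request["source_image"] = source_image
    return request
```

The resulting dictionary can then be unpacked into the call, for example `client.image.generate(**build_image_request("A landscape scene with mountains"))`.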

Issue 3: API Key Not Set

When my API key was missing:

Error: MINIMAX_API_KEY environment variable not set

I fixed it by setting the environment variable:

export MINIMAX_API_KEY="your-api-key-here"

# Or in .env file
echo "MINIMAX_API_KEY=your-api-key-here" >> .env
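On the Python side, the key can be checked up front so a missing variable fails with a clear message before any request is made. A minimal sketch using only the standard library follows. `load_api_key` is a hypothetical helper, and note that a `.env` file is not read automatically by Python, so its contents need to be exported or loaded by a separate tool.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "MINIMAX_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} environment variable not set")
    return key
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.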

Best Practices

DO

Specify aspect ratio for every request

aspect_ratio="16:9"  # Always specify

Keep image-to-image prompts simple

prompt="Watercolor style"  # Focus on transformation

Use M2.7 for text, image-01 for images

chat: { model: M2.7 }
image: { model: image-01 }

DON’T

Don’t use deprecated models

# WRONG
model = "M2"  # Deprecated!

# CORRECT
model = "image-01"

Don’t skip aspect ratio

# WRONG - falls back to the default size (1:1 in my test)
client.image.generate(model="image-01", prompt="...")

# CORRECT - predictable dimensions
client.image.generate(model="image-01", prompt="...", aspect_ratio="16:9")

Don’t use complex prompts for image-to-image

# WRONG - distorts source too much
prompt="Add cyberpunk city background, rain, neon lights..."

# CORRECT - focused transformation
prompt="Cyberpunk color palette style"

MiniMax image-01 differs from other image generation models:

┌─────────────────┬────────────────────┬─────────────────────────┐
│ Model           │ Capabilities       │ Notes                   │
├─────────────────┼────────────────────┼─────────────────────────┤
│ DALL-E 3        │ Text-to-image      │ High quality, slow      │
│ Stable Diffusion│ Text + img2img     │ Open source, flexible   │
│ Midjourney      │ Text-to-image      │ Artistic, subscription  │
│ MiniMax image-01│ Text + img2img     │ Fast, integrated API    │
│                 │ + aspect ratio     │                         │
└─────────────────┴────────────────────┴─────────────────────────┘

MiniMax image-01 combines text-to-image and image-to-image in one model with aspect ratio control. This simplifies the stack: one API covers both kinds of image generation.

Summary

In this post, I showed how to use MiniMax image-01 for text-to-image and image-to-image generation. The key points are:

Use image-01 model (M2, M2.1, M2.5, VL-01 are deprecated)
Specify aspect_ratio for predictable output dimensions
Keep image-to-image prompts simple - focus on style transformation
Use M2.7 for text generation, image-01 for image generation
Configure OpenClaw with separate model settings for chat vs image

MiniMax image-01 unifies text-to-image and image-to-image with aspect ratio control. One model handles all image generation needs without juggling multiple services.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can contact me by email.
