Introducing visual references: create more consistent AI videos

Guide AI video generation with up to five images and one video. Create more consistent characters, styles, and motion with Visual References.

Ryan Cheng, 6 min read

Every great creative project starts long before the final piece comes to life.

A director doesn’t walk onto set with a single sentence and hope for the best. They arrive with moodboards, references, storyboards, visual inspirations, and examples that help communicate exactly what they’re trying to create. Designers collect inspiration. Filmmakers study camera movements. Photographers build visual references before they ever pick up a camera.

In creative work, showing is often more powerful than telling.

While prompts are incredibly powerful, they can only communicate so much. Sometimes you know exactly how you want something to look, feel, or move — but describing that vision in words can feel like explaining a color to someone who can’t see it.

That’s why we have launched visual references in Envato’s AI Video Generator.

You can now upload up to five images and one video alongside your prompt, giving the model visual context for your subject, style, motion, and creative direction. The result: more consistent, on-brief videos that start closer to your vision from the very first generation.

The challenge with limited context

Text prompts are great at communicating ideas.

They’re less effective at communicating specifics.

You might prompt for “a cinematic beach scene at golden hour” and receive something beautiful. But if you have a very specific version of golden-hour lighting in mind—or a particular character, product, environment, or camera style—there’s often a gap between what you imagine and what the model interprets.

The same challenge exists when working from a single image. While one reference can provide useful guidance, it rarely captures every detail, angle, style cue, or creative nuance that makes a project unique.

This is one of the biggest challenges creators face when working with AI tools. Not because the models aren’t capable, but because creative direction is rarely communicated through a single prompt or image alone.

When creative teams work together, they share references.

When clients provide feedback, they share examples.

When directors brief cinematographers, they point to scenes, images, and footage that capture the feeling they’re trying to achieve.

Visual references bring that same workflow into AI video creation.

What are visual references in Envato’s AI video generator?

Visual references let you upload up to five images and one video alongside your prompt to guide the final result.

Instead of relying entirely on text, you can show the model what matters.

Use images to define:

  • Characters and subjects
  • Products and objects
  • Visual styles
  • Locations and environments

Use video references to define:

  • Motion and action
  • Camera movement
  • Timing and pacing
  • Environmental context
  • Overall energy and feel

Combined with a clear prompt, these references give the AI video generator significantly more context to work from.

Think of it less as adding files and more as providing creative direction.

More context means more creative control

One of the most common frustrations with AI video generation is unpredictability.

You know what you want.

The model knows what you said.

But somewhere between those two things, the result can drift.

Visual references help reduce that gap.

By providing examples of the subjects, styles, and motion you want to see, you’re giving the model stronger signals about what should remain consistent throughout the generation process.

This creates several practical benefits:

1. Better character consistency

Characters are often one of the hardest elements to maintain across AI-generated content.

A character’s clothing might change. Their appearance may shift. Their overall look can drift from what you originally imagined.

By providing visual references for your character, you’re giving the model a much clearer understanding of who should appear in the final video.

2. Stronger visual direction

Whether you’re aiming for a cinematic look, a stylized aesthetic, or a specific color palette, reference images help communicate visual intent far more effectively than adjectives alone.

Instead of describing a mood, you can show it.

3. More intentional motion

Video references add another layer of control.

You can use them to influence camera movement, pacing, energy, and overall motion, helping the final result feel closer to the style you’re aiming for.

It’s the difference between saying “make it feel cinematic” and showing exactly what cinematic movement looks like.

How to use visual references

Getting started only takes a few steps:

  1. Open Envato’s AI video generator.
  2. Upload up to five images and one video.
  3. Write a prompt explaining how each reference should influence the final output.
  4. Generate your video.
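If you like to plan a generation before opening the tool, the steps above can be sketched as a small Python helper. This is only an illustration under stated assumptions: it does not call Envato's AI video generator, and the names `Reference`, `ReferenceBrief`, and the file names are hypothetical. The only details taken from this post are the upload limits (five images, one video) and the advice to explain how each reference should influence the output.

```python
from dataclasses import dataclass, field
from typing import List

# Limits stated in this post: up to five images and one video.
MAX_PER_KIND = {"image": 5, "video": 1}


@dataclass
class Reference:
    filename: str  # hypothetical file name, for illustration only
    kind: str      # "image" or "video"
    role: str      # what this reference should influence


@dataclass
class ReferenceBrief:
    goal: str
    references: List[Reference] = field(default_factory=list)

    def add(self, filename: str, kind: str, role: str) -> None:
        """Add a reference, enforcing the per-kind upload limits."""
        if kind not in MAX_PER_KIND:
            raise ValueError(f"unknown reference kind: {kind!r}")
        used = sum(1 for ref in self.references if ref.kind == kind)
        if used >= MAX_PER_KIND[kind]:
            raise ValueError(
                f"limit reached: at most {MAX_PER_KIND[kind]} {kind} reference(s)"
            )
        self.references.append(Reference(filename, kind, role))

    def prompt(self) -> str:
        """Build a prompt that says how each reference should be used."""
        parts = [self.goal]
        for ref in self.references:
            parts.append(
                f"Use the {ref.kind} reference '{ref.filename}' for {ref.role}."
            )
        return " ".join(parts)


# Example brief: two images and one video, each with a stated role.
brief = ReferenceBrief("Create a short product video.")
brief.add("product.png", "image", "the product's appearance")
brief.add("palette.png", "image", "the color palette")
brief.add("orbit.mp4", "video", "the camera movement")
print(brief.prompt())
```

Writing down a role for every file is the useful part: it forces you to decide what each reference is for before you upload it, which is what step 3 asks of your prompt.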

A simple example

Imagine you want to create a slow-motion video of a basketball player performing a slam dunk in a packed arena.

With visual references, you could upload:

  • An image of a basketball player
  • An image of a professional basketball jersey
  • An image that captures your preferred cinematic color grading
  • A video reference showing a slow-motion slam dunk

Then prompt:

Create a cinematic basketball highlight reel featuring the player from the reference image wearing the jersey, performing a powerful slam dunk. Use the motion and camera movement from the video reference, with the dramatic lighting and color grading from the style reference.

Rather than asking the model to imagine every detail from scratch, you’re giving it visual context for the character, wardrobe, style, and motion all at once.

The result is a generation that’s much more aligned with your original idea.

Built for experimentation

Creative work is rarely linear.

You try something.

You tweak it.

You discover a better direction halfway through.

Visual references support that workflow.

Adding or changing references gives you a flexible way to explore different creative directions. For example, if you want to generate a different character, product, or visual style, you can update your references and generate a new result while refining the rest of your prompt as needed. This makes it easier to experiment with different ideas and iterate on your creative concepts.

Sometimes a small change in reference material is all it takes to unlock a dramatically better result.

Show the AI what you mean

The best creative briefs have always included references.

Now your AI video workflow can too.

Visual references help bridge the gap between what’s in your head and what appears on screen by providing our AI video generator with the context it needs to better understand your creative intent.

Instead of relying entirely on text, you can now guide your generations with images, motion, style, and visual examples that communicate exactly what you’re trying to create.

Because sometimes the fastest way to explain an idea isn’t with more words.

It’s with a picture.

Ready to try it?

Upload your images, add a video reference, and see how much further visual context can take your next generation.
