
AI-powered video generation is evolving rapidly, and Hunyuan Video I2V stands out as a versatile tool for converting images into motion. With support for GGUF models, it enables smooth video creation even on low-power GPUs, making it a practical choice for a wide range of users.
This guide will walk you through a straightforward workflow for using Hunyuan I2V with GGUF models. Whether you’re new to AI-generated motion or looking for an efficient and lightweight setup, this method will help you create high-quality, dynamic videos with ease.
Model Downloads
- GGUF Models: The Hunyuan i2v GGUF models are here. I have an RTX 3090 with 24 GB of VRAM and used hunyuan_video_I2V-Q8_0.gguf. If you have less VRAM, use a smaller variant such as Q4 or Q6. Place the .gguf file in ComfyUI\models\unet.
- Text Encoder: Download llava_llama3_fp8_scaled.safetensors and clip_l.safetensors, and place them in ComfyUI\models\text_encoders.
- Clip Vision: Download llava_llama3_vision.safetensors and place it in ComfyUI\models\clip_vision.
- VAE: Download hunyuan_video_vae_bf16.safetensors and put it in your ComfyUI\models\vae folder. A quick Python check that all of these files are in the right place follows this list.
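If you want to confirm that everything landed in the right folders, here is a minimal Python sketch that checks for the files listed above under a standard ComfyUI install. The ComfyUI root path (and the Q8 filename) are assumptions; adjust them to match your setup and the GGUF variant you downloaded.

```python
from pathlib import Path

# Assumed ComfyUI install location -- change this to your own path.
COMFYUI_ROOT = Path(r"C:\ComfyUI")

# Files from the download list above, relative to the ComfyUI root.
EXPECTED_FILES = [
    r"models\unet\hunyuan_video_I2V-Q8_0.gguf",  # or the Q4/Q6 variant you chose
    r"models\text_encoders\llava_llama3_fp8_scaled.safetensors",
    r"models\text_encoders\clip_l.safetensors",
    r"models\clip_vision\llava_llama3_vision.safetensors",
    r"models\vae\hunyuan_video_vae_bf16.safetensors",
]

for rel in EXPECTED_FILES:
    path = COMFYUI_ROOT / rel
    print(("OK     " if path.exists() else "MISSING"), path)
```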
Installation
- Update your ComfyUI to the latest version (a manual update sketch for git installs follows this list).
- Drag the full-size image below onto the ComfyUI canvas to load the workflow.
- Use the ComfyUI Manager to install any missing nodes.
- Restart ComfyUI if necessary.
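If you installed ComfyUI by cloning the Git repository (rather than using a portable build, which ships its own update script), a manual update is just a git pull plus a dependency refresh. The sketch below assumes the same install path as before.

```python
import subprocess
from pathlib import Path

# Assumed git-based ComfyUI install location -- adjust to your setup.
COMFYUI_ROOT = Path(r"C:\ComfyUI")

# Pull the latest code, then refresh the Python dependencies.
subprocess.run(["git", "pull"], cwd=COMFYUI_ROOT, check=True)
subprocess.run(["pip", "install", "-r", "requirements.txt"], cwd=COMFYUI_ROOT, check=True)
```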
Nodes
This loads the GGUF model.
This loads the two CLIP text encoder models (clip_l and llava_llama3_fp8_scaled).
This loads the VAE model.
This loads the clip_vision model.
Positive prompt. Note that there is no negative prompt for Hunyuan i2v.
Specify the number of sampling steps here. I found that 20 steps are not enough for good quality; 30 steps work better.
Specify the video size and length here. Length is the number of frames; 49 frames is about 2 seconds at a frame rate of 24.
Specify the frame rate here. crf controls the quality of the H.264 encoding: lower values mean higher quality, so use 17 to 19 for better results.
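The relationship between frame count, frame rate, and clip length is simple division, which is worth keeping in mind when you change either value. The short illustrative snippet below checks the "49 frames at 24 fps is about 2 seconds" figure and notes how crf behaves.

```python
# Clip duration is the frame count divided by the frame rate.
def clip_duration(num_frames: int, fps: float) -> float:
    return num_frames / fps

print(clip_duration(49, 24))   # ~2.04 seconds, matching the examples below
print(clip_duration(97, 24))   # ~4.04 seconds if you roughly double the length

# crf (constant rate factor) for the H.264 output: lower means higher
# quality and larger files; 17-19 keeps more detail than the common
# x264 default of 23.
```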
Examples
Example 1: 480 x 848, 49 frames, 30 steps
Input image:
Prompt: The girl uses one hand to gently touch her hair. she tilts her head gently. her eyes blink naturally. the camera slowly zooms in for a soft-focus close-up
Output video:
Example 2: I wasn't happy with the output of the first example because the character consistency is poor, so I increased the resolution: 720 x 1280, 49 frames, 30 steps. The consistency is better, but still not quite right.
Example 3: 480 x 848, 49 frames, 30 steps
Input image:
Prompt: The girl walks forward slowly. the camera zooms in slowly.
Output video:
Comparison with Wan2.1
Comparing the Hunyuan i2v model to the Wan2.1 i2v model, here are my observations.
| Feature | Hunyuan | Wan2.1 |
|---|---|---|
| Video Quality | OK | Good |
| Similarity to Input Image | OK | Good |
| Speed | Fast | Slow |
| Prompt Adherence | OK | Good |
| VRAM Requirement | Lower | Higher |
This comparison highlights the key differences between Hunyuan Video I2V and Wan2.1, helping users decide which model best suits their needs:
- Hunyuan Video I2V is faster and requires less VRAM, making it ideal for users with lower-end hardware. However, its video quality, prompt adherence, and similarity to the input image are only okay compared to Wan2.1.
- Wan2.1, on the other hand, delivers better video quality, stronger prompt adherence, and a higher resemblance to the input image, but it comes at the cost of slower speed and higher VRAM requirements.
If you prioritize speed and efficiency, Hunyuan is a solid choice. If you value fidelity and quality, Wan2.1 may be the better option despite the extra resources needed.
Conclusion
By leveraging GGUF models, Hunyuan Video I2V makes image-to-video conversion more efficient and accessible. This streamlined workflow allows you to quickly generate smooth animations, experiment with different settings, and enhance your creative projects with minimal effort.
As AI video tools continue to improve, lightweight solutions like Hunyuan I2V GGUF provide an excellent way to explore motion generation without heavy hardware requirements. Whether for animation, content creation, or AI-driven visuals, this approach offers a solid and practical foundation for bringing static images to life.
Further Reading
Simple ComfyUI Workflow for WAN2.1 Image-to-Video (i2v) Using GGUF Models
Simple ComfyUI Workflow for WAN2.1 Text-to-Video (t2v) Using GGUF Models
A Simple ComfyUI Workflow for Video Upscaling and Interpolation
This post may contain affiliate links. When you click on a link and purchase a product, we receive a small commission that helps keep us running. Thanks.