GENAI - IMAGE AND VIDEO
Thursday 6th March 2025

Technical Advancements in Generative Models

Alibaba’s Wan 2.1 Text/Image-to-Video Model

Alibaba released Wan 2.1, a powerful text-to-video and image-to-video model. It has topped the VBench leaderboard, producing high-quality videos that rival OpenAI’s Sora.

Tencent’s HunyuanVideo (Image-to-Video)

Tencent’s HunyuanVideo is a 13B-parameter model that generates high-quality 720p video. The image-to-video component was open-sourced on March 5th, allowing creators to animate still images.

CogView4 High-Resolution Image Model

Researchers from Tsinghua University released CogView4, a text-to-image diffusion model capable of 2048×2048 resolution. It is fully open-source under Apache 2.0 and optimized for high-quality outputs.

“Chroma” – Community-Developed SD Model

Chroma, a community-driven project, is an 8.9B-parameter diffusion model built on the FLUX architecture. It aims to be fully open and uncensored, with training logs published for transparency.

Community Projects, Collaborations & Showcases

Animating Old Photos and Memories

Users have been bringing still images to life using AI video generation. Many shared results using Wan 2.1, including historical photos and pet pictures animated into short video clips. One user recreated a moving video of their grandparents from a single image, while another animated a decades-old picture of their childhood pet.

AI Meme Recreations

The community revisited viral AI-generated memes, recreating classics like Will Smith eating spaghetti using open-source tools. Users compared outputs across different models, testing how modern generative AI can improve upon early viral AI clips.

Creative AI Videos and Art

Community members pushed artistic boundaries with AI-generated animations. Highlights include an Elden Ring-inspired video created using HunyuanVideo and Wan 2.1, as well as a short animated skit titled “Don’t touch her belly”. Another user generated a faux weather forecast using AI-generated clips.

Collaborative Community Efforts

The subreddit has been a hub for collaboration, with users sharing feedback and techniques. Discussions covered reducing flicker in AI videos, refining prompt engineering, and improving character consistency across frames. Community-driven models like Chroma have also benefited from collective input.

Tutorials, Tools & Workflow Improvements

Running Wan 2.1 Locally (Performance Tips)

Users have shared guides on optimizing Wan 2.1 for local use. It can run on consumer GPUs with as little as 12 GB of VRAM, producing 8-second 480p videos. Optimizations include running at lower precision and using developer kijai’s Wan wrapper, which users report yields up to 50% faster generation.
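
To make the low-precision and offloading tips concrete, here is a minimal sketch of a low-VRAM local run, assuming the Hugging Face diffusers port of Wan 2.1; the checkpoint id and output handling below are assumptions, and kijai’s wrapper itself targets ComfyUI rather than this API:

```python
# Minimal sketch: running a Wan 2.1 text-to-video checkpoint locally with diffusers.
# Assumptions: the "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" repo id and the .frames output
# follow the usual diffusers video-pipeline conventions; adjust to the checkpoint you use.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

MODEL_ID = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # assumed community diffusers port

# Half precision roughly halves VRAM use versus fp32.
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# Keep only the active sub-model on the GPU; helps fit ~12 GB cards at some speed cost.
pipe.enable_model_cpu_offload()

video = pipe(
    prompt="an old photograph of a family dog, gently coming to life",
    height=480,
    width=832,
    num_frames=81,            # about 5 s at 16 fps; longer clips need more VRAM
    num_inference_steps=30,
).frames[0]

export_to_video(video, "wan_t2v_demo.mp4", fps=16)
```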

ComfyUI Integration & Workflows

Users quickly integrated HunyuanVideo and Wan 2.1 into ComfyUI, a node-based workflow tool. Shared guides include step-by-step setups and templates like the “Channel Wan” workflow, which turns generated clips into a faux weather report.
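
For anyone who wants to drive such workflows outside the graphical editor, here is a small hedged sketch that queues a saved workflow through ComfyUI’s local HTTP API; it assumes a default local install on port 8188 and a workflow exported via “Save (API Format)”, and the file name is made up:

```python
# Minimal sketch: queueing a saved ComfyUI workflow programmatically.
# Assumptions: ComfyUI is running locally on its default port (8188) and the
# workflow JSON was exported in API format; the file name is hypothetical.
import json
import urllib.request

with open("channel_wan_workflow_api.json", "r", encoding="utf-8") as f:  # hypothetical file
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # prompt id and queue position on success
```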

New Model Resources and Tutorials

For CogView4, users compiled prompt guides, interactive demos, and fine-tuning scripts. Similarly, Chroma offers training logs and a ComfyUI workflow JSON for immediate experimentation.
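
As a starting point alongside those guides, here is a hedged sketch of a 2048×2048 CogView4 generation; it assumes the model ships in diffusers format under the THUDM/CogView4-6B repo id (both assumptions, so adjust to whatever the community resources point at):

```python
# Minimal sketch: generating a 2048x2048 image with CogView4 via diffusers.
# Assumptions: the "THUDM/CogView4-6B" repo id and a diffusers-format checkpoint.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "THUDM/CogView4-6B",          # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()   # trade speed for lower VRAM use

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn, highly detailed",
    width=2048,
    height=2048,
    num_inference_steps=50,
).images[0]

image.save("cogview4_demo.png")
```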

Avoiding Pitfalls and Improving Quality

Discussions covered reducing flicker, maintaining character consistency, and leveraging TrueCFG for more coherent outputs. Many users combined Stable Diffusion XL stills with video models in hybrid image-to-video workflows, improving temporal stability via frame interpolation.
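
As a toy illustration of the frame-interpolation idea (real setups use learned interpolators such as RIFE or FILM rather than this naive blend), a self-contained sketch:

```python
# Minimal sketch: naive frame blending to smooth flicker between generated frames.
# This linear blend only illustrates the idea; learned interpolators handle motion far better.
from typing import List
import numpy as np

def blend_interpolate(frames: List[np.ndarray], inserts: int = 1) -> List[np.ndarray]:
    """Insert `inserts` linearly blended frames between each consecutive pair."""
    out: List[np.ndarray] = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for i in range(1, inserts + 1):
            t = i / (inserts + 1)
            mixed = (1.0 - t) * a.astype(np.float32) + t * b.astype(np.float32)
            out.append(mixed.astype(a.dtype))
    out.append(frames[-1])
    return out

# Example: 16 input frames become 31 output frames, roughly doubling the frame rate.
if __name__ == "__main__":
    clip = [np.random.randint(0, 255, (480, 832, 3), dtype=np.uint8) for _ in range(16)]
    smooth = blend_interpolate(clip, inserts=1)
    print(len(clip), "->", len(smooth), "frames")
```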