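What is CogVideoX?
CogVideoX is a large video generation model developed by Zhipu AI. It can produce a video from nothing more than a text prompt or an image. The open-sourced release is CogVideoX-2B, a 2-billion-parameter model and the first in the CogVideoX series of video generation models. It shares its origins with Qingying, Zhipu's AI video generation product. More capable models with larger parameter counts are coming soon.
CogVideoX-2B accepts English prompts of up to 226 tokens, uses 36 GB of GPU memory, and generates 6-second videos at a resolution of 720×480.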
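The 226-token prompt limit can be checked before a prompt is submitted. The following is a minimal sketch, not the model's actual preprocessing: `whitespace_tokens` is a hypothetical stand-in that counts whitespace-separated words, whereas the real model uses its own subword tokenizer, which usually yields more tokens than a word count. The `tokenize` parameter is there so that a real tokenizer can be passed in.

```python
from typing import Callable, List, Tuple

MAX_PROMPT_TOKENS = 226  # prompt limit quoted for CogVideoX-2B


def whitespace_tokens(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def check_prompt(
    prompt: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokens,
    limit: int = MAX_PROMPT_TOKENS,
) -> Tuple[int, bool]:
    """Return (token_count, fits_within_limit) for a prompt."""
    count = len(tokenize(prompt))
    return count, count <= limit
```

Because a word count tends to underestimate the subword count, a prompt that only just passes this check may still be too long for the model.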


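Core technologies of CogVideoX
- 3D variational autoencoder (3D VAE): developed in-house by Zhipu AI, this structure compresses raw video data to 2% of its original size, which lowers the cost and difficulty of training. Combined with a 3D RoPE positional encoding module, it improves the model's ability to capture inter-frame relationships along the time dimension and to establish long-range dependencies within a video.
- End-to-end video understanding model: strengthens the model's understanding of text and its ability to follow instructions, so that generated videos match user requirements more closely and very long, complex prompts can be handled.
- Transformer architecture that fuses the text, time, and space dimensions: a newly designed Expert Block aligns the text and video modality spaces, and a Full Attention mechanism optimizes the interaction between modalities.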
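The 2% compression figure quoted for the 3D VAE can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate and not the released implementation. The article does not state the compression factors, so the 4× temporal factor, 8×8 spatial factor, 16 latent channels, and 49-frame clip length (about 6 seconds at 8 fps) are all assumptions.

```python
def latent_fraction(
    frames: int,
    height: int,
    width: int,
    rgb_channels: int = 3,
    t_factor: int = 4,          # assumption: temporal downsampling factor
    s_factor: int = 8,          # assumption: spatial downsampling per axis
    latent_channels: int = 16,  # assumption: channels in the latent tensor
) -> float:
    """Return latent element count divided by raw pixel-value count."""
    raw = frames * height * width * rgb_channels
    # Assumption: the first frame is encoded on its own and the remaining
    # frames are grouped t_factor at a time.
    latent_frames = (frames - 1) // t_factor + 1
    latent = latent_frames * (height // s_factor) * (width // s_factor) * latent_channels
    return latent / raw


ratio = latent_fraction(frames=49, height=480, width=720)
print(f"latent size is about {ratio:.1%} of the raw video")
```

Under these assumptions the ratio comes out at roughly 2.2%, which is in line with the figure quoted above.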
CogVideoX generation examples
Prompt used to generate the video:
A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting.
Prompt used to generate the video:
The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from its tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt used to generate the video:
A street artist, clad in a worn-out denim jacket and a colorful bandana, stands before a vast concrete wall in the heart, holding a can of spray paint, spray-painting a colorful bird on a mottled wall.
Prompt used to generate the video:
In the haunting backdrop of a war-torn city, where ruins and crumbled walls tell a story of devastation, a poignant close-up frames a young girl. Her face is smudged with ash, a silent testament to the chaos around her. Her eyes glistening with a mix of sorrow and resilience, capturing the raw emotion of a world that has lost its innocence to the ravages of conflict.
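How to use CogVideoX
CogVideoX is available as a model download, an online demo, and an official API service.
1. Model and code download:
- CogVideoX-2B model: https://huggingface.co/THUDM/CogVideoX-2b
- CogVideoX GitHub repository: https://github.com/THUDM/CogVideo
2. Enterprises and developers: call the API service through the Zhipu open platform bigmodel.cn: https://open.bigmodel.cn/dev/howuse/cogvideox
3. Individual users: the CogVideoX model is live in the PC, mobile app, and mini-program versions of Zhipu Qingyan and can be tried for free through Qingying.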
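For the model download option above, local inference might look like the following. This is a minimal sketch, assuming the Hugging Face `diffusers` library (a version that ships `CogVideoXPipeline`) and `torch` are installed and a suitable GPU is available. The frame count, frame rate, step count, and guidance scale are illustrative assumptions, not settings given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Settings for one text-to-video run (illustrative defaults, not official)."""
    model_id: str = "THUDM/CogVideoX-2b"  # Hugging Face repository listed above
    num_frames: int = 49                  # assumption: about 6 s at 8 fps
    fps: int = 8                          # assumption: playback frame rate
    num_inference_steps: int = 50         # assumption: typical diffusion step count
    guidance_scale: float = 6.0           # assumption: classifier-free guidance weight


def generate_video(
    prompt: str,
    out_path: str,
    cfg: GenerationConfig = GenerationConfig(),
) -> str:
    """Generate a clip from `prompt` and write it to `out_path`."""
    # Heavy dependencies are imported lazily so this module loads without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(cfg.model_id, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # trades speed for lower peak GPU memory
    result = pipe(
        prompt=prompt,
        num_frames=cfg.num_frames,
        num_inference_steps=cfg.num_inference_steps,
        guidance_scale=cfg.guidance_scale,
    )
    export_to_video(result.frames[0], out_path, fps=cfg.fps)
    return out_path
```

Calling `generate_video("A wooden toy ship gliding over a blue carpet.", "output.mp4")` would download the weights on first use and write the clip to `output.mp4`. Actual memory use depends on the library version and the offloading options chosen.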
