{"id":28056,"date":"2025-09-07T03:00:41","date_gmt":"2025-09-07T03:00:41","guid":{"rendered":"https:\/\/metaverseplanet.net\/blog\/?p=28056"},"modified":"2026-01-05T09:10:45","modified_gmt":"2026-01-05T09:10:45","slug":"ai-that-turns-photos-into-3d-worlds-tencent-voyager","status":"publish","type":"post","link":"https:\/\/metaverseplanet.net\/blog\/ai-that-turns-photos-into-3d-worlds-tencent-voyager\/","title":{"rendered":"AI That Turns Photos into 3D Worlds: Tencent Voyager"},"content":{"rendered":"\n<p>Tencent has introduced <strong>Voyager<\/strong>, an impressive new AI model that can transform a single photograph into a three-dimensional scene. The model generates an RGB video and depth information simultaneously, offering a powerful approach to 3D reconstruction without the need for traditional modeling techniques. However, it requires substantial hardware to run effectively.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">How Voyager Works<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"720\" height=\"480\" src=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-1-6.webp\" alt=\"\" class=\"wp-image-28057\" srcset=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-1-6.webp 720w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-1-6-300x200.webp 300w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-1-6-150x100.webp 150w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<p>The <strong>HunyuanWorld-Voyager<\/strong> model takes a single image and a user-defined camera path\u2014such as a pan, tilt, or dolly-in motion\u2014and generates a short video. It produces the video and a depth map simultaneously, ensuring that the spatial relationships of objects in the scene remain consistent. 
The system maintains geometric coherence by comparing each new frame with previously generated content using 3D point clouds. However, distortions can still occur with long or complex camera movements, particularly 360-degree rotations.<\/p>\n\n\n\n<p><strong><em><a href=\"https:\/\/metaverseplanet.net\/blog\/gaming-giant-tencent-winks-at-metaverse\/\" data-type=\"post\" data-id=\"4187\">Tencent<\/a><\/em><\/strong>&#8217;s technical report highlights an additional component called the <strong>&#8220;world cache,&#8221;<\/strong> which stores data from each new frame. This allows data to be reused in subsequent frames, helping preserve geometric consistency across videos several minutes long.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Training and Requirements<\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"720\" height=\"275\" src=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-12.webp\" alt=\"\" class=\"wp-image-28058\" srcset=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-12.webp 720w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-12-300x115.webp 300w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/09\/indir-12-150x57.webp 150w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<p>Voyager was trained on a massive dataset of over <strong>100,000 real and synthetic video clips<\/strong>, including scenes from <strong>Unreal Engine<\/strong> environments. This extensive training helped the model learn a wide range of camera movements. The training process used an automated depth estimation method, eliminating the need for manual labeling.<\/p>\n\n\n\n<p>While technologically powerful, Voyager has high hardware requirements. 
Running the model at a <strong>540p resolution<\/strong> requires <strong>60 GB of GPU memory<\/strong>, and optimal results need <strong>80 GB<\/strong>. The system supports multi-GPU scaling, with an <strong>8-GPU<\/strong> setup running approximately <strong>6.7 times faster<\/strong> than a single GPU. The model weights have been made available to researchers on Hugging Face.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Voyager vs. Other AI Models<\/h2>\n\n\n\n<p>Voyager&#8217;s approach sets it apart from existing video generation models. Unlike OpenAI&#8217;s <strong>Sora<\/strong>, which focuses on visual realism, Voyager prioritizes <strong>geometric consistency<\/strong> between frames. This focus helped it achieve a top score of <strong>77.62<\/strong> on Stanford&#8217;s <strong>WorldScore benchmark<\/strong>, outperforming competitors like WonderWorld and CogVideoX-I2V. However, it still has some limitations in precise camera control.<\/p>\n\n\n\n<p>Additionally, there are some licensing restrictions for Voyager. Its use is prohibited in the <strong>European Union<\/strong>, the <strong>United Kingdom<\/strong>, and <strong>South Korea<\/strong>. 
Commercial applications serving over <strong>100 million active users<\/strong> require an additional agreement.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">You Might Also Like:<\/h3>\n\n\n<ul class=\"wp-block-latest-posts__list wp-block-latest-posts\"><li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/can-we-actually-survive-on-the-moon\/\">The Harsh Reality of Lunar Colonies: Can We Actually Survive on the Moon?<\/a><\/li>\n<li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/the-hard-truth-about-the-gta-vi-pc-release-delay\/\">The Hard Truth About the GTA VI PC Release Delay<\/a><\/li>\n<li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/boston-dynamics-in-crisis\/\">Boston Dynamics in Crisis: The Mass Production Clash with Hyundai<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Tencent has introduced Voyager, an impressive new AI model that can transform a single photograph into a three-dimensional scene. The model simultaneously generates both an RGB video and depth information, offering a powerful approach to 3D reconstruction without the need for traditional modeling techniques. However, it requires a significant amount of hardware to run effectively. 
&hellip;<\/p>\n","protected":false},"author":1,"featured_media":28060,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"googlesitekit_rrm_CAown96uCw:productID":"","footnotes":""},"categories":[332,323],"tags":[335,334,210,301,331],"class_list":["post-28056","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-information","category-cyberculture","tag-ai-news","tag-ai-tools","tag-ai-tools-news","tag-ai-videos","tag-videos"],"amp_enabled":false,"_links":{"self":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts\/28056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/comments?post=28056"}],"version-history":[{"count":0,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts\/28056\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/media\/28060"}],"wp:attachment":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/media?parent=28056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/categories?post=28056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/tags?post=28056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}