{"id":31662,"date":"2025-10-21T14:49:25","date_gmt":"2025-10-21T14:49:25","guid":{"rendered":"https:\/\/metaverseplanet.net\/blog\/?p=31662"},"modified":"2026-01-05T09:00:31","modified_gmt":"2026-01-05T09:00:31","slug":"deepseek-ocr-ai-doesnt-just-read-texts-it-sees-them","status":"publish","type":"post","link":"https:\/\/metaverseplanet.net\/blog\/deepseek-ocr-ai-doesnt-just-read-texts-it-sees-them\/","title":{"rendered":"Deepseek OCR: AI Doesn&#8217;t Just Read Texts, It &#8220;Sees&#8221; Them"},"content":{"rendered":"\n<p>Deepseek&#8217;s new OCR system processes texts as images and compresses them up to <strong>10 times<\/strong>. This technology, capable of analyzing 33 million pages in a day, allows AI to read much longer documents.<\/p>\n\n\n\n<p>Deepseek, a Chinese artificial intelligence company, is attracting attention with its new <strong>OCR (Optical Character Recognition)<\/strong> system developed for more efficient processing of text-based documents. The system compresses image-based texts, enabling AI models to process much longer documents without hitting their memory limits.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Processing Text as Visual Data<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"720\" height=\"405\" src=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-3-8.webp\" alt=\"\" class=\"wp-image-31664\" srcset=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-3-8.webp 720w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-3-8-300x169.webp 300w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-3-8-390x220.webp 390w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-3-8-150x84.webp 150w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<p>According to 
Deepseek&#8217;s technical report, the system analyzes text data in image format instead of processing it directly. This approach significantly reduces the computational load. The new OCR system can compress texts by up to <strong>10 times<\/strong> while retaining <strong>97% of the information<\/strong>.<\/p>\n\n\n\n<p>Large language models represent text as <strong>tokens<\/strong>, with each token covering only a few characters. Researchers are working to develop models that can process long documents and conversations spanning millions of tokens, thereby expanding the <strong>context window<\/strong>. However, as the number of tokens a model can process at once grows, so does the computational cost: a large context window keeps long documents from overflowing the model&#8217;s memory, but it makes processing considerably more expensive. Deepseek&#8217;s OCR solution instead processes very long content as an <strong>image<\/strong>, effectively viewing the text as pixels.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Seeing Long Texts as Pixels<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"720\" height=\"320\" src=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-4-5.webp\" alt=\"\" class=\"wp-image-31663\" srcset=\"https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-4-5.webp 720w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-4-5-300x133.webp 300w, https:\/\/metaverseplanet.net\/blog\/wp-content\/uploads\/2025\/10\/indir-4-5-150x67.webp 150w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<p>The core of the system consists of two main components: <strong>DeepEncoder<\/strong> and <strong>Deepseek3B-MoE<\/strong>. DeepEncoder, which handles the image processing, operates with 380 million parameters. 
Deepseek3B-MoE, responsible for text generation, has 570 million active parameters. DeepEncoder combines Meta&#8217;s 80-million-parameter <strong>SAM (Segment Anything Model)<\/strong> and OpenAI&#8217;s 300-million-parameter <strong>CLIP model<\/strong>. An intermediary <strong>16x compressor<\/strong> significantly reduces the image data, increasing processing speed. For example, a 1,024 x 1,024 pixel image that would normally occupy 4,096 tokens is reduced to only <strong>256 tokens<\/strong> after compression.<\/p>\n\n\n\n<p>Deepseek OCR can operate using between 64 and 400 &#8220;<strong>vision tokens<\/strong>,&#8221; depending on the resolution, a fraction of the thousands of tokens that classic OCR systems typically require. In OmniDocBench tests, the system outperformed GOT-OCR 2.0 while using only 100 vision tokens, and it surpassed MinerU 2.0, which requires over 6,000 tokens, while using fewer than 800.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The system, optimized for different document types, uses 64 tokens for simple presentations, 100 tokens for books and reports, and 800 tokens in a special mode called &#8220;<strong>Gundam mode<\/strong>&#8221; for complex newspapers.<\/li>\n\n\n\n<li>Deepseek OCR can process not only text but also complex visual elements like <strong>diagrams, chemical formulas, and geometric shapes<\/strong>. Furthermore, it works in approximately <strong>100 languages<\/strong>, can preserve formatting, and can generate plain text or general visual descriptions if desired.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Processes 33 Million Pages a Day<\/strong><\/h2>\n\n\n\n<p>Approximately 30 million PDF pages were used to train the system. 
Roughly 25 million of these pages were English and Chinese documents, and the training set was supplemented with 10 million synthetic diagrams, 5 million chemical formulas, and 1 million geometric figures.<\/p>\n\n\n\n<p>In real-world use, Deepseek OCR achieves a very high processing capacity. The system can process over 200,000 pages a day on a single Nvidia A100 GPU. With 20 servers, each housing eight A100 GPUs, this capacity increases to <strong>33 million pages per day<\/strong>. This speed has the potential to greatly facilitate the production of training data for new AI models. Both the code and model weights are publicly available (accessible via the source section).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">You Might Also Like:<\/h3>\n\n\n<ul class=\"wp-block-latest-posts__list wp-block-latest-posts\"><li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/the-dark-side-of-nanotechnology\/\">The Dark Side of Nanotechnology: Could Microscopic Swarms Erase Billions?<\/a><\/li>\n<li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/the-illusion-of-digital-immortality\/\">The Illusion of Digital Immortality: Are You Really Uploading Your Mind?<\/a><\/li>\n<li><a class=\"wp-block-latest-posts__post-title\" href=\"https:\/\/metaverseplanet.net\/blog\/artemis-2s-deep-space-eclipse\/\">The View That Changes Everything: Artemis 2\u2019s Deep Space Eclipse<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Deepseek&#8217;s new OCR system processes texts as images and compresses them up to 10 times. This technology, capable of analyzing 33 million pages in a day, allows AI to read much longer documents. 
Deepseek, a Chinese artificial intelligence company, is attracting attention with its new OCR (Optical Character Recognition) system developed for more efficient processing &hellip;<\/p>\n","protected":false},"author":1,"featured_media":31665,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"googlesitekit_rrm_CAown96uCw:productID":"","footnotes":""},"categories":[332],"tags":[335],"class_list":["post-31662","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-information","tag-ai-news"],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts\/31662","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/comments?post=31662"}],"version-history":[{"count":0,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/posts\/31662\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/media\/31665"}],"wp:attachment":[{"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/media?parent=31662"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/categories?post=31662"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metaverseplanet.net\/blog\/wp-json\/wp\/v2\/tags?post=31662"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}