

2025-06-27 00:58:25

DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as 1/7 of that used by competing models. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating for tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
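To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers reported above (37 billion active parameters out of 671 billion, and 70KB of KV cache per token). The context length of 32,768 tokens and the 1KB = 1,000 bytes convention are assumptions chosen for illustration, not figures from the paper.

```python
# Figures quoted in the article summary.
TOTAL_PARAMS = 671_000_000_000   # total model parameters
ACTIVE_PARAMS = 37_000_000_000   # parameters activated per forward pass

# Assumption: "70KB" is read as 70 * 1000 bytes (decimal kilobytes).
KV_BYTES_PER_TOKEN = 70 * 1000


def active_fraction(active: int = ACTIVE_PARAMS, total: int = TOTAL_PARAMS) -> float:
    """Share of parameters the MoE design activates per forward pass."""
    return active / total


def kv_cache_bytes(num_tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """Total KV cache size for a sequence of num_tokens tokens."""
    return num_tokens * bytes_per_token


if __name__ == "__main__":
    print(f"Active parameters per forward pass: {active_fraction():.1%}")

    # Hypothetical context length, chosen only for illustration.
    context_tokens = 32_768
    gigabytes = kv_cache_bytes(context_tokens) / 1e9
    print(f"KV cache for {context_tokens} tokens: {gigabytes:.2f} GB")
```

Under these assumptions, roughly 5.5% of the parameters are active per forward pass, and a 32,768-token sequence would need about 2.29 GB of KV cache.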
