Yihua's Blog

Archive

Stay hungry, stay foolish.

Tags: All (39) · C++ (20) · Efficient AI (5) · Reinforcement Learning (5) · DeepSeek-R1 (3) · Trustworthy AI (3) · AI Infra (1) · DeepSeek (1) · Machine Unlearning (1) · Mixture of Expert (1) · Music (1) · Visual SLAM (1)

2025

A Role Shift for AI Infra: From Foundational Support to a Core Engine of Innovation

AI Market Insights

From GRPO to DAPO and GSPO: What, Why and How [En/中]

The Evolution of GRPO: From GRPO to DAPO and GSPO

Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence” [En/中]

Re-understanding KL Estimation from an RL-for-LLM Perspective: Reading Notes on “Approximating KL Divergence”

Decorators in Machine Learning Projects [En/中]

Decorators in Machine Learning

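“Bauklötze”: A Musical Deconstruction

Echoes of fate as the building blocks collapse: Hiroyuki Sawano's elegy for the giants, built from musical notes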

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Background [En/中]

DualPipe Made Simple: A Complete Explanation of DualPipe You Can Follow Without a Distributed-Training Background

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment [En/中]

RLHF for Large Language Models, End to End: A Practical Guide from Policy Gradients, PPO, and GAE to DPO

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge [En/中]

DeepSeek-R1 Technical Dissection: PPO & GRPO Explained Without Any Reinforcement Learning Background

Why Cache 32 Heads When One Latent Variable Suffices? A Theory-to-Code Guide to DeepSeek’s MLA for KV-Cache [En/中]

From Multi-Head Sharing to Latent Variables: How DeepSeek's MLA Redefines the KV-Cache Through Low-Rank Projection and On-Demand Decompression

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning [En/中]

Long Awaited at Last: How DeepSeek-R1 Achieves Complex Reasoning Through Reinforcement Learning

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons [En/中]

A Review of the Evolution of Load-Balancing Strategies in MoE Large Models: Pitfalls and Lessons Learned

2024

Patching the Foundation Models: Pitfalls and Pains in Machine Unlearning [En/中]

Patching Large Models: Pitfalls and Pain Points in Machine Unlearning Methods

2020

Reading Notes: Retelling “Effective C++” [中/En]

Reading Notes - Talk about Effective C++ in My Own Words

Musings on C++: constexpr in C++17

C++ Study Notes

Musings on C++: The Great const Debate, Thread Safety of const

C++ Study Notes

Musings on C++: The Great const Debate, mutable

C++ Study Notes

Musings on C++: The Great const Debate, Syntactic vs. Semantic const

C++ Study Notes

Musings on C++: Using auto with Functions

C++ Study Notes

Musings on C++: The Passkey Strategy in the Factory Pattern

C++ Study Notes

Musings on C++: Dynamic vs. Static

C++ Study Notes

Visual SLAM Notes: Lie Groups and Lie Algebras

Visual SLAM Notes

Musings on C++: Using auto with Variables

C++ Study Notes

Musings on C++: Making Your Virtual Functions Safer with override and final

C++ Study Notes

Musings on C++: From Compile-Time Constants to constexpr (Part 3)

C++ Study Notes

Musings on C++: From Compile-Time Constants to constexpr (Part 2)

C++ Study Notes

Musings on C++: From Compile-Time Constants to constexpr (Part 1)

C++ Study Notes

Musings on C++: std::optional, a New Feature in C++17

C++ Study Notes

Musings on C++: You've Used It for Ages, but Do You Really Understand nullptr?

C++ Study Notes

Musings on C++: Digging into the Universal Pointer void*

C++ Study Notes

Musings on C++: Rvalue References and Move Semantics

C++ Study Notes

Musings on C++: Just How Strong Are Strongly Typed Enums?

C++ Study Notes

Algorithm Notes: Monotonic Stack

LeetCode Practice Summary, Part 5

Musings on C++: Top-Level and Low-Level const

C++ Study Notes

Practice Notes: Bitset in Detail

C++ Study Notes

Practice Notes: Catalan Numbers

LeetCode Practice Summary, Part 4

Practice Notes: Fast and Slow Pointers

LeetCode Practice Summary, Part 3

Practice Notes: Sliding Window

LeetCode Practice Summary, Part 2

Practice Notes: Uses of XOR

LeetCode Practice Summary, Part 1

2019

LeetCode Practice Log

Approaches and Pitfalls

Copyright © Yihua's Blog 2025