RL on Blowfish

Recent content in RL on Blowfish
Feed: https://huggingaha.github.io/tags/rl/
Generator: Hugo -- gohugo.io
Language: zh-cn
Author: 时影 (huggingaha@gmail.com)
© 2026 时影
Last updated: Tue, 23 Sep 2025 00:00:00 +0000

Translation - Small Leak Can Sink a Great Ship: Boost RL Training on MoE with 𝑰𝒄𝒆𝑷𝒐𝒑!
https://huggingaha.github.io/blogs/llm/small-leak-can-sink-a-great-ship-boost-rl-training-on-moe-with-icepop/
Tue, 23 Sep 2025 00:00:00 +0000

Defeating Nondeterminism in LLM Inference - Thinking Machines
https://huggingaha.github.io/blogs/llm/effective-context-engineering-for-ai-agents-claude/
Sun, 14 Sep 2025 00:00:00 +0000

GSPO: Group Sequence Policy Optimization
https://huggingaha.github.io/blogs/llm/gspo-rl-llm/
Mon, 28 Jul 2025 00:00:00 +0000