<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RL on Blowfish</title><link>https://huggingaha.github.io/tags/rl/</link><description>Recent content in RL on Blowfish</description><generator>Hugo -- gohugo.io</generator><language>zh-cn</language><managingEditor>huggingaha@gmail.com (时影)</managingEditor><webMaster>huggingaha@gmail.com (时影)</webMaster><copyright>© 2026 时影</copyright><lastBuildDate>Tue, 23 Sep 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://huggingaha.github.io/tags/rl/index.xml" rel="self" type="application/rss+xml"/><item><title>Translation: Small Leak Can Sink a Great Ship—Boost RL Training on MoE with 𝑰𝒄𝒆𝑷𝒐𝒑!</title><link>https://huggingaha.github.io/blogs/llm/small-leak-can-sink-a-great-ship-boost-rl-training-on-moe-with-icepop/</link><pubDate>Tue, 23 Sep 2025 00:00:00 +0000</pubDate><author>huggingaha@gmail.com (时影)</author><guid>https://huggingaha.github.io/blogs/llm/small-leak-can-sink-a-great-ship-boost-rl-training-on-moe-with-icepop/</guid><description/></item><item><title>Defeating Nondeterminism in LLM Inference (Thinking Machines)</title><link>https://huggingaha.github.io/blogs/llm/effective-context-engineering-for-ai-agents-claude/</link><pubDate>Sun, 14 Sep 2025 00:00:00 +0000</pubDate><author>huggingaha@gmail.com (时影)</author><guid>https://huggingaha.github.io/blogs/llm/effective-context-engineering-for-ai-agents-claude/</guid><description/></item><item><title>GSPO: Group Sequence Policy Optimization</title><link>https://huggingaha.github.io/blogs/llm/gspo-rl-llm/</link><pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate><author>huggingaha@gmail.com (时影)</author><guid>https://huggingaha.github.io/blogs/llm/gspo-rl-llm/</guid><description/></item></channel></rss>