<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>多模态学习 on Awen's Paper Library</title><link>https://a23wen.github.io/paper-libarary/categories/%E5%A4%9A%E6%A8%A1%E6%80%81%E5%AD%A6%E4%B9%A0/</link><description>Recent content in 多模态学习 on Awen's Paper Library</description><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Mon, 13 Apr 2026 00:24:23 +0800</lastBuildDate><atom:link href="https://a23wen.github.io/paper-libarary/categories/%E5%A4%9A%E6%A8%A1%E6%80%81%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml"/><item><title>Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models</title><link>https://a23wen.github.io/paper-libarary/papers/vision-r1/</link><pubDate>Mon, 13 Apr 2026 00:21:57 +0800</pubDate><guid>https://a23wen.github.io/paper-libarary/papers/vision-r1/</guid><description>Vision-R1 studies how to transfer DeepSeek-R1-style reinforcement learning to multimodal large language models. The authors first construct a 200K multimodal CoT cold-start dataset via modality bridging, then apply PTST together with GRPO and a hard-format outcome reward to progressively extend the allowed reasoning length, ultimately bringing a 7B model to 73.5% on MathVista, approaching OpenAI o1-level multimodal mathematical reasoning.</description></item></channel></rss>