DeepSeek_R1 Paper

File size: 1.05 MB
Platform: Windows
Development tool: PDF
Download cost: Free

Description

DeepSeek language model

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

