Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
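https://aclanthology.org/2025.acl-long.549.pdf

1. Overview

Large language models (LLMs) (Anthropic, 2023; OpenAI, 2024) have demonstrated remarkable capabilities across a wide range of practical applications (Bubeck et al., 2023), including content creation (Yuan et al., 2022), programming assistance (Chen et al., 2021; Gao et al.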