<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM Evaluation on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0/</link><description>Recent content in LLM Evaluation on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Thu, 05 Feb 2026 10:20:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0/index.xml" rel="self" type="application/rss+xml"/><item><title>RAG Evaluation Framework: RAGAS Metrics and Hallucination Detection in Practice</title><link>https://socake.github.io/posts/rag-evaluation-ragas/</link><pubDate>Thu, 05 Feb 2026 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/rag-evaluation-ragas/</guid><description>Once a RAG system is in production, "the answers feel pretty good" is not a sustainable way to evaluate it. RAGAS provides a quantifiable evaluation framework that lets you track metrics such as Faithfulness and Answer Relevancy over time and automatically verify after every change that system quality has not regressed.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/rag-evaluation-ragas/featured.jpg"/></item></channel></rss>