As large language models (LLMs) are deployed more widely, evaluating and improving the performance of LLM applications has become a key problem. UpTrain is an open-source platform that provides a suite of evaluations, letting developers check their LLM applications thoroughly and get guidance on how to fix the issues found. In this article, we show how to use UpTrain's callback handler to run a range of evaluations on LangChain chains during development, and walk through the implementation step by step.
UpTrain provides 20+ pre-configured checks (covering language, code, embedding and other use cases), performs root-cause analysis on failing cases, and gives guidance on how to resolve them. It integrates seamlessly with LangChain retrievers, automatically evaluating the chain and displaying the results in the output. Several LangChain retrieval setups are used to demonstrate the integration:
Vanilla RAG: retrieve the chunks most relevant to the question and generate the answer from that context
Multi Query Generation: have the LLM generate several rephrasings of the question to broaden retrieval
Context Compression and Reranking: rerank the retrieved documents and keep only the most relevant context
%pip install -qU langchain langchain_openai langchain-community uptrain faiss-cpu flashrank
from getpass import getpass
from langchain.chains import RetrievalQA
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.callbacks.uptrain_callback import UpTrainCallbackHandler
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers.string import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables.passthrough import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
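Note that OpenAIEmbeddings and ChatOpenAI below read the OpenAI key from the OPENAI_API_KEY environment variable. If it is not already set, a minimal way to provide it interactively (an extra step, not shown in the original) is:

import os

# Assumption: OPENAI_API_KEY is not already set in the environment;
# prompt for it so the embedding and chat models below can authenticate
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")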
# Load the example document and split it into 1000-character chunks
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)

# Embed the chunks, index them in FAISS, and expose the index as a retriever
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()

# LLM used for answering and, later, for generating query variants
llm = ChatOpenAI(temperature=0, model="gpt-4")
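Before wiring up the full chain, you can optionally sanity-check the retriever on its own; this extra step is not in the original flow and only confirms that relevant chunks come back:

# Optional sanity check: fetch the top chunks for a sample question
sample_docs = retriever.invoke("What did the president say about Ketanji Brown Jackson")
print(len(sample_docs), sample_docs[0].page_content[:200])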
Choose the API key type and set the key ("openai" runs the evaluations with your own OpenAI key; "uptrain" uses an UpTrain API key instead):
KEY_TYPE = "openai"  # or "uptrain"
API_KEY = getpass()
template = """
Answer the question based only on the following context, which can include text and tables:
{context}
Question: {question}
"""
rag_prompt_text = ChatPromptTemplate.from_template(template)
# Vanilla RAG chain: retrieve context, fill the prompt, call the LLM, parse the output to a string
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| rag_prompt_text
| llm
| StrOutputParser()
)
# The UpTrain callback handler runs the evaluations automatically whenever the chain is invoked
uptrain_callback = UpTrainCallbackHandler(key_type=KEY_TYPE, api_key=API_KEY)
config = {"callbacks": [uptrain_callback]}

# Vanilla RAG: invoke the chain with the callback attached; the results appear in the output
query = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(query, config=config)
# Multi Query Generation: wrap the base retriever so the LLM generates several rephrasings of the question
multi_query_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)
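If you want to inspect the alternative questions the MultiQueryRetriever generates, LangChain's standard logging for this retriever can be enabled (an optional step, not part of the original snippet):

import logging

# Optional: log the query variants produced by MultiQueryRetriever
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)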
chain = (
{"context": multi_query_retriever, "question": RunnablePassthrough()}
| rag_prompt_text
| llm
| StrOutputParser()
)
question = "What did the president say about Ketanji Brown Jackson"
docs = chain.invoke(question, config=config)
# Context Compression and Reranking: rerank retrieved documents with FlashRank and keep only the most relevant ones
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
result = chain.invoke(query, config=config)
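RetrievalQA returns a dict rather than a plain string; with the default output key, the answer can be read like this (a small usage note, not in the original):

# RetrievalQA's default output key is "result"
print(result["result"])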
By integrating UpTrain, you can easily evaluate the performance of an LLM application and monitor it in real time in production. This is especially important for applications that require high reliability and accuracy, such as those in the legal, medical, and financial domains.
If you run into any problems, feel free to discuss them in the comments.
—END—