Technical Considerations for Complex RAG (Retrieval Augmented Generation)

If you’re looking for a non-technical introduction to RAG, including answers to various getting-started questions and a discussion of relevant use-cases, check out our breakdown of RAG here.

In this article, we discuss various technical considerations when implementing RAG, exploring the concepts of chunking, query augmentation, hierarchies, multi-hop reasoning, and knowledge graphs. We also discuss unsolved problems & opportunities in the RAG infrastructure space, and introduce some infrastructure solutions for building RAG pipelines.

The first obstacles and design choices you will face when building a RAG system concern how to prepare documents for storage and information extraction. That will be the primary focus of this article.

As a refresher, here’s an overview of a RAG system architecture.

[Figure: overview of a RAG system architecture]

Source: RAG and LLM business process automation: A technical strategy

Relevance vs Similarity

When discussing effective information retrieval in RAG, it is crucial to understand the difference between “relevance” and “similarity.” Whereas similarity is about how closely the words in two pieces of text match, relevance is about the connectedness of the underlying ideas. You can identify semantically close content using a vector database query, but identifying and retrieving relevant content requires more sophisticated tooling.
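To ground the distinction, similarity search itself is straightforward. Below is a minimal sketch of top-k retrieval by cosine similarity over precomputed embeddings (the embedding step is omitted and the function names are illustrative assumptions); everything else in this article is about recovering relevance on top of this baseline.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_similar(query_vec: np.ndarray, chunk_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    # Returns indices of the k most similar chunks, which are not necessarily the most relevant ones.
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]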

This is an important concept to keep in mind as we explore various RAG techniques below. If you haven’t yet, check out LlamaIndex’s helpful video on building production RAG apps; it is a good primer for our discussion of various RAG system development techniques.

[Figure: from LlamaIndex’s talk on building production RAG applications]

Source: https://www.youtube.com/watch?v=TRjq7t2Ms5I&ab_channel=AIEngineer

Technical Implementation of RAG

Chunking Strategy

In the context of natural language processing, “chunking” refers to the segmentation of text into small, concise, meaningful ‘chunks.’ A RAG system can more quickly and accurately locate relevant context in smaller text chunks than in large documents.

How can you ensure you’re retrieving the right chunk? The effectiveness of retrieval largely depends on the quality and structure of your chunks.

Determining the optimal chunk size is about striking a balance — capturing all essential information without sacrificing speed.

While larger chunks can capture more context, they also introduce more noise and take more time and compute to process. Smaller chunks contain less noise but may not fully capture the necessary context. Overlapping chunks is a way to balance both of these constraints: with overlap, a query is more likely to retrieve enough relevant data across multiple chunks to generate a properly contextualized response.
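As a rough illustration, here is a minimal, dependency-free sketch of a sliding-window chunker with overlap. The whitespace tokenization and default sizes are assumptions for illustration; in practice, libraries such as LlamaIndex and LangChain provide configurable text splitters that handle this more robustly.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_size` words.

    The tail of one chunk is repeated at the head of the next, so context that
    straddles a boundary still appears intact in at least one chunk.
    """
    assert 0 <= overlap < chunk_size
    words = text.split()  # crude whitespace tokenization, for illustration only
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Each chunk would then be embedded and stored in the vector database.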

One limitation is that this strategy assumes that all of the information you must retrieve can be found in a single document. If the required context is split across multiple different documents, you may want to consider leveraging solutions like document hierarchies and knowledge graphs.

Document Hierarchies

A document hierarchy is a powerful way of organizing your data to improve information retrieval. You can think of a document hierarchy as a table of contents for your RAG system. It organizes chunks in a structured manner that allows RAG systems to efficiently retrieve and process relevant, related data. Document hierarchies play a crucial role in the effectiveness of RAG by helping the LLM decide which chunks contain the most relevant data to extract.


A Document Hierarchy — Source: Retrieval Augmented Generation at Planet Scale | Arcus

Document hierarchies associate chunks with nodes, and organize nodes in parent-child relationships. Each node contains a summary of the information contained within, making it easier for the RAG system to quickly traverse the data and understand which chunks to extract.

Why would you need a document hierarchy if an LLM is supposedly able to understand the words in a document?

Think of a document hierarchy as a table of contents or a file directory. Although the LLM can extract relevant chunks of text from a vector database, you can improve the speed and reliability of retrieval by using a document hierarchy as a pre-processing step to locate the most relevant chunks of text. This strategy improves retrieval reliability, speed, and repeatability, and can help reduce hallucinations caused by chunk extraction issues. Document hierarchies may require domain- or problem-specific expertise to construct, to ensure the summaries are fully relevant to the task at hand.

Let’s take a use case in the HR space. Say a company has 10 offices, and each office has its own country-specific HR policy but uses the same template to document these policies. As a result, each office’s HR policy document has roughly the same format, but each section details country-specific policies for public holidays, healthcare, etc.

In a vector database, every “public holidays” paragraph chunk would look very similar. In this case, a vector query could retrieve a lot of the same, unhelpful data, which can lead to hallucinations. By using a document hierarchy, a RAG system can more reliably answer a question about public holidays for the Chicago office by first searching for documents that are relevant to the Chicago office.
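Here is a minimal sketch of this idea in plain Python. The node structure, the keyword-overlap scoring, and the HR example data are illustrative assumptions rather than any particular library’s API; a production system would score summaries with embeddings or an LLM.

from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str  # short description of what lives under this node
    chunks: list[str] = field(default_factory=list)  # leaf-level text chunks
    children: list["Node"] = field(default_factory=list)

def score(summary: str, query: str) -> int:
    # Toy relevance score: how many query words appear in the node's summary.
    s = summary.lower()
    return sum(word in s for word in query.lower().split())

def retrieve(node: Node, query: str) -> list[str]:
    # Walk the hierarchy top-down, descending into the best-matching child at each level.
    if not node.children:
        return node.chunks
    best = max(node.children, key=lambda child: score(child.summary, query))
    return retrieve(best, query)

# HR example: narrow to the Chicago office before looking at its holiday policy.
root = Node("All HR policy documents", children=[
    Node("Chicago office HR policy", children=[
        Node("Chicago public holidays", chunks=["Chicago observes ..."]),
        Node("Chicago healthcare", chunks=["Chicago healthcare plan ..."]),
    ]),
    Node("Singapore office HR policy", children=[
        Node("Singapore public holidays", chunks=["Singapore observes ..."]),
    ]),
])
print(retrieve(root, "public holidays for the Chicago office"))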

It should become increasingly clear that most of the work that goes into building a RAG system is making sense of unstructured data and adding contextual guardrails that allow the LLM to perform more deterministic information extraction. I think of this as akin to the instructions one needs to give an intern to prepare them to reason through a corpus of data when they start on the job. Like an intern, an LLM can understand individual words in documents and how they might be similar to the question being asked, but it is not aware of the first principles needed to piece together a contextualized answer.

Knowledge Graphs

Knowledge graphs are a great data framework for document hierarchies because they enforce consistency. A knowledge graph is a deterministic mapping of relationships between concepts and entities. Unlike a similarity search in a vector database, a knowledge graph can consistently and accurately retrieve related rules and concepts, and dramatically reduce hallucinations. The benefit of using knowledge graphs to map document hierarchies is that you can turn information retrieval workflows into instructions that the LLM can follow (e.g., to answer question X, I know I need to pull information from document A and then compare it with document B).

Knowledge graphs map relationships using natural language, which means that even non-technical users can build and modify rules and relationships to control their enterprise RAG systems. For example, a rule can look like: ‘When answering a question about leave policies, first refer to the correct office’s HR policy document, and then within that document, check the section on holidays.’
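As a rough sketch of the idea, the snippet below encodes that leave-policy rule as deterministic edges in a tiny graph (using networkx purely for illustration; the node names, edge labels, and helper function are assumptions, not a prescribed schema).

import networkx as nx

# A tiny knowledge graph: deterministic relationships between concepts and documents.
kg = nx.DiGraph()
kg.add_edge("leave policy question", "Chicago office HR policy", relation="answered_by")
kg.add_edge("Chicago office HR policy", "Holidays section", relation="contains_section")

def related(graph: nx.DiGraph, entity: str, relation: str) -> list[str]:
    # Deterministically follow edges of a given relation type from an entity.
    return [
        target
        for _, target, data in graph.out_edges(entity, data=True)
        if data.get("relation") == relation
    ]

# The natural-language rule above, executed as graph traversal:
# question -> correct office's HR policy document -> holidays section within it.
document = related(kg, "leave policy question", "answered_by")[0]
section = related(kg, document, "contains_section")[0]
print(document, "->", section)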


An example of a knowledge graph. Source: https://www.researchgate.net/figure/Control-interfaces-for-Knowledge-graph-and-Hierarchy-More-details-in-Sarrafzadeh-et-al_fig4_318764253

Query Augmentation

Query augmentation addresses badly phrased questions, a common problem in RAG that we discuss here. The goal is to make sure that questions missing specific nuance are given the appropriate context to maximize relevance.

Badly phrased questions can often be due to the complicated nature of language. For example, a single word can mean two different things based on the context in which it is used. As Agustinus (Head of AI at CarSales.AU) notes, this is largely a domain-specific issue. Consider this example: is “fried chicken” more similar to ‘chicken soup’ or ‘fried rice?’ “The answer varies based on context. If ingredients are the focus, ‘fried chicken’ aligns closest with ‘chicken soup.’ But from a preparation perspective, it’s closer to ‘fried rice.’ Such interpretations are domain-centric.”

What if you want to contextualize an LLM with company- or domain-specific terms? An easy example is company acronyms (e.g., ARP meaning Accounting Reconciliation Process). Consider a more complicated example from one of our clients, a travel agency, which needed to make a distinction between the phrases ‘near the beach’ and ‘beachfront’. To most LLMs, these terms are fairly indistinguishable. In the context of travel, however, a beachfront house and a house near the beach are very different things. Our solution was to map ‘near the beach’ properties to one specific segment of properties and ‘beachfront’ properties to another by pre-processing the query and adding company-specific context that refers to the appropriate segments.
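A minimal sketch of that kind of query pre-processing is shown below. The acronym table, phrase mappings, and segment descriptions are illustrative assumptions; in practice these mappings might live in a knowledge graph rather than hard-coded dictionaries.

# Company- and domain-specific context injected into the query before retrieval.
ACRONYMS = {"ARP": "Accounting Reconciliation Process"}
DOMAIN_PHRASES = {
    "beachfront": "beachfront (the 'beachfront' property segment, directly on the beach)",
    "near the beach": "near the beach (the 'near the beach' property segment, a short walk from the beach)",
}

def augment_query(query: str) -> str:
    # Expand acronyms and rewrite domain phrases so retrieval targets the right segment.
    for acronym, expansion in ACRONYMS.items():
        query = query.replace(acronym, f"{acronym} ({expansion})")
    for phrase, replacement in DOMAIN_PHRASES.items():
        query = query.replace(phrase, replacement)
    return query

print(augment_query("Find ARP documentation for beachfront rentals"))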

Query Planning

Query planning is the process of generating the sub-questions needed to properly contextualize and produce answers that, when combined, fully answer the original question. This process of adding relevant context is similar in principle to query augmentation.

Let’s take the example of the question ‘Which city has the highest population?’. To answer this question, the RAG system must generate answers to the following sub-questions, as shown in the image below, before ranking the cities by population:

  • ‘What is the population of Toronto?’
  • ‘What is the population of Chicago?’
  • ‘What is the population of Houston?’
  • ‘What is the population of Boston?’
  • ‘What is the population of Atlanta?’

[Figure: decomposing the top-level question into per-city sub-questions]

LlamaIndex uses this strategy, among others, to determine the relevant sub-questions it needs to answer in order to answer the top-level question; its other strategies are largely variations of the same core concept.

Here’s a fragment of the schema that LlamaIndex’s query planning agent uses to identify sub-questions:

'dependencies': {
    'title': 'Dependencies',
    'description': 'List of sub-questions that need to be answered in order to answer the question given by `query_str`. Should be blank if there are no sub-questions to be specified, in which case `tool_name` is specified.',
}
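For intuition, here is a minimal hand-rolled sketch of sub-question generation. The `call_llm` helper and the prompt wording are assumptions standing in for whatever LLM client and structured-output machinery you use; LlamaIndex does this with a schema like the one above.

import json

SUB_QUESTION_PROMPT = """You are planning how to answer a question using a document store.
Break the question into the minimal list of sub-questions whose answers, when combined,
fully answer it. Respond with a JSON list of strings.

Question: {question}
"""

def call_llm(prompt: str) -> str:
    # Placeholder for your LLM client (OpenAI, Anthropic, a local model, ...).
    raise NotImplementedError

def plan_sub_questions(question: str) -> list[str]:
    raw = call_llm(SUB_QUESTION_PROMPT.format(question=question))
    return json.loads(raw)  # e.g. ["What is the population of Toronto?", ...]

# plan_sub_questions("Which city has the highest population?")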

LLMs are known to have difficulty reasoning without assistance, so the main challenge with sub-question generation has been accuracy:

“To verify this behavior, we implemented the example using the LlamaIndex Sub-question query engine. Consistent with our observations, the system often generates the wrong sub-questions and also uses the wrong retrieval function for the sub-questions” — Pramod Chunduri on building Advanced RAG pipelines (Oct 30 ‘23)

To be explicit, this is not a reflection on LlamaIndex, but a reflection of the difficulties of relying solely on LLMs for reasoning.

We will likely require external reasoning structures and rules to enforce particular principles and preferred approaches to answering questions through generated or stored sub-questions. This gets exponentially more challenging when you consider how each industry’s, company’s, or individual’s preferences may differ from the LLM’s.

Let’s consider an external reasoning rule for the city population question above. This rule is written in natural language and then read by an LLM agent when answering a question:

  • When considering cities with the highest population, ask which continent they want to look at, and then retrieve all the cities in that continent to compare the population.

A criticism of this approach is that it introduces manual intervention into the reasoning process, and one cannot possibly anticipate every sub-question for every potential question. This is true. Given the current state of LLMs, one should intervene with external reasoning rules only at the points where LLMs fail, and not seek to recreate every possible sub-question.
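A lightweight way to apply such a rule, sketched below, is to store it in natural language and inject it into the planning prompt only when it matches the question. The rule store, trigger phrases, and prompt format are illustrative assumptions.

# Natural-language reasoning rules, applied only at known LLM failure points.
REASONING_RULES = {
    "highest population": (
        "When considering cities with the highest population, ask which continent "
        "the user wants to look at, then retrieve all the cities in that continent "
        "to compare their populations."
    ),
}

def rules_for(question: str) -> list[str]:
    # Return the rules whose trigger phrase appears in the question.
    q = question.lower()
    return [rule for trigger, rule in REASONING_RULES.items() if trigger in q]

def build_planning_prompt(question: str) -> str:
    preamble = "\n".join(f"Rule: {rule}" for rule in rules_for(question))
    return f"{preamble}\n\nQuestion: {question}\nPlan the sub-questions needed to answer it."

print(build_planning_prompt("Which city has the highest population?"))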

Combining everything into a RAG system capable of multi-hop reasoning and query modification

In our previous article, we discussed the role of multi-hop retrieval within complex RAG and the various scenarios where complex RAG may emerge in a workflow. Here are some of the issues that arise when building multi-hop retrieval:

  • Data Integration and Quality: It is crucial that interconnected data sources are of high quality, relevant, and up-to-date. Poor or biased data can lead to inaccurate multi-step conclusions.
  • Contextual Understanding and Linking: The system must not only understand each query and sub-query but also how they connect to form a coherent whole. This involves advanced natural language understanding to discern subtle links between different pieces of information.
  • User Intent Recognition: Recognizing the user’s underlying intent and how it evolves with each hop is key. The system should adapt its retrieval strategy based on the evolving nature of the query. This overlaps significantly with query augmentation.

Let’s work through an example from the medical field. In this article, Wisecube poses the following question: “What are the latest advancements in Alzheimer’s disease treatment?” A RAG system leveraging the aforementioned strategies would employ the following steps:

Query Planning:

  • “What are the current treatments and side-effects for Alzheimer’s disease?”
  • “What is the latest research on these treatments?”

Query Augmentation:

  • “What is the latest research on these treatments?” With access to a knowledge graph, the agent can consistently retrieve structured data about Alzheimer’s treatments, such as “cholinesterase inhibitors” and “memantine.”
  • The RAG system would then refine the question into “What is the latest research on cholinesterase inhibitors and memantine in Alzheimer’s disease treatment?”

Document Hierarchies and Vector Database retrieval:

  • Using a document hierarchy, identify which documents and chunks are the most relevant to “cholinesterase inhibitors” and “memantine” and return the relevant answer.
  • You may also have an LLM add these chunks to the knowledge graph so that it accumulates more contextual data over time. The LLM can then repeat the vector database retrieval with this enhanced latent knowledge base (now structured by the knowledge graph) and a newly augmented query, retrieving more relevant information until it reaches a satisfactory answer.
  • An example of a similar principle of using an LLM to ‘walk to the relevant document chunks’ was articulated by Hrishi, CTO of Greywing (YCW21).

Augmented Response:

  • As a post-processing step, you may also elect to enhance the output with a healthcare-industry-specific knowledge graph. For example, you could include a default health warning specific to memantine treatments, or any additional information associated with the two treatments or their side effects.

An advantage of using a knowledge graph over a vector database for query augmentation is that a knowledge graph can enforce consistent retrieval for certain key topics and concepts where the relationships are known. In the augmented response step, the RAG system can automatically include certain warnings or related concepts that are necessary to include whenever an answer includes a particular drug or disease or concept. This is exactly the type of exciting work we’re doing at WhyHow.AI.
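A minimal sketch of that augmented-response step is below. The warning table and drug names are illustrative assumptions (and not medical guidance); in practice these associations would be read from the knowledge graph rather than a hard-coded dictionary.

# Knowledge-graph-backed post-processing: deterministically attach required
# context whenever a known concept appears in the generated answer.
REQUIRED_CONTEXT = {
    "memantine": "Default health warning associated with memantine treatments.",
    "cholinesterase inhibitors": "Additional prescribing information for cholinesterase inhibitors.",
}

def augment_response(answer: str) -> str:
    lowered = answer.lower()
    notes = [note for concept, note in REQUIRED_CONTEXT.items() if concept in lowered]
    if notes:
        answer += "\n\nNotes:\n" + "\n".join(f"- {note}" for note in notes)
    return answer

print(augment_response("Recent trials combine memantine with cholinesterase inhibitors."))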


Unsolved problems in RAG that represent future opportunities

In the short term, there are many opportunities to improve cost efficiency and accuracy in RAG. This opens avenues for developing data retrieval processes that are both more precise and more resource-efficient.

In the long term, there is a significant opportunity to devise methods for building and storing semantic reasoning in a scalable way. This involves exploring new frontiers in knowledge representation, such as advanced encoding techniques for complex data relationships and innovative storage solutions. These developments will enable RAG systems to manage and utilize growing data complexity effectively.

Challenges in RAG Development

[Figures: summary of key challenges in RAG development]

In our next article, we will review specific implementation techniques of knowledge graphs for complex RAG and multi-hop processes. If you have any questions at all about RAG, feel free to message us at [email protected].

Articles to read

  • Towards Data Science: https://towardsdatascience.com/10-ways-to-improve-the-performance-of-retrieval-augmented-generation-systems-5fa2cee7cd5c
  • LlamaIndex: High-Level Concepts - LlamaIndex 0.9.19
  • Better Programming RAG Evaluation: https://betterprogramming.pub/exploring-end-to-end-evaluation-of-rag-pipelines-e4c03221429
  • Neo4j RAG and Knowledge Graphs: Knowledge Graphs & LLMs: Fine-Tuning vs. Retrieval-Augmented Generation - Graph Database & Analytics
  • Pinecone’s RAG with Guardrails: Making Retrieval Augmented Generation Fast | Pinecone
