Sentence Transformers Tutorial!

Sentence Transformers is a library focused on sentence and text embeddings, with support for more than 100 languages. Built on deep learning and in particular the Transformer architecture, it maps text to points in a high-dimensional vector space so that semantically similar texts end up geometrically close to each other. Typical use cases include:

  1. Semantic search: build efficient semantic search systems that return the most relevant results for a query (see the sketch after the quick start below).
  2. Information retrieval and re-ranking: find relevant documents in large collections and re-rank them.
  3. Clustering: automatically group texts to discover hidden topics or patterns.
  4. Summarization mining: identify and extract the main points of a text.
  5. Parallel sentence mining: find corresponding translated sentence pairs in multilingual data.

Install with pip:

pip install -U sentence-transformers

Install with conda:

conda install -c conda-forge sentence-transformers

Quick start:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
# Load all-MiniLM-L6-v2, a MiniLM model fine-tuned on a large dataset of over 1 billion training pairs

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]


embeddings = model.encode(sentences)
print(embeddings.shape)

# Compute the similarity between all pairs of sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities)

Output:

(Screenshot of the output: the printed embedding shape and the 3×3 similarity matrix.)
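
As a concrete illustration of the semantic search use case from the list above, here is a minimal sketch that reuses the all-MiniLM-L6-v2 model from the quick start together with sentence_transformers.util.semantic_search; the corpus sentences and the query are made up for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A tiny illustrative corpus; in practice this would be your document collection
corpus = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Encode the query and retrieve the top-2 most similar corpus sentences
query_embedding = model.encode("How is the weather?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]

for hit in hits:
    print(f"{hit['score']:.4f}\t{corpus[hit['corpus_id']]}")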

Cross Encoder

  • Computes a similarity score for a given pair of texts.

  • Usually slower than a Sentence Transformer model, because it must run a computation for every pair rather than for every individual text.

  • Cross Encoders are often used to re-rank the top-k results of a Sentence Transformer model (see the retrieve-and-rerank sketch after the example below).

Cross Encoder (also known as reranker) models are used much like Sentence Transformers:

from sentence_transformers.cross_encoder import CrossEncoder

# Load the CrossEncoder model we want to use
model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

# Define the query sentence and the corpus
query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
]

# Rank all corpus sentences against the query and print the scores
ranks = model.rank(query, corpus)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{corpus[rank['corpus_id']]}")
