Semantic Drift

Reference

Komachi M, Kudo T, Shimbo M, et al. Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), 25-27 October 2008, Honolulu, Hawaii, USA. A meeting of SIGDAT, a Special Interest Group of the ACL. 2008: 1011-1020.

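I kept running into the term “semantic drift” in papers, but they only mentioned the concept without explaining it in any detail. Today I finally found an explanation in this paper, so I am noting it down here.
Original text: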

However, it is known that bootstrapping often acquires instances not related to seed instances. For example, consider the task of collecting the names of common tourist sites from web corpora. Given words like “Geneva” and “Bali” as seed instances, bootstrapping would eventually learn generic patterns such as “pictures” and “photos,” which also co-occur with many other unrelated instances. The subsequent iterations would likely acquire frequent words that co-occur with these generic patterns, such as “Britney Spears.” This phenomenon is called semantic drift (Curran et al., 2007).

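In other words:

Bootstrapping is known to pick up instances that have nothing to do with the seeds. Take the task of collecting the names of common tourist sites from a web corpus. Starting from the seed instances “Geneva” and “Bali”, bootstrapping eventually learns generic patterns such as “pictures” and “photos”, which also co-occur with many unrelated instances. Later iterations are then likely to acquire frequent words that co-occur with these generic patterns, such as “Britney Spears”. This phenomenon is called semantic drift.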

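My own understanding is this: during the iterations, some instances unrelated to the seeds get extracted; these unrelated instances are then fed back into the next iteration, where they keep producing still more unrelated instances.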
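To make the feedback loop concrete, here is a minimal sketch of a naive bootstrapping loop on a toy corpus. Everything in it is an assumption made for illustration: the co-occurrence pairs in `CORPUS`, the pattern strings, and the `bootstrap` function are made up, and the loop accepts every co-occurring pattern and instance without any scoring. It is not the Espresso algorithm analysed in the paper, only the simplest loop that shows the drift.

```python
from collections import defaultdict

# Toy (instance, pattern) co-occurrence pairs. Entirely made up for illustration.
CORPUS = [
    ("Geneva", "hotels in X"),
    ("Geneva", "pictures of X"),
    ("Bali", "hotels in X"),
    ("Bali", "photos of X"),
    ("Kyoto", "hotels in X"),
    ("Kyoto", "pictures of X"),
    ("Britney Spears", "pictures of X"),
    ("Britney Spears", "photos of X"),
    ("Britney Spears", "songs by X"),
    ("Madonna", "songs by X"),
]


def bootstrap(seeds, corpus, iterations):
    """Naive bootstrapping without scoring (hypothetical helper).

    Returns a list of instance sets: the seeds, then the set after each iteration.
    """
    by_instance = defaultdict(set)  # instance -> patterns it occurs with
    by_pattern = defaultdict(set)   # pattern  -> instances it occurs with
    for instance, pattern in corpus:
        by_instance[instance].add(pattern)
        by_pattern[pattern].add(instance)

    instances = set(seeds)
    history = [set(instances)]
    for _ in range(iterations):
        # Step 1: collect every pattern that co-occurs with a known instance.
        patterns = set()
        for inst in instances:
            patterns |= by_instance[inst]
        # Step 2: collect every instance that co-occurs with those patterns.
        new_instances = set(instances)
        for pat in patterns:
            new_instances |= by_pattern[pat]
        instances = new_instances
        history.append(set(instances))
    return history


for step, found in enumerate(bootstrap({"Geneva", "Bali"}, CORPUS, 2)):
    print(step, sorted(found))
```

On this toy data, the generic patterns “pictures of X” and “photos of X” pull in “Britney Spears” at the first iteration alongside the legitimate “Kyoto”. At the second iteration, “Britney Spears” contributes the pattern “songs by X”, which in turn pulls in “Madonna”. Practical systems rank patterns and instances by some reliability score instead of accepting everything, but generic patterns can still score well because they match so many instances.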
