The Problem with GPT-3 Reporting

I’ve recently seen a massive number of articles about GPT-3, on Medium and elsewhere. I even wrote one. The language model is a significant development in AI, so it’s only natural that writers want to share their excitement with the world.

Here’s the problem: the ability of GPT-3 — namely the quality of its writing — is often exaggerated by published samples. In fact, there are not one, but two filters keeping the AI’s worst results from wide dissemination.

Selection bias wouldn’t be a problem if any interested reader could access the GPT-3 API and make their own observations of its ability. However, access is currently severely limited. (AI Dungeon is often used to test GPT-3 by those of us without the full version, but its creator has recently outlined how backdoor access to GPT-3 is being prevented.)

When reporting — and I use that term in its broadest possible interpretation to mean any writing about GPT-3 — is the only source of public information, selection biases ought to be considered in our understanding of the product. Here, I outline the obvious bias, and a less-obvious bias which exacerbates the issue.

1. Writing samples are selected for quality

Say I’m writing an informative piece on GPT-3. I want to demonstrate that it can put together coherent strings of sentences, so I give it a prompt and examine the output.

If I don’t like what I see, I’m likely to try again with a slightly different (perhaps longer) prompt. Even if I’m not actively selecting particular sentences that suit the purpose of my article, massaging the output creates a biased sample of writing that is not representative of GPT-3’s overall quality.
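
To make the mechanism concrete, here is a minimal simulation of that workflow. Everything in it is an assumption for illustration: each generation’s quality is modelled as a single draw from a normal distribution, and each writer publishes only the best of ten regenerations.

```python
import random
import statistics

# Illustrative model (all numbers assumed): treat each raw generation's
# quality as one draw from a normal distribution.
random.seed(0)
TRUE_MEAN, TRUE_SD = 5.0, 2.0    # assumed "true" quality of raw output
ATTEMPTS_PER_ARTICLE = 10        # writer regenerates this many times

def generate_quality() -> float:
    """One raw, unfiltered sample from the model."""
    return random.gauss(TRUE_MEAN, TRUE_SD)

# What readers would see if every generation were published:
raw = [generate_quality() for _ in range(10_000)]

# What readers actually see: each writer keeps only their best attempt.
published = [
    max(generate_quality() for _ in range(ATTEMPTS_PER_ARTICLE))
    for _ in range(1_000)
]

print(f"mean quality, all output:       {statistics.mean(raw):.2f}")
print(f"mean quality, published output: {statistics.mean(published):.2f}")
```

No individual sentence is hand-picked here, yet the published mean lands well above the true mean: regeneration alone is enough to skew the sample.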

In the context of creating a narrative about the AI, it makes sense to showcase its best work rather than a fair representation of its limitations. This is the first problem.

2. The cooler the article, the more views

Consider the case where something does get written about a function GPT-3 cannot perform. It might be a list of writing fails, or code that doesn’t compile.

To me, that wouldn’t be an interesting piece, and I suspect it wouldn’t intrigue others either. I’m sure Tweets, Reddit posts, and longer articles detailing GPT-3’s unexpected failures are out there, but the fact of the matter is they’re not getting read.

On the surface, this doesn’t seem like a problem. It definitely isn’t necessary to read about everything that GPT-3 can’t do. The real problem is when positive results are favoured over negative ones for the same task. For example, if someone reported positive results for getting GPT-3 to write a legal document, this would undoubtedly receive more attention than an instance where the AI fails to generate a coherent document.

In essence, the way GPT-3 reporting currently works is analogous to running scientific trials without pre-registration. Publication bias, where statistically insignificant results don’t get published, can cause absurd findings to be accepted as solid research.
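
The analogy can be made concrete with a toy simulation. The setup is entirely assumed: every trial has a true effect of exactly zero, and only results significant at p < 0.05 get “published.”

```python
import math
import random
import statistics

# Assumed setup: every trial's true effect is zero; only noise varies.
random.seed(1)
N_TRIALS, SAMPLE_SIZE = 1_000, 20
se = 1 / math.sqrt(SAMPLE_SIZE)     # standard error of a sample mean
threshold = 1.96 * se               # |effect| needed for p < 0.05 (two-sided)

published = []
for _ in range(N_TRIALS):
    observed = statistics.mean(random.gauss(0, 1) for _ in range(SAMPLE_SIZE))
    if abs(observed) > threshold:   # only "significant" results get written up
        published.append(observed)

print(f"trials run:       {N_TRIALS}")
print(f"trials published: {len(published)}")  # roughly 5% publish by chance
print(f"mean |published effect|: {statistics.mean(map(abs, published)):.2f}")
```

Every published result looks like a real effect even though, by construction, there was none. The same dynamic lets a handful of lucky GPT-3 outputs read as consistent ability.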

To be clear, I don’t think there is an imperative for writers to publish more negative results from GPT-3. There is, however, an obligation to contextualize samples with the way in which they were generated and how many negative results were obtained in the process.
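
One low-tech way to meet that obligation is to log every attempt while generating. The sketch below is hypothetical: `generate` and `is_acceptable` stand in for whatever GPT-3 call and editorial judgment a writer actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SampleReport:
    """A published sample plus the context a reader needs to judge it."""
    prompt: str
    chosen_output: str = ""
    attempts: int = 0
    discarded: List[str] = field(default_factory=list)  # the negative results

def collect_sample(
    generate: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 25,
) -> SampleReport:
    """Regenerate until satisfied, but keep a record of every rejection."""
    report = SampleReport(prompt=prompt)
    for _ in range(max_attempts):
        report.attempts += 1
        output = generate(prompt)
        if is_acceptable(output):
            report.chosen_output = output
            break
        report.discarded.append(output)
    return report
```

A writer quoting `chosen_output` can then report `attempts` and the number of discarded outputs alongside it, which is exactly the context argued for above.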

After all, human selection of an AI’s output — whether of individual pieces of writing or of how the larger body of work gets consumed — is a combination of our intelligence with that of a computer program, and that’s a beautiful thing.

Translated from: https://towardsdatascience.com/the-problem-with-gpt-3-reporting-93c7b5b58400
