Tesseract preprocessing
by Berk Kaan Kuguoglu
Previously, in “How to get started with Tesseract”, I gave you a practical quick-start tutorial on Tesseract using Python. It is a pretty simple overview, but it should help you get started with Tesseract and clear some of the hurdles that I faced when I was in your shoes. Now, I’m keen on showing you a few more tricks you can use with Tesseract and OpenCV to improve your overall accuracy.
In the previous story, I didn’t bother going into details for the most part. But if you liked the first story, here comes the sequel! So where did we leave off?
Ah, we had a brief overview of rescaling, noise removal, and binarization. Now, it’s time to get down to details and show you a few settings you can play with.
The images that are rescaled are either shrunk or enlarged. If you’re interested in shrinking your image, INTER_AREA is the way to go for you. (Btw, the parameters fx and fy denote the scaling factors along the horizontal and vertical axes in the function below.)
img = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
On the other hand, in most cases you will need to scale your image to a larger size to recognize small characters. In this case, INTER_CUBIC generally performs better than the alternatives, though it’s also slower.
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
If you’d like to trade off some of your image quality for faster performance, you may want to try INTER_LINEAR for enlarging images.
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
It’s worth mentioning that there are a few blur filters available in the