Zoomix is an Israeli company that you probably haven't heard of though it has offices in both the UK and the Netherlands (the US is planned for next year). It is interesting because it is taking an innovative approach to data quality.
Traditionally, data quality solutions were (and are) purely based on statistics. That is, you apply a particular algorithm or rule and then the software compares the data based on that algorithm and tells you the likelihood of a match. An extension to this is to use natural language capabilities that provides context to this matching. The upside of this is that the software will know that the “New York Yankees”, for example, represents a single term rather than three separate words. The downside is that you have to develop that natural language capability for each individual language. What Zoomix has done is go beyond natural language capability by using a self-learning dictionary that works with any language.
Suppose you are matching product data then, since the software has classification capabilities built-in, it is able to recognise that “wit” and “weiß” and “white” are potentially all the same—the software will suggest that these form a match and once you okay this then the dictionary will know that fact thereafter and apply it automatically.
The second thing that makes Zoomix different is that its whole methodology is to move away from a project based approach. Traditionally, you had a project team that developed data quality rules to ensure that the data was fit for purpose and then you loaded the data into the live system and then, in effect, you start all over again because you have no way of guaranteeing the accuracy of further input data. Zoomix's view (and it is not alone in this) is that we need to move to environment in which data quality is assured up front and on a continuous basis.
The product, which is metadata driven, actually includes five steps, which can be used independently and in any order:
A number of the other particular features provided include support for multiple category hierarchies, the ability for items to be in multiple hierarchies (for example, a diving watch to be in both the diving and watch categories), the ability to modify standards (that is, you are not restricted to stated UNSPS categories, say, but can amend these if appropriate), and accuracy matrices that continuously check the accuracy of results. This last is important because it helps users (and the product is aimed at primarily at end users who do not want or need the involvement of IT) to understand what is happening and to have confidence in the system, which should lead to further automation of data quality.
That, primarily, is what Zoomix is about: the automation of data quality. And about time too.