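Development of a classification algorithm for efficient handling of multiple classes in sorting systems based on hyperspectral imaging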
When dealing with practical applications of hyperspectral imaging, the development of efficient, fast and flexible classification algorithms is of the utmost importance.
Indeed, the optimal classification method should be able, in a reasonable time, to maximise the separation between the classes of interest and, at the same time, to correctly reject possible outlier samples.
To this aim, a new extension of Partial Least Squares Discriminant Analysis (PLS-DA), namely Soft PLS-DA, has been implemented.
The basic engine of Soft PLS-DA is the same as that of PLS-DA, but class assignment is subject to additional criteria that allow samples not belonging to the target classes to be identified and rejected.
The proposed approach was tested on a real case study of plastic waste sorting based on near infrared hyperspectral imaging.
Household plastic waste objects made of the six recyclable plastic polymers commonly used for packaging were collected and imaged using a hyperspectral camera mounted on an industrial sorting system.
In addition, paper and non-recyclable plastics were also considered as potential foreign materials commonly found in plastic waste.
For classification purposes, the Soft PLS-DA algorithm was integrated into a hierarchical classification tree for the discrimination of the different plastic polymers.
Furthermore, Soft PLS-DA was also coupled with sparse-based variable selection to identify the relevant variables involved in the classification and to speed up the sorting process.
The tree-structured classification model was successfully validated both on a test set of representative spectra of each material, for a quantitative evaluation, and at the pixel level on a set of hyperspectral images, for a qualitative assessment.
Keywords: PLS-DA, multivariate classification, hierarchical classification, sparse methods, feature selection, plastic sorting
Over the past decades, Hyperspectral Imaging (HSI) has gained increasing attention from industries interested in implementing automated sorting systems to solve a number of different problems.
Indeed, HSI has found a wide range of applications in the food industry, including the quality evaluation and safety assessment of several food products, such as fruits and vegetables, meat, cereals and dairy products.
Moreover, other manufacturing environments, such as the pharmaceutical industry, have employed real-time HSI systems for quality control and process monitoring within the framework of process analytical technology (PAT).
Another relevant field of application of HSI is the recycling industry, where hyperspectral sensors are used to separate end-of-life objects, such as plastic, paper or electronic waste, according to material type.
In these contexts, HSI can be considered a step forward with respect to traditional spectroscopic techniques, which allow fast and non-destructive characterisation of the chemical properties of the analysed samples.
In fact, HSI systems couple these advantages with the possibility of also visualising the spatial distribution of the chemical features of interest across the sample surface.
Furthermore, in sorting systems, HSI can also be employed to quickly identify the chemical composition of homogeneous objects moving on a conveyor belt, and to distinguish them from objects with a different composition.
In practical situations, hyperspectral imaging can be applied to address complex classification issues, where the sorting problem under investigation requires several classes to be discriminated at the same time, some of which share similar features.
This can be easily managed with HSI systems, since a single measurement, i.e. the acquisition of a single hyperspectral image, provides a large amount of information.
However, in order to meet the needs of real-time applications, it is necessary to identify classification strategies able to handle huge amounts of spectral data while providing reliable results in short computational times.
When dealing with multiple classes, this issue can be addressed using a tree-structured classification model, where each branching point (tree node) corresponds to a local classification model.
In this manner, classification follows a top-down approach: samples are initially assigned to general macro-categories, and then each macro-class is split into increasingly specific categories until the classes of interest are reached.
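The top-down logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Node` class, the `classify_top_down` function and the two toy decision rules are hypothetical and do not reproduce the five-node tree or the Soft PLS-DA models of the study; each node model is simply assumed to return either the label of a child node, a final class, or "NA" when it rejects the spectrum.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

NA = "NA"  # label returned when a node model rejects a spectrum


@dataclass
class Node:
    # Local classification model: maps a spectrum to a label (or NA if rejected).
    classify: Callable[[Sequence[float]], str]
    # Sub-trees keyed by the label that leads into them; labels without a sub-tree are final classes.
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_top_down(node: Node, spectrum: Sequence[float]) -> str:
    """Descend from the root until a final class is reached or a node rejects the spectrum."""
    label = node.classify(spectrum)
    while label != NA and label in node.children:
        node = node.children[label]
        label = node.classify(spectrum)
    return label


# Hypothetical toy rules (not the models used in the study).
def root_rule(s):
    return "background" if sum(s) / len(s) < 0.1 else "material"


def material_rule(s):
    if s[0] > 0.6:
        return "A"
    if s[0] > 0.3:
        return "B"
    return NA


tree = Node(root_rule, {"material": Node(material_rule)})
print(classify_top_down(tree, [0.7, 0.5]))  # A
```

A rejection at any node stops the descent, so a spectrum that does not fit a macro-class is not forced into one of its more specific sub-classes.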
Another relevant issue to be faced in practical applications of HSI in sorting systems is that it is generally not easy to control the input stream strictly enough to avoid the presence of foreign objects, i.e. objects not belonging to the target classes of the specific application.
In this context, algorithms able to maximise the discrimination between the categories of interest and, at the same time, to identify possible foreign materials are of the utmost importance.
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most widely used methods for the multivariate classification of hyperspectral data.
Basically, PLS-DA is an extension of the PLS algorithm, which aims at identifying a new set of variables, called Latent Variables (LVs), by maximising the between-class variance.
Class membership is coded using a dummy Y matrix, and the assignment of unknown samples is based on the a posteriori probability associated with the corresponding predicted Y values.
The standard PLS-DA approach assigns a sample to the class for which it has the highest a posteriori probability, so unknown samples are always assigned to one of the target classes.
Conversely, the possibility of leaving samples unassigned is one of the major advantages of the so-called class-modelling techniques, which essentially describe each class independently of the others and then verify whether an unknown sample is compliant with the characteristics of each class of interest.
In this manner, a new unknown sample may be rejected by all the class models, resulting in an unassigned sample.
Soft Independent Modelling of Class Analogy (SIMCA) is the most common class-modelling method.
It calculates a local Principal Component Analysis (PCA) model for each considered class, which is used to define class boundaries based on the distances both in the score space (Hotelling's T²) and in the residual space (Q residuals).
Notwithstanding the advantages of class-modelling methods like SIMCA, they can provide poor classification results when the modelled classes overlap considerably, since the models are not oriented towards discriminating the considered categories.
Given these considerations, it is reasonable to assume that a classification algorithm to be efficiently employed in sorting systems should combine the advantages of discriminant classification techniques and of class-modelling methods, i.e. it should be able to maximise the discrimination between the categories of interest and, at the same time, to recognise and reject outlier samples.
To this aim, a modified version of the PLS-DA algorithm, namely Soft PLS-DA, is proposed in the present paper.
The basic principle of Soft PLS-DA is the same as that of PLS-DA, but class assignment is performed by fixing additional limits both on the predicted Y values and on the Q residuals.
In this manner, the classification model is built by maximising the differences between the modelled classes; at the same time, the additional limits allow samples belonging to unexpected categories to be rejected and relegated to a general category of unassigned samples.
The effectiveness of the Soft PLS-DA algorithm was tested on a case study related to the implementation of a near infrared (NIR) hyperspectral imaging system for plastic waste sorting.
Indeed, each plastic polymer has a specific spectral fingerprint in the NIR range, and optical sorting is commonly used to separate them.
The goal of our study was to implement a classification method able to effectively discriminate paper and six recyclable plastic polymers commonly used for packaging, and to correctly reject objects belonging to non-target classes, such as non-recyclable plastics.
The present manuscript is structured as follows.
The next section reports the theoretical background of the standard classification approach based on PLS-DA and a detailed description of the novel Soft PLS-DA algorithm.
Material and methods describes the plastic dataset, the procedure followed for image acquisition and elaboration, and the different steps of data analysis.
Results shows the classification results obtained with the Soft PLS-DA algorithm, considering both the full wavelength range and sparse-based variable selection, as well as the results of the final implementation of the classification tree.
Finally, we report the general conclusions of this study.
PLS-DA is an extension of PLS regression adapted to operate in a classification framework.
Similarly to PLS, the dependent matrix Y is regressed on the independent matrix X by calculating a new set of LVs that maximise the covariance between the X and Y matrices.
In more detail, both X and Y matrices can be decomposed as follows:
$$X = TP^{\mathrm{T}} + E \qquad (1)$$
where T, P and E represent the score matrix, the loading matrix and the residual matrix of X, respectively;
$$Y = UC^{\mathrm{T}} + G \qquad (2)$$
where U is the score matrix, C is the loading matrix and G is the residual matrix of Y.
According to PLS, the object variation of the X-block expressed by the score matrix T can be used to describe Y; therefore, Equation 2 can be rewritten as follows:
$$Y = TC^{\mathrm{T}} + G$$
In particular, the decomposition of X has to be optimised so that T accounts for the variation in X that allows the best description of Y.
To this aim, for each LV a weight vector (w) is calculated, which weights the original variables according to their contribution in explaining the Y matrix.
Given these considerations, the estimate of Y (Ŷ) can be calculated as follows:
$$\hat{Y} = XW\left(P^{\mathrm{T}}W\right)^{-1}C^{\mathrm{T}} = XB$$
where W is the weights matrix and B is the matrix of regression coefficients.
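The relation between weights, loadings and regression coefficients, B = W(PᵀW)⁻¹Cᵀ, can be checked numerically. The sketch below is a minimal NIPALS PLS2 implementation in Python/NumPy, assuming column-centred X and Y; the function name `pls_nipals`, its convergence settings and the random toy data are illustrative assumptions, not the PLS_Toolbox routine used by the authors. It computes W, P, C and T, forms B, and lets the checks confirm that XB reproduces TCᵀ on the calibration data.

```python
import numpy as np


def pls_nipals(X, Y, n_lv, tol=1e-12, max_iter=500):
    """NIPALS PLS2 for column-centred X (n x p) and Y (n x m).

    Returns weights W, X-loadings P, Y-loadings C, X-scores T and
    regression coefficients B = W (P^T W)^-1 C^T.
    """
    X, Y = X.astype(float), Y.astype(float)      # copies, deflated below
    n, p = X.shape
    m = Y.shape[1]
    W, P = np.zeros((p, n_lv)), np.zeros((p, n_lv))
    C, T = np.zeros((m, n_lv)), np.zeros((n, n_lv))
    for a in range(n_lv):
        u = Y[:, [np.argmax(Y.var(axis=0))]]     # start from the most variable Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)               # unit-length weight vector
            t = X @ w                            # X-scores
            c = Y.T @ t / (t.T @ t)              # Y-loadings
            u_new = Y @ c / (c.T @ c)            # Y-scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p_a = X.T @ t / (t.T @ t)                # X-loadings
        X -= t @ p_a.T                           # deflate X
        Y -= t @ c.T                             # deflate Y
        W[:, a], P[:, a], C[:, a], T[:, a] = w[:, 0], p_a[:, 0], c[:, 0], t[:, 0]
    B = W @ np.linalg.inv(P.T @ W) @ C.T
    return W, P, C, T, B


# Toy data: 30 "spectra" with 6 variables and 3 classes coded as a dummy Y.
rng = np.random.default_rng(0)
labels = np.arange(30) % 3
X = rng.normal(size=(30, 6)) + labels[:, None]   # class-dependent shift
Y = np.eye(3)[labels]
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
W, P, C, T, B = pls_nipals(Xc, Yc, n_lv=3)
Y_hat = Xc @ B + Y.mean(axis=0)                  # add back the Y column means
```

Because the dummy columns of each row sum to 1, the predicted ŷ values of each sample also sum to 1, which is why they are rarely exactly 0 or 1 for any single class.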
In the case of PLS-DA, the Y matrix is a dummy matrix with as many rows as samples and as many columns as considered classes.
This dummy matrix expresses the class membership of each sample with binary coding: a value equal to 1 indicates that the object belongs to the class, while a value equal to 0 indicates that it does not.
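The binary coding can be illustrated with a short Python sketch. The function name `dummy_y` and the alphabetical ordering of the class columns are assumptions made for illustration.

```python
def dummy_y(labels, classes=None):
    """Binary (0/1) class-membership matrix: one row per sample, one column per class."""
    if classes is None:
        classes = sorted(set(labels))          # column order: alphabetical (assumption)
    column = {c: j for j, c in enumerate(classes)}
    Y = [[0] * len(classes) for _ in labels]
    for i, label in enumerate(labels):
        Y[i][column[label]] = 1                # 1 = belongs to the class, 0 = does not
    return Y, classes


Y, classes = dummy_y(["PET", "PS", "PET", "PVC"])
print(classes)  # ['PET', 'PS', 'PVC']
print(Y)        # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```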
Once the PLS model has been calibrated, class assignment of unknown samples is based on the y values estimated for each class (ŷ).
These values will not be exactly equal to 0 or 1; therefore, a threshold value must be established, so that a new sample is assigned to a given class only if its ŷ value is greater than the threshold for that class.
The threshold is usually calculated with Bayes' theorem, under the assumption that the estimated values for each class follow a Gaussian distribution; these distributions are used to calculate the a posteriori probability that a sample belongs to a given class.
In particular, for each class the threshold corresponds to the ŷ value at which the number of false positives and false negatives is minimised, that is, the point where the two probability distributions cross (i.e. the point where the a posteriori probabilities of the two classes are equal).
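A minimal sketch of this threshold in Python, assuming equal prior probabilities (so that equal probability densities correspond to equal posterior probabilities) and finding the crossing point by bisection between the two class means. The function names are hypothetical, and this is not the PLS_Toolbox implementation.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_threshold(y_in, y_out, tol=1e-10):
    """y-hat value between the two class means where the fitted Gaussian PDFs cross.

    y_in:  predicted values of samples belonging to the class (expected near 1)
    y_out: predicted values of samples not belonging to it (expected near 0)
    """
    mu_in, s_in = mean(y_in), stdev(y_in)
    mu_out, s_out = mean(y_out), stdev(y_out)

    def diff(y):  # > 0 where membership of the class is the more probable outcome
        return normal_pdf(y, mu_in, s_in) - normal_pdf(y, mu_out, s_out)

    lo, hi = mu_out, mu_in
    if not (lo < hi and diff(lo) < 0 < diff(hi)):
        raise ValueError("no crossing between the class means")
    while hi - lo > tol:               # bisection on the sign change
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(bayes_threshold([0.9, 1.0, 1.1], [-0.1, 0.0, 0.1]), 6))  # 0.5
```

With equal spreads the crossing falls halfway between the means; when the in-class values are more scattered, the threshold moves towards the out-of-class mean.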
Class assignment of an unknown sample can usually be done in one of two ways: either by choosing the class with the highest probability, or by comparing the predicted ŷ values with the corresponding threshold values.
The former strategy represents the standard discriminant approach, in which samples are always assigned to one of the modelled classes.
Conversely, if the latter approach is used, a sample can have ŷ values higher than the corresponding thresholds for more than one class, or lower than the thresholds for all the classes.
In both these cases, the sample cannot be assigned to any class.
However, when there are only two classes, the two approaches converge to the same results, and the standard PLS-DA algorithm will always attribute a sample to one of the classes.
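The difference between the two rules can be made concrete with a short Python sketch. The function names and the uniform thresholds of 0.5 are hypothetical placeholders; in practice, the thresholds would come from the Bayes procedure described above.

```python
def assign_max_probability(posteriors, classes):
    """Standard discriminant rule: always returns the class with the highest posterior probability."""
    best = max(range(len(classes)), key=lambda k: posteriors[k])
    return classes[best]


def assign_by_thresholds(y_hat, thresholds, classes):
    """Threshold rule: a class is returned only if exactly one y-hat exceeds its threshold."""
    above = [c for c, y, t in zip(classes, y_hat, thresholds) if y > t]
    return above[0] if len(above) == 1 else None   # None = not assigned


classes = ["PET", "PS", "PVC"]
thresholds = [0.5, 0.5, 0.5]                       # placeholder values
print(assign_max_probability([0.2, 0.7, 0.1], classes))            # PS
print(assign_by_thresholds([0.8, 0.1, 0.1], thresholds, classes))  # PET
print(assign_by_thresholds([0.6, 0.7, 0.0], thresholds, classes))  # None (two classes)
print(assign_by_thresholds([0.2, 0.3, 0.1], thresholds, classes))  # None (no class)
```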
The standard discriminant approach has often been criticised for its limited ability to correctly handle new objects not belonging to the target classes.
In order to overcome this issue, extensions of PLS-DA that incorporate a rejection option in the classification rule have been proposed in the literature.
These methods usually consider PLS-DA as a data compression method rather than as a classification strategy, and perform class assignment with a further class-modelling step based on distance metrics calculated on the PLS scores16 or on the PCA scores obtained from the decomposition of the Ŷ matrix.
In this manner, class assignment is performed through a rather complex multi-step procedure.
An alternative approach consists in calculating a classification rule based on confidence intervals around the ŷ values of each class, and rejecting samples that fall outside these intervals.
Furthermore, it has to be considered that diagnostics based on Q residuals can be an effective tool for identifying outlier samples, i.e. samples with properties different from those of the samples used for model calibration.
These diagnostics are widely used in class modelling, for example in the SIMCA algorithm.
However, Q residuals are rarely incorporated in classification rules based on the PLS-DA algorithm, even though computing them from the outcomes of the PLS model is straightforward according to the following equation:
$$Q_i = \mathbf{e}_i\,\mathbf{e}_i^{\mathrm{T}}$$
where $\mathbf{e}_i$ is the $i$-th row of the residual matrix $E = X - TP^{\mathrm{T}}$ of the X-block.
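As a minimal NumPy sketch, assuming a calibrated PLS model described by its weights W and X-loadings P and centred input data, the scores of any sample follow from T = XW(PᵀW)⁻¹ and its Q residual is the sum of squares of its row of E = X − TPᵀ. The function name `q_residuals` is hypothetical; the 99.9% confidence limit used later by Soft PLS-DA is not computed here.

```python
import numpy as np


def q_residuals(X, W, P):
    """Q residuals of the rows of a (centred) X for a PLS model with weights W and X-loadings P."""
    R = W @ np.linalg.inv(P.T @ W)       # projection such that T = X R
    T = X @ R                            # scores of the samples
    E = X - T @ P.T                      # X-residuals, E = X - T P^T
    return np.einsum("ij,ij->i", E, E)   # Q_i = e_i e_i^T for every row


# Toy check: with W = P spanning the first two axes, only the third variable is left in E.
W = P = np.eye(3)[:, :2]
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, -2.0]])
print(q_residuals(X, W, P))  # [9. 4.]
```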
In the present study, a novel algorithm based on PLS-DA has been developed in order to combine the advantages of classical discriminant analysis with those of class-modelling techniques; for this reason, it has been named Soft PLS-DA. The main idea behind Soft PLS-DA is to have a flexible and simple classification tool able to maximise the separation between the considered classes and, at the same time, to effectively identify possible outlier objects (i.e. objects that do not belong to the classes included in the classification model), which are automatically left unassigned.
As in PLS-DA, a PLS model is calculated between the X matrix and the dummy Y matrix in order to maximise the differences between the considered classes. Then, class assignment of unknown samples is based on some additional criteria that prevent outlier samples from being assigned to the target classes.
According to the Soft PLS-DA decision rules, the assignment of a new sample to a given class is subject to the following criteria:
(i) having Q residuals falling inside the 99.9% confidence limit of the model.31 The 99.9% confidence limit has been chosen in order to set boundaries large enough to account as much as possible for the variability of the different classes while still being able to exclude samples with a very low fit to the model;
(ii) having ŷ values falling inside an acceptability range for the considered class. More in detail, in addition to the threshold value calculated by standard PLS-DA for each class g ($y_{\mathrm{tsh1},g}$), an upper limit on the ŷ values ($y_{\mathrm{tsh2},g}$) has also been introduced, which is calculated as follows:
where $m_{\hat{y},g}$ and $s_{\hat{y},g}$ are the mean and the standard deviation of the ŷ values of class g calculated on the training set samples.
Therefore, in order to be assigned to class g, an unknown sample must have a ŷ value ranging between $y_{\mathrm{tsh1},g}$ and $y_{\mathrm{tsh2},g}$. The upper limit imposed on the ŷ values allows objects found at the extremes of the Gaussian probability density functions (PDFs) to be rejected; these usually have low values in the PDFs but a high a posteriori probability for one class according to Bayes' rule. The upper limit was set based on preliminary tests performed on some representative images;
(iii) for classification problems with more than two classes, the sample must be unambiguously assigned to only one class.
Samples that do not match all three criteria are not assigned to any class and are labelled as "not assigned" (NA). In this manner, Soft PLS-DA allows boundaries to be drawn around each modelled class that maximise the discrimination between the categories of interest and minimise possible false positives due to ambiguous classifications or to outlier samples.
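The three criteria can be sketched as a single decision function in Python. This is an illustration under assumptions, not the authors' MATLAB routine: the function name `soft_plsda_assign` is hypothetical, the Q confidence limit and the per-class limits $y_{\mathrm{tsh1},g}$ and $y_{\mathrm{tsh2},g}$ are passed in as precomputed inputs (their formulas are not reproduced here), the numerical limits in the example are invented, and treating both bounds as strict inequalities is an assumption.

```python
from typing import Sequence

NA = "NA"


def soft_plsda_assign(y_hat: Sequence[float], q: float, q_limit: float,
                      lower: Sequence[float], upper: Sequence[float],
                      classes: Sequence[str]) -> str:
    """Assign one sample with the three Soft PLS-DA criteria; return a class name or NA.

    y_hat        predicted y value for each class
    q, q_limit   Q residual of the sample and the model's confidence limit
    lower, upper y_tsh1,g and y_tsh2,g for each class g
    """
    # (i) the sample must fit the model: Q residual inside the confidence limit
    if q > q_limit:
        return NA
    # (ii) y-hat must fall inside the acceptability range of a class
    inside = [c for c, y, lo, hi in zip(classes, y_hat, lower, upper) if lo < y < hi]
    # (iii) the assignment must be unambiguous
    return inside[0] if len(inside) == 1 else NA


classes = ["PET", "PS", "PVC"]
lower, upper, q_limit = [0.5] * 3, [1.4] * 3, 1.0       # hypothetical limits
print(soft_plsda_assign([0.9, 0.1, 0.0], 0.2, q_limit, lower, upper, classes))   # PET
print(soft_plsda_assign([0.9, 0.1, 0.0], 2.0, q_limit, lower, upper, classes))   # NA (Q too high)
print(soft_plsda_assign([1.8, 0.1, -0.9], 0.2, q_limit, lower, upper, classes))  # NA (above y_tsh2)
print(soft_plsda_assign([0.8, 0.7, -0.5], 0.2, q_limit, lower, upper, classes))  # NA (ambiguous)
```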
In the present study, we considered the recyclable plastic polymers mainly used for packaging, including polyethylene terephthalate (PET), polystyrene (PS), polyvinyl chloride (PVC), polypropylene (PP) and polyethylene (PE), which in turn can be further subdivided into high-density polyethylene (HDPE) and low-density polyethylene (LDPE).
Different plastic objects made of the considered polymers were collected from household waste. In addition, samples made of paper and of other types of non-recyclable plastics (OTHER), e.g. acrylonitrile butadiene styrene (ABS) and polylactic acid (PLA), were also considered as possible foreign materials that can be found in municipal plastic waste.
The different samples were manually sorted into the corresponding categories based on the Resin Identification Code (RIC) reported on the objects.32 The RIC is an international coding system comprising a set of symbols (labels and numbers) that appear on plastic products and indicate the polymer of which they are made.
For example, according to the RIC, code number 1 is associated with PET, number 2 with HDPE, number 3 with PVC, etc.
Image acquisition
The collected waste samples were acquired with an industrial sorting system consisting of a NIR line-scanning hyperspectral camera (KUSTA1.9MSI, LLA Instruments) mounted over a black conveyor belt and equipped with an InGaAs detector array and a Zeiss f/2.4, 10 mm optical lens. Image acquisition was performed at a frame rate of 644 Hz, and the speed of the conveyor belt was 0.84 m s⁻¹. Illumination was provided by halogen light bulbs positioned in two parallel illumination rows slightly tilted towards each other (PMAmsi, LLA Instruments). The hyperspectral images were acquired in the NIR range from 1330 nm to 1900 nm with a spectral resolution of 6 nm.
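The acquisition settings imply two useful quantities, computed below in Python. The calculation assumes that the belt advances by exactly one scan line per frame and that the 96 channels are evenly spaced over 1330 to 1900 nm; neither assumption is stated explicitly in the text.

```python
frame_rate_hz = 644.0        # line-scan frame rate
belt_speed_m_s = 0.84        # conveyor belt speed
n_bands = 96                 # spectral channels per pixel
lambda_min_nm, lambda_max_nm = 1330.0, 1900.0

# Distance travelled by the belt between two consecutive scan lines (along-track pixel pitch).
along_track_mm = belt_speed_m_s / frame_rate_hz * 1000.0
# Band spacing, assuming the 96 channels are evenly spaced over the stated range.
band_step_nm = (lambda_max_nm - lambda_min_nm) / (n_bands - 1)

print(f"{along_track_mm:.2f} mm per line, {band_step_nm:.1f} nm per band")
# 1.30 mm per line, 6.0 nm per band
```

The resulting 6 nm band spacing is consistent with the stated spectral resolution.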
The samples were acquired in two acquisition phases conducted on different days. In the first phase, hyperspectral images containing objects made of the same material were acquired. In more detail, two images of different objects were acquired for each type of material, for a total of 16 hyperspectral images (8 materials × 2 replicate images). These images were used as training images in the subsequent elaboration steps, in order to obtain a library of representative spectra for each material type.
In the second acquisition phase, hyperspectral images containing samples of two different materials were acquired, considering all the possible combinations of the material types under investigation. Overall, 56 hyperspectral images were obtained in this phase, resulting from two replicate images for each combination…
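The image counts of the two phases follow from simple combinatorics, as this Python check shows: eight material types (the six recyclable polymers, paper and OTHER) give 8 × 2 training images, and the 28 unordered pairs of different materials, each imaged twice, give the 56 test images.

```python
from math import comb

materials = ["PET", "PS", "PVC", "PP", "HDPE", "LDPE", "paper", "OTHER"]
replicates = 2

phase1_images = len(materials) * replicates   # one material per image
pairs = comb(len(materials), 2)               # unordered pairs of different materials
phase2_images = pairs * replicates            # two materials per image
print(phase1_images, pairs, phase2_images)    # 16 28 56
```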
All the hyperspectral images, acquired on the objects positioned on the moving conveyor belt, have a size of 41 row pixels × 500 column pixels × 96 wavelengths. For each image, the raw intensity counts were converted into reflectance units by means of an internal calibration procedure based on the measurement of the dark current and of a white high-reflectance standard. The dark current was measured by closing the shutter of the camera, while the white reference consisted of an aluminium frame holding the calibration material, with an average remission factor of 83.0%.
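The internal calibration procedure of the camera is not described in detail. A common form of such a correction, assumed here for illustration, subtracts the dark current from both the sample and the white reference and scales the ratio by the remission factor of the white standard. The function name `to_reflectance` is hypothetical.

```python
def to_reflectance(raw, dark, white, white_remission=0.830):
    """Convert raw counts of one spectrum to reflectance with dark and white references.

    R = (raw - dark) / (white - dark) * white_remission, computed band by band;
    white_remission is the average remission factor of the white standard (83.0%).
    """
    return [(r - d) / (w - d) * white_remission for r, d, w in zip(raw, dark, white)]


print(to_reflectance([500.0, 900.0], [100.0, 100.0], [900.0, 900.0]))  # [0.415, 0.83]
```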
Image elaboration
Initially, the 16 images acquired during the first acquisition phase were analysed by means of PCA, with mean centring as data preprocessing, in order to segment the pixels of the background (black conveyor belt) from those belonging to the samples.
After background segmentation, 1000 spectra of the sample and 400 spectra of the background were randomly selected from each training image. These spectra were used to build a training set of size {22,400 spectra × 96 wavelengths}, containing 16,000 representative spectra of the considered materials (2000 spectra for each material type) and 6400 spectra of the background. The average spectra of each material type calculated from the training set are reported in Figure 1A, while the average and standard deviation spectra of each class are shown in Figure S1 of the Supplementary Material.
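The random selection and the resulting training-set size can be sketched in Python as follows. The function name `sample_training_pixels` and the toy segmentation (the first 6000 pixels of a 41 × 500 image labelled as sample) are hypothetical; the counts reproduce the totals stated in the text.

```python
import random


def sample_training_pixels(sample_idx, background_idx, n_sample=1000, n_background=400, seed=None):
    """Randomly draw pixel indices, without replacement, from one segmented training image."""
    rng = random.Random(seed)
    return rng.sample(sample_idx, n_sample), rng.sample(background_idx, n_background)


# Hypothetical segmentation of a 41 x 500 image: the first 6000 pixels are "sample".
pixels = range(41 * 500)
sample_px, background_px = pixels[:6000], pixels[6000:]
s, b = sample_training_pixels(sample_px, background_px, seed=1)

n_images, images_per_material = 16, 2
print(n_images * 1000, n_images * 400, n_images * (1000 + 400), images_per_material * 1000)
# 16000 6400 22400 2000
```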
The images containing samples of two different materials were used as test images to obtain both a quantitative and a qualitative evaluation of the classification models. For the creation of the test set, some representative images were chosen so as to cover all the materials, and Regions of Interest (ROIs) were defined on these images. From each ROI, the corresponding spectra were extracted to create a test set of size {21,760 spectra × 96 wavelengths}, containing a library of spectra with known class assignment. Figure 1B shows the average spectra of each considered material calculated from the test set, while the corresponding average and standard deviation spectra are reported in Figure S2 of the Supplementary Material.
The training set was initially analysed by means of PCA, considering both standard normal variate (SNV) and detrend as row preprocessing methods. This preliminary analysis allowed similarities and differences between the considered materials to be identified, and SNV and detrend gave analogous results. The clusters of the different materials observed in the PCA score plots essentially reflected the similarities and differences in the chemical structure of the considered polymers. These results were used to develop the structure of the tree-based classification model reported in Figure 2. In particular, PCA highlighted that the class corresponding to the non-recyclable plastic polymers was too heterogeneous to be modelled as a single class. For this reason, the spectra belonging to the OTHER class were not used during model calibration; instead, they were used during the final validation of the classification tree in order to evaluate the ability of the Soft PLS-DA algorithm to reject objects belonging to classes not considered in model calibration. The classification tree reported in Figure 2 is composed of five nodes, each corresponding to a classification model calculated with the Soft PLS-DA algorithm using the training set.
For each node of the tree, the classification models were computed considering both SNV + mean centring and detrend + mean centring as data preprocessing methods.
The average spectra of each considered material type obtained from both the training and test sets preprocessed with SNV and detrend are reported in Figures 1C–F.
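The two preprocessing chains can be sketched in plain Python. Several details are assumptions, since the text does not specify them: detrend is implemented as a first-order (straight-line) fit against the band index, SNV uses the sample standard deviation (n − 1), and the function names are hypothetical. SNV and detrend act on each spectrum (row), whereas mean centring acts on each wavelength (column) and reuses the training-set means for new data.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std (n - 1)."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def detrend(spectrum):
    """First-order detrend: subtract the least-squares line fitted against the band index."""
    n = len(spectrum)
    x_bar, y_bar = (n - 1) / 2, mean(spectrum)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    slope = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(spectrum)) / sxx
    return [v - (y_bar + slope * (i - x_bar)) for i, v in enumerate(spectrum)]


def mean_centre(rows, column_means=None):
    """Column-wise mean centring; pass the training-set means when centring new data."""
    if column_means is None:
        column_means = [mean(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, column_means)] for row in rows], column_means


train = [[1.0, 2.0, 4.0], [2.0, 3.0, 7.0]]                 # two toy spectra
train_snv, means = mean_centre([snv(s) for s in train])    # SNV + mean centring
new_snv, _ = mean_centre([snv([0.5, 1.0, 2.5])], means)    # reuse the training means
```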
The classification performance for each single class was defined using sensitivity (SENS), i.e. the percentage of objects belonging to a given class that are correctly assigned to that class; specificity (SPEC), i.e. the percentage of objects not belonging to the class that are correctly rejected by the class model; and efficiency (EFF), i.e. the geometric mean of sensitivity and specificity. Furthermore, for an overall evaluation of the classification quality of each node, the Non-Error Rate (NER) was also considered, which is calculated as the arithmetic mean of the SENS values of the different classes.
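These figures of merit can be computed as in the Python sketch below. How not-assigned spectra enter the counts is an assumption made here: an NA spectrum of class g counts as a miss for SENS of g, and an NA spectrum of any other class counts as a correct rejection for SPEC of g. The function name `class_metrics` is hypothetical.

```python
from math import sqrt


def class_metrics(y_true, y_pred, classes):
    """Per-class SENS, SPEC and EFF (in %) and the NER over the modelled classes.

    y_pred may contain None for spectra that were not assigned (NA).
    """
    sens, spec, eff = {}, {}, {}
    for g in classes:
        in_g = [p for t, p in zip(y_true, y_pred) if t == g]
        not_g = [p for t, p in zip(y_true, y_pred) if t != g]
        sens[g] = 100.0 * sum(p == g for p in in_g) / len(in_g)    # NA counts as a miss
        spec[g] = 100.0 * sum(p != g for p in not_g) / len(not_g)  # NA counts as a correct rejection
        eff[g] = sqrt(sens[g] * spec[g])                            # geometric mean
    ner = sum(sens[g] for g in classes) / len(classes)             # arithmetic mean of SENS
    return sens, spec, eff, ner


y_true = ["PET", "PET", "PS", "PS"]
y_pred = ["PET", None, "PS", "PET"]
sens, spec, eff, ner = class_metrics(y_true, y_pred, ["PET", "PS"])
print(sens, spec, ner)  # {'PET': 50.0, 'PS': 50.0} {'PET': 50.0, 'PS': 100.0} 50.0
```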
The proper number of LVs for the Soft PLS-DA models was optimised by maximising the NER value in cross-validation. In particular, a customised cross-validation scheme consisting of two deletion groups (i.e. cross-validation groups) was adopted; during the cross-validation step, one model was therefore calculated using the spectra of each deletion group to predict the class of the spectra of the other deletion group.
Each deletion group was defined so as to include spectra belonging to all the considered target classes and to keep together all the spectra extracted from the same image.
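One way to build such image-wise deletion groups is sketched in Python below, assuming (as in the first acquisition phase) that each training image contains a single material and that there are two images per material, so that alternating the images of each material between the two groups puts every class in both. The function names and image identifiers are hypothetical.

```python
from collections import defaultdict


def image_groups(image_material):
    """Put each training image in deletion group 0 or 1, alternating within each material.

    With two images per material, every group then contains all the materials.
    """
    by_material = defaultdict(list)
    for image, material in image_material.items():
        by_material[material].append(image)
    group = {}
    for images in by_material.values():
        for k, image in enumerate(sorted(images)):
            group[image] = k % 2
    return group


def cv_splits(spectrum_image, group):
    """Yield (train, test) index lists; spectra from the same image never end up on both sides."""
    for g in (0, 1):
        test = [i for i, img in enumerate(spectrum_image) if group[img] == g]
        train = [i for i, img in enumerate(spectrum_image) if group[img] != g]
        yield train, test


group = image_groups({"pet_1": "PET", "pet_2": "PET", "ps_1": "PS", "ps_2": "PS"})
spectrum_image = ["pet_1", "pet_1", "pet_2", "ps_1", "ps_2"]   # image of origin of each spectrum
for train, test in cv_splits(spectrum_image, group):
    print(train, test)
# [2, 4] [0, 1, 3]
# [0, 1, 3] [2, 4]
```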
Furthermore, for each node of the classification tree, the Soft PLS-DA algorithm has also been coupled with a sparse-based variable selection approach in order to identify the relevant variables involved in the classification. In more detail, the outcomes of the sparse PLS-DA (sPLS-DA) algorithm proposed by Lê Cao et al.34–36 were subjected to the same constraints for class assignment described above for Soft PLS-DA, thus resulting in a sparse version of Soft PLS-DA (sparse Soft PLS-DA).
此外,对于分类树的每个节点,Soft PLS-DA算法还与基于稀疏的变量选择方法相结合,以识别分类中涉及的相关变量。 更详细地讲,LêCao等人提出的稀疏PLS-DA(sPLS-DA)算法的结果受到与上述针对Soft PLS-DA进行的类分配相同的约束,因此导致了Soft的稀疏版本 PLS-DA(稀疏软PLS-DA)。
Basically, the main idea of sparse-based methods is to perform variable selection by forcing the model coefficients not bringing useful information to be equal to zero. Sparse methods represent an extension of the corresponding traditional classification or regression methods, where sparsity is achieved by adding a penalty term to the computation of the model coefficients.Inaddition to the number of model components (i.e., LVs), sparse methods also require the level of sparsity to be optimised, which is related to the number of variables whose coefficients are set equal to zero in the model.
基本上,基于稀疏的方法的主要思想是:通过强制将不携带有用信息的变量的模型系数设为零来进行变量选择。稀疏方法是相应的传统分类或回归方法的扩展,其稀疏性通过在模型系数的计算中加入惩罚项来实现。除了模型成分(即LV)的数量外,稀疏方法还需要优化稀疏程度,它与模型中系数被设为零的变量数量有关。
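To make the penalty idea concrete, the sketch below shows soft-thresholding of a component's weight vector. This is the kind of L1-type shrinkage used in sparse PLS variants such as sPLS-DA, where the penalty is tuned so that a chosen number of variables stays non-zero. It is an illustrative assumption about the mechanism, not code taken from PLS_Toolbox or any other package. `sparse_weights` and `keep` are hypothetical names.

```python
import math


def sparse_weights(w, keep):
    """Soft-threshold a weight vector so that at most `keep` entries remain non-zero.

    Illustrative sketch: the threshold is the (keep + 1)-th largest |w_j|.
    Every weight is shrunk towards zero by that amount, smaller weights become
    exactly zero, and the result is rescaled to unit length. Ties at the
    threshold can leave fewer than `keep` non-zero entries.
    """
    if keep >= len(w):
        lam = 0.0
    else:
        lam = sorted((abs(v) for v in w), reverse=True)[keep]
    shrunk = [math.copysign(max(abs(v) - lam, 0.0), v) for v in w]
    norm = math.sqrt(sum(v * v for v in shrunk))
    return [v / norm for v in shrunk] if norm > 0 else shrunk


# Toy weight vector over five wavelengths; keep two variables.
w_sparse = sparse_weights([0.5, -0.1, 0.3, 0.05, -0.4], keep=2)
```

Here `keep` plays the role of the "level of sparsity" mentioned in the text: the smaller it is, the more coefficients are forced to zero.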
The sparse Soft PLS-DA models were optimised considering a maximum of 10 LVs and a number of variables selected for each LV ranging from 4 to 96, in steps of 4. For each node of the classification tree, the best combination of number of LVs and number of selected variables was identified by maximising the NER value estimated in cross-validation.
考虑到最大LV数量等于10,并且为每个LV选择的变量数量在4到96之间(步长等于4),优化了基于稀疏的Soft PLS-DA模型。 对于分类树的每个节点,通过最大化交叉验证中估计的NER值,可以确定LV数量与所选变量数量之间的最佳组合。
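The joint optimisation amounts to a grid search over 10 × 24 settings (1 to 10 LVs; 4 to 96 variables in steps of 4). This is a minimal sketch, assuming a hypothetical `cv_ner` callback that runs the two-group cross-validation for one setting and returns its NER. The tie-breaking rule is our own assumption, because the text does not state one.

```python
def best_sparse_setting(cv_ner, max_lv=10, var_grid=range(4, 97, 4)):
    """Return (n_lv, n_vars, ner) with the highest cross-validated NER.

    `cv_ner(n_lv, n_vars)` is a caller-supplied (hypothetical) function.
    Settings are scanned from the simplest upwards and only a strictly higher
    NER replaces the current best, so ties favour fewer LVs, then fewer variables.
    """
    best = None
    for n_lv in range(1, max_lv + 1):
        for n_vars in var_grid:
            ner = cv_ner(n_lv, n_vars)
            if best is None or ner > best[2]:
                best = (n_lv, n_vars, ner)
    return best


# Toy NER surface peaking at 3 LVs and 20 variables (made-up numbers).
calls = []


def toy_cv_ner(n_lv, n_vars):
    calls.append((n_lv, n_vars))
    return 0.9 - 0.01 * abs(n_lv - 3) - 0.001 * abs(n_vars - 20)


best = best_sparse_setting(toy_cv_ner)
```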
Both Soft PLS-DA and sparse Soft PLS-DA models were validated using the test set described above.
使用上述测试集对Soft PLS-DA模型和稀疏Soft PLS-DA模型进行了验证。
Image processing and data analysis were performed using the PLS_Toolbox software (ver. 8.5, Eigenvector Research Inc., USA) and ad hoc routines developed in the MATLAB environment (ver. 9.0, The MathWorks, USA). The MATLAB routine to run the Soft PLS-DA algorithm is freely downloadable from http://www.chimslab.unimore.it/downloads/.
使用PLS_Toolbox软件(版本8.5,美国Eigenvector Research Inc.)以及在MATLAB环境(版本9.0,美国MathWorks)中开发的专用程序进行图像处理和数据分析。运行Soft PLS-DA算法的MATLAB程序可从http://www.chimslab.unimore.it/downloads/免费下载。
Table 1 and Table 2 report the results obtained by applying the Soft PLS-DA algorithm to each node of the classification tree, considering both SNV and detrend as row preprocessing methods. For each model, the classification performance was evaluated for each single class in terms of the number of not-assigned pixel spectra and the SENS, SPEC and EFF values, while NER was calculated as a global measure over all the classes. For conciseness, Table 1 shows only the NER values obtained for each node of the classification tree, while Table 2 reports the SENS values of all the considered classes in each node of the tree.
表1和表2报告了对分类树的每个节点应用Soft PLS-DA算法、并分别采用SNV和去趋势(detrend)作为行预处理方法所得到的结果。对于每个模型,针对每个类别,以未分配像素光谱的数量以及SENS、SPEC和EFF值评估分类性能,而NER值则作为所有类别的全局度量进行计算。为使表述简洁,表1仅给出分类树各节点的NER值,而表2报告树中每个节点所有考虑类别的SENS值。
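The per-class figures of merit can be computed from true and predicted labels as below. This sketch assumes the usual chemometric definitions: EFF is the geometric mean of SENS and SPEC, and NER is the average of the class sensitivities. Not-assigned spectra (NA) count as misses for SENS and as correct rejections for SPEC. Labels and data are illustrative only.

```python
import math

NA = "NA"  # label given to spectra that are not assigned to any class


def class_metrics(y_true, y_pred, classes):
    """Per-class SENS, SPEC, EFF and not-assigned counts, plus the global NER.

    Assumed definitions:
    SENS: share of a class's spectra assigned to that class (NA counts as a miss).
    SPEC: share of all other spectra, unmodelled ones included, not assigned to it.
    EFF:  geometric mean of SENS and SPEC.
    NER:  mean of the per-class SENS values.
    """
    report = {}
    for c in classes:
        pos = [p for t, p in zip(y_true, y_pred) if t == c]
        neg = [p for t, p in zip(y_true, y_pred) if t != c]
        sens = sum(p == c for p in pos) / len(pos)
        spec = sum(p != c for p in neg) / len(neg)
        report[c] = {"SENS": sens, "SPEC": spec,
                     "EFF": math.sqrt(sens * spec),
                     "not_assigned": sum(p == NA for p in pos)}
    ner = sum(report[c]["SENS"] for c in classes) / len(classes)
    return report, ner


# Toy labels (not the paper's data): OTHER is an unmodelled material.
y_true = ["PET", "PET", "PS", "PS", "OTHER"]
y_pred = ["PET", NA, "PS", "PET", NA]
report, ner = class_metrics(y_true, y_pred, ["PET", "PS"])
```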
With few exceptions, SNV preprocessing yielded better classification performance than detrend, both in cross-validation and in prediction of the test set. Indeed, in Node 1, Node 2 and Node 4 the NER values obtained in cross-validation with the SNV models are higher than those obtained with detrend, and these results were also confirmed by the prediction of the external test set. For Node 3, SNV and detrend led to similar classification performance in cross-validation, but detrend gave a lower NER value in prediction of the test set; in particular, a SENS value of 43.0% was obtained for class PS. Therefore, SNV was also the best preprocessing method for Node 3, since it resulted in a more robust classification model. Conversely, in Node 5 the two preprocessing methods showed similar performance both in cross-validation and in prediction of the test set.
除少数例外,无论在交叉验证还是测试集预测中,SNV预处理方法都比去趋势获得了更好的分类性能。实际上,在节点1、节点2和节点4中,SNV模型在交叉验证中得到的NER值均高于去趋势模型,这些结果也在外部测试集的预测中得到了证实。对于节点3,SNV和去趋势在交叉验证中的分类性能相近,但去趋势在测试集预测中给出的NER值较低,特别是PS类的SENS值仅为43.0%。因此,对节点3而言,SNV同样是最佳的预处理方法,因为它得到的分类模型更稳健。相反,在节点5中,两种预处理方法在交叉验证和测试集预测中的表现均相近。
Based on these results, SNV can be identified as the optimal preprocessing method for all five nodes of the classification tree. Furthermore, using the same preprocessing method for every node of the classification tree is a great computational advantage. The spectra of the hyperspectral images need to be row-preprocessed only once after image acquisition, and the preprocessed spectra can then be used for all the nodes of the classification tree, lowering the computational effort.
基于这些结果,可以认为SNV是分类树全部五个节点的最佳预处理方法。此外,需要强调的是,对分类树的每个节点使用相同的预处理方法在计算上具有很大优势:高光谱图像的光谱只需在图像采集后进行一次行预处理,预处理后的光谱即可用于分类树的所有节点,从而降低计算量。
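The two row preprocessing methods compared above can be sketched in plain Python. SNV centres and scales each spectrum by its own mean and standard deviation. Detrend is assumed here to remove a first-order (straight-line) baseline, which may differ from the settings used in PLS_Toolbox. The last lines illustrate the computational point: each pixel spectrum is preprocessed once, and the result is shared by all nodes.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre a spectrum on its own mean, scale by its own std."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample std (n - 1); some tools divide by n instead
    return [(v - mu) / sd for v in spectrum]


def detrend(spectrum):
    """Subtract a least-squares straight line fitted against the variable index (first order assumed)."""
    n = len(spectrum)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(spectrum)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(spectrum)) / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(spectrum)]


# Row preprocessing is done once per pixel spectrum after acquisition;
# the same preprocessed list is then reused by every node of the tree.
pixels = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]
preprocessed = [snv(p) for p in pixels]
```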
Figure S3 of the Supplementary Material shows the regression vectors of the classification models calculated for each node of the tree using SNV as spectral row preprocessing. By comparing Figure S3 with Figure 1, which reports the average spectra of the different material types, it is possible to observe that for each node of the tree the relevant variables, i.e. the variables with highest absolute values of the regression coefficients, generally correspond to the characteristic wavelengths of the materials considered in the specific classification problem.
补充材料的图S3显示了以SNV作为光谱行预处理、为树的每个节点计算的分类模型的回归向量。将图S3与给出不同材料类型平均光谱的图1进行比较,可以观察到,对于树的每个节点,相关变量(即回归系数绝对值最高的变量)通常对应于该特定分类问题所涉及材料的特征波长。
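Reading the relevant variables off a regression vector, as done when comparing Figure S3 with Figure 1, amounts to ranking the coefficients by absolute value. The short sketch below uses a made-up regression vector and a hypothetical wavelength axis.

```python
def top_variables(coefs, wavelengths, k):
    """Return the k wavelengths with the largest absolute regression coefficients, ascending."""
    order = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    return sorted(wavelengths[j] for j in order[:k])


# Toy regression vector over four hypothetical wavelengths (nm).
relevant = top_variables([0.1, -0.9, 0.3, 0.5], [1000, 1010, 1020, 1030], k=2)
```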
Since SNV was the optimal row preprocessing method for the whole classification tree, the subsequent variable selection step by means of sparse Soft PLS-DA was performed considering only SNV + mean centring for each node.
由于SNV是整个分类树的最佳行预处理方法,因此在随后利用稀疏Soft PLS-DA进行变量选择的步骤中,每个节点仅采用SNV加均值中心化的预处理。
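Mean centring is a column operation applied after the row-wise SNV. In this minimal sketch, the column means are estimated on the calibration spectra only and then reused unchanged for test or new spectra. That is the usual practice, though the text does not spell it out.

```python
def column_means(rows):
    """Mean of each variable (column) over the calibration spectra."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def mean_centre(rows, means):
    """Subtract the calibration column means from each (already SNV-transformed) spectrum."""
    return [[v - m for v, m in zip(row, means)] for row in rows]


# Toy values: the means come from calibration only (assumed) and are reused for new spectra.
calibration = [[1.0, 2.0], [3.0, 4.0]]
means = column_means(calibration)
centred_new = mean_centre([[5.0, 5.0]], means)
```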
Table 3 reports the classification results obtained for the five nodes of the classification tree in terms of NER values, as a global measure of the classification performance of each node, while Table 4 shows the SENS values of each modelled class.
表3列出了从分类树的五个节点获得的分类结果,这些分类结果以NER值表示,作为每个节点的分类性能的整体度量,而表4显示了每个建模类的SENS值。
Comparing Table 1 with Table 3 and Table 2 with Table 4, it is possible to observe that, generally, variable selection improved the classification performances and, at the same time, considerably reduced the number of retained variables.
将表1与表3、表2与表4进行比较可以发现,总体而言,变量选择提高了分类性能,同时显著减少了保留的变量数量。
The largest improvement in classification performance was reached in Node 3, with a NER value in cross-validation of 94.3%, compared with 89.5% obtained using the full wavelength range. In addition, considering only the 20 selected variables in Node 3, all the test set spectra were correctly assigned.
分类性能提升最大的是节点3,其交叉验证NER值为94.3%,而使用全波长范围时为89.5%。此外,节点3仅使用所选的20个变量时,所有测试集光谱均被正确分配。
Node 2 is the node with the lowest number of selected variables, retaining only 8 of the 96 original wavelengths. Such a small number of variables increased the classification performance for all three classes modelled in Node 2 (PAPER, PE+PVC+PP and PS+PET), while maintaining 100.0% correct assignments for the test set.
节点2是所选变量最少的节点,在96个原始波长中仅保留了8个变量。如此少的变量使节点2中建模的全部三个类别(PAPER、PE+PVC+PP和PS+PET)的分类性能都得到提高,同时测试集仍保持100.0%的正确分配率。
Conversely, in Node 5 variable selection led to a slight decrease in classification performance in cross-validation; in particular, the SENS value for class HDPE is lower than that obtained with the full wavelength range (82.6% vs 84.5%). It should be considered, however, that Node 5 deals with the classification of HDPE and LDPE, which are derived from the same monomer and differ only in their degree of branching. Therefore, it is reasonable to assume that, for the discrimination between HDPE and LDPE, the full wavelength range provides more complete information.
相反,在节点5中,变量选择导致交叉验证中的分类性能略有下降。 特别是,HDPE类的SENS值低于在整个波长范围内获得的SENS值(82.6%对84.5%)。 实际上,应该考虑节点5处理源自相同单体的HDPE和LDPE的分类,并且这两种聚合物之间的差异仅与支化程度有关。 因此,可以合理地假设,为了区分HDPE和LDPE,整个波长范围将提供更完整的信息。
Based on the results obtained from the optimisation of each single node, the different classification models have been assembled together in order to obtain the final implementation of the classification tree, which is schematised in Figure 3.
根据从每个单个节点的优化获得的结果,将不同的分类模型组装在一起,以获得分类树的最终实现,如图3所示。
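Routing a spectrum through the assembled tree can be sketched as a loop that stops either at a leaf or at a node that rejects the spectrum. This assumes that a spectrum rejected at any node is labelled NA. The tree below is hypothetical: toy threshold rules replace the Soft PLS-DA models, and it does not reproduce the structure shown in Figure 3. `Node` and `predict_tree` are illustrative names.

```python
NA = "NA"  # not assigned


class Node:
    """One node of a classification tree.

    `classify` maps a spectrum to one of this node's labels or to NA;
    `children` maps each label either to another Node or to a final class name.
    """

    def __init__(self, classify, children):
        self.classify = classify
        self.children = children


def predict_tree(root, spectrum):
    """Route a spectrum down the tree until it reaches a final class or is rejected."""
    node = root
    while True:
        label = node.classify(spectrum)
        if label == NA:               # rejected at this node (assumed to end the descent)
            return NA
        child = node.children[label]
        if not isinstance(child, Node):
            return child              # leaf: final material class
        node = child


# Hypothetical two-level tree with toy threshold rules in place of Soft PLS-DA models.
pe_node = Node(lambda s: "HDPE" if s[1] > 0.5 else "LDPE",
               {"HDPE": "HDPE", "LDPE": "LDPE"})
root = Node(lambda s: NA if s[0] < 0 else ("PE" if s[0] > 1 else "PAPER"),
            {"PE": pe_node, "PAPER": "PAPER"})
results = [predict_tree(root, s) for s in ([2.0, 0.9], [2.0, 0.1], [0.5, 0.0], [-1.0, 0.0])]
```

Because every node reads the same preprocessed spectrum, the tree only needs to evaluate the models along a single path per pixel.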
The proposed tree-structured classification model was applied to the test set in order to obtain a quantitative assessment of its classification performances. During the final validation of the model, the spectra belonging to the other types of non-recyclable plastics (class OTHER) were also included in the test set to evaluate the ability of Soft PLS-DA, and of its sparse-based extension, to correctly reject spectra not belonging to the modelled classes.
提出的树结构分类模型被应用于测试集,以定量评估其分类性能。在模型的最终验证中,测试集中还加入了属于其他类型不可回收塑料(OTHER类)的光谱,以评估Soft PLS-DA及其稀疏扩展正确拒绝不属于建模类别的光谱的能力。
The results are summarised in Figure 4, which can be seen as a kind of graphical representation of the confusion matrix obtained by applying the classification tree to the test set. Indeed, each point in the graph represents a spectrum of the test set and it is coloured according to the corresponding actual class, while the position of the points on the y-axis is based on the class predicted from the tree-structured classification model.
结果总结在图4中,可以看作是通过将分类树应用于测试集而获得的混淆矩阵的一种图形表示。 实际上,图中的每个点都代表测试集的光谱,并根据相应的实际类别进行着色,而点在y轴上的位置基于从树结构分类模型预测的类别。
The same results are also reported in Table 5 in terms of SENS and SPEC values for each considered class, and of NER of the overall tree-structured model.
在表5中还报告了相同结果,涉及每种考虑类别的SENS和SPEC值以及整个树结构模型的NER。
Generally, satisfactory classification results have been obtained for all the modelled classes, resulting in a NER value equal to 98.4%. The SENS values are always greater than 90%, and for PS and PVC all the test set spectra have been correctly predicted.
通常,对于所有建模类别都获得了令人满意的分类结果,其NER值等于98.4%。 SENS值始终大于90%,对于PS和PVC,所有测试集的光谱均已正确预测。
Concerning the results expressed in terms of specificity, SPEC values close to 1 were obtained for all the classes, indicating that the classification tree correctly rejected most of the spectra not belonging to each class. Indeed, Figure 4 shows that most of the test set spectra belonging to the category of other plastics (OTHER), which was not included in the classification tree, were correctly rejected from all the considered classes and thus labelled as “not-assigned” (NA). In more detail, 70.9% of the spectra belonging to class OTHER were predicted as NA, while only a minority of these spectra from other plastics were erroneously assigned to the PET and PS classes (12.4% and 14.7%, respectively).
关于以特异性表示的结果,对于所有类别,均获得接近1的SPEC值,这表明分类树正确地拒绝了不属于所考虑类别的大部分光谱。 实际上,图4显示属于其他塑料类别(OTHER)的大多数测试集光谱(未包括在分类树中)已从所有考虑的类别中正确剔除,因此标记为“未分配” (NA)。 更详细地,属于OTHER类的光谱的70.9%被预测为NA,而来自其他塑料的光谱中只有一小部分被错误地分配给PET和PS类(分别为12.4%和14.7%)。
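Percentages such as those reported above for class OTHER correspond to the rows of a row-normalised confusion matrix. The small sketch below uses toy labels, not the paper's data.

```python
from collections import Counter, defaultdict


def row_percentages(y_true, y_pred):
    """Percentage of each actual class predicted as each label (row-normalised confusion matrix)."""
    counts = defaultdict(Counter)
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return {t: {p: 100.0 * n / sum(row.values()) for p, n in row.items()}
            for t, row in counts.items()}


# Toy labels (not the paper's data); "NA" marks not-assigned spectra.
y_true = ["OTHER"] * 4 + ["PET"] * 2
y_pred = ["NA", "NA", "NA", "PET", "PET", "PET"]
table = row_percentages(y_true, y_pred)
```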
These results confirm that the proposed tree-structured classification model is able to effectively recognise the spectra of the analysed materials, minimising at the same time possible false positives due to spectra belonging to materials that were not considered during model calibration.
这些结果证实了所提出的树结构分类模型能够有效识别被分析材料的光谱,同时将由于光谱属于模型校准过程中未考虑的材料引起的误报降至最低。
Furthermore, the classification tree was applied to the test images, i.e. to the images containing objects of two different materials, in order to evaluate the classification performances at the pixel level. As an example, Figure 5 shows the prediction images obtained from the hyperspectral images containing PP + PVC (Figure 5A), PAPER + HDPE (Figure 5B), LDPE + OTHER (Figure 5C) and PET + PS (Figure 5D). In order to facilitate the interpretation of the results, the RGB images of the corresponding waste samples have also been included together with the prediction images. In the prediction images, all the pixels predicted as belonging to a defined class have been coloured according to the legend. In particular, grey is associated with those pixels that have not been assigned to any class.
此外,将分类树应用于测试图像,即应用于包含两种不同材料的对象的图像,以便评估像素级的分类性能。 例如,图5显示了从包含PP + PVC(图5A),PAPER + HDPE(图5B),LDPE + OTHER(图5C)和PET + PS(图5D)的高光谱图像获得的预测图像。 为了便于解释结果,相应废物样本的RGB图像也已与预测图像一起包括在内。 在预测图像中,根据图例对所有被预测为属于定义类的像素进行了着色。 特别地,灰色与尚未分配给任何类别的那些像素相关联。
Based on the prediction images, it is possible to observe that the majority of the pixels of each single object are correctly assigned to the corresponding material type. Misclassifications mainly occur at the edges of the objects or in some areas of the background, due to the noise caused by specular reflections of the conveyor belt.
根据预测图像,可以观察到每个单个对象的大部分像素已正确分配给相应的材质类型。 错误分类主要发生在物体的边缘或背景的某些区域,这是由于传送带的镜面反射所引起的噪声。
In more detail, in Figure 5A and Figure 5D all the pixels of the objects depicted in the images are correctly classified, while in Figure 5B the pixels at the upper edge of the larger HDPE bottle are not assigned. In Figure 5C, the majority of the pixels of the LDPE object are correctly assigned, while some of them are classified as HDPE or not assigned. In the same image, the objects made of plastic polymers not included in the classification model were, as a whole, not assigned to any class.
更详细地,在图5A和图5D中,图像中所描绘的对象的所有像素均被正确分类,而在图5B中,未分配位于其上边缘的较大HDPE瓶的像素。 考虑到图5C,LDPE对象的大多数像素已正确分配,而其中一些像素被分类为HDPE或未分配。 在同一幅图中,未包含在分类模型中的由塑料聚合物制成的对象在总体上未分配给任何类别。
In practical applications of hyperspectral imaging for classification purposes, e.g. in sorting plants, it is necessary to implement classification methods able to effectively handle a large number of classes, to correctly classify samples belonging to the considered categories, and to correctly recognise and reject possible foreign objects.
在高光谱成像用于分类目的的实际应用中(例如分选厂),需要采用这样的分类方法:能够有效处理大量类别,正确分类属于所考虑类别的样本,并正确识别和拒绝可能出现的异物。
The present study was aimed at the development of a classification rule for the discrimination of multiple categories through a tree-structured model, in which the classification at each node was performed by Soft PLS-DA, an extension of the PLS-DA algorithm. The basic engine of Soft PLS-DA is the same as PLS-DA, but class assignment is subjected to some additional criteria involving the calculation of further thresholds based on Q residuals and on y predictions. These additional thresholds allow the rejection of unknown samples which are not compliant with the classes of interest.
本研究旨在通过树结构模型开发用于区分多个类别的分类规则,其中每个节点的分类由Soft PLS-DA(PLS-DA算法的扩展)执行。Soft PLS-DA的基本引擎与PLS-DA相同,但类别分配需遵循一些附加标准,其中包括基于Q残差和y预测值计算额外的阈值。这些额外的阈值使得不符合目标类别的未知样本能够被拒绝。
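The extra rejection criteria can be illustrated with a simplified assignment rule. This is a hypothetical stand-in, not the exact Soft PLS-DA criteria, which are implemented in the authors' MATLAB routine. Under this rule, a spectrum is rejected in three cases:

- its Q residual exceeds a limit;
- zero classes, or more than one, pass their y threshold;
- optionally, the winning y prediction is implausibly large.

All threshold values below are made up.

```python
NA = "NA"  # not assigned


def q_residual(x, x_hat):
    """Sum of squared differences between a spectrum and its model reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def soft_assign(y_hat, q, y_threshold, q_limit, y_upper=None):
    """Illustrative Soft PLS-DA-style rule (not the published criteria).

    y_hat       : dict class -> predicted y value from the PLS-DA model
    q           : Q residual of the spectrum
    y_threshold : dict class -> threshold above which the class is accepted
    q_limit     : Q residual limit; spectra beyond it are treated as unknown
    y_upper     : optional dict class -> upper bound on a plausible y prediction
    """
    if q > q_limit:                      # poorly modelled spectrum: reject
        return NA
    hits = [c for c, v in y_hat.items() if v > y_threshold[c]]
    if len(hits) != 1:                   # no class, or ambiguous: reject
        return NA
    c = hits[0]
    if y_upper is not None and y_hat[c] > y_upper[c]:  # implausible y: reject
        return NA
    return c


thresholds = {"PET": 0.5, "PS": 0.5}   # hypothetical per-class y thresholds
cases = [
    soft_assign({"PET": 0.9, "PS": 0.1}, 0.2, thresholds, q_limit=1.0),  # clear PET
    soft_assign({"PET": 0.9, "PS": 0.1}, 2.0, thresholds, q_limit=1.0),  # Q too high
    soft_assign({"PET": 0.8, "PS": 0.7}, 0.2, thresholds, q_limit=1.0),  # ambiguous
    soft_assign({"PET": 2.0, "PS": 0.1}, 0.2, thresholds, q_limit=1.0,
                y_upper={"PET": 1.5, "PS": 1.5}),                        # implausible y
]
```

Plain PLS-DA would label all four spectra PET or PS. The rejection branches are what let foreign materials end up as NA.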
The proposed approach was tested on a case study concerning the discrimination of the different recyclable plastic polymers commonly used for packaging. On the one hand, the use of a tree-structured classification model allowed eight different classes to be handled efficiently, a situation in which a single discrimination step would rarely give satisfactory results. On the other hand, Soft PLS-DA proved to be a flexible algorithm thanks to its ability to reject foreign objects, whose presence is plausible in recycling plants since the incoming materials cannot be completely controlled. Furthermore, coupling Soft PLS-DA with sparse-based variable selection allowed us to improve the classification performance and to decrease the number of spectral variables, at the same time reducing the computational effort.
在一个区分常用于包装的各种可回收塑料聚合物的案例研究中,对所提出的方法进行了测试。一方面,采用树结构分类模型可以高效地处理八个不同的类别,而在这种情况下,单一的判别步骤很少能给出令人满意的结果。另一方面,Soft PLS-DA能够拒绝异物,因而被证明是一种灵活的算法;由于无法完全控制进料,回收厂中出现异物是很可能的情况。此外,将Soft PLS-DA与基于稀疏的变量选择相结合,可以提高分类性能并减少光谱变量的数量,同时降低计算量。
The external validation of the classification tree demonstrated the effectiveness of the proposed approach, reaching a NER value equal to 98.4%. These satisfactory results were also confirmed by the pixel-level prediction performed on a set of test images.
分类树的外部验证证明了该方法的有效性,其NER值达到98.4%。 这些令人满意的结果也通过对一组测试图像执行的像素级预测得到了证实。
A further improvement of this application will consist in extending the prediction from a pixel-level to an object-level approach, by assigning each plastic sample to a defined class based on the class attribution of the majority of its pixels. In the specific case under investigation, this can be accomplished by coupling the hyperspectral camera with an RGB camera for object shape detection. Indeed, the much higher spatial resolution of RGB imaging could allow the edges of each imaged object to be better defined, labels partly covering the samples to be identified, and the samples of a given plastic material to be further classified based on their colour.
该应用的进一步改进将是把预测从像素级方法扩展到对象级方法,即根据每个塑料样品大多数像素的类别归属,将其分配到确定的类别。在本研究的具体情况中,可以通过将高光谱相机与用于物体形状检测的RGB相机结合来完成这一任务。实际上,RGB成像高得多的空间分辨率可以更好地界定每个成像物体的边缘,识别部分覆盖样品的标签,并根据颜色对给定塑料材料的样品进一步分类。
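The proposed object-level step could look like the majority vote sketched below. It assumes that a per-pixel object index is available, e.g. from an RGB segmentation, and it adds a minimum-share safeguard of our own. Function and parameter names are hypothetical.

```python
from collections import Counter

NA = "NA"  # not assigned


def object_labels(pixel_labels, object_ids, min_share=0.5):
    """Assign each object the majority class of its pixels (illustrative sketch).

    pixel_labels : class predicted for each pixel (NA = not assigned)
    object_ids   : object index for each pixel, e.g. from an RGB segmentation
                   (None = background); both are hypothetical inputs
    min_share    : assumed safeguard; the winning class must cover more than this
                   fraction of the object's pixels, otherwise the object stays NA
    """
    per_object = {}
    for lab, obj in zip(pixel_labels, object_ids):
        if obj is not None:
            per_object.setdefault(obj, Counter())[lab] += 1
    result = {}
    for obj, counts in per_object.items():
        total = sum(counts.values())
        ranked = [(c, n) for c, n in counts.most_common() if c != NA]
        if ranked and ranked[0][1] / total > min_share:
            result[obj] = ranked[0][0]
        else:
            result[obj] = NA
    return result


pixel_labels = ["PET", "PET", "PET", "PS", "HDPE", NA, NA, "PP"]
object_ids = [1, 1, 1, 1, 2, 2, 2, None]
objects = object_labels(pixel_labels, object_ids)
```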