u013886628

[Contributed] Machine Learning With Spark Note 2: Building a Simple Recommender System

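This article was contributed by a guest author of 数盟. Reposting is welcome; please credit the source "数盟社区" (Shumeng Community) and the author.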

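About the author: 段石石 (Duan Shishi), precision recommendation algorithm engineer at 1号店 (Yihaodian), mainly responsible for building Yihaodian's user profiles. He likes digging into the "black magic" of machine learning, is interested in deep learning, and enjoys Kaggle and following 9神. If you are interested in data and machine learning, feel free to get in touch. Personal blog: hacker.duanshishi.com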

Matrix Factorization

Matrix factorization (MF) methods were among the top performers in the Netflix Prize. For an overview, see: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html.

Explicit matrix factorization

User rating data:

```
Tom, Star Wars, 5
Jane, Titanic, 4
Bill, Batman, 3
Jane, Star Wars, 2
Bill, Titanic, 3
```

Taking users as rows and movies as columns, we can build the corresponding rating matrix:
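A minimal NumPy sketch that builds this matrix from the five triples above. The row/column order and the use of `np.nan` for "not rated" are choices made here for illustration, not something the original specifies:

```python
import numpy as np

# The five (user, movie, rating) triples from the example above.
triples = [
    ("Tom", "Star Wars", 5),
    ("Jane", "Titanic", 4),
    ("Bill", "Batman", 3),
    ("Jane", "Star Wars", 2),
    ("Bill", "Titanic", 3),
]

# Fix an order for the rows (users) and columns (movies).
users = ["Tom", "Jane", "Bill"]
movies = ["Star Wars", "Titanic", "Batman"]
row = {u: i for i, u in enumerate(users)}
col = {m: j for j, m in enumerate(movies)}

# np.nan marks "not rated"; in a real dataset most cells look like this.
R = np.full((len(users), len(movies)), np.nan)
for user, movie, rating in triples:
    R[row[user], col[movie]] = rating

print(R)
```

Even in this tiny example 4 of the 9 cells are empty; with thousands of users and items the fraction of empty cells is far higher.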

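MF models the user-item matrix directly, approximating it by the product of two much smaller, low-dimensional matrices; it is a dimensionality-reduction technique.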

If we have U users and I items, then without MF the rating matrix looks like this:

It is an extremely sparse matrix. After MF, it is represented as the product of two much smaller matrices:
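In symbols (notation introduced here: P is the U×k user-factor matrix, Q the k×I item-factor matrix, and k the number of latent features):

$$
R \approx P\,Q, \qquad R \in \mathbb{R}^{U \times I},\quad P \in \mathbb{R}^{U \times k},\quad Q \in \mathbb{R}^{k \times I},\quad k \ll \min(U, I)
$$

Storing P and Q takes (U + I)·k numbers instead of the U·I cells of the full matrix.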

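Models of this kind are called latent feature models: they look for hidden (latent) features that represent the user-item rating matrix indirectly. These latent features do not model a user's rating of an individual item directly; instead they tend to capture a user's preference for certain kinds of items, such as a film genre or style. MF discovers this hidden structure by itself, without needing detailed descriptions of the items (unlike content-based methods).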

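To compute a user's preference for an item in an MF model, simply take the dot product of the corresponding user vector and item vector: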
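A small NumPy illustration with made-up factors (k = 2). The names `P`, `Q` and `predict` are introduced here for illustration: `P` holds one row per user, `Q` one column per item. The predicted preference of user u for item i is the dot product of row u of `P` and column i of `Q`, and `P @ Q` gives every prediction at once:

```python
import numpy as np

# Toy factor matrices with k = 2 latent features (values are made up).
P = np.array([[1.0, 0.5],    # user 0
              [0.2, 1.5],    # user 1
              [1.0, 1.0]])   # user 2
Q = np.array([[2.0, 1.0, 0.0],   # latent feature 1 for items 0..2
              [1.0, 2.0, 3.0]])  # latent feature 2 for items 0..2

def predict(P, Q, u, i):
    """Predicted preference of user u for item i: dot(P[u], Q[:, i])."""
    return float(np.dot(P[u], Q[:, i]))

# Predictions for every (user, item) pair at once: the full U x I matrix.
R_hat = P @ Q

print(predict(P, Q, 0, 1))  # -> 2.0
print(R_hat)
```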

To compute the similarity between two items:
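One common choice, and the one this article uses later (the cosineSImilarity function in the Item recommendations section), is the cosine of the angle between the two item vectors, where q_i denotes item i's latent vector (column i of the k×I item matrix; notation introduced here):

$$
\mathrm{sim}(i, j) = \cos(\mathbf{q}_i, \mathbf{q}_j) = \frac{\mathbf{q}_i \cdot \mathbf{q}_j}{\lVert \mathbf{q}_i \rVert \, \lVert \mathbf{q}_j \rVert}
$$

It ranges from -1 to 1, and an item's similarity with itself is 1.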

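The advantage of MF models is that once the model has been built, making predictions is easy and fast. With very large numbers of users and items, however, storing and computing the two factor matrices shown above becomes challenging.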

Implicit matrix factorization

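So far we have discussed explicit preference information such as ratings. In most applications, however, such information is not available; what we mostly collect is implicit feedback. Implicit feedback does not state how much a user likes an item, but a preference can still be modeled from the user's interactions with items, for example binary signals such as whether the user viewed or bought a product, or counts such as how many times the user watched a movie.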

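MLlib provides a way to handle this kind of implicit data: the input ratings matrix can be viewed as two matrices, a binary preference matrix P and a confidence-weight matrix C.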

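For example, suppose our website has no way to rate movies, and the logs only tell us whether a user watched a movie; with some post-processing we can also find how many times they watched it. Here P records whether a user has watched a movie, and C holds the confidence weights, i.e., the number of times the movie was watched: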
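A minimal sketch with made-up view counts. The `1 + alpha * count` confidence below is the formulation from the implicit-feedback paper by Hu, Koren and Volinsky that MLlib's documentation cites for its implicit ALS; treat it as an assumption relative to this article, which simply describes C as the raw counts. `views` and `alpha` are hypothetical:

```python
import numpy as np

# Hypothetical view counts: rows are users, columns are movies, 0 = never watched.
views = np.array([[3, 0, 1],
                  [0, 0, 5],
                  [1, 2, 0]])

# Binary preference matrix P: 1 if the user watched the movie at all.
P = (views > 0).astype(float)

# Confidence matrix C. Assumed variant: C = 1 + alpha * views, so unwatched
# cells still get a baseline confidence of 1 and often-watched movies weigh more.
alpha = 0.5
C = 1.0 + alpha * views

# Element-wise, P * views gives back the raw counts (P is 0 exactly where views is 0).
assert np.array_equal(P * views, views)
print(P)
print(C)
```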

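Taken together, P and C replace the rating matrix used before (multiplying them element-wise gives back the view counts). The model then learns user and item vectors that approximate the preference matrix P, weighting each entry by its confidence in C, so the dot product of a user vector and an item vector estimates the user's preference for the item rather than a rating.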

Alternating least squares

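Alternating least squares (ALS) is an optimization technique for solving the MF problem. It has proved efficient and performant, and it parallelizes well; at the time of writing it is the only algorithm in MLlib's recommendation module. The Spark website describes it in detail.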
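To show what "alternating" means, a toy single-machine sketch in NumPy for explicit ratings with an L2 penalty `lam` (the functions `als` and `rmse` and all parameter names are hypothetical; this is not MLlib's distributed implementation). With the item factors fixed, each user's vector is an ordinary ridge-regression solution over the items that user rated; then the roles swap:

```python
import numpy as np

def als(R, mask, k=2, lam=0.01, iterations=50, seed=0):
    """Toy explicit-feedback ALS (sketch).

    R    : (n_users, n_items) ratings; cells where mask is False are ignored.
    mask : boolean array, True where a rating is observed.
    Returns user factors X (n_users x k) and item factors Y (n_items x k),
    with R approximated by X @ Y.T.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, k))
    Y = rng.normal(scale=0.1, size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iterations):
        # Fix Y: solve a small ridge regression for each user.
        for u in range(n_users):
            Yu = Y[mask[u]]                      # factors of items user u rated
            X[u] = np.linalg.solve(Yu.T @ Yu + reg, Yu.T @ R[u, mask[u]])
        # Fix X: solve the same kind of problem for each item.
        for i in range(n_items):
            Xi = X[mask[:, i]]                   # factors of users who rated item i
            Y[i] = np.linalg.solve(Xi.T @ Xi + reg, Xi.T @ R[mask[:, i], i])
    return X, Y

def rmse(R, mask, X, Y):
    """Root mean squared error over the observed cells only."""
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Small example: an exact rank-2 "rating" matrix with three cells hidden.
A = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
B = np.array([[1, 2, 0, 1], [0, 1, 2, 1]], dtype=float)
R = A @ B
mask = np.ones(R.shape, dtype=bool)
mask[0, 3] = mask[2, 1] = mask[4, 0] = False

X, Y = als(R, mask)
print("observed RMSE:", rmse(R, mask, X, Y))
print("hidden cell (0, 3): predicted", (X @ Y.T)[0, 3], "true", R[0, 3])
```

Each inner step is a closed-form k×k solve, which is why the per-user and per-item updates can be distributed independently.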

Feature extraction

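Feature extraction means finding useful data in what we already have in order to build a model. This article uses explicit data, namely users' ratings of movies, from the standard MovieLens dataset available online. The code below is a Python rewrite of the code in the book Machine Learning with Spark; a dedicated IPython notebook will be put on GitHub.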

```python
# sc is the SparkContext (e.g. the one the pyspark shell provides)
rawData = sc.textFile("../data/ML_spark/MovieLens/u.data")
print(rawData.first())
rawRatings = rawData.map(lambda x: x.split('\t'))
rawRatings.take(5)
```

The tab-separated fields are userId, itemId, rating, and timestamp.
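Parsing one line can be checked locally without Spark. In this sketch `Rating` is a stand-in namedtuple with the same three fields as MLlib's `Rating` (user, product, rating), `parse_line` is a hypothetical helper, and the sample line is simply a line in the u.data layout:

```python
from collections import namedtuple

# Local stand-in for pyspark.mllib.recommendation.Rating (same field names).
Rating = namedtuple("Rating", ["user", "product", "rating"])

def parse_line(line):
    """Split one tab-separated u.data line into a Rating, dropping the timestamp."""
    user, item, rating, _timestamp = line.split("\t")
    return Rating(int(user), int(item), float(rating))

sample = "196\t242\t3\t881250949"   # sample line in the u.data layout
print(parse_line(sample))  # -> Rating(user=196, product=242, rating=3.0)
```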

```python
from pyspark.mllib.recommendation import Rating
from pyspark.mllib.recommendation import ALS

# Keep userId, itemId and rating; the timestamp (x[3]) is not needed
ratings = rawRatings.map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
print(ratings.first())
```

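This converts the data into the format needed for modeling and imports the Rating and ALS classes. The main training entry point is ALS.train(ratings, rank, iterations=5, lambda_=0.01, ...); see the ALS entry in the PySpark API documentation for the full parameter list.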

Here rank is the k of the latent feature model above, i.e., the number of latent features; in the experiment below we set it to 50:

```python
model = ALS.train(ratings, 50)  # rank = 50
# Implicit-feedback variant:
# modelImplicit = ALS.trainImplicit(ratings, 50, alpha=0.02)
userFeatures = model.userFeatures()
print(userFeatures.take(2))
```

Here users 1 and 2 are each represented by a 50-dimensional vector, i.e., their rows in the U×k matrix above.

```python
predictRating = model.predict(789, 123)
print(predictRating)
```

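This predicts user 789's rating for item 123; the result here was 3.76599662082 (the exact value varies between runs, because ALS initializes its factors randomly).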

```python
userId = 789
K = 10

# Top-K recommendations for the user, highest predicted rating first
topKRecs = model.recommendProducts(userId, K)
for rec in topKRecs:
    print(rec)

# All movies this user rated, sorted from highest to lowest rating
moviesForUser = ratings.groupBy(lambda x: x.user).mapValues(list).lookup(userId)
for i in sorted(moviesForUser[0], key=lambda x: x.rating, reverse=True):
    print(i.product)
```

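recommendProducts returns the top 10 items recommended for the user, in descending order of predicted rating. moviesForUser holds user 789's own ratings from the data, sorted from highest to lowest. Looking closely, none of the recommended movies appear among the movies this user rated. Does that mean our algorithm is no good? Not necessarily. Think about it: a recommender that only recommends movies you have already watched is a failure and would do nothing for the system's KPIs. As mentioned earlier, MF uses latent features to find movies that share hidden characteristics with the movies the user rated highly in the past (learned from the collective behavior of all users), such as a certain genre or the same actors. So this does not show that the recommender is poor, although the results are admittedly hard to interpret.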
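To make the ranking step concrete, a local sketch (hypothetical helper `recommend_top_k`, made-up vectors, not MLlib's `recommendProducts`) that scores every item by the dot product of the user vector and each item vector and keeps the k best. Whether already-rated items are skipped is a separate design choice, exposed here as an explicit `exclude` parameter:

```python
import numpy as np

def recommend_top_k(user_vec, item_ids, item_matrix, k, exclude=()):
    """Rank items by dot(user_vec, item_vec); return the top k (id, score) pairs.

    exclude: item ids to skip, e.g. items the user has already rated.
    """
    scores = item_matrix @ user_vec          # one predicted score per item
    order = np.argsort(-scores)              # highest score first
    excluded = set(exclude)
    result = []
    for idx in order:
        if item_ids[idx] in excluded:
            continue
        result.append((item_ids[idx], float(scores[idx])))
        if len(result) == k:
            break
    return result

item_ids = [10, 20, 30, 40]
item_matrix = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0],
                        [0.5, 0.2]])
user_vec = np.array([2.0, 1.0])

print(recommend_top_k(user_vec, item_ids, item_matrix, k=2))
print(recommend_top_k(user_vec, item_ids, item_matrix, k=2, exclude=[30]))
```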

Item recommendations

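With MF, we can use the k×I item matrix from earlier to compute the similarity between two item vectors, i.e., the similarity between items. This makes "similar items" recommendations easy to build. Here we define the similarity function as cosine similarity: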

```python
import numpy as np

def cosineSImilarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

```python
testx = np.array([1.0, 2.0, 3.0])
print(cosineSImilarity(testx, testx))  # a vector's similarity with itself is 1 (up to rounding)
```

Next, look up the vector that ALS learned for a given item:

```python
itemId = 567
itemFactor = model.productFeatures().lookup(itemId)[0]
print(itemFactor)

# Cosine similarity between every item's factor vector and item 567's
sims = model.productFeatures().map(
    lambda x: (x[0], cosineSImilarity(np.array(x[1]), np.array(itemFactor))))
```

```python
sims.sortBy(lambda x: x[1], ascending=False).take(10)
```

We look up the ALS vector for itemId 567, compute the similarity between every item's feature vector and that of item 567, sort in descending order, and take the top 10.

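This gives the list of items most similar to item 567 (the first entry is item 567 itself, since an item's similarity with itself is 1).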
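The same idea on a small local matrix, as a sketch: `most_similar` is a hypothetical helper, the vectors are made up, rows are assumed to be non-zero, and the query item is dropped from its own result list:

```python
import numpy as np

def most_similar(item_id, item_ids, item_matrix, n=10):
    """Top-n items by cosine similarity to item_id, excluding the item itself."""
    idx = item_ids.index(item_id)
    target = item_matrix[idx]
    # Cosine similarity of every row with the target row (rows assumed non-zero).
    sims = item_matrix @ target / (np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(target))
    order = np.argsort(-sims)               # most similar first
    return [(item_ids[j], float(sims[j])) for j in order if j != idx][:n]

item_ids = [1, 2, 3, 4]
item_matrix = np.array([[1.0, 0.0],
                        [2.0, 0.1],
                        [0.0, 1.0],
                        [-1.0, 0.0]])
print(most_similar(1, item_ids, item_matrix, n=2))
```

Item 2 points in almost the same direction as item 1 despite its larger length, which is the point of normalizing by the vector norms.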

Measuring recommender performance

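How do we judge how well our model performs? Common choices include Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), but these metrics ignore the order of the recommended items. In practice, Mean Average Precision (MAP), which takes the ranking into account, is used quite often.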

MSE & RMSE:

```python
userProducts = ratings.map(lambda rating: (rating.user, rating.product))
print(userProducts.take(1)[0])
predictions = model.predictAll(userProducts).map(
    lambda rating: ((rating.user, rating.product), rating.rating))
print(predictions.take(5))
```

```python
ratingsAndPredictions = (ratings
                         .map(lambda rating: ((rating.user, rating.product), rating.rating))
                         .join(predictions))
```

```python
import math

# Each element is ((user, product), (actual, predicted))
MSE = ratingsAndPredictions.map(lambda x: math.pow(x[1][0] - x[1][1], 2)).reduce(lambda x, y: x + y) / ratingsAndPredictions.count()
print(MSE)
print(math.sqrt(MSE))
```

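First, map the ratings data to (user, item) pairs and call predictAll on them to get the model's estimated rating for each pair. Then use join to match the predictions with the actual ratings by their (user, item) key and compute the squared error of each pair; the average is the MSE, and its square root is the RMSE.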
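The same computation on a plain Python list, handy for checking the Spark result by hand. `mse_rmse` is a hypothetical helper and the (actual, predicted) pairs are made up, shaped like the values inside ratingsAndPredictions:

```python
import math

def mse_rmse(pairs):
    """pairs: iterable of (actual_rating, predicted_rating). Returns (MSE, RMSE)."""
    pairs = list(pairs)
    mse = sum((a - p) ** 2 for a, p in pairs) / len(pairs)
    return mse, math.sqrt(mse)

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so MSE = 1.5 / 4 = 0.375.
pairs = [(5.0, 4.5), (3.0, 3.5), (4.0, 4.0), (1.0, 2.0)]
print(mse_rmse(pairs))
```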

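Note: at this point some readers will surely ask: none of the items returned by recommendProducts earlier matched the user's rated movies, so why are we now measuring the model against those ratings? The two answer different questions. recommendProducts ranks all items by predicted score (the dot product of the user and item vectors) and returns the highest-scoring ones; the movies the user rated highly simply do not end up with the very highest predicted scores. MSE and RMSE, on the other hand, measure how close the predicted values are to the known ratings. So there is no contradiction.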

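APK (average precision at K):

What is APK? Implementations of many metrics are available in R, Matlab, and Python, and Kaggle has a write-up on APK. The logic is simple: unlike MSE and RMSE, it accounts for how the ranking of the recommendations affects the final metric; the closer to the top a relevant item is retrieved, the higher the score.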
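In formula form, matching the avgPrecisionK function in this section (notation introduced here: A is the set of actual relevant items, n the length of the predicted list, rel(i) = 1 if the i-th predicted item is in A and has not appeared earlier in the list, else 0, and hits(i) the number of such relevant items among the first i predictions):

$$
\mathrm{AP@}K = \frac{1}{\min(|A|, K)} \sum_{i=1}^{\min(n, K)} \mathrm{rel}(i) \cdot \frac{\mathrm{hits}(i)}{i}
$$

MAP is the mean of AP@K over all users. (avgPrecisionK returns 1.0 when A is empty.)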

```python
def avgPrecisionK(actual, predicted, k=10):
    if len(predicted) > k:
        predicted = predicted[:k]
    score = 0.0
    num_hits = 0.0
    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    if not actual:
        return 1.0
    return score / min(len(actual), k)
```

```python
itemFactors = model.productFeatures().map(lambda x: x[1]).collect()
```

```python
itemMatrix = np.array(itemFactors)
```

       <textarea wrap="soft" class="crayon-plain print-no" data-settings="dblclick" readonly="readonly" style="margin: 0px; padding-top: 0px; padding-right: 5px; padding-left: 5px; width: 823px; overflow: hidden; height: 15px; position: absolute; opacity: 0; border: 0px; border-radius: 0px; box-shadow: none; white-space: pre; word-wrap: normal; resize: none; color: rgb(0, 0, 0); tab-size: 4; z-index: 0; font-family: Monaco, MonacoRegular, 'Courier New', monospace !important; font-size: 12px !important; line-height: 15px !important; background-image: initial; background-attachment: initial; background-size: initial; background-origin: initial; background-clip: initial; background-position: initial; background-repeat: initial;"></textarea> 
     
             1 
           
            imBroadcast 
              
            = 
              
            sc 
            . 
            broadcast 
            ( 
            itemMatrix 
            )

拿到product的所有向量表示，初始化矩阵，然后broadcast到各个节点。

       <textarea wrap="soft" class="crayon-plain print-no" data-settings="dblclick" readonly="readonly" style="margin: 0px; padding-top: 0px; padding-right: 5px; padding-left: 5px; width: 823px; overflow: hidden; height: 105px; position: absolute; opacity: 0; border: 0px; border-radius: 0px; box-shadow: none; white-space: pre; word-wrap: normal; resize: none; color: rgb(0, 0, 0); tab-size: 4; z-index: 0; font-family: Monaco, MonacoRegular, 'Courier New', monospace !important; font-size: 12px !important; line-height: 15px !important; background-image: initial; background-attachment: initial; background-size: initial; background-origin: initial; background-clip: initial; background-position: initial; background-repeat: initial;"></textarea> 
     
 
       
        
         
             1 
           

             2 
           

             3 
           

             4 
           

             5 
           

             6 
           

             7 
           
 
          
            userVector 
              
            = 
              
            model 
            . 
            userFeatures 
            ( 
            ) 
            . 
            map 
            ( 
            lambda 
              
            ( 
            userId 
            , 
            array 
            ) 
            : 
            ( 
            userId 
            , 
            np 
            . 
            array 
            ( 
            array 
            ) 
            ) 
            ) 
           
 
            # print userVector[0] 
           
 
            userVector 
              
            = 
              
            userVector 
            . 
            map 
            ( 
            lambda 
              
            ( 
            userId 
            , 
            x 
            ) 
            : 
              
           
 
                         
            ( 
            userId 
            , 
            imBroadcast 
            . 
            value 
            . 
            dot 
            ( 
            ( 
            np 
            . 
            array 
            ( 
            x 
            ) 
            . 
            transpose 
            ( 
            ) 
            ) 
            ) 
            ) 
            ) 
           
 
            userVectorId 
              
            = 
              
            userVector 
            . 
            map 
            ( 
            lambda 
              
            ( 
            userId 
            , 
            x 
            ) 
              
            : 
              
            ( 
            userId 
            , 
            [ 
            ( 
            xx 
            , 
            i 
            ) 
              
            for 
              
            i 
            , 
            xx  
            in 
              
            enumerate 
            ( 
            x 
            . 
            tolist 
            ( 
            ) 
            ) 
            ] 
            ) 
            ) 
           
 
            sortUserVectorId 
              
            = 
              
            userVectorId 
            . 
            map 
            ( 
            lambda 
              
            ( 
            userId 
            , 
            x 
            ) 
            : 
            ( 
            userId 
            , 
            sorted 
            ( 
            x 
            , 
            key 
            = 
            lambda 
              
            x 
            : 
            x 
            [ 
            0 
            ] 
            , 
            reverse 
            = 
            True 
            ) 
            ) 
            ) 
           
 
            sortUserVectorRecId 
              
            = 
              
            sortUserVectorId 
            . 
            map 
            ( 
            lambda 
              
            ( 
            userId 
            , 
            x 
            ) 
            : 
              
            ( 
            userId 
            , 
            [ 
            xx 
            [ 
            1 
            ] 
              
            for 
              
            xx  
            in 
              
            x 
            ] 
            ) 
            ) 
           
 
        
 
       
     

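For every user we produce a recommended item list. The user's predicted rating for an item is the dot product of the user vector and that item's vector. The items are sorted by this score in descending order, which gives an ordered item list.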
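The scoring and sorting step can be checked on a small example without Spark. Below is a minimal NumPy sketch for a single user. The helper name `rank_items` and the toy factor values are hypothetical. As in the pipeline above, it assumes that row i of the item matrix belongs to item `item_ids[i]`.

```python
import numpy as np

def rank_items(user_factor, item_matrix, item_ids):
    """Return item IDs ordered by predicted score, highest first.

    user_factor: shape (rank,); item_matrix: shape (num_items, rank);
    item_ids[i] is the ID of the item stored in row i of item_matrix.
    """
    scores = item_matrix.dot(user_factor)        # one predicted rating per item
    order = np.argsort(-scores, kind="stable")   # row indices, best score first
    return [item_ids[i] for i in order]

# Hypothetical toy data: 3 movies with rank-2 factors
item_matrix = np.array([[1.0, 0.0],
                        [0.5, 0.5],
                        [0.0, 1.0]])
item_ids = [10, 20, 30]
user_factor = np.array([0.2, 0.9])
print(rank_items(user_factor, item_matrix, item_ids))
```

`np.argsort` on the negated scores with a stable sort gives the same order as sorting (score, index) pairs with `reverse=True`. It also avoids building the intermediate list of tuples.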

```python
# Ground truth: the list of movies each user actually rated
userMovies = ratings.map(lambda r: (r.user, r.product)).groupByKey().mapValues(list)
# APK at K=2000 per user, using the avgPrecisionK(actual, predicted, k) function defined earlier
allAPK = sortUserVectorRecId.join(userMovies).map(
    lambda kv: avgPrecisionK(kv[1][1], kv[1][0], 2000))
# Mean over all users
print(allAPK.mean())
```

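Next, for each user we take the list of items they actually rated from the ratings data. We pass it, together with the predicted list, into the APK function we wrote earlier (with K=2000) and average over all users. In my run the final result was 0.115484271925.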
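For readers who want to check the numbers by hand, here is a minimal pure-Python sketch of APK@K and of averaging it over users. It follows a commonly used APK definition. The function names and toy data are hypothetical, and the author's own avgPrecisionK is defined earlier and may differ in edge cases. Returning 0.0 for a user with no relevant items is an assumed convention.

```python
def apk(actual, predicted, k):
    """Average precision at k for one user."""
    if not actual:
        return 0.0  # assumed convention: no relevant items scores 0
    actual_set = set(actual)
    hits, score = 0, 0.0
    for i, p in enumerate(predicted[:k]):
        if p in actual_set:
            hits += 1
            score += hits / (i + 1.0)  # precision at this rank
    return score / min(len(actual), k)

def mean_apk(actual_by_user, predicted_by_user, k):
    """Mean APK over the users present in both dicts."""
    users = set(actual_by_user) & set(predicted_by_user)
    return sum(apk(actual_by_user[u], predicted_by_user[u], k) for u in users) / float(len(users))

actual_by_user = {1: [2, 4], 2: [5]}
predicted_by_user = {1: [1, 2, 3, 4], 2: [5, 6]}
print(mean_apk(actual_by_user, predicted_by_user, 4))
```

`mean_apk` only averages over users present in both dictionaries. This mirrors what the RDD `join` above does.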

We can also evaluate the model directly with MLlib's built-in evaluation module, for example by computing MSE and RMSE:

```python
from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics

# Each element is ((userId, product), (predicted, actual));
# RegressionMetrics expects an RDD of (prediction, observation) pairs
predictedAndTrue = ratingsAndPredictions.map(lambda kv: (kv[1][0], kv[1][1]))
# print(predictedAndTrue.take(1))
regressionMetrics = RegressionMetrics(predictedAndTrue)
print("Mean Squared Error = %f" % regressionMetrics.meanSquaredError)
print("Root Mean Squared Error = %f" % regressionMetrics.rootMeanSquaredError)
```
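RegressionMetrics reports the mean of the squared prediction errors (MSE) and its square root (RMSE). Below is a standalone sketch of the same two quantities on plain Python pairs. It can be used to sanity-check the printed values on a small sample, for example one obtained with `predictedAndTrue.take(...)`. The name `mse_rmse` and the toy numbers are hypothetical.

```python
import math

def mse_rmse(pairs):
    """pairs: iterable of (predicted, actual) ratings; returns (MSE, RMSE)."""
    sq_errors = [(p - a) ** 2 for p, a in pairs]
    mse = sum(sq_errors) / float(len(sq_errors))
    return mse, math.sqrt(mse)

mse, rmse = mse_rmse([(1.0, 3.0), (5.0, 3.0)])
print("MSE = %f, RMSE = %f" % (mse, rmse))
```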

MAP:

```python
# MAP
# The K-cutoff metric in RankingMetrics (precisionAt) is computed differently
# from our APK function, so results at a small K will not match. However, the
# overall mean average precision (MAP, which uses no cutoff at K) equals our
# function if K is very high (at least the number of items in our item set).
# Each element after the join is (userId, (predicted, actual))
sortedLabels = sortUserVectorRecId.join(userMovies).map(lambda kv: (kv[1][0], kv[1][1]))
# print(sortedLabels.take(1))
rankMetrics = RankingMetrics(sortedLabels)
print("Mean Average Precision = %f" % rankMetrics.meanAveragePrecision)
print("Precision at K=10 = %f" % rankMetrics.precisionAt(10))
```

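This MAP equals our earlier result with K=2000, which shows that our computation agrees with MLlib. For K=10 or other small values the numbers differ. The reason is that MLlib's precisionAt(k) uses different logic from our function: it computes precision at K rather than average precision at K. We do not pursue this further here.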
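To see why the small-K numbers differ, compare precision at K with average precision at K on two toy rankings of the same items. Precision at K is the fraction of the top K recommendations that are relevant; like MLlib's precisionAt, the version below divides by K. Both helper names and the data are hypothetical. The APK normalizer min(len(actual), k) is a common convention that may not match every implementation.

```python
def precision_at_k(actual, predicted, k):
    """Share of the top-k predictions that are relevant; always divides by k."""
    actual_set = set(actual)
    return sum(1 for p in predicted[:k] if p in actual_set) / float(k)

def avg_precision_at_k(actual, predicted, k):
    """APK: precision is accumulated at each rank where a relevant item appears."""
    if not actual:
        return 0.0  # assumed convention for users with no relevant items
    actual_set = set(actual)
    hits, score = 0, 0.0
    for i, p in enumerate(predicted[:k]):
        if p in actual_set:
            hits += 1
            score += hits / (i + 1.0)
    return score / min(len(actual), k)

actual = [1, 3]
ranking_a = [1, 3, 2, 4, 5]  # relevant items ranked at the top
ranking_b = [2, 4, 5, 1, 3]  # same items, relevant ones ranked last
for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    print(name, precision_at_k(actual, ranking, 5), avg_precision_at_k(actual, ranking, 5))
```

Both rankings have the same precision at 5 but very different APK. A metric that ignores the order inside the top K therefore cannot reproduce APK at small K.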

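The code for this chapter is on GitHub as an IPython notebook that you can run directly. I wrote it while learning Spark, so it is far from polished, and the notebook contains no basic explanations of the code. It is essentially a Python rewrite of the Scala code from the corresponding part of the original book. If you like Python and Spark, feel free to study it. Working through it step by step should make you familiar with the workflow of driving Spark from Python.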
