大数据推荐系统(5)Mahout

大数据推荐系统算法(1)大数据框架介绍
大数据推荐系统算法(2) lambda架构
大数据推荐系统算法(3) 用户画像
大数据推荐系统(4)推荐算法
大数据推荐系统(5)Mahout
大数据推荐系统(6)Spark
大数据推荐系统(7)推荐系统与Lambda架构
大数据推荐系统(8)分布式数据收集和存储
大数据推荐系统(9)实战
开发环境:
Linux + Intellij IDEA(IDE) +SBT(Simple Build Tool)(项目管理工具) 和 Maven + 持续集成:Jenkins(Jenkins是基于Java开发的一种持续集成工具,用于监控持续重复的工作)

大数据推荐系统(5)Mahout_第1张图片
Spark 基于内存,图调度,算子简单。 scala
H2O 预测分析的平台
Flink 做流处理的平台 (也可做批处理)

Mahout架构:high-level
大数据推荐系统(5)Mahout_第2张图片

Mahout架构:low-level
大数据推荐系统(5)Mahout_第3张图片

Mahout 推荐系统
(1)Mahout实现了协同过滤框架
使用历史数据(打分,点击,购买等)作为推荐的依据
User-based: 通过发现类似的用户推荐商品。由于用户多变的特性,这种方法很那扩展;
Item-based:通过计算item之间相似度推荐商品。商品不易变化,相似度矩阵可离线计算得到。(诞生于Amazon)
 MF-based:通过将原始的user-item矩阵分解成小的矩阵,分析潜在的影响因子,并以解释用户的行为。(诞生于Netflix Prize)

(2)Mahout实现了协同过滤框架
SVD(Singular Value Decomposition)因式分解实现协同过滤
基于ALS(alternating least squares)的协同过滤算法 (NMF)

Mahout推荐系统架构
大数据推荐系统(5)Mahout_第4张图片
输入输出
输入:原始数据(user preferences,用户偏好)
输出:用户偏好估计
步骤
Step 1:将原始数据映射到Mahout定义的Data Model中 (U I P )
Step 2: 调优推荐组件
相似度组件,临界关系组件等
Step 3: 计算排名估计值
Step 4:评估推荐结果

Mahout推荐系统组件
Mahout关键抽象是通过Java Interface实现的
DataModel Interface
将原始数据映射成Mahout兼容格式
UserSimilarity Interface
计算两个用户间的相关度
ItemSimilarity Interface
计算两个商品间的相关度
UserNeighborhood Interface
定义用户或商品间的“临近”
Recommender Interface
实现具体的推荐算法,完成推荐功能(包括训练,预测等)

(1)DataModel
大数据推荐系统(5)Mahout_第5张图片

大数据推荐系统(5)Mahout_第6张图片
不管是什么数据源,他们共享同样的底层实现
基本对象:Preference
三元组(user, item, score)
存储在UserPreferenceArray中
大数据推荐系统(5)Mahout_第7张图片

(2)UserSimilarity
UserSimilarity定义了两个用户的相似度
类似的,ItemSimilarity定义了两个商品间的相似度
相似度实现
Pearson Correlation
Spearman Correlation
Euclidean Distance
Tanimoto Coefficient
LogLikelihood Similarity

(3)UserNeighborhood
大数据推荐系统(5)Mahout_第8张图片

推荐系统评估
大数据推荐系统(5)Mahout_第9张图片
第一种Prediction-based measures
大数据推荐系统(5)Mahout_第10张图片
第二种 IR-based measures
大数据推荐系统(5)Mahout_第11张图片
实例1:preferences
要求
创建user-item偏好数据,并输出
实现
使用GenericUserPreferenceArray创建数据
通过PreferenceArray存储数据

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class CreatePreferenceArray {
    private CreatePreferenceArray() {
    }

    public static void main(String[] args) {
        PreferenceArray User1Pref = new GenericUserPreferenceArray(2);
        User1Pref.setUserID(0, 1L);
        User1Pref.setItemID(0, 101L);
        User1Pref.setValue(0, 3.0f);
        User1Pref.setItemID(1, 102L);
        User1Pref.setValue(1, 4.0f);
        Preference pref = User1Pref.get(1);
        System.out.println(User1Pref);
    }
}

实例2:data model
PreferenceArray存储了单个用户的偏好
所有用户的偏好数据如何保存?
HashMap? NO!
Mahout引入了一个为推荐任务优化的数据结构
FastByIDMap
需求
使用GenericDataModel读入FastByIDMap数据

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class CreateGenericDataModel {
    private CreateGenericDataModel() {
    }

    public static void main(String[] args) {
        FastByIDMap preferences = new FastByIDMap();
        PreferenceArray User1Pref = new GenericUserPreferenceArray(2);
        User1Pref.setUserID(0, 1L);
        User1Pref.setItemID(0, 101L);
        User1Pref.setValue(0, 3.0f);
        User1Pref.setItemID(1, 102L);
        User1Pref.setValue(1, 4.0f);

        PreferenceArray User2Pref = new GenericUserPreferenceArray(2);
        User2Pref.setUserID(0, 2L);
        User2Pref.setItemID(0, 101L);
        User2Pref.setValue(0, 3.0f);
        User2Pref.setItemID(1, 102L);
        User2Pref.setValue(1, 4.0f);

        preferences.put(1L, User1Pref);
        preferences.put(2L, User2Pref);

        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);
    }
}

实例3:Recommender
需求
通过User-based协同过滤推荐算法给用户1推荐2个商品
实现
使用FileDataModel读入文件
通过PearsonCorrelationSimilarity来计算相似度
使用GenericUserBasedRecommender构建推荐引

package com.dylan.example;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;

import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;

import java.io.File;
import java.util.List;

public class RecommenderIntro {
    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception{
        DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        List recommendedItems = recommender.recommend(1, 20);

        for (RecommendedItem recommendedItem: recommendedItems){
            System.out.println(recommendedItem);
        }
    }
}

实例4 推荐模型评估(1)
需求
评估实例3的推荐系统的优劣
实现
使用AverageAbsoluteDifferenceRecommenderEvaluator和RMSRecommenderEvaluator来评估模型
通过RecommenderBuilder来实现评估模型

package com.dylan.example;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class EvaluatorIntro {
    private EvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {

        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderEvaluator recommenderEvaluator = new RMSRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        double rmse = recommenderEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);

        System.out.println(score);
        System.out.println(rmse);
    }
}

推荐模型评估(2)
需求
通过IR指标来评估实例3的推荐系统的优劣
实现
使用RecommenderIRStatsEvaluator来进行评估

package com.dylan.example;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class IREvaluatorIntro {
    private IREvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {

        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
        System.out.println(stats.getF1Measure());
    }
}

实例6:MovieLens推荐系统
需求
使用MovieLens 1M数据集实现电影推荐系统
步骤
实现MovieLens数据集的DataModel
实现Item-based和User-based的协同过滤推荐,并保存结果

1.新构建了 data
2.多线程来进行相似度矩阵的求解,得到similarities.csv的文件
3.对用户进行推荐 得到userRcomed.csv
Recommender cachingRecommender = new CachingRecommender(recommender); 做缓存的作用(数据量大的时候)

(1)把原始数据’::'分割的数据,转变成‘,’的数据

package com.dylan.MovieLens;

import org.apache.commons.io.Charsets;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import java.io.*;
import java.util.regex.Pattern;

public class MovieLensDataModel extends FileDataModel {

    private static String COLON_DELIMITER="::";
    private static Pattern COLON_DELIMITER_PATTERN=Pattern.compile(COLON_DELIMITER);

    public MovieLensDataModel(File ratingsFile) throws IOException{
        super(convertFile(ratingsFile));
    }

    private static File convertFile(File orginalFile) throws IOException{
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "ratings.csv");
        if (resultFile.exists()){
            resultFile.delete();
        }
        try(Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8)) {

            for (String line: new FileLineIterable(orginalFile, false)){
                int lastIndex = line.lastIndexOf(COLON_DELIMITER);

                if (lastIndex < 0 ){
                    throw new IOException("Invalid data!");
                }
                String subLine = line.substring(0, lastIndex);

                String convertedSubLine = COLON_DELIMITER_PATTERN.matcher(subLine).replaceAll(",");
                writer.write(convertedSubLine);
                writer.write('\n');
            }
        } catch (IOException ioe){
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}

(2)多线程批量生成结果。 批处理的方式。得到相似度文件。ICF可以用多线程处理(离线处理),UCF不行

package com.dylan.MovieLens;


import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;
import org.apache.mahout.cf.taste.similarity.precompute.SimilarItemsWriter;

import java.io.File;

public class BatchItemSimilaritiesMovieLens {
    private BatchItemSimilaritiesMovieLens(){
    }

    public static void main(String[] args) throws Exception{

        if (args.length !=1){
            System.err.println("Needs MovieLens 1M dataset as arugument!");
            System.exit(-1);
        }

        File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");

        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
        ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
        BatchItemSimilarities batchItemSimilarities = new MultithreadedBatchItemSimilarities(recommender, 5);

        SimilarItemsWriter writer = new FileSimilarItemsWriter(resultFile);

        int numSimilarites = batchItemSimilarities.computeItemSimilarities(Runtime.getRuntime().availableProcessors(), 1, writer);

        System.out.println("Computed "+ numSimilarites+ " for "+ dataModel.getNumItems()+" items and saved them to "+resultFile.getAbsolutePath());
    }
}

(3)基于user的推荐 得到userRcomed.csv 文件

package com.dylan.MovieLens;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class UserRecommenderMovieLens {
    private UserRecommenderMovieLens(){
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 1) {
            System.err.println("Needs MovieLens 1M dataset as arugument!");
            System.exit(-1);
        }

        File resultFile = new File(System.getProperty("java.io.tmpdir"), "userRcomed.csv");

        DataModel dataModel = new MovieLensDataModel(new File(args[0]));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);

        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);

        //Evaluate
        RMSRecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel dataModel) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, dataModel);
                return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
            }
        };
        double score = evaluator.evaluate(recommenderBuilder, null, dataModel, 0.9, 0.5);
        System.out.println("RMSE score is "+score);

        try(PrintWriter writer = new PrintWriter(resultFile)){
            for (int userID=1; userID <= dataModel.getNumUsers(); userID++){
                List recommendedItems = cachingRecommender.recommend(userID, 2);
                String line = userID+" : ";
                for (RecommendedItem recommendedItem: recommendedItems){
                    line += recommendedItem.getItemID()+":"+recommendedItem.getValue()+",";
                }
                if (line.endsWith(",")){
                    line = line.substring(0, line.length()-1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe){
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for "+dataModel.getNumUsers()+" users and saved them to "+resultFile.getAbsolutePath());
    }
}

实例7 常用开放数据集:Book-Crossing
1.内容
来自Book-Crossing图书社区,读者对书籍的评分
2.数据量(数据条数)
278858个用户对271379本书进行的评分,包括显式和隐式的评分
3.数据集下载
http://grouplens.org/datasets/book-crossing/

显示数据评分 1-10
大数据推荐系统(5)Mahout_第12张图片
隐式:点击,购买等

需求
使用BookCrossing数据集实现两种图书推荐系统
基于ratings推荐
无ratings推荐 (用布尔变量,点击为1,没点击为0)

步骤
实现BookCrossing数据集的DataModel
实现两套推荐系统
使用GenericBooleanPrefUserBasedRecommender
实现DataModelBuilder

(1)基于ratings推荐
数据处理:

package com.dylan.BookCrossing;

import org.apache.commons.io.Charsets;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import java.io.*;
import java.util.regex.Pattern;

public class BXDataModel extends FileDataModel {

    //private static String COLON_DELIMITER="::";
    private static Pattern NON_DIGIT_SEMICOLON_DELIMITER=Pattern.compile("[^0-9;]");

    public BXDataModel(File ratingsFile, Boolean ignoreRatings) throws IOException{
        super(convertFile(ratingsFile, ignoreRatings));
    }

    private static File convertFile(File orginalFile, Boolean ignoreRatings) throws IOException{
        File resultFile = new File(System.getProperty("java.io.tmpdir"), "bookcrossing.csv");
        if (resultFile.exists()){
            resultFile.delete();
        }
        try(Writer writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8)) {

            for (String line: new FileLineIterable(orginalFile, true)){
                if (line.endsWith("\"0\"")){
                    continue;
                }
                String convertedLine = NON_DIGIT_SEMICOLON_DELIMITER.matcher(line).replaceAll("").replace(';', ',');
                if (convertedLine.contains(",,")){
                    continue;
                }
                if (ignoreRatings){
                    convertedLine = convertedLine.substring(0, convertedLine.lastIndexOf(','));
                }

                writer.write(convertedLine);
                writer.write('\n');
            }
        } catch (IOException ioe){
            resultFile.delete();
            throw ioe;
        }
        return resultFile;
    }
}

准备一个 Recommender

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.util.Collection;
import java.util.List;

public class BXRecommender implements Recommender{
    private Recommender recommender;
    public BXRecommender(DataModel dataModel) throws TasteException{
        UserSimilarity similarity = new EuclideanDistanceSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, 0.2, similarity,dataModel, 0.2);
        recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
    }

    public List recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }

    public List recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    public List recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }

    @Override
    public List recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }

    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }

    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }

    public DataModel getDataModel() {
        return recommender.getDataModel();
    }

    @Override
    public void refresh(Collection collection) {
        recommender.refresh(collection);
    }
}

实现RecommenderBuilder

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class BXBooleanRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXBooleanRecommender(dataModel);
    }
}

实现 evaluator

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

import java.io.File;
import java.io.IOException;

public class BXBooleanRecommenderEvaluator {
    private BXBooleanRecommenderEvaluator(){
    }

    public static void main(String[] args) throws IOException, TasteException {
/*
        DataModel dataModel = new BXDataModel(new File(args[0]), true);
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        IRStatistics stats = evaluator.evaluate(new BXBooleanRecommenderBuilder(), new BXDataModelBuilder(), dataModel, null, 3, Double.NEGATIVE_INFINITY, 1.0);

        System.out.println("Precision is "+stats.getPrecision()+"; Recall is "+stats.getRecall()+"; F1 is"+stats.getF1Measure());
*/
        DataModel dataModel = new BXDataModel(new File(args[0]), true);
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double score = evaluator.evaluate(new BXBooleanRecommenderBuilder(), null, dataModel, 0.9, 0.3);

        System.out.println("MAE score is "+score);
    }
}

(2)无rating的数据
准备一个 Recommender

package com.dylan.BookCrossing;

import com.sun.tools.internal.xjc.reader.xmlschema.bindinfo.BIConversion;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.CachingUserSimilarity;

import java.util.Collection;
import java.util.List;

public class BXBooleanRecommender implements Recommender{
    private Recommender recommender;
    public BXBooleanRecommender(DataModel dataModel) throws TasteException{
        UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
        //UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, Double.NEGATIVE_INFINITY, similarity,dataModel, 1.0);
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
        recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
    }

    public List recommend(long userID, int howMany) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, false);
    }

    public List recommend(long userID, int howMany, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    public List recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
        return recommender.recommend(userID, howMany, rescorer, false);
    }

    @Override
    public List recommend(long userID, int howMany, IDRescorer idRescorer, boolean includeKnownItems) throws TasteException {
        return recommender.recommend(userID, howMany, (IDRescorer) null, includeKnownItems);
    }

    @Override
    public float estimatePreference(long userID, long itemID) throws TasteException {
        return recommender.estimatePreference(userID, itemID);
    }

    public void setPreference(long userID, long itemID, float value) throws TasteException {
        recommender.setPreference(userID, itemID, value);
    }

    public void removePreference(long userID, long itemID) throws TasteException {
        recommender.removePreference(userID, itemID);
    }

    public DataModel getDataModel() {
        return recommender.getDataModel();
    }

    @Override
    public void refresh(Collection collection) {
        recommender.refresh(collection);
    }
}

实现RecommenderBuilder

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class BXRecommenderBuilder implements RecommenderBuilder {
    @Override
    public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new BXRecommender(dataModel);
    }
}

实现 evaluator

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

import java.io.File;
import java.io.IOException;

public class BXRecommenderEvaluator {
    private BXRecommenderEvaluator(){
    }

    public static void main(String[] args) throws IOException, TasteException {
        DataModel dataModel = new BXDataModel(new File(args[0]), false);
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double score = evaluator.evaluate(new BXRecommenderBuilder(), null, dataModel, 0.9, 0.3);

        System.out.println("MAE score is "+score);
    }
}

实现DataModelBuilder接口

package com.dylan.BookCrossing;

import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

public class BXDataModelBuilder implements DataModelBuilder{
    @Override
    public DataModel buildDataModel(FastByIDMap fastByIDMap) {
        return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(fastByIDMap));
    }
}

优化 Recommender
1.
UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(dataModel), dataModel);
2.
UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, dataModel, 1.0);
3.
评分标准的改变

Mahout 推荐 推荐
要求:基于MySQL中的电影评分的数据,使用Mahout为每个用户推荐3部电影

  1. 准备 准备 数据库表
    (1)在mahout数据库中创建表:
use mahout;
CREATE TABLE taste_preferences (
user_id BIGINT NOT NULL,
item_id BIGINT NOT NULL,
preference FLOAT NOT NULL,
PRIMARY KEY (user_id, item_id),
INDEX (user_id),
INDEX (item_id)
);

并将 ratings.dat 前三列导入taste_preferences 表中。

LOAD DATA LOCAL INFILE
"/home/root/code/MahoutRecommendation/src/main/resources/ratings.
dat" INTO TABLE mahout.taste_preferences FIELDS TERMINATED BY
'::'(user_id,item_id,preference);
  1. 实现 实现 推荐 算法
    使用MySQLJDBCDataModel
package com.dylan.practice;

import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class MysqlDataMovieRecommend {
    private MysqlDataMovieRecommend() throws TasteException, IOException {
    }

    public static void main(String[] args) throws TasteException, IOException {
        File resultFile = new File("/tmp", "MysqlMovieRcomed.txt");
        //Mysql Connection
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setDatabaseName("mahout");
        mysqlDataSource.setServerName("127.0.0.1");
        mysqlDataSource.setUser("mahout");
        mysqlDataSource.setPassword("mahout");
        mysqlDataSource.setAutoReconnect(true);
        mysqlDataSource.setFailOverReadOnly(false);


        JDBCDataModel dataModel = new MySQLJDBCDataModel(mysqlDataSource, "taste_preferences", "user_id", "item_id", "preference", null);
        DataModel model = dataModel;

        //Recommendations
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        //UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, similarity, model, 1.0);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        try (PrintWriter writer = new PrintWriter(resultFile)) {
            for (int userID = 1; userID <= model.getNumUsers(); userID++) {
                List recommendedItems = recommender.recommend(userID, 3);
                String line = userID + " : ";
                for (RecommendedItem recommendedItem : recommendedItems) {
                    line += recommendedItem.getItemID() + ":" + recommendedItem.getValue() + ",";
                }
                if (line.endsWith(",")) {
                    line = line.substring(0, line.length() - 1);
                }
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException ioe) {
            resultFile.delete();
            throw ioe;
        }
        System.out.println("Recommended for " + model.getNumUsers() + " users and saved them to " + resultFile.getAbsolutePath());
    }
}

1.movie数据导入MySQL
2.同步MySQL和java IDE

File resultFile = new File("/tmp", "MysqlMovieRcomed.txt");
        //Mysql Connection
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setDatabaseName("mahout");
        mysqlDataSource.setServerName("127.0.0.1");
        mysqlDataSource.setUser("mahout");
        mysqlDataSource.setPassword("mahout");
        mysqlDataSource.setAutoReconnect(true);
        mysqlDataSource.setFailOverReadOnly(false);

3.生成JDBCDataModel

JDBCDataModel dataModel = new MySQLJDBCDataModel(mysqlDataSource, "taste_preferences", "user_id", "item_id", "preference", null);
DataModel model = dataModel;

4.recommender,生成文件

改进:
1.MySQL的配置文件
在这里插入图片描述

2.相似个数10
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);

你可能感兴趣的:(大数据推荐算法)