【IFB214】Data Mining Applications

Weight:
- This assignment counts for 50% of your final grade.
Assessment Type:
- Individual
Format Requirements:
- Your assignment should include a cover page with your student ID.
- The report should include an introduction, a response to each task, a conclusion, and
references. Appendix A provides a detailed report structure.
- Ensure that your assignment is typed, proofread, and professional in appearance.
- Use 'Times New Roman', 'Arial', or 'Calibri' font, size 12, with 1.5 spacing.
- Set the text alignment to 'Justified' so that it is flush with both the left and right margins.
- Employ Harvard Referencing when citing sources and creating references.
Submission Requirements:
- All students are required to submit one document: a PDF file of the individual report.
- Submit your files through Learning Mall Online in the designated drop box. Only electronic
submissions are accepted.
- Name your file IFB214TC-CW2-Your Student ID.
- After submission, download your file and confirm that it opens correctly. Please note that document
corruption during uploading (e.g., due to a slow internet connection) is the student's responsibility.
Submitted files should be functional and correct for assessment purposes.
Word Limit:
- Please ensure your report stays within ±10% of the 1,500-word limit. Use tables and charts to
keep the report concise and avoid excessive length.
In the era of online streaming platforms, movie recommendation systems play a pivotal role in enhancing
user experience. These systems rely on data mining techniques to analyze customer preferences and provide
tailored movie suggestions. As a data analyst, you are tasked with exploring a movie recommendation
dataset that includes customer IDs, movie IDs, and movie ratings. By employing Python and data mining
techniques, you can uncover valuable insights that contribute to better movie recommendations and user
satisfaction.
Your mission is to thoroughly analyze the provided movie recommendation dataset using Python and
various data mining techniques. The dataset consists of customer IDs, movie IDs, and movie ratings,
representing the interactions between customers and movies. For each task, you are required to provide
clear explanations, Python code implementations, and relevant visualizations. Include screenshots of
your Python code in the report for reference.
Task 1: Customer Preferences and Ratings (10 Marks)
1. Identify and list movies that have the highest average ratings across all customers.
2. Detect customers who consistently rate movies positively or negatively.
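As a minimal sketch of how both parts of Task 1 could be approached with pandas, the example below works on a tiny made-up table. The column names (customer_id, movie_id, rating) and the cutoffs POSITIVE_CUTOFF and NEGATIVE_CUTOFF are assumptions for illustration only; the allocated file may use different headers, and the brief does not define what "consistently" means.

```python
import pandas as pd

# Tiny made-up dataset; column names are assumed, not taken from the real file.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "movie_id":    [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "rating":      [5, 4, 5, 1, 2, 1, 3, 5, 2],
})

# Part 1: average rating (and number of ratings) per movie, best first.
movie_stats = (
    ratings.groupby("movie_id")["rating"]
    .agg(mean_rating="mean", n_ratings="count")
    .sort_values("mean_rating", ascending=False)
)

# Part 2: per-customer average, compared against hypothetical cutoffs.
customer_stats = ratings.groupby("customer_id")["rating"].agg(
    mean_rating="mean", n_ratings="count"
)
POSITIVE_CUTOFF = 4.0  # assumed definition of "consistently positive"
NEGATIVE_CUTOFF = 2.0  # assumed definition of "consistently negative"
positive_raters = customer_stats[customer_stats["mean_rating"] >= POSITIVE_CUTOFF]
negative_raters = customer_stats[customer_stats["mean_rating"] <= NEGATIVE_CUTOFF]

print(movie_stats)
print(positive_raters)
print(negative_raters)
```

Keeping n_ratings next to mean_rating makes it easy to see when a high average rests on only a handful of ratings.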
Task 2: Movie Popularity and Ratings (15 Marks)
1. Determine the top 10 most popular movies based on the number of ratings they received.
2. Investigate whether there is a correlation between a movie's popularity (number of ratings) and its average
rating.
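One possible way to approach Task 2 is sketched below on made-up data. The column names are assumptions, and the Pearson coefficient (the pandas default) is only one choice of correlation measure; a rank-based measure may suit skewed rating counts better.

```python
import pandas as pd

# Made-up ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 1, 2, 3, 1, 2, 1],
    "movie_id":    [10, 10, 10, 10, 20, 20, 20, 30, 30, 40],
    "rating":      [5, 4, 5, 4, 4, 3, 4, 3, 2, 1],
})

# Popularity (number of ratings) and average rating for every movie.
movie_stats = ratings.groupby("movie_id")["rating"].agg(
    n_ratings="count", mean_rating="mean"
)

# Part 1: the (up to) ten movies with the most ratings.
top10 = movie_stats.sort_values("n_ratings", ascending=False).head(10)

# Part 2: Pearson correlation between popularity and average rating.
corr = movie_stats["n_ratings"].corr(movie_stats["mean_rating"])

print(top10)
print(f"Pearson correlation: {corr:.3f}")
```

A scatter plot of n_ratings against mean_rating is a natural companion to the single correlation figure.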
Task 3: Outlier Detection and Anomalies (10 Marks)
Identify customers who consistently provide extreme ratings (e.g., always giving the lowest or highest
ratings). Explore the potential impact of these outliers on the recommendation system.
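The sketch below shows one way extreme raters might be flagged and their influence measured, using made-up data. The column names and the MIN_RATINGS threshold are assumptions; the lowest and highest ratings are read from the data rather than assuming a particular rating scale.

```python
import pandas as pd

# Made-up ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "movie_id":    [10, 20, 30, 10, 20, 30, 10, 20, 30, 10],
    "rating":      [5, 5, 5, 1, 1, 1, 2, 4, 3, 5],
})

lowest, highest = ratings["rating"].min(), ratings["rating"].max()
MIN_RATINGS = 3  # hypothetical: ignore customers with too few ratings to judge

per_customer = ratings.groupby("customer_id")["rating"].agg(["min", "max", "count"])
enough = per_customer["count"] >= MIN_RATINGS

# A customer whose minimum equals the top score only ever gave the top score.
always_high = per_customer[enough & (per_customer["min"] == highest)]
# A customer whose maximum equals the bottom score only ever gave the bottom score.
always_low = per_customer[enough & (per_customer["max"] == lowest)]

# Impact: how much each movie's average shifts when extreme raters are dropped.
extreme_ids = always_high.index.union(always_low.index)
mean_all = ratings.groupby("movie_id")["rating"].mean()
mean_filtered = (
    ratings[~ratings["customer_id"].isin(extreme_ids)]
    .groupby("movie_id")["rating"]
    .mean()
)
impact = (mean_all - mean_filtered).rename("shift_in_mean")

print(always_high, always_low, impact, sep="\n\n")
```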

 

Task 4: Clustering Analysis (20 Marks)
Apply the K-Means clustering algorithm to group customers based on their movie ratings. Use techniques
like Principal Component Analysis (PCA) for dimensionality reduction and visualize the resulting clusters
using scatter plots or other appropriate visualization methods.
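A minimal sketch of the clustering pipeline with scikit-learn follows, run on made-up data. The column names, filling unrated movies with 0, clustering on the PCA coordinates rather than the full matrix, and the choice of two clusters are all assumptions made for illustration; each of them deserves its own justification in a real analysis.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Made-up ratings with two obvious taste groups; column names are assumed.
ratings = pd.DataFrame({
    "customer_id": [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4,
    "movie_id":    [10, 20, 30, 40] * 4,
    "rating":      [5, 5, 1, 1,  5, 4, 1, 2,  1, 1, 5, 5,  2, 1, 4, 5],
})

# Customer x movie matrix; unrated movies are filled with 0 (an assumption).
matrix = ratings.pivot_table(
    index="customer_id", columns="movie_id", values="rating"
).fillna(0)

# Reduce to two principal components so the customers can be plotted in 2-D.
pca = PCA(n_components=2, random_state=0)
coords = pca.fit_transform(matrix.to_numpy())

# Group customers with K-Means on the reduced coordinates.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

clustered = pd.DataFrame(
    coords, index=matrix.index, columns=["pc1", "pc2"]
).assign(cluster=labels)
print(clustered)
```

The pc1 and pc2 columns can be passed to a scatter plot, colored by the cluster column, to visualize the groups. The number of clusters is usually chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.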
Task 5: Apriori Algorithm for Association Rules (25 Marks)
1. Preprocess the dataset to prepare it for the Apriori algorithm.
2. Apply the Apriori algorithm to discover frequent itemsets, representing movies that are frequently rated
together.
3. Provide a sample of the discovered frequent itemsets and associated association rules.
4. Discuss the insights gained from the results and propose how these insights could be utilized to enhance
movie recommendations.
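To make the steps above concrete, here is a small from-scratch sketch of Apriori on made-up data. It is written in plain Python for transparency and is not a library implementation. The column names, the helper names (support, apriori, rules), the thresholds, and the decision to treat every movie a customer rated as one transaction are all assumptions for illustration.

```python
from itertools import combinations

import pandas as pd

# Made-up ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "movie_id":    [10, 20, 30, 10, 20, 10, 20, 30, 20, 30],
    "rating":      [5, 4, 3, 4, 5, 2, 4, 5, 3, 4],
})

# Preprocessing: one transaction per customer = the set of movies they rated.
transactions = [
    frozenset(group.tolist())
    for _, group in ratings.groupby("customer_id")["movie_id"]
]
n_transactions = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every movie in itemset."""
    return sum(itemset <= t for t in transactions) / n_transactions


def apriori(min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        kept = {s: support(s) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    consequent = itemset - antecedent
                    lift = confidence / frequent[consequent]
                    found.append(
                        (set(antecedent), set(consequent), sup, confidence, lift)
                    )
    return found


frequent_itemsets = apriori(min_support=0.6)
found_rules = rules(frequent_itemsets, min_confidence=0.9)
print(frequent_itemsets)
print(found_rules)
```

A rule such as {10} -> {20} reads as "customers who rated movie 10 also rated movie 20". Lift compares the rule's confidence with how often the consequent appears overall, so a lift near 1 suggests the rule adds little beyond the consequent's general popularity.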
Overall Presentation and Code Quality (20 Marks)
Marks are awarded for the clarity and coherence of your analysis and for proper documentation and
organization of your code. Pay attention to code readability and efficiency.
Note: Each student will be allocated a unique .xls file, which will be uploaded to LMO in due course. The
sample data structure is shown in Table 1.

 
