Covers: theory of basic similarity-based methods
Estimated time needed to finish: 30 minutes
Questions this item addresses:
  • What are user-based and item-based models, and how do they perform relative to each other?
  • How can we build a recommender system on similarity measures such as cosine similarity and Pearson correlation? What does amplifying a similarity function do?
  • What is mean centering and why do we use it? What are the common alternatives?
  • How does inverse user frequency help to handle the long tail? Why should we handle the long tail in the first place?
  • What is the computational complexity of the overall system? How should we understand the offline and online phases?
How to use this item?

Section 2.3.1 - explains similarity computation with the help of an example. More: https://stats.stackexchange.com/questions/235673/is-there-any-relationship-among-cosine-similarity-pearson-correlation-and-z-sc, https://stackoverflow.com/questions/1838806/euclidean-distance-vs-pearson-correlation-vs-cosine-similarity
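
As a rough illustration of the similarity computations this section refers to, here is a minimal Python sketch of user-user cosine similarity, Pearson correlation (mean-centered), and similarity amplification. The toy ratings, the function names, and the choice to compute each user's mean over all of that user's observed ratings are assumptions made for illustration; the sign-preserving form of amplification is likewise an assumption and may differ from the referenced text.

```python
import math

# Hypothetical toy ratings: one list per user, None marks an unobserved rating.
ratings = [
    [5, 4, None, 1],
    [4, 5, 2, None],
    [1, None, 5, 4],
]

def user_mean(u):
    """Mean of a user's observed ratings (assumes at least one rating)."""
    observed = [r for r in u if r is not None]
    return sum(observed) / len(observed)

def co_rated(u, v):
    """Pairs of ratings for the items that both users have rated."""
    return [(a, b) for a, b in zip(u, v) if a is not None and b is not None]

def normalized_dot(xs, ys):
    """Dot product divided by the product of the vector norms (0.0 if undefined)."""
    denom = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(xs, ys)) / denom

def cosine_sim(u, v):
    """Cosine similarity on raw ratings over co-rated items."""
    pairs = co_rated(u, v)
    return normalized_dot([a for a, _ in pairs], [b for _, b in pairs])

def pearson_sim(u, v):
    """Pearson correlation: mean-center each user's ratings, then compare."""
    pairs = co_rated(u, v)
    if not pairs:
        return 0.0
    mu_u, mu_v = user_mean(u), user_mean(v)
    return normalized_dot([a - mu_u for a, _ in pairs], [b - mu_v for _, b in pairs])

def amplify(sim, alpha=2.0):
    """Raise |sim| to the power alpha (alpha > 1), keeping the sign (an assumption)."""
    return math.copysign(abs(sim) ** alpha, sim)

print(cosine_sim(ratings[0], ratings[2]))
print(pearson_sim(ratings[0], ratings[2]))
print(amplify(pearson_sim(ratings[0], ratings[1])))
```

In this toy data, users 0 and 2 have opposite tastes: raw cosine similarity still comes out positive because all ratings are positive, while the mean-centered Pearson value is negative. Amplifying with alpha > 1 shrinks weak similarities more than strong ones, so close neighbours dominate the prediction.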

Section 2.3.3 - explains the computational complexity. More: https://www.youtube.com/watch?v=D6xkbGLQesk

Section 2.3.4 - compares user-based and item-based methods and how they generally perform against each other.

Author(s) / creator(s) / reference(s)
Charu C. Aggarwal
Recipe

Neighbourhood Based Collaborative Filtering - Basic ideas

Total time needed: ~2 hours
Objectives
Quickly understand the basic ideas of neighbourhood-based collaborative filtering
Potential Use Cases
Build recommendation engines, identify trends, etc.
Who is This For?
BEGINNER Python developers
ARTICLE 1. Ratings Matrix
  • What is the basic data structure of this matrix and what are its properties?
  • What is the long tail, how does it affect recommender systems, and what can we do about it?
30 minutes
ARTICLE 2. User-based and Item-based similarity Computations
  • What are user-based and item-based models, and how do they perform relative to each other?
  • How can we build a recommender system on similarity measures such as cosine similarity and Pearson correlation? What does amplifying a similarity function do?
  • What is mean centering and why do we use it? What are the common alternatives?
  • How does inverse user frequency help to handle the long tail? Why should we handle the long tail in the first place?
  • What is the computational complexity of the overall system? How should we understand the offline and online phases?
30 minutes
ARTICLE 3. Clustering and Similarity Based methods
  • What are the problems of sparsity and computational complexity?
  • What are SVD, PCA, and k-means?
  • How do we use MLE to estimate missing values in the ratings matrix?
20 minutes
ARTICLE 4. Regression View of Neighbourhood based method
  • How are the similarity coefficients we use similar to the learned weights in a linear regression model?
  • How do we use least-squares optimisation to learn the coefficients?
  • How to handle sparsity and bias issues?
  • How to understand regularization?
30 minutes

