Covers: theory of basic similarity-based methods

- What are user-based and item-based models, and how do they perform compared with each other?
- How can we build a recommender system on similarity measures such as cosine similarity and the Pearson correlation? What does amplifying a similarity function do?
- What is mean centering and why do we use it? What are the common alternatives?
- How does inverse user frequency help to handle the long tail? Why should we handle the long tail in the first place?
- What is the computational complexity of the overall system? How should we understand the offline phase and the online phase?
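
As a rough illustration of the mean-centering and amplification questions above, the sketch below predicts a rating as the target user's mean plus a similarity-weighted average of the peers' mean-centered ratings, and then raises each similarity to a power `alpha` to amplify it. The toy `ratings`, the values in `sims_to_u1`, and the names `mean_rating` and `predict` are assumptions made up for this example, not taken from the source.

```python
# Hypothetical toy data: user -> {item: rating}; absent items are unrated.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
    "u4": {"i1": 5, "i2": 4, "i3": 4, "i4": 3},
}

# Assumed, precomputed similarities of each peer to the target user "u1".
sims_to_u1 = {"u2": 0.85, "u3": -0.99, "u4": 0.5}


def mean_rating(user):
    """Mean of all ratings the user has specified."""
    values = ratings[user].values()
    return sum(values) / len(values)


def predict(target, item, sims, k=2, alpha=1.0):
    """Mean-centered prediction from the k most similar peers that rated `item`."""
    # Keep only positively correlated peers that actually rated the item.
    peers = [(v, s) for v, s in sims.items() if s > 0 and item in ratings[v]]
    peers = sorted(peers, key=lambda p: p[1], reverse=True)[:k]
    if not peers:
        # No usable peer: fall back to the target user's own mean.
        return mean_rating(target)
    # Amplification: weight each peer by similarity ** alpha.
    num = sum((s ** alpha) * (ratings[v][item] - mean_rating(v)) for v, s in peers)
    den = sum(s ** alpha for _, s in peers)
    return mean_rating(target) + num / den


plain = predict("u1", "i4", sims_to_u1)
amplified = predict("u1", "i4", sims_to_u1, alpha=2.0)
print(round(plain, 3), round(amplified, 3))
```

With `alpha` above 1, the weight shifts toward the most similar peer (`u2` here), so the amplified prediction moves closer to what that peer's deviation from their own mean suggests.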

Section 2.3.1 explains similarity computation with the help of an example. More: https://stats.stackexchange.com/questions/235673/is-there-any-relationship-among-cosine-similarity-pearson-correlation-and-z-sc, https://stackoverflow.com/questions/1838806/euclidean-distance-vs-pearson-correlation-vs-cosine-similarity
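
To make the similarity computation concrete, here is a minimal sketch of cosine similarity and the Pearson correlation between two users, both restricted to the items the two users have rated in common. The toy `ratings` and the function names are hypothetical. Each user's mean is taken over all items that user rated, which is a simplifying choice; computing it over the common items only is an equally valid variant.

```python
import math

# Hypothetical toy data: user -> {item: rating}; absent items are unrated.
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}


def mean_rating(user):
    values = ratings[user].values()
    return sum(values) / len(values)


def cosine(u, v):
    """Cosine similarity on raw ratings over the co-rated items."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity on mean-centered ratings."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    mu_u, mu_v = mean_rating(u), mean_rating(v)
    dev_u = [ratings[u][i] - mu_u for i in common]
    dev_v = [ratings[v][i] - mu_v for i in common]
    num = sum(a * b for a, b in zip(dev_u, dev_v))
    den = math.sqrt(sum(a * a for a in dev_u)) * math.sqrt(sum(b * b for b in dev_v))
    return num / den if den else 0.0


for other in ("u2", "u3"):
    print(other, round(cosine("u1", other), 3), round(pearson("u1", other), 3))
```

Because all raw ratings are positive, cosine similarity stays positive even for `u1` and `u3`, whose tastes are opposite; the Pearson correlation turns negative for that pair because mean centering removes each user's rating bias.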

Section 2.3.3 explains the computational complexity. More: https://www.youtube.com/watch?v=D6xkbGLQesk

Section 2.3.4 compares user-based and item-based methods and how they generally perform against each other.

Charu C. Aggarwal

- Objectives: Helps to quickly understand the basic ideas of neighbourhood-based collaborative filtering
- Potential Use Cases: Build recommendation engines, identify trends, etc.
- Who Is This For? Beginner Python developers

Resources

ARTICLE 1. Ratings Matrix

- What is the basic data structure of this Matrix and what are its properties?
- What is the long tail, how does it impact recommender systems, and what can we do about it?
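
One common way to act on the long tail is inverse user frequency, which down-weights items that almost everyone has rated when comparing users. The sketch below assumes the weight w_j = log(m / m_j), where m is the number of users and m_j is the number of users who rated item j. The toy data and the name `inverse_user_frequency` are illustrative only.

```python
import math

# Hypothetical toy data: user -> {item: rating}; absent items are unrated.
ratings = {
    "u1": {"hit": 5, "niche": 4},
    "u2": {"hit": 4},
    "u3": {"hit": 3, "mid": 2},
    "u4": {"hit": 5, "mid": 4},
}


def inverse_user_frequency(ratings):
    """Return item -> log(m / m_j) for m users and m_j raters of item j."""
    m = len(ratings)
    counts = {}
    for items in ratings.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return {item: math.log(m / m_j) for item, m_j in counts.items()}


weights = inverse_user_frequency(ratings)
for item, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(item, round(w, 3))
```

An item rated by every user gets weight 0 and so contributes nothing to a weighted similarity, while rarely rated items count the most.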

30 minutes

ARTICLE 2. User-based and Item-based similarity Computations

- What are user-based and item-based models, and how do they perform compared with each other?
- How can we build a recommender system on similarity measures such as cosine similarity and the Pearson correlation? What does amplifying a similarity function do?
- What is mean centering and why do we use it? What are the common alternatives?
- How does inverse user frequency help to handle the long tail? Why should we handle the long tail in the first place?
- What is the computational complexity of the overall system? How should we understand the offline phase and the online phase?

30 minutes

ARTICLE 3. Clustering and Similarity Based methods

- What are the problems of sparsity and computational complexity?
- What are SVD, PCA and k-means?
- How can maximum likelihood estimation (MLE) be used to estimate missing values in the ratings matrix?

20 minutes

ARTICLE 4. Regression View of Neighbourhood based method

- How are the similarity coefficients that we use similar to the learned weights in a linear regression model?
- How can least-squares optimisation be used to learn the coefficients?
- How can sparsity and bias issues be handled?
- What does regularization do here?
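
To give a feel for the regression view, the sketch below replaces heuristic similarity values with weights learned by regularized least squares: one target item's mean-centered ratings are predicted as a weighted sum of the other items' mean-centered ratings. The matrix `S`, the names `fit_weights` and `loss`, and the hyperparameters are assumptions for illustration. Treating a missing rating as 0.0 and using plain batch gradient descent are simplifications, not the method of any particular source.

```python
# Hypothetical mean-centered ratings (rows: users, columns: items 0..2).
# 0.0 stands in for a missing rating, which is a simplification.
S = [
    [1.0, -1.0, 0.0],
    [0.5, -1.5, -0.5],
    [-1.0, 1.0, 0.5],
    [1.0, 0.0, -1.0],
]


def loss(S, target, w, lam):
    """Squared prediction error on the target column plus an L2 penalty."""
    err = sum((sum(w[j] * row[j] for j in w) - row[target]) ** 2 for row in S)
    return err + lam * sum(v * v for v in w.values())


def fit_weights(S, target, lam=0.1, lr=0.05, epochs=500):
    """Learn one weight per non-target item by batch gradient descent."""
    others = [j for j in range(len(S[0])) if j != target]
    w = {j: 0.0 for j in others}
    for _ in range(epochs):
        # Gradient of the L2 penalty term.
        grad = {j: 2 * lam * w[j] for j in others}
        # Gradient of the squared error, accumulated over all users.
        for row in S:
            err = sum(w[j] * row[j] for j in others) - row[target]
            for j in others:
                grad[j] += 2 * err * row[j]
        for j in others:
            w[j] -= lr * grad[j]
    return w


w_light = fit_weights(S, target=0, lam=0.1)
w_heavy = fit_weights(S, target=0, lam=5.0)
print({j: round(v, 3) for j, v in w_light.items()})
print({j: round(v, 3) for j, v in w_heavy.items()})
```

A larger `lam` shrinks the learned weights toward zero, which is how regularization limits overfitting when only a few users have rated both items.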

30 minutes