[Dual Bandit RecSys] Joint Policy-Value Learning for Recommendation

Time: Tuesday 11-Aug-2020 16:00 (This is a past event.)


Motivation / Abstract
Beating offline metrics in recommender systems is challenging, but the real question is how effective the model is on online metrics. This paper uses data logged from an existing model to achieve higher online evaluation scores.
Questions Discussed
1) Using an existing policy (model) to increase online KPIs by applying the Dual Bandit approach
2) Combining maximum likelihood estimation (MLE) and counterfactual risk minimization (CRM) in a weighted average to produce more personalized recommendations
3) Using logged recommendations to improve personalization: the Dual Bandit setting also learns from recommendations that were not clicked
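
The weighted combination in point 2 can be illustrated with a minimal sketch. This is not the paper's exact objective: the function name `joint_objective`, the mixing weight `alpha`, the sigmoid click model, and the softmax policy are all assumptions chosen for illustration. The value term is a Bernoulli log-likelihood of clicks on the logged recommendations, and the policy term is an inverse-propensity-scored (IPS) estimate of the click rate, a common building block of CRM.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def joint_objective(scores, actions, clicks, propensities, alpha=0.5):
    """Hypothetical weighted objective (to be maximized).

    scores:       (n, k) model scores for each logged context and each item
    actions:      (n,)   index of the item the logging policy recommended
    clicks:       (n,)   1 if the recommendation was clicked, else 0
    propensities: (n,)   logging policy's probability of the logged item
    alpha:        weight on the policy (IPS) term; 1 - alpha on the likelihood
    """
    n = scores.shape[0]
    rows = np.arange(n)
    eps = 1e-12  # guards the logarithms against log(0)

    # Value term: mean Bernoulli log-likelihood of the observed click outcome.
    p_click = sigmoid(scores[rows, actions])
    log_lik = np.mean(
        clicks * np.log(p_click + eps)
        + (1 - clicks) * np.log(1 - p_click + eps)
    )

    # Policy term: IPS estimate of the click rate under a softmax policy.
    pi = softmax(scores)[rows, actions]
    ips = np.mean(clicks * pi / propensities)

    return (1 - alpha) * log_lik + alpha * ips


# Toy usage: two logged recommendations over two items, one click.
scores = np.zeros((2, 2))
actions = np.array([0, 1])
clicks = np.array([1, 0])
propensities = np.array([0.5, 0.5])
value = joint_objective(scores, actions, clicks, propensities, alpha=0.5)
```

In this sketch, unclicked recommendations contribute nothing to the IPS term (their reward is zero) but still enter the likelihood term, which is one way a combined objective can draw signal from them, as point 3 describes.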
Stream Categories:
 Recommender Systems