Past Recording
Similarity Search for Efficient Active Learning and Search of Rare Concepts
Friday Nov 6 2020 17:00 GMT
Why This Is Interesting

Many active learning and search approaches are intractable for industrial settings with billions of unlabeled examples. Existing approaches, such as uncertainty sampling or information density, search globally for the optimal examples to label, so their cost scales linearly or even quadratically with the size of the unlabeled data. However, in practice, data is often heavily skewed: only a small fraction of collected data will be relevant for a given learning task. For example, when identifying rare classes, detecting malicious content, or debugging model performance, the ratio of positive to negative examples can be 1 to 1,000 or more. In this work, we exploit this skew in large training datasets to reduce the number of unlabeled examples considered in each selection round by only looking at the nearest neighbors of the labeled examples.
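The idea of restricting each selection round to the neighbors of the labeled set can be illustrated with a small sketch. This is not the authors' implementation: the function names `expand_candidates` and `select_uncertain` are hypothetical, the embeddings are assumed to be precomputed, and a brute-force distance computation stands in for the approximate nearest-neighbor index that a billion-example dataset would presumably require.

```python
import numpy as np


def expand_candidates(embeddings, labeled_idx, k):
    """Return sorted indices of unlabeled points that are among the
    k nearest neighbors of at least one labeled point.

    Brute-force search is used here for illustration only (assumption);
    it does not scale to very large datasets.
    """
    labeled_idx = np.asarray(labeled_idx)
    labeled = embeddings[labeled_idx]
    # Squared Euclidean distance from every labeled point to every point.
    d2 = ((labeled[:, None, :] - embeddings[None, :, :]) ** 2).sum(axis=2)
    # Already-labeled points must not be selected again.
    d2[:, labeled_idx] = np.inf
    n_unlabeled = embeddings.shape[0] - len(labeled_idx)
    k = min(k, n_unlabeled)
    if k <= 0:
        return np.array([], dtype=int)
    # Indices of the k smallest distances in each row (unordered).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return np.unique(nearest)


def select_uncertain(scores, candidates, batch_size):
    """Uncertainty sampling restricted to the candidate pool: pick the
    candidates whose predicted probability is closest to 0.5."""
    closeness = -np.abs(scores[candidates] - 0.5)
    order = np.argsort(closeness)[::-1][:batch_size]
    return candidates[order]
```

In a selection round, `expand_candidates` would be called with the current labeled set, and `select_uncertain` would then score only the returned pool rather than the full unlabeled dataset, so the per-round cost depends on the number of labeled examples and `k` instead of the total data size.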

Time of Recording: Friday Nov 6 2020 17:00 GMT