Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Thursday Nov 19 2020 00:30 GMT
Why This Is Interesting

The motivation for Q-BERT is to enable efficient deployment at the edge, with lower inference latency and power consumption. High-accuracy inference at the edge also helps protect user privacy, since users' data does not need to be transmitted to the cloud for inference.

Discussion Points
  • How can we systematically design efficient NN models without losing accuracy?
  • What are the fundamentals of compressing a model to low precision, and how does this affect NN behaviour?
Takeaways
  1. Using ultra-low precision reduces the memory footprint without losing much accuracy.
  2. Quantization enables fast inference in real-life applications.
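
To make the memory-footprint takeaway concrete, here is a minimal sketch of plain uniform symmetric quantization. It is not the Hessian-based mixed-precision scheme that Q-BERT itself uses; the function names, the sample weights, and the 110-million parameter count (roughly the scale of BERT-base) are illustrative assumptions, not taken from the talk.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


def footprint_bytes(num_params, bits):
    """Storage needed for the weights alone, ignoring scales and metadata."""
    return num_params * bits / 8


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Assumed parameter count (illustrative): 32-bit floats versus 4-bit integers.
num_params = 110_000_000
ratio = footprint_bytes(num_params, 32) / footprint_bytes(num_params, 4)
print(q, max_error, ratio)
```

Under these assumptions, moving from 32-bit to 4-bit weights cuts weight storage by a factor of 8, while each weight is perturbed by at most half a quantization step. How much that perturbation costs in accuracy depends on the layer, which is the question the talk's discussion points raise.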
Time of Recording: Thursday Nov 19 2020 00:30 GMT