Covers: theory of NLP Evaluation Metrics
Estimated time needed to finish: 18 minutes
Questions this item addresses:
  • Why do we need evaluation metrics?
  • What are the different categories for natural language evaluation tasks?
  • How can data be divided into subsets for proper training and evaluation?
How to use this item?

Read pages 2-6 and 8-9
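The last question above asks how data can be divided for training and evaluation. As a minimal sketch (the function name, split fractions, and seed are illustrative, not taken from the reading), a shuffled train/validation/test partition might look like:

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle and partition a dataset into disjoint train/validation/test
    subsets. The validation set is for tuning; the test set is held out
    until final evaluation."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

sentences = [f"example sentence {i}" for i in range(100)]
train, val, test = train_val_test_split(sentences)
print(len(train), len(val), len(test))  # 80 10 10
```

The key property is that the three subsets are disjoint, so test scores measure generalization rather than memorization.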

Author(s): Philip Resnik, Jimmy Lin

NLP Evaluation Benchmarks

Total time needed: ~3 hours
Objectives
Learn why evaluation metrics are needed in NLP and the different types of metrics used across NLP tasks.
Potential Use Cases
Use these evaluation metrics to judge the performance of NLP models.
Who Is This For?
INTERMEDIATE: NLP users trying to evaluate models
Resources (5)
BOOK_CHAPTER 1. The Need for Benchmarks
  • Why do we need evaluation metrics?
  • What are the different categories for natural language evaluation tasks?
  • How can data be divided into subsets for proper training and evaluation?
18 minutes
ARTICLE 2. The BLEU Metric
  • What are the pros of the BLEU metric?
  • Where can BLEU scores be used?
  • How to calculate BLEU scores using Python and NLTK?
40 minutes
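The article above covers computing BLEU with Python and NLTK (NLTK provides a tested implementation in `nltk.translate.bleu_score.sentence_bleu`). To show the underlying math without external dependencies, here is a pure-Python sketch of sentence-level BLEU: the geometric mean of modified n-gram precisions times a brevity penalty. This is a simplified illustration (single reference, no smoothing), not NLTK's implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU against a single reference: geometric mean of
    modified n-gram precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # "Modified" precision: each candidate n-gram is clipped to its
        # count in the reference, so repeating a word cannot inflate it.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:  # any zero precision makes the geometric mean 0
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_mean)

reference = "the cat is on the mat".split()
candidate = "the cat is on the mat".split()
print(sentence_bleu(reference, candidate))  # identical sentences score 1.0
```

In practice, prefer NLTK's implementation, which supports multiple references and smoothing for short sentences.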
ARTICLE 3. The GLUE Metric
  • What is the need for GLUE?
  • Where can GLUE scores be used?
  • How to test models using GLUE?
30 minutes
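Note that GLUE is a benchmark suite rather than a single formula: each of its tasks has its own metric. For instance, the CoLA (grammatical acceptability) task is scored with the Matthews correlation coefficient. As a self-contained sketch of that one metric (binary labels assumed; this is not the full GLUE evaluation pipeline):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels, the metric
    GLUE uses for the CoLA task. Ranges from -1 (total disagreement)
    through 0 (chance) to +1 (perfect prediction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any confusion-matrix margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(matthews_corrcoef([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```

Other GLUE tasks use accuracy, F1, or Pearson/Spearman correlation; the leaderboard score averages across tasks.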
ARTICLE 4. The ROUGE Metric
  • What is the ROUGE metric?
  • What are the different types of ROUGE scores?
  • How to compute ROUGE scores in Python?
20 minutes
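For the last question above, the `rouge-score` package on PyPI offers ready-made implementations of the ROUGE family. As a dependency-free illustration, here is a sketch of ROUGE-1 only: unigram overlap between a candidate summary and a reference, reported as recall, precision, and F1. (ROUGE-2 uses bigrams and ROUGE-L uses the longest common subsequence; the whitespace tokenization here is a simplification.)

```python
from collections import Counter

def rouge_1(reference, candidate):
    """ROUGE-1: clipped unigram overlap between candidate and reference,
    reported as recall, precision, and F1."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Clip each candidate unigram to its count in the reference.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge_1("the cat sat on the mat", "the cat was on the mat")
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.833, 'recall': 0.833, 'f1': 0.833}
```

ROUGE is recall-oriented by design (did the summary cover the reference content?), which is why recall is conventionally the headline number for summarization.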
ARTICLE 5. A Comparison of Various Evaluation Metrics
  • What are the pros and cons of different NLP metrics?
30 minutes
