PAPER Automatic Detection and Language Identification of Multilingual Documents

Covers: theory of Language Identification
Estimated time needed: 25 minutes
Questions this item addresses:
  • What are current methods for language detection in multilingual documents?
How to use this item?

Skim the paper, especially sections 1, 2, 8, and 9.

Author(s) / creator(s) / reference(s)
Marco Lui, Jey Han Lau, and Timothy Baldwin
Shortlist

Overview of Language Identification

Yan Nusinovich
Total time needed: ~28 minutes
Learning Objectives
Provide an introduction to language identification.
Potential Use Cases
People who want to implement their first language identification model.
Target Audience
BEGINNER Newcomers to natural language processing and text classification.
Go through the following annotated items in order:
ARTICLE 1. Language Detection
  • What is language detection?
3 minutes
ARTICLE 2. Benchmarking Language Detection for NLP
  • What are some existing tools for language detection in Python?
10 minutes
PAPER 3. Automatic Detection and Language Identification of Multilingual Documents
  • What are current methods for language detection in multilingual documents?
25 minutes

Concepts Covered