Using transformer-based models for Vietnamese language detection
Fig 2
Data acquisition flowchart for the automated crawler, showing the link validation, content extraction, HTML cleaning, and deduplication steps used to construct the raw corpus.
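
The stages named in the Fig 2 caption can be illustrated with a minimal Python sketch. Only the stage names come from the caption; everything else is an assumption made for illustration: the helper names (`is_valid_link`, `clean_html`, `build_corpus`), the host allow-list rule, and exact-match deduplication via a SHA-256 hash of the cleaned text. The sketch takes already-downloaded `(url, html)` pairs, so fetching is out of scope, and the authors' crawler may differ at every step.

```python
import hashlib
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_valid_link(url, allowed_hosts):
    """Link validation (assumed rule): http(s) URL on an allowed host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html):
    """Content extraction and HTML cleaning: drop tags, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def build_corpus(pages, allowed_hosts):
    """Run validation, cleaning, and exact-duplicate removal over (url, html) pairs."""
    seen = set()
    corpus = []
    for url, html in pages:
        if not is_valid_link(url, allowed_hosts):
            continue
        text = clean_html(html)
        if not text:
            continue
        # Deduplication (assumed): hash of the cleaned text, exact matches only.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        corpus.append(text)
    return corpus
```

Hashing the cleaned text instead of the raw HTML means two pages that differ only in markup or whitespace count as duplicates. Near-duplicate detection (for example, shingling or MinHash) would need a different approach and is not shown here.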