Datasets For Indian Languages

14 Jun 2021

Some of the datasets I came across while browsing are shared below.
NER for South and South East Asian Languages
Tab-delimited Bilingual Pairs
Kaggle - Telugu NLP Dataset
COVID-19 Resources from Facebook and Google
Mann Ki Baat Parallel Corpus
Parallel Corpus via Crowdsourcing for 6 Indian Languages
HuggingFace Datasets for 11 Indian Languages
Multilingual Bible Parallel Corpus
Open Parallel Corpus
From Keon’s GitHub