low-resource-nlp

Here are 79 public repositories matching this topic...

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

benchmark information-retrieval retrieval text-classification clustering sts semantic-search reranking text-embedding multimodal neural-search sentence-transformers sbert multilingual-nlp low-resource-nlp bitext-mining mteb

Updated Apr 23, 2026
Python

cisnlp / GlotLID

[EMNLP 2023] 💬 Language Identification with Support for More Than 2000 Labels

language-detection multlingual language-detector language-recognition glot lid language-identification language-classification language-identification-toolkit low-resource-languages language-detection-library language-identifier language-detection-lib langid low-resource-nlp glotcc glotlid

Updated Apr 15, 2026
Python

adbar / simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

nlp tokenizer language-detection wordlist lemmatizer morphological-analysis lemmatiser tokenization lemmatization corpus-tools language-identification low-resource-nlp

Updated Jun 6, 2025
Python

csebuetnlp / banglanmt

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

machine-translation neural-machine-translation parallel-corpus parallel-corpora bangla-nlp low-resource-languages bangla-machine-translation bangla-dataset-machine-translation emnlp-2020 low-resource-nlp low-resource-machine-translation

Updated Oct 23, 2024
Python

ljvmiranda921 / calamanCy

NLP pipelines for Tagalog using spaCy

nlp machine-learning natural-language-processing spacy computational-linguistics ner low-resource-languages low-resource-nlp

Updated Jul 20, 2025
Python

afrisenti-semeval / afrisent-semeval-2023

AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages (https://afrisenti-semeval.github.io/)

Updated Jan 10, 2024
Jupyter Notebook

KennethEnevoldsen / scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

nlp benchmark natural-language-processing low-resource-nlp scandinavian

Updated Dec 5, 2025
Python

231sm / Reasoning_In_EE

Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"

information-extraction event-extraction low-resource low-resource-nlp ontoed

Updated Apr 19, 2022
Python

zjunlp / RAP

[SIGIR 2023] Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Updated Apr 5, 2023
Python

hausanlp / NaijaSenti

This is the repository for NaijaSenti, a Lacuna-funded project to develop a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.

nlp sentiment-analysis sentiment dataset african-languages yoruba hausa sentiment-classification nigeria yorubaname-dictionary igbo low-resource-languages igbo-language nigerian-data sentiment-data low-resource-nlp hausa-nlp hausanlp

Updated Oct 14, 2025
Jupyter Notebook

luciusssss / mc2_corpus

[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

multilingual natural-language-processing corpus mongolian tibetan tibetan-nlp uyghur kazakh low-resource-languages low-resource-nlp

Updated Jan 17, 2026
Python

NLP-Tutorials / AACL-IJCNLP2022-KGC-Tutorial

Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction

Updated Feb 3, 2023

luciusssss / ZhuangBench

[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly

low-resource-languages zhuang low-resource-nlp large-language-models llm

Updated Jan 6, 2026
Python

kidist-amde / amharic-ir-benchmarks

Official codebase for the ACL 2025 Findings paper: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval.

information-retrieval bert bm25 passage-retrieval ndcg text-embedding amharic-corpus mrr roberta amharic-nlp huggingface-transformers colbert multilingual-nlp low-resource-nlp dense-retrieval amharic-language retrieval-evaluation academic-benchmark

Updated Jul 26, 2025
Jupyter Notebook

nicolay-r / awesome-sentiment-attitude-extraction

A curated list of awesome sentiment analysis studies in which an attitude is the position that a Subject conveys in text toward another Object mentioned there, such as an entity or an event.

nlp naacl machine-learning natural-language-processing awesome deep-learning sentiment-analysis trends awesome-list emnlp language-model aaai nips relation-classification state-of-the-art stance-detection low-resource-nlp sentiment-attitude-extraction chatgpt

Updated Mar 23, 2026

wannaphong / Awesome-Lao-NLP

Awesome Lao Natural Language Processing

natural-language-processing awesome awesome-list lao low-resource low-resource-languages lao-language low-resource-nlp laonlp

Updated Mar 7, 2025

StefanHeng / ProgGen

Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"

natural-language-processing named-entity-recognition data-generation few-shot-learning training-data-generation low-resource-nlp large-language-models efficient-nlp

Updated Mar 29, 2024
Python

csebuetnlp / banglaparaphrase

This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.

paraphrase-generation bangla-nlp low-resource-nlp bangla-paraphrase

Updated Nov 14, 2022
Python

ijazul-haq / nlpashto

Pashto Natural Language Processing Toolkit

nlp natural-language-processing transformers nltk bert pos-tagging persian-nlp tokenization sjtu arabic-nlp pashto urdu-nlp low-resource-nlp llms pashto-nlp pashto-word-segmentation pashto-bert pashto-word-embeddings pashto-text-classification

Updated May 21, 2025

davidschulte / hf-dataset-selector

Find the best datasets for intermediate fine-tuning

nlp transfer-learning dataset-search huggingface hugging-face low-resource-nlp dataset-selection huggingface-datasets

Updated May 4, 2025
Jupyter Notebook
