IJSEA Volume 15 Issue 5

POS Tagging and Instance-Based Morphological Analysis of Maithili Language: Bridging Low-Resource NLP with Computational Linguistics

Anu Priya, Md. Irfan Alam
10.7753/IJSEA1505.1008
Keywords: Maithili Language, Morphological Analysis, Natural Language Processing, Instance-Based Learning, POS Tagging, Low-Resource Languages.

The development of robust Natural Language Processing (NLP) tools for Indo-Aryan languages is essential given their linguistic diversity and complex morphology; however, languages like Maithili remain significantly underserved because of their low-resource status. This research addresses that gap by presenting Part-of-Speech (POS) tagging and morphological analysis of the Maithili language, using an Instance-Based Learning (IBL) framework to connect traditional computational linguistics with modern machine learning. POS tagging, the process of assigning grammatical categories such as noun, verb, and adjective to tokens, is a foundational task made harder in Maithili by the language's highly inflectional nature. Morphological analysis identifies the internal structure of words by decomposing them into morphemes, which is essential for understanding word formation and for supporting downstream tasks such as lemmatization and machine translation. The methodology takes a "lazy learning" approach through IBL, which suits low-resource scenarios because it classifies new linguistic instances by their similarity to stored examples rather than requiring the massive corpora demanded by deep learning architectures. Experimental evaluation was conducted on a curated dataset comprising 201 sentences and 402 tokens, in which unique suffix patterns and morphological variations specific to Maithili were identified. Despite the challenges of resource scarcity, the proposed IBL model achieved an accuracy of 70.71%. These results indicate that instance-based classification can capture the nuances of Maithili’s grammatical features, providing a computational baseline for future research.
Ultimately, this work contributes to the digital preservation of Maithili and offers a scalable methodology for applying computational techniques to other morphologically complex, low-resource languages within the Indian subcontinent.
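The instance-based ("lazy learning") classification described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the suffix-plus-previous-tag features, the overlap similarity, the value k=3, the name InstanceBasedTagger, and the toy tokens (invented placeholder strings, not attested Maithili forms) are all assumptions made for the example.

```python
from collections import Counter


def features(token, prev_tag):
    """Describe a token by its last 1-3 characters plus the previous tag (assumed feature set)."""
    feats = {f"suf{n}={token[-n:]}" for n in (1, 2, 3) if len(token) >= n}
    feats.add(f"prev={prev_tag}")
    return feats


def similarity(a, b):
    """Overlap similarity: the number of features two instances share."""
    return len(a & b)


class InstanceBasedTagger:
    """Hypothetical k-nearest-neighbour POS tagger over stored instances."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (feature_set, tag) pairs

    def fit(self, tagged_sentences):
        # Lazy learning: "training" only stores the labelled instances.
        for sentence in tagged_sentences:
            prev = "<s>"
            for token, tag in sentence:
                self.memory.append((features(token, prev), tag))
                prev = tag
        return self

    def tag(self, tokens):
        tags, prev = [], "<s>"
        for token in tokens:
            query = features(token, prev)
            # Rank stored instances by similarity to the query and keep the top k.
            nearest = sorted(
                self.memory,
                key=lambda inst: similarity(query, inst[0]),
                reverse=True,
            )[: self.k]
            # Majority vote among the k nearest neighbours.
            votes = Counter(tag for _, tag in nearest)
            prev = votes.most_common(1)[0][0]
            tags.append(prev)
        return tags


# Toy data: placeholder tokens in which "-ka" marks nouns and "-ti" marks verbs.
train = [
    [("talaka", "NOUN"), ("rupati", "VERB")],
    [("minaka", "NOUN"), ("sonati", "VERB")],
    [("dorika", "NOUN"), ("velati", "VERB")],
]
tagger = InstanceBasedTagger(k=3).fit(train)
predicted = tagger.tag(["bunaka", "kemati"])
print(predicted)
```

Because nothing is learned beyond the stored instances, adding newly annotated sentences only requires appending them to memory, which is one reason this family of methods is often considered for small corpora.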
@article{a1552026ijsea15051008,
Title = "POS Tagging and Instance-Based Morphological Analysis of Maithili Language: Bridging Low-Resource NLP with Computational Linguistics",
Journal = "International Journal of Science and Engineering Applications (IJSEA)",
Volume = "15",
Number = "5",
Pages = "69--74",
Year = "2026",
Author = "Anu Priya and Md. Irfan Alam"}