Empirical Laws of Natural Language Processing for Hindi Language

Abstract
Empirical laws are statistical laws that describe the relations between entities in a large dataset. They are readily found in nature, and they are supported by observation [1]. The primary objective of this study is to verify several empirical laws, namely Zipf’s law, Mandelbrot’s approximation, and Heaps’ law, for a Hindi-language corpus. This involves collecting a corpus, normalizing the text, tokenizing it into a list of words, identifying the word types and their frequencies, sorting and ranking the word types by frequency, and plotting frequency against rank to validate Zipf’s law and Mandelbrot’s approximation. For Heaps’ law, the relation between the number of word types and the number of tokens is examined across different subsets of the corpus. Based on our observations, the Hindi language satisfies all three laws.
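
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (tokenize, rank_frequency, heaps_curve, fit_loglog), the regex-based tokenizer, and the use of growing prefixes of the token list as corpus subsets are all assumptions made for illustration. The sketch assumes the usual forms of the laws: Zipf's law as f(r) ≈ C / r^s and Heaps' law as V(n) ≈ K · n^β, so that each becomes a straight line on log-log axes. Mandelbrot's approximation, f(r) ≈ C / (r + b)^s, adds a shift parameter b that needs a nonlinear fit and is left out here.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is a run of Devanagari letters and vowel signs
# (U+0900-U+0963, U+0970-U+097F). The danda marks (U+0964, U+0965) and the
# Devanagari digits are excluded. Real normalization would be more careful.
WORD_RE = re.compile(r"[\u0900-\u0963\u0970-\u097F]+")


def tokenize(text):
    """Split raw text into a list of word tokens."""
    return WORD_RE.findall(text)


def rank_frequency(tokens):
    """Return (rank, word_type, frequency) triples, most frequent first."""
    counts = Counter(tokens)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


def heaps_curve(tokens, step):
    """Return (tokens_seen, types_seen) after every `step` tokens and at the end."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            points.append((i, len(seen)))
    return points


def fit_loglog(xs, ys):
    """Least-squares line through (log x, log y); returns (slope, intercept)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy input only; a meaningful check needs a large corpus.
    text = "भारत एक देश है। भारत महान है।"
    tokens = tokenize(text)

    table = rank_frequency(tokens)
    zipf_slope, _ = fit_loglog([r for r, _, _ in table],
                               [f for _, _, f in table])
    print("Zipf exponent estimate s =", -zipf_slope)

    curve = heaps_curve(tokens, step=3)
    heaps_slope, _ = fit_loglog([n for n, _ in curve],
                                [v for _, v in curve])
    print("Heaps exponent estimate beta =", heaps_slope)
```

Under these assumptions, a fitted rank-frequency slope near -1 would be consistent with Zipf's law, and a fitted types-versus-tokens slope between 0 and 1 would be consistent with Heaps' law. The toy sentence is far too short to show either trend and only demonstrates the mechanics.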
Key words
natural language processing