Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus
Journal of Data Mining and Digital Humanities, 2018.
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ b...
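As a rough illustration of the first two ideas in the abstract, the sketch below reduces each word to its two least frequent letters and then indexes runs of four reduced words so that repeated runs can be looked up. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tie-breaking rule, the use of corpus-wide letter counts, and the choice to keep the two letters in their original order are all hypothetical. It matches reduced runs exactly only; the tolerance for differences between matched strings is not modelled, because the abstract is truncated at that point.

```python
from collections import Counter, defaultdict


def letter_frequencies(words):
    """Count how often each letter occurs across all word tokens."""
    return Counter(ch for w in words for ch in w)


def two_rarest_letters(word, freq):
    """Reduce a word to its two least frequent letters.

    Assumption: ties are broken by position, and the two kept
    letters stay in their original left-to-right order.
    """
    ranked = sorted(range(len(word)), key=lambda i: (freq[word[i]], i))
    keep = sorted(ranked[:2])
    return "".join(word[i] for i in keep)


def index_ngrams(docs, n=4):
    """Map each run of n reduced words to the places where it occurs.

    docs: dict of doc_id -> list of word tokens (hypothetical input shape).
    Returns: dict of tuple-of-keys -> list of (doc_id, start_position).
    """
    all_words = [w for words in docs.values() for w in words]
    freq = letter_frequencies(all_words)
    index = defaultdict(list)
    for doc_id, words in docs.items():
        keys = [two_rarest_letters(w, freq) for w in words]
        for i in range(len(keys) - n + 1):
            index[tuple(keys[i:i + n])].append((doc_id, i))
    return index


def candidate_matches(index):
    """Keep only the reduced runs that occur in more than one place."""
    return {k: v for k, v in index.items() if len(v) > 1}
```

Because common letters are the ones most often added or dropped in variant spellings, two spellings of the same word will often reduce to the same two-letter key, which is what lets an exact lookup on the reduced form tolerate orthographic variation. In practice the tokens would be Hebrew or Aramaic words; any string of letters works with this sketch.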