Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

Journal of Data Mining and Digital Humanities, 2018.


Abstract:

We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ b...
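The first two ideas in the abstract can be illustrated with a short sketch. It is not the authors' implementation, and it rests on several assumptions of our own. Letter frequencies are counted over the whole corpus. Each word keeps its two rarest letters in their original order, with ties broken by position, and words of two letters or fewer are kept whole. Passages are compared as 4-word windows of these reduced words. The helper names (`letter_frequencies`, `skeleton`, `matching_ngrams`) are hypothetical. The abstract is cut off before it states how much matched strings may differ, so this sketch finds only windows whose reduced forms are identical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def letter_frequencies(words):
    """Count how often each letter occurs across the whole corpus."""
    counts = Counter()
    for w in words:
        counts.update(w)  # a string updates the counter character by character
    return counts


def skeleton(word, freq):
    """Reduce a word to its two least frequent letters, kept in original order.

    Assumption (not from the source): ties are broken by position, and words
    of two letters or fewer are returned unchanged.
    """
    if len(word) <= 2:
        return word
    # Rank letter positions by corpus frequency, then by position in the word.
    ranked = sorted(range(len(word)), key=lambda i: (freq[word[i]], i))
    keep = sorted(ranked[:2])
    return "".join(word[i] for i in keep)


def matching_ngrams(docs, n=4):
    """Return pairs of (doc_id, start) locations whose reduced n-word windows match.

    `docs` maps a document id to its list of words. Only exact matches of the
    reduced windows are found; the paper's tolerance for differences is not modeled.
    """
    all_words = [w for words in docs.values() for w in words]
    freq = letter_frequencies(all_words)
    index = defaultdict(list)  # reduced n-gram -> list of (doc_id, start)
    for doc_id, words in docs.items():
        skel = [skeleton(w, freq) for w in words]
        for start in range(len(skel) - n + 1):
            index[tuple(skel[start:start + n])].append((doc_id, start))
    pairs = []
    for locations in index.values():
        # Every two occurrences of the same reduced window form a candidate pair.
        pairs.extend(combinations(locations, 2))
    return pairs
```

In this toy example the variant spelling `"xaay"` reduces to `"xy"`, because `a` is a frequent letter in the corpus, so the two passages are paired despite the orthographic difference. Python strings are Unicode, so Hebrew or Aramaic words can be passed in unchanged:

```python
docs = {"A": ["xaay", "bb", "cc", "dd"], "B": ["xy", "bb", "cc", "dd", "aaaa"]}
print(matching_ngrams(docs))  # [(('A', 0), ('B', 0))]
```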
