Flexible Processing and Classification for eDiscovery.

Frontiers in Artificial Intelligence and Applications (2013)

Abstract
A high-performance, scalable text processing pipeline for eDiscovery is outlined. The classification module of the pipeline is based on the random forest model, which is fast and flexible and supports relevance scoring and feature importance estimation alongside high-accuracy results. The feature selection approach combines natural language processing with legal domain input and is based on regular expressions, which allow for linguistic variation and subtle fine-tuning. These two components of the pipeline are described in some detail. A number of other features are briefly discussed, including relevance hypothesis testing, deduping and social communication network analysis.
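
To make the regular-expression feature idea concrete, the following is a minimal sketch and not the authors' implementation. The pattern names and expressions in `FEATURE_PATTERNS` and the helper `extract_features` are hypothetical, since the abstract does not list the actual expressions used. It shows how a single pattern can absorb linguistic variation (inflections, synonyms, hyphenation) and yield a numeric feature per document.

```python
import re

# Hypothetical feature patterns (assumptions for illustration only);
# the paper's real expressions are not given in the abstract.
FEATURE_PATTERNS = {
    # One pattern covers several inflected forms of the same stem.
    "termination": re.compile(r"\bterminat(?:e|es|ed|ing|ion)\b", re.IGNORECASE),
    # Alternation treats near-synonyms as a single feature.
    "contract": re.compile(r"\b(?:contract|agreement)s?\b", re.IGNORECASE),
    # A character class tolerates a hyphen or whitespace between the words.
    "privileged": re.compile(r"\battorney[-\s]client\b", re.IGNORECASE),
}


def extract_features(text):
    """Return a dict mapping each feature name to its match count in text."""
    return {name: len(pattern.findall(text))
            for name, pattern in FEATURE_PATTERNS.items()}


doc = "The Agreement was terminated; termination of both contracts followed."
features = extract_features(doc)
print(features)
```

Count vectors of this kind could then be passed to any random forest implementation, which would supply the relevance scores and feature importances the abstract mentions. Fine-tuning a feature amounts to editing its expression, without retraining a tokenizer or vocabulary.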
Keywords
eDiscovery, text classification, regular expressions, machine learning, random forest