A pattern tree-based approach to learning URL normalization rules

WWW, pp. 611-620, 2010.

Keywords:
pattern tree-based approach, URL normalization, duplicate URLs, local duplicate pair, URL pattern tree

Abstract:

Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs to a canonical form using a set of rewrite rules. Nowadays URL normalization has attracted significant attention as it is lightweight and can be flexibly integr...
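
To make the idea of rewrite rules concrete, the following is a minimal Python sketch of applying a few hand-written normalization rules to map duplicate URLs to one canonical form. The rules, the `IGNORED_PARAMS` set, and the `normalize` function are hypothetical illustrations, not taken from the paper, and the sketch does not implement the pattern tree-based rule learning that the paper proposes.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: query parameters that do not affect page content.
# These names are illustrative only; real rules would be learned per site.
IGNORED_PARAMS = {"sessionid", "ref"}


def normalize(url: str) -> str:
    """Apply a small, hand-written set of rewrite rules to a URL."""
    parts = urlsplit(url)
    # Rule 1: scheme and host are case-insensitive, so lowercase them.
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Rule 2: treat a trailing index.htm(l) as the directory itself.
    path = re.sub(r"/index\.html?$", "/", parts.path) or "/"
    # Rule 3: drop ignored parameters and sort the rest for a stable order.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k.lower() not in IGNORED_PARAMS
    )
    # Rule 4: discard the fragment, which is never sent to the server.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


if __name__ == "__main__":
    a = normalize("http://Example.com/a/index.html?ref=x&id=3&sessionid=9")
    b = normalize("http://example.com/a/?id=3")
    print(a, a == b)
```

Under these assumed rules, both example URLs collapse to the same string, which is the behavior a crawler or indexer would rely on to detect duplicates.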
