Basic Information
Biography
Research Interests
· data structures, particularly fast and space efficient structures
· the design, analysis and implementation of algorithms
· database systems and data warehousing, particularly efficiency issues
The main thrust of my current research deals with space efficient data structures. For example, several years ago we were interested in developing highly efficient techniques for preprocessing large text files (in particular the Oxford English Dictionary). A tree-based method was developed that permitted one to find (references to) all occurrences of a given phrase in what amounted to a scan of the phrase (independent of the size of the document). Unfortunately, the index was 4 or 5 times the size of the original file, and this was unacceptable. Most of this space was for pointers in a binary trie (or tree); for each byte of the original file there were two pointers in the representation of the tree. There are about 4^n trees on n nodes, so, at least in principle, one needs only 2n bits to represent an arbitrary tree, … not 2n pointers. The issue is that one requires a succinct representation that permits finding the children of a given node quickly. Several of my papers have focussed on this and related issues. Indeed we have found (practical) tree representations that take (virtually) the information-theoretic minimum space and permit the navigational operations find_parent, find_child, find_subtree_size, etc. in constant time.
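To make the 2n-bit idea concrete, here is a minimal sketch using the balanced-parentheses encoding, one well-known way to store an n-node tree in 2n bits. It is an illustration under stated assumptions, not necessarily the representation used in the papers mentioned above: it encodes ordered trees given as nested Python lists (a hypothetical input format), the helper names `encode`, `find_close`, and `count_binary_trees` are made up for this example, and every navigation step is a plain linear scan. Constant-time versions additionally need small auxiliary indexes, which are omitted here.

```python
from math import comb, log2


def count_binary_trees(n):
    """Number of binary trees on n nodes (the n-th Catalan number, roughly 4^n)."""
    return comb(2 * n, n) // (n + 1)


def encode(tree):
    """Encode a nested-list tree as 2n bits: 1 opens a node, 0 closes it."""
    bits = [1]
    for child in tree:
        bits.extend(encode(child))
    bits.append(0)
    return bits


def find_close(bits, i):
    """Index of the 0 that matches the 1 at position i (linear scan)."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced bit sequence")


def find_subtree_size(bits, i):
    """Number of nodes in the subtree rooted at the node opening at i."""
    return (find_close(bits, i) - i + 1) // 2


def find_child(bits, i, k):
    """Position of the k-th child (0-based) of the node at i, or None."""
    j = i + 1
    while bits[j]:  # a 1 here means another child starts at j
        if k == 0:
            return j
        k -= 1
        j = find_close(bits, j) + 1  # skip over this child's subtree
    return None


def find_parent(bits, i):
    """Position of the parent of the node at i, or None for the root."""
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if bits[j] else -1
        if depth == 1:  # first unmatched open paren to the left
            return j
    return None


# A root with two children; the first child has one child of its own.
example = [[[]], []]
bits = encode(example)  # 4 nodes -> 8 bits
```

Nodes are identified by the position of their opening bit, so no pointers are stored at all. The counting function shows why 2n bits is essentially the best possible: the base-2 logarithm of the number of binary trees on n nodes is just under 2n.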
Papers (290 in total)
ISAAC, pp. 18:1-18:19 (2023)
arXiv (2023)
String Processing and Information Retrieval, pp. 217-232 (2022)
Acta Informatica, no. 6 (2022): 687-708
arXiv (Cornell University) (2021)