An Upper-Bound on Information Contained Within a Tweet.
TinyToCS (2012)
Abstract
While tweets (and this paper) are limited to 140 characters, not all characters are created equal. This paper explores abuses of character encoding schemes to maximize the number of bits that can be conveyed by a tweet. In particular, since Twitter supports Unicode, we examine how we can abuse UTF8. For example, while people equate a Unicode codepoint with a character, some can be combined to form a single character. Does Twitter count these as one or two characters? Furthermore, some encodings (such as UTF8) allow more codepoints than are specified by Unicode; does Twitter accept these too? We ignore external links, embedded media, Twitter entities, and geotags, which are not universally supported.
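The distinctions the abstract draws can be made concrete with a minimal Python sketch. It is an illustration, not the paper's method: the helper names `describe` and `upper_bound_bits` are hypothetical, and the bound assumes that each of the 140 counted characters may be any value from the chosen alphabet, which Twitter may not actually accept. Two alphabets are compared: the Unicode scalar values (all codepoints except surrogates) and the 2^31 values expressible by the original six-byte UTF-8 scheme.

```python
import math
import unicodedata

# Codepoints U+0000..U+10FFFF minus the surrogate range U+D800..U+DFFF.
UNICODE_SCALAR_VALUES = 0x110000 - 0x800
# The original UTF-8 design allowed sequences of up to 6 bytes, covering 31 bits.
LEGACY_UTF8_VALUES = 2 ** 31


def describe(text):
    """Report the different 'lengths' of a string (hypothetical helper)."""
    nfc = unicodedata.normalize("NFC", text)
    return {
        "code_points": len(text),                 # raw codepoint count
        "code_points_nfc": len(nfc),              # count after canonical composition
        "utf8_bytes": len(text.encode("utf-8")),  # size on the wire
    }


def upper_bound_bits(n_chars, alphabet_size):
    """Bits conveyed if each of n_chars positions is a free choice
    among alphabet_size values (an assumption, not a measured result)."""
    return n_chars * math.log2(alphabet_size)


# 'e' followed by COMBINING ACUTE ACCENT renders as one character.
info = describe("e\u0301")
print(info)

print(upper_bound_bits(140, UNICODE_SCALAR_VALUES))
print(upper_bound_bits(140, LEGACY_UTF8_VALUES))
```

Under these assumptions, a counter that works on raw codepoints sees two characters in the combined sequence, while one that normalizes to NFC first sees one. Which alphabet a service really accepts determines which bound, if either, is reachable.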