The Universal Transformer combines the following key properties into one model: Weight sharing: Following intuitions behind weight sharing found in CNNs and recurrent neural networks, we extend the Transformer with a simple form of weight sharing that strikes an effective balance...
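The depth-wise weight sharing described above can be sketched as follows: a Universal-Transformer-style block applies one shared transition function repeatedly over depth steps, instead of learning a separate layer per step. This is a toy sketch, not the paper's implementation; the affine-plus-ReLU transition and the function name are illustrative stand-ins for the real attention/transition layers.

```python
import numpy as np

def shared_weight_block(x, w, steps=4):
    # Apply the SAME transition (toy affine + ReLU) at every depth step;
    # the weight matrix w is shared across all steps, as in the
    # Universal Transformer's recurrent-in-depth design.
    for _ in range(steps):
        x = np.maximum(0.0, x @ w)  # one shared "layer", reused each step
    return x

x = np.array([[1.0, -1.0]])
w = np.eye(2)
out = shared_weight_block(x, w, steps=3)  # -> [[1.0, 0.0]]
```

With an identity weight matrix the block reduces to repeated ReLUs, so the negative coordinate is zeroed on the first step and the output is stable thereafter.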
We empirically show that the global and local memory pointers can effectively produce system responses even in the out-of-vocabulary scenario, and we visualize how the global memory pointer helps
We empirically showed that when dependency parsers are unavailable for certain languages, such as code-mixed languages, word co-occurrence frequencies and positive pointwise mutual information (PPMI) values can be used to extract a contextual graph, and such a graph can be used with Graph Convolu...
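The graph construction above rests on PPMI, defined as PPMI(w, c) = max(0, log(P(w, c) / (P(w)·P(c)))): word pairs with positive PMI become weighted edges, and non-positive pairs are dropped. A minimal sketch of that computation, assuming co-occurrence counts are given as a dict (the function name and representation are illustrative, not from the paper):

```python
import math
from collections import Counter

def ppmi_edges(cooc):
    """Given co-occurrence counts {(w, c): n}, return PPMI edge weights.

    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), computed from counts as
    log( n * total / (count(w) * count(c)) ); only positive values kept.
    """
    total = sum(cooc.values())
    w_counts, c_counts = Counter(), Counter()
    for (w, c), n in cooc.items():
        w_counts[w] += n
        c_counts[c] += n
    edges = {}
    for (w, c), n in cooc.items():
        pmi = math.log(n * total / (w_counts[w] * c_counts[c]))
        if pmi > 0:  # PPMI: clip negatives to zero, i.e. drop the edge
            edges[(w, c)] = pmi
    return edges

edges = ppmi_edges({("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 1})
# ("a", "c") has negative PMI and is dropped; ("a", "b") and ("b", "c") remain
```

The resulting weighted adjacency could then feed a graph convolutional network as the summary describes.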
Categorize and detect one type of clinical annotation stored in the hospital Picture Archiving and Communication System (PACS) as a rich retrospective data source, to build a large-scale radiology lesion image database
Our proposed measures, together with the analysis of strategies used across publications and articles, suggest new directions for evaluating the difficulty of summarization tasks and for developing future summarization models
We propose a denoising distantly supervised open-domain question answering system that contains a paragraph selector to skim over paragraphs and a paragraph reader to perform intensive reading of the selected paragraphs
Some SQuAD 2.0 questions are unlikely to be asked without significant foreknowledge of the context material and do not occur in QuAC. Both SQuAD 2.0 and QuAC cover a significant number of unanswerable questions that could plausibly be asked of the article
We presented the error type distribution by manually analyzing 100 bad responses sampled from the Soft Typed Decoder and the Hard Typed Decoder respectively, where "bad" means our model's response was judged worse than a baseline's during the pairwise annotation