Dynamic Interaction Dilation for Interactive Human Parsing

IEEE Transactions on Multimedia (2023)

Abstract
Interactive segmentation aims to generate high-quality pixel-level predictions from a few user-provided clicks, and it is gaining attention for its convenience in annotating segmentation data. Users can iteratively refine the prediction by adding clicks until the result is satisfactory. Existing interactive methods usually transform the clicks into a set of localization maps, via Euclidean distance computation or RGB texture extraction, to guide the segmentation, which makes click transformation a core module of interactive segmentation networks. However, when applied to human images, where large pose variations, occlusions, and poor illumination are prevalent, prior transformation methods tend to cause uncorrectable overlap across localization maps, i.e., one click corresponds to multiple transformed values at the same position in different localization map channels. Such overlap makes it difficult to match clicks to human parts and limits interaction efficiency. Furthermore, inappropriately transformed information is hard to refine under a static transformation scheme, i.e., one based on fixed formulas or RGB textures, which is at odds with the dynamically refined interaction process. Hence, we design a dynamic transformation scheme for interactive human parsing (IHP), named Dynamic Interaction Dilation Net (DID-Net), which serves as an initial attempt to break the limitations of static transformation while capturing long-range dependencies of clicks within each human part. Specifically, we construct a Dynamic Dilation Module (DD-Module) that dilates clicks radially in several directions with the assistance of human body edge detection. The continually refined edges improve the dilation quality in each interaction iteration, thereby better fitting the user's intention. Furthermore, we propose an Adaptive Interaction Excitation Block (AIE-Block) to exploit potential semantic clues buried in the dilated clicks and to emphasize the semantic expression of each human part through feature recalibration. Our DID-Net achieves state-of-the-art performance on three public human parsing benchmarks.
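To make the contrast between static and edge-guided click transformation concrete, the following is a minimal Python sketch, not the DID-Net implementation. It assumes clicks are given as (row, column) pairs and the edge map is a grid of booleans. The function names `click_distance_map` and `radial_dilate`, and the parameters `truncate`, `n_dirs`, and `max_steps`, are hypothetical and chosen only for illustration. The first function shows the conventional fixed-formula encoding (distance to the nearest click); the second is a toy version of radial dilation that grows rays from a click and stops each ray at the first edge pixel.

```python
import math


def click_distance_map(height, width, clicks, truncate=255.0):
    """Static encoding: Euclidean distance from each pixel to its nearest click.

    Distances are capped at `truncate` (an assumed, commonly used convention).
    """
    dist = [[truncate] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for cy, cx in clicks:
                d = math.hypot(y - cy, x - cx)
                if d < dist[y][x]:
                    dist[y][x] = d
    return dist


def radial_dilate(edge_map, click, n_dirs=8, max_steps=50):
    """Toy edge-guided encoding: grow rays from a click until an edge is hit.

    `edge_map[y][x]` is True where an edge was detected. This is only an
    illustration of the idea, not the paper's DD-Module.
    """
    height, width = len(edge_map), len(edge_map[0])
    cy, cx = click
    mask = [[False] * width for _ in range(height)]
    mask[cy][cx] = True
    for k in range(n_dirs):
        angle = 2.0 * math.pi * k / n_dirs
        dy, dx = math.sin(angle), math.cos(angle)
        for step in range(1, max_steps + 1):
            y = round(cy + step * dy)
            x = round(cx + step * dx)
            outside = not (0 <= y < height and 0 <= x < width)
            # Stop this ray at the image border or at the first edge pixel.
            if outside or edge_map[y][x]:
                break
            mask[y][x] = True
    return mask


if __name__ == "__main__":
    # A 7x7 image with a vertical edge at column 5 and one click at the center.
    edges = [[x == 5 for x in range(7)] for _ in range(7)]
    mask = radial_dilate(edges, (3, 3), n_dirs=4)
    for row in mask:
        print("".join("#" if v else "." for v in row))
```

In this sketch the distance map depends only on click positions, so it cannot change when the prediction improves, whereas the dilated mask changes whenever the edge map is updated. That difference is the sense in which an edge-guided transformation can follow the iterative refinement process.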
Keywords
Human parsing, Interactive image segmentation, Semantic image segmentation