Diffusion-based Data Augmentation for Object Counting Problems
CoRR (2024)
Abstract
Crowd counting is an important problem in computer vision due to its wide
range of applications in image understanding. Currently, this problem is
typically addressed using deep learning approaches, such as Convolutional
Neural Networks (CNNs) and Transformers. However, deep networks are data-driven
and are prone to overfitting, especially when the available labeled crowd
dataset is limited. To overcome this limitation, we have designed a pipeline
that utilizes a diffusion model to generate extensive training data. We are the
first to generate images conditioned on a location dot map (a binary dot map
that specifies the location of human heads) with a diffusion model. We are also
the first to use such diverse synthetic data to augment the training of crowd
counting models. Our proposed smoothed density map input for ControlNet significantly
improves ControlNet's performance in generating crowds in the correct
locations. Our proposed counting loss for the diffusion model effectively
minimizes the discrepancies between the location dot map and the crowd images
generated. Additionally, our innovative guidance sampling further directs the
diffusion process toward regions where the generated crowd images align most
accurately with the location dot map. Collectively, we have enhanced
ControlNet's ability to generate specified objects from a location dot map,
which can be used for data augmentation in a variety of counting problems, not
only crowd counting. Extensive experiments demonstrate that our framework improves the
counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS
datasets, showcasing its effectiveness.
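
The abstract does not say how the location dot map is smoothed into a density map. The sketch below shows one common way to do it, and is an assumption rather than the authors' procedure: a normalized Gaussian is placed at every head location, so the resulting map sums to the number of heads. The names `gaussian_kernel` and `smooth_dot_map` and the default `sigma` are hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def smooth_dot_map(dot_map, sigma=2.0):
    """Turn a binary dot map (list of rows of 0/1) into a smoothed density map.

    Assumption: Gaussian smoothing with a fixed sigma. Each dot spreads a total
    mass of exactly 1, so the density map sums to the object count.
    """
    height, width = len(dot_map), len(dot_map[0])
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if not dot_map[y][x]:
                continue
            # Keep only the kernel cells that fall inside the image.
            cells = []
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < height and 0 <= xx < width:
                        cells.append((yy, xx, kernel[ky + radius][kx + radius]))
            # Renormalize so a dot near the border still contributes mass 1.
            mass = sum(weight for _, _, weight in cells)
            for yy, xx, weight in cells:
                density[yy][xx] += weight / mass
    return density


# Example: a 16 x 16 dot map with two annotated heads.
dots = [[0] * 16 for _ in range(16)]
dots[0][0] = 1
dots[8][8] = 1
density = smooth_dot_map(dots, sigma=2.0)
count = sum(sum(row) for row in density)
```

Because each dot keeps unit mass, `count` recovers the number of annotated heads up to floating-point error, which is the property a density-based counting loss would typically rely on.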