Last updated on April 9, 2021, 7:34 a.m. by tarush
Introduction:
Generating realistic images that semantically match given text descriptions is a challenging problem with tremendous potential applications, such as image editing, video games, and computer-aided design. In existing methods, when the given text description is changed, the corresponding visual attributes of the object are modified, but other, unrelated attributes are changed as well. This is typically undesirable in real-world applications, where a user wants to further modify the synthetic image to satisfy their preferences. The authors focus on modifying the visual attributes of objects in generated images by changing the given text descriptions. They propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can synthesise high-quality images and allow the user to manipulate objects' attributes without affecting the generation of other content.
Methods:
The authors adopt the Inception Score to evaluate the quality and diversity of the generated images. As the Inception Score cannot reflect the relevance between an image and a text description, the authors additionally utilise R-precision to measure the correlation between a generated image and its corresponding text. They compare the top-1 text-to-image retrieval accuracy on the CUB and COCO datasets.
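As a rough illustration of how top-1 R-precision can be computed, here is a minimal sketch assuming precomputed image and text embeddings (the paper pairs each generated image with its ground-truth caption plus randomly sampled mismatched captions, and checks whether the true caption ranks first by cosine similarity). The function name, the toy embeddings, and the 3-candidate setup are illustrative, not from the paper.

```python
import numpy as np

def r_precision(image_emb, text_embs, true_idx=0):
    """Rank candidate text embeddings by cosine similarity to the image
    embedding; return True if the ground-truth caption ranks top-1."""
    sims = text_embs @ image_emb / (
        np.linalg.norm(text_embs, axis=1) * np.linalg.norm(image_emb))
    return int(np.argmax(sims)) == true_idx

# Toy example with hand-picked embeddings: the first candidate (the
# ground-truth caption) is closest to the image, so R-precision is a hit.
image = np.array([1.0, 0.0, 0.0])
texts = np.vstack([
    [0.9, 0.1, 0.0],   # ground-truth caption embedding (index 0)
    [0.0, 1.0, 0.0],   # mismatched caption
    [0.1, 0.2, 0.9],   # mismatched caption
])
print(r_precision(image, texts))  # → True
```

Averaging this hit/miss outcome over many generated images gives the reported top-1 retrieval accuracy; in practice the embeddings would come from a pretrained image-text matching encoder and far more mismatched captions would be sampled per image.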
Results:
Experimental results on the CUB and COCO datasets show that the method outperforms existing state-of-the-art approaches both qualitatively and quantitatively.
Conclusion:
The authors have proposed a controllable generative adversarial network (ControlGAN), which can both generate images from natural language descriptions and manipulate them through changes to those descriptions. Their ControlGAN can successfully disentangle different visual attributes and allow parts of the synthetic image to be manipulated accurately, while preserving the generation of other content. Extensive experimental results demonstrate the effectiveness and superiority of the method on two benchmark datasets.