Last updated on April 9, 2021, 7:34 a.m. by tarush
Introduction:
Generating realistic images that semantically match given text descriptions is a challenging problem with tremendous potential applications, such as image editing, video games, and computer-aided design. In existing methods, when the given text description is changed, the corresponding visual attributes of the object are modified, but other, unrelated attributes are changed as well. This is typically undesirable in real-world applications, where a user wants to further modify the synthetic image to satisfy their preferences. The authors focus on modifying the visual attributes of objects in generated images by changing the given text descriptions. They propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can synthesise high-quality images and allow the user to manipulate objects' attributes without affecting the generation of other content.
Methods:
The authors adopt the Inception Score to evaluate the quality and diversity of the generated images. As the Inception Score cannot reflect the relevance between an image and a text description, the authors additionally utilise R-precision to measure the correlation between a generated image and its corresponding text. They compare the top-1 text-to-image retrieval accuracy on the CUB and COCO datasets.
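As a rough illustration of how top-1 R-precision can be computed, here is a minimal sketch assuming precomputed image and text embeddings (the paper pairs each generated image with its ground-truth caption plus randomly sampled mismatched captions, and checks whether the true caption ranks first by cosine similarity). The function name, the toy embeddings, and the 3-candidate setup are illustrative, not from the paper.

```python
import numpy as np

def r_precision(image_emb, text_embs, true_idx=0):
    """Rank candidate text embeddings by cosine similarity to the image
    embedding; return True if the ground-truth caption ranks top-1."""
    sims = text_embs @ image_emb / (
        np.linalg.norm(text_embs, axis=1) * np.linalg.norm(image_emb))
    return int(np.argmax(sims)) == true_idx

# Toy example with hand-picked embeddings: the first candidate (the
# ground-truth caption) is closest to the image, so R-precision is a hit.
image = np.array([1.0, 0.0, 0.0])
texts = np.vstack([
    [0.9, 0.1, 0.0],   # ground-truth caption embedding (index 0)
    [0.0, 1.0, 0.0],   # mismatched caption
    [0.1, 0.2, 0.9],   # mismatched caption
])
print(r_precision(image, texts))  # → True
```

Averaging this hit/miss outcome over many generated images gives the reported top-1 retrieval accuracy; in practice the embeddings would come from a pretrained image-text matching encoder and far more mismatched captions would be sampled per image.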
Results:
Experimental results on the CUB and COCO datasets show that the method outperforms existing state-of-the-art approaches both qualitatively and quantitatively.
Conclusion:
The authors have proposed a controllable generative adversarial network (ControlGAN), which can both generate images from natural language descriptions and manipulate them through changes to those descriptions. Their ControlGAN can successfully disentangle different visual attributes and allow parts of the synthetic image to be manipulated accurately, while preserving the generation of other content. Extensive experimental results demonstrate the effectiveness and superiority of the method on two benchmark datasets.