An Efficiency-Optimized Framework for Sequential Image Generation with Stable Diffusion Models

Authors

  • C. Beulah Christalin Latha
  • S.V. Evangelin Sonia
  • G. Linda Rose
  • Ben M. Jebin
  • Christhya Joseph
  • G. Naveen Sundar

DOI:

https://doi.org/10.52783/jns.v14.2260

Keywords:

Diffusion Models, Image Generation, Stable Diffusion, Text-to-Image, Storyboarding

Abstract

Diffusion models are a class of generative artificial intelligence models trained by progressively adding noise to data and learning to reverse that process, so that new synthetic data can be generated from noise. This research work focuses on visual content generation from text input. We use a computationally efficient variant of the diffusion model family, namely a stable diffusion model, to create images from text. The novelty of this work lies in sequential image generation from text prompts: the approach has been successfully applied to create visual storyboards and diverse, contextually coherent image sequences. Hyperparameters have been fine-tuned to generate high-quality, context-aware images from text prompts.
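The sequential-generation idea described in the abstract can be sketched with the Hugging Face `diffusers` library: run a Stable Diffusion pipeline once per scene while sharing a style suffix and a fixed random seed so consecutive storyboard frames stay visually coherent. The function name, the style suffix, the model id, and the hyperparameter values below are illustrative assumptions, not the paper's exact configuration.

```python
from typing import Callable, List


def generate_storyboard(pipe: Callable, scenes: List[str],
                        style: str = "storybook illustration, soft lighting, "
                                     "consistent character design",
                        **pipe_kwargs) -> list:
    """Run the text-to-image pipeline once per scene, appending a shared
    style suffix to every prompt so consecutive frames keep a uniform look."""
    frames = []
    for scene in scenes:
        prompt = f"{scene}, {style}"
        # Each call returns a pipeline output whose .images list holds the
        # generated frame(s); we keep the first image per scene.
        frames.append(pipe(prompt, **pipe_kwargs).images[0])
    return frames


# Illustrative usage with the public `diffusers` API (needs a GPU and the
# Stable Diffusion v1.5 weights; the model id is an assumption):
#
#   import torch
#   from diffusers import StableDiffusionPipeline
#   pipe = StableDiffusionPipeline.from_pretrained(
#       "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
#   generator = torch.Generator("cuda").manual_seed(42)  # reproducible frames
#   frames = generate_storyboard(
#       pipe, ["a fox leaves its den at dawn", "the fox crosses a frozen river"],
#       num_inference_steps=30, guidance_scale=7.5, generator=generator)
#   for i, img in enumerate(frames):
#       img.save(f"frame_{i}.png")
```

Fixing the generator seed and reusing one style suffix are simple levers for frame-to-frame coherence; the paper's own hyperparameter tuning (steps, guidance scale) maps onto the `pipe_kwargs` forwarded to each call.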



Published

2025-03-17

How to Cite

Christalin Latha CB, Evangelin Sonia S, Rose GL, Jebin BM, Joseph C, Sundar GN. An Efficiency-Optimized Framework for Sequential Image Generation with Stable Diffusion Models. J Neonatal Surg [Internet]. 2025 Mar 17 [cited 2025 Sep 21];14(6S):497-504. Available from: https://jneonatalsurg.com/index.php/jns/article/view/2260