An Efficiency-Optimized Framework for Sequential Image Generation with Stable Diffusion Models
DOI: https://doi.org/10.52783/jns.v14.2260
Keywords: Diffusion Models, Image Generation, Stable Diffusion, Text-to-Image, Storyboarding
Abstract
Diffusion models are a class of generative artificial intelligence models that are trained by progressively adding noise to data and learning to reverse that process, so that new samples can be synthesized by gradually removing noise. This research work focuses on generating visual content from text input. We use a computationally efficient variant of diffusion models, namely a Stable Diffusion model, to create images from text. The novelty of this work lies in sequential image generation from text prompts: the approach has been successfully applied to create visual storyboards and diverse, contextually coherent image sequences. Hyperparameters have been fine-tuned to generate high-quality, context-aware images from text prompts.
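The noising and denoising mechanism described above can be sketched numerically. The snippet below is a minimal, illustrative toy (not the paper's implementation): it uses the closed-form DDPM forward process to corrupt a stand-in "image" and then inverts it assuming a perfect noise predictor; the linear beta schedule and all variable names are assumptions made for illustration. In a real Stable Diffusion model, the noise prediction would come from a trained U-Net operating on latents.

```python
import numpy as np

# Variance schedule (a common linear choice; an assumption, not the paper's).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product \bar{alpha}_t

def forward_noise(x0, t, eps):
    """q(x_t | x_0): add Gaussian noise to x0 in closed form at step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def predict_x0(xt, t, eps_pred):
    """Recover an estimate of x_0 from x_t given a noise prediction.
    In a real model, eps_pred comes from a trained network; here we pass
    the true noise, so the inversion is exact."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # stand-in for an image (or latent)
eps = rng.standard_normal(x0.shape)

xt = forward_noise(x0, 500, eps)   # heavily noised version of x0
x0_hat = predict_x0(xt, 500, eps)  # exact recovery with the true noise
print(np.allclose(x0, x0_hat))     # → True
```

With a learned, imperfect noise predictor, this single-step inversion is replaced by many small reverse steps, which is why generation proceeds by gradually removing noise rather than all at once.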
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.