Image-to-Audio Captioning Systems for Visually Impaired Users: Development and Application

Authors

  • Pham Duc Hau, Amity School of Engineering & Technology, Amity University, Greater Noida, Uttar Pradesh, India

DOI:

https://doi.org/10.53469/jrse.2026.08(02).13

Keywords:

Object detection, deep learning, image processing, text extraction, speech synthesis, image captioning, ResNet-LSTM, accessibility, visually impaired, future scope

Abstract

This article surveys the evolving field of image captioning, with a focus on its applications in accessibility for visually impaired users. It examines the challenges of real-time object recognition, traditional object detection methods, and the impact of deep learning techniques, particularly region-proposal-based object detection algorithms. The paper introduces VisionVoice, a web application that converts text extracted from images into natural-sounding speech. The article details the image processing pipeline, covering the preprocessing, segmentation, classification, and post-processing stages, and discusses the underlying mathematical concepts, image preprocessing techniques, and the shortcomings of existing models. The study highlights the potential of a ResNet-LSTM model to generate descriptive, contextually coherent image captions, thereby improving the quality of the synthesized speech. Finally, it outlines the future scope of the VisionVoice project: further gains in accuracy, improved hardware capabilities, and the development of complete image-to-speech conversion systems. The overall goal is to improve accessibility and inclusion by giving visually impaired individuals better access to information and a higher quality of life.
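The abstract names four pipeline stages (preprocessing, segmentation, classification, post-processing) without saying how each is implemented. The following is a minimal, stdlib-only sketch of how such stages can be chained, built on our own assumptions rather than on VisionVoice's code: min-max normalization for preprocessing, a fixed global threshold plus 4-connected flood fill for segmentation, a hypothetical size-based `classify` placeholder standing in for a trained model such as the ResNet-based one the paper discusses, and a post-processing step that turns labels into a sentence a text-to-speech engine could read. All function names here are illustrative.

```python
from typing import Dict, List, Tuple

# Grayscale image as rows of pixel intensities (e.g. 0-255).
Image = List[List[float]]
Region = List[Tuple[int, int]]


def preprocess(img: Image) -> Image:
    """Preprocessing: min-max normalize intensities to the range [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[(p - lo) / span for p in row] for row in img]


def segment(img: Image, threshold: float = 0.5) -> List[Region]:
    """Segmentation: global threshold, then group foreground pixels
    into 4-connected regions with an iterative flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions: List[Region] = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or img[r][c] < threshold:
                continue
            seen[r][c] = True
            stack, region = [(r, c)], []
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def classify(region: Region) -> str:
    """Classification: HYPOTHETICAL placeholder that labels a region by size.
    A real system would call a trained model (e.g. a CNN) here."""
    return "large object" if len(region) >= 4 else "small object"


def postprocess(labels: List[str]) -> str:
    """Post-processing: merge labels into one sentence for a TTS engine."""
    if not labels:
        return "No objects detected."
    counts: Dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    parts = [f"{n} {label}{'s' if n > 1 else ''}" for label, n in sorted(counts.items())]
    return "Detected " + ", ".join(parts) + "."


def describe(img: Image) -> str:
    """Run all four stages and return text ready for speech synthesis."""
    return postprocess([classify(r) for r in segment(preprocess(img))])


if __name__ == "__main__":
    demo = [[0, 0, 0, 0, 0],
            [0, 200, 200, 0, 0],
            [0, 200, 200, 0, 250],
            [0, 0, 0, 0, 0]]
    print(describe(demo))  # Detected 1 large object, 1 small object.
```

The returned sentence is what would be handed to a speech synthesizer. Replacing the placeholder `classify` with a trained detector, or `postprocess` with a caption decoder such as an LSTM, would leave the stage structure unchanged; keeping each stage a separate function makes that swap straightforward.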


Published

2026-02-22

How to Cite

Hau, P. D. (2026). Image-to-Audio Captioning Systems for Visually Impaired Users: Development and Application. Journal of Research in Science and Engineering, 8(2), 53–57. https://doi.org/10.53469/jrse.2026.08(02).13


Section

Articles
