ENHANCING AMBIENT ASSISTED LIVING WITH MULTI-MODAL VISION AND LANGUAGE MODELS: A NOVEL APPROACH FOR REAL-TIME ABNORMAL BEHAVIOR DETECTION AND EMERGENCY RESPONSE
dc.contributor.author | Zhiyenbayev, Adil | |
dc.date.accessioned | 2024-06-23T20:17:44Z | |
dc.date.available | 2024-06-23T20:17:44Z | |
dc.date.issued | 2024-04-28 | |
dc.description.abstract | Global demographic forecasts predict that the population of older adults will surpass 1.9 billion by 2050, escalating the demand for efficient healthcare delivery, particularly for the elderly and disabled, who frequently require caregiving due to prevalent mental and physical health issues. This demographic trend underscores the critical need for robust long-term care services and continuous monitoring systems. However, the efficacy of these solutions is often compromised by caregiver overload, financial constraints, and logistical challenges in transportation, necessitating advanced technological interventions. In response, researchers have been refining ambient assisted living (AAL) environments through the integration of human activity recognition (HAR) based on advanced machine learning (ML) and deep learning (DL) techniques. These methods aim to reduce emergency incidents and enhance early detection and intervention. Traditional sensor-based HAR systems, despite their utility, suffer from significant limitations, including high data variability, environmental interference, and contextual inadequacies. To address these issues, vision-language models (VLMs) can enhance detection accuracy by interpreting scene context via caption generation, visual question answering (VQA), commonsense reasoning, and action recognition. However, VLMs face challenges in real-time application scenarios due to language ambiguity and occlusions, which can degrade detection accuracy. Large language models (LLMs) combined with text-to-speech (TTS) and speech-to-text (STT) technologies can facilitate direct communication with the individual and enable real-time interactive assessment of the situation. Integrating real-time conversational capabilities via LLM, TTS, and STT into the VLM framework significantly improves abnormal behavior detection by combining comprehensive scene understanding with direct patient feedback, thereby enhancing the system's reliability. A qualitative evaluation, based on a subjective questionnaire administered during real-time experiments with participants, indicated high system usability. A quantitative evaluation of the developed system demonstrated high performance, achieving detection accuracy and recall of 93.44% and 95%, respectively, and specificity of 88.88% across various emergency scenarios before interaction. After the interaction stage, accuracy rose to 100% owing to the additional context provided by users' responses. Furthermore, the system not only identifies emergencies effectively but also provides contextual summaries and actionable recommendations to caregivers and patients. This research introduces a multimodal framework that combines VLMs, LLMs, TTS, and STT for real-time abnormal behavior detection and assistance, aiming to overcome the limitations of traditional HAR and AAL by integrating instruction-driven VLM, LLM, human detection, TTS, and STT modules to enhance emergency response efficiency in home environments. This approach promises substantial advancements in AAL by providing timely, context-aware detection and response in emergencies. | en_US |
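The abstract above outlines a staged pipeline: human detection, VLM-based scene description, an LLM assessment, and a TTS/STT dialogue that adds user feedback before a final decision. The sketch below is a minimal, hypothetical illustration of that control flow in Python; every model call here (detect_person, vlm_describe, llm_assess, ask_via_tts_stt) is a placeholder assumption for readability, not the thesis implementation.

```python
# Illustrative sketch of the staged pipeline described in the abstract.
# All model calls are placeholders (assumptions), not the author's code.
from dataclasses import dataclass


@dataclass
class Assessment:
    is_emergency: bool
    needs_clarification: bool
    summary: str


def detect_person(frame) -> bool:
    """Placeholder human-detection step (any person detector could back this)."""
    return True


def vlm_describe(frame) -> str:
    """Placeholder VLM call: caption / VQA over the current frame."""
    return "An elderly person is lying motionless on the kitchen floor."


def llm_assess(description: str, dialogue: str = "") -> Assessment:
    """Placeholder LLM call: classify the scene, optionally using dialogue."""
    text = (description + " " + dialogue).lower()
    emergency = "lying" in text and "fine" not in text
    return Assessment(emergency, dialogue == "", "possible fall detected")


def ask_via_tts_stt(question: str) -> str:
    """Placeholder TTS question + STT answer exchange with the person."""
    return "I slipped and I cannot get up."


def process_frame(frame) -> Assessment:
    if not detect_person(frame):
        return Assessment(False, False, "no person in view")
    scene = vlm_describe(frame)
    first_pass = llm_assess(scene)
    if first_pass.needs_clarification:
        answer = ask_via_tts_stt("Are you okay? Do you need help?")
        return llm_assess(scene, dialogue=answer)  # second pass with user feedback
    return first_pass


if __name__ == "__main__":
    print(process_frame(frame=None))
```

In a real deployment the placeholders would be backed by an object detector, a VLM, an LLM, and speech services; the two-pass call to llm_assess mirrors the before/after-interaction evaluation reported in the abstract.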
dc.identifier.citation | Zhiyenbayev, A. (2024). Enhancing Ambient Assisted Living: Multi-Modal Vision and Language Models for Real-Time Emergency Response. Nazarbayev University School of Engineering and Digital Sciences | en_US |
dc.identifier.uri | http://nur.nu.edu.kz/handle/123456789/7973 | |
dc.language.iso | en | en_US |
dc.publisher | Nazarbayev University School of Engineering and Digital Sciences | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Type of access: Restricted | en_US |
dc.subject | Ambient assisted living | en_US |
dc.subject | Human activity recognition | en_US |
dc.subject | Vision-language models | en_US |
dc.subject | Large language models | en_US |
dc.subject | Speech models | en_US |
dc.subject | Prompt engineering | en_US |
dc.title | ENHANCING AMBIENT ASSISTED LIVING WITH MULTI-MODAL VISION AND LANGUAGE MODELS: A NOVEL APPROACH FOR REAL-TIME ABNORMAL BEHAVIOR DETECTION AND EMERGENCY RESPONSE | en_US |
dc.type | Master's thesis | en_US |
workflow.import.source | science |
Files
Original bundle
- Name:
- Thesis_A.Z..pdf
- Size:
- 2.92 MB
- Format:
- Adobe Portable Document Format
- Description:
- Master's thesis
License bundle
- Name:
- license.txt
- Size:
- 6.28 KB
- Format:
- Item-specific license agreed upon to submission
- Description: