Implementación de un prototipo de consulta de vehículos en videos urbanos Integrando Modelos Multimodales y RAG

Piedra Narváez, José David

Please use this identifier to cite or link to this item: http://dspace.utpl.edu.ec/handle/29.500.19856/76521

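Title:	Implementación de un prototipo de consulta de vehículos en videos urbanos Integrando Modelos Multimodales y RAG [Implementation of a prototype for querying vehicles in urban videos integrating multimodal models and RAG]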
Authors:	Piedra Narváez, José David
Director:	Barba Guamán, Luis Rodrigo
Keywords:	Ecuador. Digital thesis.
Issue Date:	2026
Citation:	Piedra Narváez, J. D., & Barba Guamán, L. R. (2026). Implementación de un prototipo de consulta de vehículos en videos urbanos integrando modelos multimodales y RAG [Bachelor's thesis, Universidad Técnica Particular de Loja]. Repositorio Institucional. https://dspace.utpl.edu.ec/handle/29.500.19856/76521
Abstract:	A prototype was developed and implemented that supports natural-language queries about vehicle events in urban videos captured by drones, integrating computer-vision detection with retrieval-augmented generation (RAG). The system detects vehicles with YOLO, records each vehicle's class and location, attaches the corresponding frames as evidence, and converts the events into embeddings stored in a vector database, from which relevant information is retrieved and ranked. A multimodal language model then uses the retrieved context to generate explanatory, evidence-backed answers, reducing hallucinations by grounding its responses in the retrieved events and the visual evidence. The work followed the Extreme Programming (XP) methodology and used a large dataset of labeled aerial images and five test videos with different resolutions and lighting conditions. In the evaluation, YOLOv8 achieved the best balance between precision and coverage compared with YOLOv5 and YOLOv11, validating the feasibility of the approach for near-real-time urban analysis.
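The retrieval stage summarized in the abstract (detections turned into events, embedded, stored, ranked, and passed with their evidence frames to a multimodal model) can be illustrated with a minimal sketch. It is not the thesis implementation: the `VehicleEvent` schema, the term-frequency `embed` function standing in for a learned embedding model, the `InMemoryVectorStore` standing in for the vector database, and the prompt wording are all hypothetical. Detections are assumed to come from a YOLO model already, and the call to the multimodal model is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class VehicleEvent:
    """One detection turned into a queryable event (hypothetical schema)."""
    video_id: str
    frame_index: int
    timestamp_s: float
    vehicle_class: str               # class label reported by the detector, e.g. "bus"
    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    frame_path: str                  # saved frame kept as visual evidence

    def to_text(self) -> str:
        # Textual rendering of the event; this is what gets embedded and retrieved.
        x1, y1, x2, y2 = self.bbox
        return (f"{self.vehicle_class} detected in video {self.video_id} "
                f"at {self.timestamp_s:.1f} s (frame {self.frame_index}), "
                f"box ({x1}, {y1}, {x2}, {y2})")


def embed(text: str) -> dict[str, float]:
    """Stand-in embedding: L2-normalised term-frequency vector (not a learned model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(token, 0.0) for token, w in a.items())


class InMemoryVectorStore:
    """Toy stand-in for a vector database: keeps (vector, event) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[dict[str, float], VehicleEvent]] = []

    def add(self, event: VehicleEvent) -> None:
        self._items.append((embed(event.to_text()), event))

    def search(self, query: str, k: int = 3) -> list[tuple[float, VehicleEvent]]:
        # Score every stored event against the query and return the k best (stable sort).
        q = embed(query)
        scored = [(cosine(q, vec), event) for vec, event in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_grounded_prompt(question: str, hits: list[tuple[float, VehicleEvent]]) -> str:
    # Put retrieved events and their evidence frames in front of the question so the
    # multimodal model answers from this context rather than from memory.
    lines = ["Answer only from the events below and cite the evidence frame you used.", ""]
    for score, event in hits:
        lines.append(f"- [{score:.2f}] {event.to_text()} | evidence: {event.frame_path}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


events = [
    VehicleEvent("video1", 120, 4.0, "bus", (10, 20, 200, 180), "frames/video1_000120.jpg"),
    VehicleEvent("video1", 450, 15.0, "car", (300, 40, 380, 110), "frames/video1_000450.jpg"),
    VehicleEvent("video2", 90, 3.0, "truck", (50, 60, 260, 220), "frames/video2_000090.jpg"),
]
store = InMemoryVectorStore()
for ev in events:
    store.add(ev)

question = "When did a bus appear?"
hits = store.search(question, k=2)
prompt = build_grounded_prompt(question, hits)
print(prompt)
```

Here only the bus event shares a token with the question, so it ranks first. This exact-token stand-in would miss paraphrases such as "large passenger vehicle", which is why learned embeddings and a vector database are normally used for this step.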
Identifier:	Cobarc: 1379484
URI:	https://bibliotecautpl.utpl.edu.ec/cgi-bin/abnetclwo?ACC=DOSEARCH&xsqf99=151179.TITN.
Type:	bachelorThesis
Appears in Collections:	Titulación de Sistemas Informáticos y Computación
