Processing image system for user head movement recognition

The aim of this paper is to describe a system for head gesture recognition developed within the INTEROB project, financed by research grant 131-CEEX-II03/02.10.2006. The project's goal is to develop gesture-based interaction with robotic systems. The paper discusses a method for controlling the mouse pointer on the screen by recognizing the operator's head movements captured by a video camera.

Bibliographic Details
Date: 2009
Main Author: Ungurean, Ciprian Ovidiu
Format: Article
Language: English
Published: Інститут фізики напівпровідників імені В.Є. Лашкарьова НАН України, 2009
Series: Оптико-електронні інформаційно-енергетичні технології
Online Access: http://dspace.nbuv.gov.ua/handle/123456789/32212
Journal: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Citation: Processing image system for user head movement recognition / Ciprian Ovidiu Ungurean // Оптико-електронні інформаційно-енергетичні технології. — 2009. — № 1 (17). — С. 32-36. — Бібліогр.: 20 назв. — англ.

ISSN: 1681-7893
CIPRIAN OVIDIU UNGUREAN

PROCESSING IMAGE SYSTEM FOR USER HEAD MOVEMENT RECOGNITION

"Ştefan cel Mare" University of Suceava, University street 13, 720229, Romania, tel/fax: +40.230524.801, E-mail: ungurean.ovidiu@gmail.com

Abstract. The aim of this paper is to describe a system for head gesture recognition developed within the INTEROB project, financed by research grant 131-CEEX-II03/02.10.2006. The project's goal is to develop gesture-based interaction with robotic systems. The paper discusses a method for controlling the mouse pointer on the screen by recognizing the operator's head movements captured by a video camera.

Keywords: head movement recognition, gesture and voice commands, Adaboost algorithm.

INTRODUCTION

As computers become more and more necessary, we will have to use diverse computing systems: miniature workstations, mobile and omnipresent. GUI interfaces, especially those built around peripherals such as the mouse and keyboard, will not be able to meet future HCI requirements. Post-WIMP (window, icon, menu, pointer) [1] interfaces, represented by gesture interaction techniques, multimodal interfaces, tactile interfaces, virtual reality, and augmented reality, have been studied intensively in recent years. They arise from the need for flexible and efficient interaction that can be easily learned and used in a natural, intuitive way. In a system that accepts both gesture and voice commands, if one kind of command is not recognized, the other can substitute for it. Theoretically, information is transmitted efficiently when the input modes are many and diverse. The general tendency is therefore to optimize the input for a certain number of applications, but this is not a satisfying solution because it leads to designing new gesture inputs for each application.
Gestures must be complementary to each other so that they can be used without causing confusion or adverse effects during interaction with the system. Also, if the system is difficult to use or understand, or is tiresome, users will lose their motivation [2]. Gestures are a natural method of non-verbal communication, and their implementation as a non-contact command interface could bring efficiency in usage [3][4][5]. A method through which the user can communicate easily, freely, and efficiently can be implemented by attaching a digital camera that provides the information for the system to process [6]. We propose to build such equipment, allowing computer interaction using only head movements and thus replacing the mouse and keyboard.

The article is organized as follows:
• a short description of the user head movement recognition and interpretation system;
• using movement history in a stream of images;
• presentation of a performance testing method for the proposed approach, and the obtained results;
• application evaluation in two games;
• conclusions.

A SHORT DESCRIPTION OF THE USER HEAD MOVEMENT RECOGNITION AND INTERPRETATION SYSTEM

Because it proved to be fast and efficient under different illumination and shadow conditions, our approach uses the Adaboost algorithm to detect the user's frontal face. For head tracking, we use the CAMShift method after the face has been detected with Adaboost. Knowing the user's head position at every moment, for movement recognition we implemented a method based on "movement forms". The proposed approach achieves successful detection, tracking, and interpretation of human head movements under different light and shadow conditions of the surroundings or the face. Viola et al.
[7][8] used Haar features together with a fast method, the integral image, for computing them at any scale. Haar-like features are efficient in image classification because they encode spatial relations between different image regions. In face detection, these regions are defined by the contrast difference between the eyes and the cheeks, or by contrast differences across the nose area and the eye area.

Fig. 1. Block diagram of the Adaboost face detection stage

A cascade of classifiers is a degenerate decision tree in which, at each stage, a classifier rejects some background patterns and accepts the frontal faces in the image. The selected features are sorted by importance so that they can be used as a cascade of classifiers. Each classifier stage is trained with the machine learning algorithm Discrete Adaboost, which builds a strong classifier from a large set of weak classifiers. The speed of feature evaluation is important because the frontal face detection algorithm consists of sliding windows at all scales over the input image. Using this algorithm, the user's frontal face is detected under varying illumination conditions, occlusions, or image noise.

Because the classifier used by the Viola & Jones algorithm detects only frontal faces (when the user is looking at the webcam), and the purpose of our application is to interpret gestures produced by head translations and rotations, we implemented a tracking algorithm based on color distributions: CAMShift (Continuously Adaptive MeanShift) [9], an extension of the MeanShift algorithm proposed in [10]. CAMShift tracks a color distribution (the user's face skin color, obtained when Adaboost was applied) and is thus able to detect the head position in video frames regardless of its orientation. CAMShift handles dynamically changing distributions by resizing the search window for the next frame according to the zeroth moment computed in the current frame.
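As an illustration of the integral-image trick from [7] (a minimal sketch, not the paper's code; the function names and the plain list-of-rows image format are our own), a rectangle sum and a two-rectangle Haar feature can be evaluated in constant time once the table is built:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y][x] = row + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum in the rectangle with top-left (x, y), width w, height h.
    Always exactly four table lookups, regardless of rectangle size."""
    a = ii[y + h - 1][x + w - 1]
    b = ii[y - 1][x + w - 1] if y > 0 else 0
    c = ii[y + h - 1][x - 1] if x > 0 else 0
    d = ii[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a - b - c + d

def two_rect_haar(ii, x, y, w, h):
    """Two-rectangle Haar feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every feature costs a handful of lookups, the sliding-window search over all scales mentioned above stays tractable.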
This algorithm can successfully anticipate the object's position in a sequence of video frames. Thus, the face position remains known even when the user is not looking directly at the webcam, for example when the head is nodding, tilting, or shaking. After the frontal face is detected, we create the user's skin color histogram over the Hue and Saturation channels of the HSV color space [11]. The skin color histogram is used as a model for converting incoming pixels into a skin probability image. After the face is detected with Adaboost, CAMShift takes over the tracking. CAMShift is not very accurate, but it tells us in which area the user's face is. The search area in which Adaboost can be applied again, for an eventual re-detection, is smaller than the entire frame received from the webcam but larger than the window returned by CAMShift. With this approach we benefit from the advantages of both methods (efficiency in face detection, speed in tracking).

Method                 Time cost per frame
Adaboost               57 ms
Adaboost + CAMShift    18 ms

These results were obtained statistically over a period of 5 minutes on a Windows XP system with an Intel Pentium M 1.7 GHz CPU. Adaboost searches for a face in the whole frame again when any of the following conditions holds:
• the tracking window is too narrow or too tall;
• the height-to-width ratio of the tracking window no longer plausibly represents a face;
• the ellipse that represents the face is inclined to horizontal.

After the first frontal face detection in the video stream, we use anthropometric human head dimensions [12] (width, height) as constants. With this approach, gestures such as moving toward and away from the webcam can be recognized.
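The skin-probability conversion described above (build a hue histogram from pixels inside the detected face box, then map every pixel's hue to a probability) can be sketched without any vision library. The function names, the 16-bin quantization, and the peak normalization below are illustrative assumptions, not values from the paper:

```python
def build_hue_histogram(skin_hues, bins=16, hue_max=180):
    """Histogram of hue values sampled inside the detected face box.
    Hue is assumed in [0, hue_max), as in OpenCV's HSV convention."""
    hist = [0] * bins
    for h in skin_hues:
        hist[h * bins // hue_max] += 1
    peak = max(hist) or 1          # normalize so the dominant bin maps to 1.0
    return [v / peak for v in hist]

def backproject(hue_image, hist, bins=16, hue_max=180):
    """Replace each pixel's hue with its skin probability, producing the
    'probability of skin' image that CAMShift then tracks."""
    return [[hist[h * bins // hue_max] for h in row] for row in hue_image]
```

In the real system this backprojection would run on every incoming frame, and CAMShift's search window would follow the bright (high-probability) blob.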
USING MOVEMENT HISTORY IN A STREAM OF IMAGES

The main methods of movement recognition and interpretation consist of trajectory analysis of the moving parts [14][15][16], the hidden state-space Markov model [16], or the active-passive model [18]. In [19] there is an alternative to these gesture recognition methods: superposing the frames that form a specific movement to capture human motion patterns. Looking at a movement in an unclear video sequence, two reference points stand out. The first is the spatial area in which the movement takes place; it is given by the pixel region where something changes significantly, no matter how it moves. The second is the way the movement evolves within this area (e.g., an element that expands, or one that rotates in a certain location). The human visual system has developed its own ways of exploiting these notions of how and where, and it captures enough movement properties for them to be used in recognition. Li et al. [20] recognize out-of-plane head rotations (yaw and pitch) using a detector pyramid.

In our approach, since the user's head position is known at every moment, the MHI (motion history image) method can be applied. Its advantage is that the movement across a number of frames can be summarized in a single gradient image, and the model is not tied to the duration of the movement. The main problem in using this method is segmenting and labeling the silhouette that performs the movement. In our case the silhouette labeling is done with the Adaboost-CAMShift method. Most silhouette extraction approaches use background subtraction, optic flow, or stereoscopic extraction; we chose the frame difference method. We calculate the direction of the components that define the movement only if they lie within the face area.
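The motion-history idea used above (from Bradski & Davis [18]) can be sketched in a few lines. This is a sketch, not the paper's implementation; the image-as-list-of-rows layout, the threshold, and the parameter names are illustrative assumptions:

```python
def frame_difference(prev, curr, threshold=15):
    """Binary silhouette: 1 where the pixel changed enough between frames."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def update_mhi(mhi, silhouette, timestamp, duration):
    """Stamp moving pixels with the current time; forget pixels whose last
    motion is older than `duration`. The result summarizes the recent
    movement in a single image, regardless of how many frames it spans."""
    for y, row in enumerate(silhouette):
        for x, moved in enumerate(row):
            if moved:
                mhi[y][x] = timestamp
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0
    return mhi
```

The gradient of the resulting MHI (brighter = more recent) is what the motion gradient operation in the next section works on.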
Using the movement direction detected from the gradient (motion gradient operation, MGO) [19] and the ellipse that frames the face area when CAMShift is applied, we can recognize translation, rotation, and inclination head gestures.

Fig. 2. a) Head gesture detection in the down/left rectangle; b) Skin color probability: the brighter the pixel, the higher the probability that it is a skin pixel; c) MHI representation

In Fig. 2b, the red ellipse successfully marks the user's face area as a result of applying the detection and tracking algorithm. Next to it, in Fig. 2c, is an example frame with the MHI applied. The red square marks the face area. Direction and magnitude are calculated only for components within the face area (marked in green). The movement direction in the current frame, obtained by summing the shift vectors, is represented in the lower right area.

EXPERIMENTAL RESULTS

To test the recognition rate for head movement gestures, we implemented a test application. At application start, a movie is shown of a virtual character performing a series of directional head gestures, which the user must reproduce. The test application receives the webcam's frame flow and recognizes the user's gestures. The rate of correctly detected gestures is 74.2%.

Fig. 3. Results of the head tracking system

In this graph, the virtual character's gestures are marked in blue; the gestures recognized by the test application are marked in red. In this setting, where the user must reproduce the virtual character's movements, a delay appears, caused by the time the user needs to recognize the virtual character's gesture.

INTEGRATION IN APPLICATIONS

We also tested the efficiency of this approach in two computer games: Pac-Man and Counter-Strike 1.6.
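The "summing the shift vectors" step that yields the current frame's movement direction can be sketched as a plain vector sum (a sketch under our own assumptions: the input is the list of (dx, dy) gradient components that fall inside the face area, and the angle convention is image coordinates):

```python
import math

def dominant_direction(shift_vectors):
    """Overall movement direction in degrees (0 = right, 90 = down in image
    coordinates), obtained by summing the per-pixel shift vectors."""
    sx = sum(dx for dx, _ in shift_vectors)
    sy = sum(dy for _, dy in shift_vectors)
    if sx == 0 and sy == 0:
        return None                      # no significant movement this frame
    return math.degrees(math.atan2(sy, sx)) % 360
```

The resulting angle could then be quantized into the directional gesture classes (left, right, up, down) that the test application recognizes.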
In the arcade game Pac-Man, the user's head movements direct the main character through the labyrinth to gather points and avoid the ghosts. One of the users said that after the test/evaluation period, when he wanted to play the game at home on his own PC, he would have liked to control the Pac-Man character with head gestures instead of the directional keyboard keys. Counter-Strike is an FPS (first-person shooter) game in which movement through the virtual space is done with the directional keys and the point of view is changed by moving the mouse. In our test application, a head movement changes the point of view, i.e., when the head moves to the right, the point of view moves to the right.

CONCLUSION

When users need to select an object of interest by pointing at it with physical interfaces such as a mouse, keyboard, trackers, or sensors, the action is carried out indirectly. A software tracker that can follow the user's face, without any specialized equipment and in a completely passive, non-interfering way, is a great success. Our tests found that most users were able to control the sample applications after a few minutes of practice. Perceptual user interfaces are inspired by the real world and by human-to-human interaction. Research in this field will allow people to use technology in ways that are more efficient, natural, and easy to learn.

REFERENCES

1. A. van Dam, "Post-WIMP user interfaces", Communications of the ACM, Vol. 40, No. 2, pp. 63-67, Feb. 1997.
2. S. Oviatt and W. Wahlster (eds.), Human-Computer Interaction (Special Issue on Multimodal Interfaces), Lawrence Erlbaum Associates, Vol. 12, Nos. 1 & 2, 1997.
3. R. D. Vatavu, Ş.-G. Pentiuc, C. Chaillo, "On Natural Gestures for Interacting in Virtual Environments", Advances in Electrical and Computer Engineering, Suceava, Romania, ISSN 1582-7445, No. 2/2005, Vol. 5 (12), pp. 72-79.
4.
G. Mahalu, R. Pentiuc, "Acquisition and Processing System for the Photometry Parameters of the Bright Objects", Advances in Electrical and Computer Engineering, Suceava, Romania, ISSN 1582-7445, No. 1/2001, Vol. 1 (8), pp. 26-31, 2001.
5. Pentiuc, S. G., Vatavu, R., Cerlinca, T. I., and Ungureanu, O., "Methods and Algorithms for Gestures Recognition and Understanding", The Eighth All-Ukrainian International Conference, UkrOBRAZ'2006, pp. 15-18, Ukraine, August 2006.
6. Keates, S., Perricos, C., "Gesture as a means of computer access", Communication Matters, Vol. 10, No. 1, pp. 17-19.
7. P. Viola and M. J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, pp. 511-518, 2001.
8. R. Lienhart, J. Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", Intel Labs, 2002.
9. G. R. Bradski, "Computer video face tracking for use in a perceptual user interface", Intel Technology Journal, Q2 1998.
10. D. Comaniciu and P. Meer, "Robust Analysis of Feature Spaces: Color Image Segmentation", CVPR'97, pp. 750-755.
11. T. S. Caetano, D. A. C. Barone, "A probabilistic model for human skin color", IAP Conf. 2001, pp. 279-283.
12. Leslie G. Farkas, Jeffrey C. Posnick, Tania M. Hreczko, "Anthropometric Growth Study of the Head", The Cleft Palate-Craniofacial Journal, Vol. 29, No. 4, pp. 303-308, 1992.
13. [Electronic resource] - Access mode: http://www.intel.com/technology/computing/opencv.
14. Polana, R., Nelson, R., "Low level recognition of human motion", Workshop on Motion of Non-rigid and Articulated Objects, pp. 77-82, 1994.
15. Black, M., Yacoob, Y., "Tracking and recognizing rigid and nonrigid facial motions using local parametric model of image motion", Proceedings of the International Conference on Computer Vision, pp. 374-381, 1995.
16.
Madabhushi, A., Aggarwal, J., "A Bayesian approach to human activity recognition", Proceedings of the IEEE Workshop on Visual Surveillance, pp. 25-32, 1999.
17. Chen, F. S., Fu, C. M., Huang, C. L., "Hand gesture recognition using a real-time tracking method and hidden Markov models", Image and Vision Computing, Vol. 21, No. 8, pp. 745-758, August 2003.
18. Gary R. Bradski, James W. Davis, "Motion segmentation and pose recognition with motion history gradients", Machine Vision and Applications, 13: 174-184, 2002.
19. Cutler, R., Davis, L., "Robust real-time periodic motion detection, analysis, and applications", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 781-796, 2000.
20. S. Z. Li, L. Zhu, Z. Q. Zhang, A. Blake, H. Zhang, and H. Shum, "Statistical learning of multi-view face detection", Proceedings of the European Conference on Computer Vision, Vol. 4, pp. 67-81, Copenhagen, Denmark, May 28 - June 2, 2002.

Received by the editorial office on 11.11.2008.

OVIDIU CIPRIAN UNGUREAN - Ph.D. student in Computer Science, "Ştefan cel Mare" University of Suceava, Suceava, Romania.