Processing image system for user head movement recognition
The aim of the paper is to describe a system for the head gesture recognition developed in the frame of INTEROB project which is financed by the research grant 131-CEEX-II03/02.10.2006. The goal of this project consists in developing an interaction based on gestures with information on robotic syste...
Date: | 2009 |
---|---|
Main author: | Ungurean, Ciprian Ovidiu |
Format: | Article |
Language: | English |
Published: | Інститут фізики напівпровідників імені В.Є. Лашкарьова НАН України, 2009 |
Series: | Оптико-електронні інформаційно-енергетичні технології |
Subjects: | |
Online access: | http://dspace.nbuv.gov.ua/handle/123456789/32212 |
Journal title: | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
Cite: | Processing image system for user head movement recognition / Ciprian Ovidiu Ungurean // Оптико-електронні інформаційно-енергетичні технології. 2009. No. 1 (17). P. 32-36. Bibliography: 20 titles. In English. |
id | irk-123456789-32212 |
---|---|
record_format | dspace |
institution | Digital Library of Periodicals of National Academy of Sciences of Ukraine |
collection | DSpace DC |
language | English |
format | Article |
author | Ungurean, Ciprian Ovidiu |
title | Processing image system for user head movement recognition |
topic | Methods and systems of optoelectronic and digital processing of images and signals |
publisher | Інститут фізики напівпровідників імені В.Є. Лашкарьова НАН України |
publishDate | 2009 |
series | Оптико-електронні інформаційно-енергетичні технології |
issn | 1681-7893 |
url | http://dspace.nbuv.gov.ua/handle/123456789/32212 |
first_indexed | 2025-07-03T12:44:30Z |
last_indexed | 2025-07-03T12:44:30Z |
fulltext |
CIPRIAN OVIDIU UNGUREAN
PROCESSING IMAGE SYSTEM FOR USER HEAD MOVEMENT
RECOGNITION
“Ştefan cel Mare” University of Suceava,
University street 13, 720229, Romania,
tel/fax.: +40.230524.801,
E-mail: ungurean.ovidiu@gmail.com
Abstract. This paper describes a head gesture recognition system developed within the INTEROB project,
which is financed by research grant 131-CEEX-II03/02.10.2006. The goal of the project is to develop
gesture-based interaction with information on robotic systems. We discuss a method for controlling the
movements of the mouse pointer on the screen by recognizing the operator’s head movements captured by a
video camera.
Keywords: head movement recognition, gesture and voice commands, Adaboost algorithm.
INTRODUCTION
As computers become more and more necessary, we will have to use a variety of computing systems,
including miniature, mobile, and omnipresent workstations. GUI interfaces, especially those that rely on
peripherals such as the mouse and keyboard, will not be able to meet future HCI requirements. Post-WIMP
(window, icon, menu, pointer) [1] interfaces, realized through gesture interaction techniques, multimodal
interfaces, tactile interfaces, virtual reality, or augmented reality, have therefore been studied in
recent years. They arise from the need to support flexible and efficient interaction that can be learned
easily and used in a natural and intuitive way.
In a system that receives both gesture and voice commands, if one of the commands is not recognized,
the other can substitute for it. Theoretically, information is transmitted efficiently if the input modes
are many and diverse. The general tendency is to optimize the input for a certain number of applications,
but this is not a satisfying solution because it leads to designing new gesture inputs for each
application. Gestures must be complementary to each other so that they can be used without causing
confusion or adverse effects during interaction with the system. Also, if data is difficult to use or to
understand, or is tiresome, users will lose their motivation [2].
Gestures are a natural method of non-verbal communication, and implementing them as a non-contact
command interface could make usage more efficient [3][4][5]. A method that lets the user communicate
freely and efficiently can be implemented by attaching a digital camera that provides the information for
the system to process [6]. We propose to build such a system, which allows computer interaction using
only head movements and thus replaces the mouse and keyboard.
The article is organized as follows:
• A short description of the user head movement recognition and interpretation system;
• Using movement history in a stream of images;
• Presentation of a performance testing method for the proposed approach and the results obtained;
• Evaluation of the application in two games;
• Conclusions.
A SHORT DESCRIPTION OF THE USER HEAD MOVEMENT RECOGNITION AND
INTERPRETATION SYSTEM
Because it has proved to be fast and efficient under different illumination and shadow conditions, we
use the Adaboost algorithm to detect the user’s frontal face. After the face is detected with Adaboost, we
track the user’s head with the CAMShift method. Knowing the user’s head position at every moment, we
implemented a movement recognition method based on “movement forms”. The proposed approach
successfully detects, tracks, and interprets human head movements under different light and shadow
conditions of the surroundings or the face.
CIPRIAN OVIDIU UNGUREAN, 2009
ПРИНЦИПОВІ КОНЦЕПЦІЇ ТА СТРУКТУРУВАННЯ РІЗНИХ РІВНІВ ОСВІТИ З ОПТИКО-ЕЛЕКТРОННИХ ІНФОРМАЦІЙНО-
ЕНЕРГЕТИЧНИХ ТЕХНОЛОГІЙ
Viola et al. [7][8] used Haar features together with the integral image, a representation that allows
the features to be computed quickly at any scale. Haar-like features are efficient in image classification
because they encode spatial relations between different image regions. In face detection, these features
capture, for example, the contrast between the eyes and the cheeks, or between the nose area and the eye
area.
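To make the role of the integral image concrete, the following minimal Python sketch computes a summed-area table and evaluates one two-rectangle Haar-like feature with a fixed number of lookups per rectangle. The function names and the toy 4x4 patch are assumptions made for illustration; they are not taken from the system described here.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle feature: sum of the lower half minus sum of the upper half."""
    half = h // 2
    upper = rect_sum(ii, x, y, w, half)
    lower = rect_sum(ii, x, y + half, w, half)
    return lower - upper


# Toy patch (hypothetical data): a dark band above a bright band,
# loosely analogous to the eye region above the cheeks.
patch = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(patch)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

Because `rect_sum` costs the same regardless of rectangle size, a feature can be rescaled without rescanning pixels, which is what makes evaluation at any scale cheap.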
Fig. 1. Block diagram of Adaboost face detection stage
A cascade of classifiers is a degenerate decision tree in which, at each stage, a classifier rejects
some background patterns and accepts the frontal faces in the image. The selected features are sorted
according to their importance so that they can be used as a cascade of classifiers. Each classifier stage
is trained with the Discrete Adaboost machine learning algorithm, which builds a strong classifier from a
large set of weak classifiers. The speed of feature evaluation is important because the frontal face
detection algorithm slides windows at all scales over the input image. With this algorithm, the user’s
frontal face is detected under varying illumination conditions, occlusions, or image noise.
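The early-rejection behaviour of such a cascade can be sketched as follows. This is an illustrative outline only: the stage layout, thresholds, weights, and feature values are hypothetical, and a trained detector would contain many more weak classifiers per stage.

```python
def strong_classify(weak_classifiers, threshold, features):
    """Discrete AdaBoost decision: weighted vote of threshold stumps.

    Each weak classifier is (feature_index, cut, polarity, alpha) and votes 1
    when polarity * feature < polarity * cut, otherwise 0.
    """
    score = 0.0
    for index, cut, polarity, alpha in weak_classifiers:
        vote = 1 if polarity * features[index] < polarity * cut else 0
        score += alpha * vote
    return score >= threshold


def cascade_classify(stages, features):
    """Return (accepted, stages_evaluated); stop at the first rejecting stage."""
    for n, (weak_classifiers, threshold) in enumerate(stages, start=1):
        if not strong_classify(weak_classifiers, threshold, features):
            return False, n
    return True, len(stages)


# Hypothetical two-stage cascade over three precomputed feature values.
stages = [
    ([(0, 50.0, -1, 1.0)], 0.5),                       # passes if feature 0 > 50
    ([(1, 20.0, 1, 0.7), (2, 5.0, -1, 0.6)], 0.65),    # needs both votes here
]
face_like = cascade_classify(stages, [120.0, 10.0, 9.0])
background = cascade_classify(stages, [30.0, 10.0, 9.0])
```

Most sliding windows contain background and are discarded by the first, cheapest stage, so the average cost per window stays low.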
Because the classifier used by the Viola & Jones algorithm detects only frontal faces (when the user is
looking at the webcam) and the purpose of our application is to interpret gestures caused by head
translations and rotations, we implemented a color-based tracking algorithm, CAMShift (Continuously
Adaptive MeanShift) [9], which is an extension of the MeanShift algorithm proposed in [10]. CAMShift is a
tracking algorithm that uses a combination of colors (the user’s facial skin color, obtained when Adaboost
was applied) and is thus able to detect the head position in video frames regardless of its orientation.
CAMShift can handle dynamically changing distributions by resizing the search window for the next frame
according to the zeroth moment of the current frame. The algorithm can be used successfully to anticipate
the object’s position in a sequence of video frames. Thus, the face position is known even when the user
is not looking directly at the webcam, for example when the head is nodding, tilting, or shaking. After
the frontal face is detected, we create the user’s skin color histogram from the hue and saturation
channels of the HSV color space [11]. The skin color histogram is used as a model to convert incoming
pixels into a skin probability image.
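The histogram model and the window update can be illustrated with a small pure-Python sketch. It covers only the back-projection and the mean-shift centring step; the adaptive window resizing that distinguishes CAMShift is omitted. The bin counts, value ranges (hue in [0, 180), saturation in [0, 256)), and the synthetic image are assumptions for illustration.

```python
import math


def build_histogram(pixels, h_bins=8, s_bins=8):
    """2D hue/saturation histogram, normalized so the fullest bin equals 1.0."""
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for hue, sat in pixels:
        hist[hue * h_bins // 180][sat * s_bins // 256] += 1.0
    peak = max(max(row) for row in hist)
    return [[v / peak for v in row] for row in hist]


def back_project(image, hist):
    """Replace each (hue, sat) pixel with its skin probability from the histogram."""
    h_bins, s_bins = len(hist), len(hist[0])
    return [[hist[hue * h_bins // 180][sat * s_bins // 256] for hue, sat in row]
            for row in image]


def mean_shift(prob, window, max_iter=10):
    """Move window (x, y, w, h) to the centroid of the probability mass inside it."""
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0            # zeroth and first image moments
        for yy in range(y, min(y + h, rows)):
            for xx in range(x, min(x + w, cols)):
                p = prob[yy][xx]
                m00 += p
                m10 += p * xx
                m01 += p * yy
        if m00 == 0:
            break                        # no skin probability inside the window
        nx = math.floor(m10 / m00 - (w - 1) / 2 + 0.5)
        ny = math.floor(m01 / m00 - (h - 1) / 2 + 0.5)
        nx = max(0, min(nx, cols - w))
        ny = max(0, min(ny, rows - h))
        if (nx, ny) == (x, y):
            break                        # converged
        x, y = nx, ny
    return x, y, w, h


# Synthetic 8x8 image: a 3x3 "skin" blob at columns 3..5, rows 3..5.
SKIN, OTHER = (10, 150), (100, 50)
image = [[SKIN if 3 <= x <= 5 and 3 <= y <= 5 else OTHER for x in range(8)]
         for y in range(8)]
hist = build_histogram([SKIN] * 9)
prob = back_project(image, hist)
tracked = mean_shift(prob, (1, 1, 3, 3))
```

Starting from a window that only touches one corner of the blob, the window climbs the probability map until it is centred on the skin-colored region.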
After the face is detected through Adaboost, CAMShift is used for tracking. CAMShift is not very
accurate, but it can tell us in which area the user’s face is. The search area on which Adaboost can be
applied again for a subsequent search is smaller than the entire frame received from the webcam, but
larger than the window returned by CAMShift. With this approach, we benefit from the advantages of both
methods (efficiency in face detection, tracking speed).
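One way to derive such an intermediate search area is to enlarge the tracking window around its centre and clamp it to the frame. The sketch below is an assumption about how this could be done; the scale factor of 1.5 and the function name are hypothetical.

```python
def expand_search_area(window, frame_size, scale=1.5):
    """Enlarge window (x, y, w, h) around its centre, clamped to the frame."""
    x, y, w, h = window
    frame_w, frame_h = frame_size
    new_w = min(int(w * scale), frame_w)
    new_h = min(int(h * scale), frame_h)
    cx, cy = x + w // 2, y + h // 2
    new_x = max(0, min(cx - new_w // 2, frame_w - new_w))
    new_y = max(0, min(cy - new_h // 2, frame_h - new_h))
    return new_x, new_y, new_w, new_h


# Hypothetical tracking window inside a 320x240 frame.
area = expand_search_area((100, 80, 60, 80), (320, 240))
```

The detector then scans only `area`, which is larger than the tracking window but much smaller than the whole frame.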
Method | Time cost per frame |
---|---|
Adaboost | 57 ms |
Adaboost + CAMShift | 18 ms |
These results were obtained statistically over a period of 5 minutes on a Windows XP system with an
Intel Pentium M 1.7 GHz CPU.
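For orientation, the per-frame costs reported above can be converted into frame rates and a speed-up factor. The short calculation below uses only the two measured values; the variable names are illustrative.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame processing cost in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


adaboost_ms = 57.0      # Adaboost on every frame
combined_ms = 18.0      # Adaboost + CAMShift
speedup = adaboost_ms / combined_ms
```

This gives roughly 17.5 frames per second for Adaboost alone, roughly 55.6 for the combined method, and a speed-up of about 3.2.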
Adaboost searches for a face in the whole frame under the following conditions:
• the tracking window is too narrow or too tall;
• the ratio between the height and the width of the tracking window does not correspond to a face;
• the ellipse that represents the face is oriented horizontally.
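The conditions listed above can be expressed as a single predicate. The sketch below is illustrative: the ratio limits and the tilt limit are hypothetical values, and the angle is assumed to measure how far the major axis of the ellipse deviates from the vertical.

```python
def needs_full_frame_detection(width, height, angle_deg,
                               min_ratio=1.0, max_ratio=2.0, max_tilt_deg=60.0):
    """Return True when the tracking result no longer looks like an upright face."""
    if width <= 0 or height <= 0:
        return True                       # degenerate window
    ratio = height / width
    if ratio < min_ratio or ratio > max_ratio:
        return True                       # too wide, too narrow, or too tall
    tilt = abs(angle_deg) % 180           # fold the angle into [0, 90]
    tilt = min(tilt, 180 - tilt)
    return tilt > max_tilt_deg            # ellipse is close to horizontal


upright = needs_full_frame_detection(60, 80, 5.0)
too_tall = needs_full_frame_detection(20, 100, 0.0)
lying = needs_full_frame_detection(60, 80, 85.0)
```

When the predicate returns True, tracking is abandoned for that frame and the detector is run on the whole image again.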
After the first frontal face detection in the video frame, we use anthropometric human head
dimensions [12] (width, height) as constants. With this approach, gestures such as moving toward and
away from the webcam can be recognized.
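A simple way to exploit a constant head width is the pinhole camera relation, in which the apparent width in pixels is inversely proportional to the distance. The sketch below is an assumption about how this might look; the focal length, the head width of 15 cm, and the 15% tolerance are hypothetical values, not taken from [12] or from the described system.

```python
def estimate_distance_cm(face_width_px, focal_length_px=600.0, head_width_cm=15.0):
    """Pinhole model: distance = focal length * real width / apparent width."""
    return focal_length_px * head_width_cm / face_width_px


def depth_gesture(reference_width_px, current_width_px, tolerance=0.15):
    """Classify movement along the camera axis from the change in face width."""
    change = current_width_px / reference_width_px - 1.0
    if change > tolerance:
        return "toward"      # face appears larger: the user moved closer
    if change < -tolerance:
        return "back"        # face appears smaller: the user moved away
    return "none"


distance = estimate_distance_cm(150.0)
gesture = depth_gesture(150.0, 180.0)
```

The reference width would be taken from the first frontal detection, and later tracked widths are compared against it.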
USING MOVEMENT HISTORY IN A STREAM OF IMAGES
The main methods of movement recognition and interpretation are trajectory analysis of the moving
parts [14][15][16], the hidden state-space Markov model [16], and the active-passive model [18].
An alternative to these gesture recognition methods is given in [19]: frames are superposed to form a
motion pattern that is specific to a given human movement. When looking at a movement in an unclear video
sequence, two reference points are apparent. The first is the spatial area in which the movement takes
place. This reference point is given by the pixel area where something changes significantly, no matter
how it moves. The second reference point is the way in which the movement evolves within this area (e.g.
an element that expands or rotates at a certain location). The human visual system has developed its own
ways to exploit these notions of how and where, and it captures sufficient movement properties to be used
in recognition.
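The idea of superposing frames can be sketched as a motion history image that stores, for each pixel, the time of the most recent change. The pure-Python example below is a minimal illustration using frame differencing; the threshold, duration, and one-row test frames are assumptions.

```python
def update_mhi(mhi, prev_frame, frame, timestamp, duration, diff_threshold=30):
    """Update a motion history image in place and return it.

    Pixels that changed between the two frames receive the current timestamp;
    pixels whose last motion is older than `duration` are cleared to 0.
    """
    rows, cols = len(frame), len(frame[0])
    for y in range(rows):
        for x in range(cols):
            if abs(frame[y][x] - prev_frame[y][x]) >= diff_threshold:
                mhi[y][x] = timestamp
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0.0
    return mhi


# A bright pixel moving one step to the right per frame (synthetic data).
frames = [
    [[255, 0, 0, 0, 0]],
    [[0, 255, 0, 0, 0]],
    [[0, 0, 255, 0, 0]],
    [[0, 0, 0, 255, 0]],
    [[0, 0, 0, 0, 255]],
]
mhi = [[0.0] * 5]
for t in range(1, 4):
    update_mhi(mhi, frames[t - 1], frames[t], float(t), duration=2.0)
after_three = [row[:] for row in mhi]
update_mhi(mhi, frames[3], frames[4], 4.0, duration=2.0)
```

The non-zero area answers the question of where the movement took place, and the ordering of the timestamps inside it answers how it evolved.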
Li et al. [20] recognize out-of-plane head rotations such as yaw and pitch using a detector-pyramid. In
our approach, because the user’s head position is known at every moment, the motion history image (MHI)
algorithm can be applied. The advantage of this method is that the movement across a number of frames can
be summarized in a single gradient image, and the model is not time dependent. The main problem in using
this method is segmenting and labeling the silhouette that performs the movement. In our case the
silhouette labeling is done with the Adaboost-CAMShift method. Most silhouette extraction approaches use
background subtraction, optical flow, or stereoscopic extraction. We chose the frame difference method. We
calculate the direction of the components that define the movement only if they lie within the face area.
Detecting the movement direction from the gradient (motion gradient operation, MGO) [19], together with
the ellipse with which CAMShift frames the face area, allows us to recognize translation, rotation, and
inclination head gestures.
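Restricting the gradient computation to the face area can be sketched as follows. This is an illustrative outline that assumes a motion history image in which larger values mean more recent motion, so that the gradient points in the direction of movement; the function names and the synthetic images are hypothetical.

```python
def global_motion_direction(mhi, face_rect):
    """Sum the MHI gradient vectors inside face_rect (x, y, w, h).

    Returns (dx, dy) in image coordinates (y grows downward), or None when
    no usable motion history lies inside the rectangle.
    """
    x0, y0, w, h = face_rect
    rows, cols = len(mhi), len(mhi[0])
    sum_dx = sum_dy = 0.0
    for y in range(max(y0, 1), min(y0 + h, rows - 1)):
        for x in range(max(x0, 1), min(x0 + w, cols - 1)):
            neighbours = (mhi[y][x - 1], mhi[y][x + 1], mhi[y - 1][x], mhi[y + 1][x])
            if mhi[y][x] == 0 or 0 in neighbours:
                continue                  # no history here, or silhouette border
            sum_dx += (mhi[y][x + 1] - mhi[y][x - 1]) / 2.0
            sum_dy += (mhi[y + 1][x] - mhi[y - 1][x]) / 2.0
    if sum_dx == 0 and sum_dy == 0:
        return None
    return sum_dx, sum_dy


def label_direction(dx, dy):
    """Map a summed gradient vector to one of four directional gestures."""
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# Synthetic MHIs: timestamps grow to the right, and from top to bottom.
mhi_right = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(5)]
mhi_down = [[float(r + 1)] * 5 for r in range(5)]
vec_right = global_motion_direction(mhi_right, (0, 0, 5, 5))
vec_down = global_motion_direction(mhi_down, (0, 0, 5, 5))
```

Calling `label_direction(*vec_right)` yields "right" for the first image and `label_direction(*vec_down)` yields "down" for the second.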
Fig. 2. a) Head gesture detection in down/left rectangle;
b) Skin color probability - the brighter pixels have the highest probability to be skin pixels;
c) MHI representation
In Fig. 2b, the red ellipse marks the user’s face area, obtained by applying the detection and tracking
algorithm. Next to it, Fig. 2c shows an example frame with MHI applied. The red square marks the face
area. Direction and magnitude are calculated only for components within the face area (marked in green).
The movement direction of the current frame, obtained by summing the shift vectors, is shown in the lower
right area.
EXPERIMENTAL RESULTS
To test the recognition rate of head movement gestures, we implemented a test application. At startup,
the application plays a movie of a virtual character that performs a series of directional gestures with
its head, and the user must reproduce them. The test application receives the webcam’s frame stream and
recognizes the user’s gestures. The rate of correctly detected gestures is 74.2%.
Fig. 3. Results of the head tracking system
In this graphic, the virtual character’s gestures are marked in blue and the gestures recognized by
the test application are marked in red. Because the user must reproduce the virtual character’s movements,
a delay appears in the representation, corresponding to the time the user needs to recognize the virtual
character’s gesture.
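A recognition rate that tolerates this reaction delay could be computed as sketched below. The matching rule, the maximum delay of 1.5 s, and the sample sequences are assumptions made for illustration; they are not the data behind the reported 74.2%.

```python
def recognition_accuracy(expected, recognized, max_delay=1.5):
    """Fraction of expected gestures matched by a later recognized gesture.

    Both arguments are lists of (time_in_seconds, gesture), sorted by time.
    A recognized gesture counts once, and only if it follows the expected
    gesture by at most max_delay seconds.
    """
    used = set()
    correct = 0
    for t_exp, g_exp in expected:
        for i, (t_rec, g_rec) in enumerate(recognized):
            if i in used:
                continue
            if g_rec == g_exp and 0.0 <= t_rec - t_exp <= max_delay:
                used.add(i)
                correct += 1
                break
    return correct / len(expected) if expected else 0.0


# Hypothetical session: the third gesture is misrecognized.
expected = [(0.0, "left"), (3.0, "right"), (6.0, "up"), (9.0, "down")]
recognized = [(0.8, "left"), (3.9, "right"), (6.7, "down"), (9.6, "down")]
accuracy = recognition_accuracy(expected, recognized)
```

In this made-up session three of the four gestures are matched within the allowed delay.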
INTEGRATION IN APPLICATIONS
We also tested the efficiency of this approach in two computer games: Pac-Man and Counter Strike 1.6.
In the arcade game Pac-Man, the user’s head movements direct the main character through the labyrinth to
gather points and to avoid the ghosts. One of the users reported that, after the evaluation period, when
he played the game at home on his own PC, he wished he could control the Pac-Man character with head
gestures instead of the directional keys.
Counter Strike is an FPS (first person shooter) game in which movement through the virtual space is
done with the directional keys and the point of view is changed by moving the mouse. In our test
application, a head movement of the user changes the point of view, i.e. when the head moves to the right,
the point of view moves to the right.
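The two integration styles described above, discrete key commands for Pac-Man and continuous view changes for the shooter, could be mapped from head movements as in the sketch below. The key names, gain, and dead zone are hypothetical, and the code only computes the commands; sending them to a game is outside its scope.

```python
PACMAN_KEYS = {"left": "LEFT", "right": "RIGHT", "up": "UP", "down": "DOWN"}


def pacman_key(gesture):
    """Map a directional head gesture to an arrow-key name, or None if unknown."""
    return PACMAN_KEYS.get(gesture)


def view_delta(head_dx, head_dy, gain=4.0, dead_zone=2.0):
    """Convert head displacement in pixels to a relative mouse movement.

    Displacements smaller than dead_zone are ignored so that small tracking
    jitter does not move the point of view.
    """
    def scale(value):
        return 0 if abs(value) < dead_zone else int(round(value * gain))
    return scale(head_dx), scale(head_dy)


key = pacman_key("left")
delta = view_delta(5.0, -1.0)
```

A dead zone of this kind is one simple way to keep the view steady while the user holds the head still.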
CONCLUSION
When users need to select an object of interest by pointing at it with physical interfaces such as a
mouse, keyboard, tracker, or sensors, the action is carried out indirectly. A software tracker that can
follow the user’s face without any specialized equipment, in a completely passive and non-interfering
way, is a considerable advantage. Our tests found that most users were able to control the sample
applications after a few minutes of practice.
Perceptual user interfaces are inspired by the real world and by human-to-human interaction. Research
in this field will allow people to use technology in a way that is more efficient, more natural, and
easier to learn.
REFERENCES
1. A. van Dam, “Post-WIMP user interfaces”, Communications of the ACM, Vol. 40, No. 2, pp. 63-67,
Feb. 1997.
2. S. Oviatt and W. Wahlster (eds.), Human-Computer Interaction (Special Issue on Multimodal
Interfaces), Lawrence Erlbaum Associates, Vol. 12, No. 1 & 2, 1997.
3. Radu Daniel Vatavu, Ştefan-Gheorghe Pentiuc, Christophe Chaillo, “On Natural Gestures for
Interacting in Virtual Environments”, Advances in Electrical and Computer Engineering, Suceava,
Romania, ISSN 1582-7445, Vol. 5 (12), No. 2/2005, pp. 72-79.
4. George Mahalu, Radu Pentiuc, “Acquisition and Processing System for the Photometry
Parameters of The Bright Objects”, Advances in Electrical and Computer Engineering, Suceava, Romania,
ISSN 1582-7445, Vol. 1 (8), No. 1/2001, pp. 26-31.
5. S. G. Pentiuc, R. Vatavu, T. I. Cerlinca, O. Ungureanu, “Methods and Algorithms for Gestures
Recognition and Understanding”, The Eighth All-Ukrainian International Conference,
UkrOBRAZ’2006, pp. 15-18, Ukraine, August 2006.
6. S. Keates, C. Perricos, “Gesture as a means of computer access”, Communication Matters, Vol. 10,
No. 1, pp. 17-19.
7. P. Viola and M. J. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”,
Proceedings of IEEE Computer Society’s Computer Vision and Pattern Recognition (CVPR 2001), Vol.
1, pp. 511-518, 2001.
8. R. Lienhart, J. Maydt, “An Extended Set of Haar-like Features for Rapid Object Detection”, Intel
Labs, Intel, 2002.
9. G. R. Bradski, “Computer video face tracking for use in a perceptual user interface”, Intel Technology
Journal, Q2 1998.
10. D. Comaniciu and P. Meer, “Robust Analysis of Feature Spaces: Color Image Segmentation”, CVPR’97,
pp. 750-755.
11. T. S. Caetano, D. A. C. Barone, “A probabilistic model for human skin color”, IAP Conf. 2001, pp. 279-283.
12. Leslie G. Farkas, Jeffrey C. Posnick, Tania M. Hreczko, “Anthropometric Growth Study of the Head”,
The Cleft Palate-Craniofacial Journal, Vol. 29, No. 4, pp. 303-308, 1992.
13. [Electronic resource]. Access mode: http://www.intel.com/technology/computing/opencv.
14. R. Polana, R. Nelson, “Low level recognition of human motion”, Workshop on Motion of Nonrigid and
Articulated Objects, pp. 77-82, 1994.
15. M. Black, Y. Yacoob, “Tracking and recognizing rigid and nonrigid facial motions using local parametric
model of image motion”, Proceedings International Conference Computer Vision, pp. 374-381, 1995.
16. A. Madabhushi, J. Aggarwal, “A Bayesian approach to human activity recognition”, Proceedings of
IEEE Workshop on Visual Surveillance, pp. 25-32, 1999.
17. F. S. Chen, C. M. Fu, C. L. Huang, “Hand gesture recognition using a real-time tracking method and
hidden Markov models”, Image and Vision Computing, Vol. 21, No. 8, pp. 745-758, August 2003.
18. Gary R. Bradski, James W. Davis, “Motion segmentation and pose recognition with motion history
gradients”, Machine Vision and Applications, Vol. 13, pp. 174-184, 2002.
19. R. Cutler, L. Davis, “Robust real-time periodic motion detection, analysis, and applications”, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 781-796, 2000.
20. S. Z. Li, L. Zhu, Z. Q. Zhang, A. Blake, H. Zhang, H. Shum, “Statistical learning of multi-view face
detection”, Proceedings of the European Conference on Computer Vision, Vol. 4, pp. 67-81,
Copenhagen, Denmark, May 28 - June 2, 2002.
Received by the editorial office on 11.11.2008.
OVIDIU CIPRIAN UNGUREAN - Ph.D. student in Computer Science, "Stefan cel Mare"
University of Suceava, Suceava, Romania.
|