QPSR of the numediart research program, Vol. 2, No. 1, March 2009
Research Program in Digital Art Technologies - QPSR Vol. II - No. 1
Quarterly Progress
Scientific Report
Vol. 2, No. 1, March 2009
T. Dutoit, B. Macq (Editors)

Published online by:
Faculté Polytechnique de Mons (FPMs)
Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS)
http://tcts.fpms.ac.be
Université Catholique de Louvain (UCL)
Laboratoire de Télécommunications et Télédétection (TELE)
http://www.tele.ucl.ac.be
Credits:
Editors: Thierry Dutoit (FPMs/TCTS), Benoît Macq (UCL/TELE)
Cover photo: Loïc Reboursière
LaTeX editor: Christian Frisson (UCL/TELE), using LaTeX’s confproc class (by V. Verfaille)
All copyrights remain with the authors.
numediart homepage: http://numediart.org
Contact: contact@numediart.org

Preface
numediart is a long-term research program centered on Digital Media Arts, funded by Région Wallonne,
Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through
digital performances and installations, in connection with local companies and artists.
numediart is organized around three major R&D themes:
• HyFORGE - Hypermedia Navigation: Information indexing and retrieval rely classically on con-
strained languages to automatically describe contents and allow formulating queries, respectively.
This approach becomes hardly applicable for multimedia contents such as music or video because
of the disparity between computable low-level descriptors and desired high-level semantics - the
so-called semantic gap. Alternatively, HyFORGE investigates human-in-the-loop approaches and
innovative tools for structuring and searching multimedia contents. Along with audio and image
processing, HyFORGE builds up on self-organizing models to derive enhanced views of multimedia
collections and provide users with efficient browsing interfaces.
• COMEDIA - Body & Media: COMEDIA is named from a French contraction between body and
media or stage director and media, which nicely sums up the main objective of this axis: giving to
bodies the means to be their own artistic director! Hence based on position on stage or choreography
between multiple artists for the inter-relationship and gestures or voice for the intra-relationship, CO-
MEDIA aims at creating interactivity between performing artists and the multimedia context around.
Event description, low-level feature analysis, pattern recognition, heterogeneous sensor fusion, ro-
bustness against lighting and real-time are our keywords in 1D, 2D and 3D signal processing to reach
these goals.
• COPI - Digital Instruments Design: COPI aims at developing a software/hardware toolbox for cre-
ating innovative digital musical instruments, from scratch or by augmenting existing instruments
with new interactive channels. The main challenges for this R&D axis are to produce expressive
instruments which maintain a close, embodied relationship with the musician. Our approach is to
produce new sound design architectures using a large database of pre-recorded signals while main-
taining real-time control of the design process. Our scientific work therefore implies three main axes:
the development of expressive production models (audio signal processing), followed by the design
of gestural control systems for their synthesis parameters, coupled with statistical modeling of this
dynamic control.
numediart is the result of collaboration between Polytech’Mons (Information Technology R&D pole)
and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits
from the expertise of the MULTITEL research center on multimedia and telecommunications. As such, it is
the R&D component of MONS 2015, a broader effort towards making Mons the cultural capital of Europe
in 2015.
This fifth session of numediart projects was held from January to March 2009.
During a one-week workshop from Feb 23rd to Feb 27th, hosted at BRASS, Brussels, Belgium, the
participants blended their efforts as a midway boost to the projects.
The session ended with a public presentation of the results (with demonstrations) at BRASS, Brus-
sels, Belgium, on Tuesday, March 31st, 2009.
Projects
Session #05 (Jan-Mar 2009)
Project #05.1: MATRIX: Natural Interaction Between Real and Virtual Worlds
Matei Mancas, Joëlle Tilmanne, Ricardo Chessini, Sullivan Hidot, Caroline Machy, Sidi Mahmoudi, Thierry Ravet
Project #05.2: Behavioral Installations: Emergent audiovisual installations influenced by visitors’ behaviours
Jehan-Julien Filatriau, Christian Frisson, Loïc Reboursière, Xavier Siebert, Todor Todoroff
Project #05.3: MediaCycle: Browsing and Performing with Sound and Image libraries
Xavier Siebert, Stéphane Dupont, Philippe Fortemps, Damien Tardieu
MATRIX: NATURAL INTERACTION BETWEEN REAL AND VIRTUAL WORLDS
Matei Mancas 1, Joëlle Tilmanne 1, Ricardo Chessini 2, Sullivan Hidot 3, Caroline Machy 4, Sidi Mahmoudi 3, Thierry Ravet 1
1 Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS), Faculté Polytechnique de Mons (FPMs), Belgique
2 Service d’Électronique et de Micro-Électronique (SEMI), Faculté Polytechnique de Mons (FPMs), Belgique
3 Service Informatique (INFO), Faculté Polytechnique de Mons (FPMs), Belgique
4 Multitel (ASBL), Belgique
ABSTRACT
The purpose of this project was to enhance the perceived naturalness of the interaction between real and virtual worlds. To achieve this goal, two main axes were followed. The first is the enhancement of the interaction with virtual worlds through a better visualization of 3D scenes and haptic feedback from those worlds. The second axis concerns the intelligence that virtual characters could gain by observing a scene and paying attention as humans do.

KEYWORDS
Virtual reality, avatar, movement, human-machine interface, object recognition, computational attention, saliency, bottom-up, top-down

1. INTRODUCTION
Virtual worlds and characters increasingly invade real-world scenes to provide interesting augmented reality applications. Some toolkits [2] that provide easy-to-use solutions for creating augmented reality scenes already exist. Nevertheless, there are several issues with this kind of visualization. For example, it is not easy to render the depth of a virtual object relative to the real objects in the scene. It is also difficult to animate complex avatars with natural behaviors and to get feedback from virtual objects, as they have no physical existence. Finally, adding intelligence to those virtual characters is of great importance to obtain more natural behavior and interaction with humans. In this project we investigate solutions for all those issues. In a first part we deal with the depth, manipulation and feedback from virtual objects; then, in a second part, we provide cues about how to enhance computer intelligence to reach a higher degree of interaction naturalness.

2. INTERACTION ENHANCEMENT

2.1. Getting information about the Z (depth) position
In a first step, the system already presented in the previous numediart project (4.3: Augmented Virtual Studio [11]) was enhanced by adding depth information. Two ways to obtain the Z information were tested:
1. The first one uses a second, side camera and provides depth information by mapping the X axis of the side camera onto the Z axis of the front camera.
2. The second one, which was tested here, uses a single camera. The Z information was obtained by segmenting the head in the participant’s silhouette. This segmentation was achieved by using an existing EyesWeb module [7] called ‘centroids’, which fits a human skeleton into a motion silhouette. Even if the silhouette is not entirely visible, the head can be located with good precision, provided that a hand moving in front of the head is not higher than the head itself. As long as this drawback is kept in mind, it is possible to get a reliable segmentation of the head and neck.

The apparent area of the head is then used to estimate the relative depth of the person. If the head area is small, the person is far from the camera; if the head area is larger, the person is close to the camera.

In that way we managed to save the relative Z position of virtual objects when they are carried by a real person and dropped in a specific place. The Z position of the person is then compared at each time step to the Z position of the virtual object. If the object is closer to the camera than the person, the person will pass behind the virtual object (Figure 1, left image) and the virtual object will be superimposed on the person. On the contrary, if the person is closer to the camera, in case of superposition, the person will be in front of the virtual object, which will be hidden (partially or totally, as in Figure 1, middle image) by the person. Moreover, if the person takes the virtual object and moves with it, the object’s X and Y coordinates will of course change (as was already the case in project 4.3), but if the person’s Z position changes, the size of the virtual object will change accordingly: if the person and the object are far from the camera, the virtual object will be smaller, and it will grow (just as the real person naturally appears larger) while approaching the camera (Figure 1, right image).

Figure 1: Example of interaction between a human and a virtual object.

2.2. Feedback from the virtual world
A wireless device was developed in order to provide natural feedback to the user. Indeed, as the user ‘manipulates’ virtual objects, he cannot touch or feel them, which is why his movements tend to be unnatural. The idea is to provide the user with a glove fitted with small motors which vibrate as a function of the distance (on X, Y and Z) between his hand and the virtual object. In that way he will be able to make more natural gestures as he ‘feels’ the virtual object. The device used for this feedback was based on the Simius hardware platform and vibration motors. The Simius project (www.simius.be) allows fast hardware prototyping by using generic hardware based on a Microchip PIC micro-controller, onto which several hardware drivers and their respective firmware can be plugged very easily. The feedback device was composed of the basic micro-controller platform, a Bluetooth wireless communication driver (in order to connect to the PC and get the distance between the user’s hand and the virtual object position), a motor control driver and three vibration motors (see Figure 2).

Figure 2: Feedback device.

The wireless device module is small enough to be located on a user’s glove and to be powered by a small battery. The three vibrators can be located on the glove on three different fingers. Each motor conveys the feedback information about the proximity of the virtual object to the user’s hand along one axis (X, Y or Z). The vibration strength increases as the user’s hand and the virtual object get closer. An application control function was developed to interface the PC and the wireless device. The control is independent for each axis.

2.3. Live detailed interaction with virtual avatars
The work presented in this section aims to animate a cartoon-style anthropomorphic virtual avatar using a motion capture suit (MOCAP). We implemented a tool to study the methods and algorithms that harmonize the actor’s movements with the morphology of the avatar. To seem realistic and natural, the animation of such characters requires adapting the movements and displacements to the proportions of the 3D model.

To acquire the actor’s movements in real time, we use the IGS-190 MOCAP from ANIMAZOO [1]. The on-board instrumentation of this device includes 18 gyroscopic inertial sensors. The sensors must be placed on different anatomical locations to capture the subject’s movements. They make it possible to compute the absolute positions of the nodes of a 3D skeleton model. The data are sent from the instrumentation to a computer by means of a wireless connection. A Software Development Kit makes it possible to retrieve the data in the form of three rotation angles for each node and the three-dimensional coordinates of the hips.

The realization of this project was possible thanks to the collaboration with the company NeuroTV [12]. We use its real-time animation system for virtual characters, NeuroTOON. This 3D engine can be commanded through an interface programmed in Lua or C#. To elaborate and test the developed system, a biped dragon avatar was chosen.

Two processes are programmed to establish a host connection and to interface the MOCAP and the graphics engine. The master host (in Lua) is connected directly to NeuroTOON. The slave host (in C++) receives the data from the MOCAP and applies the signal processing that adapts the acquired movements. These two hosts communicate over TCP/IP by means of the WINSOCK function library. Several MOCAPs could be connected to the 3D engine to animate several characters. The data collected by the MOCAP are processed with the following modifications in order to fit the structure of the avatar:
• The rotation coordinates of each node are encoded in a world coordinate system in the ANIMAZOO SDK, whereas they are in a body-fixed coordinate system in NeuroTOON. A coordinate transformation is applied to the data [6].
• The calibration position of the MOCAP and the null position of the avatars in the 3D engine are not the same. We encode a rotation for the left and right upper arms to get a correct visualization. The angle of this calibration rotation is calculated in order to create a dead (neutral) area around the body of the character. That prevents undesirable interaction between the arms and the body of the avatar in a rest position (see Figure 3).
• Movements for the tail of the dragon are interpolated using the current positions of the calves and the thighs (see Figure 4).
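To make the single-camera depth cue of Section 2.1 concrete, the following Python sketch derives a relative depth from the apparent head area and uses it to decide the draw order and the size of a carried object. It is only an illustration under stated assumptions: the inverse-square relation between area and distance (a pinhole-camera approximation), the reference area, and the names relative_depth, person_occludes_object and object_scale are hypothetical and are not taken from the EyesWeb patch used in the project.

```python
import math

REFERENCE_HEAD_AREA = 4000.0  # hypothetical head area (pixels) at relative depth 1.0


def relative_depth(head_area_px: float) -> float:
    """Relative Z of the person: larger values mean farther from the camera.

    Assumes a pinhole camera, where apparent area scales with 1 / distance**2.
    """
    if head_area_px <= 0:
        raise ValueError("head area must be positive")
    return math.sqrt(REFERENCE_HEAD_AREA / head_area_px)


def person_occludes_object(person_z: float, object_z: float) -> bool:
    """True when the person is closer to the camera than the virtual object."""
    return person_z < object_z


def object_scale(carrier_z: float) -> float:
    """Scale factor of a carried object; it shrinks as its carrier moves away."""
    return 1.0 / carrier_z
```

With these assumptions, a head area four times smaller than the reference doubles the relative depth and halves the size of the object being carried.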
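The per-axis vibration feedback of Section 2.2 can be sketched in Python as follows. The linear ramp, the range value, the 8-bit duty cycle and the function names are illustrative assumptions; the report does not specify the actual mapping or the interface of the Simius firmware.

```python
MAX_RANGE = 0.5   # hypothetical distance (scene units) beyond which a motor stays off
MAX_DUTY = 255    # hypothetical 8-bit PWM duty cycle


def axis_duty(hand: float, target: float) -> int:
    """Duty cycle for one axis: strongest at contact, zero beyond MAX_RANGE."""
    distance = abs(hand - target)
    closeness = max(0.0, 1.0 - distance / MAX_RANGE)
    return round(MAX_DUTY * closeness)


def glove_duties(hand_xyz, object_xyz):
    """One independent duty cycle per axis (X, Y, Z), one motor per axis."""
    return tuple(axis_duty(h, o) for h, o in zip(hand_xyz, object_xyz))
```

Keeping one function call per axis mirrors the independent control of the three motors described in the text.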
Figure 3: Null position and dead zone for the arms.

Figure 4: Interpolation of tail movements.

In addition to using the original recorded motion data, applied to an avatar without deep modification of the motion itself, we wanted to be able to keep the baseline of the motion while modifying its style.

The first method we implemented for the modification of the motion capture data is based on multiresolution filtering, following the article by Bruderlin and Williams [5]. The algorithm consists in decomposing the motion into several frequency bands that can then be amplified with individual gains before recomposing the signal. The high frequencies are related to brisk motions or noise, and the middle frequencies correspond to the periodicity of the walk files analyzed (the low frequencies contain little information). We could thus amplify the walk motion, going from walking to a run-like motion, by giving more importance to the middle frequencies, or enhance the briskness of the motion by giving more weight to the high frequencies. All possible gain combinations could be tested, but although some combinations gave a result that clearly looked like a puppet running, most of the tested combinations did not lead to convincing motions. One of the main problems of this method is that it does not take any constraint into account, leading to limbs bending unnaturally and passing through each other. Another issue is that the modification is applied to all the joints of the skeleton in the same way. From the results it produced, we think that applying a PCA and only modifying the first components of this PCA (representing the main motion of the character) may make it possible to modify only - or at least principally - the most important motion of the character, leading to an exaggeration of the main motion, as toons would do.

Several other algorithms should be analyzed in the future, such as a simple rescaling of the joint angles - all or a subset - since a first implementation of this basic method already gave interesting results.

The application developed using the NeuroTV software and the motion capture data development toolkit enabled us to visualize in real time the motions coming from the motion capture suit, but also to visualize motions modified by the method of Bruderlin and Williams. Even if the improvements brought by this algorithm were not convincing enough, this platform will enable us to visualize motion modified by any kind of algorithm we implement in the future, and can thus be used as a tool for validating the modifications applied to the motion.

3. COMPUTER INTELLIGENCE ENHANCEMENT

3.1. Object finding
Proposed by Bay et al. [4], SURF (Speeded Up Robust Features) is a well-known object detection and recognition method based on keypoints of an image. The aim is to express the neighborhood around those interesting keypoints as a vector descriptor, and to find matching features in a novel image.

A similar method, called SIFT (Scale-Invariant Feature Transform) [8], has already been mentioned in Augmented Virtual Studio [11]. However, SURF has been reported to outperform SIFT and other existing methods [4, 3], especially regarding computation time and accuracy of the results. Thus, we have chosen to use SURF and to propose two novel applications: object and face tracking.

Like SIFT, SURF is divided into two steps: the learning step, where the descriptors in some images are grouped into a database, and the recognition step, where descriptors are extracted from a new image and compared (matched) to those from the database. In the learning step, SURF detects the keypoints from a convolution between a Gaussian second-order derivative and the initial image, which gives us an approximation of the Hessian matrix. The maxima of the determinant of this approximated Hessian give the locations of the keypoints. Once the keypoints are detected, robust features are computed at their locations. This step takes into account the strengths and the orientations of local gradients. Finally, the vector descriptor is built in order to perform the matching process (recognition step).
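The recognition step described above, matching descriptors of a new image against those stored during learning, can be illustrated with a nearest-neighbour search. This Python sketch is not the SURF implementation used in the project: descriptors are plain lists of floats, and the distance-ratio test (borrowed from Lowe's SIFT paper [8]) and its 0.7 threshold are assumptions added here to reject ambiguous matches.

```python
import math


def match_descriptors(query, database, max_ratio=0.7):
    """Return (query_index, database_index) pairs of accepted matches.

    A query descriptor is matched to its nearest database descriptor only if
    that neighbour is clearly closer than the second nearest one.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted(
            (math.dist(q, d), di) for di, d in enumerate(database)
        )
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if second > 0 and best / second < max_ratio:
            matches.append((qi, best_idx))
    return matches
```

A query descriptor lying at equal distance from several database entries is discarded, which is the behaviour one generally wants when the database holds several views of the same head or object.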
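Returning to the band-wise motion editing of Section 2.3, the sketch below decomposes one joint-angle curve into frequency bands, scales each band and recombines them. It is a simplified stand-in for the method of Bruderlin and Williams [5]: it uses repeated box smoothing with doubling window widths rather than their filter pyramid, and the function names and parameters are hypothetical.

```python
def smooth(signal, half_width):
    """Moving average with a window clamped at the signal borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def decompose(signal, levels):
    """Split a curve into `levels` band-pass bands (high to low) and a residual."""
    bands, current = [], list(signal)
    for level in range(levels):
        lower = smooth(current, 2 ** level)
        bands.append([c - l for c, l in zip(current, lower)])
        current = lower
    return bands, current


def recompose(bands, residual, gains):
    """Sum the residual and the bands, each band scaled by its own gain."""
    if len(bands) != len(gains):
        raise ValueError("one gain per band is required")
    out = list(residual)
    for band, gain in zip(bands, gains):
        out = [o + gain * b for o, b in zip(out, band)]
    return out
```

With all gains equal to 1 the original curve is recovered; raising the gain of a middle band would correspond to the walk-to-run exaggeration discussed in the text, with the same caveat that nothing prevents limbs from crossing.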
Figure 5 shows the main results concerning head pose detection using SURF. On the left, we can see the keypoint detection step. The red circles represent the locations of the keypoints, whereas the blue circle is their center of mass. For the training process, several images (here 5 to 10 images with different orientations) are taken from the camera, and the recognition step (right of the figure) is performed frame by frame. Similarly, Figure 6 shows a simple illustration of object recognition. Both procedures work in real time.

Figure 5: Head pose detection using SURF.

Figure 6: Object tracking using SURF.

3.2. Attentive computers
Artificial intelligence could benefit a lot from mimicking human attention: computers could thus be surprised by novel situations and focus on the most informative and unknown areas of the acquired signal. The aim of computational attention is to automatically predict human attention on multimodal data such as sounds, images, video sequences, smell or taste. The term attention refers to the whole attentional process that allows one to focus on some stimuli at the expense of others. Human attention is mainly driven by two influences: a bottom-up and a top-down one. Bottom-up attention uses low-level signal characteristics to find the most salient or outstanding objects. Top-down attention uses a priori knowledge about the scene or task-oriented knowledge in order to modify (inhibit or enhance) the bottom-up saliency.

3.2.1. Bottom-up attention
During this project, we developed two different approaches for different scenarios, both based on the rarity approach [9, 10] to bottom-up attention. In a multi-user scenario where people can still be tracked individually or in groups, an approach based on 8 directions (E, NE, N, NW, W, SW, S, SE) was tested. The 8 maps corresponding to the 8 possible directions are shown on the left of Figure 7.

In this example there is one person moving from left to right, while 3 others move from right to left. By looking at where motion is the rarest across all the motion maps, it is clear that the person alone going from left to right induces surprise and interest, as his bounding box is smaller (rarer) than the one comprising the three other people.

Figure 7: Left: 8 motion maps for the 8 directions, right: interesting movement in red, normal movement in black.

The quantity of motion (QoM) was also used to detect anomalous motion in crowds. In this case, it is impossible to track individuals or even groups of people. Based on mathematical morphology, this method uses no tracking. Motion is divided into several QoM classes, and the class which is in the minority at time T is considered the most important. Figure 8 shows an example of a crowd (left) and the attention map (right), where red, then green and finally blue show the areas of the crowd in decreasing order of interest.
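The rarity principle behind the eight-direction approach can be illustrated as follows. In this Python sketch, which is a strongly simplified reading of the rarity-based attention of [9, 10], each tracked blob is assigned one of the eight directions, and the saliency of a direction is the self-information -log2(p) of its share of the moving pixels. The blob format and the function name are hypothetical.

```python
import math
from collections import Counter

DIRECTIONS = ("E", "NE", "N", "NW", "W", "SW", "S", "SE")


def direction_rarity(blobs):
    """Map each direction present in `blobs` to its self-information in bits.

    `blobs` is a list of (direction, moving_pixel_count) pairs; directions
    covering a small share of the moving pixels get a high score.
    """
    area = Counter()
    for direction, pixels in blobs:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction}")
        area[direction] += pixels
    total = sum(area.values())
    return {d: -math.log2(a / total) for d, a in area.items() if a > 0}
```

For the example of Figure 7, one person moving east against three people moving west gives the eastward motion the higher score, because it covers the smaller share of the moving pixels.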
Figure 8: Left: crowd, Right: motion attention (red = high, green = medium, blue = low).

3.2.2. Simple Top-Down long-term attention model
Top-down attention uses motion accumulation to build models which are able to inhibit motion that is not rare.

In this step we propose an approach which extracts a top-down long-term attention model from video sequences. Once this model is acquired, motion close to that model is inhibited and only different motions are highlighted; these motions can be considered as abnormal events.

This approach follows three steps: frame difference, frame accumulation and motion inhibition.

Frame difference
We use the simplest motion detection technique, which consists in computing the difference between every two consecutive frames of the video. This method is very fast to implement and needs no background modeling, which can be a difficult task in some situations. Figure 9 shows the results of this step.

Figure 9: Two moving persons detected by frame difference.

Frame accumulation
The frame difference allows us to detect moving objects, so every difference frame represents the objects in movement. These frames are accumulated, and a threshold is applied to the accumulator in order to obtain a top-down model representing the dominant regions of motion. These two steps can be explained by this algorithm:

Begin Video Capture
    Frame Difference
    While (Current Frame <= Learn Time)
        Extract Current Frame
        For (i = 1 to frame size)
            if (Pixel[i].Val == 1) then Acc[i]++
        End
        Current Frame++
    End
    For (i = 1 to frame size)
        if (Acc[i] >= Thr Acc) then Model[i] = 1
        else Model[i] = 0
    End
End

Where:
• Learn Time: Time (number of frames) used to extract the model.
• Pixel[i].Val: 0 (no motion) or 1 (motion has happened in this pixel of the frame).
• Acc[i]: Image accumulating moving pixels during the learn time.
• Thr Acc: Threshold used to select the dominant regions of motion.
• Model: Image representing the model.

In our method we set the accumulator threshold (Thr Acc) to 100; with a larger threshold, the model takes longer to extract. Figure 10 shows the model extracted after 1000 frames.

Figure 10: The model represents dominant regions of motion extracted after 1000 frames.

Motion inhibition
In this step we subtract the extracted model from the frames of moving objects (frame difference). This subtraction allows us to inhibit similar motions and to focus only on abnormal motions.

For the experiment, we used as a data set different videos from different outdoor places, which comprise both similar (normal) and abnormal motions. The experimental results use a video of 3800 frames; the model extraction took 1000 frames and the motion inhibition started just after. Figure 11 shows the result of motion inhibition, which allowed us to detect motions happening outside the top-down model (abnormal events) at frames 1900, 2100, 2600 and 3150. These abnormal motions are highlighted in red.
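The three steps above (frame difference, frame accumulation and motion inhibition) can be sketched in runnable Python. Frames are represented here as flat lists of grey levels; the difference threshold and the helper names are assumptions made for illustration, while thr_acc plays the role of Thr Acc and learn_time that of Learn Time in the algorithm.

```python
def frame_difference(prev, curr, diff_thr=20):
    """Binary motion mask: 1 where the grey level changed by more than diff_thr."""
    return [1 if abs(c - p) > diff_thr else 0 for p, c in zip(prev, curr)]


def learn_model(frames, learn_time, thr_acc):
    """Accumulate motion masks over the first `learn_time` frame pairs.

    Returns a binary model: 1 where motion occurred at least thr_acc times.
    """
    acc = [0] * len(frames[0])
    for t in range(1, learn_time + 1):
        mask = frame_difference(frames[t - 1], frames[t])
        acc = [a + m for a, m in zip(acc, mask)]
    return [1 if a >= thr_acc else 0 for a in acc]


def inhibit(mask, model):
    """Keep only the motion that falls outside the learned model."""
    return [m if not usual else 0 for m, usual in zip(mask, model)]
```

In the setting reported above, learn_time would be 1000 and thr_acc 100; any pixel left at 1 by inhibit after the learning period corresponds to what the text calls an abnormal event.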
Figure 11: Abnormal events detected with motion inhibition.

3.2.3. Enhanced model using directions
This simple model was extended to motion direction analysis. In that way, not only the scene occupation statistics are taken into account but also the motion direction. If, at a given location in the scene, motion is mainly directed to the right, the previous model inhibits objects moving in any direction at this location. With the new approach, only the motion directed to the right is inhibited, and not the other movements.

The directions are divided into 8 main categories (E, NE, N, NW, W, SW, S, SE) and a model is built for each category. If a moving object passes through a location AND in a direction which is usual, the motion is inhibited. Otherwise the motion is visible.

Figure 12 shows a test case: a person making repetitive movements is inhibited, and when several people follow the same path they are not detected, whereas those who follow different paths are detected.

Figure 12: Left column: the images show the detected motion of the participants (in red), the motion vector of the model (in green) and the current motion vector of the frame (in blue). Right column: salient motion of the participants (in red) detected after the model was applied (participants have different motion directions or are located in positions where little motion was detected).

4. CONCLUSION
This project let us enhance virtual object visualization and interaction from two main points of view, which lead to a more natural interaction between real scenes and virtual objects or characters. The depth of real objects was computed in order to locate virtual objects relative to real objects in a natural way. More complex interfaces, such as a motion capture suit, were used to provide natural movements to virtual characters. Electronic devices were used to get feedback from the virtual objects and thus to interact with them more naturally.

Finally, computer intelligence was enhanced by providing computers with methods to pay attention as humans do. Those attentive computers can then act in a more natural way and mimic some human reactions.

5. ACKNOWLEDGMENTS
numediart is a long-term research program centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631).

6. REFERENCES

6.1. Scientific references
[3] J. Bauer, N. Sünderhauf, and P. Protzel. “Comparing Several Implementations of Two Recently Published Feature Detectors”. In: Proceedings of the International Conference on Intelligent and Autonomous Systems, IAV, Toulouse, France (2007).
[4] H. Bay et al. “SURF: Speeded Up Robust Features”. In: Computer Vision and Image Understanding (CVIU) 110.3 (2008). Pp. 346–359.
[5] A. Bruderlin and L. Williams. “Motion Signal Processing”. In: Computer Graphics, Annual Conference Series 29 (1995). Pp. 97–104.
[6] J. Diebel. Representing Attitude: Euler Angles, Unit Quaternions, and Rotation Vectors. Tech. rep. Stanford University, 2006.
[8] D.G. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. In: International Journal of Computer Vision 60.2 (2004). Pp. 91–110.
[9] M. Mancas. “Computational attention: Towards attentive computers”. In: Similar edition, CIACO University Distributors (2007).
[10] M. Mancas. “Relative influence of bottom-up and top-down attention”. In: Attention in Cognitive Systems, Lecture Notes in Computer Science 5395 (2009). Pp. 212–226.
[11] M. Mancas et al. Augmented Virtual Studio. Tech. rep. 4. 2008.

6.2. Software and technologies
[1] “Animazoo”. URL: www.animazoo.com/IGS190.aspx.
[2] “ARToolKit”. URL: www.hitl.washington.edu/artoolkit/.
[7] “EyesWeb XMI platform”. URL: www.eyesweb.org.
[12] “NeuroTV”. URL: www.neurotv.com.
BEHAVIORAL INSTALLATIONS: EMERGENT AUDIOVISUAL INSTALLATIONS
INFLUENCED BY VISITORS’ BEHAVIOURS
Christian Frisson 1, Loïc Reboursière 2, Jehan-Julien Filatriau 1, Todor Todoroff 2, Xavier Siebert 3
1 Laboratoire de Télécommunications et Télédétection (TELE), Université Catholique de Louvain (UCL), Belgique
2 Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS), Faculté Polytechnique de Mons (FPMs), Belgique
3 Laboratoire de Mathématique et Recherche Opérationnelle (MathRo), Faculté Polytechnique de Mons (FPMs), Belgique
ABSTRACT
This paper presents a numediart project in collaboration with two artistic projects: Méta-crâne by Thomas Israel [15] and HUM by François Zajéga [30]. The scope of this project was to offer technological forecasting and development consultancy to these two highly interactive installations, which share the common goal of blending “behavioral” recognition of crowd motion with audiovisual rendering. We achieved promising initial results for the Méta-crâne navigation by similarity in a video database and for the HUM analysis of crowd behaviors by computer vision techniques. We also provided a state of the art in domains such as sound spatialization and video projection on a 3D surface.

KEYWORDS
Interactive installations, gestural control, audiovisual rendering, video similarity, sound spatialization, 3D projection

1. BACKGROUND ON INTERACTIVE INSTALLATIONS
The following short state-of-the-art study shows that there is still a high research potential both for public interaction methods (especially gestural) in interactive installations, and for more complex timelining methods in live/non-linear movies.

1.1. Interactive Installations
During the last century, following the progress in technologies such as domotics, the domain of architecture has been showcasing many inventive constructions, as extensively documented and illustrated in Kronenburg’s books [17]: not only homes and living spaces, but more specifically museums, concert rooms, and entertainment venues such as The Sphere [1] by the Belgian company Alterface using its Kiosk™ technology. Many immersive multimedia installations have been created by artists, as documented in [13, 28]. Most of them use straightforward but simple computer vision techniques to analyse the behaviour of the user(s).

Regarding interactive installations, besides Thomas Israel’s experience in the field with, for instance, Peeping Tom (2006) and Le Lit TröM (2005) [15], other notable artists in Belgium are Olivier Meunier with his interactive dome, Real Unreal [21], made in collaboration with the artists collective Foton and the hardware developer Periactes, and first presented at the Altitudes 1000 festival at Recyclart in 2006; and Pascale Barret with her interactive corridor Synapse 2.0, created in residency at iMAL in 2008 [3].

Of interest for the numediart HyForge research theme on hypermedia navigation, George Legrady conceived Pockets Full Of Memories [18] in 2002, an installation where visitors bring objects that they encode in situ in a database (by scanning their image and giving semantic information). The installation organizes the objects by similarity using the Kohonen Self-Organizing Map algorithm and displays them near each other on a projection screen, in a two-dimensional map. Another example is the Khronos Projector by Alvaro Cassinelli in 2005 [7, 6], with which visitors can alter the playback of videos by touching and deforming the projection screen.

1.2. Interactive/Live Cinema
Still within the scope of the HyForge numediart research theme, other works offer the possibility to break the fixed narrative timeline of cinematographic works, either at the will of the audience, as previously illustrated for installations, or during performances by artists. A comprehensive retrospective is available in [19]; its author later produced a series of three “soft cinema” movies [20] in which the viewer can have a fairly limited impact on the timeline.

Experienced “locally”, Late Fragment [8], an interactive video installation directed by Daryl Cloran, Anita Doron & Mateo Guez, was shown during Brussels’ Offscreen Festival in 2009 at Cinema Nova. At the end of each chapter, a sequence is looped until the viewer understands that one can “choose” between two possible sequels using the remote control, which offers a limited number of scenarios and possible plot explanations.

SLIDERS [10], presented during a workshop and performed at iMAL in Brussels in 2008, offers a real-time audiovisual interface aimed at recomposing movies “live” from fragments hosted in a database. However, not much information is available on the methods with which fragments can be queried and related to one another.

2. TWO INSTALLATIONS, TWO ARTISTIC SUBPROJECTS
Two artistic projects have been in development in parallel with this numediart project. Both have been offered a full-time residency at:
BRASS
Avenue Van Volxem 364
B-1190 Forest
Belgium
2.1. Méta-crâne by Thomas Israel

2.1.1. Artistic intention

“Méta-crâne is a technical reconstruction of a symbolical process that we all know well but often repress: free association. A network of neural activities is being simulated and processes a certain amount of new events, past episodes and ancient memories that are presented to the “spect-actor” within an immersive environment. Its purpose: transposing self-disclosure and creating new meanings inside an interactive object.”

Figure 1: Side-view mockup of the installation being visited by a spect-actor.

“When the spect-actor enters the Méta-crâne, as depicted in figure 1, he gradually perceives the inherent and emergent behavior of the installation, while the flow of audiovisual fragments which are sequenced around him tends to take into consideration the color of his clothes, his body rhythm revealing his serenity or discomfort, and so on..., but without being granted “1:1” control. The initial content of the multimedia database presented to the spect-actor is composed of material extracted from Thomas Israël’s previous productions [15], sequenced in real-time and processed with spatio-temporal effects, as in figure 2; and gets fed back from records of the spect-actor’s behavior.”

Figure 2: A mashup of visuals extracted from Thomas Israël’s installation Caresse moi! (2006).

2.1.2. Prototype setup

The first Méta-crâne prototype was designed and built by Thomas Israel and Thierry Sablon [23]. The dome, 3 m in diameter, features a webcam for gestural analysis of crowds, a Projection Design F2 sx+ wide SXGA+ video projector and a spherical mirror for the projection, as illustrated in figure 3, and 5 FAR Audio OBS active loudspeakers for sound spatialization. The software architecture of the installation was developed by Thomas Israel and Laura Colmenares Guerra [14] using the Isadora modular visual programming environment [27].

2.2. HUM by François Zajéga

2.2.1. Artistic intention

“HUM is an audiovisual digital artwork where interactivity is used to enhance the dialog between the artwork and its audience. Through his motion, the visitor handles a visual and sonic shape and, in the same time, feeds and educates HUM. Once the visitor stops moving inside the installation, ready to listen, HUM gives him a response through the same media by mixing the long-term trends learned since the beginning of its life and the specific behaviour of the visitor, highlighting the potential richness of the installation. The aim, and unique criteria of quality, is to increase the energy emitted by the visitor by encouraging him to go out of standard behavioural scheme. The visitor is thus both creator and spectator, guiding and guided by HUM.”

2.2.2. Prototype setup

The first HUM prototype features a large projection screen, a projector, 4 speakers for sound spatialization, a webcam for crowd behaviour analysis, and 3 computers to run the analysis and rendering.

3. MÉTA-CRÂNE: TECHNOLOGICAL CHALLENGES

This part of the project was carried out in close collaboration between numediart and the artist Thomas Israel.

3.1. Main contribution: navigation in an audiovisual database based on video similarity

One of the goals of the Méta-crâne installation was to make the projected videos follow each other in a way that feels natural to the spectator, according to her/his behavior. For example, if the spectator moves rapidly, a group of videos that convey a sense of speed should be displayed. This implies that the videos ought to be organized so that similar videos can be retrieved easily. Similarity between videos was computed using the MediaCycle software developed in the numediart project Media Cycle (# 5.3). This
software was originally developed for sounds in project Audio Cycle (# 4.1), then adapted to images and, at the time of writing, to videos.

Figure 3: First Méta-crâne hardware prototype, designed and realized by Thomas Israel and Thierry Sablon, featuring a video projector and a spherical mirror

Feature extraction is based on the OpenCV library and was written in C++. In order to make the classification results easily available to the artist, a text file is generated in the text-read format of the Isadora software [27].

Several criteria were investigated to compute similarity:

• color, saturation and luminosity (corresponding to hue, saturation and value of the HSV color space). These quantities were computed for each frame of the video, then their average and standard deviation were used for comparison between videos, as illustrated in figure 4.

• speed, defined as the pixel-per-pixel rate of change from one frame to the next, either in the RGB or the HSV color space. The average and standard deviation of these quantities were also used for comparison between videos.

Figure 4: Thumbnails of images classified by similarity along the Hue component of the HSV color space

Preliminary tests with a few users indicated that speed-based descriptors correspond more closely to the human perception of video similarity; they were therefore used in the artistic installation.

Other features such as textures, shapes or face detection could be added in the future and will be investigated in upcoming numediart projects, e.g. “Video navigation tool: application to browsing a database of dancers’ performances” (#07.3); they can potentially refine the current measures of similarity between videos.

3.2. Technological forecasting: 3D projection

Many 3D projection screen geometries have been developed so far, notably the Panoscope 360 [9] and its upside-down equivalent the Panodome, both from Luc Courchesne and the Société des Arts Technologiques of Montreal, Canada. Thomas Israel chose an upper hemisphere, which most resembles a skull and allows visitors to move easily underneath.

To reduce costs as much as possible, we decided to use Paul Bourke’s design [5, 4], which requires only one projector, as opposed to multiple projectors, and a spherical mirror instead of fisheye lenses.

To enhance the viewer’s immersion, 2D images or videos displayed on a non-planar surface should be pre-warped according to the geometry of the projection screen so that they look accurate. Many solutions exist, but none suited our needs (i.e. integration as tight as possible inside the Isadora framework [27] running under OSX): Eluminati’s OmniMap API [11] runs on Microsoft Windows only, and Territoires Ouverts’ lightTWIST [26] didn’t support hemispherical surfaces at the time of writing. Olivier Meunier used the 3D modeller Blender and the Ashvid video texture plugin for his installation Real Unreal [21], but it wasn’t made available to the public.

Closer to a promising result, we tested Paul Bourke’s algorithm, released as a Quartz Composer object (Quartz Composer being a node-based visual programming language available on Mac OS X) that could be used as a FreeFrame plugin (FreeFrame being a cross-platform open framework for developing video effects plugins) inside Isadora. This object is quite easy to use: an incoming image is distorted by the chosen matrix file and the resulting image is then output. If the distortion matrix is well defined according to the dome geometry, the projection works almost perfectly. Paul Bourke provides another software application for generating the distortion matrix file that is characteristic of the chosen screen. Two useful matrix files are provided by default: one for a fisheye lens and one for a spherical mirror. Due to screen/matrix calibration issues, the result obtained with this pre-warping feature wasn’t satisfying enough to justify its use, especially because of the added processing cost.
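
As a complement to the similarity criteria listed in Section 3.1, the speed descriptor can be sketched as follows. This is only an illustrative sketch, not the MediaCycle code (which is written in C++ on top of OpenCV): the frame layout (a flat list of RGB tuples), the use of the mean absolute difference as "rate of change", the Euclidean distance between descriptors and all function names are assumptions made for the example.

```python
from statistics import mean, pstdev


def frame_change(prev, curr):
    """Mean absolute per-pixel, per-channel difference between two frames.

    A frame is assumed to be a flat list of (r, g, b) tuples.
    """
    total = 0
    for (r0, g0, b0), (r1, g1, b1) in zip(prev, curr):
        total += abs(r1 - r0) + abs(g1 - g0) + abs(b1 - b0)
    return total / (3 * len(curr))


def speed_descriptor(frames):
    """(average, standard deviation) of the frame-to-frame change of a video."""
    changes = [frame_change(a, b) for a, b in zip(frames, frames[1:])]
    return mean(changes), pstdev(changes)


def descriptor_distance(d1, d2):
    """Euclidean distance between two descriptors (a hypothetical choice)."""
    return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5


def most_similar(query, library):
    """Name of the library video whose descriptor is closest to the query."""
    return min(library, key=lambda name: descriptor_distance(query, library[name]))
```

With descriptors precomputed for every video of the database, a query descriptor derived from the visitor's behavior could then be matched with `most_similar`, which is one simple way to retrieve "videos that convey a sense of speed".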
3.3. Technological forecasting: sound spatialization

Many spatialization techniques have been developed so far and can be roughly classified into two main categories: “individualized spatialization”, working for one listener at a time, preferably with headphones, and customized according to the acoustic characteristics of the listener’s head (namely Head-Related Transfer Functions), and “room spatialization”, including techniques such as Vector Based Amplitude Panning (VBAP), Wave Field Synthesis, Ambisonics... [24]. Improving such techniques was out of the scope of this project; we focused on the integration of sound spatialization inside Isadora [27] according to the movement of the visitors. For some spatialization techniques, especially Ambisonics, the minimal number of loudspeakers can be determined theoretically. We opted for a 5-speaker system (a quadraphonic layout, i.e. 4 speakers in a square evenly spaced around the listeners’ ears in the same plane, plus a zenith speaker) that we found cost-effective in terms of hardware and software.

Spatialization is difficult in Isadora as there is no way to connect the sound output of a movie to physical sound outputs. Indeed, neither the “Movie Player” nor the “Sound Movie Player” objects have a sound output! Rather, the sound is sent directly to only one output, the default one for “Movie Player” and one that can be chosen for the “Sound Movie Player”. The only way to send that same movie sound simultaneously to other sound output channels is to add, for each additional channel, a “Sound Movie Player” that plays the sound of the movie.

We therefore planned to use the Zirkonium [31], a software developed at ZKM that allows the user to place the loudspeakers in a virtual space, then set the coordinates of the desired sound position using OSC messages and let the algorithm compute the volume for each speaker. The Zirkonium is a stand-alone application that creates a virtual interface driver to which programs can connect in order to send it their sound channels. We have used it successfully before with Max/MSP. Zirkonium then outputs the spatialized sounds to the sound interface defined by the user in the preferences settings. Channels can be freely assigned to individual speakers. We managed to generate the needed OSC messages from inside Isadora to move the sound position in Zirkonium.

Unfortunately, in Isadora, in order to choose a sound output, you have to go to the “Sound Output Setup” to define the mapping between the internal Isadora sound channel outputs and virtual external channels. The conversion to the “real channels” of an interface can only be done in the Mac OS “Audio MIDI Setup”: Isadora can only access the interface defined as the “Default Output” in the Mac OS “Audio MIDI Setup”.

As the Zirkonium driver is a virtual interface, this would mean choosing a virtual interface rather than a physical one as the Mac OS default sound output. It therefore proved impossible to connect the output of Isadora to the input of the Zirkonium. Every attempt resulted in a crash.

A possible solution would be to spatialize sounds in Max/MSP, with or without the Zirkonium. But that would mean that all sound files would have to be extracted from the video files and that the engine that selects the videos, depending on the criteria defined elsewhere in this document, would have to start the video sequences in Isadora and the sound sequences in Max/MSP synchronously. This hasn’t been done yet.

We also considered using audioTWIST [25] and its successor Audioscape [29, 2], but the current versions weren’t satisfying enough due to extensive compilation issues.

4. HUM: TECHNOLOGICAL CHALLENGES

This part of the project was carried out in close collaboration between numediart and the artist François Zajéga, who was in the process of creating his new art installation named HUM. HUM is a visual and sonic interactive installation, where the behavior of the visitors is captured by means of a video camera and then analyzed in order to control both visual and sound rendering modules. HUM was presented at the BRASS cultural center (Forest, Belgium) in May 2009 and will be part of the Digital Arts festival in Brussels during fall 2009.

Figure 5: Generic architecture of HUM

During this three-month project we mainly focused on two blocks of the pipeline introduced in Fig.5:

• the low-level video analysis, which takes as input the video stream grabbed by a camera and provides a low-level description of the scene, mainly corresponding to the position of each visitor moving in the scene (i.e. ’blob’);

• a high-level long-term analysis of the scene, which takes as input the result of the low-level analysis, i.e. the position of each blob detected in the scene. The aim of this second stage of analysis is to provide a more precise description of the scene, by considering the temporal evolution of the behavior of one blob over several time spans.

In the following sections, we will first describe the low-level analysis module, developed in Java, and then the high-level analysis modules written in Max/MSP, a programming environment dedicated to interactive audio and video applications and widely used in the digital arts community. The communication between these modules is based on the Open Sound Control protocol (OSC).

4.1. Main contribution: low-level analysis by video motion tracking

By low-level analysis we mean the analysis of the video stream grabbed by a camera placed above the scene and capturing the visitors. A video motion tracking analysis is performed and provides a number of basic features characterizing the scene, such as
the position of each detected blob, its size, etc. By video motion tracking, we mean the process of locating one or several moving objects over time using a camera. An algorithm analyses the video frames and outputs the location of moving targets within the video frame.

For the creation of HUM, F. Zajéga has implemented, in collaboration with numediart researchers, a video motion-tracking module. This module can be used as a standalone application and is able to communicate with other programs through the Open Sound Control protocol. It is written in Java, and both the application and the source code have been made publicly available at the end of the project on the numediart website.

We give here a brief technical description of this module; as shown in Fig.6, the process is divided into five steps, numbered from 0 to 4 as in the subsections below:

0. Initialization of the process, which mainly consists in creating a reference image that will be used for subtracting the background of the incoming stream.

1. Background subtraction (pixel comparison): the incoming image is compared with the reference image in order to find elements that have appeared.

2. Pixel grouping: pixels are grouped in cells according to a grid over the image.

3. Cell grouping: cells of pixels are grouped in ’cells groups’.

4. Cells group tracking: cells groups are then compared to those of the previous frame.

Figure 6: Overview of the video tracking analysis

This chain, except the initialization, is executed for each incoming frame. Here are some details about each of these steps:

4.1.1. Step 0: initialization

The initialization phase mainly consists in creating a reference image that will subsequently be used for subtracting the background of the incoming stream. It can be run again if the lighting conditions in the scene change.

4.1.2. Step 1: pixels comparison

The reference image and the current image are each represented as an array of pixels, each pixel being characterized by three values corresponding to its red, green and blue levels. Each pixel of the current frame is compared with the corresponding pixel of the reference image, and a threshold, which can be set in the graphical user interface of the application (Fig.9), is used to determine whether the pixel is considered different from or identical to its counterpart in the reference image. A pixel is considered different if one of these conditions is true:

• the absolute value of the difference between the red, green or blue value of the current pixel and the same channel of the reference pixel is bigger than the threshold;

• or the absolute value of the sum of two such differences (red+green for instance) is bigger than the threshold.

If at least one of these conditions is true, the pixel is considered different from the reference and is set as "active". This triggers the second step.

4.1.3. Step 2: pixels grouping

The image is virtually covered by a grid. When an active pixel is detected, it is placed in one of the cells of this grid. A cell is not only an array of pixels; it includes additional information:

• It has a short memory (several frames) used to smooth its ’occupation’. The occupation of a cell is the number of pixels it contains divided by the size of the cell. The short-term memory is a way to know whether the cell is getting more or less active, depending on the rise or fall of its number of pixels.

• It also contains the x and y positions of its extreme pixels (top, right, bottom and left).

• It contains the current occupation of the cell, and the average of the red, green and blue values of its pixels.

Once the last pixel has been compared, the application jumps to step 3.

4.1.4. Step 3: cells grouping

This step consists in grouping cells of pixels in entities called "cells groups" or "blobs". To start a cells group, there must be at least one pixel in the cell. Once the group is set up, any cell that contains pixels or is still active (due to the smoothing, a cell can be active even if it is empty) can respond positively to the test explained below. Starting with the most active cell, the grouping function checks whether there are other non-empty cells next to the current cell. There are eight possible positions: above, sharing the top-right corner, right, sharing the bottom-right corner, below, sharing the bottom-left corner, left and sharing the top-left corner. The analysis is made in this order, each position being represented by a letter from A to H. If the function finds such cells, it stores them in the cells group. Once no more cells are found in the neighborhood of all the cells stored in the cells group, the group is closed and, if active cells remain, the function declares a new cells group. A cells group is then encapsulated in a rectangular bounding box (Fig.8), as is done in most motion tracking systems. The cells group also contains the average of the features of each cell and the positions of the extreme points of the bounding box (Fig.7).
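
Steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Java/Processing implementation of the module: frames are assumed to be lists of rows of RGB tuples, the short-term memory used to smooth cell occupation is omitted, and all names (`is_active`, `active_cells`, `group_cells`, `NEIGHBOURS`) are hypothetical.

```python
from collections import deque

# Offsets for positions A..H in the order given above: above, top-right, right,
# bottom-right, below, bottom-left, left, top-left (y grows downwards).
NEIGHBOURS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def is_active(pixel, ref, threshold):
    """Step 1: decide whether an RGB pixel differs from the reference pixel."""
    dr, dg, db = (c - r for c, r in zip(pixel, ref))
    # Condition 1: one channel differs by more than the threshold.
    if any(abs(d) > threshold for d in (dr, dg, db)):
        return True
    # Condition 2: the sum of two channel differences exceeds the threshold.
    pairs = ((dr, dg), (dr, db), (dg, db))
    return any(abs(a + b) > threshold for a, b in pairs)


def active_cells(frame, reference, threshold, cell_size):
    """Step 2: map each grid cell (col, row) to its occupation.

    Occupation = number of active pixels in the cell / area of the cell.
    Only cells containing at least one active pixel are returned.
    """
    counts = {}
    for y, (row, ref_row) in enumerate(zip(frame, reference)):
        for x, (pixel, ref) in enumerate(zip(row, ref_row)):
            if is_active(pixel, ref, threshold):
                key = (x // cell_size, y // cell_size)
                counts[key] = counts.get(key, 0) + 1
    area = cell_size * cell_size
    return {key: n / area for key, n in counts.items()}


def group_cells(cells):
    """Step 3: group 8-connected cells into blobs with bounding boxes."""
    remaining = set(cells)
    blobs = []
    while remaining:
        # Start each new group from the most active remaining cell.
        seed = max(remaining, key=lambda c: (cells[c], c))
        remaining.remove(seed)
        group, queue = [seed], deque([seed])
        while queue:
            cx, cy = queue.popleft()
            for dx, dy in NEIGHBOURS:
                neighbour = (cx + dx, cy + dy)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    group.append(neighbour)
                    queue.append(neighbour)
        xs = [c[0] for c in group]
        ys = [c[1] for c in group]
        # Bounding box in cell coordinates: (left, top, right, bottom).
        blobs.append({"cells": sorted(group), "box": (min(xs), min(ys), max(xs), max(ys))})
    return blobs
```

Calling `group_cells(active_cells(frame, reference, threshold, cell_size))` on each incoming frame yields one dictionary per blob; the features of Fig.7 (barycenter, number of cells, occupation) could then be derived from the `cells` list.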
Figure 8: Result of a motion tracking analysis

Group                 Name of the feature                    Description
Bounding box          box_center_X, box_center_Y             Coordinates of the center of the bounding box
                      box_top_X, box_top_Y                   Coordinates of the top-most point of the blob
                      box_bottom_X, box_bottom_Y             Coordinates of the bottom-most point of the blob
                      box_left_X, box_left_Y                 Coordinates of the left-most point of the blob
                      box_right_X, box_right_Y               Coordinates of the right-most point of the blob
Blob (active cells)   Barycenter_X, Barycenter_Y             Coordinates of the barycenter of the blob (active cells only)
                      Barycenter_Pos_X, Barycenter_Pos_Y     Coordinates of the barycenter of the newly active cells
                      Barycenter_Neg_X, Barycenter_Neg_Y     Coordinates of the barycenter of the newly inactive cells
Size                  NumOfCells                             Number of active cells of the blob
                      NumOfPixels                            Number of pixels of the blob
Occupation            Occupation                             Number of active cells divided by area of the blob
                      Occupation_Pos                         Number of newly active cells divided by area of the blob
                      Occupation_Neg                         Number of newly inactive cells divided by area of the blob
Age                   Lifetime                               Number of frames the blob is active
Figure 7: Features provided by the video motion-tracking analyser

4.1.5. Step 4: blobs tracking

At the beginning of step 3, the previous cells groups are saved. Once the cells grouping process is finished, the new cells groups are compared to the former ones. The mapping is done on the basis of the distance between their borders and between their centers. The cells group id is updated accordingly and the lifetime of the cells group is incremented.

This application has been implemented in Java, and is based on libraries from the programming language and integrated development environment (IDE) Processing [22]. Processing is widely used by the interactive arts community and builds on the graphical capabilities of the Java programming language, simplifying features and creating a few new ones. A Graphical User Interface makes it easy to set some parameters of the application, such as the sensitivity of the video segmentation, the minimum size of a blob and the maximum number of blobs detected in the scene (Fig. 9).

Figure 9: Snapshot of the video motion tracking module

4.2. Main contribution: high-level analysis

The second part of this project aimed to develop tools for a high-level long-term analysis of the scene grabbed by the video camera. It takes as inputs the positions of the blobs detected by the motion tracking analyser (low-level analysis) described above and provides features characterizing the temporal evolution of the behavior of the blobs over different time spans.

4.2.1. Preprocessing of the data

We developed some tools in the Max/MSP programming environment, mainly patches and abstractions, for the long-term analysis of some motion features extracted by the tracking module. By long-term analysis we mean using basic statistical tools to describe the evolution of a variable during a certain time span. It can also be interesting to analyze the same feature over different time spans.

The first step of the process is a smoothing of the data coming from the video analysis module. Typically these data are affected by jitter caused by changes in the lighting conditions of the room and need to be cleaned before further analysis. The smoothing step consists of a median filter followed by a low-pass filter. The median filter is a non-linear digital filtering technique, often used in image processing to remove noise from signals. The idea is to examine a sample of the input and decide whether it is representative of the signal. This is performed using a sliding window consisting of an odd number of samples. The values in the window are sorted into numerical order; the median value, i.e. the sample in the center of the sorted window, is selected as the output (cf. Fig.10).

The output of the median filter is then smoothed using a first-order low-pass filter, where the current output sample y(n) depends on both the current input x(n) and the previous output sample y(n-1):

\[ y(n) = y(n-1) + \frac{x(n) - y(n-1)}{\alpha} \tag{1} \]

where \alpha is a parameter that tunes the smoothing effect of the filter.

This smoothing stage is implemented in Max/MSP using the ej.mmmm object developed by Emmanuel Jourdan for the median filtering and the slide object for the low-pass filtering. It removes noise and artifacts introduced in the video analysis stage (Fig.11).
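
The smoothing chain (median filter followed by the low-pass filter of equation (1)) can be sketched as follows. This is a plain-Python illustration, not the Max/MSP patch: the class name `Smoother`, the default window length and the behavior during warm-up (the median is taken over the samples received so far) are assumptions.

```python
from collections import deque
from statistics import median


class Smoother:
    """Median filter followed by the first-order low-pass filter of equation (1)."""

    def __init__(self, window=5, alpha=4.0):
        if window % 2 == 0:
            raise ValueError("the median window must hold an odd number of samples")
        self.buffer = deque(maxlen=window)  # sliding window of raw samples
        self.alpha = alpha                  # larger alpha means stronger smoothing
        self.y = None                       # previous output y(n-1)

    def process(self, x):
        """Push one raw sample and return the smoothed value."""
        self.buffer.append(x)
        m = median(self.buffer)             # median-filtered input
        if self.y is None:
            self.y = m                      # initialise the filter on the first sample
        else:
            self.y = self.y + (m - self.y) / self.alpha
        return self.y
```

For instance, feeding the sequence 10, 10, 100, 10, 10 to `Smoother(window=3, alpha=2.0)` returns 10 at every step: the isolated spike never reaches the center of the sorted window, so it is rejected before the low-pass stage.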
Figure 10: A median filter is used to remove noise in a signal

Figure 11: A preprocessing stage is used to smooth data coming from the motion tracking analyzer before mid-level analysis

4.2.2. Long-term analysis

Once the data streamed from the motion-tracking module have been smoothed, the next step of the analysis chain consists in buffering the data in a fixed-length analysis window. The length of the window can be parameterized and corresponds to the analysis time span. The data buffering works as a shift register (Fig.12): at the beginning of the analysis the window is filled by the incoming data, and once the window is full, a first-in first-out strategy is used to manage the input/output of the window: the incoming value takes the first place in the window, every other value is shifted and the oldest value is dropped. This process maintains a short-term memory of the system and lets it remember the recent behavior of a variable. In the following, this is referred to as the "sliding analysis window".

The simplest way to describe the data in the analysis window is to compute the statistical moments of the sequence. The most relevant moments are the mean and the standard deviation. The median of the sequence could also be useful. The mean and standard deviation of a sequence are given by the following formulas respectively:

\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2} \]

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{3} \]

where N is the number of samples in the analysis window.

The mean describes the central location of the data, whereas the standard deviation describes their dispersion/variability in the sequence. A low standard deviation indicates that the data points tend to be very close to the same value (the mean), while a high standard deviation indicates that the data are ’spread out’ over a large range of values.

Figure 13: A preprocessing stage is used to smooth data coming from the motion tracking analyzer before mid-level analysis

Another useful tool for characterizing the evolution of a sequence of data is its histogram. The histogram of a sequence is a
summary graph showing a count of the data points falling into various ranges. It provides a rough approximation of the frequency distribution of the data. No tool was available in the Max/MSP programming environment for a real-time histogram-based analysis of an incoming stream of data; we therefore developed an abstraction, num.histo, relying on objects natively available in the Max/MSP distribution and on publicly available third-party objects. In the future, we plan to write a Max external in order to optimize the computation of the histogram. The values of the histogram are normalized to the range [0, 1] so that individual bins represent the fraction of the total number of events assigned to the entire histogram. It is also possible to threshold the histogram so that all bins whose value is below the threshold factor are set to 0. Finally, the abstraction provides the n first minima and maxima of the histogram, and the bins associated with those values (Fig.13).
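
The sliding analysis window, its mean and standard deviation (equations (2) and (3)) and the normalized, thresholded histogram can be sketched together as follows. This is an illustrative Python sketch, not the num.histo abstraction: the class and method names, the equal-width binning over a caller-supplied range [lo, hi] and the omission of the minima/maxima search are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingWindow:
    """Fixed-length first-in first-out buffer with basic long-term descriptors."""

    def __init__(self, length):
        # A deque with maxlen drops the oldest value when a new one is pushed.
        self.values = deque(maxlen=length)

    def push(self, x):
        self.values.append(x)

    def stats(self):
        """Mean and population standard deviation of the window content."""
        return mean(self.values), pstdev(self.values)

    def histogram(self, n_bins, lo, hi, threshold=0.0):
        """Normalized histogram over [lo, hi]; bins below threshold are set to 0."""
        counts = [0] * n_bins
        width = (hi - lo) / n_bins
        for v in self.values:
            if lo <= v <= hi:
                # The upper bound hi is assigned to the last bin.
                index = min(int((v - lo) / width), n_bins - 1)
                counts[index] += 1
        total = sum(counts)
        if total == 0:
            return [0.0] * n_bins
        fractions = [c / total for c in counts]
        return [f if f >= threshold else 0.0 for f in fractions]
```

Calling `push` once per incoming frame and reading `stats()` or `histogram(...)` afterwards gives descriptors that always cover the most recent samples only; windows of different lengths could be run in parallel to analyze the same feature over several time spans.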
Figure 12: A preprocessing stage is used to smooth data coming from the motion tracking analyzer before mid-level analysis

The num.histo abstraction has been used in HUM for analyzing the way a visitor was occupying the space during a certain amount of time (i.e. the sliding analysis window). By computing the histogram of the
positions taken by the visitor within the analysis window, we were able to provide a cartography of the ’cumulated space occupation’ of a visitor or a group of visitors. The scheme of this kind of analysis is described in Fig.14: first the space, considered as a 2-D plane, is divided into a grid of n x n cells, and the position (x, y) of a blob is converted into a z position in the grid defined as z = x + n*y (step 1).
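
Step 1, together with the per-cell counting that the remaining steps build on it, can be sketched as follows. The sketch assumes that x and y in z = x + n*y are integer cell coordinates between 0 and n-1, obtained here by scaling pixel coordinates; the function names and the image-size parameters are hypothetical.

```python
def position_to_cell(px, py, width, height, n):
    """Convert a pixel position into a grid index z = x + n*y.

    x and y are the column and row of the cell containing (px, py) in an
    n x n grid laid over an image of size width x height.
    """
    x = min(int(px * n / width), n - 1)
    y = min(int(py * n / height), n - 1)
    return x + n * y


def occupancy_map(z_positions, n):
    """Count how many times each cell appears in a sequence of z positions.

    Returns an n x n list of rows; grid[y][x] is the count for cell (x, y).
    """
    grid = [[0] * n for _ in range(n)]
    for z in z_positions:
        grid[z // n][z % n] += 1
    return grid
```

Because z is a single integer, the sequence of positions can be stored and histogrammed like any other one-dimensional feature, and `z // n` and `z % n` recover the row and column when the map is drawn.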
Z positions are then stored in a sliding analysis window in order to provide a memory of the last positions of the studied blob (step 2), and a histogram of this sequence of positions is computed for each incoming video frame (step 3). This histogram is then used to construct a map of the areas which have been occupied by the blob within the analysis window. This map provides, for each cell of the grid (i.e. for each bin of the histogram), the number of times the visitor has occupied this position within the analysis window (step 4). We used a display inspired by meteorological cartography to visualize the result of this analysis: the space is visualized as a 2-D plane, and a color is associated with each cell of the grid according to the amount of time it has been occupied; a black cell corresponds to a cell which has never been occupied, a red one to a cell occupied for a long time. This tool makes it possible to characterize the displacement of a visitor in the installation by providing information such as:
• has the visitor stayed in a small area or has he/she visited a large part of the space

• which places have been occupied most by the visitor

• is the current position a ’hot’ or ’cold’ position, i.e. a position which has been occupied a lot or not

This information characterizes the displacement of one visitor in the installation, but the same kind of analysis could also be applied to a group of visitors. It can then be used by the artist in the design of the interactions. In HUM, the result of this analysis was used to control sound spatialization parameters and the triggering of sound samples. We plan to keep investigating more complex mapping strategies relying on this analysis tool in future versions of the installation.

Figure 14: A preprocessing stage is used to smooth data coming from the motion tracking analyzer before mid-level analysis

5. PERSPECTIVES

A 3-month project is quite short to cover such a long list of tasks. We hope that our technological forecasting will prove helpful to the artists in their ongoing efforts to finalize all the modules of their installations.

We need to implement OSC communication in the next HyForge applications so as to allow such artistic projects to benefit from our future progress in the field, here by making it possible to query similar videos directly from the Isadora framework.

We will continue the development of the MediaCycle software that was used to compute similarities between videos. In particular we will explore other descriptors of similarity between videos, as well as means of organizing the videos more efficiently when the database is very large.

HUM has already been presented publicly at BRASS in May 2009, and an improved version of the installation will be presented by the end of this year in Brussels. The major improvements will aim to enrich the interaction between the visitor and the resulting sound and image. The possibility of using algorithms inspired by artificial life [12, 16] to educate the system will also be explored by François Zajéga and the numediart team, who plan to keep working together in close collaboration.

6. ACKNOWLEDGMENTS

This work has been supported by the numediart research project, funded by Région Wallonne, Belgium (grant N° 716631).

We would like to thank all the artists who made this collaboration possible, in order of appearance: Thomas Israel [15], François Zajéga [30], Laura Colmenares Guerra [14] and Thierry Sablon [23].

We warmly thank Roger Burton for welcoming us at BRASS, and for granting us residencies for both artistic projects, a venue and proper catering for the end-of-project presentation.

We thank Rémy Labbé, M.Sc. student at UCL-TELE, for his help and participation throughout the project workshop, when trying to find the best-matching sound spatialization software.

We also want to thank Paul Bourke and Sebastien Roy for giving us time to discuss dome projection.

7. REFERENCES

7.1. Scientific references

[4] Paul Bourke. “Digital Fulldome, Techniques and Technologies”. In: Course prepared for Graphite (ACM Siggraph). 2007. URL: http://local.wasp.uwa.edu.au/~pbourke/papers/graphite2007/graphite2007.pdf.

[5] Paul Bourke. “Using A Spherical Mirror For Projection Into Immersive Environments”. In: Proceedings of the 3rd international conference on Computer graphics and interactive techniques in Australasia and South East Asia, Graphite (ACM Siggraph). 2005. URL: http://local.wasp.uwa.edu.au/~pbourke/papers/graphite2005/graphite.pdf.
[7] Alvaro Cassinelli and Masatoshi Ishikawa. “Khronos Projector”. In: International Conference on Computer Graphics and Interactive Techniques, ACM SIGGRAPH Emerging technologies. 2005. P.: 9.
[9] Luc Courchesne, Guillaume Langlois, and Luc Martinez. “Where are you?: an immersive experience in the panoscope 360”. In: Proceedings of the 14th annual ACM International Conference on Multimedia (MM). 2006. P.: 11.
[10] Jean-Marie Dallet, Christian Laroche, and Frédéric Curien. “SLIDERS: a collective experience of interactive cinema”. In: Proceedings of the 14th annual ACM International Conference on Multimedia (MM). 2006. P.: 9.
[12] Dario Floreano and Claudio Mattiussi. Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies. Intelligent Robotics and Autonomous Agents. The MIT Press, 2008. ISBN: 9780262062718. P.: 16.
[13] Oliver Grau. Virtual Art: From Illusion to Immersion. Leonardo. The MIT Press, 2003. ISBN: 0-262-07241-6. P.: 9.
[16] Eduardo Kac, ed. Signs of Life: Bio Art and Beyond. Leonardo Books. The MIT Press, 2006. ISBN: 9780262112932. P.: 16.
[17] Robert Kronenburg. Flexible: Architecture that Responds to Change. Laurence King Publishers, 2007. ISBN: 9781856694612. P.: 9.
[18] George Legrady and Timo Honkela. “Pockets Full of Memories: an interactive museum installation”. In: Visual Communication 1.2 (2002). Pp. 163–169. P.: 9.
[19] Lev Manovich. The Language of New Media. Leonardo Books. The MIT Press, 2002. ISBN: 9780262632553. P.: 9.
[20] Lev Manovich and Andreas Kratky. Soft Cinema: Navigating the Database. DVD & Booklet. MIT Press, 2005. ISBN: 9780262134569. P.: 9.
[24] Alois Sontacchi and Robert Höldrich. “Getting Mixed up with WFS, VBAP, HOA, TRM... From Acronymic Cacophony to a Generalized Rendering Toolbox”. In: DEGA Wave Field Synthesis Work. 2007. P.: 12.
[28] Stephen Wilson. Information Arts: Intersections of Art, Science, and Technology. Leonardo. The MIT Press, 2002. ISBN: 0-262-23209-X. P.: 9.
[29] M. Wozniewski, Z. Settel, and J.R. Cooperstock. “AudioScape: A Pure Data library for management of virtual environments and spatial audio”. In: Pure Data Convention. 2007. P.: 12.
7.2. Artistic references
[1] Alterface. The Sphere. 2009. URL: http://www.alterface.com/en/science_centers/the_sphere/. P.: 9.
[3] Pascale Barret. Synapse 2.0. Interactive video installation. URL: http://www.imal.org/synapse/. P.: 9.
[6] Alvaro Cassinelli. The Khronos Projector: a video time-warping machine with a tangible deformable screen. Online description containing videos, slides and a Processing demo applet. URL: http://www.k2.t.u-tokyo.ac.jp/members/alvaro/Khronos/. P.: 9.
[8] Daryl Cloran, Anita Doron, and Mateo Guez. Late Fragment. 2009. URL: http://www.latefragment.com. P.: 9.
[14] Laura Colmenares Guerra. URL: http://www.ulara.org. Pp.: 10, 16.
[15] Thomas Israel. URL: http://www.thomasisrael.be. Pp.: 9, 10, 16.
[21] Olivier Meunier. Real Unreal. 2006. URL: http://www.ogeem.be. Pp.: 9, 11.
[23] Thierry Sablon. URL: http://www.tysablon.eu. Pp.: 10, 16.
[30] François Zajéga. URL: http://www.frankiezafe.net. Pp.: 9, 16.
7.3. Software and technologies
[2] “Audioscape”. URL: http://www.audioscape.org. P.: 12.
[11] Elumenati. “OmniMap API”. URL: http://www.elumenati.com/products/omnimap.html. P.: 11.
[22] “Processing IDE”. URL: http://www.processing.org. P.: 14.
[25] TOT [Territoires Ouverts - Open Territories]. “audioTWIST”. Immersive 3D audio architecture. URL: http://tot.sat.qc.ca/logiciels_audiotwist.html. P.: 12.
[26] TOT [Territoires Ouverts - Open Territories]. “lightTWIST”. Image deformation for 3D projection. URL: http://tot.sat.qc.ca/logiciels_lighttwist.html. P.: 11.
[27] TroikaTronix. “Isadora”. URL: http://www.troikatronix.com/isadora.html. Pp.: 10–12.
[31] ZKM. “Zirkonium”. URL: http://www.zkm.de/zirkonium. P.: 12.
MEDIACYCLE: BROWSING AND PERFORMING WITH SOUND AND IMAGE LIBRARIES
Xavier Siebert 1, Stéphane Dupont 2, Philippe Fortemps 1, Damien Tardieu 2
1 Laboratoire de Mathématique et Recherche Opérationnelle (MathRo), Faculté Polytechnique de Mons (FPMs), Belgique
2 Laboratoire de Théorie des Circuits et Traitement du Signal (TCTS), Faculté Polytechnique de Mons (FPMs), Belgique
ABSTRACT
The MediaCycle project, as part of numediart’s HyForge research axis, aims at developing a novel browsing environment for multimedia (sound, image, video) databases that offers an alternative to conventional search-by-query. The databases are organized so that users can conveniently retrieve the items that they need. This project extends the software developed in AudioCycle (numediart #4.1) from sound to images. Extensions to video databases will be investigated in upcoming projects.
KEYWORDS
MediaCycle, Multimedia Databases, Content-based Navigation, Image Features
1. INTRODUCTION
Multimedia database search initially relied on metadata associated with each media item (sound, image, video, ...), such as captions or keywords. This approach suffers from two major drawbacks: tagging is a tedious process (which limits its application to small databases), and it does not really capture the meaning of the media. More recently, several software packages have shifted from a metadata-based approach to a content-based one, resulting notably in the commercial Query By Image Content (QBIC) software [6], as well as several other tools for image [16, 17, 10] or sound [8] databases.
The MediaCycle project, as part of numediart’s HyForge research axis, aims at developing a novel browsing environment that offers an alternative to conventional search-by-query. The databases are organized so that users can conveniently retrieve the items (sound, images, videos) that they need. The architecture of our browsing software is similar to that developed in AudioCycle (numediart #4.1), extending it from sound to images. It offers a wide range of potential applications, from browsing a medical image library to media art installations and live performances (e.g., Resolume [15] or Union VJ [18]).
In the case of sound and music (AudioCycle), content referred to rhythm, harmony, melody, timbre, ..., whereas in the case of images (MediaCycle) it refers to attributes such as color, shape, texture, .... Possible extensions include navigation in video databases, where content can additionally be characterized by camera motion parameters (e.g., zoom, pan), object motion, and other dynamic attributes.
2. INTERFACE DESIGN
The MediaCycle browsing interface (see snapshot in Fig. 1) contains the following elements to browse image databases:
• The top right corner contains three sliders, labeled shape, color and texture, that allow the user to define the weights of these three attributes (corresponding to rhythm, timbre and harmony for sounds).
• In the main frame, the images are displayed in such a way that their mutual closeness reflects their similarity, as defined by the slider weights.
• The images are clustered according to the abovementioned distance. The user can navigate into and out of a cluster using the arrows above the interface, on the left.
• When the mouse hovers over an image (as in Fig. 1), this image instantly becomes larger, so that the user can quickly browse through the whole database.
• The bottom right panel (below the sliders, blank in Fig. 1) serves to display additional information about the images.
Figure 1: Overview of the MediaCycle Interface
Care has been taken in the development of the software to ensure a common architecture for all media (sound, image, video), with minimal changes in the interface when switching from one medium to another.
3. IMAGE FEATURES
Each image contains a wealth of information that can be readily interpreted by the human eye. However, for the computer an image is simply a set of pixels with values for each color channel (e.g., blue, red, green) or grey level. To compare images in terms that are interpretable by a person, the corresponding features (e.g., color, texture, shape) have to be extracted from the image, as described below.
3.1. Color
As pointed out by a recent review of image features [4], color histograms generally provide a simple but efficient way to distinguish images. The color range (e.g., from 0 to 255) is partitioned into bins, and for each color channel (e.g., blue, red, green) the pixels whose value falls within each bin are counted, resulting in a description in terms of the relative frequencies of the occurring colors. The algorithms were implemented using the OpenCV library [2].
3.2. Texture
Texture can be defined as the spatial repetition of a motif, resulting in visual information that can be described as grainy, fine, coarse, smooth, marbled, .... To describe texture, the image is convolved with a set of Gabor wavelets of different orientations and scales, and the mean and standard deviation of the resulting images are calculated. Each wavelet is the product of an oscillating function and a Gaussian (see Eq. 1). It is used to extract the repeating pattern (texture) in a given orientation and scale, as in [4]. We used at least four different orientations and scales, resulting in a minimum of 32 features (4 orientations × 4 scales × 2 statistics). The wavelet code has been adapted from [20].
$$\Psi_{\mu,\nu}(z) = \frac{|k_{\mu,\nu}|^2}{\sigma^2} \exp\left(-\frac{|k_{\mu,\nu}|^2 |z|^2}{2\sigma^2}\right) \left[\exp\left(i k_{\mu,\nu} z\right) - \exp\left(-\frac{\sigma^2}{2}\right)\right] \tag{1}$$
where $z = (x, y)$ and
$$k_{\mu,\nu} = k_\nu \exp\left(i \phi_\mu\right) \tag{2}$$
Figure 2: Illustration of Gabor wavelets at two scales and 8 orientations
Figure 3: Extraction of texture features from an image by convolving it with Gabor wavelets in two different orientations.
3.3. Shape
To describe shape, we combined two approaches. First, we extracted the contours of the image and calculated the contours’ Hu moments [9] using the OpenCV library [2]. Second, we calculated the Fourier transform of the image, which we converted to polar coordinates and grouped into (R, θ) bins. The low frequencies (small R) correspond to bulk shape features, and were used as a complement to the contour moments. The high frequencies correspond to finer details like contours or textures.
4. IMAGE DATABASES
Several image databases were used for testing purposes:
• Caltech-256 [7]
• Columbia Object Image Library (COIL-100, [14])
• Image Retrieval in Medical Applications (IRMA, [12])
• VisTex Texture database [11]
Figure 4: Illustration of MediaCycle with a subset of the COIL-100 library [14]
4.1. Classification
4.1.1. COIL-100
Figure 5: Clustering of images from the COIL-100 [14] database, using the Fourier descriptors and the color histogram.
Simple clustering/classification experiments have been performed to test the validity of our descriptors. Fig. 5 shows the distance relative to a given image, in terms of Fourier descriptors (x axis) and color histograms (y axis). Several clusters appear, which correspond to different views of the same object in the database.
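As a rough illustration of the color-histogram axis of such a comparison, the sketch below builds one normalized histogram per color channel and ranks images by their distance to a reference image. It is only a minimal sketch under stated assumptions: images are given as flat lists of (blue, green, red) tuples, and the function names, the bin count and the L1 distance are hypothetical choices made for this example, not the settings or the OpenCV-based code used in MediaCycle.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption for this sketch: a pixel is a (blue, green, red) tuple, values 0..255.
Pixel = Tuple[int, int, int]


def color_histogram(pixels: Sequence[Pixel], bins: int = 8) -> List[float]:
    """Concatenate one relative-frequency histogram per color channel."""
    if not pixels:
        raise ValueError("image has no pixels")
    bin_width = 256 / bins  # the 0..255 range is split into equal bins
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(int(value / bin_width), bins - 1)
            hist[channel * bins + index] += 1.0
    # Divide by the pixel count so that each channel's histogram sums to 1.
    return [count / len(pixels) for count in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences (an illustrative choice of distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_distance(
    reference: Sequence[Pixel],
    others: Dict[str, Sequence[Pixel]],
    bins: int = 8,
) -> List[Tuple[float, str]]:
    """Return (distance, name) pairs, closest image first."""
    ref_hist = color_histogram(reference, bins)
    scored = [
        (l1_distance(ref_hist, color_histogram(image, bins)), name)
        for name, image in others.items()
    ]
    return sorted(scored)
```

Calling `rank_by_distance(reference, {"view_a": pixels_a, "view_b": pixels_b})` returns the images sorted by increasing histogram distance, which is the kind of per-descriptor distance that could be plotted along one axis of a figure such as Fig. 5.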
4.1.2. Caltech 101
Classification tasks have been performed on a subset of the Caltech 101 database [5], using descriptors for shape (Hu moments), color (color histograms) and texture (Gabor wavelets). The subset contains 2364 elements, each belonging to one of 25 classes (e.g., planes, accordions, cell phones, crocodiles, ...). We used the SVM classifier (C-SVC with a polynomial kernel) provided by LibSVM [3] in the Weka software [19]. The results are obtained by a 10-fold cross-validation procedure. Mean results are listed in Table 1.
Measure            Mean value
Global Accuracy    0.53
Recall             0.31
Precision          0.3
F-Measure          0.3
Table 1: Classification results on a subset of the Caltech database.
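The measures of Table 1 can be computed from lists of true and predicted labels, as sketched below. This is a minimal illustration, not the Weka implementation: it assumes that recall, precision and F-measure are averaged over classes with equal weight (the report only states this explicitly for recall), and the function name and the toy labels in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def mean_class_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Global accuracy plus unweighted per-class means of recall, precision, F-measure."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    actual = Counter(y_true)       # number of elements per true class
    predicted = Counter(y_pred)    # number of elements assigned to each class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    recalls, precisions, f_measures = [], [], []
    for label in actual:
        recall = correct[label] / actual[label]
        # A class that is never predicted gets precision 0 (assumption of this sketch).
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        total = precision + recall
        recalls.append(recall)
        precisions.append(precision)
        f_measures.append(2 * precision * recall / total if total else 0.0)

    n_classes = len(actual)
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "recall": sum(recalls) / n_classes,
        "precision": sum(precisions) / n_classes,
        "f_measure": sum(f_measures) / n_classes,
    }
```

With 8 "plane" and 2 "ant" elements that are all predicted as "plane", the global accuracy is 0.8 while the mean recall is only 0.5, which shows how an unbalanced class distribution can inflate accuracy without affecting the per-class mean.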
Global accuracy means that 53% of the elements are correctly classified. The mean recall (i.e., the mean of the recall of each class) is a more relevant measure since it is not sensitive to the class distribution. Individual class recall ranges from 0.1 to 0.9, showing a great disparity in the results. For instance, planes and accordions are very well recognized (more than 0.9), whereas ant or barrel only obtain 0.1. Nevertheless, all the classes are recognized better than chance. These simple tests show the ability of the descriptors to extract relevant information from images.
5. DISCUSSION AND PERSPECTIVES
5.1. Other Features
We have been exploring other features, notably the Scale-Invariant Feature Transform (SIFT, [13]) and Speeded-Up Robust Features (SURF, [1]). These algorithms detect local features in an image by convolving it with Gaussian filters at different scales to extract key points. We incorporated the SIFT algorithm into our feature extractor (e.g., Fig. 6). However, different images will have different numbers of key points, and a method to compare them on a common ground still needs to be implemented in our software.
Figure 6: Features extracted with the SIFT algorithm [13]. Arrows are located at key points in the image, and the size of each arrow reflects the size of the region associated with that key point.
5.2. Semantic Gap
One well-known problem in image retrieval procedures is the so-called “semantic gap”, i.e. the image that the user has in mind is at a higher semantic level than the features extracted from the image. Methods for taking into account the user’s preferences are currently being investigated. One option is to increase the weight of the features that match the user’s choices, and to decrease the weight of those that do not. To enhance the browsing process, we have started to investigate the possible incorporation of graph-theoretic algorithms.
5.3. Software Portability
The core algorithms as well as the OpenSceneGraph display are architecture-independent, but some components of the graphical user interface (sliders, menus, ...) have been developed on Macintosh computers using the Cocoa environment. Future developments include rewriting the Mac-specific components in a portable language to make the MediaCycle software fully cross-platform.
5.4. Link with upcoming numediart projects
Several upcoming projects are building upon the MediaCycle project, notably:
• (AV)LaughterCycle will use some of the AudioCycle/MediaCycle feature extractors
• incorporating video analysis will lead to the eNTERFACE09 project (DancersCycle / Video Navigation Tool), as an application for browsing a database of dancers’ performances.
6. ACKNOWLEDGMENTS
numediart is a long-term research program centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631).
7. REFERENCES
[1] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. “SURF: Speeded-Up Robust Features”. In: 9th European Conference on Computer Vision. Graz, Austria, 2006. P.: 21.
[2] Gary Bradski and Adrian Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. Cambridge, MA: O’Reilly, 2008. P.: 20.
[3] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. 2001. P.: 21.
[4] Thomas Deselaers, Daniel Keysers, and Hermann Ney. “Features for Image Retrieval: An Experimental Comparison”. In: Information Retrieval 11.2 (2008). Pp. 77–107. Pp.: 19, 20.
[5] L. Fei-Fei, R. Fergus, and P. Perona. “Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories”. In: Computer Vision and Image Understanding 106.1 (2007). Pp. 59–70. P.: 21.
[6] Myron Flickner et al. “Query by Image and Video Content: The QBIC System”. In: Computer 28.9 (1995). Pp. 23–32. ISSN: 0018-9162. P.: 19.
[7] G. Griffin, A. Holub, and P. Perona. Caltech-256 Object Category Dataset. Tech. rep. 7694. California Institute of Technology, 2007. URL: http://authors.library.caltech.edu/7694. P.: 20.
[8] S. Heise, M. Hlatky, and J. A. Loviscach. “SoundTorch: Quick Browsing in Large Audio Collections”. In: AES 125th Convention. 2008. P. 7544. P.: 19.
[9] Ming K. Hu. “Visual Pattern Recognition by Moment Invariants”. In: IRE Transactions on Information Theory IT-8 (1962). Pp. 179–187. P.: 20.
[10] Jorma Laaksonen et al. “PicSOM – content-based image retrieval with self-organizing maps”. In: Pattern Recogn. Lett. 21.13-14 (2000). Pp. 1199–1207. ISSN: 0167-8655. P.: 19.
[11] MIT Media Lab. VisTex: Vision Texture Database. Maintained by the Vision and Modeling group at the MIT Media Lab. Web page: http://vismod.media.mit.edu/vismod/. 1995. URL: http://vismod.media.mit.edu/vismod/imagery/VisionTexture/vistex.html. P.: 20.
[12] Thomas M. Lehmann et al. “The IRMA Project: A State of the Art Report On Content-Based Image Retrieval in Medical Applications.” In: Korea-Germany Workshop on Advanced Medical Image. 2003. Pp. 161–171. P.: 20.
[13] David G. Lowe. “Object Recognition from Local Scale-Invariant Features”. In: ICCV ’99: Proceedings of the International Conference on Computer Vision-Volume 2. Washington, DC, USA: IEEE Computer Society, 1999. ISBN: 0769501648. P.: 21.
[14] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-100). Tech. rep. Columbia University, 1996. P.: 20.
[15] Resolume Software. URL: http://www.resolume.com. P.: 19.
[16] Stan Sclaroff, Leonid Taycher, and Marco La Cascia. “ImageRover: A content-based image browser for the world wide web”. In: Proc. IEEE Workshop on Content-based Access of Image and Video Libraries. 1997. Pp. 2–9. P.: 19.
[17] John R. Smith and Shih F. Chang. “VisualSEEk: a fully automated content-based image query system”. In: Proceedings of the fourth ACM international conference on Multimedia. ACM Press, 1996. Pp. 87–98. P.: 19.
[18] Union VJ software. URL: http://www.lividinstruments.com. P.: 19.
[19] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. San Francisco: Morgan Kaufmann, 2005. P.: 21.
[20] Mian Zhou and Hong Wei. “Face Verification Using Gabor Wavelets and AdaBoost”. In: Pattern Recognition, International Conference on 1 (2006). Pp. 404–407. ISSN: 1051-4651. P.: 20.