Keynote Speakers



François Bremond

 Research Center

INRIA, France

Nicolas Dobigeon

 University of Toulouse, France

Nicolas Gillis

University of Mons, Belgium

Giovanni Maria Farinella

University of Catania, Italy

Jean-Marc Odobez

Idiap Research Institute, Switzerland

Fabio Solari

University of Genoa, Italy

Dimitri Ognibene

University of Essex, and University of Milano-Bicocca

King's College London


Keynote I by Prof. Giovanni Maria Farinella (scheduled for Wednesday 9 December, 9:30 am-10:20 am)

Title: Future Predictions in Egocentric Vision


The ability to predict the future is fundamental for humans to explore the world, learn, perceive, navigate, and act. Future AI systems are therefore expected to reproduce such abilities, reasoning about the world in terms of events and stimuli that lie in the near or distant future. Predictive abilities are also fundamental for wearable systems to understand the user’s short- and long-term goals, offer appropriate guidance based on the user’s objectives, and improve the user’s safety by anticipating future actions. In this talk, we will present recent research on future predictions from egocentric video carried out at the Image Processing Laboratory (IPLAB) at the University of Catania, Italy. We will first introduce the main motivations behind research on egocentric perception, then discuss approaches to predict future interacted objects and actions from egocentric video. The talk will also cover the datasets that support the study of future prediction tasks from egocentric video, such as the EPIC-KITCHENS series of datasets and challenges, as well as our newly introduced MECCANO dataset for studying human-object interaction recognition and future prediction in industrial-like scenarios.

Giovanni Maria Farinella biography:

Giovanni Maria Farinella is an Associate Professor at the Department of Mathematics and Computer Science, University of Catania, Italy. His research interests lie in the fields of Computer Vision and Machine Learning, with a focus on First Person (Egocentric) Vision. He is the author of more than 120 papers in international book chapters, journals and conference proceedings, and co-inventor of 6 patents involving industrial partners. Dr. Farinella serves as a reviewer and on the programme committee board of major international journals and conferences (CVPR, ICCV, ECCV, BMVC). He has been Area Chair for ICCV 2017/19 and CVPR 2020/21, Video Proceedings Chair for ECCV 2012 and ACM MM 2013, and Guest Editor for Special Issues of Computer Vision and Image Understanding, Pattern Recognition Letters, and IEEE Journal of Biomedical and Health Informatics. He is currently Associate Editor of the international journals IEEE Transactions on Pattern Analysis and Machine Intelligence (2019- ), Pattern Recognition (2017- ), and IET Computer Vision (2015- ). Dr. Farinella founded (in 2006) and currently directs the International Computer Vision Summer School (ICVSS). He was awarded the PAMI Mark Everingham Prize in October 2017.



Keynote II by Prof. Dimitri Ognibene (scheduled for Wednesday 9 December, 2:00 pm-2:50 pm)

Title: Adaptive Vision for Human Robot Collaboration


Unstructured social environments, e.g. building sites, release an overwhelming amount of information, yet behaviorally relevant variables may not be directly accessible.
Currently proposed solutions for specific tasks, e.g. autonomous cars, usually employ overly redundant, expensive, and computationally demanding sensory systems that attempt to cover the wide set of sensing conditions the system may have to deal with.
Adaptive control of the sensors and of the input to the perception process is a key solution found by nature to cope with such problems, as shown by the foveal anatomy of the eye and its high mobility and control accuracy. The design principles of systems that adaptively find and select relevant information are important for both Robotics and Cognitive Neuroscience.
At the same time, collaborative robotics has recently progressed to human-robot interaction in real manufacturing. Measuring and modeling task-specific gaze behaviours is essential to support smooth human-robot interaction. Indeed, anticipatory control for human-in-the-loop architectures, which can enable robots to proactively collaborate with humans, relies heavily on the observed gaze and action patterns of their human partners.
The talk will describe several systems that employ adaptive vision to support robot behavior and collaboration with humans.

Biography of Dimitri Ognibene:

Dimitri Ognibene is Associate Professor of Human Technology Interaction at the University of Milano-Bicocca, Italy. His main interest lies in understanding how social agents with limited sensory and computational resources adapt to complex and uncertain environments, how this can induce suboptimal behavior such as addiction or antisocial behaviors, and how this understanding may be applied to real-life problems. To this end he develops both neural and Bayesian models and applies them in both physical (e.g. robots) and virtual (e.g. social media) settings. Before joining the University of Milano-Bicocca, he was Lecturer in Computer Science and Artificial Intelligence at the University of Essex from October 2017, having moved from University Pompeu Fabra (Barcelona, Spain), where he was a Marie Curie Actions COFUND fellow. Previously he developed algorithms for active vision in industrial robotic tasks as a Research Associate (RA) at the Centre for Robotics Research, King’s College London. He developed Bayesian methods and robotic models for attention in social and dynamic environments as an RA at the Personal Robotics Laboratory at Imperial College London. He studied the interaction between active vision and autonomous learning in neuro-robotic models as an RA at the Institute of Cognitive Science and Technologies of the Italian Research Council (ISTC CNR). He also collaborated with the Wellcome Trust Centre for Neuroimaging (UCL) to study how to model exploration in the active inference modelling paradigm. He has been Visiting Researcher at the Bounded Resource Reasoning Laboratory at UMass and at the University of Reykjavik (Iceland), exploring the symmetries between active sensor control and active computation or metareasoning.
He obtained his PhD in Robotics in 2009 from University of Genoa with a thesis titled “Ecological Adaptive Perception from a Neuro-Robotic perspective: theory, architecture and experiments” and graduated in Information Engineering at the University of Palermo in 2004. He is handling editor of Cognitive Processing, review editor for Paladyn – The journal of Behavioral Robotics, Frontiers Bionics and Biomimetics, and Frontiers Computational Intelligence in Robotics, guest associate editor for Frontiers in Neurorobotics and Frontiers in Cognitive Neuroscience. He has been chair of the robotics area of several conferences and workshops.


Keynote III by Prof. François Bremond (scheduled for Thursday 10 December, 9:00 am-9:50 am)

Title: Video Analytics for People Monitoring


In this talk, we will discuss how Video Analytics can be applied to human monitoring within a video camera network. Specifically, we will present an efficient technique for people detection and several techniques for tracking people in multi-camera settings (People Re-Identification). In particular, we will present several categories of CNN-based algorithms to recognize human identity, with varying degrees of supervision during training. We will then present several techniques for the recognition and detection of human activities from 2D video cameras. With the emergence of deep learning and large-scale datasets from internet sources, substantial improvements have been made in video understanding. For instance, state-of-the-art 3D convolutional networks like I3D, pre-trained on huge datasets like Kinetics, have successfully boosted the recognition of actions from internet videos, but challenges remain in addressing Activities of Daily Living (ADL), such as (i) fine-grained actions with short and subtle motion, like pouring grain and pouring water, (ii) actions with similar visual patterns that differ in motion patterns, like rubbing hands and clapping, and (iii) long complex actions, like cooking. To address these challenges, we will discuss 1) a multi-modal fusion strategy, 2) a pose-driven attention mechanism and 3) a Temporal Model to represent long complex actions, which is crucial for ADL. We will illustrate the proposed activity monitoring approaches through several home care application datasets: CAD-120, Toyota SmartHome, NTU-RGB+D, Charades and Northwestern UCLA.

Biography of François Bremond:

François Brémond is a Research Director at Inria Sophia Antipolis-Méditerranée, where he created the STARS team in 2012. He has pioneered the combination of Artificial Intelligence, Machine Learning and Computer Vision for Video Understanding since 1993, both at Sophia-Antipolis and at USC (University of Southern California), LA. In 1997 he obtained his PhD degree in video understanding and pursued this work at USC on the interpretation of videos taken from UAVs (Unmanned Aerial Vehicles). In 2000, recruited as a researcher at Inria, he modeled human behavior for Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. He is a co-founder of Keeneo, Ekinnox and Neosensys, three companies in intelligent video monitoring and business intelligence. He also co-founded the CoBTek team from Nice University in January 2012 with Prof. P. Robert from Nice Hospital on the study of behavioral disorders for older adults suffering from dementia. He is author or co-author of more than 250 scientific papers published in international journals or conferences in video understanding. He has (co-)supervised 20 PhD theses. More information is available at:

Keynote IV by Prof. Jean-Marc Odobez (scheduled for Thursday 10 December, 2:00 pm-2:50 pm)

Title: Using less data for training: investigating weak labels and unsupervised training for robust sound localization and gaze estimation


Supervised training is a comfortable option for training deep learning networks. However, it usually comes at the cost of requiring large labeled training datasets, a resource that might be difficult to produce in practice. For instance, in gaze estimation, creating such data usually requires both the video from the camera sensing the person of interest and information in the 3D scene (calibrated with the sensing camera) about what the sensed person might be looking at. This means that internet videos or previously captured videos cannot be annotated post-acquisition by humans, limiting the ability to build large-scale datasets with hundreds of persons. Regarding audio localization based on microphone arrays, the situation is slightly different: the need for training data arises each time a new device configuration (number of microphones, location, microphone type) is used.
In this talk, I will present our work on these two tasks (sound source and human voice localization, gaze estimation), with an emphasis on neural network architectures and methods relying on synthetic data, weak labels, and unsupervised approaches to address the learning problem, including adaptation to users.


Biography of Jean-Marc Odobez:

He leads the Perception and Activity Understanding group at the Idiap Research Institute. His main research interests are in human activity analysis from multi-modal data. This entails the investigation of fundamental tasks like the detection and tracking of people, the estimation of their pose or the detection of non-verbal behaviors, and the temporal interpretation of this information in the form of gestures, activities, behavior or social relationships. These tasks are addressed through the design of principled algorithms extending models from computer vision, multimodal signal processing, and machine learning, in particular probabilistic graphical models and deep learning techniques. Surveillance, traffic and human behavior analysis,

Keynote V by Prof. Nicolas Gillis (scheduled for Thursday 10 December, 3:00 pm-3:50 pm)

Title: Some recent results on nonnegative matrix factorizations with application in hyperspectral imaging


Given a nonnegative matrix X and a factorization rank r, nonnegative matrix factorization (NMF) approximates the matrix X as the product of a nonnegative matrix W with r columns and a nonnegative matrix H with r rows such that X ≈ WH. NMF has become a standard linear dimensionality reduction technique in data mining and machine learning. Although it has been extensively studied in the last 20 years, many questions remain open. In this talk, we address two such questions. The first one is about the uniqueness of NMF decompositions, also known as the identifiability, which is crucial in many applications. We provide a new model and algorithm based on sparsity assumptions that guarantee the uniqueness of the NMF decomposition. The second problem is the generalization of NMF to non-linear models. We consider the linear-quadratic NMF (LQ-NMF) model that adds as basis elements the component-wise product of the columns of W, that is, W(:,j).*W(:,k) for all j,k where .* is the component-wise product. We show that LQ-NMF can be solved in polynomial time, even in the presence of noise, under the separability assumption which requires the presence of the columns of W as columns of X. We illustrate these new results on the blind unmixing of hyperspectral images.
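
For readers unfamiliar with the basic NMF model described above, the following is a minimal sketch of approximating X by WH with both factors nonnegative. It uses the classical multiplicative update rules for the Frobenius error, which are a standard textbook method and not the sparsity-based or LQ-NMF algorithms presented in the talk. The function names (nmf, frob_error), the small example matrix, and the parameters iters, seed and eps are assumptions made for illustration; the code is plain Python so that it runs without extra libraries, whereas practical implementations would rely on a numerical library.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, r, iters=200, seed=0, eps=1e-12):
    """Approximate nonnegative X (m x n) by W (m x r) times H (r x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random initialization (an illustrative choice).
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), entry by entry
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), entry by entry
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Example: a 4 x 3 nonnegative matrix that has an exact rank-2 factorization.
X = [[1, 2, 0],
     [0, 1, 3],
     [1, 3, 3],
     [2, 5, 3]]
W, H = nmf(X, r=2)
print("squared error:", frob_error(X, W, H))
```

Because every update only multiplies and divides nonnegative numbers, W and H stay nonnegative throughout. The factorization returned is generally not unique, which is the identifiability question the talk addresses.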

Biography of Nicolas Gillis:

Nicolas Gillis received the Master’s and Ph.D. degrees in applied mathematics from the Université catholique de Louvain, Louvain-la-Neuve, Belgium, in 2007 and 2011, respectively. He is currently an Associate Professor with the Department of Mathematics and Operational Research, Faculté polytechnique, Université de Mons, Mons, Belgium. His research interests include optimization, numerical linear algebra, machine learning, signal processing, and data mining. Dr. Gillis received the Householder award in 2014, and an ERC starting grant in 2015. He is currently serving as an Associate Editor of the IEEE Transactions on Signal Processing and of the SIAM Journal on Matrix Analysis and Applications.

Keynote VI by Prof. Nicolas Dobigeon (scheduled for Friday 11 December, 9:00 am)

Title: Fusion-based change detection for remote sensing images of different resolutions and modalities


Change detection is one of the most challenging issues when analyzing remotely sensed images. It consists of detecting alterations that occurred in a given scene from images acquired at different time instants. Archetypal scenarios for change detection generally compare two images acquired through the same kind of sensor, i.e., with the same modality and the same spatial/spectral resolutions. This talk will address the problem of detecting changes between images of different resolutions or modalities. We will show that this challenging task can be tackled from a fusion perspective.

Biography of Nicolas Dobigeon:

Since 2008, Nicolas Dobigeon has been with Toulouse INP (INP-ENSEEIHT, University of Toulouse) where he is currently a Professor. He conducts his research within the Signal and Communications (SC) group of IRIT and is an associate member of the Apprentissage Optimisation Complexité (AOC) project-team of CIMI. He currently holds an AI Research Chair at the Artificial and Natural Intelligence Toulouse Institute (ANITI) and he is a Junior Member of the Institut Universitaire de France (IUF, 2017-2022). His recent research activities have been focused on statistical signal and image processing, with a particular interest in Bayesian inverse problems and applications to remote sensing, biomedical imaging and microscopy.

Keynote VII by Prof. Fabio Solari (scheduled for Friday 11 December, 11:50 am-12:40 pm)

Title: Natural perception in virtual and augmented reality: a computational model


Current virtual and augmented reality (VR and AR) technologies provide new experiences for users, though they might cause discomfort and unnatural perception during interaction in VR and AR environments. Here, a bio-inspired computational neural model of visual perception for action tasks is considered in order to provide a tool to better design VR and AR systems. In particular, the log-polar mapping and the computation of disparity and optic flow are presented. Then, this computational model is exploited to mimic (and thus describe) human behavioral data. Building on these outcomes, we employ the modeled perception to improve the experience in VR and AR environments by implementing a foveated depth-of-field blur.

Biography of Fabio Solari:

Fabio Solari is Associate Professor of Computer Science at the Department of Informatics, Bioengineering, Robotics, and Systems Engineering of the University of Genoa. His research interests are related to computational models of motion and depth estimation, space-variant visual processing and scene interpretation. Such models are able to replicate relevant aspects of human experimental data. This can help to improve virtual and augmented reality systems in order to provide natural perception and human-computer interaction. He is principal investigator of three international projects: Interreg Alcotra CLIP “E-Santé/Silver Economy”, PROSOL “Jeune” and PROSOL “Senior”. He has participated in five European projects: FP7-ICT, EYESHOTS and SEARISE; FP6-IST-FET, DRIVSCO; FP6-NEST, MCCOOP; FP5-IST-FET, ECOVISION. He has a pending International Patent Application (WO2013088390) on augmented reality, and two Italian Patent Applications on virtual (No. 0001423036) and augmented (No. 0001409382) reality. More information is available at .