Idiap Research Institute and EPFL, Switzerland
IIT Italian Institute of Technology, Italy
Nvidia, Santa Clara, USA
Speaker 1: François Bremond
Talk title: Scene understanding for Activity Monitoring
Abstract: As the population of older adults grows rapidly, improving the quality of life of older persons at home is of great importance. This can be achieved through the development of technologies for monitoring their activities at home. In this context, we propose activity monitoring approaches which aim at analysing older person behaviors by combining heterogeneous sensor data to recognize critical activities at home. In particular, this approach combines data provided by video cameras with data provided by environmental sensors attached to house furnishings.
There are three categories of critical human activities:
- Activities which can be well described or modeled by users
- Activities which can be specified by users and that can be illustrated by positive/negative samples representative of the targeted activities
- Rare activities which are unknown to the users and which can be defined only with respect to frequent activities requiring large datasets
In this talk, we will present several techniques for the detection of people and for the recognition of human activities, using in particular 2D or 3D video cameras. More specifically, there are three categories of algorithms to recognize human activities:
- Recognition engine using hand-crafted ontologies based on a priori knowledge (e.g. rules) predefined by users. This activity recognition engine is easily extendable and allows later integration of additional sensor information when needed [König 2015, Crispim 2016, Crispim 2017].
- Supervised learning methods based on positive/negative samples representative of the targeted activities which have to be specified by users. These methods are usually based on Bag-of-Words and CNNs computing a large variety of spatio-temporal descriptors [Bilinski 2015, Das 2017].
- Unsupervised (fully automated) learning methods based on clustering of frequent activity patterns in large datasets, which can generate/discover new activity models [Negin 2015].
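The second category mentions Bag-of-Words pipelines computing spatio-temporal descriptors. As a rough illustration (not the speakers' implementation; the function name and toy codebook are our own), the encoding step maps a variable-size set of local descriptors to a fixed-length histogram over a visual codebook:

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """Encode a set of local spatio-temporal descriptors as an
    L1-normalized Bag-of-Words histogram over a visual codebook."""
    # Assign each descriptor to its nearest codeword (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Count codeword occurrences and normalize to a distribution.
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Toy example: 5 random 8-dimensional descriptors, a 3-word codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5, 8))
codebook = rng.normal(size=(3, 8))
h = bow_encode(descriptors, codebook)
```

The resulting fixed-length vector `h` is what a classifier (e.g. an SVM) would consume, regardless of how many descriptors the video clip produced.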
We will briefly discuss people detection and tracking, then focus on activity recognition, the latest advances in machine learning, and how these new technologies have changed these topics. In particular, we will address the impact of Deep Learning versus classical machine learning on activity recognition, especially in terms of performance.
We will illustrate the proposed activity monitoring approaches through several home care application datasets.
Short biography: François Brémond is a Research Director at Inria Sophia Antipolis-Méditerranée, where he created the STARS team in 2012. He has pioneered the combination of Artificial Intelligence, Machine Learning and Computer Vision for Video Understanding since 1993, both at Sophia-Antipolis and at USC (University of Southern California), LA. In 1997 he obtained his PhD degree in video understanding and pursued this work at USC on the interpretation of videos taken from UAVs (Unmanned Aerial Vehicles). In 2000, recruited as a researcher at Inria, he modeled human behavior for Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. He is a co-founder of Keeneo, Ekinnox and Neosensys, three companies in intelligent video monitoring and business intelligence. He also co-founded the CoBTek team from Nice University in January 2012 with Prof. Robert from Nice Hospital on the study of behavioral disorders in older adults suffering from dementia. He is author or co-author of more than 200 scientific papers on video understanding published in international journals or conferences. He has (co-)supervised 20 PhD theses.
More information is available at: http://www-sop.inria.fr/members/Francois.Bremond/
Speaker 2: Josiane ZERUBIA
Talk title: Marked Point Processes for Object Detection and Tracking in High Resolution Images: Applications to Remote Sensing and Biology
Abstract: In this talk, we combine methods from probability theory and stochastic geometry to put forward new solutions to the multiple object detection and tracking problem in high-resolution remotely sensed image sequences. First, we present a spatial marked point process model to detect a pre-defined class of objects based on their visual and geometric characteristics. Then, we extend this model to the temporal domain and create a framework based on spatio-temporal marked point process models to jointly detect and track multiple objects in image sequences. We propose the use of simple parametric shapes to describe the appearance of these objects. We build new, dedicated energy-based models consisting of several terms that take into account both the image evidence and physical constraints such as object dynamics, track persistence and mutual exclusion. We construct a suitable optimization scheme that allows us to find strong local minima of the proposed highly non-convex energy.
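To make the structure of such an energy concrete, here is a toy sketch (our own illustration, not the talk's actual model; the quadratic term forms, weights `alpha`/`beta`, and the `min_dist` threshold are all illustrative) combining a data term with dynamics and mutual-exclusion priors:

```python
import numpy as np

def mpp_energy(tracks, data_term, alpha=1.0, beta=1.0, min_dist=2.0):
    """Toy energy for a spatio-temporal marked point process:
    an image data term plus physical-constraint priors.
    `tracks` is a list of tracks; each track is a list of per-frame positions."""
    # Data term: how well each object hypothesis fits the image evidence.
    e_data = sum(data_term(obj) for track in tracks for obj in track)
    # Dynamics prior: penalize large inter-frame displacements within a track.
    e_dyn = 0.0
    for track in tracks:
        for a, b in zip(track, track[1:]):
            e_dyn += float(np.linalg.norm(np.subtract(a, b)) ** 2)
    # Mutual-exclusion prior: penalize pairs of objects from different tracks
    # that come closer than min_dist in the same frame.
    e_excl = 0.0
    for i, t1 in enumerate(tracks):
        for t2 in tracks[i + 1:]:
            for a, b in zip(t1, t2):
                d = float(np.linalg.norm(np.subtract(a, b)))
                if d < min_dist:
                    e_excl += (min_dist - d) ** 2
    return e_data + alpha * e_dyn + beta * e_excl
```

A sampler (e.g. RJMCMC, as discussed below in the abstract) would then search for configurations of tracks minimizing this energy; configurations with overlapping objects or erratic motion score higher and are rejected more often.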
As the simulation of such models comes with a high computational cost, we turn our attention to recent filter implementations for multiple object tracking, which are known to be less computationally expensive. We propose a hybrid sampler combining the Kalman filter with standard Reversible Jump MCMC. High-performance computing techniques are also used to increase the computational efficiency of our method. Our analysis of the proposed framework shows very good detection and tracking performance, at the price of increased model complexity. Tests have been conducted on both high-resolution satellite and microscopy image sequences.
Short biography: Josiane Zerubia has been a permanent research scientist at INRIA since 1989 and director of research since July 1995 (DR 1st class since 2002). She was head of the PASTIS remote sensing laboratory (INRIA Sophia-Antipolis) from mid-1995 to 1997 and of the Ariana research group (INRIA/CNRS/University of Nice), which worked on inverse problems in remote sensing and biological imaging, from 1998 to 2011. From 2012 to 2016, she was head of Ayin research group (INRIA-SAM) dedicated to models of spatio-temporal structure for high resolution image processing with a focus on remote sensing and skincare imaging. She has been professor (PR1) at SUPAERO (ISAE) in Toulouse since 1999.
Before that, she was with the Signal and Image Processing Institute of the University of Southern California (USC) in Los Angeles as a postdoc. She also worked as a researcher at LASSY (University of Nice/CNRS) from 1984 to 1988 and in the Research Laboratory of Hewlett Packard in France and in Palo Alto (CA) from 1982 to 1984. She received the MSc degree from the Department of Electrical Engineering at ENSIEG, Grenoble, France in 1981, and the Doctor of Engineering degree, her PhD and her 'Habilitation' in 1986, 1988, and 1994 respectively, all from the University of Nice Sophia-Antipolis, France.
She is a Fellow of the IEEE (2003- ) and was an IEEE SP Society Distinguished Lecturer (2016-2017). She was a member of the IEEE IMDSP TC (SP Society) from 1997 to 2003, of the IEEE BISP TC (SP Society) from 2004 to 2012 and of the IVMSP TC (SP Society) from 2008 to 2013. She was associate editor of IEEE Trans. on IP from 1998 to 2002, area editor of IEEE Trans. on IP from 2003 to 2006, guest co-editor of a special issue of IEEE Trans. on PAMI in 2003, member of the editorial board of IJCV from 2004 to March 2013 and member-at-large of the Board of Governors of the IEEE SP Society from 2002 to 2004. She was also associate editor of the online resource "Earthzine" (IEEE CEO and GEOSS) from 2006 to mid-2018. She has been a member of the editorial board of the French Society for Photogrammetry and Remote Sensing (SFPT) since 1998 and of Foundations and Trends in Signal Processing since 2007, and a member-at-large of the Board of Governors of the SFPT since September 2014. Finally, she has been a member of the senior editorial board of the IEEE Signal Processing Magazine since September 2018.
She was co-chair of two workshops on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR’01, Sophia Antipolis, France, and EMMCVPR’03, Lisbon, Portugal), co-chair of a workshop on Image Processing and Related Mathematical Fields (IPRM’02, Moscow, Russia), technical program chair of a workshop on Photogrammetry and Remote Sensing for Urban Areas (Marne La Vallée, France, 2003), co-chair of the special sessions at IEEE ICASSP 2006 (Toulouse, France) and IEEE ISBI 2008 (Paris, France), publicity chair of IEEE ICIP 2011 (Brussels, Belgium), tutorial co-chair of IEEE ICIP 2014 (Paris, France), general co-chair of the workshop EarthVision at IEEE CVPR 2015 (Boston, USA) and a member of the organizing committee and plenary talk co-chair of IEEE-EURASIP EUSIPCO 2015 (Nice, France). She also organized and chaired an international workshop on Stochastic Geometry and Big Data at Sophia Antipolis, France, in 2015. She was part of the organizing committees of the workshop EarthVision (co-chair) at IEEE CVPR 2017 (Honolulu, USA) and GRETSI 2017 symposium (Juan les Pins, France). She is scientific advisor and co-organizer of ISPRS 2020 congress (Nice, France) and co-technical chair of IEEE-EURASIP EUSIPCO 2021 (Dublin, Ireland).
Her main research interest is in image processing using probabilistic models. She also works on parameter estimation, statistical learning and optimization techniques.
Speaker 3: Jean-Marc Odobez
Talk title: Robust human sensing using deep learning and user-specific models
Abstract: In the future, more and more robots, and systems in general, will co-exist with humans in dynamic environments. In order for them to understand the social scene, and to monitor or pro-actively interact autonomously and naturally with people or groups of people, it is crucial to endow them with efficient multi-modal sensing and situated perception capabilities. At the first level, these sensing methodologies should be able to analyse the different streams of information (audio, vision, depth, robot or system state, context) to detect and track the perceived state of people along several dimensions: physical state (location, trajectory, pose, speaking status) and communication or social state (engagement, floor control, age, gender), both of which can rely on non-verbal cues related to body language, activity state, etc. This information could then be further used to infer higher-level states such as mood, personality, and state of mind in general.
In this talk, I will present several works on this topic, revolving around sound source and human voice localization, 3D head pose tracking over 360 degrees, and gaze and attention modeling. Our main goals were to investigate the use of computer vision and deep neural network architectures for the different sensing tasks, including relying on synthetic data to obtain a sufficient amount of training data, and building personalized models through online learning or adaptation, potentially taking advantage of priors on social interactions to obtain weak labels for model adaptation.
Short biography: Dr. Jean-Marc Odobez is the Head of the Idiap Perception & Activity Understanding Group and adjunct faculty at the École Polytechnique Fédérale de Lausanne (EPFL), where he is a member of the School of Engineering (STI). He received his Engineer degree from Télécom Bretagne in 1990 and his PhD from Rennes University/INRIA in 1994. His main research interest is the design of multimodal perception systems rooted in signal processing, computer vision, machine learning, and the social sciences, for human activity and behavior recognition and for modeling and understanding human-human or human-robot interactions. Application domains range from surveillance to media content analysis and social robotics.
He is the author or coauthor of more than 150 papers, and has been the principal investigator of more than 13 European and Swiss projects. He holds several patents in computer vision, and is the cofounder of the Klewel SA (www.klewel.ch) and Eyeware SA (eyeware.tech) companies. He is a member of the IEEE, and an associate editor of the IEEE Transactions on Circuits and Systems for Video Technology and Machine Vision and Applications journals.
Speaker 4: Vittorio Murino
Talk title: Facing the dataset bias: domain adaptation and generalization
Abstract: The ability to generalize to unseen data is one of the fundamental, desired properties of a learning system, and is of disruptive importance in actual applications. We recently realized that devising robust models and smart training procedures is not yet sufficient to reach high recognition performance in practical cases. More specifically, a drop in performance is evidenced when a model is trained on one dataset and validated on a different dataset (for the same task, e.g., classification), which is the typical situation one may encounter in real scenarios.
This issue is named dataset shift or bias, and stems from the fact that samples from the training (called source) and testing (target) datasets are drawn from different distributions, causing a degradation of the achievable accuracy.
Domain adaptation techniques have been recently investigated to tackle this problem.
This talk discusses several domain adaptation approaches to improving the generalization properties of a machine learning system, focusing on neural networks for computer vision tasks. After an overview of the problem and of the related state of the art, we present two novel adaptation approaches.
First, we aim at reducing the dataset bias by aligning the probability distributions of the source and target datasets: we map the feature representations into a common embedding and align the related second-order statistics.
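This second-order alignment is in the spirit of correlation alignment (CORAL). A minimal sketch of that idea (our own illustration, not the talk's exact method; the regularization `eps` and toy data are assumptions): whiten the source features, then re-color them with the target covariance, so both domains share second-order statistics.

```python
import numpy as np

def coral_transform(Xs, Xt, eps=1e-3):
    """Align source features Xs to target features Xt by matching
    second-order statistics (CORAL-style correlation alignment)."""
    # Regularized covariances of the source and target features.
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def sqrtm(C, inv=False):
        # Matrix (inverse) square root via eigendecomposition (C is SPD).
        w, V = np.linalg.eigh(C)
        p = -0.5 if inv else 0.5
        return (V * w ** p) @ V.T

    # Whiten with the source covariance, re-color with the target one.
    return Xs @ sqrtm(Cs, inv=True) @ sqrtm(Ct)

# Toy domains: isotropic source, anisotropic target.
rng = np.random.default_rng(1)
Xs = rng.normal(size=(500, 4))
Xt = rng.normal(size=(500, 4)) @ np.diag([1.0, 2.0, 0.5, 1.5])
Xs_aligned = coral_transform(Xs, Xt)
```

After the transform, a classifier trained on `Xs_aligned` sees features whose covariance matches the target domain's, which is the distribution-alignment effect the abstract describes.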
Second, we introduce an algorithm that combines domain invariance and feature augmentation to better adapt models to new domains by relying on adversarial training.
Finally, the general applicability of domain adaptation algorithms is questioned, since they assume a priori access to samples from the target distribution. A novel framework is presented to overcome this limitation, where the goal is to generalize to unseen domains by relying only on data from a single source distribution.
Short biography: Vittorio Murino is a full professor at the University of Verona, Italy, and director of the PAVIS (Pattern Analysis and Computer Vision) department at the Istituto Italiano di Tecnologia. He received the Laurea degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1993 from the University of Genova, Italy. From 1995 to 1998, he was an assistant professor at the Dept. of Mathematics and Computer Science of the University of Udine, Italy, and since 1998 he has worked at the University of Verona. He was chairman of the Department of Computer Science from 2001, the year of its foundation, to 2007, and coordinator of the Ph.D. program in Computer Science at the same university from 1999 to 2003. Prof. Murino has been scientifically responsible for several national and European projects, and an evaluator of EU project proposals related to several frameworks and programs.
Since 2009, he has been working at the Istituto Italiano di Tecnologia in Genova, Italy, leading the PAVIS department. His main research interests include computer vision, pattern recognition and machine learning; more specifically, statistical and probabilistic techniques for image and video processing for (human) behavior analysis and related applications such as video surveillance, biomedical imaging, and bioinformatics. Prof. Murino is co-author of more than 400 papers published in refereed journals and international conferences, a member of the technical committees of major conferences (CVPR, ICCV, ECCV, ICPR, ICIP, etc.), and a guest co-editor of special issues in relevant scientific journals. He is also a member of the editorial boards of the Computer Vision and Image Understanding, Machine Vision & Applications, and Pattern Analysis and Applications journals. Finally, Prof. Murino is a Senior Member of the IEEE and a Fellow of the IAPR.
Speaker 5: Ratnesh Kumar
Talk title: Vehicle re-identification for smart cities: a new baseline using triplet embedding.
Abstract: With the proliferation of surveillance cameras enabling smart and safer cities, there is an ever-increasing need to re-identify vehicles across cameras. Typical challenges arising in smart-city scenarios include variations in viewpoint and illumination, and self-occlusions. In this talk we will discuss an exhaustive evaluation of deep embedding losses applied to vehicle re-identification, and demonstrate that using best practices for learning embeddings outperforms most previous approaches to vehicle re-identification, without requiring any explicit multi-view or geometry information.
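The simplest of the embedding losses the talk evaluates is the triplet loss. A minimal sketch (the margin value and L2 normalization are illustrative choices, not necessarily the talk's exact configuration):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss on L2-normalized embeddings: pull the anchor
    toward an image of the same vehicle (positive) and push it away
    from a different vehicle (negative), up to a margin."""
    def normalize(x):
        return x / np.linalg.norm(x)

    a, p, n = map(normalize, (anchor, positive, negative))
    d_ap = np.linalg.norm(a - p)  # anchor-positive distance
    d_an = np.linalg.norm(a - n)  # anchor-negative distance
    # Loss is zero once the negative is at least `margin` farther away.
    return max(0.0, d_ap - d_an + margin)
```

Minimizing this loss over many sampled triplets trains the embedding network so that nearest-neighbor search in embedding space re-identifies the same vehicle across cameras, without any explicit multi-view or geometry information.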
Short Biography: Ratnesh Kumar has been a Deep Learning Architect at Nvidia, USA, since January 2017. He obtained his PhD from the STARS team at Inria, France, in December 2014. His PhD research focused on long-term video segmentation using optical flow and multiple object tracking. Subsequently he worked as a postdoc at Mitsubishi Electric Research Labs (MERL) in Cambridge, MA, on detecting actions in streaming videos. He also holds a Bachelor's degree in Engineering from Manipal University, India, and a Master of Science from the University of Florida, Gainesville, USA.
At Nvidia since 2017, his focus is on leveraging deep learning on hardware accelerated GPU platforms to solve several problems in video analytics ranging from object detection to re-identification and action detection, with low latency and high data throughput.
He is co-author of 9 scientific publications in conferences and journals and has several patents pending.