LDTrack: Dynamic People Tracking by Service Robots using Diffusion Models Angus Fung,
Beno Benhabib,
Goldie Nejat International Journal of Computer Vision, 2024
Paper
We present a novel people tracking architecture for mobile service robots using conditional latent diffusion models, which we name Latent Diffusion Track (LDTrack), to solve the robotic problem of tracking multiple dynamic people under intraclass variations.
We introduce X-Nav, a novel framework for cross-embodiment navigation where a single unified policy can be deployed across various embodiments for both wheeled and quadrupedal robots.
We present MLLM-Search, a novel multimodal language model approach to address the robotic person search problem under event-driven scenarios with incomplete or unavailable user schedules. Our method introduces zero-shot person search using language models for spatial reasoning, a novel visual prompting method generating topological graphs with semantic labels, and an MLLM-based search planner combining region and waypoint planning through our spatial chain-of-thought (SCoT) prompting method.
We introduce a novel Hand-drawn Map Navigation (HAMNav) architecture that leverages pre-trained vision language models for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies.
Find Everything: A General Vision Language Model Approach to Multi-Object Search Angus Fung,
Daniel Choi,
Haitong Wang,
Aaron Hao Tan CoRL Workshop: Language and Robot Learning, 2024
Under Review at ICRA, 2025
Project Page /
Paper
We present Finder, a novel approach to the multi-object search problem that leverages vision language models (VLMs) to efficiently locate multiple objects in diverse unknown environments. Our method combines semantic mapping with spatio-probabilistic reasoning and adaptive planning, improving object recognition and scene understanding through VLMs.
Robots Autonomously Detecting People: A Multimodal Deep Contrastive Learning Method Robust to Intraclass Variations Angus Fung,
Beno Benhabib,
Goldie Nejat IEEE Robotics and Automation Letters + IROS, 2023
Paper /
Talk /
Abstract /
Poster
We present a novel multimodal person detection architecture to address the mobile robot problem of person detection under intraclass variations (e.g. partial occlusion, varying illumination, pose deformation) by introducing our Temporal Invariant Multimodal Contrastive Learning (TimCLR) method.
Robots Understanding Contextual Information in Human-Centered Environments using Weakly Supervised Mask Data Distillation Daniel Dworakowski,
Angus Fung,
Goldie Nejat International Journal of Computer Vision (IJCV), 2022
Paper
We present the novel Weakly Supervised Mask Data Distillation architecture for autonomously generating pseudo segmentation labels.
A Multi-Robot Person Search System for Finding Multiple Dynamic Users in Human-Centered Environments Sharaf C Mohamed,
Angus Fung,
Goldie Nejat IEEE Transactions on Cybernetics, 2022
Paper /
Video
We present a novel multi-robot person search system to generate search plans for multi-robot teams to find multiple dynamic users before a deadline.
AC/DCC : Accurate Calibration of Dynamic Camera Clusters for Visual SLAM Jason Rebello,
Angus Fung,
Steven Waslander IEEE International Conference on Robotics and Automation (ICRA), 2020
Paper
We present a method to calibrate the time-varying extrinsic transformation between any number of cameras, which achieves measurement excitation over the entire configuration space of the mechanism, resulting in a more accurate calibration.
Using Deep Learning to Find Victims in Unknown Cluttered Urban Search and Rescue Environments Angus Fung,
Beno Benhabib,
Goldie Nejat Springer Nature, 2020
Paper
We investigate the first use of deep networks for victim identification in Urban Search and Rescue (USAR), for cases of partial occlusion and varying illumination, on an RGB-D dataset obtained by a mobile robot navigating cluttered USAR-like environments.
An all-in-one service built into iMessage aimed at lowering the barrier of entry for LLMs and Generative AI.
Launched multi-modal conversations with a proprietary model 4 months before GPT-4V.
Gained significant traction with thousands of monthly active users.
Recognition
2024: Doctoral Completion Award ($4k)
2024: LocalHost Fellowship ($3k)
2024: Microsoft Startup Hub Program ($150k)
2023: Ontario Graduate Scholarship - University of Toronto ($15k)
2022: Rimrott Memorial Graduate Scholarship - University of Toronto ($4k)
2021: RO-MAN Roboethics Competition, McGill University - 1st Place ($1k)
2021: University of Toronto MIE Fellowship ($14k)
2020: Queen Elizabeth II Graduate Scholarship - University of Toronto ($15k)
2020: University of Toronto MIE Fellowship ($14k)
2019: University of Toronto MIE Fellowship ($14k)
2019: Healthcare Robotics NSERC Fellowship ($10k)
2014-2018: Dean's Honour List
2014: Delta Tau Delta Award ($3k)
2014: University of Toronto Scholars (Academic Excellence) ($7.5k)
2014: University of Toronto Scholar ($5k)
2013: ARCT Diploma - Piano Performance
2013: ARCT Diploma - Organ Performance
Teaching
2024F: ROB501: Computer Vision for Robotics, TA, University of Toronto
2024W: MIE443: Mechatronics Systems: Design & Integration, Head Tutorial TA, University of Toronto
2023F: MIE443: Mechatronics Systems: Design & Integration, Head Tutorial TA, University of Toronto
2022F: ROB501: Computer Vision for Robotics, TA, University of Toronto
2022W: MIE443: Mechatronics Systems: Design & Integration, Head Tutorial TA, University of Toronto
2021W: MIE443: Mechatronics Systems: Design & Integration, Head Tutorial TA, University of Toronto
2020W: MIE443: Mechatronics Systems: Design & Integration, Head Tutorial TA, University of Toronto