Continuous-Time Human Motion Field from Events

This paper tackles the challenge of estimating a continuous-time human motion field from an event stream. Current Human Mesh Recovery (HMR) methods predominantly use frame-based approaches, which are susceptible to aliasing and inaccuracies caused by limited temporal resolution and motion blur. In contrast, we propose a method to predict a continuous-time human motion field directly from events. Our approach employs a recurrent feed-forward neural network to model human motion in the latent space of possible movements. Previous state-of-the-art event-based methods rely on computationally expensive optimizations over a fixed number of poses at high frame rates, which become impractical as temporal resolution increases. Instead, we introduce the first method to replace traditional discrete-time predictions with a continuous human motion field, represented as a time-implicit function that supports parallel pose queries at arbitrary temporal resolutions.
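As a rough illustration (not the paper's actual architecture; the network sizes, latent dimension, and SMPL-style 72-D pose output are assumptions), a time-implicit motion field can serve parallel pose queries at arbitrary timestamps like this:

    import torch
    import torch.nn as nn

    class MotionField(nn.Module):
        """Illustrative time-implicit motion field: maps a latent motion code
        and a query time t to pose parameters, so poses at arbitrary times
        can be evaluated in one parallel forward pass."""
        def __init__(self, latent_dim=128, pose_dim=72):  # 72 = 24 SMPL joints x 3 (axis-angle)
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(latent_dim + 1, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, pose_dim),
            )

        def forward(self, z, t):
            # z: (latent_dim,) motion code from the recurrent encoder; t: (N,) times in [0, 1]
            z = z.expand(t.shape[0], -1)              # share one code across all queries
            return self.mlp(torch.cat([z, t.unsqueeze(-1)], dim=-1))

    field = MotionField()
    z = torch.randn(128)                              # latent motion code
    t = torch.linspace(0, 1, 1000)                    # 1000 pose queries at once
    poses = field(z, t)                               # (1000, 72): any temporal resolution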

Paper



Auto-Zooming Camera for Basketball Games

Basketball games often use only half of the court at a time, requiring the camera to pan and zoom to follow the players' movements. This project develops a system for automated zooming in basketball games. The system benefits producers by automatically tracking players to generate high-quality video streams, and assists coaches by focusing on key players for game analysis.

Solution:
1. Object detection: the system detects key elements (players, referees, and the basketball) using models such as YOLOv11 trained on basketball data or the Grounding DINO model.
2. Tracking: accurate tracking ensures continuity even when object detection fails or misidentifies objects in certain frames.
3. Pose estimation: player pose data is used to enhance tracking accuracy, refine zoom effects, and predict transitions when tracking falters.
4. Zooming: the system constructs a bounding box around the tracked objects and adjusts the zoom to center it while respecting frame constraints; smooth transitions between frames are achieved through interpolation, guided by pose-estimation data and changes in zoom values (see the sketch below).

This integrated approach ensures seamless, high-quality video output tailored for both streaming and analysis.
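A minimal sketch of the zooming step; the function names, padding, and smoothing factor below are illustrative, not the project's actual code:

    import numpy as np

    def union_box(boxes):
        """Tightest box covering all detected boxes, each (x1, y1, x2, y2)."""
        b = np.asarray(boxes)
        return b[:, 0].min(), b[:, 1].min(), b[:, 2].max(), b[:, 3].max()

    def smooth_zoom(prev_crop, target_crop, alpha=0.2):
        """Exponential interpolation between crops so the zoom never jumps;
        alpha (illustrative) trades responsiveness for smoothness."""
        prev, tgt = np.asarray(prev_crop, float), np.asarray(target_crop, float)
        return tuple(prev + alpha * (tgt - prev))

    # Per frame: detections -> padded target crop -> smoothed crop inside the frame.
    detections = [(100, 200, 180, 380), (420, 210, 500, 400)]  # hypothetical player boxes
    x1, y1, x2, y2 = union_box(detections)
    pad = 40                                       # margin so players are not at the edge
    target = (x1 - pad, y1 - pad, x2 + pad, y2 + pad)
    crop = smooth_zoom(prev_crop=(0, 0, 1280, 720), target_crop=target)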

Demo1 Demo2 Demo3



Self-Made Minecraft Sandbox Physics Engine

This project is a procedurally generated, multi-biome 3D world featuring a range of environments and enhanced visual effects. The core landscape includes grassland, desert, mountainous, sandy, and water biomes that blend seamlessly using procedural noise functions like fbm and Perlin noise. Our team used specialized parameters for each biome to create unique elevations, smoother or sharper terrain features, and biome-specific aesthetics. Underground, 3D Perlin noise generates cave systems with moisture-based water levels, adding realism.

For optimized rendering, a terrain chunking system dynamically loads and unloads blocks as players navigate. Multithreading ensures efficient terrain generation and memory management, with separate threads managing block and VBO data. Player physics, including collision detection and adjustments for water and lava traversal, were refined with raycasting to support smoother interactions. The environment responds to the player's movement, with camera tinting when submerged in water or lava, along with slowed movement and reduced jump forces, for an immersive experience.

To manage player resources, a GUI-based inventory and toolbar let players select and track blocks: the toolbar provides quick access, while the inventory system allows players to add and remove block types. Special effects such as Blinn-Phong shading and animated water waves heighten visual appeal, and a day-night cycle creates dynamic lighting across different times of the day, enhancing the world's ambiance and realism. Together, these elements form a dynamic game environment that blends procedural generation with optimized rendering and user-friendly controls.
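A minimal Python sketch of the fbm idea used for terrain heights (the hash-based noise below is a crude stand-in for smooth, gradient-interpolated Perlin noise; all parameter values are illustrative):

    import math

    def value_noise(x, z, seed=0):
        """Cheap hash-based 2D noise in [0, 1); a stand-in for Perlin noise."""
        h = math.sin(x * 127.1 + z * 311.7 + seed * 74.7) * 43758.5453
        return h - math.floor(h)

    def fbm(x, z, octaves=5, lacunarity=2.0, gain=0.5):
        """Fractal Brownian motion: sum noise octaves at increasing frequency
        and decreasing amplitude to build terrain heights."""
        amplitude, frequency, height = 1.0, 1.0, 0.0
        for _ in range(octaves):
            height += amplitude * value_noise(x * frequency, z * frequency)
            amplitude *= gain          # each octave adds finer, fainter detail
            frequency *= lacunarity
        return height

    # Biome-specific parameters: e.g., mountains use more octaves and a higher gain
    grass_height = fbm(10.5, 3.2, octaves=4, gain=0.45)
    mountain_height = fbm(10.5, 3.2, octaves=7, gain=0.55)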

Video1



NeRF Rendering with Ensemble Learning

This project, which I led as the final project for an applied machine learning course, enhances Neural Radiance Fields (NeRF) for generating unseen views of objects from images captured at multiple angles. Inspired by ensemble learning, we introduce two strategies: Random Initialization Ensembles (RIE) and Bagging Ensembles. RIE trains multiple TinyNeRF models with different hyperparameters, then aggregates their outputs using Averaging, Soft Voting, and Hard Voting. The Bagging approach randomly samples data subsets, training a diverse TinyNeRF ensemble from scratch. By adopting Gaussian-distribution sampling over uniform sampling, Bagging ensures denser data near the mean, enabling models to perform better in key spatial regions. Position-conditioned aggregation exploits this further by specializing model predictions in specific spatial zones. Results demonstrate that these ensemble techniques and sampling strategies substantially improve NeRF's accuracy and robustness in 3D reconstruction.
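A sketch of the aggregation step, assuming each TinyNeRF renders the same held-out view; the soft-voting weights are hypothetical and these rules are a guess at the project's exact definitions:

    import numpy as np

    def aggregate(preds, mode="average"):
        """Combine per-pixel RGB renders from an ensemble of TinyNeRF models.
        preds: (n_models, H, W, 3) renders of the same view."""
        preds = np.asarray(preds)
        if mode == "average":                  # plain averaging of raw RGB
            return preds.mean(axis=0)
        if mode == "soft":
            # Soft-voting sketch: weight each model by an (assumed) validation score.
            w = np.array([28.1, 27.4, 29.0])   # hypothetical per-model PSNRs
            w = w / w.sum()
            return np.tensordot(w, preds, axes=1)
        raise ValueError(mode)

    renders = np.random.rand(3, 100, 100, 3)   # 3 models, 100x100 RGB renders
    fused = aggregate(renders, mode="soft")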

Video



Obstacle Avoidance in Dense Environments using MPC

This project explores Model Predictive Contouring Control (MPCC) to enhance local planning for mobile robots navigating dynamic and unstructured environments. Recognizing the challenges robots face in non-convex, crowded spaces, such as the risk of collisions or the "freezing robot" problem, the project implemented MPCC with specialized constraints for static and dynamic obstacle avoidance. Static obstacles were approximated with convex rectangular regions, while dynamic obstacles used Euclidean distance metrics within a filtered "obstacle window" that considers only imminent threats. Testing showed that MPCC outperformed the Dynamic Window Approach (DWA) in safety, particularly in a challenging environment with 10 dynamic obstacles. Notably, filtering obstacles through the obstacle window led to safer navigation, showcasing MPCC's robustness and adaptability in high-risk settings.
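A minimal sketch of the obstacle-window filter; the radius and nearest-k cap are illustrative values, not the project's tuned parameters:

    import numpy as np

    def obstacle_window(robot_xy, obstacles_xy, radius=3.0, k_nearest=4):
        """Keep only imminent dynamic obstacles: those within `radius` meters,
        capped at the `k_nearest` closest. The MPCC distance constraints are
        then built from this filtered set rather than every obstacle."""
        d = np.linalg.norm(np.asarray(obstacles_xy) - np.asarray(robot_xy), axis=1)
        near = np.argsort(d)[:k_nearest]
        return [int(i) for i in near if d[i] <= radius]

    obstacles = [(1.0, 2.0), (8.0, -3.0), (0.5, -0.5), (2.5, 2.5), (9.0, 9.0)]
    active = obstacle_window((0.0, 0.0), obstacles)  # indices fed to the MPCC constraints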

Poster GIF 1 GIF 2



Vision Assistive System for Pedestrian Prediction and Special Vehicle Detection

This project, funded by MOST, aimed to develop a powerful self-driving car assistance system. We proposed an integrated assistive sub-system that fuses vision, LiDAR, and sound to predict pedestrians' intent, helping the vehicle avoid safety hazards, and to recognize the direction of special vehicles such as ambulances from their sirens. Built on the Robot Operating System (ROS), our real-time data pipelines run at high speed and are suitable for deployment in real-time applications.
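A minimal sketch of how such a pipeline can synchronize camera and LiDAR streams in ROS; topic names are hypothetical, and the sound channel and prediction models are omitted:

    #!/usr/bin/env python
    import rospy
    import message_filters
    from sensor_msgs.msg import Image, PointCloud2

    def fused_callback(image_msg, cloud_msg):
        """Runs once per approximately-synchronized camera/LiDAR pair; this is
        where pedestrian-intent prediction would consume both modalities."""
        rospy.loginfo("fused pair: img %.3f / cloud %.3f",
                      image_msg.header.stamp.to_sec(),
                      cloud_msg.header.stamp.to_sec())

    rospy.init_node("fusion_pipeline")
    img_sub = message_filters.Subscriber("/camera/image_raw", Image)    # hypothetical topic
    pcl_sub = message_filters.Subscriber("/lidar/points", PointCloud2)  # hypothetical topic
    sync = message_filters.ApproximateTimeSynchronizer([img_sub, pcl_sub],
                                                       queue_size=10, slop=0.05)
    sync.registerCallback(fused_callback)
    rospy.spin()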

Paper Video1 Video2



Assistive Navigation using DRL, UWB/Voice Beacons and Semantic Feedbacks

Facilitating navigation in pedestrian environments is critical for enabling people who are blind and visually impaired (BVI) to achieve independent mobility. In this study, we designed a deep-reinforcement-learning-based guiding robot that uses ultra-wideband (UWB) beacons to navigate routes with designated waypoints. Typically, a simultaneous localization and mapping (SLAM) framework is used to estimate the robot pose and navigational goal; however, SLAM frameworks are vulnerable in certain dynamic environments, where state estimation can be corrupted by dynamic obstacles and visual trails can be occluded by pedestrians. The proposed navigation method is instead a learning approach based on state-of-the-art deep reinforcement learning that can effectively avoid obstacles, and combined with UWB beacons it is suitable for environments with dynamic pedestrians.

We also designed a harness device with an audio interface that enables BVI users to interact with the guiding robot through intuitive feedback. The UWB beacons were likewise fitted with an audio interface to convey environmental information; the on-harness and on-beacon verbal feedback provides points-of-interest (POI) and turn-by-turn information to BVI users. BVI users were recruited to conduct navigation tasks in different scenarios, including a route in a simulated ward representing daily activities. The proposed system successfully navigated through environments with dynamic pedestrians in which systems based on existing SLAM algorithms failed.

Publication: C.-L. Lu, Z.-Y. Liu, J.-T. Huang, C.-I Huang, B.-H. Wang, Y. Chen, N.-H. Wu, H.-C. Wang, L. Giarré, P.-Y. Kuo, "Assistive Navigation using Deep Reinforcement Learning Guiding Robot with UWB/Voice Beacons and Semantic Feedbacks for Blind and Visually Impaired People," in Frontiers in Robotics and AI, 2021.
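As an illustration only (not the paper's actual state definition), a policy observation might combine a downsampled range scan with the relative distance and bearing to the next UWB beacon waypoint:

    import numpy as np

    def build_observation(scan, robot_pose, beacon_xy):
        """Illustrative DRL policy input: downsampled ranges plus the distance
        and heading error to the next UWB beacon waypoint."""
        x, y, yaw = robot_pose
        dx, dy = beacon_xy[0] - x, beacon_xy[1] - y
        dist = np.hypot(dx, dy)
        heading = np.arctan2(dy, dx) - yaw                       # bearing error
        heading = np.arctan2(np.sin(heading), np.cos(heading))   # wrap to [-pi, pi]
        ranges = np.asarray(scan)[::4]                           # downsample the scan
        return np.concatenate([ranges, [dist, heading]])

    obs = build_observation(scan=np.full(360, 4.0),
                            robot_pose=(0.0, 0.0, 0.0),
                            beacon_xy=(3.0, 1.5))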

Paper Website



DARPA SubT Urban Challenge

The DARPA Subterranean (SubT) Challenge aims to develop innovative technologies to augment operations underground, exploring new approaches to rapidly map, navigate, search, and exploit complex underground environments. I participated in the Urban Circuit with my team and our robots, held in a decommissioned nuclear power plant.

Publication: C.-L. Lu*, J.-T. Huang*, C.-I Huang, Z.-Y. Liu, C.-C. Hsu, Y.-Y. Huang, S.-C. Huang, P.-K. Chang, Z. L. Ewe, P.-J. Huang, P.-L. Li, B.-H. Wang, L.-S. Yim, S.-W. Huang, M.-S. Bai, H.-C. Wang, "A Heterogeneous Unmanned Ground Vehicle and Blimp Robot Team for Search and Rescue using Data-driven Autonomy and Communication-aware Navigation," in Field Robotics, Special Issue: Advancements and Lessons Learned during Phases I & II of the DARPA Subterranean Challenge, 2021.

Paper Video



Deep Reinforcement Learning in Simulation

  • Implemented deep reinforcement learning algorithms such as RDPG and D4PG in PyTorch; the DRL agents were trained by interacting with a simulation environment via the Gazebo simulator (see the training-loop sketch after this list).
  • Used the X1 robot in simulation and a Husky for real-robot experiments.
  • Obtained navigation skills by training in a virtual SubT cave environment.
  • Designed a narrow-gate scenario in Gazebo to urge the UGV to pass through narrow passages, and evaluated its performance via a sim-to-real approach.
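A minimal DDPG-style interaction loop as a stand-in for the RDPG/D4PG training above; the env wrapper around Gazebo resets and steps (via ROS) is assumed, not shown:

    import torch

    def train(env, actor, episodes=1000, noise_std=0.2):
        """Collect experience by rolling out the actor in the Gazebo-backed env;
        the replay buffer feeds the (omitted) critic/actor updates."""
        buffer = []
        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                with torch.no_grad():
                    action = actor(torch.as_tensor(obs, dtype=torch.float32))
                action = action + noise_std * torch.randn_like(action)  # exploration noise
                next_obs, reward, done = env.step(action.numpy())       # assumed env API
                buffer.append((obs, action.numpy(), reward, next_obs, done))
                obs = next_obs
            # ...sample minibatches from `buffer` and update critic/actor here...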


  • PyRobot Pick-and-Place Mission in Simulation (2021/07)

  • Identify the object and pick it up, then move to the target location and place the item.
  • Task 1: Object detection with Mask R-CNN
  • Task 2: Pose estimation and picking with DOPE
  • Task 3: Moving to the destination with A* (see the sketch below)
  • Task 4: Placing the item in the box
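A generic grid A* sketch for Task 3 (Manhattan heuristic, 4-connected grid; this is the textbook algorithm, not PyRobot's own planner API):

    import heapq, itertools

    def astar(grid, start, goal):
        """Grid A*: grid[y][x] == 1 marks an obstacle; start/goal are (x, y)."""
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
        tie = itertools.count()                  # unique tie-breaker for the heap
        open_set = [(h(start), 0, next(tie), start, None)]
        came_from, g_cost = {}, {start: 0}
        while open_set:
            _, g, _, cur, parent = heapq.heappop(open_set)
            if cur in came_from:                 # already expanded at a lower cost
                continue
            came_from[cur] = parent
            if cur == goal:                      # walk parent links back to the start
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = came_from[cur]
                return path[::-1]
            x, y = cur
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                nx, ny = nxt
                if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and not grid[ny][nx]:
                    ng = g + 1
                    if ng < g_cost.get(nxt, float("inf")):
                        g_cost[nxt] = ng
                        heapq.heappush(open_set, (ng + h(nxt), ng, next(tie), nxt, cur))
        return None                              # goal unreachable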

  • Report