Haoming Wang

PhD Student · University of Pittsburgh

Hi! I'm Haoming Wang, a PhD student in the Department of Electrical and Computer Engineering at the University of Pittsburgh, working in the Intelligent System Lab and advised by Prof. Wei Gao. I earned my bachelor's degree in Automation from the Department of Control Science and Engineering at Zhejiang University, with honors from the Chu Kochen Honors College.

My research centers on spatial intelligence, multimodal LLMs, LLM reasoning, and generative AI on mobile devices. I am also interested in LLM explainability and robotics.

🎯 I am on track to graduate in 2027 and am actively seeking research internship opportunities.
📍 I'll be at MobiSys 26 (Cambridge, Jun 21-25) — let's chat!

news

Jun 17, 2026 Paper accepted: [ECCV26] Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
Jun 03, 2026 Attended CVPR 2026 in Denver, Jun 3–7.
May 18, 2026 Attended MLSys 2026 in Seattle, May 18–22.
May 10, 2026 I will serve as a TPC member of the ACM S3 2026 Workshop. [link]
May 07, 2026 New work: Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models
Apr 12, 2026 Honored as ECE Research Assistant of the Year by the University of Pittsburgh Department of Electrical and Computer Engineering. [link]
Feb 25, 2026 Paper accepted: [CVPR26 oral] InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
Feb 10, 2026 New work: MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation

recent work

  1. preprint
    vlm_latent_shaping
    Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models
    Haoming Wang, Wei Gao
    2026

    We show that current VLMs hold a latent topological map of 3D scenes, but it is overshadowed by non-geometric semantics like color and shape. Isolating this subspace via cross-scene linear feature extraction reveals its correspondence to the Laplacian eigenmaps of the scene’s 3D Gaussian-kernel graph, motivating a Dirichlet-energy-based latent regularizer that beats standard SFT by up to 12.1% on real-world spatial benchmarks.

  2. preprint
    mosaicthinker
    MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation
    Haoming Wang, Qiyao Xue, Weichen Liu, Wei Gao
    2026

    MosaicThinker is an inference-time method for on-device embodied AI that fuses spatial cues from multiple video frames into a unified semantic map, then prompts a small VLM to reason over it. The approach improves cross-frame spatial reasoning accuracy on resource-constrained devices across diverse manipulation and planning tasks.

selected publications

  1. ECCV26
    latentstate
    [ECCV26] Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective
    Qiyao Xue, Haoming Wang (co-author), Weichen Liu, Shiqi Wang, Yuyang Wu, Wei Gao
    Proceedings of the European Conference on Computer Vision (ECCV) 2026, 2026

    This work examines how multi-view visual spatial reasoning can be explained through reasoning-path tracking and latent-state analysis, bridging cognitive science perspectives with modern multimodal models.

  2. CVPR oral
    infinibench
    [CVPR26 oral] InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
    Haoming Wang, Qiyao Xue, Wei Gao
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (accepted), 2026

    Modern VLMs require robust spatial-reasoning evaluation, but existing benchmarks lack diversity, scalability, and fine-grained control over scene complexity. To address this, we introduce InfiniBench, a fully automated and customizable benchmark generator capable of producing an unlimited variety of photo-realistic 3D scenes and videos from natural-language descriptions. Through its agentic constraint-refinement framework, cluster-based layout optimizer, and task-aware camera trajectory design, InfiniBench enables precise analysis of VLM failure modes and outperforms prior 3D generation methods, while supporting diverse spatial-reasoning tasks such as measurement, perspective-taking, and spatiotemporal tracking.

  3. MobiSys
    xpert
    [MobiSys25] Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
    Haoming Wang, Boyuan Yang, Xiangyu Yin, Wei Gao
    Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services (Acceptance Ratio: 18.0%), 2025

    Personalizing Large Language Models (LLMs) is crucial for meeting individual user needs on mobile devices. However, on-device personalization is constrained by limited computational resources and scarce personal data. We propose XPerT, a technique that selects an already personalized LLM based on the explainability of its prior fine-tuning, then fine-tunes that model with the user's data instead of starting from scratch.

  4. MobiCom
    feddc
    [MobiCom25] When Device Delays Meet Data Heterogeneity in Federated AIoT Applications
    Haoming Wang, Wei Gao
    Proceedings of the 31st ACM International Conference on Mobile Computing and Networking (Acceptance Ratio: 17.1%), 2025

    Federated AIoT leverages distributed IoT data to train AI models, but heterogeneous devices introduce data heterogeneity and varying staleness, degrading model performance and slowing training. Existing FL frameworks treat device delays as independent of data heterogeneity, which is unrealistic. We propose FedDC, a technique that mitigates delay impacts when these factors are correlated. FedDC uses gradient inversion to infer local data distributions and compensate for delay-induced update bias.

  5. AAAI
    aaai_fl
    [AAAI25] Tackling Intertwined Data and Device Heterogeneities in Federated Learning with Unlimited Staleness
    Haoming Wang, Wei Gao
    Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (Acceptance Ratio: 23.4%), 2025

    Federated Learning (FL) is challenged by intertwined data and device heterogeneities—differences in clients’ local data distributions and model update staleness. Traditional FL methods treat these separately, which is unrealistic and often ineffective. We propose a novel FL framework that converts stale model updates into unstale ones, addressing these intertwined heterogeneities efficiently. Our method estimates clients’ local data distributions from their stale updates to compute unstale updates, without requiring auxiliary datasets, fully trained local models, or extra client-side computation or communication.

preprints

  1. preprint
    finexl
    Deciphering Personalization: Towards Fine-Grained Explainability in Natural Language for Personalized Image Generation Models
    Haoming Wang, Wei Gao
    preprint, 2025

    Personalized image generation models better serve diverse user needs but often lack explainability regarding how personalization occurs. While visual cues exist, they are hard for users to interpret, and current natural language explanations are too coarse-grained to capture multiple aspects or degrees of personalization. We propose FineXL, a technique for Fine-grained eXplainability in natural Language, which generates detailed textual descriptions and quantitative scores for each personalization aspect.

  2. preprint
    freezeasguard
    FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing
    Kai Huang, Haoming Wang (co-author), Wei Gao
    preprint, 2024

    Text-to-image diffusion models can be fine-tuned for personalized domains, but this adaptability also enables misuse, such as forging public figures, replicating copyrighted artworks, or producing explicit content. Existing detection or unlearning methods fail to prevent illegal adaptations. We introduce FreezeAsGuard, a technique that irreversibly mitigates such misuse by selectively freezing tensors in pre-trained diffusion models that are critical to illegal adaptations, while preserving legal fine-tuning capabilities.

other

  1. note
    spatialreasoningsurvey
    Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods
    Weichen Liu, Qiyao Xue, Haoming Wang, Xiangyu Yin, Boyuan Yang, Wei Gao
    technical report, 2024

    survey

experience

Service
  • Reviewer, ACM Transactions on Internet of Things (TIOT)
  • Reviewer, NeurIPS 2026
  • Reviewer, Pacific Graphics 2026
  • TPC Member, ACM S3 2026 Workshop
Teaching Assistant, Department of Electrical and Computer Engineering, University of Pittsburgh
  • ECE 1175 — Embedded System Design (Fall 2024)
  • ECE 1195 — Advanced Digital Design (Spring 2025)
  • ECE 1396 — Introduction to Machine Learning (Fall 2025)
  • ECE 2570 — Robotic Control (Spring 2026)
