About Me

I am a CS Ph.D. student at The Ohio State University (Advisor: Wei-Lun (Harry) Chao). I obtained my MASc. at University of Toronto (Advisor: Scott Sanner), working on Continual Learning collaborating with LG AI Research. I completed BASc. in Engineering Science at University of Toronto.

My research interests lie in

Efficient, Robust and Interpretable Foundation Model Adaptation: NeurIPS’25, CVPR’25 (Highlight), CVPR’25, CVPR’25, NeurIPS’24, CVPR’23, NeurIPS’23
Vision Foundation Models & Multimodal Models: NeurIPS’25 (Spotlight), NAACL’25, CVPR’25-W, NeurIPS’24, NeurIPS’23-W
Continual Learning: ICML’25-W, AAAI’21(Oral), CVPR’21-W, Neurocomputing, CVPR’20-Competition🏅, AIJ

I am actively looking for a research internship! If you are aware of any opportunities or have any recommendations, I would greatly appreciate your insights and referrals. Please feel free to reach out!

News

Sept 2025 — Two NeurIPS 2025 Acceptance
Revisiting semi-supervised learning in the era of foundation models. We provide a comprehensive emperical study of SSL with VFM and propose a simple but strong baseline leveraging the diverse predictions from different parameter-efficient fine-tuning (PEFT) methods.
Bioclip 2: Emergent properties from scaling hierarchical contrastive learning. We curate TreeOfLife-200M, with 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We train BIOCLIP 2 on TREEOFLIFE-200M and found several emergent properties.
July 2025 — ICML 2025 Workshop Acceptance
An Empirical Exploration of Continual Unlearning for Image Generation. We present the first systematic study of continual unlearning in text-to-image generation.
June 2025 — New preprint about Atomic Visual Ability in Vision Foundation Models
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models. We introduce AVA-BENCH, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs)—foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching train/test distributions within each, AVA-BENCH pinpoints exactly where a VFM excels or falters.
May 2025 — Research intern at Amazon Lab126
I will join Amazon Lab126 as a research intern working on multimodal models.
Feb 2025 — Three CVPR 2025 Main Conference & One Workshop Acceptance
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition. was selected as Highlight (2.98%). Instead of chasing the leaderboard, we offer a complementary perspective of PEFT by conducting a unifying empirical study. We provide (1) a systematic framework for reproducible evaluations; (2) empirical recommendations on how to use different PEFT in various scenarios (low-shots, many-shots, varying domain gaps, and robustness to distribution shifts); (3) insightful directions for future research.
Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis. . We present PROMPT-CAM, an easily implementable, trainable, and reproducible interpretable method that leverages the representations of pre-trained ViTs to identify and localize traits for fine-grained analysis.
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation. . We propose Finer-CAM, a method that explicitly compares the target class with similar classes to reveal the most discriminative feature channels.
Mitigating Video Content Misalignment on Large Vision Model with Time-Series Data Alignment was accepted to Efficient Large Vision Models Workshop.
Jan 2025 — NAACL 2025 Acceptance
Attention Entropy-Guided Dynamic Cache Allocation for Efficient Multimodal Long-Context Inference was accepted to NAACL 2025. We present a novel approach specifically designed for the complexities of multimodal settings, dynamically allocating KV cache sizes based on attention entropy to better adapt to multimodal interactions.
Dec 2024 — ICASSP 2025 Acceptance
Attention-Driven Causal Discovery: From Transformer Matrices to Granger Causal Graphs for Non-Stationary Time-series Data was accepted to ICASSP 2025. We present a two-stage approach for causal discovery in non-stationary multivariate time series data.
Sept 2024 — Two NeurIPS 2024 Acceptances
Fine-Tuning is Fine, if Calibrated. was accepted to NeurIPS 2024. Fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the performance in the other classes it had previously learned. We proposed simple post-processing calibration to bring back the pre-trained model’s capability.
MLLM-COMPBENCH: A Comparative Reasoning Benchmark for Multimodal LLMs. was accepted to NeurIPS 2024. We introduce MLLM-COMPBENCH to evaluate the comparative reasoning capability of MLLMs, which contains 40K image pairs with visually oriented questions covering 8 relativities: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality.
May 2024 — Research intern at Bosch
I will join Bosch as a research intern working on time series + vision + language with foundation models.
Oct 2023 — NeurIPS 2023 Outstanding Reviewer
I am thrilled to be selected as an Outstanding Reviewer for the NeurIPS 2023 conference.
Oct 2023 — NeurIPS 2023 Acceptance
Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data. was accepted to NeurIPS 2023. We address a learning problem involving the adaptation of a pre-trained source model, capable of classifying a wide range of objects to a target domain using data that covers only a partial label space.
Oct 2023 — NeurIPS 2023 Workshop Acceptance
Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation was accepted to NeurIPS2023 I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models Workshoip. We leverage the Segment Anything Model (SAM) to enhanced pseudo labels for Weakly Supervised Semantic Segmentation (WSSS).
Feb 2023 — CVPR 2023 Acceptance
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning was accepted to CVPR 2023. We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers.

Previous News

Click here to see old news

Oct 2022 — IPM Acceptance
Unintended Bias in Language Model-driven Conversational Recommendation was accepted to Information Processing and Management (IPM)! We investigate how unintended bias — i.e., language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations — manifests in significantly shifted price and category distributions of restaurant recommendations
Sept 2022 — ECCV 2022 Workshop Acceptance
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation was to the Learning from Limited and Imperfect Data (L2ID) Workshop at ECCV 2022! We propose TransCAM, a Conformer-based solution to WSSS that explicitly leverages the attention weights from the transformer branch of the Conformer to refine the CAM generated from the CNN branch. TransCAM is motivated by our observation that attention weights from shallow transformer blocks are able to capture low-level spatial feature similarities while attention weights from deep transformer blocks capture high-level semantic context.
April 2022 — SIGIR 2022 Acceptance
Mitigating the Filter Bubble while Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems was accepted to ACM SIGIR 2022! In this paper, we propose a novel methodology that trains Concept Activation Vectors (CAVs) for targeted topical dimensions (e.g., political polarization). We then modulate the latent embeddings of user preferences in a state-of-the-art VAE-based recommender system to diversify along the targeted dimension while preserving topical relevance across orthogonal dimensions.
Jan 2022 — WWW 2022 Acceptance
Distributional Contrastive Embedding for Clarification-based Conversational Critiquing was accepted to International World Wide Web Conference (WWW) 2022! In this paper, we propose a novel clarification-based conversational critiquing framework that allows the system to clarify user preferences by using distributional embeddings that can capture the specificity and generality of concepts through distributional coverage.
Nov 2021 — Artificial Intelligence Journal Acceptance
CVPR 2020 continual learning in computer vision competition: Approaches, results, current challenges and future directions was accepted to Artificial Intelligence! In this paper, we report the main results of the CVPR 2020 Continual Learning in Computer Vision competition and summarize the winning approaches, current challenges and future research directions.
Oct 2021 — Neurocomputing Journal Acceptance
Online Continual Learning in Image Classification: An Empirical Survey was accepted to Neurocomputing! We empirically scrutinize recently proposed methods and tricks in Online Continual Learning to study their relative advantages and the settings where they work best. We also discuss recent trends and emerging directions in Online Continual Learning.
April 2021 — CVPR 2021 Workshop Acceptance
Our paper Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning was accepted to the Workshop on Continual Learning in Computer Vision at CVPR 2021! We leverage supervised contrastive learning and nearest class mean classifier to obtain new state-of-the-art performance for online continual learning.
Dec 2020 — AAAI 2021 Acceptance
Our paper Online Class-Incremental Continual Learning with Adversarial Shapley Value was accepted to AAAI 2021! We contribute a novel Adversarial Shapley value scoring method that scores memory data samples according to their ability to preserve latent decision boundaries for previously observed classes (to maintain learning stability and avoid forgetting) while interfering with latent decision boundaries of current classes being learned (to encourage plasticity and optimal learning of new class boundaries).
Nov 2020 — ICDM 2020 Workshop Acceptance
Our paper Attentive Autoencoders for Multifaceted Preference Learning in One-class Collaborative Filtering (with Ga Wu, Kai Luo, Scott Sanner) was accepted to the Workshop on Advanced Neural Algorithms and Theories for Recommender Systems (NeuRec) at ICDM 2020!
June 2020 — CVPR 2020 CLVision Challenge Champion
I won 1st place in the CVPR 2020 CLVision Challenge with my entry Batch-level Experience Replay with Review for Continual Learning! Welcome to check our winning solution [code] [paper] and the summary of the challenge.

Contact

Email: mai.145@osu.edu

Zheda Mai

News

Previous News

Contact