AvatarMix

Identity-Preserving Cross-Avatar Composition for Outfit Personalization

University of Tsukuba
CVPR 2026 Findings
3D Gaussian Splatting · Mesh-based avatars · Outfit personalization · Cross-avatar composition · Diffusion refinement
TL;DR. For outfit personalization, AvatarMix combines the User’s identity with the Model’s outfit while preserving high-fidelity 3DGS appearance. Compose first, repair locally.
AvatarMix teaser: input 3DGS assets and free-viewpoint outfit personalization results.

AvatarMix performs free-viewpoint outfit personalization by composing a user’s identity cues (head–neck, body shape and scale, and skin tone) with a model’s clothed outfit in a 3DGS representation. The examples show variations in height and body proportions, cross-ethnicity skin tones, and diverse garments, while preserving fine details such as printed text and skirt folds under composition and body-shape retargeting. Examples are rendered in metric scale without post-hoc rescaling.

Abstract

Existing 3D avatar outfit transfer methods face distinct challenges: approaches that lift 2D edits to 3D often suffer from outfit or identity quality degradation, while those that separately model body and clothing layers are prone to intersection artifacts. We introduce AvatarMix, a compositional paradigm that bypasses these issues by directly composing the head and body from two high-fidelity Gaussian avatars. While this paradigm inherently preserves outfit quality and avoids intersections, it introduces challenges in creating a seamless join and maintaining appearance fidelity after body reshaping.

To this end, we propose a two-tier refinement strategy: SeamFix, a localized diffusion module that refines hair and neck to ensure an artifact-free join, and an optional full-body refinement, FullbodyFix, that restores garment appearance when retargeting degrades the clothed body. Both operate on renders from an already 3D-consistent Gaussian avatar, which limits multi-view artifacts compared to 2D-to-3D lifting. To preserve the user's body identity, our mesh-based Gaussian representation enables the adaptation of a robust mesh retargeting technique, precisely reshaping the clothed body to the user's physique and robustly handling diverse body shapes. Extensive experiments demonstrate that our method achieves state-of-the-art results in outfit fidelity and identity preservation, providing a new perspective for realistic 3D outfit personalization.

Task & Motivation

Reference numbers follow the paper/poster.

Task

For outfit personalization, combine the User’s identity with the Model’s outfit while preserving high-fidelity 3DGS appearance.

Why is this hard?

  • 2D-to-3D try-on can degrade garment texture and view consistency [1].
  • Layered garment & body transfer can introduce intersections [2].

Failure example of 2D-to-3D try-on [1]: red dotted boxes show view-inconsistent garment details.

Motivation & Key Idea

  • To avoid view inconsistency and intersection artifacts, compose the head and body of reusable avatars instead of regenerating outfits.
  • To preserve identity beyond the face, match the clothed body to the User’s physique and skin tone.
  • To reduce hallucinated changes in high-quality regions, refine only seams and reshaping artifacts.

Contributions

  • Cross-avatar composition for 3DGS outfit personalization.
  • GSReshape for body-shape-aware clothed-body retargeting.
  • SeamFix + FullbodyFix for localized and full-body artifact repair.

Method Overview

Overview of the AvatarMix pipeline.

Given multi-view images of a User and a Model, we first employ Mesh-Based Avatar Reconstruction with semantic segmentation. We then perform Cross-Avatar Geometric Composition by aligning the user’s head and neck to the Model’s pose and reshaping the Model’s clothed body via GSReshape so that the body geometry matches the User’s physique. Finally, our two-tier diffusion refinement operates on rendered views, followed by 3D Gaussian fine-tuning, to produce the final identity-transfer result.

Mesh-based Avatar Reconstruction

Reconstruct mesh-based 3DGS avatars from multi-view images [3,4], with semantic segmentation and SMPL fitting.

Cross-Avatar Composition

Align the User’s head & neck to the Model’s pose, then combine them with the Model’s clothed body. Transfer the User’s skin tone to the body for identity preservation.
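
As a rough illustration of the two steps above, the sketch below aligns one point set to another with a least-squares rigid transform (the Kabsch algorithm) and matches skin colors by per-channel mean and standard deviation. Both are common stand-in techniques chosen for illustration, not necessarily what AvatarMix uses; the function names, the use of corresponding neck landmarks, and the RGB-in-[0, 1] convention are assumptions.

```python
import numpy as np

def rigid_align(src, dst):
    """Kabsch: least-squares R, t such that R @ src[i] + t ≈ dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def match_skin_tone(body_rgb, user_rgb, eps=1e-8):
    """Shift body skin colors (N, 3) to the user's per-channel mean and std."""
    mu_b, sd_b = body_rgb.mean(axis=0), body_rgb.std(axis=0)
    mu_u, sd_u = user_rgb.mean(axis=0), user_rgb.std(axis=0)
    out = (body_rgb - mu_b) / (sd_b + eps) * sd_u + mu_u
    return np.clip(out, 0.0, 1.0)
```

Given corresponding (N, 3) landmarks on the User’s and Model’s necks, the head Gaussian centers would be moved as `centers @ R.T + t`; a complete version would also rotate each Gaussian’s orientation by `R`.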

GSReshape

Retarget the clothed body to the user’s physique by jointly deforming the mesh and the attached Gaussians [5].
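
One way to picture "attached Gaussians" is a binding in which each Gaussian stores a face index, barycentric coordinates, and an offset along the face normal, so that its center follows the mesh when vertices move. The sketch below assumes exactly that binding; it is an illustrative scheme, not a description of GSReshape’s actual parameterization, and all names are hypothetical.

```python
import numpy as np

def gaussian_centers(verts, faces, face_id, bary, normal_offset):
    """Recompute Gaussian centers from the current mesh.

    verts: (V, 3) vertex positions; faces: (F, 3) vertex indices.
    face_id: (G,) face each Gaussian is bound to (assumed binding).
    bary: (G, 3) barycentric coordinates; normal_offset: (G,) signed distance.
    """
    tri = verts[faces[face_id]]                       # (G, 3, 3)
    on_surface = (bary[:, :, None] * tri).sum(axis=1)  # point on the triangle
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n = n / np.linalg.norm(n, axis=1, keepdims=True)   # unit face normals
    return on_surface + normal_offset[:, None] * n
```

Calling the function again with reshaped vertices moves the Gaussians together with the surface; orientation and scale updates are omitted here for brevity.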

SeamFix / FullbodyFix

Repair seam and reshaping artifacts, then use the refined views to update the composed 3DGS avatar [6].

Training Artifact Refiners

  • Cross-avatar compositions lack pixel-aligned GT for training artifact refiners.
  • Two successive head swaps synthesize training pairs: A→B introduces artifacts, and B→A returns the result to A’s layout so that it aligns with the GT.
  • SeamFix and FullbodyFix learn artifact removal; the refined multi-view renders then update the composed 3DGS avatar.
Training strategy for SeamFix and FullbodyFix.

Results

Quantitative Results on THUman2.0 [9]

Method          | Outfit DINO ↑ [7] | Head + Neck DINO ↑ [7] | Warp RMSE ↓ [8] | User Preference ↑
VTON360 [1]     | 0.633             | 0.786                  | 0.0276          | 8.7%
TIP-Editor [10] | N/A               | 0.356                  | 0.0388          | 2.6%
Ours            | 0.883             | 0.818                  | 0.0175          | 88.7%
Qualitative comparison between AvatarMix, TIP-Editor, and VTON360.

Qualitative comparison with TIP-Editor and VTON360. AvatarMix better preserves facial identity, garment texture, and seam quality while avoiding view inconsistency, unnatural wrinkles, and degraded hands.
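
For readers unfamiliar with the DINO and RMSE columns in the quantitative results, the sketch below shows the generic building blocks, assuming the DINO scores are cosine similarities between precomputed DINO feature vectors (a common convention; the exact protocol is defined in the paper, not here). Feature extraction and the warping step behind Warp RMSE are not shown, and the function names are illustrative.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two feature vectors (higher is more similar)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def rmse(pred, target):
    """Root-mean-square error between two arrays of equal shape (lower is better)."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

In practice such scores would be averaged over rendered views and User-Model pairs.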

Ablations

Ablation of diffusion refinement and GSReshape.

SeamFix cleans head-neck seams, FullbodyFix restores garment appearance after reshaping artifacts, and GSReshape adapts the clothed body to the user’s physique.

GSReshape pipeline overview.

Key Takeaways

  • Compose first, repair locally.
  • Mesh-based Gaussians make geometric identity transfer practical.
  • Diffusion refinement acts as cleanup rather than full outfit generation.
Additional GSReshape ablation.

Additional GSReshape examples showing body-shape-aware garment adaptation.

Hand-aware skin tightness examples.

Hand-aware skin tightness examples for robust Gaussian-avatar reshaping.

Additional Qualitative Results

Additional qualitative comparisons on THUman2.0.

Additional comparisons on more User-Model pairs, demonstrating preservation of identity and outfit appearance across diverse viewpoints.

Video

Demo video.

Limitations / Future work

  • Requires high-quality mesh-based Gaussian avatars as input.
  • FullbodyFix is currently triggered by visual inspection.
  • Animatable transfer remains future work.

Contact

Zhaorong Wang
University of Tsukuba
zhaorong.wang1997@gmail.com · larsph.github.io

Citation

@inproceedings{wang2026avatarmix,
    author    = {Wang, Zhaorong and Kanamori, Yoshihiro and Endo, Yuki},
    title     = {{AvatarMix}: Identity-Preserving Cross-Avatar Composition for Outfit Personalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {425--435}
}