AvatarMix: Identity-Preserving Cross-Avatar Composition for Outfit Personalization

AvatarMix teaser: input 3DGS assets and free-viewpoint outfit personalization results.

AvatarMix performs free-viewpoint outfit personalization by composing a user’s identity cues (head–neck, body shape and scale, and skin tone) with a model’s clothed outfit in a 3DGS representation. The examples show variations in height and body proportions, cross-ethnicity skin tones, and diverse garments, while preserving fine details such as printed text and skirt folds under composition and body-shape retargeting. Examples are rendered in metric scale without post-hoc rescaling.

Abstract

Existing 3D avatar outfit transfer methods face distinct challenges: approaches that lift 2D edits to 3D often suffer from outfit or identity quality degradation, while those that separately model body and clothing layers are prone to intersection artifacts. We introduce AvatarMix, a compositional paradigm that bypasses these issues by directly composing the head and body from two high-fidelity Gaussian avatars. While this paradigm inherently preserves outfit quality and avoids intersections, it introduces challenges in creating a seamless join and maintaining appearance fidelity after body reshaping.

To this end, we propose a two-tier refinement strategy: SeamFix, a localized diffusion module that refines hair and neck to ensure an artifact-free join, and an optional full-body refinement, FullbodyFix, that restores garment appearance when retargeting degrades the clothed body. Both operate on renders from an already 3D-consistent Gaussian avatar, which limits multi-view artifacts compared to 2D-to-3D lifting. To preserve the user's body identity, our mesh-based Gaussian representation enables the adaptation of a robust mesh retargeting technique, precisely reshaping the clothed body to the user's physique and robustly handling diverse body shapes. Extensive experiments demonstrate that our method achieves state-of-the-art results in outfit fidelity and identity preservation, providing a new perspective for realistic 3D outfit personalization.

Task & Motivation

Reference numbers follow the paper/poster.

Task

For outfit personalization, combine the User’s identity with the Model’s outfit while preserving high-fidelity 3DGS appearance.

Why is this hard?

2D-to-3D try-on can degrade garment texture and view consistency [1].
Layered garment & body transfer can introduce intersections [2].

Failure example of 2D-to-3D try-on [1]: red dotted boxes show view-inconsistent garment details.

Motivation & Key Idea

To avoid view inconsistency and intersection artifacts, compose the head and body of reusable avatars instead of regenerating outfits.
To preserve identity beyond the face, match the clothed body to the User’s physique and skin tone.
To reduce hallucinated changes in high-quality regions, refine only seams and reshaping artifacts.

Contributions

Cross-avatar composition for 3DGS outfit personalization.
GSReshape for body-shape-aware clothed-body retargeting.
SeamFix + FullbodyFix for localized and full-body artifact repair.

Method Overview

Given multi-view images of a User and a Model, we first employ Mesh-Based Avatar Reconstruction with semantic segmentation. We then perform Cross-Avatar Geometric Composition by aligning the User’s head and neck to the Model’s pose and reshaping the Model’s clothed body via GSReshape so that the body geometry matches the User’s physique. Finally, our two-tier diffusion refinement operates on rendered views, followed by 3D Gaussian fine-tuning, to produce the final identity-transfer result.

Mesh-based Avatar Reconstruction

Reconstruct mesh-based 3DGS avatars from multi-view images [3,4], with semantic segmentation and SMPL fitting.

Cross-Avatar Composition

Align the User’s head & neck to the Model’s pose, then combine them with the Model’s clothed body. Transfer the User’s skin tone to the body for identity preservation.
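As an illustration of the skin-tone step only, the sketch below matches per-channel color statistics (mean and standard deviation) of the skin-labeled Gaussians on the body to colors sampled from the User. This is a generic statistics-matching recipe in RGB, not necessarily the procedure used in the paper; the function name `transfer_skin_tone`, the flat `(N, 3)` color layout, and the boolean `skin_mask` are all assumptions made for the example.

```python
import numpy as np

def transfer_skin_tone(body_rgb, skin_mask, user_skin_rgb, eps=1e-6):
    """Shift skin-labeled body colors toward the User's skin statistics.

    body_rgb:      (N, 3) per-Gaussian base colors in [0, 1] (assumed layout)
    skin_mask:     (N,) bool, True where a Gaussian is labeled as skin
    user_skin_rgb: (M, 3) skin colors sampled from the User avatar
    """
    out = body_rgb.copy()
    src = body_rgb[skin_mask]
    if src.shape[0] == 0 or user_skin_rgb.shape[0] == 0:
        return out  # nothing to transfer

    src_mean, src_std = src.mean(axis=0), src.std(axis=0)
    tgt_mean, tgt_std = user_skin_rgb.mean(axis=0), user_skin_rgb.std(axis=0)

    # Normalize source statistics, then re-scale to the target statistics.
    matched = (src - src_mean) * (tgt_std / (src_std + eps)) + tgt_mean
    out[skin_mask] = np.clip(matched, 0.0, 1.0)
    return out

# Toy data: 200 Gaussians, the first 120 labeled as skin.
rng = np.random.default_rng(0)
body = rng.uniform(0.4, 0.6, size=(200, 3))
mask = np.arange(200) < 120
user = rng.uniform(0.3, 0.5, size=(80, 3))
recolored = transfer_skin_tone(body, mask, user)
```

Only the masked entries change, so garment colors are left untouched, which mirrors the goal of keeping the outfit intact while adapting exposed skin.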

GSReshape

Retarget the clothed body to the User’s physique by jointly deforming the mesh and the attached Gaussians [5].
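One common way to make Gaussians follow a deforming mesh is to bind each Gaussian center to a triangle through barycentric coordinates plus an offset along the face normal, then re-evaluate that binding on the deformed vertices. The sketch below shows this idea for centers only; it is an assumed binding scheme for illustration, not the GSReshape implementation, and the names `bind_gaussians`, `deform_gaussians`, and `face_ids` are hypothetical. Rotations and scales of the Gaussians would need an analogous update that is omitted here.

```python
import numpy as np

def _triangles_and_normals(vertices, faces):
    tri = vertices[faces]                                   # (F, 3, 3)
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return tri, normals

def bind_gaussians(centers, face_ids, vertices, faces):
    """Encode each center as barycentric coords on its face + normal offset."""
    tri, normals = _triangles_and_normals(vertices, faces)
    t, n = tri[face_ids], normals[face_ids]
    e1, e2 = t[:, 1] - t[:, 0], t[:, 2] - t[:, 0]
    rel = centers - t[:, 0]
    offset = np.sum(rel * n, axis=1)                        # signed height
    in_plane = rel - offset[:, None] * n
    # Solve in_plane = b1 * e1 + b2 * e2 with 2x2 normal equations.
    d11, d12, d22 = np.sum(e1 * e1, 1), np.sum(e1 * e2, 1), np.sum(e2 * e2, 1)
    p1, p2 = np.sum(in_plane * e1, 1), np.sum(in_plane * e2, 1)
    det = d11 * d22 - d12 ** 2
    b1 = (d22 * p1 - d12 * p2) / det
    b2 = (d11 * p2 - d12 * p1) / det
    bary = np.stack([1.0 - b1 - b2, b1, b2], axis=1)
    return bary, offset

def deform_gaussians(bary, offset, face_ids, new_vertices, faces):
    """Re-evaluate the binding on deformed vertices to get new centers."""
    tri, normals = _triangles_and_normals(new_vertices, faces)
    on_surface = np.einsum("gi,gij->gj", bary, tri[face_ids])
    return on_surface + offset[:, None] * normals[face_ids]

# Toy example: one triangle, one Gaussian hovering 0.1 above it.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
centers = np.array([[0.25, 0.25, 0.1]])
face_ids = np.array([0])

bary, offset = bind_gaussians(centers, face_ids, vertices, faces)
wider = vertices * np.array([2.0, 2.0, 1.0])                # stretch the body
moved = deform_gaussians(bary, offset, face_ids, wider, faces)
```

In this toy case the Gaussian slides with the stretched triangle while keeping its height above the surface, which is the behavior one would want when a clothed body is widened or narrowed.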

Seam/FullbodyFix

Repair seam and reshaping artifacts, then use the refined views to update the composed 3DGS avatar [6].

Training Artifact Refiners

Cross-avatar compositions lack pixel-aligned GT for training artifact refiners.
Two successive head swaps synthesize training pairs: A→B introduces artifacts, and B→A returns to A’s layout so that the result aligns with A as GT.
Seam/FullbodyFix learn artifact removal; refined multi-view renders then update the composed 3DGS avatar.
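The data flow of the double head swap can be sketched as follows. This is one reading of the bullets above under stated assumptions: `make_training_pair`, `toy_swap_head`, and `toy_render` are hypothetical names, avatars are reduced to small arrays, and additive noise merely stands in for real composition artifacts.

```python
import numpy as np

def make_training_pair(avatar_a, avatar_b, swap_head, render):
    """Return (degraded_input, ground_truth) renders for refiner training."""
    # A -> B: put A's head on B's body; this is where artifacts appear.
    a_on_b = swap_head(head_from=avatar_a, body_from=avatar_b)
    # B -> A: move that head back onto A's body, restoring A's layout.
    back_on_a = swap_head(head_from=a_on_b, body_from=avatar_a)
    # The round trip is aligned with the untouched avatar A.
    return render(back_on_a), render(avatar_a)

def toy_swap_head(head_from, body_from, noise=0.05, seed=0):
    """Stand-in swap: copies the head with noise to mimic swap artifacts."""
    rng = np.random.default_rng(seed)
    head = head_from["head"] + rng.normal(0.0, noise, head_from["head"].shape)
    return {"head": head, "body": body_from["body"].copy()}

def toy_render(avatar):
    """Stand-in renderer: concatenates head and body values."""
    return np.concatenate([avatar["head"], avatar["body"]])

avatar_a = {"head": np.zeros(4), "body": np.ones(6)}
avatar_b = {"head": np.full(4, 2.0), "body": np.full(6, 3.0)}
degraded, target = make_training_pair(avatar_a, avatar_b, toy_swap_head, toy_render)
```

The point of the round trip is that `degraded` and `target` share the same layout, so a refiner can be supervised directly even though no ground truth exists for a real cross-avatar composition.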

Training strategy for SeamFix and FullbodyFix.

Results

Quantitative Results on THuman2.0 [9]

Method	Outfit DINO ↑ [7]	Head + Neck DINO ↑ [7]	Warp RMSE ↓ [8]	User Preference ↑
VTON360 [1]	0.633	0.786	0.0276	8.7%
TIP-Editor [10]	N/A	0.356	0.0388	2.6%
Ours	0.883	0.818	0.0175	88.7%
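For readers unfamiliar with feature-similarity scores such as the DINO columns, a typical formulation is the cosine similarity between image embeddings, averaged over views. The sketch below assumes the embeddings have already been extracted and shows only the averaging step; the exact masking and view protocol behind the table are not specified here, and `mean_cosine_similarity` is a hypothetical helper.

```python
import numpy as np

def mean_cosine_similarity(feats_a, feats_b, eps=1e-8):
    """Average cosine similarity between paired feature rows.

    feats_a, feats_b: (V, D) arrays, one embedding per rendered view
    (assumed to be precomputed by a feature extractor such as DINO).
    """
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + eps)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(a * b, axis=1)))

# Toy embeddings for three views.
rng = np.random.default_rng(0)
reference = rng.normal(size=(3, 16))
same_score = mean_cosine_similarity(reference, reference)
opposite_score = mean_cosine_similarity(reference, -reference)
```

Higher values mean the two sets of views are closer in feature space, which matches the upward arrows on the DINO columns.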

Additional qualitative AvatarMix results on User-Model outfit personalization pairs.

Qualitative results on additional User-Model pairs, demonstrating high-fidelity outfit personalization with preserved garment details, body-shape adaptation, and identity transfer.

Qualitative comparison between AvatarMix, TIP-Editor, and VTON360.

Qualitative comparison with TIP-Editor and VTON360. AvatarMix better preserves facial identity, garment texture, and seam quality while avoiding view inconsistency, unnatural wrinkles, and degraded hands.

Ablations

Ablation of diffusion refinement, GSReshape, and cross-gender stress test.

SeamFix cleans head-neck seams, FullbodyFix restores garment appearance after reshaping artifacts, and GSReshape adapts the clothed body to the User’s physique.

GSReshape retargets the clothed body to the User's physique by jointly deforming the mesh and the attached Gaussians.

Key Takeaways

Compose first, repair locally.
Mesh-based Gaussians make geometric identity transfer practical.
Diffusion refinement acts as cleanup rather than full outfit generation.

Video

Demo video.

Limitations / Future work

Requires high-quality mesh-based Gaussian avatars as input.
FullbodyFix is currently triggered by visual inspection.
Animatable transfer remains future work.

Contact

Zhaorong Wang
University of Tsukuba
zhaorong.wang1997@gmail.com · larsph.github.io

Citation

@inproceedings{wang2026avatarmix,
    author    = {Wang, Zhaorong and Kanamori, Yoshihiro and Endo, Yuki},
    title     = {{AvatarMix}: Identity-Preserving Cross-Avatar Composition for Outfit Personalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {425-435}
}