Generating novel views using Latent Diffusion – This sneaker does not exist

I’m looking for work! If you, or a company you know, are hiring for a machine learning or data science role, please reach out!

Full demo

The images shown are made using a latent diffusion model that takes an image of a sneaker and an angle embedding as input, and outputs an image of the sneaker from the given angle. The model is similar to other models that use conditional image diffusion to synthesize new viewpoints, such as 3DiM, RenderDiffusion and 3DDesigner. This particular model focuses on high-resolution images.
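The exact form of the angle embedding is not described here. As a rough illustration only, one common way to turn a viewing angle into a conditioning vector is a sinusoidal encoding. The function name `angle_embedding` and the `num_frequencies` parameter below are hypothetical and not taken from the actual model.

```python
import math


def angle_embedding(angle_degrees, num_frequencies=4):
    """Encode a viewing angle as sin/cos features.

    Hypothetical scheme for illustration; the real model may embed
    the angle differently (for example with a learned embedding).
    """
    # Wrap the angle so that 0 and 360 degrees give identical features.
    theta = math.radians(angle_degrees % 360.0)
    features = []
    # Integer frequencies keep every feature periodic over a full turn.
    for k in range(1, num_frequencies + 1):
        features.append(math.sin(k * theta))
        features.append(math.cos(k * theta))
    return features


if __name__ == "__main__":
    print(angle_embedding(90.0))
```

A vector like this could be passed to the denoising network alongside the encoded input image, in the same way a class or text embedding would be.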

The model was trained for 8 days on a single RTX 3090 GPU and would probably perform much better if trained for longer on a larger dataset. This suggests that a larger, more general-purpose model may be feasible.

The model works with both real and generated images of sneakers. The model and code will be released once I clean up the code.

This project is for research purposes only.

Model Details

Latent Diffusion model with a VQ-f4 autoencoder

Channels: 480

Channels multiplier: 1,1,1,1

Attention resolutions: False,False,True,True

Dropout: 10%

Noise schedule: Cosine
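For reference, a cosine noise schedule is usually defined as in Nichol and Dhariwal's "Improved DDPM" formulation. The sketch below assumes that formulation. The number of steps, the offset `s` and the clipping value `max_beta` are illustrative defaults, not settings confirmed for this model.

```python
import math


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level at step t under the cosine schedule."""
    return math.cos(((t / num_steps) + s) / (1 + s) * math.pi / 2) ** 2


def cosine_betas(num_steps, max_beta=0.999):
    """Per-step noise variances derived from the cumulative schedule."""
    betas = []
    for t in range(num_steps):
        a_now = cosine_alpha_bar(t, num_steps)
        a_next = cosine_alpha_bar(t + 1, num_steps)
        # Clip to avoid a variance of 1 at the very end of the schedule.
        betas.append(min(1 - a_next / a_now, max_beta))
    return betas


if __name__ == "__main__":
    betas = cosine_betas(1000)  # 1000 steps is an assumed value
    print(betas[0], betas[-1])
```

Compared with a linear schedule, this adds noise more gradually in the early steps.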

All images were sampled using the PLMS sampler and classifier-free guidance.
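Classifier-free guidance combines a conditional and an unconditional noise prediction at every sampling step. A minimal sketch of that combination follows, using plain lists in place of tensors. The function name and the guidance scale are illustrative, since the scale used for these images is not stated.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    A scale of 1.0 returns the conditional prediction unchanged;
    larger values push the sample further toward the conditioning.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(eps_uncond, eps_cond)
    ]


if __name__ == "__main__":
    # Toy two-element "predictions" standing in for full noise tensors.
    print(guided_noise([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
```

In a setup like this one, the unconditional prediction would come from running the same network with the conditioning dropped, and the blended prediction would then be handed to the sampler (PLMS here).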

All images on the demo were generated using a separate Latent Diffusion model, but I am currently unsatisfied with the diversity and consistency of its results. I have some promising initial experiments with Poisson Flow Generative Models in the pipeline. If they work out, I will release a bigger demo and a public model.