Generating novel views using Latent Diffusion

I’m looking for work! If you, or a company you know, is looking for a machine learning or data science role, please reach out!


[Demo gallery of randomly sampled sneakers]

The images shown are made using a latent-diffusion model that takes a sneaker image and an angle embedding as input and outputs an image of the sneaker from the given angle. The model is similar to other models that use conditional image diffusion to synthesize new viewpoints, such as 3DiM, RenderDiffusion and 3Ddesigner, but with a particular focus on high-resolution images.
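
To make the setup more concrete, here is a minimal sketch of how the two conditioning signals could be wired into the denoiser. Everything below (the sinusoidal angle embedding, the channel-wise concatenation of the source-view latent, the 256×256 working resolution, and the module and parameter names) is my own assumption for illustration, not the released architecture.

```python
import math
import torch
import torch.nn as nn

def angle_embedding(angle_rad: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Sinusoidal embedding of the target viewing angle (same trick as timestep embeddings)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = angle_rad[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)   # (B, dim)

class NovelViewDenoiser(nn.Module):
    """Toy stand-in for the conditional UNet: predicts the noise on the target-view
    latent given the source-view latent and the desired viewing angle."""
    def __init__(self, latent_ch: int = 3, emb_dim: int = 128):
        super().__init__()
        self.angle_proj = nn.Linear(emb_dim, latent_ch)
        self.time_proj = nn.Linear(emb_dim, latent_ch)
        # Source and noisy target latents are concatenated along the channel axis.
        self.net = nn.Conv2d(2 * latent_ch, latent_ch, kernel_size=3, padding=1)

    def forward(self, noisy_target, source_latent, angle_rad, t):
        cond = (self.angle_proj(angle_embedding(angle_rad))
                + self.time_proj(angle_embedding(t)))              # reuse the sinusoidal embedding for t
        x = torch.cat([noisy_target, source_latent], dim=1)
        return self.net(x) + cond[:, :, None, None]                # broadcast conditioning over H, W

# Example: one denoising call on 64x64 VQ-f4 latents (a 256x256 image downsampled 4x).
src = torch.randn(2, 3, 64, 64)                  # latent of the input sneaker image
noisy = torch.randn_like(src)                    # noisy target-view latent at step t
eps_hat = NovelViewDenoiser()(noisy, src,
                              angle_rad=torch.tensor([0.0, 1.57]),
                              t=torch.tensor([500, 250]))
```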

The model was trained for 8 days on a single RTX 3090 GPU and would likely perform much better if trained for longer on a larger dataset, which suggests that a larger, more general-purpose model may be feasible.

The model works with both real and generated images of sneakers. The model and code will be released once I clean up the code.

This project is for research purposes only.


Model Details

Latent Diffusion model with a VQ-f4 autoencoder

  • Channels: 480
  • Channel multiplier: 1, 1, 1, 1
  • Attention at each resolution level: False, False, True, True
  • Dropout: 10%
  • Noise schedule: Cosine

  • All images were sampled using the PLMS sampler and classifier-free guidance (see the sketch after this list).
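
For concreteness, the settings above map roughly onto a config like the sketch below, together with the classifier-free-guidance step applied before each PLMS update. Only the values repeated from the list (480 channels, the 1,1,1,1 multiplier, attention on the last two levels, 10% dropout, the cosine schedule, VQ-f4, PLMS) come from the post; the key names, the guidance scale of 3.0, and the helper function are illustrative assumptions.

```python
import torch

# Hypothetical config, loosely in the spirit of CompVis latent-diffusion YAML files.
unet_config = {
    "model_channels": 480,
    "channel_mult": [1, 1, 1, 1],
    "attention_levels": [False, False, True, True],   # attention only at the two deepest levels
    "dropout": 0.1,
}
diffusion_config = {
    "first_stage": "VQ-f4",        # diffusion runs in the VQ-f4 latent space
    "noise_schedule": "cosine",
    "sampler": "PLMS",
    "guidance_scale": 3.0,         # assumed value; the post does not state it
}

def guided_eps(eps_cond: torch.Tensor, eps_uncond: torch.Tensor,
               scale: float = diffusion_config["guidance_scale"]) -> torch.Tensor:
    """Classifier-free guidance: push the unconditional noise prediction
    toward the conditional one before each PLMS update."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```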

All images in the demo are generated using a separate Latent Diffusion model, but I am currently unsatisfied with the diversity and consistency of its results. I have some promising initial experiments with Poisson Flow Generative Models in the pipeline; if the results work out, I will release a bigger demo and a public model.