[Community Pipeline] UnCLIP image / text interpolations

### Model/Pipeline/Scheduler description

Copied from https://github.com/huggingface/diffusers/pull/1858:

> UnCLIP / Karlo (https://huggingface.co/spaces/kakaobrain/karlo) gives very nice and precise results for image generation and can strongly outperform Stable Diffusion in some cases; see:
> https://www.reddit.com/r/StableDiffusion/comments/zshufz/karlo_the_first_large_scale_open_source_dalle_2/
> 
> Another extremely interesting aspect of DALL-E 2 is its ability to interpolate between text and/or image embeddings. See e.g. Section 3 of the DALL-E 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf . This PR now allows text embeddings and image embeddings to be passed directly, which should enable those tasks!

I think we could create a super cool community pipeline. The pipeline could automatically create interpolations between two text prompts, and similarly we could create one that interpolates between two images.

In terms of design, to stay as efficient as possible, the following would make sense:

- 1) The user passes two text prompts and a `num_interpolations` input.
- 2) The pipeline then embeds those two text prompts into the text embeddings x_0 and x_N, and the `num_interpolations` intermediate embeddings x_1, x_2, ..., x_{N-1} (so N = `num_interpolations` + 1) are created using the [`slerp` function](https://discuss.pytorch.org/t/help-regarding-slerp-function-for-generative-model-sampling/32475/4?u=patrickvonplaten).
- 3) This gives `num_interpolations` + 2 text embeddings, which should be passed through the model in a batch to create a nice interpolation of images.
- 4) It'd be important to make use of CPU offloading (e.g. `enable_sequential_cpu_offload()`) to save memory.
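
Steps 2) and 3) can be sketched roughly as follows. This is only a minimal, dependency-free illustration, not the eventual pipeline code: plain Python lists stand in for the embedding tensors, and the names `interpolation_schedule` and `dot_threshold`, as well as the exact `slerp` signature, are assumptions made for this example. A real pipeline would most likely do the same thing on `torch` tensors.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two equal-length vectors.

    `t` runs from 0.0 (returns v0) to 1.0 (returns v1). `dot_threshold` is an
    assumed cutoff: when the vectors are nearly (anti-)parallel the spherical
    formula is numerically unstable, so plain linear interpolation is used.
    """
    if len(v0) != len(v1):
        raise ValueError("vectors must have the same length")

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))

    def lerp():
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 == 0.0 or norm1 == 0.0:
        return lerp()  # angle is undefined for a zero vector

    # Cosine of the angle between the two vectors, clamped for acos.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    if abs(cos_omega) > dot_threshold:
        return lerp()

    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def interpolation_schedule(x_start, x_end, num_interpolations):
    """Return x_0, x_1, ..., x_N with N = num_interpolations + 1.

    The result has num_interpolations + 2 entries: both endpoints plus the
    intermediate embeddings, evenly spaced in t.
    """
    if num_interpolations < 0:
        raise ValueError("num_interpolations must be non-negative")
    n = num_interpolations + 1
    return [slerp(i / n, x_start, x_end) for i in range(n + 1)]


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real embeddings have many more dimensions.
    for vec in interpolation_schedule([1.0, 0.0], [0.0, 1.0], 3):
        print([round(c, 4) for c in vec])
```

The returned list has `num_interpolations` + 2 entries, which could then be stacked into one batch and passed through the model as in step 3).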

It's probably easier to start with the `UnCLIPImageInterpolationPipeline`, since an image embedding is just a single 1-D vector, whereas two latent vectors are used for the text embeddings.

Would be more than happy to help if someone is interested in giving this a try - think it'll make for some super cool demos.

### Open source status

- [X] The model implementation is available
- [X] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

_No response_

[Community Pipeline] UnCLIP image / text interpolations #1869
