Stable Diffusion Public Release
AI image generation has arrived in a major way. Stable Diffusion, a recently released open source image synthesis model, enables anyone with a PC and a decent GPU to create virtually any visual reality they can imagine. The results appear on your screen as if by magic when you feed it a descriptive phrase.
Some artists are thrilled by the prospect, while others are not, and society as a whole is still largely unaware of the rapidly evolving tech revolution taking place through Twitter, Discord, and GitHub communities. Image synthesis may have ramifications comparable to the invention of the camera, or even the creation of visual art. Depending on how things play out, even our sense of history could be at risk. In any case, Stable Diffusion is at the forefront of a new wave of creative tools based on deep learning that are poised to revolutionize the creation of visual media.
Advances in deep learning-based image synthesis
Stable Diffusion is the brainchild of Emad Mostaque, a London-based former hedge fund manager whose company, Stability AI, aims to bring novel deep learning applications to the masses. However, the origins of modern image synthesis can be traced back to 2014, and Stable Diffusion was not the first image synthesis model (ISM) to make waves in 2022.
OpenAI announced DALL-E 2 in April 2022, shocking social media with its ability to transform a written scene (called a “prompt”) into a variety of visual styles that can be fantastical, photorealistic, or even mundane. People with privileged access to the restricted tool produced astronauts on horseback, teddy bears purchasing bread in ancient Egypt, and novel sculptures in the style of renowned artists, among other things.
i) The model is being released under a CreativeML OpenRAIL-M license [https://huggingface.co/spaces/CompVis/stable-diffusion-license]. This is a permissive license that allows for commercial and non-commercial usage. The license makes ethical and legal use of the model your responsibility, and it must accompany any distribution of the model. It must also be made available to end users of the model in any service built on it.
ii) They have developed an AI-based Safety Classifier that is included by default in the overall software package. It understands concepts and other factors in generations and removes outputs that the model user may not want. Its parameters can be readily adjusted, and they welcome input from the community on how to improve it. Image generation models are powerful, but they still need to get better at understanding how to represent what users want.
Because these models were trained on image-text pairs from a broad internet scrape, they may reproduce some societal biases and produce unsafe content. Open mitigation strategies and an open discussion about those biases can bring everyone into this conversation. Learn more about the model's strengths and limitations in the model card.
This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.
You can find the weights, model card, and code here: [https://huggingface.co/CompVis/stable-diffusion]
An optimized development notebook using the HuggingFace diffusers library: [https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb]
A public demonstration space can be found here: [https://huggingface.co/spaces/stabilityai/stable-diffusion]
DreamStudio beta here: http://beta.dreamstudio.ai.
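For readers who would rather run the model locally than through the notebook or demo above, a minimal sketch using the HuggingFace diffusers library might look like the following. It assumes the `diffusers` and `torch` packages are installed, that an NVIDIA GPU is available, and that the v1.4 weights can be downloaded from the Hub under the identifier `CompVis/stable-diffusion-v1-4` (which may require accepting the license and logging in). The function name `generate` and its keyword arguments are hypothetical, chosen only for this illustration.

```python
def generate(prompt, *, model_id="CompVis/stable-diffusion-v1-4",
             steps=50, guidance_scale=7.5, seed=0):
    """Generate one image from a text prompt (illustrative sketch only)."""
    # Heavy dependencies are imported here so that merely defining the
    # function does not require a GPU or the libraries to be present.
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the weights in half precision to reduce VRAM use.
    # model_id is an assumption; substitute whichever checkpoint you use.
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    # A seeded generator makes the result repeatable for a given prompt.
    generator = torch.Generator(device="cuda").manual_seed(seed)

    result = pipe(
        prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    # The pipeline returns a list of PIL images; take the first one.
    return result.images[0]
```

Calling `generate("an astronaut riding a horse").save("astronaut.png")` would then write the image to disk. More denoising steps generally cost more time, so lowering `steps` is the simplest way to get faster drafts.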
Additional functionality and API access will be activated shortly, including local GPU support, animation, logic-based multi-stage workflows, and more.
We are also happy to support many partners through our API and other programs and will be posting on these soon.
The recommended model weights are v1.4 470k, which adds a few extra training steps to the v1.3 440k model made available to researchers. On release, the model's final memory usage should be 6.9 GB of VRAM.
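To put the VRAM figure in context, the sketch below estimates how much memory the weights alone would occupy at two numeric precisions. The parameter counts (roughly 860 million for the UNet and 123 million for the text encoder) are approximate figures from the public model description, not from this announcement. The autoencoder, activations, and framework overhead are left out, so the result is a lower bound rather than a prediction of actual VRAM use.

```python
# Approximate parameter counts; these are assumptions taken from the public
# model description, not figures stated in this announcement.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000

# Storage cost of a single parameter at each precision, in bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gb(num_params: int, precision: str) -> float:
    """Return the size of the weights alone in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


total = UNET_PARAMS + TEXT_ENCODER_PARAMS
for precision in BYTES_PER_PARAM:
    print(f"{precision}: about {weights_gb(total, precision):.2f} GB of weights")
```

Under these assumptions, half precision halves the weight footprint. Whatever else is needed at run time comes on top of that.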
In the coming period, they will release optimized versions of this model along with other variants and architectures with improved performance and quality. They will also release optimizations to allow this to work on AMD, MacBook M1/M2, and other chipsets. Currently, NVIDIA chips are recommended.
They will also release additional tools to help maximize the impact and reduce potential adverse outcomes from these tools, together with partners to be announced in the coming weeks.
This technology has tremendous potential to transform the way we communicate and we look forward to building a happier, more communicative and creative future with you all.