Efficient Strategies for Effective Stable Diffusion Prompt: A Comprehensive Guide | by Youssef Hosni

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input, known as a prompt. Developing a process for building good prompts is the first step every Stable Diffusion user tackles.

In this article, we will explore several techniques for making our prompts more effective: adding certain keywords and composites of keywords, changing the order of the words and their punctuation, and changing the guidance scale.

Table of Contents:

Adding Keywords
Adding Composites
Changing the Word Order
Changing the Punctuation
Changing the Guidance Scale

Adding Keywords

The first change we will test is adding different keywords to the input prompt. We keep the seed, the guidance scale and the rest of the prompt fixed, add one keyword at a time, and compare the result with the images generated from the original prompt. This gives us a better intuition for what works and what does not. The original prompt is:

A Cyberpunk cat wearing a steampunk hat

The added words will be the following:

focused
sharp
painting
chalk art
concept art
trending on artstation
canon m 50
close-up
charcoal drawing
intricate
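The controlled setup described above can be sketched in Python. This is a minimal sketch, not the author’s code: it only builds the list of prompts to compare, and keeps the seed and guidance scale identical for every prompt so that the appended keyword is the only thing that changes. The helper names (`GenerationJob`, `keyword_jobs`) are hypothetical, and the seed 42 and guidance scale 7.5 are placeholders, since the article does not state the values it used.

```python
from dataclasses import dataclass

# Base prompt and keywords taken from this section.
BASE_PROMPT = "A Cyberpunk cat wearing a steampunk hat"
KEYWORDS = [
    "focused", "sharp", "painting", "chalk art", "concept art",
    "trending on artstation", "canon m 50", "close-up",
    "charcoal drawing", "intricate",
]


@dataclass(frozen=True)
class GenerationJob:
    """One image-generation request; only `prompt` varies between jobs."""
    prompt: str
    seed: int               # placeholder: the article does not state its seed
    guidance_scale: float   # placeholder: the article does not state its scale


def keyword_jobs(base, keywords, seed=42, guidance_scale=7.5):
    """Return the baseline job followed by one job per appended keyword."""
    prompts = [base] + [f"{base}, {kw}" for kw in keywords]
    return [GenerationJob(p, seed, guidance_scale) for p in prompts]


if __name__ == "__main__":
    for job in keyword_jobs(BASE_PROMPT, KEYWORDS):
        print(job.prompt)
```

To render the images, each job could be passed to an image generator. With Hugging Face diffusers, for example, that might look like `pipe(job.prompt, guidance_scale=job.guidance_scale, generator=torch.Generator().manual_seed(job.seed))`, where `pipe` is an already-loaded `StableDiffusionPipeline`. Creating a fresh generator from the same seed for every job keeps the starting noise identical, so differences between images can be attributed to the prompt.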

So let’s first run the original prompt and see the results in figure 1:

Original prompt:

A Cyberpunk cat wearing a steampunk hat

Figure 1: A Cyberpunk cat wearing a steampunk hat generated using Stable Diffusion.

Let’s run the same prompt now but after adding the word focused:

A Cyberpunk cat wearing a steampunk hat, focused

Figure 2: Images of a focused cyberpunk cat wearing a steampunk hat generated using Stable Diffusion.

Compared with the original images, these look more focused, which is exactly what the keyword is supposed to do.

Let’s add the word sharp and see the results in figure 3:

A Cyberpunk cat wearing a steampunk hat, sharp

Figure 3: A sharp cyberpunk cat wearing a steampunk hat.

With the word sharp, the images become slightly sharper, but the effect is small: the keyword does change the output, just not by much.

Let’s see the results of adding the word painting in figure 4:

A Cyberpunk cat wearing a steampunk hat, painting

Figure 4: A painted cyberpunk cat wearing a steampunk hat

The effect of this word is very obvious: all the images are now paintings, so this keyword clearly has a strong effect on the generated image.

Let’s add the keyword chalk art and see the results in figure 5:

A Cyberpunk cat wearing a steampunk hat, chalk art

Figure 5: A chalk art cyberpunk cat wearing a steampunk hat.

This keyword also has a very strong effect: all the images now look like chalk art.

Let’s see the effect of adding concept art to the prompt in figure 6:

A Cyberpunk cat wearing a steampunk hat, concept art

Figure 6: A concept art cyberpunk cat wearing a steampunk hat

Adding this keyword changes the generated images significantly; they look as if they were created by a concept artist.

Now let’s add the phrase trending on art station and see the results in figure 7:

A Cyberpunk cat wearing a steampunk hat, trending on art station

Figure 7: A cyberpunk cat wearing a steampunk hat, trending on art station.

The next keyword we will add is canon m 50; the results are shown in figure 8.

A Cyberpunk cat wearing a steampunk hat, canon m 50

Figure 8: A canon m 50 cyberpunk cat wearing a steampunk hat.

The changes in the generated images are barely noticeable, so adding this keyword has little effect. Next, we will see the effect of adding close-up to the prompt, as shown in figure 9.

A Cyberpunk cat wearing a steampunk hat, close-up

Figure 9: A close-up cyberpunk cat wearing a steampunk hat.

Adding close-up has a noticeable effect: the images are zoomed in on the cat. Next, we will try the charcoal drawing keyword and see its effect on the generated images, as shown in figure 10.

A Cyberpunk cat wearing a steampunk hat, charcoal drawing

Figure 10: A charcoal drawing of a cyberpunk cat wearing a steampunk hat.

Adding charcoal drawing has a strong effect on the generated images, which look as if they were drawn in charcoal. The final keyword we will add is intricate; the generated images are shown in figure 11.

A Cyberpunk cat wearing a steampunk hat, intricate

Figure 11: An intricate cyberpunk cat wearing a steampunk hat.

This keyword adds extra detail compared with the images generated from the original prompt. Next, we will combine some of the keywords and observe the effect of these composites on the generated images.

Adding Composites

The second variation is to add composites of the keywords used in the previous section. Here are the composites we will use:

charcoal drawing, intricate, concept art
canon m50, close_up, sharp, focused
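A composite is simply several keywords appended after the subject, separated by commas. The small helper below reproduces the two prompts used in this section from the keyword lists above; `with_composite` is a hypothetical, illustrative name, not part of any library.

```python
BASE_PROMPT = "A Cyberpunk cat wearing a steampunk hat"

# The two keyword composites compared in this section.
COMPOSITES = [
    ["charcoal drawing", "intricate", "concept art"],
    ["canon m50", "close_up", "sharp", "focused"],
]


def with_composite(base, keywords):
    """Append several keywords to the base prompt, comma-separated."""
    return ", ".join([base, *keywords])


if __name__ == "__main__":
    for composite in COMPOSITES:
        print(with_composite(BASE_PROMPT, composite))
```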

Let’s start with the first composite, which adds charcoal drawing, intricate, and concept art to the prompt:

A Cyberpunk cat wearing a steampunk hat, charcoal drawing, intricate, concept art

Figure 12: A charcoal drawing, intricate and concept art of a cyberpunk cat wearing a steampunk hat.

The generated images meet our expectations: all three keywords appear to be taken into account. The images are full of detail, drawn in charcoal, and have a touch of concept art.

Next, let’s try the second composite of keywords which is canon m50, close_up, sharp, and focused. The results are shown in figure 13.

A Cyberpunk cat wearing a steampunk hat, canon m50, close_up, sharp, focused

Figure 13: A canon m50, close-up, sharp, and focused cyberpunk cat wearing a steampunk hat.

The added keywords have a smaller effect than in the previous example, which is expected since each of them had only a small effect on its own, as we saw in the previous section. The keyword with the strongest effect is close-up, which is very obvious here: the images are close-ups of the cats’ faces.

We can also change the order of the keywords and check whether that affects the generated images. Let’s look at this in the next section.

Changing the Word Order

We have seen the effect of adding certain keywords and of combining them. Now we will check whether changing the order of the words in the prompt affects the generated images. We will start with the prompt below and then rearrange its words.

A Cyberpunk cat wearing a steampunk hat, intricate, painting

Figure 14: Images generated using the ‘A Cyberpunk cat wearing a steampunk hat, intricate, painting’ prompt.

Now, let’s change the order of the keywords by moving the word painting to the beginning of the prompt. The results are shown in figure 15.

painting, A Cyberpunk cat wearing a steampunk hat, intricate

Figure 15: Images generated using the “painting, A Cyberpunk cat wearing a steampunk hat, intricate” prompt.

Moving painting to the beginning of the prompt makes the generated images look more like paintings, especially the images in the left column, which look like paintings of a cat. So if you want to emphasize a certain word, placing it at the beginning of the prompt is a smart move.

Now let’s see what happens if we move both keywords to the beginning of the prompt. The results are shown in figure 16.

painting, intricate, A Cyberpunk cat wearing a steampunk hat

Figure 16: Images generated using the “painting, intricate, A Cyberpunk cat wearing a steampunk hat” prompt.

Again, placing the keywords at the beginning of the prompt gives them a stronger effect on the generated images. So if you want the generated image to follow certain keywords closely, put them at the beginning of the prompt.
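The three prompts compared in this section contain exactly the same words and differ only in where the keywords sit. As a minimal sketch, a hypothetical helper `keywords_first` can generate them by moving selected keywords in front of the subject while leaving the wording untouched, which keeps position as the only variable.

```python
BASE_PROMPT = "A Cyberpunk cat wearing a steampunk hat"
KEYWORDS = ["intricate", "painting"]


def keywords_first(base, keywords, front):
    """Move the keywords listed in `front` before the subject.

    Keywords in `front` come first, in the order given; the remaining
    keywords stay after the subject in their original order. The words
    themselves never change, only their position.
    """
    unknown = [kw for kw in front if kw not in keywords]
    if unknown:
        raise ValueError(f"not in keywords: {unknown}")
    tail = [kw for kw in keywords if kw not in front]
    return ", ".join([*front, base, *tail])


if __name__ == "__main__":
    print(keywords_first(BASE_PROMPT, KEYWORDS, []))
    print(keywords_first(BASE_PROMPT, KEYWORDS, ["painting"]))
    print(keywords_first(BASE_PROMPT, KEYWORDS, ["painting", "intricate"]))
```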

Now that we have seen the effect of changing the word order in the prompt, do you think that changing the punctuation in the prompt would have a strong effect on the generated images? Let’s try this in the next section to know the answer to this question.

Changing the Punctuation

Now we will change the punctuation of the prompt, particularly around the added keywords. We will try only three variations:

First, we will add a full stop at the end of the prompt.
Second, we will add three full stops instead.
Finally, we will remove the comma between the two keywords.
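The three punctuation variants can also be produced mechanically from the baseline prompt, which guarantees that nothing but the punctuation changes between runs. The function name `punctuation_variants` and its dictionary keys are hypothetical labels chosen for this sketch.

```python
BASE_PROMPT = "A Cyberpunk cat wearing a steampunk hat, intricate, painting"


def punctuation_variants(prompt):
    """Return the three punctuation variants compared in this section."""
    return {
        "full_stop": prompt + ".",
        "three_full_stops": prompt + "...",
        # Replace only the last ", " with a space, merging the final two keywords.
        "no_comma": " ".join(prompt.rsplit(", ", 1)),
    }


if __name__ == "__main__":
    for name, variant in punctuation_variants(BASE_PROMPT).items():
        print(f"{name}: {variant}")
```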

As usual, we will start with the original prompt used in the previous section so we can have a baseline to compare with:

A Cyberpunk cat wearing a steampunk hat, intricate, painting

Figure 17: A Cyberpunk cat wearing a steampunk hat, intricate, painting

Let’s add a full stop at the end of the prompt and observe the changes in the generated images shown in figure 18.

A Cyberpunk cat wearing a steampunk hat, intricate, painting.

Figure 18: An intricate painting of a cyberpunk cat wearing a steampunk hat with a full stop.

There is not much difference in the generated images, which suggests that adding a full stop at the end of the prompt has only a very small effect.

Next, we will add three full stops at the end of the prompt and observe whether this will change the generated images or not. The generated images are shown in figure 19.

A Cyberpunk cat wearing a steampunk hat, intricate, painting...

Figure 19: Image generated using “A Cyberpunk cat wearing a steampunk hat, intricate, painting…” prompt.

Adding three full stops at the end of the prompt has no noticeable effect either. Finally, we will remove the comma between the two keywords and observe the results shown in figure 20.

A Cyberpunk cat wearing a steampunk hat, intricate painting

Figure 20: Images generated using the “A Cyberpunk cat wearing a steampunk hat, intricate painting” prompt.

Again, we observe little difference between these images and the baseline in figure 17, so punctuation changes do not have a large effect on the generated image. In the next section, we will observe the effect of the guidance scale on the generated images.

In conclusion, adding certain keywords and changing the word order have a strong effect on the generated images, while punctuation changes have little effect. Note that these observations are based on Stable Diffusion v2, so results with an older or newer version may differ.
