In this post, we will learn how to train ChatGPT to generate prompts for realistic photos on AI diffusion platforms like MidJourney. We will focus only on street portrait photos, not studio photography, created entirely on an AI platform with no advanced camera required.

We will focus on the basics of photography and some concepts of street portrait photography, which are required to train ChatGPT to generate the prompts for us.

Though we have used MidJourney as the AI platform for art creation, you can also use these prompts on other AI platforms, like Stable Diffusion, Fotor AI, Civitai, and Lexica. However, the results may vary depending on the AI model, and these prompts work best with MidJourney. You can always experiment with the prompts to improve the result.

1. Generating Prompts by Training ChatGPT

Open ChatGPT and type the following:

Me: Do you know what a diffusion model is? Don’t explain, please answer “yes” or “no” unless stated otherwise.

ChatGPT: Yes

Me: Here is a formula I want you to use to generate diffusion prompts for me when asked. The formula is as follows:

Portrait of a (male or female), (5 adjectives), (1 verb), Close up, using (camera model), lens focal length (35mm, 55mm or 90mm), (studio, soft or cinematic) lighting, aperture (1.2, 1.4 or 2.2), bokeh, (day or night), (street photography), (1 location around the world), 32k, sharp focus, (unreal 5 or octane) render, photo-realistic

Now before moving forward to the prompts given by ChatGPT, we need to explain the formula.

The text within the brackets is what the AI generates randomly or according to the parameters we have defined, whereas the text outside the brackets is fixed in the prompt.

Portrait of a (male or female): “Portrait” is the fixed part, but AI randomly chooses a female or male in the formula.

(5 adjectives): The AI will pick 5 random adjectives, which can be anything like sociable, humorous, aggressive, happy, etc.

(1 Verb): AI will describe the subject with one verb.

Close up: We use the phrase “Close up” as we want portrait photography.

Using (camera model): Here the AI picks a camera model at random from brands such as Nikon, Fuji, Canon, etc.

Lens Focal Length (35mm, 55mm or 90mm): The distance in millimetres between the optical centre of the lens and the camera sensor when the lens is focused at infinity. In simple words, the higher the focal length, the more zoom you’ll have, and the lower the focal length, the wider the shot you get. In street photography, a 35mm or 55mm focal length is the sweet spot. Photographers use a 90mm focal length in wedding shots, for example, when they need to take a picture from a distance.
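
To make the "higher focal length means a narrower shot" idea concrete, here is a small sketch that computes the horizontal angle of view for the three focal lengths in the formula. It assumes a full-frame sensor 36mm wide and a lens focused at infinity; neither assumption comes from the post, and the function name is just for illustration.

```python
import math

# Assumption (not from the post): a full-frame sensor, 36 mm wide.
SENSOR_WIDTH_MM = 36.0


def horizontal_angle_of_view(focal_length_mm: float,
                             sensor_width_mm: float = SENSOR_WIDTH_MM) -> float:
    """Horizontal angle of view in degrees for a lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# The three focal lengths offered in the prompt formula.
for focal in (35, 55, 90):
    print(f"{focal}mm -> {horizontal_angle_of_view(focal):.1f} degrees")
```

Under these assumptions the view narrows from roughly 54 degrees at 35mm to about 23 degrees at 90mm, which is why the longer lens suits shots taken from a distance.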

(Studio, soft or cinematic) lighting: In every diffusion model, you can choose the type of lighting you want for the environment or subject. For this formula, we are asking the AI to randomise between 3 types of lighting.

Aperture (1.2, 1.4 or 2.2): Aperture is how wide the lens’s diaphragm opens to let light in. In simple terms, the lower the number, the wider the aperture, meaning more light can pass in. For example, when you shoot street photography at night using an aperture of 1.2 (also called f-stop 1.2), you can shoot a handheld photograph with minimal noise compared to photos taken at narrower apertures (higher f-stop numbers). The f-stop of a lens can go from as low as about 0.9 to as high as 22. The wider the maximum aperture (the lower the smallest f-stop number), the more expensive the lens, and vice versa. For night street photography, use an f-stop of 1.2 or 2.8 if you are shooting handheld. For daytime, you can choose anything depending on shutter speed, the composition of your shot and the type of photography.
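
As a rough illustration of how much the f-stop number matters, the sketch below compares two apertures. It relies on the standard relationship that the light admitted is proportional to 1 divided by the square of the f-number (at the same shutter speed); the function names are made up for this example.

```python
import math


def light_ratio(f_number_a: float, f_number_b: float) -> float:
    """How many times more light f_number_a admits than f_number_b.

    Light gathered is proportional to 1 / N^2 for f-number N,
    assuming the same shutter speed and scene.
    """
    return (f_number_b / f_number_a) ** 2


def stops_between(f_number_a: float, f_number_b: float) -> float:
    """Difference in exposure stops; one stop is a doubling of light."""
    return math.log2(light_ratio(f_number_a, f_number_b))


print(f"f/1.2 admits {light_ratio(1.2, 2.8):.1f}x the light of f/2.8")
print(f"That is about {stops_between(1.2, 2.8):.1f} stops")
```

In other words, f/1.2 lets in more than five times the light of f/2.8, which is the headroom that keeps handheld night shots clean.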

Bokeh: Bokeh is the visual quality of the out-of-focus areas of the photograph. If you want a dreamy, surreal and smooth bokeh, use a low f-stop (a wide aperture).

Day or night: We let the AI set the time of day for the photo.
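
The link between f-stop and background blur can be checked with the standard depth-of-field formulas. This is only a sketch: it assumes a full-frame circle of confusion of 0.03mm and a subject 2 metres away, both chosen for illustration, and depth of field is a proxy for how much of the scene falls out of focus rather than a measure of bokeh quality itself.

```python
import math

# Assumption (not from the post): circle of confusion for a full-frame sensor.
CIRCLE_OF_CONFUSION_MM = 0.03


def depth_of_field_mm(focal_mm: float, f_number: float, subject_mm: float,
                      coc_mm: float = CIRCLE_OF_CONFUSION_MM) -> float:
    """Total depth of field in millimetres (infinite beyond the hyperfocal distance)."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    if subject_mm >= hyperfocal:
        return math.inf
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near


# A 55mm lens with the subject 2 m away, at a wide and a narrow aperture.
for n in (1.4, 8):
    print(f"f/{n}: about {depth_of_field_mm(55, n, 2000):.0f} mm in focus")
```

With these numbers, only around 10cm is in focus at f/1.4 versus more than half a metre at f/8, so the lower f-stop leaves far more of the background blurred.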

Street photography: This keyword in the prompt is constant since we want street photography samples.

(1 location around the world): The AI here will select one location across the globe randomly.

32K: You can put 4K, 8K, 16K or 32K. We use these resolution labels in prompts because they signal the level of detail we want in the output.

Sharp focus: This keyword is preferred when we are generating facial close-ups so that all the details in focus are sharp.

(Unreal 5 or Octane) Render: These keywords name 3D rendering engines whose look diffusion models can imitate. For this post, we will stick to Unreal Engine 5 or Octane Render only.

Photo-realistic: A keyword we usually use when we want realism, no matter the subject.
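
To show how the fixed and bracketed parts of the formula fit together, here is a minimal Python sketch that fills in the brackets at random. It is not how ChatGPT works internally, and the option pools (adjectives, verbs, cameras, locations) are hypothetical examples drawn from the sample prompts in this post; in the actual workflow ChatGPT makes these choices.

```python
import random

# Hypothetical option pools for illustration; ChatGPT chooses these freely.
ADJECTIVES = ["confident", "athletic", "charismatic", "friendly", "relaxed",
              "elegant", "mysterious", "pensive", "bold", "serene"]
VERBS = ["standing", "laughing", "reading", "gazing", "singing"]
CAMERAS = ["Canon EOS R5", "Nikon D850", "Sony A7R IV", "Fujifilm GFX 100"]
LOCATIONS = ["New York City", "Paris", "Sydney", "Tokyo", "London", "Venice"]


def build_prompt(rng: random.Random) -> str:
    """Fill the bracketed slots of the formula; text outside brackets stays fixed."""
    subject = rng.choice(["male", "female"])
    adjectives = ", ".join(rng.sample(ADJECTIVES, 5))  # 5 distinct adjectives
    verb = rng.choice(VERBS)
    camera = rng.choice(CAMERAS)
    focal = rng.choice([35, 55, 90])
    lighting = rng.choice(["studio", "soft", "cinematic"])
    aperture = rng.choice([1.2, 1.4, 2.2])
    time_of_day = rng.choice(["day", "night"])
    location = rng.choice(LOCATIONS)
    render = rng.choice(["Unreal 5", "Octane"])
    return (
        f"Portrait of a {subject}, {adjectives}, {verb}, close up, "
        f"using {camera}, lens focal length {focal}mm, {lighting} lighting, "
        f"aperture {aperture}, bokeh, {time_of_day} street photography, "
        f"{location}, 32k, sharp focus, {render} render, photo-realistic"
    )


# A seeded generator makes the output repeatable between runs.
print(build_prompt(random.Random(7)))
```

The advantage of asking ChatGPT instead of a script like this is that it can pick adjectives, verbs and locations that suit each other, rather than drawing blindly from fixed lists.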

2. Creating Midjourney Photos based on Prompts

We will review each prompt generated by ChatGPT and show you some samples MidJourney produces from those prompts. MidJourney is not perfect: there are some concepts I could not get it to generate correctly, no matter what tweaks and hacks I made to the prompts. Read to the end of the article to see the limitations of MidJourney with respect to the subject matter.

Prompt 1: Portrait of a male, confident, athletic, charismatic, friendly, relaxed, standing, close up, using Canon EOS R5, lens focal length 35mm, studio lighting, aperture 1.2, bokeh, day street photography, New York City, 32k, sharp focus, Unreal 5 Render, Photo-realistic.

The photos are strikingly realistic, with intricate facial and hair details and soft, dreamy bokeh.

Prompt 2: Portrait of a female, elegant, mysterious, intelligent, poised, artistic, dancing, close up, using Nikon D850, lens focal length 55mm, soft lighting, aperture 1.4, bokeh, night street photography, Paris, 32k, sharp focus, Octane render, Photo-realistic.

Notice how the AI ignored the word “dancing” used in the prompt but got everything else right. The noise levels in these photos are unbelievably low. You would need some post-processing to get this level of extremely low to no noise and bokeh when taking a photo with an actual DSLR.

Prompt 3: Portrait of a male, rugged, adventurous, pensive, kind, determined, running, close up, using Sony A7R IV, lens focal length 90mm, cinematic lighting, aperture 2.2, bokeh, day street photography, Sydney, 32k, sharp focus, Unreal 5 render, Photo-realistic.

These photos are created based on a Sony A7R IV camera, which costs about 3,200 USD. However, there is no Sony lens with a focal length of 90mm and an aperture of 2.2. Officially, Sony only has a 90mm macro lens with an aperture of 2.8. This is the magic of AI art generation: combining and creating photos based on specifications that don’t really exist in the market!

Prompt 4: Portrait of a female, vivacious, spirited, carefree, bold, expressive, laughing, close up, using Fujifilm GFX 100, lens focal length 35mm, studio lighting, aperture 1.2, bokeh, night street photography, Tokyo, 32k, sharp focus, Octane render, Photo-realistic.

Notice how heavily the AI leaned on the word “vivacious”, which means “attractively lively and animated (typically used of a woman).” Also notice the ethnicity of the subject, which follows from the word “Tokyo” in the prompt.

Prompt 5: Portrait of a male, contemplative, intellectual, serene, focused, inquisitive, reading, close up, using Panasonic Lumix S1R, lens focal length 55mm, soft lighting, aperture 1.4, bokeh, day street photography, London, 32k, sharp focus, Unreal 5 render, Photo-realistic

For this prompt, one of the images came out black and white, which still looked good. However, if you prefer colourful images, you can tweak the prompt and add keywords like “vibrant” and “colourful”.
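
If you tweak prompts often, a tiny helper can append extra keywords for you. The sketch below is only an illustration with a made-up function name; it treats the prompt as a comma-separated list and skips keywords that are already present.

```python
def add_keywords(prompt: str, keywords: list[str]) -> str:
    """Append style keywords to a comma-separated prompt, skipping duplicates."""
    base = prompt.rstrip().rstrip(".")  # drop a trailing full stop, if any
    existing = {part.strip().lower() for part in base.split(",")}
    extra = [k for k in keywords if k.lower() not in existing]
    return ", ".join([base, *extra])


prompt = "Portrait of a male, contemplative, reading, close up, Photo-realistic"
print(add_keywords(prompt, ["vibrant", "colourful"]))
```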

Prompt 6: Portrait of a female, ethereal, enigmatic, dreamy, gentle, whimsical, gazing, close up, using Nikon Z7 II, lens focal length 90mm, cinematic lighting, aperture 2.2, bokeh, night street photography, Venice, 32k, sharp focus, Octane Render, Photo-realistic

Look at the details! And that bokeh is a killer. However, if you prefer less bokeh, you can play around and increase the depth of field, which means narrowing the aperture (using a higher f-stop number).

Prompt 7: Portrait of a male, energetic, creative, engaging, humorous, sociable, singing, close up, using Canon EOS-1D X Mark III, lens focal length 35 mm, studio lighting, aperture 1.2, bokeh, day street photography, San Francisco, 32k, sharp focus, Unreal 5 render, Photo-realistic.

These images are perfect, though to get image 2 we had to run several variations before landing on the shot we wanted. You always have the option to create further variations of each image in MidJourney. However, sometimes you need to change or regenerate the prompt to get the desired result.

3. Mistakes in Midjourney Images

MidJourney, like any other diffusion model, will make some mistakes in its pictures. Pay attention to all the details so that you do not miss them. One of the common mistakes in MidJourney occurs in photos of people with umbrellas. It seems almost impossible to get an umbrella image right. Even after regenerating prompts and adding a phrase like “holding an umbrella in hand”, there was still an error in all the umbrella photos.

Other common mistakes involve hands. Although one of the significant improvements in MidJourney v5 is better hands, you can still see plenty of pictures with the same mistake. Hands are missing, distorted, or just not right. You can see some examples of hand mistakes below.

As you can see in these examples, hands (or parts of them) are missing or the fingers are not right. We could not get the hands correct in many images, especially those of couples holding hands.

4. Photographic Realism in a Futuristic World

As mentioned before, we can tweak prompts to change the image as desired. We can change camera setups, bokeh, colourful or black & white variations, ethnicity, skin colour, hair type, etc. We can also change the prompt to place the subject in a futuristic or imaginary environment. Check the images below, which we created in a futuristic imaginary world/planet.


To conclude, AI is as good as your imagination. Whether you use ChatGPT or not, your creativity and knowledge will create beautiful street portrait photography images. There is a massive difference between defining a formula for ChatGPT and simply asking it to create prompts. Even with ChatGPT, you have to train it to generate the prompts. There are many AI platforms, and MidJourney is one of the best ones available as of now. But you can try prompts on different platforms and see the variations.

Here you can download all the images created for this video in MidJourney.
