AI really struggles with direction. And it can't count.
Months ago, when I was playing with ideas about a cover for my crime novel, "Dark Water," I described the Vimy Memorial Bridge that spans the Rideau River in Ottawa's south end and wanted ChatGPT to create the image such that there was a woman in the river, floating face-down.
The AI tool created a random bridge, crossing the Ottawa River, below Parliament Hill. Buildings in the background didn't match Ottawa's downtown core, but most importantly, the woman was floating face-up in the river.
I wrote about this struggle with AI when I was still just sussing out my story.
It took some time, between breaks in my writing, to come up with an image that was passable, though I wasn't sold on it as a possible cover when it came time to publish my book. And every once in a while, when I took a break from writing, I'd experiment with other covers.
When I finished writing the first draft of "Dark Water," I wanted to wait a few days before reading what I had. I wrote the final sentence in the epilogue on a Thursday and started the reading and editing process on the following Monday.
On that Friday, I thought I'd have another go at designing a cover, using ChatGPT. Here's the information that I gave it:
Create a book cover for my murder mystery using the bridge in the attached photo. The angle of view would be below the water, underneath the bridge, looking upward. In the water, floating on the surface, is a woman who is dressed in blue jeans and a black jacket. She has long, red hair that partially hides her face. She is faced down in the water.
I gave ChatGPT the following photo as an attachment:
Here's what the AI came back with:
It wasn't even close to what I had described. First, it made up a title that I hadn't provided (and the kerning sucks). Also, the woman is lying on the shore or on a dry river bed (I admit that the photo I provided shows the water at a low level). I gave ChatGPT more instructions.
Remove the title. The angle of view should be underwater, looking up at the woman. It's morning, so light shines down, placing the woman in a partial silhouette.
The result was better, but Vimy Bridge now had only two arches, and they were rounder. The woman was fine, but for a body floating at the surface, her legs seemed to be too low (luckily, I've never seen a dead person floating in water, so I don't know how accurate that angle is).
Once more, I gave ChatGPT more instruction:
Much better. Make three arches on the bridge and make the angle of view such that the audience is below the woman, looking up at her. The bridge is visible through the water. Have the woman laying more prone on the surface, such that her legs are closer to the water surface. She should also be wearing shoes.
Okay, so the victim now had shoes. And was more prone. But the bridge still had two arches. Also, for the character of Emily Fraser, I imagined her hair much longer.
I decided to go into a much longer, more detailed description of my image idea, so I wrote this:
Create a realistic, cinematic-style murder mystery book cover for Dark Water by Ross Brown. Underwater view, looking upward toward the water’s surface and a large steel bridge with three arches. The arches are elongated (more stretched than semi-circular) and clearly visible through the water, above. Morning light streams down through the water, casting soft beams that illuminate the scene. Floating face-down on the surface is a woman with very long, flowing red hair that spreads out in the water. She is wearing blue jeans, a black jacket, and shoes. Her back and most of her legs are close to the surface so they catch the light, creating a partial silhouette effect. The camera angle is from directly below the woman, looking up toward her and the bridge above. The water is a deep, clear blue-green with gentle ripples and light shimmer on the surface. The cover should create a tense, mysterious, and haunting mood, blending realism with cinematic drama.
There, that ought to do it. I pressed Enter and got the following message:
You’ve hit the free plan limit for image generation requests, so I can’t create that cover right now.
Your limit will reset in about 23 hours — once it does, you can paste the exact prompt you’ve written and the AI should render your scene exactly as described.
Ffff... !
Fine, I thought. I had been spending too much time on this and it wasn't productive. I moved on to other things around the house, telling myself I'd try again sometime on Saturday night.
On Saturday, I worked on my Peru video, which ate up most of the afternoon and evening. But I did take a break, late in the evening, and posed the same query to ChatGPT. It responded,
I still can’t generate it yet — your free plan image limit hasn’t reset. Once it does, you’ll be able to paste that Dark Water prompt directly and get the cover.
If you want, I can keep your final prompt and have it ready so you can drop it in the instant your reset hits. That way you don’t lose time retyping everything.
Do you want me to store it here for you in a clean, copy-ready format?
Bloody hell. I told the AI tool to store the prompt and told myself that I'd try again on Monday.
At the start of the week, I prompted ChatGPT once again, pasting its clean copy into the prompt. I still wasn't able to get my image. Nor on Tuesday. Nor Thursday.
On Friday, I received an e-mail from Bell, which informed me that it was offering me a free one-year subscription to the AI-powered search engine, Perplexity Pro. It was a $249 value.
I thought, what the hell, why not, and downloaded the promo code. I set a reminder for the beginning of August next year to cancel the subscription.
By then, I thought, ChatGPT might let me generate an image again.
I copied the prompt from ChatGPT and entered it into Perplexity, with a few changes and a new image of the Vimy Bridge. I also gave a suggestion for the font style for the title. Here's what the new AI tool gave me:
Honestly, it wasn't bad for a first attempt. But Fraser's legs were still too low for my liking, she seemed to be much deeper in the water, and she was facing away from the viewer. Plus, it seemed as though Fraser had two right hands. Also, I didn't imagine Fraser's hair in a ponytail, so I added the following suggestion to the AI tool:
Place the bridge such that it doesn't look like it's underwater and is diffused by the water. Make the woman more prone, as she is floating on the surface of the water. She should be facing the camera and her hair is not tied back, but is floating wildly about her head.
Here is what Perplexity gave me for its second attempt.
Obviously, there are a couple of issues with this image. Vimy Bridge has only two arches, just as ChatGPT had given it. Fraser is floating on her back with her face visible, and is viewed from the side. Her hair is also shorter than I wanted for my character. And what happened to the title?
Admittedly, I didn't specify that Fraser's face should be hidden, so I added a new prompt:
Turn the woman so that she is face-down in the water. Make her hair longer. The back of her head touches the water surface, such that someone who sees her from above the water believes that she could be an animal. Also, you have changed the bridge such that it is now missing an arch: there should be three. And place the title and author as described, above.
Everything about this image is perfect, except for the subject. Fraser is not at the water surface and she is twisted in a bizarre way: her legs and feet are facing in different directions such that it's hard to tell which way she is facing, and she has two right legs. You can't see all of the arches of the bridge but from this angle, it doesn't matter.
Growing frustrated, I tried one more time:
Keep everything in the last image the same but start over with the woman: she is floating face-down at the surface. She has very long, flowing red hair that spreads out around her head. She is wearing blue jeans, a black jacket, and shoes. Her back and most of her legs are close to the surface so they catch the light, creating a partial silhouette effect. The camera angle is from directly below the woman, looking up toward her and the bridge above.
AI struggles with direction and can't count. Fraser is facing away from the viewer. Also, the bridge doesn't have quite the right number of arches.
Last chance:
Keep everything the same but move the woman so that her head is toward the camera, her body trailing off behind.
I give up.
When and if my book gets published, I'll let the publisher worry about the cover. I'll have some mock-ups prepared but I can't rely on AI to give me a good cover.
I don't know if I'll keep Perplexity, though I have ideas for more Calloway and Hayes books, so I'll see about using the tool to help me with enhanced searches and whether it's any better than ChatGPT.
Happy Monday!