Battle of the bots

Comparing the creativity of different Large Language Models

Oct 23, 2023

Hello again and welcome back to the /promptcollective newsletter!

Today I will be testing the creativity of different Large Language Models (LLMs) by giving them the exact same task:

While there has been a lot of discussion about the question of whether LLMs can be creative at all, I haven't read much about the different levels of creativity that different models are capable of.

(prompt: photo of six different large language models having a war of ideas. model: Stable Diffusion XL via Poe.com bot)

If you think somebody else would enjoy this post, please share it!

Different LLMs?

Poe.com, a platform I use a lot, has recently started offering quite a few different LLMs. Today I will compare the following models:

- Solar-0-70b
- Google-PaLM
- fw-mistral-7b
- Llama-2-70b
- Claude-2
- GPT-4

As you can see, there is quite a wide range, from relatively small models to huge ones, and from proprietary "secret sauce" models to open-source ones. Let's see how they perform on today's test:

First up - Solar-0-70b

The next contender is Google-PaLM

And now fw-mistral-7b

More like “What is the most generic idea you can come up with?” ;-)

Now Llama-2-70b

And here comes Claude-2

And last we have GPT-4

So, a few thoughts. All in all, I am always impressed by these exercises. What I am showing you here is just a single run of a very simple and quite generic prompt, and still we (mostly) get ideas that are not bad at all. Yes, some of them sound quite generic, and yes, there is definitely a bias towards science fiction, but I have heard much worse one-liners from actual humans working as writers.

I find it really fascinating how quickly we as humans have accepted that a machine can now be creative at this level. Sometimes it's almost as if we skipped the phase of being really impressed and went directly to criticizing the generated ideas.

And that is probably the best indicator that we take this non-human creativity seriously: why else would we even bother to criticize it?

I find the science fiction bias very interesting, as it shows how certain trigger words, in this case "innovative", can prime the model to generate a certain type of output. To illustrate my point, here are two other trigger words with GPT-4:

The winner takes it all…

OK, but let's not forget that this is a competition. My personal favorite is Google-PaLM's idea about the young woman who can control the weather. I can immediately picture a Greta Thunberg-like figure who, out of sheer despair for the world, develops superpowers to equal those of the carbon lobby…

But of course I should not make this judgement alone, so who better to ask than a commissioning editor from Netflix (or at least GPT-4 prompted to act as one, and then reminded not to be sooooooo chatty):

Ha, so GPT-4 picks its own idea. I find this interesting because it shows a certain consistency: not only is this the concept the model generated, it is also the one it analytically chooses as the best.

I wanted to see if other models show the same consistency, so I asked Claude-2 the same question (with the same reminder to be concise):

Claude-2 does not pick its own idea, but the one generated by the open-source Llama-2 model. It critiques its own pitch for lacking plot and character details. Interesting!

Final thoughts

For me, having different models compete at generating ideas and then judging those ideas can be a very interesting technique in idea generation processes. And while GPT-4's reasoning capabilities are clearly above those of the other models, with Claude-2 a close second in my view, the creative spark can also come from less sophisticated models.

I will certainly be experimenting a lot more with chaining different LLMs together in the future.

Something I Loved This Week

As I am on holiday at the time of writing this blog post, and holiday for me means NO EMAIL and NO NEWS, I don't have anything super current to share with you. But for everybody interested in AI who hasn't read "God, Human, Animal, Machine" by Meghan O'Gieblyn, this is a FAT recommendation.

As always: Thank you for reading along. If you're interested in exploring all things AI in the creative industries, don't hesitate to reach out at join@promptcollective.xyz.

And if you haven't already subscribed, please do!