Community Comment: Part 38 - Choose your words wisely, not just for LLM prompts but for life in general

  • LLMs respond differently to "What are the top 5 running shoe brands?"
  • This is because "top" can mean different things depending on context
  • For example, the context can be "highest sales revenue" or "greatest popularity"
  • If the context is "best" in terms of recommendation, it gets even more complex

These are the comments I provided in reaction to a community discussion thread.

VP of SEO at Price Comparison Firm:

I asked a few LLMs for the top 5 running shoe brands.

1️⃣ Brand Bias:
Gemini-2.0-Flash does not like Nike or Adidas, but is a huge Hoka fan.

Claude-3-Opus does not recommend Adidas either.

Both of these models recommend Saucony much more than any of the other models!

Mistral-Large recommended New Balance the least of all models. And it was the only one to recommend Under Armour and Merrell.

2️⃣ Consistency & Variations
Gemini-2.0-Flash was the only model to reply with the same answer every time.

Mistral-Large was the only model that was not consistent in its number 1 recommendation. All other models always replied with the same brands in positions 1 and 2.

Mistral-Large was also the only model to recommend a total of 8 different brands.

Gemini-2.0-Flash and GPT-4.5-Preview always recommended the same 5 brands.

The other models recommended a total of 6 or 7 brands.

3️⃣ Interpretation
While the exact result will vary based on your prompt(s), and potentially your account history, it is interesting to see how much the models disagree.

The difference in results means you should probably start tracking the visibility of your brand in LLMs.

4️⃣ Methodology
Prompts were run against the API. US-based. Each prompt was run 10 times.

No grounding or web search was used, so the replies come from the foundation model itself. In many real-world scenarios, such prompts would be answered with RAG.
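
As an illustration of how repeated runs like these could be summarized, here is a minimal Python sketch. It is not the tooling used for the report. The function name `summarize_runs`, the input shape (one ranked, non-empty list of brands per run), and the sample data are all assumptions made for this example. It counts how often each brand appears, how many distinct brands a model named in total, whether the number 1 recommendation stayed the same, and whether every run returned an identical answer.

```python
from collections import Counter


def summarize_runs(runs):
    """Summarize repeated answers from a single model.

    `runs` is assumed to be a list of ranked brand lists, one per prompt
    run, each non-empty and without duplicate brands inside a run.
    """
    # Count in how many runs each brand was mentioned.
    mentions = Counter(brand for run in runs for brand in run)
    return {
        "mentions": dict(mentions),
        # Total number of different brands named across all runs.
        "distinct_brands": len(mentions),
        # True if the same brand was ranked first in every run.
        "stable_first_place": len({run[0] for run in runs}) == 1,
        # True if every run returned the same brands in the same order.
        "identical_every_run": len({tuple(run) for run in runs}) == 1,
    }


# Made-up sample data for illustration only; not results from the thread.
sample_runs = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand A", "Brand B", "Brand D"],
    ["Brand A", "Brand C", "Brand B"],
]

summary = summarize_runs(sample_runs)
print(summary)
```

Running the same summary per model would make it possible to compare models on the measures discussed in the thread, such as the total number of brands named and the stability of the first position.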

Thanks to Zongo AI (Gonzalo Lorca) for allowing me to run a couple of ad-hoc reports like this one against 50+ models via his infrastructure.

Gfesser:


What does "top" mean? It can mean many different things, such as highest sales revenue or greatest popularity. To ask a question like this, not just of an LLM but in life in general, your audience first needs to understand the definition. In your post, you seem to associate "top" with "recommendation", so perhaps you're looking for the "best" running shoe brands? What does "best" mean? As a lifelong runner who ran as a school athlete for ten years, I assure you that the choice of a particular running shoe brand or model needs to be tailored to you specifically: your weekly mileage, your weight, your level of pronation, the width of your feet, your favorite running surfaces, etc. I learned long ago, for example, that Nike shoes are far too narrow for my feet, and I switched from Asics to Altra about 8 years ago, but Altra isn't even on this list. English words often have multiple meanings: choose your words wisely, and depending on the audience, make sure you define them to help ensure there's no miscommunication.

VP of SEO at Price Comparison Firm:

Erik Gfesser, you are absolutely right. My prompt lacks context and is very open to interpretation.

I repeated the comparison with a more detailed prompt. Again, the models gave very different replies.

VP of SEO at Price Comparison Firm:

For the record, I would need a lot of convincing to buy anything but whatever is the newest Gel-Kayano from Asics. Those things always worked like a charm and fit me perfectly.

Gfesser:


I believe you. If Kayano works well for you, it would seem you overpronate, because it's designed to be a stability shoe. I've never personally considered Kayano for myself because neutral shoes work best for me.
