ChatGPT, Google Bard, Claude 2, Bing Chat, Llama 2: which is the most accurate?

Kelvin Zhao
10 min read · Aug 13, 2023
Photo by GR Stocks on Unsplash

As the popularity of large language models (LLMs) has grown, I’ve begun to ponder their impact on public perception. Setting hallucinations aside, consider the consequences of minor nuances in answers. What if these small shifts alter the entire narrative? As reliance on LLMs grows, how readily will their outputs be accepted as truth, especially as people become more dependent and perhaps even complacent?

I was reminded of a Chinese debate variety show I watched a few years ago. The topic was intriguing: “If there’s a pill that grants you all the knowledge in the world, would you take it?” While many participants deliberated the advantages of possessing such vast knowledge, the opposing team presented a compelling counterargument. Amid discussions of the benefits and the luxury of time saved from not having to learn, they posed a critical question: How do we ensure the knowledge is unbiased? Can we accept everything as absolute truth, and who decides what knowledge this all-encompassing pill contains?

This dilemma becomes increasingly pertinent with the rise of LLMs. These models, in essence, have become the proverbial pill. But the information they provide can vary significantly depending on who fine-tunes them and the data they’re trained on.
