The most popular AI technology foundation got a major upgrade Tuesday with OpenAI’s release of GPT-4, now available in the premium version of the ChatGPT chatbot.
GPT-4 can generate much longer strings of text and respond when people feed it images, and it’s designed to do a better job avoiding the artificial intelligence pitfalls visible in the earlier GPT-3.5, OpenAI said Tuesday. For example, on the bar exams that lawyers must pass to practice law, GPT-4 scores in the top 10% compared with the bottom 10% for GPT-3.5, the AI research company said.
GPT stands for Generative Pretrained Transformer, a reference to the fact that it can generate text itself – now up to 25,000 words with GPT-4 – and that it uses an AI technology called transformers that Google pioneered. It’s a type of AI called a large language model, or LLM, that’s trained on huge amounts of data collected from the internet, learning mathematically to recognize patterns and reproduce styles. Human overseers review results to steer GPT in the right direction, and GPT-4 has more of this feedback.
OpenAI has made GPT available to developers for years, but ChatGPT, which debuted in November, offered an easy interface for regular people to use. This resulted in an explosion of interest, experimentation and concern about the downsides of the technology. It can do everything from generating programming code and answering exam questions to writing poetry and providing basic facts. It is remarkable, if not always reliable.
ChatGPT is free, but it can falter when demand is high. In January, OpenAI began offering ChatGPT Plus for $20 per month, with guaranteed availability and, now, GPT-4 as its foundation. Developers can join a waiting list to get their own access to GPT-4.
GPT-4 improvements
“In casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference emerges when the complexity of the task reaches a sufficient threshold,” said OpenAI. “GPT-4 is more reliable, more creative and can process much more nuanced instructions than GPT-3.5.”
Another major advance in GPT-4 is its ability to accept input that includes both text and photos. In OpenAI’s example, the chatbot is asked to explain a joke photo showing a bulky, decades-old computer cable plugged into the tiny Lightning port of a modern iPhone. This ability also lets GPT-4 take tests that aren’t purely textual, though the feature isn’t yet available in ChatGPT Plus.
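Image input wasn’t exposed to the public at launch, but for developers curious what multimodal prompting could look like, here’s a minimal, hypothetical sketch using OpenAI’s Python library. The model name, the image URL, and the availability of a vision-capable model on your account are all assumptions for illustration, not details from OpenAI’s announcement.

```python
# Hypothetical sketch of multimodal input (image input wasn't publicly
# available at GPT-4's launch). Assumes the openai Python package (v1+)
# and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable chat model works here
    messages=[
        {
            "role": "user",
            # A single user message can mix text parts and image parts.
            "content": [
                {"type": "text", "text": "Explain why this image is funny."},
                {"type": "image_url",
                 # Placeholder URL standing in for OpenAI's cable-joke photo.
                 "image_url": {"url": "https://example.com/cable-joke.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```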
Another improvement is better performance avoiding AI problems like hallucinations – fabricated responses, often presented with as much apparent authority as answers the AI gets right. GPT-4 is also better at thwarting attempts to get it to say the wrong thing: “GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations,” OpenAI said.
GPT-4 also adds new “steerability” options. Users of large language models today often must engage in elaborate prompt engineering, learning how to embed specific cues in their prompts to get the right sort of responses. GPT-4 adds a system instruction option that lets users set a specific tone or style, for example programming code or a Socratic tutor: “You are a tutor that always responds in the Socratic style. You never give the student the answer, but always try to ask just the right question to help them learn to think for themselves.”
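For developers, that system instruction is simply the first message in an API call. Here’s a minimal sketch reusing OpenAI’s Socratic-tutor example, assuming the openai Python package and an API key with GPT-4 access; the student’s question is an invented placeholder.

```python
# Minimal sketch: steering GPT-4's tone with a system message.
# Assumes openai>=1.0 and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message sets the persona for every reply that follows;
        # this one is OpenAI's Socratic-tutor example quoted above.
        {"role": "system", "content": (
            "You are a tutor that always responds in the Socratic style. "
            "You never give the student the answer, but always try to ask "
            "just the right question to help them learn to think for themselves."
        )},
        # Hypothetical user question for illustration.
        {"role": "user", "content": "How do I solve 3x + 5 = 14?"},
    ],
)
print(response.choices[0].message.content)
```

With the system message in place, the model should answer the algebra question with a guiding question rather than “x = 3,” without the user having to re-specify the tutoring style in every prompt.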
“Stochastic parrots” and other problems
OpenAI acknowledges significant shortcomings that persist with GPT-4, though it also touts progress in avoiding them.
“It can sometimes make simple errors of reasoning… or be overly gullible in accepting blatant false statements from a user. And sometimes it can fail at hard problems the same way humans do, such as introducing security vulnerabilities into the code it produces,” OpenAI said. In addition, “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake.”
Large language models can deliver impressive results, seeming to understand an enormous range of subjects and to converse in human-sounding if somewhat stilted language. Fundamentally, though, LLM AIs don’t really know anything. They’re just able to string words together in statistically very sophisticated ways.
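To make “string words together statistically” concrete, consider a deliberately tiny toy: a bigram model that knows nothing except how often one word followed another in its training text. GPT uses transformer networks with billions of parameters rather than word-pair counts, so this is an illustration of the principle, not OpenAI’s method.

```python
# Toy illustration of statistical text generation: a bigram model with no
# understanding at all, only counts of which word followed which.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat saw the dog".split()

# Count how often each word follows each other word in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(start, length=8):
    words = [start]
    for _ in range(length):
        counts = following.get(words[-1])
        if not counts:
            break  # dead end: this word never appeared mid-sentence
        # Sample the next word in proportion to how often it followed.
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the mat and the cat"
```

The output can look locally fluent, yet the model plainly understands nothing; scale that idea up enormously and you get text that is far more convincing but still, at bottom, pattern prediction.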
This statistical but fundamentally somewhat hollow approach to knowledge led researchers, including former Google AI researchers Emily Bender and Timnit Gebru, to warn of the “dangers of stochastic parrots” posed by large language models. Language model AIs tend to encode the biases, stereotypes, and negative sentiment present in their training data, and researchers and others using these models tend to “confuse performance gains with actual natural language understanding.”
Sam Altman, Chief Executive of OpenAI, acknowledges the issues, but is generally pleased with the progress being made with GPT-4. “It’s more creative than previous models, it hallucinates significantly less, and it’s less biased. It can pass a bar exam and score a 5 on several AP exams,” Altman tweeted Tuesday.
One concern about AI is that students will use it to cheat, for example when answering essay questions. It’s a real risk, though some educators are actively embracing LLMs as a resource, much as they embraced search engines and Wikipedia. Plagiarism detection companies are adapting to AI by training their own detection models. One such company, Crossplag, said Wednesday that after testing about 50 documents GPT-4 generated, “our accuracy rate exceeded 98.5%.”
OpenAI, Microsoft and Nvidia partnership
OpenAI got a big boost when Microsoft said in February that it’s using GPT technology in its Bing search engine, including chat features similar to ChatGPT’s. On Tuesday, Microsoft said it’s using GPT-4 for that Bing work. Together, OpenAI and Microsoft pose a major search threat to Google, but Google has its own large language model technology, including a chatbot called Bard that it’s testing privately.
Also on Tuesday, Google announced it’ll begin limited testing of its own AI technology to boost the writing of Gmail emails and Google Docs word processing documents. “With your collaborative AI partner, you can continue to refine and edit and get more suggestions as needed,” Google said.
That wording echoes Microsoft’s “co-pilot” positioning of AI technology. Viewing it as a tool for human-directed work is a common stance, given the technology’s problems and the need for careful human oversight.
Microsoft uses GPT technology both to evaluate the searches people type into Bing and, in some cases, to offer more elaborate, conversational responses. The results can be much more informative than those of earlier search engines, but the more conversational interface, available as an option, has had problems that make it look unhinged.
To train GPT, OpenAI used Microsoft’s Azure cloud computing service, including thousands of Nvidia’s A100 graphics processing units, or GPUs, pooled together. Azure can now use Nvidia’s new H100 processors, which include specific circuits to speed up AI transformer computations.
AI chatbots everywhere
Another major language model developer, Anthropic, also unveiled an AI chatbot called Claude on Tuesday. The company, which counts Google as an investor, opened a waiting list for Claude.
“Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability,” Anthropic said in a blog post. “Claude can help with use cases including summarization, search, creative and collaborative writing, Q&A, coding and more.”
It’s one of a growing crowd. Chinese search and technology giant Baidu is working on a chatbot called Ernie Bot. Meta, the parent company of Facebook and Instagram, consolidated its AI business into a larger team and plans to build more generative AI into its products. Even Snapchat is joining in with a GPT-based chatbot called My AI.
Expect more refinements in the future.
“We’ve been doing the initial training of GPT-4 for a while, but it’s taken a lot of time and a lot of work to feel ready to release it,” Altman tweeted. “We hope you enjoy it and we appreciate feedback on its shortcomings.”
Editor’s note: CNET is using an AI engine to create some personal finance explainers that are edited and fact-checked by our editors. For more, see this post.