If you need a break from bank failure news, here’s something refreshing. OpenAI’s GPT-4 was released yesterday. The new model is the successor to GPT-3.5-turbo and promises to produce “safer” and “more useful” responses. But what does that mean exactly? And how do the two models compare?
We’ve broken down six things to know about GPT-4.
Processes both image and text input
GPT-4 accepts images as inputs and can analyze the contents of an image alongside text. As an example, users can upload a picture of a group of ingredients and ask the model what recipe they can make using the ingredients in the picture. Additionally, visually impaired users can screenshot a cluttered website and ask GPT-4 to decipher and summarize the text. Unlike DALL-E 2, however, GPT-4 cannot generate images.
For banks and fintechs, GPT-4’s image processing could prove useful for helping customers who get stuck during the onboarding process. The bot could help decipher screenshots of the user experience and provide a walk-through for confused customers.
Less likely to respond to inappropriate requests
According to OpenAI, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content. It is also 40% more likely to produce factual responses than GPT-3.5.
For the financial services industry, this means that using GPT-4 to power a chatbot is less risky than before. The new model is less likely to produce responses that create ethical or security problems.
Handles around 25,000 words per query
OpenAI doesn’t measure its inputs and outputs in word count or character count. Rather, it measures text based on units called tokens. While the word-to-token ratio is not straightforward, OpenAI estimates that GPT-4 can handle around 25,000 words per query, compared to GPT-3.5-turbo’s capacity of 3,000 words per query.
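As a rough illustration of the word-to-token arithmetic, the Python sketch below converts a token budget into an approximate word count. It assumes a rule of thumb of about 0.75 English words per token, and context windows of 32,768 tokens for GPT-4's largest variant and 4,096 tokens for GPT-3.5-turbo. Neither figure comes from this article, the model labels are only dictionary keys, and `estimate_words` is a hypothetical helper name, so treat the output as a ballpark rather than an exact limit.

```python
# Rough rule of thumb (an assumption, not an exact conversion):
# one token is about 0.75 English words.
WORDS_PER_TOKEN = 0.75

# Assumed context window sizes in tokens; these are not stated in the article.
CONTEXT_TOKENS = {
    "gpt-4-32k": 32_768,
    "gpt-3.5-turbo": 4_096,
}


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token budget into an approximate English word count."""
    return round(tokens * words_per_token)


# Print the approximate word capacity for each assumed context window.
for model, tokens in CONTEXT_TOKENS.items():
    print(f"{model}: {tokens} tokens is roughly {estimate_words(tokens)} words")
```

Under those assumptions, the results land close to the 25,000-word and 3,000-word estimates cited above. Actual counts vary with language, formatting, and vocabulary.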
This increase enables users to carry on extended conversations, create long-form content, search text, and analyze documents. For banks and fintechs, the larger input limit could prove useful when searching and analyzing documents for underwriting purposes. It could also be used to flag compliance errors and fraud.
Scores higher on academic tests
While ChatGPT scored in the 10th percentile on the Uniform Bar Exam, GPT-4 scored in the 90th percentile. Additionally, GPT-4 did well on other standardized tests, including the LSAT, GRE, and some of the AP tests.
While this specific capability won’t come in handy for banks, it signifies something important. It highlights the AI’s ability to retain and reproduce structured knowledge.
Already in use
While GPT-4 was just released yesterday, it is already being employed by a handful of organizations. Microsoft, for example, has been using GPT-4 to power its Bing chatbot since the chatbot launched in February. Be My Eyes, a technology platform that helps users who are blind or have low vision, is using the new model to analyze images.
The model is also being used in the financial services sector. Stripe is currently using GPT-4 to streamline its user experience and combat fraud. And Morgan Stanley is leveraging GPT-4 to organize its knowledge base. “You essentially have the knowledge of the most knowledgeable person in Wealth Management, instantly. We believe that is a transformative capability for our company,” said Morgan Stanley Wealth Management Head of Analytics, Data & Innovation Jeff McMillan.
Still messes up
One very human-like aspect of OpenAI’s GPT-4 is that it makes mistakes. In fact, OpenAI’s technical report about GPT-4 says that the model is sometimes “confidently wrong in its predictions.”
The New York Times provides a good example of this in its recent piece, “10 Ways GPT-4 Is Impressive but Still Flawed.” The article describes a user who asked GPT-4 to help him learn the basics of the Spanish language. In its response, GPT-4 offered a handful of inaccuracies, including telling the user that “gracias” was pronounced like “grassy ass.”