Seeing as GPT-3 is great at faking info, you should lean into that, says Microsoft

Microsoft is aggressively pushing OpenAI’s artificial intelligence technology into seemingly every nook and cranny of its universe.

Thanks to the Windows giant’s fusion-fueled PR engine, everyone now knows Microsoft’s Bing search engine is experimenting with using a cousin of OpenAI’s ChatGPT large language model to answer queries, and the IT titan hopes to inject this kind of machine learning into everything from Teams and Skype to Excel.

Given the billions of dollars the Windows developer has already sunk into OpenAI, and the billions more yet to come for this guess-o-tron, it makes sense that Microsoft wants to get some immediate returns on its massive investment.

The enterprise software slinger also hopes OpenAI’s sentence-predicting tech will help it trample over rivals including Google in the rapidly emerging AI search bot space.

This week, the cloud giant is trying to woo developers and data analysts to GPT-3, the latest iteration of OpenAI’s auto-regressive language model that uses deep learning to predict human-like text responses to queries, by arguing they can use it to quickly generate fake data for testing within Spark when using the Azure Synapse data analytics service.

After all, if GPT-3 is good for one thing in particular, it’s making stuff up and creating false realities.

GPT-3 “can understand text and generate new text based on that input,” Lee Stott, a principal cloud advocate manager at Microsoft, reminded us over the weekend. “By leveraging the prompts available through OpenAI, it is possible to generate a dataset that can be used for testing purposes.”

Inventing information for testing, rather than using production-grade data about actual people and things, is a fairly manual operation that not only involves collecting data but also suitably cleaning it, according to Microsoftie Thomas Costers, a cloud solution architect for data and AI. If you’re building a feature for an online banking app, say, you ideally want your developers and testers wrangling made-up account info rather than copies of people’s actual financial data, for privacy, regulatory, and ethical reasons.

In a video, Costers said he typically would search a company’s data and find datasets on the internet to generate testing data. Such data “is not perfect, it’s not clean, it doesn’t really suit your needs,” he said.

In the video, he and Stijn Wynants, a FastTrack engineer at Microsoft, demonstrated how to use GPT-3 not only to find and clean data for testing – in the demo, information about people’s restaurant reviews – but also to generate code that uses the data and ensures it works with other data already pulled together by colleagues.

“We can now just generate random test data to use in our environments – just generated it using that GPT-3 – and we can even make relational data that makes connections between those dataframes that you already have and just create random test data to test your solutions in a safe and secure way,” Wynants said.
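For a rough idea of what "relational" test data means here, below is a minimal Python sketch, not the code from Microsoft's video. It assumes you ask the model for CSV rows whose restaurant_id must match IDs you already hold, then discard any row that points at an ID you don't have. The column names, the IDs, and the fake_completion stand-in (which returns canned text rather than calling any real service) are all hypothetical.

```python
import csv
import io


def build_prompt(columns, n_rows, restaurant_ids):
    """Compose a plain-text request for CSV test data tied to existing keys."""
    ids = ", ".join(str(i) for i in restaurant_ids)
    return (
        f"Generate {n_rows} rows of fictional restaurant reviews as CSV "
        f"with the header {','.join(columns)}. "
        f"Use only these restaurant_id values: {ids}."
    )


def parse_reviews(csv_text, columns, restaurant_ids):
    """Parse the returned text and keep only rows that reference a known restaurant."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    if reader.fieldnames != list(columns):
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    allowed = {str(i) for i in restaurant_ids}
    return [row for row in reader if row["restaurant_id"] in allowed]


def fake_completion(prompt):
    """Hypothetical stand-in for a model call; returns canned text, ignores the prompt."""
    return (
        "review_id,restaurant_id,rating,comment\n"
        "1,101,4,Great pasta\n"
        "2,102,2,Slow service\n"
        "3,999,5,Unknown place\n"
    )


# Assumed schema and existing keys, purely for illustration.
COLUMNS = ["review_id", "restaurant_id", "rating", "comment"]
EXISTING_IDS = [101, 102]

prompt = build_prompt(COLUMNS, 3, EXISTING_IDS)
rows = parse_reviews(fake_completion(prompt), COLUMNS, EXISTING_IDS)
```

The filtering step is the point: a text generator may well invent a restaurant that doesn't exist, so checking foreign keys before loading the rows anywhere seems a sensible precaution.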

While Microsoft is aggressively banging the drum for OpenAI’s technology, there are bugs and quirks that need to be worked out of AI technologies. Most recently, OpenAI this month outlined how it plans to improve ChatGPT’s performance and shape its behavior. Google also has had its share of AI headaches.

Then there are the growing reports of miscreants trying out ChatGPT to create their own malicious code, worries about the tech being used to pump out massive amounts of spam and misinformation, and so on and so forth. ®

Source: https://go.theregister.com/feed/www.theregister.com/2023/02/27/microsoft_ai_gpt3_data/
