The generative artificial intelligence (AI) space is drawing renewed attention as OpenAI announces the launch of GPT-4V, a vision-enabled model, alongside new multimodal conversational capabilities for its ChatGPT system.
Unveiled on September 25, the new features let ChatGPT users hold more dynamic conversations with the chatbot. The AI models powering ChatGPT, GPT-3.5 and GPT-4, can now understand plain spoken-language queries and respond in one of five distinct voices. As OpenAI highlighted in a blog post, this multimodal interface lets users interact with ChatGPT in ways that were previously unavailable.
Users can now capture images of landmarks and engage in live discussions about them, take pictures of their fridge and pantry to brainstorm dinner ideas, or help children with math problems by snapping a photo and receiving interactive guidance. The advanced version of ChatGPT is slated to be available to Plus and Enterprise users on mobile platforms within the forthcoming two weeks, with subsequent availability for developers and other users anticipated shortly after.
The release coincides with the debut of DALL-E 3, OpenAI's latest image generation system. DALL-E 3 also leans on natural language understanding, letting users refine results conversationally, and it integrates with ChatGPT to help craft image prompts.
In related news, OpenAI competitor Anthropic announced a partnership with Amazon on September 25. Amazon has committed to invest up to $4 billion, covering cloud services and hardware access. In return, Anthropic will expand its support for Amazon Bedrock, Amazon's managed foundation-model service, providing businesses with secure model customization and fine-tuning.