Newzvia

Technology | OpenAI's GPT-5: AI Now Understands Text, Images, And Sound Together

Pankaj Mukherjee, Senior Technology Correspondent

Senior Technology Correspondent · AI, startups & MeitY policy

Quick summary

OpenAI has significantly updated its GPT-5 AI model, allowing it to seamlessly understand and generate content across text, images, and audio. This advancement promises new tools for developers and businesses, including those in India, in the coming weeks.

The next jump in AI capability just arrived. OpenAI, the company behind ChatGPT, today announced a major update to its GPT-5 model. This isn't just a small tweak; it’s a significant shift in how AI works.

The core of this advancement is what’s called 'multimodal' capability. Until now, AI models often specialised: one for text, another for images. Think of it like a human who can only read or only see. With GPT-5, the AI can now understand and create using text, images, and audio all at once.

Imagine giving the AI a picture, a song, and some words, and having it process them together. The model can now see a photo, hear its description, and then write a story about it, or even create new images and sounds based on your combined input. OpenAI says this update will roll out to developers and enterprise partners in the coming weeks. Developers use APIs (Application Programming Interfaces) to integrate such AI tools into their own apps and services.
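For readers curious what "combined input" might look like in an app, here is a minimal sketch in Python. OpenAI has not shared the request format for this update, so the function name, field names, and model name below are all hypothetical placeholders, not OpenAI's actual API. The sketch only shows the general idea of packing text, an image, and an audio clip into a single request body.

```python
import base64
import json


def build_multimodal_request(prompt: str, image_bytes: bytes, audio_bytes: bytes) -> str:
    """Bundle text, an image, and an audio clip into one JSON request body.

    All field names here are hypothetical; this is not OpenAI's published schema.
    """
    payload = {
        "model": "example-multimodal-model",  # placeholder model name (assumption)
        "inputs": [
            {"type": "text", "text": prompt},
            # Binary media is base64-encoded so it can travel inside JSON.
            {"type": "image", "data": base64.b64encode(image_bytes).decode("ascii")},
            {"type": "audio", "data": base64.b64encode(audio_bytes).decode("ascii")},
        ],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    body = build_multimodal_request(
        "Write a story about this scene.", b"fake-image-bytes", b"fake-audio-bytes"
    )
    # Show which kinds of input ended up in the single request.
    print([part["type"] for part in json.loads(body)["inputs"]])
```

An app would send a body like this to the provider's endpoint in one call, rather than making separate calls to a text model, an image model, and an audio model.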

More Than Just Words

This move makes AI much more powerful and versatile. It goes beyond just generating text replies or simple image edits. For Indian developers, this could unlock new ideas for applications that need to understand the world in a richer way – from smart assistants that truly hear and see, to creative tools that blend different media forms.

Other tech giants are also pushing similar boundaries. Google DeepMind recently launched AlphaFold 3, a new AI model for scientific discovery, aiming to speed up drug research. Adobe, too, has embedded advanced generative AI tools into its Creative Cloud suite, letting users create complex visuals from simple text prompts. OpenAI's GPT-5 update fits into this broader trend of AIs becoming smarter and more integrated into our digital tools.

What We Don't Know Yet

While OpenAI's claim of 'dramatically improved' capability sounds exciting, the real test will be how the model performs in practice. OpenAI has not yet shared specifics on pricing, detailed benchmarks, or a timeline for wider public access beyond developers and businesses. How well it handles India's many languages and local contexts also remains to be seen. These are crucial details for Indian startups looking to build on such foundational models.

This update means developers will soon have a much more capable AI canvas. It will be interesting to see what new products and services emerge, especially from India's vibrant tech scene, as they get access to these advanced tools.


  • OpenAI’s GPT-5 model now uses text, images, and audio together.
  • This 'multimodal' AI can understand and create content across these different types.
  • The update will be available for developers and businesses soon.
  • Many details, like specific pricing or public access dates, are still unclear.

Quick questions

What does 'multimodal' mean for GPT-5?
The model processes and generates text, images, and audio together.

How will this help developers?
Developers can use the API to build richer AI apps that blend different media types, enabling new and more nuanced tools.

When can general users expect GPT-5?
Still unclear. OpenAI expects the developer and enterprise rollout in the coming weeks but has not announced a public release date.

Does this impact India?
Developers worldwide, including those in India, will get access to these tools, which could foster new local AI applications. India-specific details have not been announced.