Generative Artificial Intelligence

Generative AI

Let us first ground ourselves and intuitively understand why GenAI is such a big deal. AI (machine learning and predictive models) has been around for a long time. Why now? What has changed? Well, why did ChatGPT grow to 100 million users in 2 months? A few reasons: It uses natural language and is very easy to use. It feels as though ChatGPT understands what you want and then generates new content to answer your question. The UI is multi-modal, meaning input and output can be text, images, charts, video or audio. It understands several languages. It uses its built-in knowledge and reasoning power to fulfil your request.

AI reaching or exceeding Human Level

According to Stanford University’s AI Index report, AI is reaching or surpassing human performance on key benchmarks, such as image classification, math, multi-language understanding, visual reasoning, English understanding and more. See the diagram below. The impact of this is profound. AI will be able to assist with or automate tasks in which these cognitive skills are needed.

GenAI for Society

GenAI is revolutionizing society by enhancing creativity and problem-solving in unprecedented ways. ChatGPT with audio and vision and Gemini 2.0 can hear and see what you see, and can help you with the world around you. GenAI will mold life as we know it. Whether through personalized education tools, AI-driven healthcare diagnostics, or just an everyday helper, GenAI is going to change the way we live.

GenAI for Business

GenAI is a catalyst for innovation, efficiency, and competitive advantage. Businesses can leverage AI for deep data analysis, personalized customer experiences, and meaningful content creation. Agentic AI systems represent the next frontier, as they can autonomously handle end-to-end workflows, act as virtual agents for operations, and optimize supply chains with minimal oversight.

Let us dig deeper now. ChatGPT is the User Interface layer. Under the hood is an LLM (Large Language Model) like GPT-4o, which is a Frontier Model or Foundation Model. Examples of Frontier models are GPT-4o (OpenAI), LLaMA (Meta), Claude (Anthropic), etc. Frontier models are cutting-edge AI systems designed to perform advanced tasks across multiple domains, combining capabilities like deep knowledge, analytics, reasoning, and multi-modal understanding (e.g., text, images, and data). They can generate human-like text, create visuals, analyze complex data, solve scientific problems, and interact intuitively with users. These models can be used in diverse applications, including content creation, automation, data transformation, and decision-making, and are already transforming multiple industries.

As seen above, LLMs are advancing fast and nearing or exceeding human capability in a variety of tasks. This cognitive intelligence will be built into all kinds of applications in the next decade. You need to understand this deeply and start learning AI and how to apply these models in your craft. To summarize:

AI (predictive models/machine learning)

AI (predictive models) was all about perception, prediction, detection or recognition. Given a specific type of input, these models produce a specific type of result. They can predict needs, behaviours, risks or preferences. Given an image, a predictive model can detect the object in the image. Given a datapoint, it can detect an anomaly. These purpose-specific algorithms are smaller in size than LLMs and are generally dedicated to one specific task.
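To make the "given a datapoint, detect an anomaly" idea concrete, here is a minimal sketch of a purpose-specific detector. It is only an illustration: the z-score rule is a simple stand-in for a predictive model, and the sensor readings, the function name `fit_zscore_detector` and the threshold of 3 standard deviations are all assumptions made up for this example.

```python
from statistics import mean, stdev


def fit_zscore_detector(history, threshold=3.0):
    """Learn the mean and spread of past values and return a checker function.

    Hypothetical helper for illustration; assumes `history` has at least
    two values and is not constant (so the standard deviation is non-zero).
    """
    mu = mean(history)
    sigma = stdev(history)

    def is_anomaly(x):
        # Flag a point that lies more than `threshold` standard deviations
        # away from the historical mean.
        return abs(x - mu) > threshold * sigma

    return is_anomaly


# Made-up temperature readings used as "training" data.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
is_anomaly = fit_zscore_detector(readings)

print(is_anomaly(20.4))  # False: within the normal range of the readings
print(is_anomaly(35.0))  # True: far outside the normal range
```

Note how narrow this is: the detector answers one question about one kind of input, which is the contrast with the general-purpose models described next.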

Generative AI (large models)

Generative artificial intelligence is powered by large foundation models. Foundation models are general-capability models of language, vision, and/or other modalities that are built to support a large variety of AI tasks. You can interact with multi-modal LLMs using natural language, speech or video. The AI will understand your request and will generate new content in the form of sentences, images, charts, tables, songs, videos or even proteins. An LLM can learn, and then generate, any knowledge that can be digitized and has a pattern: videos, physics, proteins, chemicals, songs, etc.

Key Takeaways

We have now arrived not at the AI Era, but at the Generative AI Era. Multi-modal LLMs can understand natural language, speech or video. The models are versatile tools, and their capabilities can be mixed and matched for use in an array of applications:

Document or image summarization
Language translation
Data analysis
Extract knowledge via Q&A from hundreds of documents
Virtual assistant
Personal tutor (for maths, physics, programming, etc.)
Radiology
Coding assistants like GitHub Copilot

Based on the transformer architecture, LLMs are giant models that can learn to understand human knowledge without explicit supervision and without labelled datasets. LLMs can learn the patterns and representations of any sequence, be it language, proteins, biology, chemistry, etc. LLMs can be multi-modal and so can be used in a nearly endless range of applications.

Models can be tuned to perform tasks for which they were never explicitly trained.
LLMs are excellent few-shot learners. Using prompt engineering, you can guide them to fulfil your request in real time.
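
As a rough illustration of few-shot prompting, the sketch below assembles a prompt made of an instruction, a couple of worked examples, and a new input for the model to complete. Everything here is an assumption for illustration: the helper name `build_few_shot_prompt`, the sentiment task, and the example reviews are made up, and the code only builds the prompt text. Sending it to a model is left out because that depends on which provider and client library you use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input.

    Hypothetical helper for illustration only.
    """
    lines = [instruction, ""]
    for text, label in examples:
        # Each worked example shows the model the input/output pattern to follow.
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unanswered so the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Made-up examples for a sentiment classification task.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

The point is that no retraining takes place: the two examples inside the prompt are the only "training" the model receives for this task, and changing them changes the behaviour immediately.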