HyperAI
Headlines
Anthropic eyes $5B funding round that could push valuation to $170B despite CEO's concerns over ties to authoritarian investors
4 days ago
A Data Analyst’s Guide to Learning from YouTube with AI: How to Extract Insights Efficiently Without Watching Full Videos
14 hours ago
The Unfulfilled Promise: Why Today’s AI Falls Short of Genuine Intelligence — Part 1

Everyone’s talking about Artificial General Intelligence (AGI) these days. Industry leaders are pouring immense sums into the pursuit of “true AGI” and even “superintelligence.” But here’s my hot take: if we keep going down the same path we’re on, this goal remains fundamentally out of reach.

Current AI systems, while undeniably capable and often astonishing, are still, at their core, statistical models. They’re brilliant at pattern recognition and prediction, but there’s no genuine reasoning happening under the hood. What’s more, compared to the incredible adaptability of the human mind, these models remain largely confined to the specific tasks they’ve been trained for.

Okay, so some folks might say they do reason, but let me ask you: if you build a machine to act exactly like a human, right down to every tiny brain pathway and a gazillion rules, is that true intelligence? Let’s be real for a sec. If you could actually hardcode a machine to behave just like you—every thought, every little twitch, every single reaction—would that really be intelligent behavior, or would it just look smart to anyone watching, like a super sophisticated puppet show?

You’ve probably also heard another common statement in AI circles: that humans “learn from fewer samples than machines.” It’s an appealing idea, suggesting a kind of inherent efficiency in biological learning. But in the first part of this blog post, I’m going to break down why that’s a myth, and why our understanding of “data” in human learning might be fundamentally flawed.

Part 1: The “Fewer Samples” Myth – Why Human Learning Isn’t About Less Data, But Better Data

You may have heard this before in AI discussions: “Humans can learn from fewer samples than machines.” While it sounds intuitively appealing, I’m here to offer a hot take: this idea is fundamentally flawed, and clinging to it might be one of the biggest misconceptions holding back the pursuit of Artificial General Intelligence (AGI). We often underestimate the sheer volume and, more importantly, the quality and richness of the information stream we’re immersed in from the moment we’re born. Let’s break down why this comparison misses the mark, and what AI can truly learn from human cognition.

The Daily Data Deluge of a Human Being

Forget “fewer samples.” We are data processing machines of incredible scale. Consider these fascinating insights, some of which have been discussed for years and continue to highlight our immense processing capabilities:

The human brain processes about 11 million pieces of information per second, but only about 40 of those make it to conscious awareness.

Our sensory systems are constantly streaming data: vision, hearing, touch, smell, and internal bodily signals. Even by conservative estimates, the brain receives and filters on the order of 100,000 to 1 million bits of information per second.

These figures underscore that, far from learning from “fewer” samples, humans are constantly bathed in a torrent of information.

A Child’s First Decade: An Ocean of Information

Now, let’s extrapolate this to a child’s foundational learning years. While a baby likely isn’t reading “The Hobbit,” the raw, continuous sensory input is still tremendous. Let’s make a very conservative estimate of the data processed by age 10:

Visual Data: Imagine 12 waking hours a day, 365 days a year. Over 10 years, that’s 43,800 hours. Even at a mere 10 “frames” per second (far less than real-time perception), that’s over 1.5 billion visual “samples.” Each isn’t a static image, but a dynamic, multi-faceted scene.

Auditory Data: Conservatively, hearing 10,000 words or distinct auditory events daily for 10 years tallies up to 36.5 million auditory “samples.”

Tactile, Proprioceptive, Olfactory, Gustatory Data: The constant stream from touch, movement, smell, and taste adds millions more “samples” daily, forming a continuous, embodied sensory experience.

The Terabytes of Life Experience

Translating this raw, continuous sensory input into digital terms, even highly conservative estimates land at a minimum of around 88 Terabytes of raw, integrated sensory data processed by a human by their 10th birthday. And this is just the raw input; it doesn’t account for the complex internal representations and learned models.
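The numbers above are easy to sanity-check. Below is a minimal Python sketch of the same back-of-envelope arithmetic: the waking hours, frame rate, and daily auditory-event count come straight from the post, while the per-sample byte sizes are illustrative assumptions added here purely to show how a raw total in the tens of terabytes (the post’s ~88 TB ballpark) can emerge from such inputs.

```python
# Back-of-envelope check of the "first decade" sensory-data estimate.
# Sample counts follow the post; the per-sample byte sizes are illustrative
# assumptions (NOT from the post), chosen only to show the order of magnitude.

WAKING_HOURS_PER_DAY = 12
DAYS = 365 * 10                      # ten years
FRAMES_PER_SECOND = 10               # "a mere 10 frames per second"
AUDITORY_EVENTS_PER_DAY = 10_000     # words / distinct auditory events

waking_seconds = WAKING_HOURS_PER_DAY * 3600 * DAYS
visual_samples = waking_seconds * FRAMES_PER_SECOND
auditory_samples = AUDITORY_EVENTS_PER_DAY * DAYS

# Hypothetical per-sample sizes, for illustration only.
BYTES_PER_FRAME = 56_000             # ~56 KB per compressed "frame" (assumption)
BYTES_PER_AUDITORY_EVENT = 20_000    # ~20 KB per auditory event (assumption)

total_bytes = (visual_samples * BYTES_PER_FRAME
               + auditory_samples * BYTES_PER_AUDITORY_EVENT)

print(f"visual samples:   {visual_samples:,}")    # ~1.58 billion
print(f"auditory samples: {auditory_samples:,}")  # 36.5 million
print(f"rough total:      {total_bytes / 1e12:.1f} TB")
```

Only the per-sample sizes are invented here; changing them moves the terabyte total up or down, but the sample counts themselves follow directly from the post’s figures.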
The Stark Contrast: Human vs. AI Data

Now, let’s compare this to the datasets used by even the most advanced AI models. While these models are trained on immense datasets, their scale, especially in terms of integrated, multimodal, and continuously contextualized data, often pales in comparison to human experience:

GPT-3: OpenAI’s groundbreaking GPT-3 was trained on roughly 300 billion tokens, drawn in part from about 45 TB of raw Common Crawl text that was filtered down to well under a terabyte of curated training data.

Llama 3: Meta’s more recent Llama 3 models were pre-trained on an even larger scale, utilizing over 15 trillion tokens. Depending on token encoding (on the order of 4 bytes of text per token), this translates to roughly 60 Terabytes of text data.

Other Foundation Models (GPT-4, Claude 3, Gemini): While exact training data sizes for cutting-edge models like GPT-4, Claude 3, and Gemini are proprietary, industry estimates suggest they are trained on datasets ranging from tens to low hundreds of terabytes.

While these AI datasets are vast, they are still significantly smaller in sheer volume than the estimated integrated, multimodal data a human processes in their first decade. More importantly, they lack the inherent richness and real-world interconnectedness of human experience. They are primarily text-based, or multimodal in a stitched-together fashion, rather than inherently integrated from the ground up.
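The token-to-terabyte conversion used above is a quick rule-of-thumb calculation; the ~4 bytes per token figure is an approximation for English-heavy corpora, not an exact property of any particular tokenizer. A minimal sketch:

```python
# Rough rule-of-thumb conversion from token counts to raw text size,
# assuming ~4 bytes of UTF-8 text per token. Real tokenizers vary,
# so treat the result as an estimate only.

BYTES_PER_TOKEN = 4  # assumption, not a property of any specific tokenizer

def tokens_to_terabytes(tokens: float, bytes_per_token: int = BYTES_PER_TOKEN) -> float:
    """Approximate raw text size in terabytes for a given token count."""
    return tokens * bytes_per_token / 1e12

print(f"Llama 3 pre-training (~15T tokens): ~{tokens_to_terabytes(15e12):.0f} TB of text")
print(f"1 trillion tokens:                  ~{tokens_to_terabytes(1e12):.0f} TB of text")
```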
It’s Not Less Data, It’s Better Processing of Richer Data

The core of the “fewer samples” myth lies in a misunderstanding of what a “sample” truly means for a human. We don’t just consume discrete, isolated data points like an image file or a text token. Our learning is characterized by:

Multimodal Integration: Our brain seamlessly fuses sight, sound, touch, smell, and taste, creating a holistic understanding of the world. Recent research into “Embodied Multimodal Large Models (EMLMs)” in AI is a direct acknowledgment of this human advantage, aiming to integrate diverse sensory modalities for more robust AI.

Contextual & Embodied Learning: Every piece of information is learned within a dynamic, real-world context, directly linked to our physical interactions and consequences. We don’t just “see” an object; we interact with it, understand its properties, and experience its effects. This embodied interaction is a critical component of how we build understanding.

Active & Feedback-Driven: Human learning is a continuous loop of experimentation, immediate feedback, and self-correction. We are not passive observers. This aligns with the growing focus in AI on “agentic AI,” which seeks to teach models to behave and adapt based on real-world interactions and expert guidance.

Hierarchical & Abstract Reasoning: Beyond mere pattern recognition, we build complex conceptual models, understanding relationships, categories, and abstract principles. This allows us to generalize from fewer novel experiences because we have a robust internal model of how the world works, built on a lifetime of rich data.

The True Path to Intelligence: Learning from Biology and Human Cognition

The profound takeaway for AI development isn’t that humans learn with less data, but that our biological architecture is uniquely designed to extract vastly more meaning and build incredibly sophisticated representations from the massive, high-quality, multimodal data stream we’re constantly immersed in. Our learning is incredibly efficient per unit of extracted information, not per raw byte.

This suggests that the pursuit of Artificial General Intelligence (AGI) won’t be achieved simply by dumping more data and compute into current, fundamentally unimodal or weakly multimodal architectures. The real leap will come when AI systems can:

Integrate truly multimodal, high-dimensional, and contextually rich data streams at their core, mimicking the seamless fusion of human senses.

Engage in active, embodied learning, with continuous, real-time feedback loops, allowing them to interact with and learn from their environment like a human child (see the toy sketch after this post).

Develop sophisticated symbolic reasoning, abstraction, and the ability to construct internal models of the world, moving beyond statistical correlations to true understanding.

It’s not about the quantity of data alone; it’s about the inherent quality, interconnectedness, and the underlying processing architecture that makes human learning so remarkably powerful and adaptable. The next frontier in AI isn’t just bigger models, but fundamentally smarter ways of learning from the world, much like we do.
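To make the “active, embodied learning” bullet concrete, here is a deliberately tiny, dependency-free Python sketch of a perceive-act-feedback loop over fused multimodal observations. Everything in it (the Observation record, the fuse step, the toy environment, and the self-correction rule) is invented for illustration; it shows the shape of the loop the post argues for, not any existing system.

```python
# Toy illustration of an active, feedback-driven learning loop over fused
# multimodal observations. All names and the learning rule are invented.
from dataclasses import dataclass
import random

@dataclass
class Observation:
    vision: float  # stand-ins for rich sensory channels
    audio: float
    touch: float

def fuse(obs: Observation) -> float:
    """'Integrated from the ground up': one representation built from every channel."""
    return (obs.vision + obs.audio + obs.touch) / 3.0

class ToyEnvironment:
    """Provides multimodal observations and a feedback signal for each action."""
    def observe(self) -> Observation:
        return Observation(random.random(), random.random(), random.random())

    def feedback(self, action: float, obs: Observation) -> float:
        # Feedback is better (closer to zero) the nearer the action is to the fused signal.
        return -abs(action - fuse(obs))

class ToyAgent:
    """Acts, receives feedback, and self-corrects rather than passively observing."""
    def __init__(self) -> None:
        self.bias = 0.5  # deliberately mis-set so the loop has something to correct

    def act(self, obs: Observation) -> float:
        return fuse(obs) + self.bias

    def learn(self, reward: float) -> None:
        # Crude self-correction: shrink the bias whenever feedback is poor.
        if reward < -0.05:
            self.bias *= 0.9

env, agent = ToyEnvironment(), ToyAgent()
for step in range(5):  # a continuous perceive-act-feedback loop, truncated to 5 steps
    obs = env.observe()
    action = agent.act(obs)
    reward = env.feedback(action, obs)
    agent.learn(reward)
    print(f"step {step}: reward={reward:.3f}, bias={agent.bias:.3f}")
```

The point of the sketch is only the structure: observation, fused representation, action, feedback, and self-correction happen in one continuous loop rather than as a one-shot pass over a static dataset.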
15 hours ago
Your chats with Meta's AI could end up in Google searches — just like early versions of ChatGPT did before OpenAI reversed course. Meta's standalone Meta AI app lets users share conversations to a public "Discover" feed, which Google can index and display in search results. Even though Meta now warns users that shared chats are public and visible to anyone, many of the conversations still contain personal details like names, emails, and sensitive topics. While the company says it’s working to reduce accidental disclosures, the feature remains active — meaning your AI chat could show up in a Google search, possibly linked to your social media identity. And unlike OpenAI, which stopped indexing shared ChatGPT conversations, Meta has no plans to change that.
15 hours ago
PagerDuty Named a Leader and Outperformer in the 2025 GigaOm Radar for AIOps for Fourth Consecutive Year
15 hours ago
Asana to Release Second Quarter Fiscal Year 2026 Financial Results on September 3, 2025, with Investor Webcast Scheduled
15 hours ago
Apple is set to significantly expand its investments in artificial intelligence, CEO Tim Cook revealed, underscoring the company’s growing commitment to advancing AI capabilities across its products and services. Speaking in a recent interview, Cook emphasized Apple’s openness to acquiring AI-focused companies as a way to accelerate innovation and strengthen its technological edge. He also addressed broader economic concerns, including the impact of Trump-era tariffs, noting that while trade policies remain a challenge, Apple continues to adapt its global supply chain strategies to maintain efficiency and competitiveness. The comments come as Apple intensifies its efforts to integrate AI more deeply into its ecosystem, from on-device machine learning to enhanced personal assistant features, signaling a pivotal shift in the company’s long-term vision.
16 hours ago
Tesla ordered to pay $329 million in damages after fatal Autopilot crash, jury rules
16 hours ago
OpenAI Warns Students Against Using ChatGPT as an 'Answer Machine' and Pushes for Productive Struggle in AI Education
16 hours ago
Mastering NLP with spaCy – Part 2: Unlocking the structure of language through Part-of-Speech tagging, dependency parsing, and Named Entity Recognition to understand how words function, relate to one another, and represent real-world entities in text.
16 hours ago
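For readers skimming the feed, here is a minimal sketch of the three steps that tutorial headline names (part-of-speech tags, dependency relations, named entities), assuming spaCy and its small English model en_core_web_sm are installed; the sample sentence is invented.

```python
# Minimal spaCy example: POS tagging, dependency parsing, and NER.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is expanding its AI investments, Tim Cook said in Cupertino.")

for token in doc:
    # How each word functions (POS) and relates to others (dependency head).
    print(f"{token.text:12} pos={token.pos_:6} dep={token.dep_:10} head={token.head.text}")

for ent in doc.ents:
    # Real-world entities the model recognizes in the text.
    print(f"entity: {ent.text!r} -> {ent.label_}")
```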