
The Stanford AI Index 2026 Is Out. The Capability Numbers Are Historic. The Trust Numbers Are a Crisis.


Stanford HAI dropped its 2026 AI Index this morning — 423 pages, nine years of independent data, no lab PR budget behind it. If you only read one document this year to understand where AI actually stands, this is it.

Here’s what the report actually says, past the headlines.

Capabilities Are Accelerating, Not Plateauing

Every pundit who called peak AI in 2025 was wrong. Industry produced over 90% of notable frontier models in 2025. On SWE-bench Verified, coding performance jumped from 60% to near 100% of the human baseline in a single year. On Humanity’s Last Exam — questions designed by subject-matter experts to represent the hardest problems in their fields — the top score in 2025 was 8.8%. The report now puts it at 38.3%, and the best models as of April 2026 are crossing 50%.

Organizational adoption reflects this. AI adoption has reached 88% in the tech industry, and 4 in 5 university students now use generative AI.

The Transparency Collapse Nobody Is Talking About

Here’s the number that should be the story: the Foundation Model Transparency Index dropped from 58 to 40 this year, with the most capable models disclosing the least. Google, Anthropic, and OpenAI have all abandoned the practice of disclosing their latest model’s dataset sizes and training duration. Eighty of the 95 most notable models launched last year were released without their training code.

The labs have made a deliberate choice: as the models get more powerful, they get less legible. This isn’t a side effect. It’s a competitive strategy.


The US-China Gap Is Nearly Gone

In early 2023, OpenAI had a clear lead with ChatGPT. As of March 2026, Anthropic leads, trailed closely by xAI, Google, and OpenAI. Chinese models like DeepSeek and Alibaba lag only modestly. The US still outputs more top-tier models and higher-impact patents, but China leads in total patent output, model publication volume, and industrial robot installations.

US private AI investment reached $285.9 billion in 2025 — more than 23 times China’s $12.4 billion. And yet the performance gap is measured in single-digit percentage points. That should alarm every American policymaker.

The Talent Cliff

The number of AI researchers and developers relocating to the US has dropped 89% since 2017, with an 80% decline in the last year alone. The US is spending more on AI than any country in history while making itself less attractive to the people who build it. That’s a structural problem no amount of compute spending fixes.

The Public Is Not Coming Along for the Ride

Only 10% of Americans say they’re more excited than concerned about AI in daily life. Meanwhile, 56% of AI experts believe it will have a positive impact on the US over the next 20 years. The US also reported the lowest trust in its government to regulate AI among surveyed countries, at 31%.

Employment for software developers aged 22 to 25 has fallen nearly 20% since 2022, and a third of organizations expect AI to shrink their workforce. The industry keeps pointing to benchmark scores. The public is looking at their job offers.

The Bottom Line

The 2026 AI Index is not a victory lap. It’s a stress test. The capabilities are real, the investment is real, the adoption is real. But the transparency is gone, the talent is leaving, and the public trust that makes any of this socially sustainable is at a low. Stanford’s data doesn’t editorialize. It doesn’t have to.


AI Didn’t Kill Writing. It Killed Coding. Here’s Why That Was Always Inevitable.


Everyone had the same prediction. Writing would be the first casualty of the AI revolution. It was the obvious call: language models generate text, writers generate text, therefore writers go first. Clean logic. Wrong conclusion.

Andrej Karpathy, co-founder of OpenAI and one of the most credible voices on AI capability, published a thread on April 10 that explains precisely why this prediction failed. And his explanation is worth sitting with, because it reframes how we should think about where AI is actually winning.

The Perception Gap Nobody Is Talking About

Karpathy’s first observation is about how unevenly AI capability is understood. A large group of people tried the free tier of ChatGPT at some point, saw it fumble basic questions, laughed at the hallucinations, and moved on. That experience formed their worldview on what AI can do.

A much smaller group uses frontier agentic tools like OpenAI Codex and Claude Code professionally, in technical domains, every day. They are watching these models restructure entire codebases, solve problems that would have taken weeks, and do it in an hour. For this second group, the experience borders on disorienting.

These two groups are not disagreeing about AI. They are talking about completely different products and calling them by the same name.

Why Coding Went First, Not Writing

The deeper point in Karpathy’s thread is structural. Coding has verifiable reward functions. When AI writes code, you can run a unit test. The test passes or it fails. That binary signal is exactly what reinforcement learning needs to improve. You can train a model on billions of correct and incorrect outcomes, and the gradient knows which direction to move.

Writing has no equivalent. There is no unit test for a good sentence. There is no pass/fail for whether a paragraph is compelling. Human judgment is the only signal, and human judgment is expensive, inconsistent, and impossible to scale the way automated tests are. RL cannot hill-climb on “this story resonated emotionally.” So it doesn’t.
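The “verifiable reward” idea is easy to make concrete. Here is a minimal sketch of the kind of binary signal RL can optimize for code: execute the model’s output against a test suite and return 1.0 or 0.0. Everything here (the `add` task, the `reward` helper, the test tuples) is illustrative, not from any real training pipeline.

```python
def reward(candidate_src: str, tests: list) -> float:
    """Binary reward: 1.0 if the candidate code passes every test, else 0.0."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # load the model-written function
        fn = namespace["add"]
        ok = all(fn(a, b) == want for a, b, want in tests)
        return 1.0 if ok else 0.0
    except Exception:
        return 0.0  # code that crashes earns zero reward

tests = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]
print(reward("def add(a, b):\n    return a + b", tests))  # 1.0
print(reward("def add(a, b):\n    return a - b", tests))  # 0.0
```

No human ever has to look at the output; the signal is automatic, unambiguous, and infinitely repeatable, which is exactly the property prose lacks.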

Add to this the economic reality Karpathy points out: coding is where the B2B value is. The largest fraction of every AI lab’s engineering team is focused on improving coding performance because that is where enterprise revenue comes from. Writing improvements are slower, harder to measure, and lower on the priority list. Not because writing doesn’t matter, but because the incentive structure points elsewhere.


What This Means for Writers

The conclusion some will draw is that writing is safe. That’s the wrong takeaway. AI writing tools have gotten genuinely good at surface-level tasks: summarizing, drafting, formatting, rephrasing. Those tasks are being automated, and that pressure is real.

What is not being automated, and what is structurally difficult to automate, is the thing that makes writing worth reading: voice, narrative arc, the particular way a writer sees the world and chooses what to leave out. That is not surviving because it is artistically special. It is surviving because it cannot be scored. You cannot write a loss function for a great essay the way you can write a test suite for a working API.

This distinction matters. Writers who understand it will focus on developing exactly the capabilities that resist automation: original perspective, specific observation, structural judgment. Writers who miss it will compete against models on tasks the models are already winning.

The Real Lesson from Karpathy’s Thread

The gap in understanding AI capability is not just about technology. It’s about access. The people who have genuinely seen what AI can do are paying $200 a month and using it in highly technical environments. Most people are not. Their mental model of AI is outdated by roughly two to three product generations.

That gap is closing as agentic tools become more visible. But until it does, we will keep having the wrong conversation: debating whether AI is overhyped based on a version of the technology that no longer exists.

The capability is real. It is just unevenly distributed, and unevenly understood.


The AI Boom Is Real — and TSMC’s Record $35.7 Billion Quarter Is the Proof


Everyone is watching the model wars. The benchmark releases. The demo videos. The CEO posts on X. But if you want to know whether the AI boom is actually real — not hyped, not narratively convenient, actually real — you need to read a supply chain, not a press release.

TSMC just gave you one.

Taiwan Semiconductor Manufacturing Company reported Q1 2026 revenue of $35.71 billion, up 35% year over year, beating analyst forecasts and landing at the top of its own guidance range. March alone came in at 45.2% growth year over year, the strongest single month of the quarter. Every dollar of that outperformance came from AI chip demand.

Why TSMC Is the Most Honest Signal in AI

TSMC doesn’t sell narratives. It sells wafers. When Nvidia, AMD, Apple, Google, and Amazon need chips built, they go to TSMC. The company fabricates roughly nine out of every ten advanced AI accelerators on the planet. Its advanced 3-nanometer and 5-nanometer process technologies, critical for energy-efficient AI accelerators, accounted for a growing share of wafer revenue, with gross margins expanding on premium pricing for cutting-edge nodes.

Smartphone and PC end markets took a hit in Q1 due to memory shortages. But the AI segment carried the entire semiconductor industry. That’s not a talking point from a model lab. That’s revenue data from the company building the physical infrastructure every AI product runs on.

The Infrastructure War Nobody Is Talking About

Here’s what makes the TSMC numbers even more significant: they arrived the same week that multiple major AI players moved to reduce their dependence on outside chip suppliers.

Elon Musk’s Terafab project — a joint venture between Tesla, SpaceX, and xAI — announced on March 21, 2026, targets 1 terawatt of annual AI compute capacity from a vertically integrated facility in Austin, Texas, with Intel joining on April 7 to contribute manufacturing expertise. The project carries a $20–25 billion price tag for its pilot phase.

Meanwhile, Reuters reported on April 9 that Anthropic is internally evaluating whether developing proprietary silicon could make sense for its future AI systems, including the Claude family of models. Designing an advanced AI chip could cost roughly half a billion dollars, a significant bet, but one that reflects the strategic logic of controlling your own compute at scale. Meta and OpenAI already have similar chip projects underway.

Every major AI lab, in the same week TSMC posted a record quarter, is racing to own silicon. That convergence is not a coincidence. It is the chip bottleneck becoming the defining constraint of AI’s next phase.


What This Means

The AI boom is real. The proof isn’t a benchmark score or a demo video. It’s $35.7 billion in quarterly revenue from the company that physically manufactures the hardware the boom runs on. It’s Nvidia booking TSMC’s most advanced packaging capacity through 2027. It’s Anthropic, OpenAI, Meta, and Musk all independently concluding that depending on someone else for chips is a risk they can no longer accept.

Everyone is asking which AI model will win. The smarter question is: who controls the hardware to build and run it? Right now, there’s only one answer — and it’s a company in Taiwan that most people in the AI conversation aren’t watching closely enough.

TSMC’s full Q1 earnings call is scheduled for April 16, where the company is expected to update its full-year guidance. Watch for any signals on capacity expansion and 2nm node timelines — that’s where the next chapter of this story gets written.


Meta Might Actually Pull Off This AI Comeback. And the Timing Has Nothing to Do With Luck


For most of the past year, writing about Meta’s AI strategy meant writing about disappointment. Llama 4 underwhelmed when it launched in April 2025. The benchmark numbers turned out to be inflated. Developers moved on to OpenAI’s Codex and Anthropic’s Claude Code. Meta spent tens of billions of dollars on infrastructure and talent and had very little to show for it. The dominant question was whether Mark Zuckerberg’s $14.3 billion bet on Alexandr Wang and Scale AI was going to be remembered as a strategic masterstroke or as one of the most expensive corporate mistakes in tech history.

Yesterday, Meta released Muse Spark — the first model out of Meta Superintelligence Labs and Wang’s first deliverable as Chief AI Officer. The model is good. Not the best in the world, and Meta isn’t claiming otherwise, but legitimately competitive with frontier systems from OpenAI, Anthropic, and Google. And it’s free, with no tiered pricing, currently live on the Meta AI app and meta.ai in the US, with rollout to WhatsApp, Instagram, Facebook, and Messenger coming in the next few weeks.

On its own, this would be a solid product launch story. But Muse Spark didn’t arrive in a vacuum. It arrived in the middle of a much larger industry shift, one that’s been building for months, that suddenly makes Meta’s approach look smarter than it had any right to look.

This is an argument for why the $14.3 billion bet might actually work.

The Industry Has Been Quietly Getting More Expensive for a While

Start with what’s happening to the rest of the AI market. The narrative most people carry around — that AI is getting cheaper and more accessible — was true in 2024. It has been getting steadily less true throughout 2025 and into 2026.

Anthropic has been the most visible example. The company has been tightening access to Claude in stages for several months. In February 2026, Anthropic reaffirmed an existing policy forbidding the use of third-party harnesses with Claude subscriptions. In late March, Anthropic changed how subscription usage was calculated so customers burned through their limits faster during peak hours. On April 4, the company moved from policy warnings to billing-based enforcement: subscribers can no longer use their Claude subscription limits for third-party harnesses, including OpenClaw, and instead need to pay through a pay-as-you-go option billed separately from the subscription. The restriction will extend to all third-party harnesses in the coming weeks.

The technical reasoning Anthropic offered is sound. Claude’s first-party tools are optimized for prompt cache reuse. Third-party harnesses bypass those efficiencies, which creates outsized infrastructure strain. Boris Cherny, head of Claude Code at Anthropic, wrote on X that the company’s subscriptions weren’t built for the usage patterns of these third-party tools, and that capacity needs to be managed thoughtfully. From a margin standpoint, this is rational. For one reporter, a $20 monthly Claude subscription enabled about $236 of token usage in March, with others reporting ratios as skewed as 36x when comparing price paid to list price value. Anthropic was bleeding money on heavy users, and it stopped.
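The subsidy math behind those numbers is worth spelling out. A back-of-envelope sketch, using only the $20 price and $236 usage figure from the reporting above (the 36x ratio is the extreme case also cited there):

```python
subscription_price = 20.0  # monthly Claude subscription, USD (from the reporting)
list_price_usage = 236.0   # March token usage valued at list prices, USD

ratio = list_price_usage / subscription_price
print(f"Effective subsidy: {ratio:.1f}x the price paid")  # 11.8x

# The most extreme reported ratio was 36x, which works out to roughly:
print(f"36x user: ~${36 * subscription_price:.0f} of tokens on a $20 plan")  # ~$720
```

At an 11.8x ratio on a typical heavy user, and 36x at the tail, every additional power user deepened the loss. Viewed that way, the crackdown was less a policy choice than an accounting inevitability.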


But from a developer’s standpoint, the practical impact is the same regardless of the reasoning: a workflow that used to cost $20 a month now costs hundreds. And Anthropic has been clear that more restrictions are coming.

OpenAI is moving in the same direction, just with a different tactic. On April 9, 2026, OpenAI introduced a new $100 per month ChatGPT Pro tier, sitting between the $20 Plus plan and the existing $200 Pro plan. The new tier offers 5x more Codex usage than the Plus plan, and OpenAI has made no secret that the pricing is meant to challenge Anthropic, which has long offered a $100 per month Claude option. The free ChatGPT tier now includes ads. And alongside the new $100 plan, OpenAI confirmed it is rebalancing Codex usage on the $20 Plus plan to support more sessions throughout the week rather than longer sessions in a single day. In other words, the cheap plan got worse, and a new, more expensive plan appeared to capture users who can no longer function on the cheap plan.

This is not an isolated decision. OpenAI also updated Codex pricing on April 2, 2026, moving from per-message pricing to API token-based pricing for Plus, Pro, Business, and new Enterprise customers. The direction of travel is consistent. AI is becoming more usage-metered, more tier-stratified, and more expensive at the high end.

Google has been doing similar things with Gemini access, layering paid subscriptions over what used to be free features and pushing heavy users toward enterprise contracts.

Step back from any individual change and the pattern is unmistakable. The frontier AI labs are all under pressure to monetize. They have raised enormous amounts of money at enormous valuations, and the path to justifying those valuations runs through extracting more revenue per user. Free tiers are getting worse. Paid tiers are getting more expensive. Power users are getting capped or pushed to higher plans. The all-you-can-eat era of AI is ending, and it’s been ending in slow motion for several months.

Meta Is Walking in the Opposite Direction

This is the context Muse Spark arrived in. And it’s why the launch matters more than the model itself.

Meta is not under the same pressure as OpenAI or Anthropic. It does not need to make money from AI directly. Meta’s advertising business generated over $164 billion in revenue in 2025. Muse Spark is not a product that needs to monetize. It is a feature that makes Meta’s existing products — WhatsApp, Instagram, Facebook, Messenger — more engaging, more useful, and better at capturing the kind of intent and behavior data that feeds ad targeting. Every conversation a user has with Meta AI inside WhatsApp is, from Meta’s perspective, both a user benefit and a data point that improves the actual business.

That structural advantage changes what kinds of decisions Meta can make. Meta can ship a frontier-competitive AI model to billions of users for free, with no usage caps and no tiered pricing, indefinitely, without ever needing to convert any of those users to a paid plan. OpenAI and Anthropic cannot do that. Their entire business depends on making AI users into AI customers.

And Meta can afford the compute. The company is projected to spend $115 billion to $135 billion on AI infrastructure in 2026, roughly double its 2025 spending. That number sounds insane in isolation. It looks more rational when you consider what it actually buys: the ability to run a free, frontier-class model at the scale of WhatsApp, indefinitely, as a feature of the ad business rather than as a product that has to stand on its own.

This asymmetry has existed for a while. What changed yesterday is that Meta finally has a model good enough to make the asymmetry matter.

The Model Itself Is Better Than the Headlines Suggest

Most coverage of Muse Spark has focused on what it can’t do. The model trails GPT-5.4 and Claude Opus 4.6 on coding benchmarks. On ARC-AGI 2, the abstract reasoning benchmark, it scores 42.5 against Gemini 3.1 Pro’s 76.5. Meta itself acknowledges these gaps. The headline takeaway from a lot of analysts has been that Muse Spark is a credible second-tier model that doesn’t redefine the frontier.

That framing misses what’s interesting. On the benchmarks where Muse Spark wins, it doesn’t win narrowly; it wins by enormous margins. On HealthBench Hard, Muse Spark scored 42.8, roughly triple Gemini 3.1 Pro’s 20.6 and Claude Opus 4.6’s 14.8. Meta collaborated with over 1,000 physicians on the training data, and the result is the most capable medical reasoning model anyone has shipped. On CharXiv Reasoning, figure and chart understanding, Muse Spark scored 86.4, ahead of GPT-5.4’s 82.8 and Gemini 3.1 Pro’s 80.2. On GPQA Diamond, the PhD-level scientific reasoning benchmark, it scored 89.5, in legitimate frontier territory.

On the independent Artificial Analysis Intelligence Index, Muse Spark scored 52, placing it fourth overall behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Llama 4 Maverick scored 18 on the same index. That is a roughly 3x improvement in nine months, achieved by a team that rebuilt Meta’s entire AI stack from scratch.

And then there is the efficiency story, which is the part that actually matters for what Meta is about to do. Meta says Muse Spark matches Llama 4 Maverick’s capabilities with over 10x less compute. That single number is what makes the free, mass-distribution strategy economically possible. A model that costs an order of magnitude less to run is a model you can give away to billions of people without bankrupting the ad business that funds it.
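A back-of-envelope sketch shows why that 10x number is the whole ballgame. Only the roughly 3 billion-user reach and the 10x efficiency claim come from the article; the per-query cost and query volume below are illustrative assumptions, not Meta figures:

```python
users = 3_000_000_000          # rough daily reach of Meta's apps (from the article)
queries_per_user_per_day = 2   # assumed average usage
cost_per_query_before = 0.002  # USD, assumed Llama 4-era serving cost

cost_per_query_after = cost_per_query_before / 10  # the claimed 10x efficiency gain

daily_before = users * queries_per_user_per_day * cost_per_query_before
daily_after = users * queries_per_user_per_day * cost_per_query_after
print(f"before: ${daily_before/1e6:.0f}M/day -> after: ${daily_after/1e6:.1f}M/day")
```

Under these assumptions the serving bill drops from billions per year to hundreds of millions: a line item Meta’s ad business can absorb as a feature cost rather than a product that must pay for itself.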

Wang’s first model wasn’t designed to win every benchmark. It was designed to be efficient enough to deploy at WhatsApp scale, and good enough at the things Meta’s users actually do — health questions, visual understanding, shopping research, casual reasoning — to be genuinely useful inside the apps people already use. On those terms, it succeeded.

The Distribution Path Is the Real Story

Muse Spark is currently live on the Meta AI app and meta.ai website in the US. In the 24 hours after launch, the Meta AI app jumped from #57 to #5 on the US App Store. Sensor Tower estimated roughly 46,000 US downloads on launch day alone. Those are good numbers for a standalone app launch. They are not the actual play.

The actual play is the rollout to WhatsApp, Instagram, Facebook, Messenger, and Ray-Ban AI glasses, which Meta has confirmed is coming in the next few weeks. When that rollout completes, a frontier-competitive AI model will be sitting inside the apps that roughly 3 billion people open multiple times every day. WhatsApp alone has over 2 billion monthly users. No other AI company has anything remotely close to this distribution path.

For comparison: OpenAI’s ChatGPT has approximately 400 million monthly active users. Anthropic’s Claude has a small fraction of that. Even Google’s Gemini, despite being integrated across Search and Android, doesn’t have the same kind of habitual engagement that WhatsApp commands. Distribution at Meta’s scale isn’t a marketing advantage. It’s a structural feature of the business that the AI labs cannot replicate without somehow building their own consumer messaging platforms first.

The right way to think about this is not “Meta launched a chatbot.” It’s “Meta is about to add a free, frontier-competitive AI to every WhatsApp chat on the planet, in the same window that Anthropic and OpenAI are making their chatbots more expensive to use.”

Why This Is the Right Window for Meta

This is the window where a Meta AI comeback stops sounding like wishful thinking.

Pull these threads together. The frontier AI labs have spent the past several months tightening access. Anthropic has been steadily restricting how subscribers can use Claude. OpenAI has been reshuffling pricing tiers and introducing more expensive plans for the use cases that used to be covered by cheaper ones. Free tiers are getting worse across the board. Heavy users are being pushed onto plans that cost five to ten times what they used to pay.

This is happening because OpenAI and Anthropic are running the business they’re supposed to run. They are pure-play AI companies. They have to extract revenue from their models to justify their valuations and fund their compute. None of these decisions are villainous. They are the entirely rational consequences of a business model that depends on monetizing inference.

But they create a gap. A real one. There is now a substantial population of AI users who are about to find their existing tools getting more expensive or more restricted: casual consumers, students, small businesses, and people in markets where $20 a month is meaningful money. These are not the highest-value AI customers. They are not the developers running OpenClaw at $1,000 a day in inference costs. But there are billions of them, and they are exactly the population that Meta is structurally positioned to serve.

If you are a WhatsApp user in India, or Brazil, or Indonesia, or any of dozens of markets where Meta dominates messaging, the calculus is about to look very different. A frontier-competitive AI assistant will be sitting inside the app you already use, for free, with no signup, no payment, no usage limit. That is not the same product as ChatGPT Pro for $100 a month. It doesn’t need to be.

Meta has been waiting for a moment when the rest of the industry made distribution and free access look attractive again. The industry just provided that moment.

What Could Still Go Wrong

This is the optimistic case. It is not a guaranteed outcome.

Meta has been caught manipulating benchmark results before. After Llama 4 launched, the company was found to have published benchmark scores from a fine-tuned variant that wasn’t the model actually available to users. Independent verification of Muse Spark’s benchmarks is still in early days. If those numbers don’t hold up under scrutiny, this becomes the second consecutive Meta AI launch the company can’t defend, and the comeback story dies before it starts.

The coding gap is also genuinely a problem. OpenAI’s Codex has over 3 million weekly users and is growing 70% month over month. Anthropic’s Claude Code is the de facto standard for serious developer use. These are the highest-paying customers in the consumer AI market, and Meta does not have a credible product for them. Shipping a free chat assistant to 3 billion casual users is impressive. It is a different business from selling agentic coding tools to developers, and Meta is conceding the more lucrative half of the market.

Meta’s history with consumer AI is also mixed. The Meta AI assistant has existed inside WhatsApp and Instagram for over a year, and adoption has been modest. Distribution is necessary but not sufficient. If users don’t actually engage with Muse Spark inside WhatsApp the way they engage with ChatGPT in a browser, the distribution advantage becomes academic. The Meta AI app is currently #5 on the App Store, but ChatGPT and Gemini are still ahead, and Meta has historically struggled to convert installs into habitual usage for AI products specifically.

And Wang is a single point of failure. Meta’s entire AI strategy is now built around one person who joined nine months ago and has shipped exactly one model. That model is good. The next ones need to be better. AI talent is being poached at hundred-million-dollar pay packages across the industry, and keeping the Meta Superintelligence Labs team intact and productive over the next two years is an open question.

The Bottom Line

Meta’s AI strategy has been a story of expensive disappointment for most of the past year. The release of Muse Spark doesn’t end that story by itself. But it is the first piece of evidence that Meta’s AI comeback is starting to look real. Wang’s team has shipped something credible, the efficiency gains make mass distribution economically viable, and the rest of the industry is helpfully drifting toward the kind of paid, restricted, premium-tiered AI that leaves room for a free competitor.

The $14.3 billion question was never whether Wang could build a good model. It was whether Meta could build a good model in time to matter, in a market that was still hospitable to a free, ad-subsidized approach. Muse Spark suggests the answer to the first question is yes. The behavior of OpenAI and Anthropic over the past few weeks suggests the answer to the second question is also, surprisingly, yes.

Meta might actually pull this off. The interesting thing isn’t that Muse Spark exists. It’s that the timing works.


Anthropic Claude Mythos Preview Finds Thousands of Zero-Day Vulnerabilities, Launches Project Glasswing


Anthropic has announced Claude Mythos Preview, a new frontier AI model with cybersecurity capabilities so advanced that the company decided not to release it publicly. Instead, Anthropic launched Project Glasswing — a defensive coalition that includes Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

The goal: use Mythos Preview to find and patch critical software vulnerabilities before attackers develop models with similar capabilities.

What Claude Mythos Preview Can Do

During internal testing over the past few weeks, Anthropic’s research team used Mythos Preview to identify thousands of zero-day vulnerabilities — security flaws previously unknown to software developers — across every major operating system and web browser. In many cases, these bugs had survived decades of human code review and millions of automated security tests.

Three examples stand out. Mythos Preview discovered a 27-year-old denial-of-service vulnerability in OpenBSD, one of the most security-hardened operating systems in the world. It found a 16-year-old flaw in FFmpeg’s H.264 codec that automated fuzzers had run over five million times without catching. And it autonomously identified and fully exploited a 17-year-old remote code execution vulnerability in FreeBSD’s NFS server, granting unauthenticated root access — all without human guidance after the initial prompt.

The model’s exploit development rate is what sets it apart. On Firefox JavaScript engine vulnerabilities, Mythos Preview successfully built working exploits 72.4% of the time. Its predecessor, Claude Opus 4.6, managed close to zero percent on the same tasks. On the CyberGym benchmark, Mythos Preview scored 83.1% compared to Opus 4.6’s 66.6%.


The Emergent Capability Problem

Anthropic emphasized that these offensive capabilities were not explicitly trained. They emerged as a downstream consequence of improvements in coding, reasoning, and autonomous task execution. The same skills that make the model better at writing and fixing code also make it better at breaking it.

This raises a critical question for the AI industry: if one lab’s general-purpose model can accidentally develop elite-level hacking capabilities, how long before another model does the same — potentially without the same safety culture?

Project Glasswing: The Defensive Play

Rather than releasing Mythos Preview commercially, Anthropic is restricting access to Project Glasswing partners and a group of over 40 additional organizations that maintain critical software infrastructure. The company is committing up to $100 million in usage credits for Mythos Preview across these defensive efforts, along with $4 million in direct donations to open-source security organizations.

Partners use the model exclusively to scan and fix vulnerabilities in their own software and open-source projects they maintain. All discovered vulnerabilities go through coordinated disclosure.

Currently, over 99% of the zero-day vulnerabilities discovered by Mythos Preview remain unpatched.

What This Means For The AI Industry

This is the first time a major AI lab has withheld a frontier model from public release specifically because of its offensive cybersecurity capabilities. It sets a precedent — but also exposes a tension. The defensive advantage Anthropic is building through Project Glasswing is real, but it is inherently temporary. As frontier model capabilities continue to advance, similar abilities will likely emerge in competing models.

Anthropic acknowledged this directly: given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors committed to deploying them safely. The race between AI-powered attackers and AI-powered defenders has officially started.


Google Just Made AI Free on Your Phone — No Internet Needed


Paying for ChatGPT or Claude every month just to ask an AI a few questions? Google just made that expense optional. With the launch of Gemma 4 and the Google AI Edge Gallery app, you can now run a powerful LLM on your phone without spending a single dollar or even needing an internet connection.

Google DeepMind released Gemma 4 under the Apache 2.0 license, making it fully open source. The two models built specifically for smartphones are Gemma 4 E2B (Effective 2 Billion parameters) and Gemma 4 E4B (Effective 4 Billion parameters). These are not watered-down chatbots. They support text, image, and even audio input, handle 128K context windows, support over 140 languages, and can generate code — all running entirely on your device.

Here is what makes this a big deal: once you download the model, every single inference is free. Forever. No API keys, no token limits, no subscription renewals. Your prompts never leave your device, which means complete privacy. You could be on a flight, in a subway, or in a location with zero signal — the AI still works.

How to Get Started Right Now

Download the Google AI Edge Gallery app:

Once installed, open the app, tap AI Chat, and select either Gemma 4 E2B or E4B to download. E2B is lighter and faster, ideal for phones with limited RAM. E4B delivers stronger reasoning and is the sweet spot for modern devices with 8GB or more RAM. After the one-time download, you are completely off the grid.
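A back-of-envelope calculation shows why E2B suits low-RAM phones while E4B wants 8GB-class devices: a quantized model's memory footprint is roughly its parameter count times the bits per weight, plus runtime overhead. The formula and the 25% overhead factor below are rules of thumb I'm assuming, not figures from Google:

```python
# Back-of-envelope RAM estimate for running a quantized LLM on-device.
# Rule of thumb (an assumption, not a figure from Google): weights take
# params * bits_per_weight / 8 bytes, plus ~25% overhead for the KV cache
# and runtime.

def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 1.25) -> float:
    """Approximate RAM (GB) needed to hold the model in memory."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 2)

# Gemma 4 E2B (~2B effective params) vs E4B (~4B) at 4-bit quantization
print(estimated_ram_gb(2))  # → 1.25
print(estimated_ram_gb(4))  # → 2.5
```

Roughly 1-3 GB for the weights is why these models fit comfortably on an 8GB phone with room left for the OS and apps.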

Why This Should Matter to You

If you are using ChatGPT, Claude, or any paid AI tool for everyday tasks like drafting emails, summarizing text, brainstorming ideas, or writing quick code — you now have a free alternative that runs locally. No cloud dependency, no recurring charges, no data leaving your phone. For developers, students, and anyone experimenting with AI, this is a genuine cost saver.

Google has been quietly building toward this moment. The Gemma model family has already been downloaded over 400 million times, and with Gemma 4, the gap between on-device and cloud AI has shrunk dramatically. The E2B and E4B variants are specifically optimized for mobile hardware, leveraging NPUs found in chips like Qualcomm Snapdragon 8 Gen 2 and Google Tensor.

Run Gemma 4 without internet. Run it free. Run it private. Google just handed everyone a capable AI assistant with no strings attached — and all you need to do is download it.

Marc Andreessen Says AGI Is Already Here. He Said the Same Thing About Google Glass.

In April 2013, Marc Andreessen stood in front of a crowd and said Google Glass was going to change the world. “You put it on and you say yep, that’s the future,” he told TechCrunch. He co-founded the Glass Collective — a fund backed by a16z, Kleiner Perkins, and Google Ventures — specifically to seed startups building Glass apps. He said people would feel “naked and lonely” without it.

Google Glass was discontinued in 2015.

On April 6, 2026, Andreessen posted four words on X: “AGI is already here.” The post crossed 1.5 million views within hours, with YC president Garry Tan amplifying it and skeptics immediately firing back. One user, @TheRealJunto, responded by digging up the Glass quote from 2013. The parallel is uncomfortable — and instructive.

The Declaration Without a Definition

The core problem with Andreessen’s AGI claim is the same problem every such claim has: there is no agreed-upon definition of AGI. The term generally refers to an AI system that can match or exceed human cognitive ability across any task — not just coding or chess or writing, but reasoning, creativity, physical dexterity, and judgment under uncertainty.

By that standard, we are not there. GPT-5.4 scored 75% on OSWorld-V, a benchmark simulating desktop productivity tasks. The human baseline on that same benchmark is 72.4%. That is impressive. It is not general intelligence — it is one model, on one benchmark, slightly outperforming an average person at clicking through software.

A synthesis of over 9,800 expert predictions places the consensus arrival of AGI at approximately 2040. This is not a fringe view. It is the median of the world’s most informed forecasters.

The definition shifts because shifting it is useful. When Nvidia CEO Jensen Huang says “we’re already there,” he’s using a looser definition — AI that delivers economic value at scale. When Andreessen says it’s here, he likely means something similar. But the original academic definition of AGI — a system that can learn and perform any intellectual task a human can — is a far higher bar. The goalpost moves because the people moving it have something to gain each time it moves.

The Pattern of Confident Declarations

Andreessen is not unique in making this kind of claim. The pattern is consistent and worth naming directly.

Elon Musk predicted AGI by the end of 2025. When that didn’t happen, he updated the timeline to 2026. Jensen Huang said at the Financial Times Future of AI Summit in November 2025: “We are already there… it doesn’t matter, because at this point it’s a bit of an academic question.” Sam Altman has put the milestone somewhere between 2029 and 2035 depending on which interview you read.

None of these people are defining the term the same way. All of them have enormous financial stakes in you believing the milestone is close — or already achieved.

Musk sells compute through xAI and needs investment. Huang sells the chips that run AI. Andreessen’s firm has hundreds of millions in AI portfolio companies. When you hear an AGI declaration, it is worth asking: what does this person need you to believe, and why today?

What “Uneven Distribution” Actually Means

Andreessen borrowed William Gibson’s famous line — “the future is already here, it’s just not evenly distributed” — to frame AGI as an access problem rather than a capability problem. This is rhetorically elegant and technically evasive.

The implication is that AGI exists somewhere, for someone, and the rest of us just haven’t caught up. But that’s not how capability works. A model that can outperform a human at coding tasks, or financial modeling, or research summarization, is not AGI — it is narrow intelligence applied at scale. Impressive, economically valuable, and genuinely transformative. But calling it AGI is a category error dressed up as insight.

Why This Matters for the Industry

The AGI declaration game has real consequences, and for enterprise buyers they are concrete. When a figure like Andreessen declares AGI, procurement cycles compress. Boards that were planning 18-month AI roadmaps start asking why they aren’t moving faster. Vendors use the declaration as sales ammunition. Budgets shift. Hiring plans change. Investors price companies on AGI proximity rather than revenue fundamentals.

None of this is based on a technical assessment. It is based on a narrative authored by someone with a financial stake in the outcome — and when someone with Andreessen’s platform declares AGI has arrived, the market moves even if the claim is unfounded. That’s not a reason to ignore AI. It’s a reason to separate the signal from the performance.

In 2013, the narrative was that wearable computing was inevitable and imminent. Andreessen was so convinced he created a fund to profit from it. The product was discontinued before most people ever tried it.

The technology wasn’t wrong, exactly. The timeline was. And the conviction was mistaken for evidence.

We’ve seen this before.

Andrej Karpathy’s LLM Knowledge Base: How AI Is Replacing Personal Note-Taking

Most people use AI the same way every day. Open a chat, ask a question, get an answer, close the tab. The next day, start over. Every conversation resets to zero. The AI never remembers. You never build anything that compounds.

Andrej Karpathy just showed a different way.

Karpathy is not a casual observer of AI. He co-founded OpenAI, led AI at Tesla, and coined the term “vibe coding” — the practice of describing what you want to an AI agent and letting it build. When he shares a workflow, people pay attention. His April 2 post on X, titled “LLM Knowledge Bases,” has already crossed 1.2 million views and sparked a wave of developers rebuilding their entire research systems from scratch.

The System

The idea is deceptively simple. Instead of chatting with an LLM and forgetting everything, Karpathy feeds raw source material — articles, research papers, GitHub repos, datasets, images — into a folder called raw/. An LLM then incrementally compiles that material into a structured wiki: summaries, concept articles, backlinks, index files. All written and maintained by the AI. Karpathy himself doesn’t manually edit or add anything to the wiki. The LLM writes it, updates it, and runs regular “health checks” — scanning for inconsistencies, filling gaps via web search, and suggesting new articles based on what’s missing.

The frontend is Obsidian, a markdown-based note-taking tool. The LLM writes. You read. His current knowledge base on a recent research topic: roughly 100 articles and 400,000 words. Longer than most PhD dissertations. Built without typing a single word.

Also Read: Former Tesla AI Director Andrej Karpathy rejoins OpenAI

Why This Beats RAG

For the past few years, the standard approach to giving AI access to your own documents has been RAG — Retrieval-Augmented Generation. You chunk documents into pieces, convert them into mathematical vectors, store them in a database, and retrieve relevant chunks when you ask a question. It works, but it’s a black box. You can’t read the embeddings. You can’t audit what the AI found. You can’t trace an answer back to a specific source.

Karpathy’s system rejects all of that complexity. Because the wiki is just markdown files, every claim is traceable. Every article is readable. Every connection is visible. He notes he expected to need complex RAG infrastructure, but at personal knowledge base scale, a well-structured markdown wiki turns out to be something a modern LLM can navigate “fairly easily.”
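The traceability claim is easy to make concrete: because connections in an Obsidian wiki are plain [[wikilinks]] in the text, the entire link graph can be recovered with one regex — nothing is hidden inside an embedding index. A trivial illustration:

```python
# Recover a wiki page's outgoing links directly from its markdown source.
# Unlike vector embeddings, the "retrieval structure" is human-readable.
import re

def outgoing_links(markdown: str) -> list[str]:
    return re.findall(r"\[\[([^\]]+)\]\]", markdown)

print(outgoing_links("Compare [[RAG]] with a plain [[markdown wiki]]."))
# → ['RAG', 'markdown wiki']
```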

The Follow-Up Was the Real Signal

After the original tweet went viral, Karpathy did something that quietly said more than the workflow itself. He didn’t share the code. He didn’t release an app. He published a GitHub Gist — an “idea file” — and explained: in the era of LLM agents, there’s less point sharing specific implementations. You share the idea. Each person’s agent builds a version customized for their specific needs.

That’s a meaningful statement about where AI development is going. The product is increasingly the concept, not the code.

Developer Farza built a live example of exactly this — “Farzapedia” — a personal Wikipedia compiled from 2,500 entries across his diary, Apple Notes, and iMessages. The result: 400 articles covering research areas, people, projects, and ideas, all interlinked, all maintained by AI. Karpathy highlighted it as proof of concept.

What It Means for You

If you work in data science, AI research, or any field where staying current matters, this is worth paying attention to. The competitive advantage in the next phase of AI isn’t going to come from knowing how to prompt better. It’s going to come from having better systems — structured, compounding, queryable knowledge that gives your AI agents the context they need to do genuinely useful work.

Karpathy ended his original post with a line that’s worth sitting with: “I think there is room here for an incredible new product instead of a hacky collection of scripts.”

He’s right. That product doesn’t exist yet. But the workflow does. And it’s available to anyone willing to set it up today.

Delve Removed from Y Combinator After Fake Compliance Scandal Rocks AI Startup

Delve, the AI compliance startup that was one of Y Combinator’s most celebrated portfolio companies, has officially parted ways with the accelerator. Co-founder and COO Selin Kocalar confirmed it on X on April 4: “YC and Delve have parted ways. I still remember the day we took our YC interview at MIT.”

The statement is the most public acknowledgment yet of a collapse that has unfolded rapidly over the past three weeks — and it caps one of the most dramatic falls from grace in recent startup history.

The Rise

Delve was founded in 2023 by Karun Kaushik and Selin Kocalar, both MIT dropouts. The company graduated from YC’s Winter 2024 batch with a clear pitch: use AI agents to automate the painful, time-consuming process of obtaining security and compliance certifications — SOC 2, HIPAA, GDPR. The idea resonated. Delve raised a $3M seed round, then a $32M Series A led by Insight Partners at a $300M valuation in July 2025. YC President Garry Tan called them “a top YC startup.” The founders made Forbes’ 30 Under 30 list for 2026. By every external signal, Delve was a rocketship.

The Unraveling

On March 18, 2026, an anonymous Substack writer called DeepDelver published a detailed investigation alleging that Delve was running what amounted to fake compliance as a service. The core allegation: 493 out of 494 SOC 2 audit reports generated by Delve were 99.8% identical — same paragraphs, same grammatical errors, same exact phrasing across hundreds of client files. Customers were allegedly told they were HIPAA and GDPR compliant when the evidence supporting those certifications had been fabricated.
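The near-identical-reports claim is the kind of thing anyone with the files could have checked. Below is a sketch of such a scan using Python's standard library; the use of difflib and the 0.99 threshold are my assumptions — the post did not say how the 99.8% figure was computed:

```python
# A near-duplicate scan in the spirit of the DeepDelver analysis: flag
# pairs of audit reports whose texts are essentially identical.
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(reports: dict[str, str],
                    threshold: float = 0.99) -> list[tuple[str, str]]:
    """Return pairs of report names whose texts are near-identical."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(reports.items(), 2):
        # autojunk=False so identical texts always score exactly 1.0
        ratio = SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b))
    return flagged
```

Pointed at a folder of report files, a check like this surfaces clusters of copy-pasted text in a few lines — no special tooling required.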

The fallout was immediate. The Hacker News thread climbed past 835 points within hours. Insight Partners quietly removed its investment blog post from its website. Delve disabled its demo pipeline.

Delve pushed back, describing itself as an “automation platform” where final reports are issued by independent auditors, not by Delve. Most observers, including prominent engineers like patio11, called it a textbook non-denial denial.

Also Read: The First $1 Billion AI Company With One Employee Is Here

It Got Worse

A second DeepDelver post on March 30 introduced a new allegation: Delve’s enterprise workflow product “Pathways,” sold to customers at $50,000 to $200,000+, was allegedly a lightly modified fork of SimStudio — the open-source tool built by fellow YC startup Sim.ai. The Apache 2.0 license requires attribution. Delve allegedly gave none, contracted an outside dev shop to maintain it, and told prospects they had built it from the ground up.

The detail that drew the most outrage: Sim.ai was an actual Delve customer. Delve had audited Sim.ai, charged them full price, and simultaneously sold their own product back to them — without credit or compensation. Sim.ai CEO Emir Karabeg, who had initially consoled the Delve founders after the first scandal, told TechCrunch he had not heard from them since learning of the allegation.

Where It Stands

As of April 4, Delve has released a statement citing “ongoing cybersecurity and forensics investigations” as the reason it could not comment earlier. Pathways has been scrubbed from the website. The media inquiries email no longer works. No formal regulatory action — no AICPA ruling, no HIPAA enforcement, no SEC investigation — has been initiated. The entire exposure came from one anonymous Substack writer.

That is perhaps the most important detail in this entire story. Seventeen hundred companies paid for security certifications. Many of them handle patient data. The system that was supposed to catch this — investors, auditors, the accelerator — did not. One pseudonymous writer on Substack did.

The Bigger Picture

Delve is being framed as a cautionary tale about Gen-Z founders and AI hype. That framing is too easy. The harder question is about incentive design. In a market where a clean pitch, a YC badge, and a $300M valuation creates enough social proof to skip due diligence, Delve was not an anomaly. It was an outcome.

The First $1 Billion AI Company With One Employee Is Here — And It’s Not Who You Think

For years, the most powerful people in tech have been making a bet.

Sam Altman had a group chat with fellow tech CEOs — a literal betting pool — for the year the first one-person billion-dollar company would appear. Dario Amodei, CEO of Anthropic, was asked directly when it would happen. His answer: 2026. The crowd applauded, awkwardly.

Nobody was thinking about Matthew Gallagher.

Gallagher is 41, self-taught, and grew up in a trailer park in Los Angeles. He is not a Silicon Valley insider. He did not raise venture capital. He did not hire a team of engineers, designers, or marketers. In September 2024, he launched Medvi — a telehealth platform selling GLP-1 weight-loss drugs — from his house, with $20,000 and more than a dozen AI tools.

In its first full year of operation, Medvi generated $401 million in sales. In 2026, it is on track to hit $1.8 billion. The New York Times was given access to Medvi’s financials to verify these numbers. They checked out.

The one-person billion-dollar company is not a thought experiment anymore. It has revenue figures verified by The New York Times.

How He Actually Built It

Gallagher used AI for almost everything. He used ChatGPT, Claude, and Grok to write the code powering his platform. AI generated his website copy, his ad images, and his customer-facing videos. He built AI systems to track business performance in real time. When he needed customer service, AI handled it.

What he did not try to build himself was the medical infrastructure. Instead, he partnered with CareValidate and OpenLoop Health — companies that handle licensed physicians, pharmacies, drug shipping, and regulatory compliance. His job was branding, marketing, and growth. AI handled the execution.

In its first month, Medvi had 300 customers. The second month: 1,300. The word Gallagher used to describe early growth was simple: “insane.”

Also Read: Yann LeCun Launches AMI Labs

CareValidate’s own team was baffled. They kept asking if he had people working behind the scenes. He did not. OpenLoop’s CEO Jon Lensing eventually stopped being surprised. “Matthew’s native tongue seems to be AI,” he said.

The Numbers That Change the Argument

The Medvi story becomes more striking when you compare it to the established players in the same market.

Hims & Hers, one of Medvi’s direct competitors, has over 2,400 employees. In 2025, it posted a net profit margin of 5.5%. Medvi, with two people — Gallagher and his brother Elliot, hired in April 2025 — posted a net profit margin of 16.2%. Total net profit: between $65 million and $80 million on $401 million in revenue.

By the end of 2025, Medvi had 250,000 customers. Gallagher has already reinvested profits into expansion. A men’s health line launched in February 2026 hit 50,000 customers in its first month. Women’s health, hormone therapy, hair growth, and supplements are next.

His only regret about hiring? He tried contractors first. “It just increased my costs, and then it delayed my decision-making because I had more people to deal with,” he said.

The Critics Are Partly Right — And Entirely Missing the Point

The most common pushback on the Medvi story is that the GLP-1 market did the heavy lifting. Americans were desperately seeking affordable weight-loss drugs without a doctor’s appointment. Gallagher timed the wave perfectly.

That is true. And it is irrelevant.

Medvi was not built because GLP-1 drugs are popular. It was built because one person could now construct, operate, and scale an entire commercial infrastructure using AI tools that cost a few hundred dollars a month. He didn’t ride the wave. He built the surfboard, alone, in sixty days.

The GLP-1 market has dozens of competitors. Most of them needed teams, funding rounds, and years. Gallagher needed a laptop and $20,000.

What This Means

The prediction that a one-person billion-dollar company was coming was treated, until recently, as a fun thought experiment for tech conferences. Altman’s betting pool was a novelty. Amodei’s answer of “2026” drew awkward applause.

Medvi just made it a case study.

The implications are structural, not inspirational. If one person can build and operate a company at this scale — with better margins than 2,400-person competitors — the assumptions underneath hiring, team-building, and startup fundraising need to be revisited.

Gallagher himself summed up where he is now, emotionally, in a line to the Times: “For the first time, I’m not in survival mode.”

The one-person billion-dollar company is no longer the future of business. It is the present. And it has a name: Medvi.
