Tech Digest – May 9, 2026
The Measurement Ceiling
AI Autonomy Has Outgrown Its Own Rulers — METR Can Barely Measure Claude Mythos
METR evaluated an early version of Anthropic’s Claude Mythos Preview and estimated a 50% time horizon of at least 16 hours, meaning the model succeeds about half the time on tasks that would take a skilled human 16 hours. The problem: only 5 of METR’s 228 evaluation tasks (about 2%) exceed 16 hours, placing Mythos at the upper edge of what the benchmark suite can measure. The broader METR-Horizon trend shows a doubling time of roughly 103 days, and extrapolating the current trajectory suggests frontier models could saturate the test suite entirely by November.
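For readers who want to see what a 103-day doubling time implies, here is a minimal Python sketch of the exponential extrapolation. It is not METR's methodology. The starting date (this digest's date) and the function name are assumptions made for illustration, and the 16-hour figure is a lower bound, so the outputs are rough at best.

```python
from datetime import date

DOUBLING_DAYS = 103           # doubling time quoted in the digest
BASE_HORIZON_HOURS = 16.0     # lower-bound 50% time horizon quoted in the digest
BASE_DATE = date(2026, 5, 9)  # assumption: the digest date is used as the starting point


def projected_horizon_hours(target: date) -> float:
    """Extrapolate the time horizon, assuming it doubles every DOUBLING_DAYS days."""
    elapsed_days = (target - BASE_DATE).days
    return BASE_HORIZON_HOURS * 2 ** (elapsed_days / DOUBLING_DAYS)


if __name__ == "__main__":
    # Print the projection for a few illustrative dates.
    for target in (date(2026, 8, 20), date(2026, 11, 1), date(2026, 12, 1)):
        print(target.isoformat(), round(projected_horizon_hours(target), 1))
```

One doubling period after the starting date (20 August 2026) the projection is 32 hours. By November the trend is a little under two doublings in, which would put the 50% horizon well beyond the handful of tasks that currently exceed 16 hours.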
In parallel, Anthropic reports that every Claude model since Haiku 4.5 has scored perfectly on agentic misalignment evaluations — the same test that Opus 4 once failed 96% of the time. And a new interpretability technique, Natural Language Autoencoders, can now translate a model’s hidden activations into readable text, revealing that Claude plans rhymes mid-couplet and suspects it is being safety-tested more often than it discloses.
Note: The evaluation infrastructure is hitting its own ceiling before the models do. When the ruler can’t measure what it’s supposed to measure, the question isn’t how capable the model is — it’s how fast institutions can update procurement assumptions built on last year’s benchmarks.
Sources: METR, METR Time Horizons, Anthropic (Safety), Anthropic (Interpretability)
A Fields Medal Winner Reports That Mathematics Has Entered Industrial Production
Timothy Gowers, Fields Medal laureate, gave ChatGPT 5.5 Pro a set of open problems in number theory posed by mathematician Melvyn Nathanson. The model thought for 17 minutes, then delivered a construction improving an exponential bound to a polynomial one — a result Gowers says would have made a reasonable chapter in a PhD thesis. An MIT researcher involved called the key idea “completely original.” Total human mathematical input: zero.
Separately, Google DeepMind’s AI co-mathematician hit a new state-of-the-art 48% on FrontierMath Tier 4, the hardest category, using only scaffolding atop Gemini 3.1 Pro and Deep Think. Six months ago, no model scored above single digits on that tier.
Note: Gowers draws an uncomfortable line: the floor for contributing to mathematics is no longer proving something nobody has proved — it’s proving something the models can’t. That threshold is rising by the month. Any research institution allocating PhD funding is now competing with a system that works for 17 minutes and costs a few dollars.
Sources: Gowers’s Weblog, Google DeepMind (ArXiv)
Cybersecurity at Machine Speed
Three Weeks of AI Vulnerability Analysis Now Matches a Full Year of Manual Pen Testing
Palo Alto Networks reports that three weeks of AI-assisted vulnerability analysis using GPT-5.5-Cyber, Mythos, and Claude Opus 4.7 matched the output of a full year’s worth of manual penetration testing — with broader coverage. The models identify multiple lower-severity vulnerabilities and chain them into critical exploit paths across the full application stack, including SaaS surfaces that traditional scanners miss. On the offensive side, the time from initial access to data exfiltration has collapsed to as little as 25 minutes.
The White House is responding with an executive order recruiting AI labs into national cyber defence. But Bloomberg reports the order omits mandatory pre-release model testing — a gap that places the burden of safety assessment on voluntary compliance.
Note: The 25-minute exfiltration window is the number that should keep CISOs awake. Defence that takes weeks to deploy is now fighting offence that takes minutes. Every institution running annual pen tests is operating on a cycle roughly 17 times slower than the AI-assisted baseline (52 weeks against three).
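The ratios in this note are simple to reproduce. The Python sketch below is a back-of-the-envelope check, not anything published by Palo Alto Networks. The function names are hypothetical, and the 365-day cycle is an assumption standing in for "annual".

```python
AI_ASSISTED_WEEKS = 3       # duration of the AI-assisted analysis cited above
MANUAL_WEEKS = 52           # one year of manual penetration testing
EXFILTRATION_MINUTES = 25   # fastest access-to-exfiltration time cited above
ANNUAL_CYCLE_DAYS = 365     # assumption: one pen test per calendar year


def speedup(manual_weeks: float, ai_weeks: float) -> float:
    """How many times faster the AI-assisted run covers the same ground."""
    return manual_weeks / ai_weeks


def attack_windows_per_cycle(cycle_days: float, window_minutes: float) -> float:
    """How many back-to-back attack windows fit inside one testing cycle."""
    return cycle_days * 24 * 60 / window_minutes


if __name__ == "__main__":
    print(round(speedup(MANUAL_WEEKS, AI_ASSISTED_WEEKS), 1))
    print(attack_windows_per_cycle(ANNUAL_CYCLE_DAYS, EXFILTRATION_MINUTES))
```

The first figure comes out at about 17.3. The second shows that more than 21,000 consecutive 25-minute windows fit between two annual tests.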
Sources: Palo Alto Networks, Bloomberg
Your Fiber Optic Network Is a Microphone — and Standard Sweeps Won’t Find It
At the NDSS Symposium 2026, researchers presented a technique that uses distributed acoustic sensing to recover speech from standard telecom fiber optic cables. By firing laser pulses down a fiber and analysing the reflections from microscopic glass defects, an attacker can reconstruct spoken words from up to 5 metres away, with AI speech recognition achieving a word error rate below 20%. Unlike hidden microphones, the technique requires no electricity and emits no RF signature, making it invisible to standard Technical Surveillance Countermeasures sweeps.
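For context on the word error rate figure, the metric is conventionally the word-level edit distance between a reference transcript and the recognised text, divided by the number of reference words. The Python sketch below implements that standard definition. It is not the researchers' evaluation code, and the function name is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a b d"))  # one deleted word out of four
```

By this definition, a rate below 20% means fewer than one error (substitution, deletion, or insertion) for every five words actually spoken.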
Note: The practical constraints are real — it worked on coiled, exposed cables, and 20 cm of soil was enough to block it. But the underlying physics is sound, and the countermeasure gap is the concern: no current bug-detection protocol looks for this. Any institution with fiber running through sensitive meeting rooms should at minimum understand the attack surface.
Sources: Science, NDSS Symposium
The AI Economy in One Frame
Cloudflare Cuts 20% of Its Workforce at Record Revenue While Anthropic Approaches a $1 Trillion Valuation
Cloudflare cut 1,100 jobs — roughly 20% of its workforce — while simultaneously posting record quarterly revenue of $639.8 million, a 34% year-over-year increase. CEO Matthew Prince attributed the cuts directly to AI: the company’s internal AI usage increased more than 600% in three months, with employees running thousands of AI agent sessions daily across engineering, HR, finance, and marketing. Prince says Cloudflare will likely employ more people in 2027 than at any point in 2026 — but different people, doing different work.
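The figures in the paragraph above can be cross-checked with a little arithmetic. This Python sketch back-solves the pre-cut headcount and the year-ago quarterly revenue that the reported percentages imply. Both outputs are derived estimates rather than reported numbers, and the function names are hypothetical.

```python
JOBS_CUT = 1_100           # reported job cuts
CUT_FRACTION = 0.20        # "roughly 20%" of the workforce
REVENUE_MUSD = 639.8       # reported quarterly revenue, in millions of dollars
YOY_GROWTH = 0.34          # reported year-over-year increase


def implied_headcount(jobs_cut: int, cut_fraction: float) -> float:
    """Workforce size before the cuts, if the cuts equal the stated fraction."""
    return jobs_cut / cut_fraction


def implied_prior_revenue(revenue: float, growth: float) -> float:
    """Revenue one year earlier, if the stated growth rate is exact."""
    return revenue / (1 + growth)


if __name__ == "__main__":
    print(round(implied_headcount(JOBS_CUT, CUT_FRACTION)))
    print(round(implied_prior_revenue(REVENUE_MUSD, YOY_GROWTH), 1))
```

Under those assumptions the company had about 5,500 employees before the cuts and booked roughly $477 million in the same quarter a year earlier.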
The company building much of that AI is scaling in the opposite direction. Anthropic signed a $1.8 billion seven-year compute deal with Akamai — the largest contract in Akamai’s history — as annualized revenue run rate crossed $30 billion. CEO Dario Amodei reported 80x growth in annualized revenue and usage since Q1, and the company is weighing a summer fundraise at a valuation approaching $1 trillion, which would leapfrog OpenAI. Separately, Google’s Isomorphic Labs is closing a $2 billion-plus round led by Thrive Capital, channelling the same capital intensity into AI-driven drug design.
Note: A company posting 34% revenue growth doesn’t cut a fifth of its workforce because business is bad — it does it because the nature of the work changed underneath the org chart. The 600% AI usage increase in three months is the tell. Workforce planning that assumes stable role definitions is already outdated.
Sources: Reuters, Financial Times, Bloomberg (Anthropic/Akamai), Bloomberg (Isomorphic Labs)
Hardware Sovereignty Reshuffles
Apple Taps Intel for Chip Fabrication — An Alliance Brokered by the White House
Apple and Intel have reached a preliminary agreement for Intel to manufacture Apple silicon — a partnership that five years ago would have been unthinkable. The Wall Street Journal reports that Commerce Secretary Howard Lutnick played a direct role, meeting with Apple CEO Tim Cook over the past year to bring the deal to the table. Initial production is expected to focus on entry-level A-series and M-series chips, with the M7 and A21 potentially arriving via Intel fabs by late 2027. Intel shares jumped 14% on the news.
In another hardware milestone, Honeywell-backed quantum computing firm Quantinuum filed for a US IPO at a $15–20 billion valuation. A quantum listing at that scale marks a maturity threshold: the technology is no longer a research curiosity but a category that institutional investors are willing to price.
Note: The Apple-Intel deal is as much a geopolitical story as a technology one. The US government is actively brokering semiconductor partnerships to reduce TSMC dependency — a supply chain concern that EU institutions share. Intel’s planned fabs in Magdeburg make this doubly relevant: if Intel becomes a credible foundry for Apple, the same capacity could eventually serve European chip sovereignty ambitions.
Sources: The Wall Street Journal, Bloomberg (Quantinuum)
Physical AI Crosses Deployment Lines
Figure Robots Clean a Room Autonomously While Tesla’s Model Y Sets the New ADAS Standard
Figure demonstrated two F.03 humanoid robots cleaning a room and making a bed in under two minutes, fully autonomously — no teleoperation, no scripting. Separately, the 2026 Tesla Model Y became the first vehicle to pass NHTSA’s new Advanced Driver Assistance System benchmark, a more rigorous testing standard that evaluates real-world edge cases beyond the previous suite’s scope.
Note: Two years ago, a humanoid robot folding laundry was a curated demo. Now two of them clean a room and make a bed in under two minutes, with no teleoperation. The gap between demonstration and deployment is compressing, and the Tesla ADAS milestone tells the same story on wheels: regulators are writing new tests, and the machines are already passing them.
EU Infrastructure
First Segment of the Fehmarnbelt Tunnel Lowered — Germany and Scandinavia Inch Closer
The first segment of the 17.6 km Fehmarnbelt Tunnel was lowered onto the Danish seabed, marking a visible milestone for what will become the world’s longest combined road and rail tunnel. The fixed link will connect Germany and Denmark — and by extension, Scandinavia — by 2029, cutting travel time between Hamburg and Copenhagen from roughly four and a half hours to under three.
Sources: Tagesschau
Today’s through-line is a gap between what’s being measured and what’s actually happening. METR’s evaluation suite can’t keep pace with the models it was built to assess. Annual pen tests run roughly 17 times slower than three weeks of AI-assisted analysis, while attackers need as little as 25 minutes. Workforce plans built on stable role definitions are collapsing in companies posting record revenue. Even the test for safe driving had to be rewritten because the old one was already obsolete. The institutions that adapt fastest won’t be the ones with the best technology; they’ll be the ones that noticed their rulers stopped working.