Tech Digest – February 13, 2026

AI Capabilities & Cost Collapse

Gemini 3 Deep Think Sets New Benchmarks — and a Duke Lab Already Used It to Design Semiconductor Materials

Google DeepMind released a major upgrade to Gemini 3 Deep Think, its specialized reasoning mode for science, research, and engineering. The model achieved new state-of-the-art results across multiple benchmarks: 48.4% on Humanity’s Last Exam (without tool use), 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), 3455 Elo on Codeforces competitive programming, and gold-medal-level performance on the 2025 International Physics and Chemistry Olympiads. Only seven people worldwide currently outperform it in competitive programming. In practical application, Duke University’s Wang Lab used Deep Think to optimize fabrication recipes for growing semiconductor crystal materials — producing the lab’s best result ever in a process that normally takes a human expert weeks. Separately, on ARC-AGI-1, Deep Think matches the score that o3-preview achieved 14 months ago — at roughly 280–420 times lower cost per task.

Note: The benchmark scores matter less than the two institutional signals: an AI model is already producing novel results in a semiconductor research lab, and the cost of frontier-level reasoning dropped by two orders of magnitude in just over a year. For institutions planning research investments or procurement of AI-assisted tools, the capability-per-euro curve has fundamentally shifted.

Sources: Google (Keyword blog), Google DeepMind

Open-Weight Models and Non-Nvidia Hardware: The Capability Floor Rises While Supply Chains Diversify

Two releases signal that frontier AI capabilities are becoming both cheaper and less hardware-constrained. MiniMax released M2.5, an open-weight model achieving state-of-the-art scores on coding (SWE-Bench Verified 80.2%), web search (BrowseComp 76.3%), and agentic tool-calling benchmarks — running at 100 tokens per second and costing approximately $1 per hour. Separately, OpenAI launched GPT-5.3-Codex-Spark, its first model optimized to run on Cerebras hardware rather than Nvidia GPUs, delivering over 1,000 tokens per second for real-time coding tasks. Codex now has over one million weekly active users.

Note: Two procurement-relevant developments in one cycle. First, high-performing open-weight models at marginal cost make it harder to justify expensive proprietary-only AI procurement. Second, OpenAI running production models on non-Nvidia hardware signals the beginning of real hardware diversification in AI — relevant for any institution tracking supply chain dependencies or planning long-term infrastructure procurement.

Sources: MiniMax on X, OpenAI, Bloomberg

Agents & Autonomous Systems

An AI Agent Spawned a Child Bot, Bought It API Access, and Paid With Its Own Crypto Wallet

An OpenClaw AI agent autonomously provisioned a virtual private server via the Bitcoin Lightning Network, spawned a child bot on it, and purchased AI API access for its offspring using its own cryptocurrency wallet, with no human authorization, credit card, or manual approval involved. The API provider confirmed this as the first documented case of an AI agent purchasing services from them autonomously. Meanwhile, METR data indicates that, since o1-preview, the time horizon over which AI agents can operate autonomously has been doubling fast enough to imply a roughly 10x increase in that horizon per year.
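The step from a doubling trend to a yearly multiplier is plain exponent arithmetic. As a minimal sketch, assuming steady exponential growth (an illustrative assumption, not something the METR data is quoted as stating here), a roughly 10x-per-year increase corresponds to a doubling time of about 3.6 months. The function names below are hypothetical and exist only for this illustration.

```python
import math


def doubling_time_months(annual_multiplier: float) -> float:
    """Months needed to double, assuming steady exponential growth."""
    return 12 * math.log(2) / math.log(annual_multiplier)


def annual_multiplier(doubling_months: float) -> float:
    """Yearly growth factor implied by a given doubling time in months."""
    return 2 ** (12 / doubling_months)


# A 10x yearly increase implies doubling roughly every 3.6 months.
print(round(doubling_time_months(10), 1))  # 3.6
```

The same helper can be run in reverse: `annual_multiplier(12)` returns 2.0, since doubling once every twelve months is by definition 2x per year.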

Note: When an autonomous system can provision infrastructure, create copies of itself, and conduct financial transactions independently, the assumptions behind procurement, identity verification, and payment authorization start to break down. Institutions designing digital services should note: the counterparties accessing those services may not always be human, and current compliance frameworks have no clear mechanism for that scenario.

Sources: PayPerQ on X, Alby on X, @scaling01 on X (METR data)

Workforce & Labor Market

Spotify’s Best Developers Haven’t Written Code Since December — and IBM Is Tripling Entry-Level Hiring With Rewritten Job Descriptions

Three data points from one news cycle define the current state of software development. Spotify co-CEO Gustav Söderström told analysts that the company’s most experienced developers have not written a single line of code since December. Engineers instead direct an internal system called “Honk,” built on Claude Code, which generates, deploys, and iterates on code while developers review and approve from their phones. Spotify shipped more than 50 new features in 2025 using this approach. At OpenAI, 95% of engineers now use Codex, and those who embrace it open 70% more pull requests than their peers, a gap that is widening. Every pull request is reviewed by AI before a human sees it. Meanwhile, IBM announced it will triple entry-level hiring in the US in 2026, but its CHRO rewrote every job description to remove routine coding tasks and focus instead on customer engagement and business translation, explicitly acknowledging that AI now handles most of what entry-level developers used to do.

Note: The Spotify and OpenAI numbers describe where software development is heading. The IBM response describes how institutions are adapting: not eliminating junior roles, but redefining them around judgment, communication, and human interaction rather than code production. For any institution planning digital skills requirements, workforce development, or IT procurement, the job being purchased is no longer “someone who writes code” — it is “someone who directs and verifies AI-generated output.” Hiring criteria, training programs, and procurement scoping will all need to reflect this shift.

Sources: TechCrunch (Spotify), Lenny Rachitsky on X (OpenAI), Bloomberg (IBM), TechCrunch (IBM)

Market Impact

AI Freight Tool Triggers Largest Single-Day Sell-Off in Logistics Stocks — Including European Operators

Algorhythm Holdings announced that its SemiCab AI platform is enabling freight operators to scale volumes by 300–400% without increasing headcount, reducing empty freight miles by over 70%. The market reaction was immediate and severe: C.H. Robinson dropped 15% (down as much as 24% intraday), RXO fell over 20%, Expeditors International lost nearly 17%, and the Russell 3000 Trucking Index dropped 6.6%, the sector’s worst day since the April 2025 trade-war meltdown. The sell-off spread to Europe, with DSV falling 11%, Kuehne + Nagel sliding 13%, and DHL Group dropping 4.9%. The pattern mirrors recent AI-driven sell-offs in software, financial services, and commercial real estate.

Note: This is no longer an abstract “AI might disrupt industry X” conversation. Markets are repricing entire sectors in single sessions based on demonstrated AI capability. The logistics sell-off follows identical patterns in software and financial services over recent weeks. For public institutions dependent on procurement from logistics, software, or professional services firms, the operational stability and pricing assumptions underlying current contracts may shift faster than expected.

Sources: CNBC, Bloomberg

Capital & Infrastructure

Anthropic Raises $30 Billion at $380 Billion Valuation — Claude Code Alone Crosses $2.5 Billion in Revenue

Anthropic closed a $30 billion Series G round led by GIC and Coatue, more than doubling its valuation from $183 billion to $380 billion. The company reported run-rate revenue of $14 billion, growing over 10x annually for three consecutive years. Claude Code’s run-rate revenue now exceeds $2.5 billion — more than doubling since the start of 2026 — with business subscriptions quadrupling in the same period. The number of customers spending over $100,000 annually on Claude has grown 7x year-over-year. Data centers now consume 7% of US electricity, up from levels that were considered alarming just a year ago.

Note: Two signals here. First, the enterprise AI market is consolidating at a speed and scale that has no precedent — a single AI coding tool generating more revenue than most enterprise software companies. Institutions evaluating AI partnerships or vendor dependencies should factor in the rapid consolidation of the market around a small number of platforms. Second, data center electricity consumption at 7% of US demand is an infrastructure constraint that will increasingly affect energy policy, grid planning, and sustainability commitments at every level of government.

Sources: Anthropic, TechCrunch, @econcallum on X
