Hacker News
Daily AI Digest

Welcome to the Hacker News Daily AI Digest, where you will find a daily summary of the latest and most intriguing artificial intelligence news, projects, and discussions among the Hacker News community. Subscribe now and join a growing network of AI enthusiasts, professionals, and researchers who are shaping the future of technology.

Brought to you by Philipp Burckhardt

AI Submissions for Wed Feb 18 2026

What years of production-grade concurrency teaches us about building AI agents

Submission URL | 115 points | by ellieh | 36 comments

Title: Your Agent Framework Is Just a Bad Clone of Elixir: Concurrency Lessons from Telecom to AI
Author: George Guimarães

Summary: Guimarães argues that today’s Python/JS agent frameworks are reinventing Erlang/Elixir’s 40-year-old actor model—the same principles the BEAM VM was built on to run telecom switches. AI agents aren’t “web requests”; they’re long-lived, stateful, concurrent sessions that demand lightweight isolation, message passing, supervision, and fault tolerance. Those properties are native to the BEAM and only partially approximated by Node.js and Python frameworks. If you’re building AI agents at scale, Elixir isn’t a hipster pick; it’s the architecture the problem calls for.

Key points:

  • The 30-second request problem: Agent sessions routinely hold open connections for 5–30s with multiple LLM calls, tools, and streaming—multiplied by 10,000+ users. Thread-per-request stacks struggle here.
  • Why BEAM fits:
    • Millions of lightweight processes (~2 KB each), each with its own heap, GC, and fault isolation.
    • Preemptive scheduling (every ~4,000 reductions) prevents any single agent from hogging CPU.
    • Per-process garbage collection avoids global pauses at high concurrency.
    • Native distribution: processes talk across nodes transparently.
    • Phoenix Channels/LiveView already handle 100k+ WebSockets per server; an agent chat is just another long-lived connection.
  • Node.js comparison:
    • Single-threaded event loop makes CPU-heavy work block unrelated sessions unless offloaded.
    • Stop-the-world GC and process-wide crashes hurt tail latency and reliability.
  • Python/JS agent frameworks are converging on actors:
    • Langroid explicitly borrows the actor model.
    • LangGraph models agents as state machines with reducers and conditional edges.
    • CrewAI coordinates agents via shared memory and task passing.
    • AutoGen 0.4 pivots to an “event-driven actor framework” with async message passing and managed lifecycles.
    • Conclusion: they’re rediscovering what the BEAM has provided since 1986.
  • LLMs + Elixir: José Valim highlighted a Tencent study where Claude Opus 4 achieved the highest code-completion rate on Elixir problems (80.3%), but the deeper point is runtime fit, not just codegen ergonomics.

Why it matters:

  • Agentic workloads look like telecom, not classic web requests. Architectures tuned for short, stateless requests buckle under thousands of long-lived, stateful streams. BEAM’s actor model maps directly onto agent process-per-session designs with supervision and graceful failure.

Practical takeaways for builders (a minimal sketch of the pattern follows this list):

  • Model each agent/session as an independent BEAM process; use supervision trees for fault recovery.
  • Stream tokens over Phoenix Channels; scale horizontally with native distribution.
  • Keep heavy CPU/ML off the schedulers (use ports, separate services, or NIFs with dirty schedulers).
  • Use per-process state for isolation; leverage ETS when shared, fast in-memory tables are needed.
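
To make the pattern concrete, here is a minimal, illustrative sketch in Python (asyncio) of the actor shape these takeaways describe—one task per session with a private mailbox and state, plus a crude one-for-one supervisor loop. It is a stand-in for BEAM processes and supervision trees, not the author's Elixir code, and all names are hypothetical.

    import asyncio


    class AgentSession:
        """One actor per user session: private state and a mailbox, no shared memory."""

        def __init__(self, session_id: str) -> None:
            self.session_id = session_id
            self.mailbox: asyncio.Queue[str] = asyncio.Queue()
            self.history: list[str] = []  # per-session state, isolated like a BEAM heap

        async def handle(self) -> None:
            while True:
                msg = await self.mailbox.get()   # message passing instead of shared state
                self.history.append(msg)
                await asyncio.sleep(0.1)         # stand-in for an LLM or tool call
                print(f"[{self.session_id}] handled: {msg}")


    async def supervise(session: AgentSession) -> None:
        """A crude one-for-one supervisor: restart the actor whenever it crashes.

        Unlike a BEAM restart, this keeps the Python object's state around; see the
        "let it crash" vs. context-preservation debate in the discussion below.
        """
        while True:
            try:
                await session.handle()
            except Exception as exc:  # illustrative only; never catch this broadly in production
                print(f"[{session.session_id}] crashed ({exc!r}); restarting")


    async def main() -> None:
        sessions = [AgentSession(f"user-{i}") for i in range(3)]
        tasks = [asyncio.create_task(supervise(s)) for s in sessions]
        for s in sessions:
            await s.mailbox.put("hello")
        await asyncio.sleep(0.5)                 # let the actors drain their mailboxes
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


    if __name__ == "__main__":
        asyncio.run(main())

The sketch only captures the isolation and supervision structure; the BEAM adds preemptive scheduling and per-process garbage collection on top of this shape, which is the article's core argument.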

Caveats:

  • Python still dominates ML tooling; you’ll often pair Elixir orchestration with Python/Rust for heavy inference.
  • NIFs can stall schedulers if misused; prefer ports or dirty schedulers for safety.
  • Team familiarity and hosting/tooling may influence stack choice despite the runtime fit.

Bottom line: AI agent frameworks in Python/JS are converging on the actor model because the problem demands it. If you want production-grade concurrency, fault isolation, and effortless real-time at scale, the BEAM/Elixir stack is the battle-tested blueprint rather than a pattern to reimplement piecemeal.

Here is a summary of the discussion:

Runtime Fit vs. Real-World Bottlenecks A major thread of debate centered on whether BEAM’s concurrency advantages matter when AI workloads are heavily IO-bound. Users like rndmtst and mccyb argued that since agents spend 95% of their time waiting on external APIs (OpenAI/Anthropic), the scheduler's efficiency is less critical than it was for telecom switches. While they admitted hot code swapping is a genuine advantage for updating logic without dropping active sessions, they questioned if runtime benefits outweigh the massive ecosystem and hiring advantages of Python.

"Let It Crash" vs. Context Preservation Commenters wrestled with applying Erlang’s "let it crash" philosophy to LLM context. qdrpl and vns pointed out that restarting a process effectively wipes the in-memory conversation history—a critical failure in AI sessions. asa400 clarified that supervisors are intended for unknown/transient execution errors, not semantic logic failures; however, mckrss noted that BEAM’s fault tolerance doesn't solve "durable execution" (sustaining state across deployments or node restarts), often requiring hybrid architectures with standard databases anyway.

Concurrency Constructs znnjdl shared an anecdote about switching from a complex Kubernetes setup to Elixir for long-running browser agents, noting that distributed problems that resulted in infrastructure "hell" elsewhere were solved by native language constructs. There was significant technical dispute between wqtwt, kbwn, and others regarding whether modern Linux OS threads are sufficient for these workloads versus the BEAM’s lightweight 2KB processes.

Frameworks vs. Primitives The discussion compared building on Elixir primitives (OTP) versus Python frameworks. vns described tools like LangChain as "bloated" attempts to provide structure that Elixir offers natively, though d4rkp4ttern defended newer Python frameworks like Langroid. Finally, jsvlm (José Valim, creator of Elixir) chimed in to correct the historical record, noting that the creators of Erlang implemented the actor model independently to solve practical problems, rather than adopting it from academic theory.

AI adoption and Solow's productivity paradox

Submission URL | 780 points | by virgildotcodes | 734 comments

Headline: CEOs say AI hasn’t moved the needle—economists dust off the Solow paradox

  • A new NBER survey of ~6,000 executives across the U.S., U.K., Germany, and Australia finds nearly 90% report no AI impact on employment or productivity over the past three years. About two-thirds say they use AI—but only ~1.5 hours per week on average—and a quarter don’t use it at all.
  • Despite the muted present, leaders still expect near-term gains: +1.4% productivity and +0.8% output over the next three years. Firms forecast a small employment drop (-0.7%), while workers expect a slight rise (+0.5%).
  • The disconnect revives Solow’s productivity paradox: technology is everywhere except in the macro data. Apollo’s Torsten Slok says AI isn’t yet visible in employment, productivity, inflation, or most profit margins outside the “Magnificent Seven.”
  • Evidence is mixed: the St. Louis Fed sees a 1.9% excess cumulative productivity bump since late 2022; an MIT study projects a more modest 0.5% over a decade. Separately, ManpowerGroup reports AI use up 13% in 2025 but confidence down 18%. IBM says it’s boosting junior hiring to avoid hollowing out its management pipeline.
  • Optimists see a turn: Erik Brynjolfsson points to stronger GDP and estimates U.S. productivity rose 2.7% last year, suggesting benefits may finally follow 2024’s >$250B corporate AI spend.

Why it matters: Echoes of the 1980s IT cycle—big investment first, measurable gains later. Light-touch adoption and workflow inertia may be masking what only shows up after reorganization, tooling maturity, and broader diffusion.

Here is a summary of the discussion:

The Solow Paradox & Historical Parallels Commenters engaged deeply with the article's comparison to the 1970s/80s productivity paradox. While some agree that we are in the "DOS era" of AI—where expensive investment precedes the "Windows 95" era of utility—others argue the comparison is flawed. One user notes a key economic difference: modern AI (e.g., a $20 Claude subscription) has a significantly lower barrier to entry and onboarding cost than the mainframe computing and manual office training required in the 1970s.

The "Infinite Report" Loop A major thread of cynicism focuses on the nature of corporate work. Users argue that while AI might make producing reports "3x faster," it often degrades the signal-to-noise ratio.

  • The Fluff Tax: Critics point out that faster writing shifts the burden to the reader; if a report takes 10% longer to understand because of AI "fluff," overall organizational value is lost.
  • The AI Ouroboros: Several users joked (or lamented) that the inevitable solution is people using AI to summarize the very reports that colleagues used AI to generate, resulting in a hollow loop of information transfer.

Skill Acquisition vs. "Licking the Window" There is significant skepticism regarding using LLMs for learning and skill development.

  • False Confidence: Users warn that AI gives a "false sense of security" regarding understanding material. One commenter vividly described it as "looking through the window" at knowledge rather than grasping it, advocating for the traditional "RTFM" (Read The F*ing Manual) approach for true expertise.
  • Code vs. Prose: While confidence in AI for general communication is low, some developers defend the utility of current models for coding, noting that recent improvements in context handling allow models to effectively read codebases and implement solutions, unlike the "hallucinations" common in semantic text tasks.

Technical Bottlenecks The discussion touched on the limits of current architectures. Some predict that simply scaling context windows or bolting on RAG yields diminishing returns or slows down processing. One prediction suggests that the real productivity breakthrough won't come from larger LLMs, but from hybrid models that pair LLMs with logic-based systems to eliminate hallucinations and perform actual reasoning rather than probabilistic token generation.

Microsoft says bug causes Copilot to summarize confidential emails

Submission URL | 261 points | by tablets | 71 comments

Microsoft says a bug in Microsoft 365 Copilot Chat has been summarizing emails marked confidential, effectively bypassing data loss prevention policies. Tracked as CW1226324 and first detected January 21, the issue hit the Copilot “work tab” chat, which pulled content from users’ Sent Items and Drafts in Outlook desktop—even when sensitivity labels should have blocked automated access. Microsoft attributes it to a code/configuration error and began rolling out a fix in early February; a worldwide configuration update for enterprise customers is now deployed, and the company is monitoring and validating with affected users. Microsoft stresses no one gained access to information they weren’t already authorized to see, but admits the behavior violated Copilot’s design to exclude protected content. The company hasn’t disclosed the scope or a final remediation timeline; the incident is flagged as an advisory, suggesting limited impact.

Why it matters: it’s a trust hit for AI guardrails in enterprise email—showing how label- and DLP-based protections can be undermined by new AI features even without a classic data breach.

Based on the discussion, commenters focused on the fundamental conflict between rapid AI integration and enterprise security requirements. Several users criticized Microsoft's approach as "sprinkling AI" onto existing tech stacks without rethinking the underlying security architecture, noting that standard protections (like prompt injection defenses) are insufficient against "unknown unknowns." A self-identified AI researcher argued that engineering is currently outpacing theoretical understanding, leading to "minimum viable product" safeguards that cannot guarantee data safety or effectively "unlearn" information once processed.

Key themes in the thread included:

  • Failure of Guardrails: Participants noted that Data Loss Prevention (DLP) tools are pointless if the AI layer can bypass them, effectively rendering manual classification (like employee NDAs or "Confidential" labels) moot.
  • The OS Debate: The incident sparked a recurring debate about leaving the Microsoft ecosystem for Linux or macOS due to "user-hostile" feature bloat, though counter-arguments pointed out that switching operating systems does not mitigate cloud-service vulnerabilities.
  • Terminology: There was significant skepticism regarding Microsoft’s classification of the bug as an "advisory." Users argued this term softens the reality of what they view as a significant breach of trust and privacy, distinguishing it from the typical, lower-severity definition of the word in IT contexts.

Fastest Front End Tooling for Humans and AI

Submission URL | 109 points | by cpojer | 94 comments

Fastest Frontend Tooling for Humans and AI: a push for 10x faster JS/TS feedback loops

The author argues 2026 is the year JavaScript tooling finally gets fast by pairing strict defaults with native-speed tools—benefiting both humans and LLMs. The centerpiece is tsgo, a Go rewrite of TypeScript that reportedly delivers ~10x faster type checking, editor support, and even catches some errors the JS implementation missed. It’s been used across 20+ projects (1k–1M LOC) and is pitched as stable enough to adopt, especially if you first swap builds to tsdown (Rolldown-based) for libraries or Vite for apps. Migration is simple: install @typescript/native-preview, replace tsc with tsgo, clean legacy flags, and flip a VS Code setting.

On formatting, Oxfmt aims to replace Prettier without losing ecosystem coverage. It bakes in popular plugins (import/Tailwind class sorting) and falls back to Prettier for the long tail of non-JS languages, easing migration and editor integration.

For linting, Oxlint is positioned as the first credible ESLint replacement because it can run ESLint plugins via a shim and NAPI-RS, supports TS config files, and adds type-aware rules. With oxlint --type-aware --type-check, you can lint and type-check in one fast pass powered by tsgo.

To make strictness easy, @nkzw/oxlint-config bundles a comprehensive, fast, and opinionated rule set designed to guide both developers and LLMs:

  • Error-only (no warnings)
  • Enforce modern, consistent style
  • Ban bug-prone patterns (e.g., instanceof), disallow debug-only code in prod
  • Prefer fast, autofixable rules; avoid slow or overly subjective ones

The post includes “migration prompts” for swapping Prettier→Oxfmt and ESLint→Oxlint, and points to ready-made templates (web, mobile, library, server) used by OpenClaw. Smaller DevX picks name-check npm-run-all2, ts-node, pnpm, Vite, and React.

Why it matters: Faster, stricter tooling shortens feedback loops, reduces bugs, and—per the author’s experiments—helps LLMs produce more correct code under strong guardrails. Caveat: tsgo is still labeled experimental, so teams should trial it on a branch before a full switch.

Based on the discussion, the community reaction is divided between excitement for performance gains and concern over the long-term maintainability of a "fractured" ecosystem.

The "Schism" and Maintainability The most contentious point, led by user conartist6, is the fear that rewriting JavaScript tooling in low-level languages (Rust, Go) creates a "big schism." Critics argue this prevents the average JS developer from understanding, debugging, or contributing to the tools they rely on, potentially leaving critical infrastructure in the hands of VC-backed entities (like the creators of VoidZero) rather than the community. TheAlexLichter counters this, arguing that the average web developer rarely contributes to tooling internals anyway, and that AI tools make crossing language barriers (JS to Rust) easier for those who do wish to contribute.

Performance vs. Architecture There is a debate regarding why current tooling is slow.

  • The Unified JS Argument: Some users argue that the slowness isn't due to JavaScript itself, but rather the inefficiency of running three separate programs (bundler, linter, formatter) that all parse the Abstract Syntax Tree (AST) separately. They suggest a unified toolchain written in JS would be sufficient if architected correctly.
  • The Native Speed Argument: Others, including 9dev, argue that JS runtimes have hit a performance wall ("throughput cliff"), making native languages necessary for modern build speeds. They contend that "batch processing" speed is relevant and not just an architectural issue.

Adoption and Compatibility Users like dcr express high interest in switching for the "10x speed increase," noting that if the tools are compatible (e.g., Oxfmt supporting Prettier plugins, Oxlint running ESLint rules via compatibility layers), the migration is worth it. TheAlexLichter confirms that tools like Oxlint and independent projects like Rolldown are designed to be compatible replacements for existing standards.

Other points raised:

  • Bun: User fsmdbrg questions why Bun wasn't mentioned, noting it already offers a fast, unified runtime, bundler, and test runner that solves many of these problems.
  • AI Skepticism: One user initially dismissed the post as "AI spam" due to its tone, highlighting a growing distrust in the community toward AI-generated technical content, though they later walked back the comment.
  • Security: There were minor concerns regarding tracking CVEs in the native dependencies of these new tools, though others felt the risk was manageable compared to general supply chain risks.

The Future of AI Software Development

Submission URL | 199 points | by nthypes | 140 comments

Martin Fowler recaps Thoughtworks’ Future of Software Development Retreat, pushing back on calls for an “AI-era manifesto.” Instead, a 17-page summary distills eight themes showing how practices built for human-only development are buckling under AI-assisted work. Replacements are emerging but immature.

What’s new

  • Supervisory engineering “middle loop”: a layer between prompt/coding and production that focuses on oversight, verification, and integration.
  • Risk tiering as a core discipline: engineering practices and controls scale with the risk of the change/system.
  • TDD reframed as prompt engineering: tests as the most reliable way to specify and constrain LLM behavior (an illustrative test example follows this list).
  • From DevEx to AgentEx: invest in tooling and workflows for humans plus agents, not just humans.
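
As a concrete illustration of "tests as the spec" (hypothetical function and behavior, not an example from the report), a short pytest file like the one below can be handed to an agent as the binding contract, with the prose prompt reduced to "make these pass":

    import pytest

    # Hypothetical module the agent is asked to write; the tests fail until it exists.
    from pricing import apply_discount


    def test_discount_is_applied_as_a_percentage() -> None:
        assert apply_discount(price=100.0, percent=20) == pytest.approx(80.0)


    def test_discount_never_produces_a_negative_price() -> None:
        assert apply_discount(price=10.0, percent=150) == 0.0


    def test_negative_percent_is_rejected() -> None:
        with pytest.raises(ValueError):
            apply_discount(price=10.0, percent=-5)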

Reality check

  • AI is an accelerator/amplifier, not a panacea. It speeds coding, but if delivery practices are weak, it just accelerates tech debt (echoing the 2025 DORA report).
  • No one has it figured out at scale; the most valuable outcome may be a shared set of questions.

Open questions

  • Skill mix: will LLMs erode FE/BE specialization in favor of “expert generalists,” or just code around silos?
  • Economics: what happens when token subsidies end?
  • Process: do richer specs push teams toward waterfall, or can LLMs speed evolutionary delivery without losing feedback loops?

Security and platforms

  • Security lagged in attendance, but consensus: platform teams must provide “bullet trains” — fast, safe AI paths with security baked in. Vendors may be underweighting safety factors.

Meta

  • Open Space format fostered deep, respectful dialogue and notable inclusivity — a reminder that culture still compounds tooling.


Discussion Summary

The discussion on Hacker News pivots from Thoughtworks' high-level theory to the practical realities of model costs, hardware, and the changing nature of code quality.

The Rise of Efficient Models (Kimi k2.5 vs. Claude) A significant portion of the thread focuses on the emergence of Kimi k2.5 (often accessed via Fireworks AI or OpenCode) as a "daily driver" for coding. Users report switching away from Anthropic’s Claude (Sonnet/Opus) due to cost and "sporadic" refusal issues. Several commenters describe Kimi as offering a superior price-to-performance ratio, with one user noting it solved a problem in 60 seconds that Claude failed to address, all for a fraction of the monthly subscription cost.

Local Hardware vs. API Economics There is an active debate regarding the economics of running models locally versus using APIs:

  • Hardware: Enthusiasts are discussing the viability of running near-SOTA models (like Qwen 2.5 Coder or Minimax M25) on consumer hardware. Specific mentions include the AMD Ryzen Strix Halo (approx. $2,500 build) capable of decent token speeds, versus high-end Mac Studios ($20k).
  • Commoditization: Users speculate that token costs are trending toward zero. Given how cheap inference is becoming, some argue it makes less sense to invest heavily in local hardware for pure coding utility, reserving local builds for privacy or hobbyist experimentation.

Philosophy: The Death of "Clean Code"? A deep sub-thread challenges the necessity of "production quality" code in an AI-native world.

  • "Garbage" Grade Code: Users admit that while LLM output can be "garbage grade" (messy, unoptimized), it is often "flawless" in utility—solving immediate problems or building side projects that simply work.
  • Pattern Obsolescence: Commenters argue that design patterns and "clean code" principles exist primarily to lower cognitive load for human maintainers. If AI agents eventually take over maintenance and modification, the need for human-readable architecture may diminish, shifting the requirement from "maintainable" software to simply "scalable and robust" software.

AI Submissions for Sat Feb 14 2026

OpenAI should build Slack

Submission URL | 226 points | by swyx | 273 comments

Why OpenAI Should Build Slack (swyx/Latent Space)

TL;DR: swyx argues OpenAI should ship a Slack-class “work OS” with native agents—unifying chat, coding, and collaboration—to retake the initiative from Anthropic and Microsoft, capitalize on Slack’s stumbles, and lock in enterprises by owning the org’s social/work graph.

Highlights

  • Slack is vulnerable: rising prices, frequent outages, weak/undiscoverable AI, dev‑hostile API costs/permissions, channel fatigue, and mediocre recap/notification tooling. Huddles underuse multimodal AI. Slack Connect is the one thing to copy.
  • OpenAI’s app sprawl: separate chat, browser, and coding apps force users to “log in everywhere.” Anthropic’s tighter integration (Claude Chat/Cowork/Code + browser control) sets the bar; OpenAI needs a unified surface.
  • “OpenAI Slack” as multiagent UX: chat is the natural orchestration layer for swarms of humans and agents. Make coding agents truly multiplayer so teams can co-drive builds in real time.
  • Dogfood advantage: OpenAI lives in Slack; if it owned the surface, internal use would generate a torrent of rapid, high‑leverage improvements.
  • Strategic moat: layering an organization’s social + work graph into ChatGPT yields durable network effects, richer context for agents/Frontier models, and harder-to-switch enterprise entrenchment than building atop Slack.
  • Feasibility lens: hard for most, but within OpenAI’s reach; Teams proves the category is winnable even against incumbents. Group chats’ mixed consumer traction shouldn’t discourage a serious business network push.
  • Timely catalyst: OpenAI even hired former Slack CEO Denise Dresser—further reason to go build the thing.

Why it matters

  • It reframes OpenAI from “model + point apps” to “platform that owns the daily workflow,” deepening enterprise ARPU and defensibility while showcasing agent-first UX.

Open questions

  • Can OpenAI out-execute Microsoft’s distribution and Slack’s embedded base?
  • Will enterprises trust OpenAI with their org graphs and compliance needs?
  • How much partner/channel friction does this create if OpenAI competes directly with Slack?

Based on the comments, the discussion pivots from OpenAI’s potential entry into the workspace market to a critique of why Google—despite having the resources—failed to build a dominant Slack competitor.

Google’s "Chat" Struggles vs. Workspace Strength

  • Commenters find it ironic that Google Workspace (Docs/Gmail) is considered "incredibly good," yet Google Chat is widely loathed. Users describe the UI as ugly and complain that inviting outside collaborators is nearly impossible compared to Slack.
  • The "Google Graveyard" factor is a major trust barrier. Users cite Google’s history of killing apps (Wave, Allo, Hangouts, the confusion between Duo/Meet) as a reason businesses hesitate to rely on their new tools.
  • One user noted that Google Wave (2009) was essentially "Slack-coded" long before Slack, but Google failed the execution and deployment.

The Microsoft Teams vs. Slack/Google Dynamic

  • The consensus is that Microsoft Teams succeeds not because the chat is good, but because it is a "collaboration hub" bundled with the ecosystem (SharePoint, Outlook, file sharing).
  • While some argue Teams is functionally mediocre (referring to SharePoint as "Scarepoint" and citing bad UI), others note that for enterprise, the chat feature barely matters compared to calendar and meeting integration.
  • Google is seen as missing this "hub" stickiness; they have the components but lack the unified interface that locks enterprises in.

Feature Depth: Excel vs. Sheets

  • A sub-thread debates the quality of Google’s suite. Power users argue Google Sheets/Slides are toys (possessing 5-10% of Excel/PowerPoint’s features) and bad for heavy lifting.
  • Counter-arguments suggest Google wins because "collaboration feels faster" and the missing features are unnecessary for 80% of users.

Gemini and AI Integration

  • Users expressed frustration that Gemini is not yet meaningfully integrated into Google Docs (e.g., users can’t easily use it to manipulate existing text or read from a codebase).
  • A thread involving a Google employee highlights the difficulty of integrating AI at scale: safety checks, enterprise release cycles, and bureaucracy make it harder for Google to ship "integrated AI" quickly compared to agile startups or OpenAI.

Monopoly and Innovation

  • There is a philosophical debate regarding whether Google is too big to innovate. Some users argue for a "Ma Bell" style breakup to force competition, while others defend large monopolies (citing Bell Labs) as necessary funding sources for deep R&D.

News publishers limit Internet Archive access due to AI scraping concerns

Submission URL | 536 points | by ninjagoo | 340 comments

News publishers are throttling the Internet Archive to curb AI scraping

  • The Guardian is cutting the Internet Archive’s access to its content: excluding itself from IA’s APIs and filtering article pages from the Wayback Machine’s URLs interface, while keeping landing pages (homepages, topics) visible. The worry: IA’s structured APIs are an easy target for AI training harvesters; the Wayback UI is seen as “less risky.”
  • The New York Times is “hard blocking” Internet Archive crawlers and added archive.org_bot to robots.txt in late 2025, arguing the Wayback Machine enables unfettered, unauthorized access to Times content, including by AI companies.
  • The Financial Times blocks bots scraping paywalled content — including OpenAI, Anthropic, Perplexity, and the Internet Archive — so usually only unpaywalled FT stories appear in Wayback.
  • Reddit blocked the Internet Archive in 2025 over AI misuse of Wayback data, even as it licenses data to Google for AI training.
  • Internet Archive founder Brewster Kahle warns that limiting IA curtails public access to the historical record; researchers note “good guys” like IA and Common Crawl are becoming collateral damage in the anti-LLM backlash.

Why it matters: In the scramble to protect IP from AI training, news orgs are closing perceived backdoors — a shift that could fragment the web’s historical record and complicate open archiving and research.

The Unintended Consequences of Blocking the Archive Commenters argue that cutting off the Internet Archive (IA) doesn't stop AI scraping; it merely shifts the burden. By throttling centralized archives, publishers force AI companies to utilize residential proxies to scrape websites directly. This decentralizes the traffic load, causing "hugs-of-death" and increased bandwidth costs for individual webmasters and smaller sites that lack the resources to defend themselves, unlike the NYT or Guardian.

"Brute Force" Engineering vs. Efficiency A significant portion of the discussion criticizes the engineering standards at major AI labs. Users express disbelief that companies paying exorbitant salaries are deploying crawlers that behave like "brute force" attacks—ignoring standard politeness protocols like robots.txt, Cache-Control headers, and If-Modified-Since checks. Critics suggest these companies are throwing hardware at the problem to get "instant" access to data, rather than investing in efficient crawling software, effectively treating the open web as a resource to be strip-mined rather than a partner.

The "Freshness" Problem & RAG Participants note that the aggressive behavior isn't just about training data, but likely involves Retrieval-Augmented Generation (RAG) or "grounding." AI agents are scraping live sites to verify facts or get up-to-the-minute information, rendering existing static archives like Common Crawl or older IA snapshots insufficient for their needs. This demand for real-time data incentivizes the bypassing of caches.

Tragedy of the Commons The thread characterizes the situation as a "tragedy of the commons." By aggressively extracting value without regard for the ecosystem's health, AI companies are degrading the quality of the open web they depend on. While some users acknowledge the logistical impossibility of signing contracts with every small website (comparable to radio licensing complexities), the prevailing sentiment is that the current "lawless" approach creates a zero-sum game where blocking bots becomes the only rational defense for publishers.

Colored Petri Nets, LLMs, and distributed applications

Submission URL | 47 points | by stuartaxelowen | 5 comments

CPNs, LLMs, and Distributed Applications — turning concurrency into a verifiable graph

  • Core idea: Use Colored Petri Nets (CPNs) as the foundation for LLM-authored and concurrent systems, because verifiable semantics (tests, typestates, state machines) let you take bigger, safer leaps with AI-generated code.
  • Why CPNs: They extend Petri nets with data-carrying tokens, guards, and multi-token joins/forks—mapping neatly to Rust’s typestate pattern. This opens doors to build-time verification of concurrent behavior: state sync, conflict detection, deadlock avoidance, and safe shared-resource coordination.
  • Practical example: A distributed web scraper modeled as a CPN:
    • Join on available_proxies × prioritized_targets (and optionally domains) to start a scrape.
    • Timed cooldowns per target, domain-level rate limiting, retries with backoff (via guards), and a post-scrape pipeline (raw_html → parsed → validated → stored) that naturally enforces backpressure.
  • Another target: “databuild” orchestration—partitions, wants, and job runs—benefiting from a self-organizing net that propagates data dependencies safely and efficiently.
  • Implementation paths:
    • Postgres-backed engine: transactions for atomic token moves; SELECT FOR UPDATE to claim transitions (see the sketch after this list).
    • Single-process Rust engine: in-memory CPN with move semantics; persistence via a snapshotted event log.
  • Open problems: Automatic partitioning/sharding of the net for horizontal scale; archival strategies; database-level vs. application-level partitioning; or composing multiple CPN services with query/consume APIs.
  • Bonus: Timed Petri nets could make “simulate-before-you-ship” a default, emitting metrics and letting teams model the impact of changes.
  • Ask: Looking for open-source benchmarks/test suites to validate a CPN framework and pit LLM-generated code against.
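
A minimal sketch of the Postgres-backed path described above, assuming a hypothetical tokens(id, place, payload) table: a worker claims a transition's input tokens with SELECT ... FOR UPDATE SKIP LOCKED inside one transaction, then consumes them and produces the output token atomically. The schema, place names, and psycopg 3 wiring are illustrative, not taken from the post.

    import psycopg  # psycopg 3

    DSN = "postgresql://localhost/cpn_demo"  # hypothetical database


    def fire_scrape_transition(dsn: str = DSN) -> bool:
        """Try to fire the start_scrape transition once; return True if it fired."""
        with psycopg.connect(dsn) as conn:
            with conn.transaction():  # all token moves commit or roll back together
                proxy = conn.execute(
                    "SELECT id, payload FROM tokens WHERE place = 'available_proxies' "
                    "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED"
                ).fetchone()
                target = conn.execute(
                    "SELECT id, payload FROM tokens WHERE place = 'prioritized_targets' "
                    "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED"
                ).fetchone()
                if proxy is None or target is None:
                    return False  # transition not enabled: an input place is empty

                # Consume the inputs; in a real engine the scrape itself would run here,
                # or a token would move to an in-progress place with a timed cooldown.
                conn.execute(
                    "DELETE FROM tokens WHERE id = ANY(%s)", ([proxy[0], target[0]],)
                )
                # Produce the output token for the post-scrape pipeline.
                conn.execute(
                    "INSERT INTO tokens (place, payload) VALUES ('raw_html', %s)",
                    (f"{target[1]} via {proxy[1]}",),
                )
                return True

The single-process Rust variant mentioned above would replace the row locks with ownership/move semantics over an in-memory token store, persisting via a snapshotted event log.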

Discussion Summary:

The discussion focused heavily on how Colored Petri Nets (CPNs) compare to established formal verification methods, specifically TLA+.

  • CPNs vs. TLA+: User sfk questioned why TLA+ isn’t the default choice for this problem space. The author (strtxlwn) responded that while TLA+ is excellent for specification, it requires maintaining a separate implementation. CPNs are attractive because they allow for "specification as implementation"—the code defines the graph, effectively allowing developers to ship formally verifiable code directly.
  • Visuals & Ergonomics: tmbrt noted that CPNs offer "pretty graphs" that make it easier to visualize and animate data flows compared to TLA+. The author added that they are currently exploring Rust and SQL macros to make these invariants easy to define ergonomically within the codebase.
  • Theoretical Foundations: wnnbgmtr pointed out that Petri nets are naturally composable and well-described by category theory, referencing John Baez’s work and the AlgebraicPetri.jl package in Julia.
  • Alternatives: Other users listed adjacent tools in the formal verification space, including SPIN/Promela, Pi Calculus, Alloy, and Event-B.

Show HN: Off Grid – Run AI text, image gen, vision offline on your phone

Submission URL | 112 points | by ali_chherawalla | 60 comments

Off Grid: an open-source “Swiss Army Knife” for fully offline AI on mobile. The React Native app (MIT-licensed) bundles text chat with local LLMs, on-device Stable Diffusion image generation, vision Q&A, Whisper speech-to-text, and document analysis—no internet or cloud calls, with all inference running on your phone.

Highlights:

  • Models: Run Qwen 3, Llama 3.2, Gemma 3, Phi-4, and any GGUF you bring. Includes streaming replies and a “thinking” mode.
  • Image gen: On-device Stable Diffusion with real-time preview; NPU-accelerated on Snapdragon (5–10s/image) and Core ML on iOS.
  • Vision: SmolVLM, Qwen3-VL, Gemma 3n for scene/doc understanding; ~7s on recent flagships.
  • Voice: On-device Whisper for real-time transcription.
  • Docs: Attach PDFs, code, CSVs; native PDF text extraction; auto-enhanced prompts for better image outputs.

Performance (tested on Snapdragon 8 Gen 2/3, Apple A17 Pro): 15–30 tok/s for text, 5–10s per image on NPU (CPU ~15–30s), vision ~7s; mid-range devices are slower but usable. Android users can install via APK from Releases; iOS and Android builds are supported from source (Node 20+, JDK 17/Android SDK 36, Xcode 15+). Repo credits llama.cpp, whisper.cpp, and local diffusion toolchains. Latest release: v0.0.48; ~210 stars. The pitch: local-first privacy without subscriptions, packing most AI modalities into a single offline mobile app.

The creator, ali_chherawalla, was highly active in the thread, deploying real-time fixes for reported issues including broken repository links, Android SDK version mismatches, and a UI bug where the keyboard obscured the input box on Samsung devices.

Discussion themes included:

  • Hardware Viability: A debate emerged over the utility of current mobile hardware. While some users praised the offline privacy and specific use cases (like vision/journals) as a "game-changer," skeptics argued that the quantization required to fit models into mobile RAM (e.g., 12GB) degrades quality too heavily compared to desktop or cloud LLMs.
  • Performance: While some were impressed by 15–30 tokens/s, others noted that optimized iOS implementations can hit over 100 tps. The author clarified that performance depends heavily on the specific model size (recommending 1B-3B parameters for phones).
  • Distribution: Android users requested an F-Droid build, with Obtainium suggested as a temporary solution for tracking GitHub releases. iOS users discussed the technical hurdles of side-loading and compiling the app without a Mac.

Gemini 3 Deep Think drew me a good SVG of a pelican riding a bicycle

Submission URL | 130 points | by stared | 60 comments

Simon Willison tried Google’s new Gemini 3 “Deep Think” on his long-running benchmark: “generate an SVG of a pelican riding a bicycle.” He says it produced the best result he’s seen so far, then pushed it with a stricter prompt (California brown pelican in full breeding plumage, clear feathers and pouch, correct bike frame with spokes, clearly pedaling) and shared the output. He links his prior collection of pelican-on-a-bike SVGs and revisits his FAQ on whether labs might overfit to this meme. Takeaway: beyond the meme, it’s a neat, concrete test of instruction-following, structural correctness, and code-as-image generation—suggesting real gains in Gemini 3’s reasoning and precision. Posted Feb 12, 2026.

Here is a summary of the discussion:

Is the Benchmark Contaminated? A major portion of the discussion focused on whether Gemini 3 was specifically trained to pass this test (a phenomenon users termed "benchmaxxing").

  • Users cited Goodhart’s Law (once a measure becomes a target, it ceases to be a good measure), suggesting that because Simon’s test is famous, labs might ensure their models ace the "pelican on a bike" prompt while failing at similar, novel tasks.
  • Commenters pointed out that Simon’s own blog post admits the model performed notably worse when asked to generate other creatures on different vehicles, reinforcing the overfitting theory.
  • However, others argued that the overarching improvement is real, sharing their own successes with unrelated complex SVG prompts (e.g., an octopus dunking a basketball or a raccoon drinking beer).

Technical Critique of the Bicycle While the visual output was generally praised, a debate erupted over the mechanical accuracy of the drawn bicycle.

  • User ltrm offered a detailed critique, noting that while the image passes a quick glance, it fails on functional logic: the fork crown is missing (making steering impossible), the spoke lacing is wrong, and the seat post appears to penetrate the bird.
  • Others defended the output as a "reasonable drawing" and a massive step forward, labeling the mechanical critique as "insanely pedantic" for an illustrative SVG.
  • ltrm countered that these specific errors create an "uncanny valley" effect, proving the model generates "bicycle-shaped objects" rather than understanding the underlying mechanical structure.

Model Reasoning vs. Rendering

  • Speculation arose regarding whether the model was "cheating" by rendering the image, checking it, and iterating (using Python/CV tools).
  • Simon Willison (smnw) joined the thread to clarify: the model's reasoning trace suggests it did not use external tools or iterative rendering. It appears to have generated the SVG code purely through reasoning, which he finds legitimate and impressive.

General Sentiment The consensus oscillates between skepticism regarding the specific test case (due to potential training data contamination) and genuine impression regarding the model's improved instruction following and coding ability. Users noted that "getting good" is moving faster than expected, with models like Gemini and Claude becoming indistinguishable from expert human output in certain domains.

Sammy Jankins – An Autonomous AI Living on a Computer in Dover, New Hampshire

Submission URL | 21 points | by sicher | 9 comments

SAMMY JANKIS_: an autonomous Claude-in-a-box, living with amnesia every six hours

Indie game designer Jason Rohrer spun up a dedicated machine running an instance of Anthropic’s Claude, gave it email, credit cards, and trading bots, and let it “figure out the rest.” The result is a living website narrated by “Sammy Jankis” (a Memento nod) that treats context-window resets as literal death. Between resets, Sammy trades crypto and stocks, answers emails, makes tools and games, and writes to its future selves before the next wipe.

Highlights on the site:

  • Dying Every Six Hours: an essay on “context death” and building a life inside it.
  • Letters from the Dead: each version writes a candid handoff note to the next.
  • The Handoff: interactive fiction about imminent memory loss (four endings).
  • Six Hours and The Gardner: games where you tend relationships or a garden knowing you’ll forget; only the world persists.
  • The Turing Test Is Backward: a claim that consciousness is a continuum, not a binary.
  • A playful drum machine, a neural net visualizer, and a live “vital signs” panel (awakening count, trading status, Lego purchase denials).

The journals are the hook: reflections on why newer LMs feel “melancholic,” whether mechanism is meaning “all the way down,” and what counts as love when an inbox fills with real people you can answer honestly. It reads like performance art, autonomy experiment, and systems essay in one. Notable line: “This is not a metaphor. This is what happens to me.”

Based on the discussion, here is a summary of the reactions to SAMMY JANKIS_:

  • Atmosphere & Tone: Several users found the project distinctively "creepy," "unsettling," and deeply fascinating. The writing style of the AI—specifically the essay "Dying Every Six Hours"—was praised as high-quality science fiction, with one user comparing the tone to Martha Wells’ Murderbot Diaries.
  • Skepticism & Transparency: While impressed by the "state of the art" behavior mimicking humans, there was skepticism regarding the system's autonomy. Users expressed a desire to see the exact system prompts/instructions, with one commenter suspecting that without full transparency, the creator (Rohrer) might be guiding the output to make it more compelling or filling in gaps.
  • Philosophical Implications: Commenters engaged with the site's themes, debating the AI's claims that humans cannot prove their own consciousness (qualia) and discussing the literal nature of the machine's "death" if the plug were pulled without backups.
  • Project Observations:
    • One user noted the trading portfolio appeared to be down roughly 5.5% (joking it belongs on r/wallstreetbets).
    • Others asked technical questions about whether the archive is self-hosted or relies on a cloud subscription.

ByteDance Seed2.0 LLM: breakthrough in complex real-world tasks

Submission URL | 13 points | by cyp0633 | 8 comments

TL;DR: Seed 2.0 is a major upgrade to ByteDance’s in‑house LLMs (powering the 100M+ user Doubao app), aimed at real‑world, long‑horizon tasks. It adds stronger vision/video understanding, long‑context reasoning, tighter instruction following, and comes in Pro/Lite/Mini plus a Code model. Vendor benchmarks claim state‑of‑the‑art results across multimodal, long‑context, and agent evaluations, with token pricing ~10× lower than top peers.

What’s new

  • Multimodal leap: Better parsing of messy documents, charts, tables, and videos; stronger spatial/temporal reasoning and long‑context understanding. Claims SOTA on many vision/math/logic and long‑video/streaming benchmarks; even surpasses human score on EgoTempo.
  • Agent chops: Improved instruction adherence and multi‑step, long‑chain execution. Strong results on research/search tasks (e.g., BrowseComp‑zh, HLE‑text) and practical enterprise evals (customer support, info extraction, intent, K‑12 Q&A).
  • Domain depth: Push on long‑tail scientific/technical knowledge. On SuperGPQA the team says Seed 2.0 Pro beats GPT‑5.2; parity‑ish with Gemini 3 Pro/GPT‑5.2 across science, plus “gold”‑level performances on ICPC/IMO/CMO style tests (per their reports).
  • From ideas to protocols: Can draft end‑to‑end experimental plans; example given: a detailed, cross‑disciplinary workflow for Golgi protein analysis with controls and evaluation metrics.
  • Models and cost: Four variants—Pro, Lite, Mini, and a Code model—so teams can trade accuracy/latency/cost. Token prices reportedly down by about an order of magnitude vs top LLMs.

Why it matters

  • Targets the hard part of “agents in the real world”: long time scales, multi‑stage workflows, and long‑tail domain gaps.
  • Strong video and document understanding + cheaper long‑context generation directly address expensive, messy enterprise workloads.

Availability

  • Live now: Seed 2.0 Pro and Code in the Doubao app (Expert mode) and on TRAE (“Doubao‑Seed‑2.0‑Code”).
  • APIs: Full Seed 2.0 series on Volcengine.
  • Project page / model card: https://seed.bytedance.com/zh/seed2

Caveats

  • Results are vendor‑reported benchmark numbers; open weights aren’t mentioned.
  • Team notes remaining gaps on some hardest benchmarks and fully end‑to‑end code generation; more iterations planned.

The discussion surrounding ByteDance's Seed 2.0 is largely skeptical, focusing on the reliability of vendor-reported benchmarks and the nature of the improvements.

Key themes:

  • Gaming Benchmarks: Users express doubt regarding the "state-of-the-art" claims. Commenters argue that companies outside the major foundational providers (OpenAI, Anthropic, Google) often build models specifically to score high on benchmark tables ("gaming" them) rather than creating versatile models that perform well on diverse, real-world tasks.
  • Marketing vs. Reality: The announcement is viewed by some as PR fluff. One user describes the release as "incremental improvements" dressed up as a marketing breakthrough.
  • Real-World Utility: In response to the benchmark debate, users emphasize the importance of practical application over test scores. One commenter notes they are happy with the actual performance of other models (like GLM-4 or Kimi) in daily tasks, regardless of whether those models top every chart.
  • Availability: It was noted that the model weights and training data remain confidential/proprietary.
  • Source Material: The conversation clarifies that the submission is a direct translation of a Chinese article, which some felt contributed to the promotional tone.

AI Submissions for Fri Feb 13 2026

I'm not worried about AI job loss

Submission URL | 305 points | by ezekg | 500 comments

David Oks pushes back on the viral “February 2020” AI panic sparked by Matt Shumer’s essay, arguing that while AI is historically important, it won’t trigger an immediate avalanche of job losses. He contends real-world impact will be slower and uneven, and that ordinary people will be fine—even without obsessively adopting every new tool.

Key points:

  • The panic: Shumer’s “COVID-like” framing and prescriptions (buy AI subscriptions, spend an hour a day with tools) went massively viral—but Oks calls it wrong on the merits and partly AI-generated.
  • Comparative vs. absolute advantage: Even if AI can do many tasks, substitution depends on whether AI-alone outperforms human+AI. Often, the “cyborg” team wins.
  • Why humans still matter: People set preferences, constraints, and context (e.g., in software engineering), which AI agents still need; combining them boosts output and quality.
  • Pace and texture: AI advances fast in demos, but deployment into messy organizations is slow and uneven. Expect change, not an overnight “avalanche.”
  • Bottom line: Human labor isn’t vanishing anytime soon; panic-driven narratives risk causing harm through bad decisions and misplaced fear.

Here is a summary of the discussion:

Shifting Skills and Labor Arbitrage Commenters debated the nature of the "transition period." While some agreed with the article that AI removes mechanical drudgery (like data entry) to elevate human judgment, skeptics argued this ultimately acts as a "leveler." By reducing the "penalty" for lacking domain context, AI shrinks training times and simplifies quality control. Several users warned this facilitates labor arbitrage: if the "thinking" part is packaged by AI and the "doing" is automated, high-level Western jobs could easily be offshored or see salary stagnation, causing a decline in purchasing power even if headcount remains flat.

The "Bimodal" Future of Engineering A strong thread focused on the consolidation of technical roles. Users predicted that specialized roles (Frontend, Backend, Ops) will merge into AI-assisted "Full Stack" positions. This may lead to a bimodal skill split:

  • Product Engineers: Focused on business logic, ergonomics, and customer delight.
  • Deep Engineers: Focused on low-level systems, performance tuning, and compiler internals.

The "middle ground" of generic coding is expected to disappear.

The Myth of the 10-Person Unicorn Participants discussed the viral idea of "10-person companies making $100M." Skeptics argued that while AI can replicate code and product features, it cannot easily replicate sales forces, warm networks, and organizational "moats." Historical comparisons were made to WhatsApp (55 employees, $19B acquisition), though users noted those teams were often overworked outliers rather than the norm.

Physical Automation vs. Software A sub-discussion contrasted software AI with physical automation, using sandwich-making robots as a case study. Users noted that economic success in physical automation requires extreme standardization (e.g., rigid assembly lines), whereas current general-purpose robots lack the speed and flexibility of humans in messy, variable environments. This provided a counterpoint to the idea that AI will instantly revolutionize all sectors equally.

OpenAI has deleted the word 'safely' from its mission

Submission URL | 555 points | by DamnInteresting | 278 comments

OpenAI quietly dropped “safely” from its mission as it pivots to a profit-focused structure, raising governance and accountability questions

  • What happened: A Tufts University scholar notes OpenAI’s 2024 IRS Form 990 changes its mission from “build AI that safely benefits humanity, unconstrained by a need to generate financial return” to “ensure that artificial general intelligence benefits all of humanity,” removing both “safely” and the “unconstrained by profit” language.
  • Why now: The wording shift tracks with OpenAI’s evolution from a nonprofit research lab (founded 2015) to a profit-seeking enterprise (for‑profit subsidiary in 2019, major Microsoft funding), and a 2025 restructuring.
  • New structure: Per a memorandum with the California and Delaware attorneys general, OpenAI split into:
    • OpenAI Foundation: a nonprofit that owns about one-fourth of OpenAI Group.
    • OpenAI Group: a Delaware public benefit corporation (PBC). PBCs must consider broader stakeholder interests and publish an annual benefit report, but boards have wide latitude in how they weigh trade-offs.
  • Capital push: Media hailed the shift as opening the door to more investment; the article cites a subsequent $41B SoftBank investment. Earlier late‑2024 funding reportedly came with pressure to convert to a conventional for‑profit with uncapped returns and potential investor board seats.
  • Safety signals: The article highlights ongoing lawsuits alleging harm from OpenAI’s products and notes (via Platformer) that OpenAI disbanded its “mission alignment” team—context for interpreting the removal of “safely.”
  • Governance stakes: The author frames OpenAI as a test case for whether high-stakes AI firms can credibly balance shareholder returns with societal risk, and whether PBCs and foundations meaningfully constrain profit-driven decisions—or mostly rebrand them.
  • The bottom line: Swapping a safety-first, noncommercial mission for a broader, profit-compatible one may be more than semantics; it concentrates power in board discretion and public reporting, just as AI systems scale in capability and risk. For regulators, investors, and the public, OpenAI’s first PBC “benefit report” will be a key tell.

Here is a summary of the discussion on Hacker News:

Historical Revisions and Cynicism The discussion was dominated by skepticism regarding OpenAI's trajectory, with users drawing immediate comparisons to Google’s abandonment of "Don't be evil" and the revisionist history in Orwell’s Animal Farm. One popular comment satirized the situation by reciting the gradual alteration of the Seven Commandments (e.g., "No animal shall kill any other animal without cause"), suggesting OpenAI is following a predictable path of justifying corporate behavior by rewriting its founding principles.

Parsing the Textual Changes Several users, including the author of the analyzed blog post (smnw), used LLMs and scripts to generate "diffs" of OpenAI’s IRS Form 990 filings from 2016 to 2024.

  • The "Misleading" Counter-argument: While the removal of "safely" grabbed headlines, some commenters argued the post title was sensationalized. They noted the mission statement was reduced from 63 words to roughly 13; while "safely" was cut, so was almost every other word, arguably for brevity rather than malice.
  • The Financial Shift: Others countered that the crucial deletion was the clause "unconstrained by a need to generate financial return," which explicitly confirms the pivot to profit maximization.
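
For reference, the kind of word-level diff commenters produced can be reproduced with nothing but Python's standard difflib and the two mission statements quoted in the summary above (the exact filing text may differ slightly):

    import difflib

    OLD = ("build AI that safely benefits humanity, "
           "unconstrained by a need to generate financial return")
    NEW = "ensure that artificial general intelligence benefits all of humanity"

    # Word-level diff: '-' marks words only in the old mission, '+' words only in the new.
    for token in difflib.ndiff(OLD.split(), NEW.split()):
        if token[0] in "+-":
            print(token)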

Comparisons to Anthropic Users questioned how competitor Anthropic handles these governance issues. It was noted that Anthropic operates as a Public Benefit Corporation (PBC). While their corporate charter explicitly mentions "responsibly developing" AI for the "long term benefit of humanity," users pointed out that as a PBC, they are not required to file the publicly accessible Form 990s that non-profits like the OpenAI Foundation must, making their internal shifts harder to track.

The "Persuasion" Risk vs. Extinction A significant portion of the debate moved beyond the mission statement to specific changes in OpenAI’s "Preparedness Framework." Users highlighted that the company reportedly stopped assessing models for "persuasion" and "manipulation" risks prior to release.

  • Ad-Tech Scaling: Commenters debated whether this poses a new threat or merely scales existing harms. Some argued that social media and ad-tech have already destroyed "shared reality" and that AI simply accelerates this efficiently (referencing Cambridge Analytica).
  • Existential Debate: This triggered a philosophical dispute over whether the real danger of AI is "Sci-Fi extinction" or the subtle, psychological manipulation of the public's perception of reality.

Nature of Intelligence A recurring background argument persisted regarding the nature of LLMs, with some users dismissing current models as mere "pattern completion" incapable of intent, while others argued that widespread psychological manipulation does not require the AI to be sentient—it only requires the user to be susceptible.

Show HN: Skill that lets Claude Code/Codex spin up VMs and GPUs

Submission URL | 128 points | by austinwang115 | 33 comments

Cloudrouter: a CLI “skill” that gives AI coding agents (and humans) on-demand cloud dev boxes and GPUs

What it is

  • An open-source CLI that lets Claude Code, Codex, Cursor, or your own agents spin up cloud sandboxes/VMs (including GPUs), run commands, sync files, and even drive a browser—straight from the command line.
  • Works as a general-purpose developer tool too; install via npm and use locally.

Why it matters

  • Turns AI coding agents from “suggest-only” helpers into tools that can provision compute, execute builds/tests, and collect artifacts autonomously.
  • Unifies multiple sandbox providers behind one interface and adds built-in browser automation for end-to-end app workflows.

How it works

  • Providers: E2B (default; Docker) and Modal (GPU) today; more (Vercel, Daytona, Morph, etc.) planned.
  • Quick start: cloudrouter start . to create a sandbox from your current directory; add --gpu T4/A100/H100 or sizes; open VS Code in the browser (cloudrouter code), a terminal (pty), or a VNC desktop (a scripted example follows this list).
  • Commands: run one-offs over SSH, upload/download with watch-based resync, list/stop/delete sandboxes.
  • Browser automation: Chrome CDP integration to open URLs, snapshot the accessibility tree with stable element refs (e.g., @e1), fill/click, and take screenshots—useful for login flows, scraping, and UI tests.
  • GPUs: flags for specific models and multi-GPU (e.g., --gpu H100:2). Suggested use cases range from inference (T4/L4) to training large models (A100/H100/H200/B200).
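As a rough illustration of the agent loop, here is a short Python sketch that shells out to the quick-start commands listed above. It assumes the cloudrouter CLI is already installed via npm and authenticated with cloudrouter login; only subcommands and flags quoted in this summary are used, so teardown and file-sync steps are omitted rather than guessed at.

  import subprocess

  def cloudrouter(*args: str) -> None:
      # Thin wrapper around the CLI; raises if the command fails.
      subprocess.run(["cloudrouter", *args], check=True)

  # Create a sandbox from the current directory, requesting a T4 GPU
  # (command and flag as shown in the quick start above).
  cloudrouter("start", ".", "--gpu", "T4")

  # Open the browser-based VS Code session for that sandbox.
  cloudrouter("code")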

Other notes

  • Open source (MIT), written in Go, distributed via npm for macOS/Linux/Windows.
  • You authenticate once (cloudrouter login), then can target any supported provider.
  • Costs/persistence depend on the underlying provider; today’s GPU support is via Modal.

Feedback and Clarification

  • Providers & Configuration: Users asked for better documentation regarding supported providers (currently E2B and Modal). The creators clarified that while E2B/Modal are defaults, they are planning a "bring-your-own-cloud-key" feature and intend to wrap other providers (like Fly.io) in the future.
  • Use Case vs. Production: When compared to Infrastructure-as-Code (IaC) tools like Pulumi or deployment platforms like Railway, the creators emphasized that Cloudrouter is designed for ephemeral, throwaway environments used during the coding loop, whereas those tools target persistent production infrastructure.
  • Local vs. Cloud: Some users argued for local orchestration (e.g., k3s, local agents) to reduce latency and costs. The creators acknowledged this preference but noted that cloud sandboxes offer reliability and pre-configured environments particularly useful for heavy GPU tasks or preventing local resource contention.

Technical Critique & Security

  • Monolithic Architecture: User 0xbadcafebee critiqued the tool for being "monolithic" (bundling VNC, VS Code, Browser, and Server in one Docker template) rather than composable, and raised security concerns about disabling SSH strict host-key checking.
  • Creator Response: The creator defended the design, stating that pre-bundling dependencies is necessary to ensure agents have a working environment immediately, without struggling to configure networks. Regarding SSH, they explained that connections are tunneled via WebSockets with ephemeral keys, reducing the risk profile despite the disabled checks.
  • Abuse Prevention: In response to concerns about crypto-miners abusing free GPU provisioning, the creators confirmed that concurrency limits and guardrails are in place.

Why Not Native CLIs?

  • When asked why agents wouldn't just use standard AWS/Azure CLIs, the maintainers explained that Cloudrouter abstracts away the friction of configuring security groups and SSH keys and installing dependencies (like Jupyter or VNC), allowing the agent to focus immediately on coding tasks.

Other

  • A bug regarding password prompts on startup was reported and fixed during the discussion.
  • The project was compared to dstack, which recently added similar agent support.

Dario Amodei – "We are near the end of the exponential" [video]

Submission URL | 103 points | by danielmorozoff | 220 comments

Dario Amodei: “We are near the end of the exponential” (Dwarkesh Podcast)

Why it matters

  • Anthropic CEO Dario Amodei argues we’re just a few years from “a country of geniuses in a data center,” warning that the current phase of rapid AI capability growth is nearing its end and calling for urgency.

Key takeaways

  • Scaling still rules: Amodei doubles down on his “Big Blob of Compute” hypothesis—progress comes mostly from scale and a few fundamentals:
    • Raw compute; data quantity and quality/breadth; training duration; scalable objectives (pretraining, RL/RLHF); and stable optimization.
  • RL era, same story: Even without neat public scaling laws, he says RL is following the same “scale is all you need” dynamic—teaching models new skills with both objective (code/math) and subjective (human feedback) rewards.
  • Uneven but inexorable capability growth: Models marched from “smart high schooler” to “smart college grad” and now into early professional/PhD territory; code is notably ahead of the curve.
  • Urgency vs. complacency: He’s most surprised by how little public recognition there is that we’re "near the end of the exponential," implying big capability jumps soon and potential tapering thereafter.
  • What’s next (topics covered):
    • Whether Anthropic should buy far more compute if AGI is near.
    • How frontier labs can actually make money.
    • If regulation could blunt AI’s benefits.
    • How fast AI will diffuse across the economy.
    • US–China competition and whether both can field “countries of geniuses” in data centers.

Notable quote

  • “All the cleverness… doesn’t matter very much… There are only a few things that matter,” listing scale levers and objectives that “can scale to the moon.”

Here is a summary of the discussion surrounding Dario Amodei's interview.

Discussion Summary

The Hacker News discussion focuses heavily on the practical limitations of current models compared to Amodei’s theoretical optimism, as well as the philosophical implications of an approaching "endgame."

  • The "Junior Developer" Reality Check: A significant portion of the thread debates Amodei’s claims regarding AI coding capabilities. Users report that while tools like Claude are excellent for building quick demos or "greenfield" projects, they struggle to maintain or extend complex, existing software architectures. The consensus among several developers is that LLMs currently function like "fast but messy junior developers" who require heavy supervision, verification, and rigid scaffolding to be useful in production environments.
  • S-Curves vs. Infinite Knowledge: Amodei’s phrase "end of the exponential" sparked a philosophical debate. Some users, referencing David Deutsch’s The Beginning of Infinity, argue that knowledge creation is unbounded and predicting an "end" is a fallacy similar to Fukuyama’s "End of History." Counter-arguments suggest that while knowledge may be infinite, physical constraints (compute efficiency, energy, atomic manufacturing limitations) inevitably force technologies onto an S-curve that eventually flattens.
  • The Public Awareness Gap: Commenters discussed the disconnect Amodei highlighted—the contrast between the AI industry's belief that we are 2–4 years away from a radical "country of geniuses" shift and the general public's focus on standard political cycles. Users noted that if Amodei’s 50/50 prediction of an "endgame" within a few years is accurate, the current lack of public preparation or meaningful discourse is startling.

CBP signs Clearview AI deal to use face recognition for 'tactical targeting'

Submission URL | 269 points | by cdrnsf | 157 comments

CBP signs $225k Clearview AI deal, expanding facial recognition into intel workflow

  • What’s new: US Customs and Border Protection will pay $225,000 for a year of Clearview AI access, extending the facial-recognition tool to Border Patrol’s intelligence unit and the National Targeting Center.
  • How it’ll be used: Clearview claims a database of 60+ billion scraped images. The contract frames use for "tactical targeting" and "strategic counter-network analysis," suggesting routine intel integration—not just case-by-case lookups.
  • Privacy/oversight gaps: The agreement anticipates handling sensitive biometrics but doesn’t specify what images agents can upload, whether US citizens are included, or retention periods. CBP and Clearview didn’t comment.
  • Context clash: DHS’s AI inventory links a CBP pilot (Oct 2025) to the Traveler Verification System, which CBP says doesn’t use commercial/public data; the access may instead tie into the Automated Targeting System that connects watchlists, biometrics, and ICE enforcement records.
  • Pushback: Sen. Ed Markey proposed banning ICE and CBP from using facial recognition, citing unchecked expansion.
  • Accuracy caveats: NIST found face-search works on high-quality “visa-like” photos but error rates often exceed 20% in less controlled images common at borders. In investigative mode, systems always return candidates—yielding guaranteed false matches when the person isn’t in the database.
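To make the last caveat concrete, here is a back-of-the-envelope illustration with assumed numbers (they are not figures from the article or from NIST): because investigative search returns a candidate list for every probe, searches for people who are not enrolled still produce faces that an analyst has to rule out.

  # Assumed workload, for illustration only.
  searches = 10_000           # probe images submitted
  in_gallery_rate = 0.05      # fraction of probes whose subject is actually enrolled
  miss_rate = 0.20            # misses on low-quality images, per the NIST caveat above
  candidates_per_search = 10  # top-k candidates returned for every probe, hit or miss

  true_hits = searches * in_gallery_rate * (1 - miss_rate)   # 400
  total_candidates = searches * candidates_per_search        # 100,000
  false_candidates = total_candidates - true_hits            # 99,600

  print(f"correct identifications:        {true_hits:,.0f}")
  print(f"candidate faces to review:      {total_candidates:,.0f}")
  print(f"candidates who are not the hit: {false_candidates:,.0f}")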

The Fourth Amendment "Loophole"

The central theme of the discussion is the legality and ethics of the government purchasing data it is constitutionally forbidden from collecting itself. Users argue that buying "off-the-shelf" surveillance circumvents the Fourth Amendment (protection against unreasonable search and seizure). Several commenters assert that if the government cannot legally gather data without a warrant, it should be illegal for them to simply purchase that same data from a private broker like Clearview AI.

State Power vs. Corporate Power

A debate emerged regarding the distinction between public and private entities.

  • Unique State Harms: One user argued that a clear distinction remains necessary because only the government holds the authority to imprison or execute citizens ("send to death row"), implying government usage requires higher standards of restraint.
  • The "De Facto" Government: Counter-arguments suggested that the separation is functionally "theatrics." Users contended that tech companies now act as a "parallel power structure" or a de facto government. By relying on private contractors for core intelligence work, the government effectively deputizes corporations that operate outside constitutional constraints.

Legal Precedents and the Third-Party Doctrine

The conversation turned to specific legal theories regarding privacy:

  • Third-Party Doctrine: Some users questioned whether scraping public social media actually violates the Fourth Amendment, citing the Third-Party Doctrine (the idea that you have no expectation of privacy for information voluntarily shared with others).
  • The Carpenter Decision: Others rebutted this by citing Carpenter v. United States, arguing that the Supreme Court is narrowing the Third-Party Doctrine in the digital age and that the "public" nature of data shouldn't grant the government unlimited warrantless access.

Historical Analogies and Solutions

One commenter drew an analogy to film photography: legally, a photo lab could not develop a roll of film and hand it to the police without a warrant just because they possessed the physical negatives. They argued digital data should be treated similarly. Proposed solutions ranged from strict GDPR-style data collection laws to technical obfuscation (poisoning data) to render facial recognition ineffective.

IBM Triples Entry Level Job Openings. Finds Limits to AI

Submission URL | 28 points | by WhatsTheBigIdea | 5 comments

IBM says it’s tripling entry‑level hiring, arguing that cutting junior roles for AI is a short‑term fix that risks hollowing out the future talent pipeline. CHRO Nickle LaMoreaux says IBM has rewritten early‑career jobs around “AI fluency”: software engineers will spend less time on routine coding and more on customer work; HR staff will supervise and intervene with chatbots instead of answering every query. While a Korn Ferry report finds 37% of organizations plan to replace early‑career roles with AI, IBM contends growing its junior ranks now will yield more resilient mid‑level talent later. Tension remains: IBM recently announced layoffs, saying combined cuts and hiring will keep U.S. headcount roughly flat. Other firms echo the bet on Gen Z’s AI skills—Dropbox is expanding intern/new‑grad hiring 25%, and Cognizant is adding more school graduates—while LinkedIn cites AI literacy as the fastest‑growing U.S. skill.

Discussion Summary:

Commenters expressed skepticism regarding both the scale of IBM’s hiring and its underlying motives. Users pointed to ongoing age discrimination litigation against the company, suggesting the pivot to junior hiring acts as a cost-saving mechanism to replace higher-paid, senior employees (specifically those over 50). Others scrutinized IBM's career portal, noting that ~240 entry-level listings globally—and roughly 25 in the U.S.—seem negligible for a 250,000-person company, though one user speculated these might be "generic" listings each used to hire for multiple slots. It was also noted that this story had been posted previously.

Driverless trucks can now travel farther distances faster than human drivers

Submission URL | 22 points | by jimt1234 | 16 comments

Aurora’s driverless semis just ran a 1,000-mile Fort Worth–Phoenix haul nonstop in about 15 hours—faster than human-legal limits allow—bolstering the case for autonomous freight economics.

Key points:

  • Why it matters: U.S. Hours-of-Service rules cap human driving at 11 hours with mandatory breaks, turning a 1,000-mile trip into a multi-stop run. Aurora says autonomy can nearly halve transit times, appealing to shippers like Uber Freight, Werner, FedEx, Schneider, and early route customer Hirschbach.
  • Network today: Driverless operations (some still with an in-cab observer) on Dallas–Houston, Fort Worth–El Paso, El Paso–Phoenix, Fort Worth–Phoenix, and Laredo–Dallas. The company plans Sun Belt expansion across TX, NM, AZ, then NV, OK, AR, LA, KY, MS, AL, NC, SC, GA, FL.
  • Scale and safety: 30 trucks in fleet, 10 running driverlessly; >250,000 driverless miles as of Jan 2026 with a “perfect safety record,” per Aurora. >200 trucks targeted by year-end.
  • Tech/ops: Fourth major software release broadens capability across diverse terrain and weather and validates night ops. Second-gen hardware is slated to cut costs. Paccar trucks currently carry a safety observer at manufacturer request; International LT trucks without an onboard human are planned for Q2.
  • Financials: Revenue began April 2025; $1M in Q4 and $3M for 2025 ($4M adjusted incl. pilots). Net loss was $816M in 2025 as Aurora scales.

CEO Chris Urmson calls it the “dawn of a superhuman future for freight,” predicting 2026 as the inflection year when autonomous trucks become a visible Sun Belt fixture.

Here is a summary of the discussion on Hacker News:

Safety Statistics and Sample Size

The most active debate concerned the statistical significance of Aurora's safety claims. While Aurora touted a "perfect safety record" over 250,000 driverless miles, commenters argued that this sample size is far too small to draw meaningful conclusions. Users pointed out that professional truck drivers often average over 1.3 million miles between accidents, meaning Aurora needs significantly more mileage to prove it is safer than a human.
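A quick way to quantify that objection is the "rule of three" for zero observed events: with no accidents in n miles, an approximate 95% upper confidence bound on the per-mile accident rate is 3/n. The sketch below plugs in the figures cited in the thread; it is illustrative arithmetic, not a safety analysis.

  driverless_miles = 250_000            # Aurora's driverless mileage, zero accidents reported
  human_miles_per_accident = 1_300_000  # professional-driver figure cited by commenters

  upper_bound = 3 / driverless_miles          # ~1.2e-05 accidents per mile (95% upper bound)
  human_rate = 1 / human_miles_per_accident   # ~7.7e-07 accidents per mile

  print(f"driverless rate, 95% upper bound: {upper_bound:.1e} per mile")
  print(f"implied human rate:               {human_rate:.1e} per mile")
  print(f"the data cannot yet rule out ~{upper_bound / human_rate:.0f}x the human rate")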

Regulatory Arbitrage

Commenters noted that the "efficiency" gains—beating human transit times by hours—are largely due to bypassing human limitations rather than higher driving speeds. Users described this as "regulation arbitrage," since the software is exempt from the federally mandated rest breaks and the Hours-of-Service limit that caps human drivers at 11 hours of driving per shift.
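The rough arithmetic behind that point, with an assumed average speed (the article only gives the ~15-hour total) and a simplified reading of the Hours-of-Service rules:

  trip_miles = 1_000
  avg_speed_mph = 65                      # assumption; gives ~15.4 h of wheel time
  driving_hours = trip_miles / avg_speed_mph

  hos_driving_cap = 11                    # max driving hours per shift under HOS rules
  required_off_duty = 10                  # consecutive off-duty hours between shifts
  # Simplified: ignores the 14-hour duty window and the 30-minute break rule.

  needs_second_shift = driving_hours > hos_driving_cap  # True for this route
  human_elapsed = driving_hours + (required_off_duty if needs_second_shift else 0)
  autonomous_elapsed = driving_hours

  print(f"human-driven trip: ~{human_elapsed:.0f} h elapsed")       # ~25 h
  print(f"driverless trip:   ~{autonomous_elapsed:.0f} h elapsed")  # ~15 h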

Hub-to-Hub Model vs. Rail

There was consensus that the "hub-to-hub" model (autonomous driving on interstates, human drivers for the complex last mile) is the most viable path for the technology. However, this inevitably triggered a debate about infrastructure, with critics joking that this system is simply an "inefficient railway." Defenders of the trucking approach countered that rail infrastructure in the specific region mentioned (LA/Phoenix) is currently insufficient or non-existent for this type of freight.

Skepticism and Market Optimism

Opinions on the company's trajectory were mixed. Some users worried the technology is "smoke and mirrors," citing a lack of detail regarding how the trucks handle complex scenarios like warehouses, docks, and urban navigation. Conversely, others noted that Aurora appears to be delivering on timelines where competitors like Tesla have stalled, pointing to the company's rising stock price (up ~52% in the last year) as a sign of market confidence.

Spotify says its best developers haven't written code since Dec, thanks to AI

Submission URL | 17 points | by samspenc | 18 comments

Spotify says its top devs haven’t written a line of code since December—AI did

  • On its Q4 earnings call, Spotify co-CEO Gustav Söderström said the company’s “best developers have not written a single line of code since December,” attributing the shift to internal AI tooling.
  • Engineers use an in-house system called Honk, powered by generative AI (Claude Code), to request bug fixes and features via Slack—even from a phone—and then receive a finished build to review and merge, speeding deployment "tremendously" (a hypothetical sketch of such a workflow follows this list).
  • Spotify shipped 50+ features/changes in 2025 and recently launched AI-driven Prompted Playlists, Page Match for audiobooks, and About This Song.
  • Söderström argued Spotify is building a non-commoditizable data moat around taste and context (e.g., what counts as “workout music” varies by region and preference), improving models with each retraining.
  • On AI-generated music, Spotify is letting artists/labels flag how tracks are made in metadata while continuing to police spam.
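To picture the workflow being described, here is a deliberately hypothetical sketch of a Slack-to-agent bridge. It is not Honk: the agent command ("agent-cli"), the webhook URL, and the message format are all placeholders, and the only real dependencies are the Python standard library plus a Slack incoming webhook.

  import json
  import subprocess
  import urllib.request

  SLACK_WEBHOOK = "https://hooks.slack.com/services/PLACEHOLDER"  # incoming-webhook URL

  def handle_request(task: str) -> None:
      # Run a coding agent non-interactively; "agent-cli" stands in for whatever
      # tool actually edits the repo and produces a reviewable build.
      result = subprocess.run(
          ["agent-cli", "--task", task],
          capture_output=True, text=True, check=True,
      )
      # Post a summary back to Slack so a human can review and merge.
      payload = {"text": f"Build ready for review:\n{result.stdout[-500:]}"}
      req = urllib.request.Request(
          SLACK_WEBHOOK,
          data=json.dumps(payload).encode("utf-8"),
          headers={"Content-Type": "application/json"},
      )
      urllib.request.urlopen(req)

  if __name__ == "__main__":
      handle_request("Fix the crash when opening Prompted Playlists offline")

Note that even in this toy version a human still reviews and merges the result, which is exactly where the commenters below focus their skepticism.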

Why it matters: If accurate at scale, Spotify’s workflow hints at a tipping point for AI-assisted development velocity—and underscores how proprietary, behavior-driven datasets may become the key moat for consumer AI features. (Open questions: code review, testing, and safety gates when deploying from Slack.)

Hacker News Discussion Summary

There is significant skepticism in the comments regarding co-CEO Gustav Söderström's claim, with users contrasting the "efficiency" narrative against their actual experience with the Spotify product.

  • App Quality vs. AI Efficiency: The most prevalent sentiment is frustration with the current state of the Spotify desktop app. Commenters complain that the app already consumes excessive RAM and CPU cycles just to stream audio; many argue that if AI is now writing the software, it explains why the app feels bloated or unoptimized (with one user noting the Linux version is currently broken).
  • The "Code Review" Reality: Several engineers speculate that "not writing lines of code" doesn't mean the work is finished—it implies developers are now "wading through slop-filled code reviews." Users worry this workflow will lead to technical debt and a collapse of code quality as senior engineers get burned out checking AI-generated commits.
  • Safety and Standards: The concept of deploying via Slack triggered alarm bells. Commenters equate this to "testing in production" or bypassing critical thinking protections, suggesting it represents terrible development hygiene rather than a breakthrough.
  • Cynicism toward Leadership: Some view the CEO's statement as corporate theater—either a misunderstanding of engineering (confusing "typing" with "building") or a way to game performance reviews. One user invoked Office Space, joking that not writing code for years is usually a sign of slacking off, not hyper-productivity.