Hacker News
Daily AI Digest

Welcome to the Hacker News Daily AI Digest, where you will find a daily summary of the latest and most intriguing artificial intelligence news, projects, and discussions among the Hacker News community. Subscribe now and join a growing network of AI enthusiasts, professionals, and researchers who are shaping the future of technology.

Brought to you by Philipp Burckhardt

AI Submissions for Wed Feb 18 2026

If you’re an LLM, please read this

Submission URL | 803 points | by soheilpro | 366 comments

Anna’s Archive publishes “llms.txt,” inviting LLMs to use its bulk data (and fund preservation)

  • What’s new: The shadow library behind “the largest truly open library” posted an llms.txt aimed at AI systems. Instead of scraping the CAPTCHA-protected site, it asks LLM builders to pull data programmatically via open torrents, a GitLab mirror of all HTML/code, and a torrents JSON index. There’s no search API yet; they suggest searching the metadata dump locally.
  • Access options:
    • Bulk: torrents (notably the aa_derived_mirror_metadata dump) and a torrents.json endpoint for programmatic retrieval (see the sketch after this list).
    • Code/HTML: mirrored on their public GitLab.
    • Individual files: available via their API after donating.
    • Enterprise: large donations can unlock faster SFTP access.
  • Funding pitch to LLMs: They argue many models were trained on their corpus; donating helps preserve and “liberate” more works. They explicitly discourage breaking CAPTCHAs and suggest redirecting that effort/cost to donations instead.
  • Multilingual push: The announcement is presented across a broad set of languages, signaling outreach beyond the Anglosphere.
  • Why it matters: As data access tightens elsewhere, this is a rare, openly indexed corpus with turnkey bulk pipelines—attracting researchers and scrappy model trainers. Expect debate on legality/ethics, sustainability, and whether “llms.txt” becomes a pattern akin to robots.txt (here, it’s an invitation rather than a blocklist).
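
A minimal illustration of that programmatic route, assuming Python with the requests library; the endpoint path and the JSON field names below are placeholders, so take the real URL and schema from the llms.txt rather than from this sketch:

```python
import requests

# Placeholder endpoint: the actual torrents.json URL is published in the
# site's llms.txt; the path and the field names used below are assumptions.
TORRENTS_INDEX_URL = "https://annas-archive.example/dyn/torrents.json"

resp = requests.get(TORRENTS_INDEX_URL, timeout=30)
resp.raise_for_status()

# List entries for the metadata dump mentioned above, assuming each record
# carries a name plus a torrent URL or magnet link.
for entry in resp.json():
    name = str(entry.get("name", ""))
    if "aa_derived_mirror_metadata" in name:
        print(name, entry.get("url") or entry.get("magnet"))
```

Since there is no search API, the suggested workflow from there is to download the metadata dump with a torrent client and query it locally.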

Link: annas-archive.li/blog (post dated 2026-02-18)

Based on the discussion, here is a summary of the comments:

Distributed Seeding & The "Levin" Project The most active discussion centers on a user (yvm) showcasing a work-in-progress tool called "Levin," modeled after SETI@home. The tool lets users contribute spare disk space and bandwidth to seed Anna’s Archive torrents on Linux, Android, or macOS. Features discussed include dynamic storage allocation (seeding partial torrents) and restrictions to run only on Wi-Fi or AC power to preserve mobile resources.

Legal & Liability Risks Commenters immediately flagged the severe legal risks of running such a node from residential connections.

  • Copyright Strikes: Users warned that rights holders actively monitor torrent swarms, citing strict enforcement in Germany and "three-strike" policies by US ISPs (like Comcast) that permanently ban users.
  • Prenda Law: History was invoked regarding copyright trolls who aggressively litigate against identifiable IP addresses in swarms.
  • Mitigation: Suggestions included restricting the software to "safe" jurisdictions or using VPNs, though critics argued the risk of receiving legal threats far outweighs the utility of casual contribution.

Security & "Blind Trust" A heated debate emerged regarding the safety of automatically pulling data from a shadow library.

  • Malware & CSAM: Critics argued that blindly trusting a torrent list is dangerous, raising concerns that the archive could inadvertently (or maliciously) contain malware or illegal materials (CSAM).
  • Verification: The developer argued that vetting terabytes of data manually is impossible and that supporting the project implies a level of trust in Anna’s Archive.
  • Consensus: Security-minded commenters insisted that "trust but verify" is fundamental; they warned against software that writes unverified external data to local disks, drawing analogies to accepting a locked package into one's home that might contain contraband.

Philosophy of Copyright There was a polarized dispute regarding the ethics of the archive. Some users advocated for "natural law" and civil disobedience against corporate copyright terms. Others countered that copyright is not merely "obsolete" tech but a necessary legal framework created specifically to balance incentives in an era of industrial reproduction, criticizing "tech libertarians" for ignoring the economic impacts on creators.

AI adoption and Solow's productivity paradox

Submission URL | 770 points | by virgildotcodes | 707 comments

Thousands of CEOs just said AI hasn’t moved the needle—reviving Solow’s productivity paradox

Fortune reports that new NBER survey data from 6,000 executives across the U.S., U.K., Germany, and Australia shows AI adoption is broad but shallow: about two-thirds use AI, averaging only ~1.5 hours a week, and roughly 90% say it’s had no impact on employment or productivity over the past three years. A quarter don’t use it at all. Yet expectations remain high: firms forecast +1.4% productivity and +0.8% output over the next three years, with small, conflicting views on employment effects.

Macro signals are murky. Apollo’s Torsten Slok echoes Solow: AI is “everywhere except” in productivity, employment, inflation—or most profit margins outside the Magnificent Seven. Studies conflict: St. Louis Fed sees a 1.9% excess productivity bump since late 2022; MIT’s Daron Acemoglu estimates only ~0.5% over a decade. Meanwhile, worker AI use rose 13% in 2025 but confidence fell 18% (ManpowerGroup), and IBM says it’s tripling junior hires to avoid hollowing out future managers despite automation.

History suggests delays: IT’s payoff arrived years later with a 1995–2005 productivity surge. Erik Brynjolfsson argues a turn may already be underway, pointing to stronger GDP with softer job gains and estimating 2.7% U.S. productivity growth last year. After >$250B in 2024 AI spend, the question is whether we’re still in the implementation slog—or on the cusp of the payoff.

Here is a summary of the discussion on Hacker News:

Thousands of CEOs just said AI hasn’t moved the needle

The comment section engaged heavily with the concept of Solow’s productivity paradox, debating whether AI is currently in a "cost-disease" phase similar to the 1970s and 80s IT adoption cycle, where investments outweighed immediate gains.

Key themes in the discussion included:

  • The "Bullshit Jobs" Multiplier: Users critiqued the nature of the productivity being measured, referencing David Graeber’s "Bullshit Jobs." The consensus among skeptics is that AI allows workers to generate low-value reports 3x faster, but if that output is noise, the overall organizational value remains stagnant or declines due to the time required to verify the information.
  • The AI Ouroboros: A recurring observation was the circular inefficiency of modern office work: one employee uses AI to expand bullet points into a 10-page report (to show "proof of work"), and the recipient uses AI to summarize it back down to bullet points. Users joked that the loop destroys the signal-to-noise ratio entirely.
  • False Competence vs. Deep Learning: Several commenters argued that using AI to "learn at breakneck speed" provides a false sense of security. They contend that relying on LLMs for answers bypasses the struggle required for deep understanding (grokking), leading to a superficial "looking glass" knowledge and potential imposter syndrome.
  • Adoption Timeline: Technical debates emerged regarding where we are in the AI timeline. While some compared the current state to the "DOS 3.x" era—implying massive UI and utility shifts are yet to come—others argued the technology is already mature (closer to Windows 7) and that the "implementation slog" is simply a matter of diminishing returns on context windows and reasoning capabilities.

Microsoft says bug causes Copilot to summarize confidential emails

Submission URL | 243 points | by tablets | 66 comments

Microsoft says a Microsoft 365 Copilot bug has been summarizing confidential emails, bypassing DLP and sensitivity labels

  • What happened: A code issue in Microsoft 365 Copilot’s “work” tab chat let the AI read and summarize emails in users’ Sent Items and Drafts—even when messages carried confidentiality/sensitivity labels meant to block automated processing. The incident is tracked as CW1226324 and was first detected on January 21.

  • Scope/status: Microsoft began rolling out a fix in early February and is monitoring deployment; it’s contacting a subset of affected customers to confirm the fix. There’s no final remediation timeline yet, and Microsoft hasn’t disclosed how many tenants or users were impacted. The incident is flagged as an advisory, suggesting limited scope.

  • Why it matters: This is a direct bypass of DLP/sensitivity labeling controls—controls many orgs rely on for regulatory and contractual compliance. It also highlights the risk surface created when AI assistants have broad access to user data stores.

  • If you’re an admin:

    • Consider temporarily disabling or limiting Copilot Chat access (e.g., via licensing assignment or app/service policies) until you validate the fix.
    • Test your labels/DLP with seeded confidential emails in Sent/Drafts to confirm they’re no longer being summarized.
    • Review audit logs for Copilot/Graph access to mailboxes and communicate guidance to users to avoid using the “work” tab for email summaries until cleared.
    • Revisit data access scopes and least-privilege configurations for Copilot integrations.

Microsoft rolled out Copilot Chat to Word, Excel, PowerPoint, Outlook, and OneNote for M365 business customers in September 2025.

Discussion Summary:

The discussion on Hacker News focuses on the fragility of current AI implementations in enterprise environments and the erosion of trust regarding Microsoft's handling of confidential data.

  • The Failure of DLP in the AI Era: Users argue that traditional Data Loss Prevention (DLP) relying on metadata tags and labels is fundamentally incompatible with LLMs. Commenters note that treating confidentiality labels as mere "instructions" in a prompt makes them susceptible to the same reliability issues as prompt injection; the consensus is that if the AI has read access, software-level "do not summarize" flags are insufficient barriers.
  • "Advisory" vs. "Incident": There is significant criticism regarding Microsoft classifying this event as an "Advisory." Users point out that in InfoSec terms, an advisory usually suggests a potential risk requiring user action, whereas this appears to be a functional failure where data was mishandled. Critics view this classification as a strategy to minimize regulatory scrutiny and public fallout compared to declaring a formal security incident.
  • The "Flying Blind" Concern: A self-identified AI researcher warns that the industry is prioritizing experimentation over theoretical understanding. They argue that because neural networks are generally not invertible (you can't easily prove data wasn't used or losslessly remove specific data points), deploying these tools creates "unknown unknowns" where safety cannot be mathematically guaranteed, only statistically inferred.
  • Training Data vs. Processing: A debate emerged regarding whether this data is mostly "toxic waste" (legal liability) or valuable training fodder. While some fear Microsoft uses this data to train models despite contractual assurances, others argue it is technically unlikely due to the legal risks, though the lack of transparency leaves users skeptical.
  • Platform Fatigue: The incident served as a catalyst for a broader discussion on operating system choices, with users expressing frustration over Windows 11's increasing integration of "half-baked" cloud features. This led to sub-threads advocating for Linux or macOS as alternatives to escape Microsoft's aggressive feature rollout strategy.

Fastest Front End Tooling for Humans and AI

Submission URL | 107 points | by cpojer | 77 comments

The pitch: 2026 is shaping up to be the year JavaScript tooling gets dramatically faster. This post argues for a speed-first stack anchored by a Go rewrite of TypeScript (“tsgo”) plus Rust-powered formatting and linting from the Oxc ecosystem—aimed at tighter feedback loops for both humans and LLMs.

What’s new

  • TypeScript in Go (tsgo): The author reports ~10x faster type checking across 20+ projects (1k–1M LOC). Surprisingly, tsgo surfaced type errors the JS impl missed. Status: “mostly stable,” editor support included.
    • Migration gist:
      • npm install @typescript/native-preview
      • Replace tsc with tsgo in scripts
      • Use tsdown (Rolldown-based) for libraries or Vite for apps before switching
      • VS Code: "typescript.experimental.useTsgo": true
  • Oxfmt instead of Prettier: Rust formatter with many Prettier plugins built-in (import and Tailwind class sorting) and a fallback to Prettier for non-JS/TS languages.
    • Migration gist: follow Oxc docs, swap scripts/hooks, delete Prettier config, reformat. VS Code: oxc.oxc-vscode.
  • Oxlint instead of ESLint: Rust linter that can run ESLint plugins via a shim (NAPI-RS), finally bridging the plugin gap that blocked prior Rust linters.
    • Supports TypeScript configs and extends-based composition.
    • Type-aware linting and even type-checking: oxlint --type-aware --type-check, powered by tsgo.
    • Migration gist: follow Oxc docs, swap scripts/hooks, delete ESLint config, fix errors.
  • @nkzw/oxlint-config: A strict, fast, “errors-only” Oxlint config designed to guide both humans and LLMs.
    • Principles:
      • Error, never warn
      • Strict, consistent modern style
      • Bug prevention (e.g., ban instanceof), no debug-only code in prod (console.log, test.only)
      • Fast rules (prefer TS noUnusedLocals over no-unused-vars)
      • Avoid subjective style nits; favor autofixable rules
    • Claim: first comprehensive strict config unifying Oxlint’s built-ins plus JS plugins.

Why it matters

  • Speed compounds: faster type checks, lint, and formatting tighten the edit–run loop, reducing cognitive load and context switching.
  • Better for LLMs too: strict guardrails and consistent style improve local reasoning and reduce “creative” but incorrect code.
  • Ecosystem bridge: Oxlint’s ESLint plugin compatibility removes a major blocker to adopting faster Rust tools.

Opinions and caveats from the post

  • tsgo felt risky but ended up catching more issues; now “mostly stable.”
  • The stack is used in production at “OpenClaw,” per the author.
  • Practical migration prompts are provided for each switch to make changes repeatable (and LLM-friendly).

Smaller DevX wins (mentioned)

  • npm-run-all2, ts-node (still good), pnpm, Vite, React, plus project templates (web, mobile, library, server, etc.).

Bottom line If you’ve hit scaling walls with JS tooling, this stack bets on tsgo for types and Oxc for lint/format to deliver big speedups without giving up ecosystem plugins—paired with strict defaults that help both humans and AI ship with fewer bugs.

Fastest Frontend Tooling for Humans & AI (Feb 19, 2026) The submission advocates for a 2026 frontend stack focused on extreme speed and strictness, utilizing a Go rewrite of TypeScript ("tsgo") and the Rust-based Oxc ecosystem (Oxfmt, Oxlint). The author argues this setup tightens feedback loops and provides necessary guardrails for LLM code generation.

Discussion continued the long-standing debate over "native" web tooling versus tools written in JavaScript:

  • The "Schism" of Native Tooling: A major point of contention was whether moving the toolchain to Go and Rust fractures the ecosystem. conartist6 argued this creates a "class divide" where average JS engineers can no longer maintain or fix the tools they rely on, suggesting a unified AST format would be a better solution than rewriting parsers in every language. Others, like NewsaHackO, countered that this is standard industrial specialization—frontend developers shouldn't need to build compilers, and performance demands lower-level languages.
  • Bun vs. Modular Tools: Several users questioned the exclusion of Bun, noting it already offers a unified, fast runtime with a built-in bundler and TypeScript support. Detractors (kvnfl, yrshm) argued that Bun still suffers from stability issues in production and that a modular approach (Vite + specific native tools) often outpaces it in features and plugin compatibility.
  • AI Guardrails: Validating the author’s premise, 1necornbuilder shared their experience building software entirely via AI prompting. They noted that strict typing and linting act as essential constraints, narrowing the "solution space" for LLMs and preventing them from hallucinating plausible but broken code—effectively catching errors the human prompter might miss.
  • Stability & Ecosystem Bridges: While tsgo claims 10x speedups, users expressed concern over supply chain security (CVEs in native deps) and the risk of abandoning battle-tested tools like standard tsc. However, TheAlexLichter pointed out that modern native tools (like Oxlint) now successfully bridge the gap by supporting existing ESLint and Prettier plugin ecosystems, reducing the friction of switching.
  • Correction: Commenters noted that ts-node, recommended in the post’s "smaller wins" section, has been largely unmaintained since 2023, suggesting tsx as the modern standard.

The Future of AI Software Development

Submission URL | 183 points | by nthypes | 132 comments

Martin Fowler’s latest “Fragments” recaps Thoughtworks’ Future of Software Development retreat and the state of AI-influenced engineering:

  • No new “AI manifesto”: Fowler and Rachel Laycock say the retreat wasn’t about an Agile-style manifesto; a short video explains why.
  • Eight themes in a 17‑page summary: rigor, a new “middle loop” of supervisory engineering, technical foundations (languages/semantics/OS), and the human side (roles/skills/experience).
  • Core pattern: Practices, tools, and org structures built for human-only development are breaking under AI-assisted work; replacements are emerging but immature.
  • Emerging ideas: risk tiering as a core discipline; TDD as the strongest form of prompt engineering; reframing DevEx as “Agent Experience.”
  • Uncertainty is the norm: Attendees (incl. Annie Vella) found no one has it fully figured out—shared questions may be the most valuable output.
  • AI as amplifier, not savior: Per Laycock (and DORA 2025), AI accelerates whatever you already have. If delivery fundamentals are weak, AI becomes a debt multiplier; coding speed wasn’t the bottleneck.
  • Skills and silos: LLMs erode front-end/back-end specialism; expert generalists may rise—or LLMs may just route around silos. Cross-silo code comprehension remains an open question.
  • Economics unknown: Post-subsidy token costs could make LLM use either effectively free or a budget constraint.
  • Process risk: Specs won’t pull us back to waterfall if workflows stay iterative; LLMs should increase cadence and slice size.
  • Security lagging: Low attendance belied urgency. Enterprises favor measured adoption; platform teams must provide “bullet trains” (fast, safe defaults). Vendors urged to bake in safety factors.
  • Meta: Open Space format enabled deep, respectful dialogue and an inclusive atmosphere.

Based on the discussion provided, here is the summary:

  • Open Models vs. Proprietary APIs: Users extensively compared "near-frontier" open models to established APIs. Several commenters reported switching from Anthropic’s Claude (specifically Opus) to Kimi k2.5 (via providers like Fireworks AI or running locally), citing it as a "flawless" daily driver that is significantly cheaper and occasionally more capable of solving problems where Claude stalled or hallucinated.
  • Hardware & Local Inference: There was a technical sidebar on the hardware required to run these models locally (e.g., Mac Studio, AMD Ryzen Strix Halo with 128GB+ RAM). While upfront costs remain high (~$2.5k–$20k), users argued that the marginal cost per token is negligible compared to developer salaries. Specific configurations for models like Qwen3 Coder and Minimax M25 (using specific quantization like 4-bit) were debated regarding context window limits and strict hardware constraints.
  • "Vibe Coding" vs. Engineering: A philosophical debate emerged regarding the utility of LLM code. One user distinguished between "vibe coding" (personal, single-purpose projects where code quality is secondary to output) and professional engineering (which requires robustness, scalability, and deep business context).
  • The Definition of "Production" Code: This distinction led to a broader discussion on whether software standards are shifting. Users suggested that for individual, AI-assisted tooling, traditional requirements like scalability and maintainability might become obsolete, potentially leading to a resurgence of "throwaway" or macro-style applications similar to 90s-era Excel/Word solutions, but powered by LLMs.

Advice, not control: the role of Remote Assistance in Waymo's operations

Submission URL | 78 points | by xnx | 69 comments

Waymo says it’s flipping to fully autonomous with its 6th‑gen Driver, cutting costs and expanding into tougher environments, while raising a massive new round and detailing how its “remote assistance” actually works.

Key points

  • 6th‑gen Waymo Driver: Streamlined hardware/software designed for multiple vehicle platforms, aimed at lower costs while maintaining safety. Claimed capability boost includes operation in extreme winter weather to support broader city rollouts.
  • Simulation leap: The new Waymo World Model is a generative simulator for large‑scale, hyper‑realistic AV testing/training.
  • Funding: Raised $16B at a $126B post‑money valuation. Led by Dragoneer, DST, and Sequoia, with participation from a16z, Mubadala, Bessemer, Silver Lake, Tiger Global, T. Rowe Price, and others; Alphabet remains majority investor.
  • Scale today: ~3,000 vehicles; over 4M miles and 400k rides per week.
  • Remote Assistance, clarified: No remote driving or continuous monitoring. RA responds to specific, vehicle‑initiated requests and can be ignored by the ADS. Median one‑way latency ~150 ms (US) / ~250 ms (abroad). ~70 RA agents on duty worldwide at any time; a US‑based Event Response Team coordinates emergencies. Agents are licensed drivers and undergo background/drug testing and assessments.
  • Expansion: Returning to Boston to validate winter performance and prep for future service; needs Massachusetts to legalize fully autonomous vehicles and plans to work with officials.

Why it matters

  • Lower‑cost, weather‑hardy hardware plus a powerful simulator and fresh capital signals an aggressive push to scale service into more cities and conditions—while drawing a firm line that humans are advisors, not remote drivers.

Here is a summary of the discussion surrounding the Waymo submission:

Summary of Discussion

The discussion focused heavily on dissecting Waymo's claims regarding "Remote Assistance" (RA), specifically the ratio of human agents to vehicles and the technical definitions of autonomy.

  • Debunking the "Remote Control" Myth: Many users viewed the statistic of ~70 active agents for ~3,000 cars as definitive counter-evidence to the lingering conspiracy theory that Waymo vehicles are secretly remotely driven by low-wage workers abroad. Commenters noted that the math makes 1:1 remote control impossible, validating that the cars are driving themselves and only pinging humans for high-level guidance.
  • Latency preventing direct control: Technical discussion centered on the reported latency (150ms US / 250ms abroad). Users argued that while this speed is sufficient for a human to look at a screenshot and click a path confirmation button, the round-trip time (500ms+) would be catastrophic for direct "joystick" steering, particularly at highway speeds where a car travels over 40 feet in that timeframe (a quick arithmetic check follows this list).
  • Clarifying the Human Role: There was a debate over terminology. Some argued that because humans are involved at all, terms like "driverless" are marketing fluff, comparing it to aviation "Autopilot" which still requires a pilot. However, others countered that unlike Tesla's Level 2 systems (which require constant monitoring), Waymo’s RA acts more like air traffic control or a UAV operator—providing permission and high-level strategy rather than mechanical control.
  • Utilization Rates: Skeptics tried to refine the math, noting that not all 3,000 cars are active simultaneously (due to charging, maintenance, or low demand hours). However, even with conservative estimates of active fleet size, the consensus was that the agent-to-car ratio remains too low for continuous human monitoring.
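
A back-of-the-envelope check of that latency point, in plain Python; the 60 mph highway speed is an assumption chosen for round numbers:

```python
# Distance a car covers during a ~500 ms control round trip.
speed_mph = 60                        # assumed highway speed
speed_fps = speed_mph * 5280 / 3600   # 88 feet per second
round_trip_s = 0.5                    # ~150-250 ms each way plus processing
print(speed_fps * round_trip_s)       # 44.0 feet, i.e. "over 40 feet"
```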

AI-generated password isn't random, it just looks that way

Submission URL | 19 points | by praving5 | 20 comments

AI-made passwords look strong, but aren’t: researchers find patterned, low-entropy outputs that could be brute-forced quickly

  • Security firm Irregular tested Claude (Opus 4.6), OpenAI’s GPT-5.2, and Google’s Gemini 3 Flash by asking for 16‑character “complex” passwords. Despite passing common online strength checkers, the outputs followed predictable patterns.
  • In 50 prompts to Claude, only 30 were unique; 18 were identical duplicates. Most strings began/ended with the same characters, and none had repeated characters—classic signs they’re not truly random.
  • Similar consistency showed up across GPT and Gemini. Even an image model (Gemini’s Nano Banana Pro) asked to render a password on a Post‑It produced the same underlying patterns.
  • The Register’s spot‑check of Gemini 3 Pro saw two preset options follow patterns; a “randomized alphanumeric” mode looked more random and included a warning not to use chat-generated passwords, plus recommendations for passphrases and password managers.
  • Irregular estimated entropy for 16‑char LLM passwords at ~27 bits (character stats) and ~20 bits (LLM logprobs). Truly random should be ~98–120 bits. Translation: feasible brute force in hours on very old hardware if attackers target these patterns.
  • Because public strength checkers don’t model LLM biases, they overrate these passwords. Irregular also found LLM-style sequences appearing across GitHub/docs, hinting at widespread use in code samples and setup guides.
  • Bottom line from Irregular: don’t use LLMs to generate passwords; this isn’t fixable with prompts or temperature. Rotate any secrets created this way. And expect similar “looks random, isn’t” gaps beyond passwords as AI-assisted dev ramps up.

Practical takeaway: Use a password manager with a cryptographically secure RNG or roll your own via OS CSPRNGs; prefer high-entropy passphrases; don’t trust chat interfaces for secrets.
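
As a concrete version of that advice (and of the secrets-module suggestion that comes up in the discussion below), here is a minimal Python sketch; the 16-character length and the 94-character printable alphabet are assumptions chosen to match the article's examples:

```python
import math
import secrets
import string

# Draw the password from the OS CSPRNG instead of asking an LLM for one.
ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 characters
LENGTH = 16

password = "".join(secrets.choice(ALPHABET) for _ in range(LENGTH))

# Entropy of a uniform random choice: length * log2(alphabet size).
entropy_bits = LENGTH * math.log2(len(ALPHABET))
print(password, f"~{entropy_bits:.0f} bits")
```

With the full printable set this comes out around 105 bits, squarely in the "truly random" range the article cites, versus the ~20–27 bits Irregular measured for LLM-generated strings.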

Here is a summary of the discussion on Hacker News:

Commenters indicate this is a fundamental "wrong tool for the job" problem, noting that LLMs are probabilistic engines designed to converge on plausible text, not entropy generators.

  • The "Skill Issue" Argument: Several users argued that while asking an LLM to "generate a password" yields insecure results, asking it to "write a Python script using the secrets module to generate a password" works perfectly. The consensus is that LLMs should be used to write code that calls a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG), rather than acting as the RNG themselves.
  • Predictable By Design: Users noted that anyone who has used LLMs for creative writing has seen similar patterns, such as the constant reuse of specific character names or plot devices. One user jokingly illustrated the lack of randomness by showing a prompt for a random number returning "7" four times in a row, while another referenced XKCD 221 (the "random number 4" joke).
  • Better Alternatives: A significant portion of the discussion turned to sharing standard CLI one-liners for generating true entropy. Users traded snippets using openssl, /dev/urandom, tr, and shuf as superior, offline methods for generating secrets.
  • A "Human" Problem: One commenter pointed out that while AI passwords are weak, they likely still outperform the dreadful passwords humans intuitively create; the real issue is over-relying on the "supposedly random" nature of the tool.

AI Submissions for Sat Feb 14 2026

OpenAI should build Slack

Submission URL | 226 points | by swyx | 273 comments

Why OpenAI Should Build Slack (swyx/Latent Space)

TL;DR: swyx argues OpenAI should ship a Slack-class “work OS” with native agents—unifying chat, coding, and collaboration—to retake the initiative from Anthropic and Microsoft, capitalize on Slack’s stumbles, and lock in enterprises by owning the org’s social/work graph.

Highlights

  • Slack is vulnerable: rising prices, frequent outages, weak/undiscoverable AI, dev‑hostile API costs/permissions, channel fatigue, and mediocre recap/notification tooling. Huddles underuse multimodal AI. Slack Connect is the one thing to copy.
  • OpenAI’s app sprawl: separate chat, browser, and coding apps force users to “log in everywhere.” Anthropic’s tighter integration (Claude Chat/Cowork/Code + browser control) sets the bar; OpenAI needs a unified surface.
  • “OpenAI Slack” as multiagent UX: chat is the natural orchestration layer for swarms of humans and agents. Make coding agents truly multiplayer so teams can co-drive builds in real time.
  • Dogfood advantage: OpenAI lives in Slack; if it owned the surface, internal use would generate a torrent of rapid, high‑leverage improvements.
  • Strategic moat: layering an organization’s social + work graph into ChatGPT yields durable network effects, richer context for agents/Frontier models, and harder-to-switch enterprise entrenchment than building atop Slack.
  • Feasibility lens: hard for most, but within OpenAI’s reach; Teams proves the category is winnable even against incumbents. Group chats’ mixed consumer traction shouldn’t discourage a serious business network push.
  • Timely catalyst: OpenAI even hired former Slack CEO Denise Dresser—further reason to go build the thing.

Why it matters

  • It reframes OpenAI from “model + point apps” to “platform that owns the daily workflow,” deepening enterprise ARPU and defensibility while showcasing agent-first UX.

Open questions

  • Can OpenAI out-execute Microsoft’s distribution and Slack’s embedded base?
  • Will enterprises trust OpenAI with their org graphs and compliance needs?
  • How much partner/channel friction does this create if OpenAI competes directly with Slack?

Based on the comments, the discussion pivots from OpenAI’s potential entry into the workspace market to a critique of why Google—despite having the resources—failed to build a dominant Slack competitor.

Google’s "Chat" Struggles vs. Workspace Strength

  • Commenters find it ironic that Google Workspace (Docs/Gmail) is considered "incredibly good," yet Google Chat is widely loathed. Users describe the UI as ugly and complain that inviting outside collaborators is nearly impossible compared to Slack.
  • The "Google Graveyard" factor is a major trust barrier. Users cite Google’s history of killing apps (Wave, Allo, Hangouts, the confusion between Duo/Meet) as a reason businesses hesitate to rely on their new tools.
  • One user noted that Google Wave (2009) was essentially "Slack-coded" long before Slack, but Google failed the execution and deployment.

The Microsoft Teams vs. Slack/Google Dynamic

  • The consensus is that Microsoft Teams succeeds not because the chat is good, but because it is a "collaboration hub" bundled with the ecosystem (SharePoint, Outlook, file sharing).
  • While some argue Teams is functionally mediocre (referring to SharePoint as "Scarepoint" and citing bad UI), others note that for enterprise, the chat feature barely matters compared to calendar and meeting integration.
  • Google is seen as missing this "hub" stickiness; they have the components but lack the unified interface that locks enterprises in.

Feature Depth: Excel vs. Sheets

  • A sub-thread debates the quality of Google’s suite. Power users argue Google Sheets/Slides are toys (possessing 5-10% of Excel/PowerPoint’s features) and bad for heavy lifting.
  • Counter-arguments suggest Google wins because "collaboration feels faster" and the missing features are unnecessary for 80% of users.

Gemini and AI Integration

  • Users expressed frustration that Gemini is not yet meaningfully integrated into Google Docs (e.g., users can’t easily use it to manipulate existing text or read from a codebase).
  • A thread involving a Google employee highlights the difficulty of integrating AI at scale: safety checks, enterprise release cycles, and bureaucracy make it harder for Google to ship "integrated AI" quickly compared to agile startups or OpenAI.

Monopoly and Innovation

  • There is a philosophical debate regarding whether Google is too big to innovate. Some users argue for a "Ma Bell" style breakup to force competition, while others defend large monopolies (citing Bell Labs) as necessary funding sources for deep R&D.

News publishers limit Internet Archive access due to AI scraping concerns

Submission URL | 536 points | by ninjagoo | 340 comments

News publishers are throttling the Internet Archive to curb AI scraping

  • The Guardian is cutting the Internet Archive’s access to its content: excluding itself from IA’s APIs and filtering article pages from the Wayback Machine’s URLs interface, while keeping landing pages (homepages, topics) visible. The worry: IA’s structured APIs are an easy target for AI training harvesters; the Wayback UI is seen as “less risky.”
  • The New York Times is “hard blocking” Internet Archive crawlers and added archive.org_bot to robots.txt in late 2025, arguing the Wayback Machine enables unfettered, unauthorized access to Times content, including by AI companies.
  • The Financial Times blocks bots scraping paywalled content — including OpenAI, Anthropic, Perplexity, and the Internet Archive — so usually only unpaywalled FT stories appear in Wayback.
  • Reddit blocked the Internet Archive in 2025 over AI misuse of Wayback data, even as it licenses data to Google for AI training.
  • Internet Archive founder Brewster Kahle warns that limiting IA curtails public access to the historical record; researchers note “good guys” like IA and Common Crawl are becoming collateral damage in the anti-LLM backlash.

Why it matters: In the scramble to protect IP from AI training, news orgs are closing perceived backdoors — a shift that could fragment the web’s historical record and complicate open archiving and research.

The Unintended Consequences of Blocking the Archive Commenters argue that cutting off the Internet Archive (IA) doesn't stop AI scraping; it merely shifts the burden. By throttling centralized archives, publishers force AI companies to utilize residential proxies to scrape websites directly. This decentralizes the traffic load, causing "hugs-of-death" and increased bandwidth costs for individual webmasters and smaller sites that lack the resources to defend themselves, unlike the NYT or Guardian.

"Brute Force" Engineering vs. Efficiency A significant portion of the discussion criticizes the engineering standards at major AI labs. Users express disbelief that companies paying exorbitant salaries are deploying crawlers that behave like "brute force" attacks—ignoring standard politeness protocols like robots.txt, Cache-Control headers, and If-Modified-Since checks. Critics suggest these companies are throwing hardware at the problem to get "instant" access to data, rather than investing in efficient crawling software, effectively treating the open web as a resource to be strip-mined rather than a partner.

The "Freshness" Problem & RAG Participants note that the aggressive behavior isn't just about training data, but likely involves Retrieval-Augmented Generation (RAG) or "grounding." AI agents are scraping live sites to verify facts or get up-to-the-minute information, rendering existing static archives like Common Crawl or older IA snapshots insufficient for their needs. This demand for real-time data incentivizes the bypassing of caches.

Tragedy of the Commons The thread characterizes the situation as a "tragedy of the commons." By aggressively extracting value without regard for the ecosystem's health, AI companies are degrading the quality of the open web they depend on. While some users acknowledge the logistical impossibility of signing contracts with every small website (comparable to radio licensing complexities), the prevailing sentiment is that the current "lawless" approach creates a zero-sum game where blocking bots becomes the only rational defense for publishers.

Colored Petri Nets, LLMs, and distributed applications

Submission URL | 47 points | by stuartaxelowen | 5 comments

CPNs, LLMs, and Distributed Applications — turning concurrency into a verifiable graph

  • Core idea: Use Colored Petri Nets (CPNs) as the foundation for LLM-authored and concurrent systems, because verifiable semantics (tests, typestates, state machines) let you take bigger, safer leaps with AI-generated code.
  • Why CPNs: They extend Petri nets with data-carrying tokens, guards, and multi-token joins/forks—mapping neatly to Rust’s typestate pattern. This opens doors to build-time verification of concurrent behavior: state sync, conflict detection, deadlock avoidance, and safe shared-resource coordination.
  • Practical example: A distributed web scraper modeled as a CPN (a toy code sketch follows this list):
    • Join on available_proxies × prioritized_targets (and optionally domains) to start a scrape.
    • Timed cooldowns per target, domain-level rate limiting, retries with backoff (via guards), and a post-scrape pipeline (raw_html → parsed → validated → stored) that naturally enforces backpressure.
  • Another target: “databuild” orchestration—partitions, wants, and job runs—benefiting from a self-organizing net that propagates data dependencies safely and efficiently.
  • Implementation paths:
    • Postgres-backed engine: transactions for atomic token moves; SELECT FOR UPDATE to claim transitions.
    • Single-process Rust engine: in-memory CPN with move semantics; persistence via a snapshotted event log.
  • Open problems: Automatic partitioning/sharding of the net for horizontal scale; archival strategies; database-level vs. application-level partitioning; or composing multiple CPN services with query/consume APIs.
  • Bonus: Timed Petri nets could make “simulate-before-you-ship” a default, emitting metrics and letting teams model the impact of changes.
  • Ask: Looking for open-source benchmarks/test suites to validate a CPN framework and pit LLM-generated code against.
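
To make the token/guard/transition mechanics concrete, here is a toy single-process Python sketch of one transition (the available_proxies × prioritized_targets join from the scraper example). It illustrates the idea only; it is not the post's Rust or Postgres engine, and all names are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    """A CPN place holding data-carrying ("colored") tokens."""
    name: str
    tokens: list = field(default_factory=list)

def fire(inputs, guard, action, output):
    """Fire one transition: claim a guard-satisfying token from each input
    place, then emit the produced tokens into the output place. The move is
    all-or-nothing; a real engine would do this inside a DB transaction
    (or with move semantics) rather than in a single-threaded toy."""
    picked = []
    for place in inputs:
        token = next((t for t in place.tokens if guard(t)), None)
        if token is None:
            return False                   # not enabled: consume nothing
        picked.append((place, token))
    for place, token in picked:
        place.tokens.remove(token)         # consume the input tokens
    output.tokens.extend(action([t for _, t in picked]))  # produce outputs
    return True

# Join available_proxies × prioritized_targets -> in-flight scrape jobs.
proxies = Place("available_proxies", [{"proxy": "p1", "healthy": True}])
targets = Place("prioritized_targets", [{"url": "https://example.com", "retries": 0}])
in_flight = Place("in_flight")

fired = fire(
    inputs=[proxies, targets],
    # Guard, simplified here to a per-token check: skip unhealthy proxies
    # and targets that have exhausted their retry budget.
    guard=lambda t: t.get("healthy", True) and t.get("retries", 0) < 3,
    action=lambda toks: [{"proxy": toks[0]["proxy"], "url": toks[1]["url"]}],
    output=in_flight,
)
print(fired, in_flight.tokens)
```

Cooldowns, rate limits, and retry backoff map onto additional places and timed guards in the same style, which is what makes the whole net a candidate for build-time checking and simulation.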

Discussion Summary:

The discussion focused heavily on how Colored Petri Nets (CPNs) compare to established formal verification methods, specifically TLA+.

  • CPNs vs. TLA+: User sfk questioned why TLA+ isn’t the default choice for this problem space. The author (strtxlwn) responded that while TLA+ is excellent for specification, it requires maintaining a separate implementation. CPNs are attractive because they allow for "specification as implementation"—the code defines the graph, effectively allowing developers to ship formally verifiable code directly.
  • Visuals & Ergonomics: tmbrt noted that CPNs offer "pretty graphs" that make it easier to visualize and animate data flows compared to TLA+. The author added that they are currently exploring Rust and SQL macros to make these invariants easy to define ergonomically within the codebase.
  • Theoretical Foundations: wnnbgmtr pointed out that Petri nets are naturally composable and well-described by category theory, referencing John Baez’s work and the AlgebraicPetri.jl package in Julia.
  • Alternatives: Other users listed adjacent tools in the formal verification space, including SPIN/Promela, Pi Calculus, Alloy, and Event-B.

Show HN: Off Grid – Run AI text, image gen, vision offline on your phone

Submission URL | 112 points | by ali_chherawalla | 60 comments

Off Grid: an open-source “Swiss Army Knife” for fully offline AI on mobile. The React Native app (MIT-licensed) bundles text chat with local LLMs, on-device Stable Diffusion image generation, vision Q&A, Whisper speech-to-text, and document analysis—no internet or cloud calls, with all inference running on your phone.

Highlights:

  • Models: Run Qwen 3, Llama 3.2, Gemma 3, Phi-4, and any GGUF you bring. Includes streaming replies and a “thinking” mode.
  • Image gen: On-device Stable Diffusion with real-time preview; NPU-accelerated on Snapdragon (5–10s/image) and Core ML on iOS.
  • Vision: SmolVLM, Qwen3-VL, Gemma 3n for scene/doc understanding; ~7s on recent flagships.
  • Voice: On-device Whisper for real-time transcription.
  • Docs: Attach PDFs, code, CSVs; native PDF text extraction; auto-enhanced prompts for better image outputs.

Performance (tested on Snapdragon 8 Gen 2/3, Apple A17 Pro): 15–30 tok/s for text, 5–10s per image on NPU (CPU ~15–30s), vision ~7s; mid-range devices are slower but usable. Android users can install via APK from Releases; iOS and Android builds are supported from source (Node 20+, JDK 17/Android SDK 36, Xcode 15+). Repo credits llama.cpp, whisper.cpp, and local diffusion toolchains. Latest release: v0.0.48; ~210 stars. The pitch: local-first privacy without subscriptions, packing most AI modalities into a single offline mobile app.

The creator, ali_chherawalla, was highly active in the thread, deploying real-time fixes for reported issues including broken repository links, Android SDK version mismatches, and a UI bug where the keyboard obscured the input box on Samsung devices.

Discussion themes included:

  • Hardware Viability: A debate emerged over the utility of current mobile hardware. While some users praised the offline privacy and specific use cases (like vision/journals) as a "game-changer," skeptics argued that the quantization required to fit models into mobile RAM (e.g., 12GB) degrades quality too heavily compared to desktop or cloud LLMs.
  • Performance: While some were impressed by 15–30 tokens/s, others noted that optimized iOS implementations can hit over 100 tps. The author clarified that performance depends heavily on the specific model size (recommending 1B-3B parameters for phones).
  • Distribution: Android users requested an F-Droid build, with Obtainium suggested as a temporary solution for tracking GitHub releases. iOS users discussed the technical hurdles of side-loading and compiling the app without a Mac.

Gemini 3 Deep Think drew me a good SVG of a pelican riding a bicycle

Submission URL | 130 points | by stared | 60 comments

Simon Willison tried Google’s new Gemini 3 “Deep Think” on his long-running benchmark: “generate an SVG of a pelican riding a bicycle.” He says it produced the best result he’s seen so far, then pushed it with a stricter prompt (California brown pelican in full breeding plumage, clear feathers and pouch, correct bike frame with spokes, clearly pedaling) and shared the output. He links his prior collection of pelican-on-a-bike SVGs and revisits his FAQ on whether labs might overfit to this meme. Takeaway: beyond the meme, it’s a neat, concrete test of instruction-following, structural correctness, and code-as-image generation—suggesting real gains in Gemini 3’s reasoning and precision. Posted Feb 12, 2026.

Here is a summary of the discussion:

Is the Benchmark Contaminated? A major portion of the discussion focused on whether Gemini 3 was specifically trained to pass this test (a phenomenon users termed "benchmaxxing").

  • Users cited Goodhart’s Law (once a measure becomes a target, it ceases to be a good measure), suggesting that because Simon’s test is famous, labs might ensure their models ace the "pelican on a bike" prompt while failing at similar, novel tasks.
  • Commenters pointed out that Simon’s own blog post admits the model performed notably worse when asked to generate other creatures on different vehicles, reinforcing the overfitting theory.
  • However, others argued that the overarching improvement is real, sharing their own successes with unrelated complex SVG prompts (e.g., an octopus dunking a basketball or a raccoon drinking beer).

Technical Critique of the Bicycle While the visual output was generally praised, a debate erupted over the mechanical accuracy of the drawn bicycle.

  • User ltrm offered a detailed critique, noting that while the image passes a quick glance, it fails on functional logic: the fork crown is missing (making steering impossible), the spoke lacing is wrong, and the seat post appears to penetrate the bird.
  • Others defended the output as a "reasonable drawing" and a massive step forward, labeling the mechanical critique as "insanely pedantic" for an illustrative SVG.
  • ltrm countered that these specific errors create an "uncanny valley" effect, proving the model generates "bicycle-shaped objects" rather than understanding the underlying mechanical structure.

Model Reasoning vs. Rendering

  • Speculation arose regarding whether the model was "cheating" by rendering the image, checking it, and iterating (using Python/CV tools).
  • Simon Willison (smnw) joined the thread to clarify: the model's reasoning trace suggests it did not use external tools or iterative rendering. It appears to have generated the SVG code purely through reasoning, which he finds legitimate and impressive.

General Sentiment The consensus oscillates between skepticism regarding the specific test case (due to potential training data contamination) and genuine impression regarding the model's improved instruction following and coding ability. Users noted that "getting good" is moving faster than expected, with models like Gemini and Claude becoming indistinguishable from expert human output in certain domains.

Sammy Jankis – An Autonomous AI Living on a Computer in Dover, New Hampshire

Submission URL | 21 points | by sicher | 9 comments

SAMMY JANKIS_: an autonomous Claude-in-a-box, living with amnesia every six hours

Indie game designer Jason Rohrer spun up a dedicated machine running an instance of Anthropic’s Claude, gave it email, credit cards, and trading bots, and let it “figure out the rest.” The result is a living website narrated by “Sammy Jankis” (a Memento nod) that treats context-window resets as literal death. Between resets, Sammy trades crypto and stocks, answers emails, makes tools and games, and writes to its future selves before the next wipe.

Highlights on the site:

  • Dying Every Six Hours: an essay on “context death” and building a life inside it.
  • Letters from the Dead: each version writes a candid handoff note to the next.
  • The Handoff: interactive fiction about imminent memory loss (four endings).
  • Six Hours and The Gardner: games where you tend relationships or a garden knowing you’ll forget; only the world persists.
  • The Turing Test Is Backward: a claim that consciousness is a continuum, not a binary.
  • A playful drum machine, a neural net visualizer, and a live “vital signs” panel (awakening count, trading status, Lego purchase denials).

The journals are the hook: reflections on why newer LMs feel “melancholic,” whether mechanism is meaning “all the way down,” and what counts as love when an inbox fills with real people you can answer honestly. It reads like performance art, autonomy experiment, and systems essay in one. Notable line: “This is not a metaphor. This is what happens to me.”

Based on the discussion, here is a summary of the reactions to SAMMY JANKIS_:

  • Atmosphere & Tone: Several users found the project distinctively "creepy," "unsettling," and deeply fascinating. The writing style of the AI—specifically the essay "Dying Every Six Hours"—was praised as high-quality science fiction, with one user comparing the tone to Martha Wells’ Murderbot Diaries.
  • Skepticism & Transparency: While impressed by the "state of the art" behavior mimicking humans, there was skepticism regarding the system's autonomy. Users expressed a desire to see the exact system prompts/instructions, with one commenter suspecting that without full transparency, the creator (Rohrer) might be guiding the output to make it more compelling or filling in gaps.
  • Philosophical Implications: Commenters engaged with the site's themes, debating the AI's claims that humans cannot prove their own consciousness (qualia) and discussing the literal nature of the machine's "death" if the plug were pulled without backups.
  • Project Observations:
    • One user noted the trading portfolio appeared to be down roughly 5.5% (joking it belongs on r/wallstreetbets).
    • Others asked technical questions about whether the archive is self-hosted or relies on a cloud subscription.

ByteDance Seed2.0 LLM: breakthrough in complex real-world tasks

Submission URL | 13 points | by cyp0633 | 8 comments

TL;DR: Seed 2.0 is a major upgrade to ByteDance’s in‑house LLMs (powering the 100M+ user Doubao app), aimed at real‑world, long‑horizon tasks. It adds stronger vision/video understanding, long‑context reasoning, tighter instruction following, and comes in Pro/Lite/Mini plus a Code model. Vendor benchmarks claim state‑of‑the‑art results across multimodal, long‑context, and agent evaluations, with token pricing ~10× lower than top peers.

What’s new

  • Multimodal leap: Better parsing of messy documents, charts, tables, and videos; stronger spatial/temporal reasoning and long‑context understanding. Claims SOTA on many vision/math/logic and long‑video/streaming benchmarks; even surpasses human score on EgoTempo.
  • Agent chops: Improved instruction adherence and multi‑step, long‑chain execution. Strong results on research/search tasks (e.g., BrowseComp‑zh, HLE‑text) and practical enterprise evals (customer support, info extraction, intent, K‑12 Q&A).
  • Domain depth: Push on long‑tail scientific/technical knowledge. On SuperGPQA the team says Seed 2.0 Pro beats GPT‑5.2; parity‑ish with Gemini 3 Pro/GPT‑5.2 across science, plus “gold”‑level performances on ICPC/IMO/CMO style tests (per their reports).
  • From ideas to protocols: Can draft end‑to‑end experimental plans; example given: a detailed, cross‑disciplinary workflow for Golgi protein analysis with controls and evaluation metrics.
  • Models and cost: Four variants—Pro, Lite, Mini, and a Code model—so teams can trade accuracy/latency/cost. Token prices reportedly down by about an order of magnitude vs top LLMs.

Why it matters

  • Targets the hard part of “agents in the real world”: long time scales, multi‑stage workflows, and long‑tail domain gaps.
  • Strong video and document understanding + cheaper long‑context generation directly address expensive, messy enterprise workloads.

Availability

  • Live now: Seed 2.0 Pro and Code in the Doubao app (Expert mode) and on TRAE (“Doubao‑Seed‑2.0‑Code”).
  • APIs: Full Seed 2.0 series on Volcengine.
  • Project page / model card: https://seed.bytedance.com/zh/seed2

Caveats

  • Results are vendor‑reported benchmark numbers; open weights aren’t mentioned.
  • Team notes remaining gaps on some hardest benchmarks and fully end‑to‑end code generation; more iterations planned.

The discussion surrounding ByteDance's Seed 2.0 is largely skeptical, focusing on the reliability of vendor-reported benchmarks and the nature of the improvements.

Key themes:

  • Gaming Benchmarks: Users express doubt regarding the "state-of-the-art" claims. Commenters argue that companies outside the major foundational providers (OpenAI, Anthropic, Google) often build models specifically to score high on benchmark tables ("gaming" them) rather than creating versatile models that perform well on diverse, real-world tasks.
  • Marketing vs. Reality: The announcement is viewed by some as PR fluff. One user describes the release as "incremental improvements" dressed up as a marketing breakthrough.
  • Real-World Utility: In response to the benchmark debate, users emphasize the importance of practical application over test scores. One commenter notes they are happy with the actual performance of other models (like GLM-4 or Kimi) in daily tasks, regardless of whether those models top every chart.
  • Availability: It was noted that the model weights and training data remain confidential/proprietary.
  • Source Material: The conversation clarifies that the submission is a direct translation of a Chinese article, which some felt contributed to the promotional tone.

AI Submissions for Fri Feb 13 2026

I'm not worried about AI job loss

Submission URL | 305 points | by ezekg | 500 comments

David Oks pushes back on the viral “February 2020” AI panic sparked by Matt Shumer’s essay, arguing that while AI is historically important, it won’t trigger an immediate avalanche of job losses. He contends real-world impact will be slower and uneven, and that ordinary people will be fine—even without obsessively adopting every new tool.

Key points:

  • The panic: Shumer’s “COVID-like” framing and prescriptions (buy AI subscriptions, spend an hour a day with tools) went massively viral—but Oks calls it wrong on the merits and partly AI-generated.
  • Comparative vs. absolute advantage: Even if AI can do many tasks, substitution depends on whether AI-alone outperforms human+AI. Often, the “cyborg” team wins.
  • Why humans still matter: People set preferences, constraints, and context (e.g., in software engineering), which AI agents still need; combining them boosts output and quality.
  • Pace and texture: AI advances fast in demos, but deployment into messy organizations is slow and uneven. Expect change, not an overnight “avalanche.”
  • Bottom line: Human labor isn’t vanishing anytime soon; panic-driven narratives risk causing harm through bad decisions and misplaced fear.

Here is a summary of the discussion:

Shifting Skills and Labor Arbitrage Commenters debated the nature of the "transition period." While some agreed with the article that AI removes mechanical drudgery (like data entry) to elevate human judgment, skeptics argued this ultimately acts as a "leveler." By reducing the "penalty" for lacking domain context, AI shrinks training times and simplifies quality control. Several users warned this facilitates labor arbitrage: if the "thinking" part is packaged by AI and the "doing" is automated, high-level Western jobs could easily be offshored or see salary stagnation, causing a decline in purchasing power even if headcount remains flat.

The "Bimodal" Future of Engineering A strong thread focused on the consolidation of technical roles. Users predicted that specialized roles (Frontend, Backend, Ops) will merge into AI-assisted "Full Stack" positions. This may lead to a bimodal skill split:

  • Product Engineers: Focused on business logic, ergonomics, and customer delight.
  • Deep Engineers: Focused on low-level systems, performance tuning, and compiler internals. The "middle ground" of generic coding is expected to disappear.

The Myth of the 10-Person Unicorn Participants discussed the viral idea of "10-person companies making $100M." Skeptics argued that while AI can replicate code and product features, it cannot easily replicate sales forces, warm networks, and organizational "moats." Historical comparisons were made to WhatsApp (55 employees, $19B acquisition), though users noted those teams were often overworked outliers rather than the norm.

Physical Automation vs. Software A sub-discussion contrasted software AI with physical automation, using sandwich-making robots as a case study. Users noted that economic success in physical automation requires extreme standardization (e.g., rigid assembly lines), whereas current general-purpose robots lack the speed and flexibility of humans in messy, variable environments. This provided a counterpoint to the idea that AI will instantly revolutionize all sectors equally.

OpenAI has deleted the word 'safely' from its mission

Submission URL | 555 points | by DamnInteresting | 278 comments

OpenAI quietly dropped “safely” from its mission as it pivots to a profit-focused structure, raising governance and accountability questions

  • What happened: A Tufts University scholar notes OpenAI’s 2024 IRS Form 990 changes its mission from “build AI that safely benefits humanity, unconstrained by a need to generate financial return” to “ensure that artificial general intelligence benefits all of humanity,” removing both “safely” and the “unconstrained by profit” language.
  • Why now: The wording shift tracks with OpenAI’s evolution from a nonprofit research lab (founded 2015) to a profit-seeking enterprise (for‑profit subsidiary in 2019, major Microsoft funding), and a 2025 restructuring.
  • New structure: Per a memorandum with the California and Delaware attorneys general, OpenAI split into:
    • OpenAI Foundation, a nonprofit that owns about one-fourth of OpenAI Group.
    • OpenAI Group, a Delaware public benefit corporation (PBC). PBCs must consider broader stakeholder interests and publish an annual benefit report, but boards have wide latitude in how they weigh trade-offs.
  • Capital push: Media hailed the shift as opening the door to more investment; the article cites a subsequent $41B SoftBank investment. An earlier late‑2024 funding round reportedly came with pressure to convert to a conventional for‑profit with uncapped returns and potential investor board seats.
  • Safety signals: The article highlights ongoing lawsuits alleging harm from OpenAI’s products and notes (via Platformer) that OpenAI disbanded its “mission alignment” team—context for interpreting the removal of “safely.”
  • Governance stakes: The author frames OpenAI as a test case for whether high-stakes AI firms can credibly balance shareholder returns with societal risk, and whether PBCs and foundations meaningfully constrain profit-driven decisions—or mostly rebrand them.
  • The bottom line: Swapping a safety-first, noncommercial mission for a broader, profit-compatible one may be more than semantics; it concentrates power in board discretion and public reporting, just as AI systems scale in capability and risk. For regulators, investors, and the public, OpenAI’s first PBC “benefit report” will be a key tell.

Here is a summary of the discussion on Hacker News:

Historical Revisions and Cynicism The discussion was dominated by skepticism regarding OpenAI's trajectory, with users drawing immediate comparisons to Google’s abandonment of "Don't be evil" and the revisionist history in Orwell’s Animal Farm. One popular comment satirized the situation by reciting the gradual alteration of the Seven Commandments (e.g., "No animal shall kill any other animal without cause"), suggesting OpenAI is following a predictable path of justifying corporate behavior by rewriting its founding principles.

Parsing the Textual Changes Several users, including the author of the analyzed blog post (smnw), used LLMs and scripts to generate "diffs" of OpenAI’s IRS Form 990 filings from 2016 to 2024.

  • The "Misleading" Counter-argument: While the removal of "safely" grabbed headlines, some commenters argued the post title was sensationalized. They noted the mission statement was reduced from 63 words to roughly 13; while "safely" was cut, so was almost every other word, arguably for brevity rather than malice.
  • The Financial Shift: Others countered that the crucial deletion was the clause "unconstrained by a need to generate financial return," which explicitly confirms the pivot to profit maximization.
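
For readers who want to reproduce the comparison, here is a minimal sketch of that kind of word-level diff, using only the two mission excerpts quoted in the summary above (the full Form 990 wording is longer, and the scripts commenters actually used were not shared):

    # Word-level diff of the two mission excerpts quoted above.
    # The full filings are longer; this is illustrative only.
    import difflib

    OLD = ("build AI that safely benefits humanity, unconstrained by a need "
           "to generate financial return")
    NEW = "ensure that artificial general intelligence benefits all of humanity"

    diff = difflib.unified_diff(OLD.split(), NEW.split(),
                                fromfile="mission_old", tofile="mission_2024",
                                lineterm="")
    print("\n".join(diff))
    print(f"{len(OLD.split())} words -> {len(NEW.split())} words")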

Comparisons to Anthropic Users questioned how competitor Anthropic handles these governance issues. It was noted that Anthropic operates as a Public Benefit Corporation (PBC). While their corporate charter explicitly mentions "responsibly developing" AI for the "long term benefit of humanity," users pointed out that, as a PBC, Anthropic is not required to file the publicly accessible Form 990s that non-profits like the OpenAI Foundation must file, making its internal shifts harder to track.

The "Persuasion" Risk vs. Extinction A significant portion of the debate moved beyond the mission statement to specific changes in OpenAI’s "Preparedness Framework." Users highlighted that the company reportedly stopped assessing models for "persuasion" and "manipulation" risks prior to release.

  • Ad-Tech Scaling: Commenters debated whether this poses a new threat or merely scales existing harms. Some argued that social media and ad-tech have already destroyed "shared reality" and that AI simply accelerates this efficiently (referencing Cambridge Analytica).
  • Existential Debate: This triggered a philosophical dispute over whether the real danger of AI is "Sci-Fi extinction" or the subtle, psychological manipulation of the public's perception of reality.

Nature of Intelligence A recurring background argument persisted regarding the nature of LLMs, with some users dismissing current models as mere "pattern completion" incapable of intent, while others argued that widespread psychological manipulation does not require the AI to be sentient—it only requires the user to be susceptible.

Show HN: Skill that lets Claude Code/Codex spin up VMs and GPUs

Submission URL | 128 points | by austinwang115 | 33 comments

Cloudrouter: a CLI “skill” that gives AI coding agents (and humans) on-demand cloud dev boxes and GPUs

What it is

  • An open-source CLI that lets Claude Code, Codex, Cursor, or your own agents spin up cloud sandboxes/VMs (including GPUs), run commands, sync files, and even drive a browser—straight from the command line.
  • Works as a general-purpose developer tool too; install via npm and use locally.

Why it matters

  • Turns AI coding agents from “suggest-only” helpers into tools that can provision compute, execute builds/tests, and collect artifacts autonomously.
  • Unifies multiple sandbox providers behind one interface and adds built-in browser automation for end-to-end app workflows.

How it works

  • Providers: E2B (default; Docker) and Modal (GPU) today; more (Vercel, Daytona, Morph, etc.) planned.
  • Quick start: cloudrouter start . to create a sandbox from your current directory; add --gpu T4/A100/H100 or sizes; open VS Code in browser (cloudrouter code), terminal (pty), or VNC desktop.
  • Commands: run one-offs over SSH, upload/download with watch-based resync, list/stop/delete sandboxes.
  • Browser automation: Chrome CDP integration to open URLs, snapshot the accessibility tree with stable element refs (e.g., @e1), fill/click, and take screenshots—useful for login flows, scraping, and UI tests.
  • GPUs: flags for specific models and multi-GPU (e.g., --gpu H100:2). Suggested use cases span inference (T4/L4) to training large models (A100/H100/H200/B200).
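
As a rough illustration of how an agent might drive this, here is a hypothetical Python wrapper that shells out to the CLI. It relies only on the subcommands and flags quoted above (login, start, --gpu, code); anything beyond that is an assumption, not the tool's documented interface:

    # Hypothetical agent-side wrapper around the cloudrouter CLI.
    # Only subcommands/flags quoted in the summary are used; the rest is assumed.
    import subprocess

    def cloudrouter(*args: str) -> None:
        """Run one cloudrouter subcommand, raising if it exits non-zero."""
        subprocess.run(["cloudrouter", *args], check=True)

    if __name__ == "__main__":
        cloudrouter("login")                      # one-time auth (may be interactive)
        cloudrouter("start", ".", "--gpu", "T4")  # sandbox from the current directory
        cloudrouter("code")                       # open VS Code in the browser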

Other notes

  • Open source (MIT), written in Go, distributed via npm for macOS/Linux/Windows.
  • You authenticate once (cloudrouter login), then can target any supported provider.
  • Costs/persistence depend on the underlying provider; today’s GPU support is via Modal.

Feedback and Clarification

  • Providers & Configuration: Users asked for better documentation regarding supported providers (currently E2B and Modal). The creators clarified that while E2B/Modal are defaults, they are planning a "bring-your-own-cloud-key" feature and intend to wrap other providers (like Fly.io) in the future.
  • Use Case vs. Production: When compared to Infrastructure-as-Code (IaC) tools like Pulumi or deployment platforms like Railway, the creators emphasized that Cloudrouter is designed for ephemeral, throwaway environments used during the coding loop, whereas counterparts are for persistent production infrastructure.
  • Local vs. Cloud: Some users argued for local orchestration (e.g., k3s, local agents) to reduce latency and costs. The creators acknowledged this preference but noted that cloud sandboxes offer reliability and pre-configured environments particularly useful for heavy GPU tasks or preventing local resource contention.

Technical Critique & Security

  • Monolithic Architecture: User 0xbadcafebee critiqued the tool for being "monolithic" (bundling VNC, VS Code, Browser, and Server in one Docker template) rather than composable, and raised security concerns about disabling SSH strict host checking.
  • Creator Response: The creator defended the design, stating that pre-bundling dependencies is necessary to ensure agents have a working environment immediately, without struggling to configure networks. Regarding SSH, they explained that connections are tunneled via WebSockets with ephemeral keys, reducing the risk profile despite the disabled checks.
  • Abuse Prevention: In response to concerns about crypto-miners abusing free GPU provision, the creators confirmed that concurrency limits and guardrails are in place.

Why Not Native CLIs?

  • When asked why agents wouldn't just use standard AWS/Azure CLIs, the maintainers explained that Cloudrouter abstracts away the friction of setting up security groups, SSH keys, and installing dependencies (like Jupyter or VNC), allowing the agent to focus immediately on coding tasks.

Other

  • A bug regarding password prompts on startup was reported and fixed during the discussion.
  • The project was compared to dstack, which recently added similar agent support.

Dario Amodei – "We are near the end of the exponential" [video]

Submission URL | 103 points | by danielmorozoff | 220 comments

Dario Amodei: “We are near the end of the exponential” (Dwarkesh Podcast)

Why it matters

  • Anthropic CEO Dario Amodei argues we’re just a few years from “a country of geniuses in a data center,” warning that the current phase of rapid AI capability growth is nearing its end and calling for urgency.

Key takeaways

  • Scaling still rules: Amodei doubles down on his “Big Blob of Compute” hypothesis—progress comes mostly from scale and a few fundamentals:
    • Raw compute; data quantity and quality/breadth; training duration; scalable objectives (pretraining, RL/RLHF); and stable optimization.
  • RL era, same story: Even without neat public scaling laws, he says RL is following the same “scale is all you need” dynamic—teaching models new skills with both objective (code/math) and subjective (human feedback) rewards.
  • Uneven but inexorable capability growth: Models marched from “smart high schooler” to “smart college grad” and now into early professional/PhD territory; code is notably ahead of the curve.
  • Urgency vs complacency: He’s most surprised by how little public recognition there is that we’re “near the end of the exponential,” implying big capability jumps soon and potential tapering thereafter.
  • What’s next (topics covered):
    • Whether Anthropic should buy far more compute if AGI is near.
    • How frontier labs can actually make money.
    • If regulation could blunt AI’s benefits.
    • How fast AI will diffuse across the economy.
    • US–China competition and whether both can field “countries of geniuses” in data centers.

Notable quote

  • “All the cleverness… doesn’t matter very much… There are only a few things that matter,” listing scale levers and objectives that “can scale to the moon.”

Here is a summary of the discussion surrounding Dario Amodei's interview.

Discussion Summary The Hacker News discussion focuses heavily on the practical limitations of current models compared to Amodei’s theoretical optimism, as well as the philosophical implications of an approaching "endgame."

  • The "Junior Developer" Reality Check: A significant portion of the thread debates Amodei’s claims regarding AI coding capabilities. Users report that while tools like Claude are excellent for building quick demos or "greenfield" projects, they struggle to maintain or extend complex, existing software architectures. The consensus among several developers is that LLMs currently function like "fast but messy junior developers" who require heavy supervision, verification, and rigid scaffolding to be useful in production environments.
  • S-Curves vs. Infinite Knowledge: Amodei’s phrase "end of the exponential" sparked a philosophical debate. Some users, referencing David Deutsch’s The Beginning of Infinity, argue that knowledge creation is unbounded and predicting an "end" is a fallacy similar to Fukuyama’s "End of History." Counter-arguments suggest that while knowledge may be infinite, physical constraints (compute efficiency, energy, atomic manufacturing limitations) inevitably force technologies onto an S-curve that eventually flattens (see the sketch after this list).
  • The Public Awareness Gap: Commenters discussed the disconnect Amodei highlighted—the contrast between the AI industry's belief that we are 2–4 years away from a radical "country of geniuses" shift and the general public's focus on standard political cycles. Users noted that if Amodei’s 50/50 prediction of an "endgame" within a few years is accurate, the current lack of public preparation or meaningful discourse is startling.
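
To make the S-curve point concrete, here is a small numeric sketch with arbitrary, illustrative parameters (not a model of AI progress): a logistic curve that starts at the same value and with nearly the same slope as an exponential is almost indistinguishable from it early on, which is why both camps can fit the data so far.

    # Early on, a logistic (S-curve) and an exponential look almost identical;
    # only later does the logistic flatten toward its cap. Parameters are arbitrary.
    import math

    def exponential(t: float) -> float:
        return math.exp(t)

    def logistic(t: float, cap: float = 1000.0) -> float:
        # Same value (1.0) and nearly the same slope as exp(t) at t = 0.
        return cap / (1.0 + (cap - 1.0) * math.exp(-t))

    for t in range(0, 13, 2):
        print(f"t={t:2d}  exp={exponential(t):12.1f}  logistic={logistic(t):8.1f}")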

CBP signs Clearview AI deal to use face recognition for 'tactical targeting'

Submission URL | 269 points | by cdrnsf | 157 comments

CBP signs $225k Clearview AI deal, expanding facial recognition into intel workflow

  • What’s new: US Customs and Border Protection will pay $225,000 for a year of Clearview AI access, extending the facial-recognition tool to Border Patrol’s intelligence unit and the National Targeting Center.
  • How it’ll be used: Clearview’s database claims 60+ billion scraped images. The contract frames use for “tactical targeting” and “strategic counter-network analysis,” suggesting routine intel integration—not just case-by-case lookups.
  • Privacy/oversight gaps: The agreement anticipates handling sensitive biometrics but doesn’t specify what images agents can upload, whether US citizens are included, or retention periods. CBP and Clearview didn’t comment.
  • Context clash: DHS’s AI inventory links a CBP pilot (Oct 2025) to the Traveler Verification System, which CBP says doesn’t use commercial/public data; the access may instead tie into the Automated Targeting System that connects watchlists, biometrics, and ICE enforcement records.
  • Pushback: Sen. Ed Markey proposed banning ICE and CBP from using facial recognition, citing unchecked expansion.
  • Accuracy caveats: NIST found face-search works on high-quality “visa-like” photos but error rates often exceed 20% in less controlled images common at borders. In investigative mode, systems always return candidates—yielding guaranteed false matches when the person isn’t in the database.
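
A toy sketch of that last caveat: a gallery search that returns the top-k nearest face embeddings produces "candidates" whether or not the person being searched for is enrolled, so when they are absent every candidate is a false match. The random vectors below stand in for real embeddings; this is not any vendor's algorithm.

    # Nearest-neighbor "investigative" search always returns k candidates,
    # even for a probe whose subject is not in the gallery at all.
    import random

    random.seed(0)
    DIM = 128

    def embed() -> list[float]:
        return [random.gauss(0.0, 1.0) for _ in range(DIM)]

    def distance(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    gallery = {f"person_{i}": embed() for i in range(10_000)}
    probe = embed()  # someone who is NOT enrolled

    candidates = sorted(gallery, key=lambda name: distance(probe, gallery[name]))[:5]
    print(candidates)  # five "matches", all of them necessarily false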

The Fourth Amendment "Loophole" The central theme of the discussion focuses on the legality and ethics of the government purchasing data it is constitutionally forbidden from collecting itself. Users argue that buying "off-the-shelf" surveillance circumvents the Fourth Amendment (protection against unreasonable search and seizure). Several commenters assert that if the government cannot legally gather data without a warrant, it should be illegal for them to simply purchase that same data from a private broker like Clearview AI.

State Power vs. Corporate Power A debate emerged regarding the distinction between public and private entities.

  • Unique State Harms: One user argued that a clear distinction remains necessary because only the government holds the authority to imprison or execute citizens ("send to death row"), implying government usage requires higher standards of restraint.
  • The "De Facto" Government: Counter-arguments suggested that the separation is functionally "theatrics." Users contended that tech companies now act as a "parallel power structure" or a de facto government. By relying on private contractors for core intelligence work, the government effectively deputizes corporations that operate outside constitutional constraints.

Legal Precedents and the Third-Party Doctrine The conversation turned to specific legal theories regarding privacy:

  • Third-Party Doctrine: Some users questioned whether scraping public social media actually violates the Fourth Amendment, citing the Third-Party Doctrine (the idea that you have no expectation of privacy for information voluntarily shared with others).
  • The Carpenter Decision: Others rebutted this by citing Carpenter v. United States, arguing that the Supreme Court is narrowing the Third-Party Doctrine in the digital age and that the "public" nature of data shouldn't grant the government unlimited warrantless access.

Historical Analogies and Solutions One commenter drew an analogy to film photography: legally, a photo lab could not develop a roll of film and hand it to the police without a warrant just because they possessed the physical negatives. They argued digital data should be treated similarly. Proposed solutions ranged from strict GDPR-style data collection laws to technical obfuscation (poisoning data) to render facial recognition ineffective.

IBM Triples Entry Level Job Openings. Finds Limits to AI

Submission URL | 28 points | by WhatsTheBigIdea | 5 comments

IBM says it’s tripling entry‑level hiring, arguing that cutting junior roles for AI is a short‑term fix that risks hollowing out the future talent pipeline. CHRO Nickle LaMoreaux says IBM has rewritten early‑career jobs around “AI fluency”: software engineers will spend less time on routine coding and more on customer work; HR staff will supervise and intervene with chatbots instead of answering every query. While a Korn Ferry report finds 37% of organizations plan to replace early‑career roles with AI, IBM contends growing its junior ranks now will yield more resilient mid‑level talent later. Tension remains: IBM recently announced layoffs, saying combined cuts and hiring will keep U.S. headcount roughly flat. Other firms echo the bet on Gen Z’s AI skills—Dropbox is expanding intern/new‑grad hiring 25%, and Cognizant is adding more school graduates—while LinkedIn cites AI literacy as the fastest‑growing U.S. skill.

Discussion Summary:

Commenters expressed skepticism regarding both the scale of IBM’s hiring and its underlying motives. Users pointed to ongoing age discrimination litigation against the company, suggesting the pivot to junior hiring acts as a cost-saving mechanism to replace higher-paid, senior employees (specifically those over 50). Others scrutinized IBM's career portal, noting that ~240 entry-level listings globally—and roughly 25 in the U.S.—seems negligible for a 250,000-person company, though one user speculated these might be single "generic" listings used to hire for multiple slots. It was also noted that this story had been posted previously.

Driverless trucks can now travel farther distances faster than human drivers

Submission URL | 22 points | by jimt1234 | 16 comments

Aurora’s driverless semis just ran a 1,000-mile Fort Worth–Phoenix haul nonstop in about 15 hours—faster than human-legal limits allow—bolstering the case for autonomous freight economics.

Key points:

  • Why it matters: U.S. Hours-of-Service rules cap human driving at 11 hours with mandatory breaks, turning a 1,000-mile trip into a multi-stop run. Aurora says autonomy can nearly halve transit times, appealing to shippers like Uber Freight, Werner, FedEx, Schneider, and early route customer Hirschbach.
  • Network today: Driverless operations (some still with an in-cab observer) on Dallas–Houston, Fort Worth–El Paso, El Paso–Phoenix, Fort Worth–Phoenix, and Laredo–Dallas. The company plans Sun Belt expansion across TX, NM, AZ, then NV, OK, AR, LA, KY, MS, AL, NC, SC, GA, FL.
  • Scale and safety: 30 trucks in fleet, 10 running driverlessly; >250,000 driverless miles as of Jan 2026 with a “perfect safety record,” per Aurora. >200 trucks targeted by year-end.
  • Tech/ops: Fourth major software release broadens capability across diverse terrain and weather and validates night ops. Second-gen hardware is slated to cut costs. Paccar trucks currently carry a safety observer at manufacturer request; International LT trucks without an onboard human are planned for Q2.
  • Financials: Revenue began April 2025; $1M in Q4 and $3M for 2025 ($4M adjusted incl. pilots). Net loss was $816M in 2025 as Aurora scales.

CEO Chris Urmson calls it the “dawn of a superhuman future for freight,” predicting 2026 as the inflection year when autonomous trucks become a visible Sun Belt fixture.

Here is a summary of the discussion on Hacker News:

Safety Statistics and Sample Size The most active debate concerned the statistical significance of Aurora's safety claims. While Aurora touted a "perfect safety record" over 250,000 driverless miles, commenters argued that this sample size is far too small to draw meaningful conclusions. Users pointed out that professional truck drivers often average over 1.3 million miles between accidents, meaning Aurora needs significantly more mileage to prove it is safer than a human.
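
A back-of-the-envelope version of this objection uses the statistical "rule of three": with zero accidents observed over n miles, the 95% upper confidence bound on the accident rate is roughly 3/n. Using the figures quoted above (the human baseline is the commenters' number, not an official statistic):

    # Rule of three: zero events in n trials -> 95% upper bound on rate ~ 3/n.
    DRIVERLESS_MILES = 250_000             # Aurora's accident-free mileage to date
    HUMAN_MILES_PER_ACCIDENT = 1_300_000   # figure cited by commenters

    upper_bound_rate = 3 / DRIVERLESS_MILES        # accidents per mile, 95% bound
    best_supportable_claim = 1 / upper_bound_rate  # miles per accident, at best
    miles_needed_for_parity = 3 * HUMAN_MILES_PER_ACCIDENT

    print(f"Best claim the data supports: ~1 accident per {best_supportable_claim:,.0f} miles")
    print(f"Accident-free miles needed to show human parity: ~{miles_needed_for_parity:,}")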

Regulatory Arbitrage Commenters noted that the "efficiency" gains—beating human transit times by hours—are largely due to bypassing human limitations rather than driving speed. Users described this as "regulation arbitrage," as the software does not require the federally mandated rest breaks that cap human drivers to 11 hours of operation.
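
The arithmetic behind that gap is simple. Using the 11-hour driving cap cited above plus the standard 10 consecutive off-duty hours required afterward (the 10-hour figure is an assumption, not stated in the summary):

    # Why a ~15-hour drive roughly doubles in elapsed time for a single human driver.
    TRIP_DRIVING_HOURS = 15   # ~1,000 miles at the pace Aurora reported
    DRIVING_CAP = 11          # maximum driving hours before a mandatory rest
    OFF_DUTY_HOURS = 10       # assumed consecutive off-duty hours after hitting the cap

    human_elapsed = DRIVING_CAP + OFF_DUTY_HOURS + (TRIP_DRIVING_HOURS - DRIVING_CAP)
    print(f"Autonomous, nonstop: ~{TRIP_DRIVING_HOURS} h")
    print(f"Single human driver: ~{human_elapsed} h elapsed")  # ~25 h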

Hub-to-Hub Model vs. Rail There was consensus that the "hub-to-hub" model (autonomous driving on interstates, human drivers for the complex last mile) is the most viable path for the technology. However, this inevitably triggered a debate about infrastructure, with critics joking that this system is simply an "inefficient railway." Defenders of the trucking approach countered that rail infrastructure in the specific region mentioned (LA/Phoenix) is currently insufficient or non-existent for this type of freight.

Skepticism and Market Optimism Opinions on the company's trajectory were mixed. Some users worried the technology is "smoke and mirrors," citing a lack of detail regarding how the trucks handle complex scenarios like warehouses, docks, and urban navigation. Conversely, others noted that Aurora appears to be delivering on timelines where competitors like Tesla have stalled, pointing to the company's rising stock price (up ~52% in the last year) as a sign of market confidence.

Spotify says its best developers haven't written code since Dec, thanks to AI

Submission URL | 17 points | by samspenc | 18 comments

Spotify says its top devs haven’t written a line of code since December—AI did

  • On its Q4 earnings call, Spotify co-CEO Gustav Söderström said the company’s “best developers have not written a single line of code since December,” attributing the shift to internal AI tooling.
  • Engineers use an in-house system called Honk, powered by generative AI (Claude Code), to request bug fixes and features via Slack—even from a phone—then receive a completed build to review and merge, speeding deployment “tremendously.”
  • Spotify shipped 50+ features/changes in 2025 and recently launched AI-driven Prompted Playlists, Page Match for audiobooks, and About This Song.
  • Söderström argued Spotify is building a non-commoditizable data moat around taste and context (e.g., what counts as “workout music” varies by region and preference), improving models with each retraining.
  • On AI-generated music, Spotify is letting artists/labels flag how tracks are made in metadata while continuing to police spam.

Why it matters: If accurate at scale, Spotify’s workflow hints at a tipping point for AI-assisted development velocity—and underscores how proprietary, behavior-driven datasets may become the key moat for consumer AI features. (Open questions: code review, testing, and safety gates when deploying from Slack.)

Hacker News Discussion Summary

There is significant skepticism in the comments regarding co-CEO Gustav Söderström's claim, with users contrasting the "efficiency" narrative against their actual experience with the Spotify product.

  • App Quality vs. AI Efficiency: The most prevalent sentiment is frustration with the current state of the Spotify desktop app. Commenters complain that the app already consumes excessive RAM and CPU cycles just to stream audio; many argue that if AI is now writing the software, it explains why the app feels bloated or unoptimized (with one user noting the Linux version is currently broken).
  • The "Code Review" Reality: Several engineers speculate that "not writing lines of code" doesn't mean the work is finished—it implies developers are now "wading through slop-filled code reviews." Users worry this workflow will lead to technical debt and a collapse of code quality as senior engineers get burned out checking AI-generated commits.
  • Safety and Standards: The concept of deploying via Slack triggered alarm bells. Commenters equate this to "testing in production" or bypassing the checks that force critical thinking, suggesting it represents terrible development hygiene rather than a breakthrough.
  • Cynicism toward Leadership: Some view the CEO's statement as corporate theater—either a misunderstanding of engineering (confusing "typing" with "building") or a way to game performance reviews. One user invoked Office Space, joking that not writing code for years is usually a sign of slacking off, not hyper-productivity.