AI Submissions for Wed Nov 05 2025
Open Source Implementation of Apple's Private Compute Cloud
Submission URL | 237 points | by adam_gyroscope | 40 comments
OpenPCC: an open-source take on Apple’s Private Cloud Compute for provably private AI inference
What it is
- An open, auditable framework to run AI inference without exposing prompts, outputs, or logs.
- Inspired by Apple’s Private Cloud Compute, but self-hostable and community-governed.
- Enforces privacy with encrypted streaming, hardware attestation (TPM/TEEs), transparency logs, and unlinkable requests.
- Apache-2.0 licensed; currently ~327 stars.
How it works
- OpenAI-compatible API surface (drop-in style /v1/completions).
- Clients verify server identity and policy via a transparency log (Sigstore-style) and OIDC identity policies (example shows GitHub Actions).
- Routing by model tags (e.g., X-Confsec-Node-Tags: qwen3:1.7b) to target specific attested nodes.
- Repo focuses on the Go client plus a C library used by Python/JS clients; includes in-memory services for local testing.
Why it matters
- Brings verifiable, privacy-preserving inference to your own infrastructure—useful for regulated environments and users wary of black-box cloud AIs.
- Open standard approach may enable broader auditing and community trust than closed PCC implementations.
Try it
- Read the whitepaper: github.com/openpcc/openpcc/blob/main/whitepaper/openpcc.pdf
- Dev workflow: install mage, run “mage runMemServices” (in-memory OpenPCC services), then “mage runClient”.
- Programmatic use: instantiate the OpenPCC client, set a TransparencyVerifier and OIDC identity policy, then send OpenAI-style requests; route by model tag.
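As a rough illustration of the request shape (not the official client API), the sketch below sends an OpenAI-style completion to a hypothetical locally running OpenPCC endpoint and routes it with the X-Confsec-Node-Tags header described above; the endpoint URL and response handling are assumptions, and the real client layers transparency-log and OIDC verification on top.

```typescript
// Sketch only: the endpoint URL and port are assumptions; the real OpenPCC client
// also verifies node attestation via the transparency log before sending anything.
async function complete(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:8080/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Route the request to attested nodes serving this model tag (example tag from the repo docs).
      "X-Confsec-Node-Tags": "qwen3:1.7b",
    },
    body: JSON.stringify({ model: "qwen3:1.7b", prompt, max_tokens: 128 }),
  });
  if (!res.ok) throw new Error(`completion request failed: ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.text ?? "";
}

complete("Summarize the OpenPCC whitepaper in one sentence.")
  .then(console.log)
  .catch(console.error);
```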
Caveats / open questions to watch
- Trust roots and attestation scope across vendors/TEEs, model supply-chain attestations, and performance overhead.
- Maturity of server-side deployments vs. in-memory dev services; breadth beyond completions endpoints.
Repo: github.com/openpcc/openpcc
The discussion around OpenPCC centers on technical trust, regulatory challenges, and practical implementation:
Hardware Security & Trust:
- Debates focus on reliance on hardware-backed solutions (e.g., AWS Nitro Enclaves, TEEs). Users question whether trust in vendors like Amazon or NVIDIA/AMD is justified, given centralized control.
- Mentions NCC Group’s audit of AWS Nitro, highlighting mechanisms for isolating customer data but lingering doubts about attestation scope.
Regulatory Concerns:
- EU compliance concerns surface: critics argue that cryptocurrency payments in OpenPCC could enable money laundering, and note that EU law emphasizes traceable payments.
Technical Implementation:
- Praise for the project’s open-source approach and Apache 2.0 licensing, but concerns about maturity: server-side deployments lag behind in-memory dev tools, and performance overhead remains unaddressed.
- Questions about integration with existing workflows (e.g., debugging, logging) and whether OpenPCC simplifies privacy for developers.
Comparisons & Branding:
- Contrasted with Apple’s proprietary Private Cloud Compute (PCC). Some users find OpenPCC’s branding too generic, though supporters clarify its broader, self-hostable vision.
- Emphasis on the need for reproducible builds and attestation chains to ensure model integrity.
Skepticism & Optimism:
- Skeptics highlight unresolved trust roots and potential for NSA access via hardware backdoors.
- Optimists see value in community-driven, privacy-preserving AI for regulated industries, praising its verifiable inference approach.
Key Takeaway: While OpenPCC is seen as a promising step toward auditable AI privacy, its success hinges on addressing trust in hardware, regulatory compliance, and real-world deployment maturity.
ChatGPT terms disallow its use in providing legal and medical advice to others
Submission URL | 353 points | by randycupertino | 381 comments
OpenAI: No tailored medical or legal advice in ChatGPT
- What’s new: According to a CTV News report (Nov 5, 2025), OpenAI says ChatGPT cannot be used for personalized legal or medical advice. The company is reinforcing that the system shouldn’t diagnose, prescribe, or provide individualized legal counsel.
- What’s still allowed: General information, educational content, and high-level guidance appear to remain okay, but not case-specific recommendations.
- Likely impact: Users will see more refusals or safety redirects on prompts seeking individualized diagnoses, treatments, or legal strategies. Developers building workflows in health and law may need licensed human oversight or alternative tooling.
- Why it matters: It underscores growing caution around AI in regulated domains, steering ChatGPT toward information and drafting support rather than professional advice.
Summary of Hacker News Discussion:
Confusion Over APIs vs. ChatGPT Product:
- Users debated whether journalists conflated OpenAI’s GPT-5 API with the consumer-facing ChatGPT product. Some argued that developers building on the API might face stricter terms-of-service restrictions compared to ChatGPT’s general use.
Epic Systems & Healthcare Integration:
- Commenters noted that healthcare apps like Epic’s MyChart cannot integrate ChatGPT due to regulatory constraints (e.g., HIPAA compliance). Others pointed out that OpenAI’s terms explicitly prohibit medical/legal use cases without licensed human oversight.
Liability Concerns:
- Many criticized OpenAI’s move as a liability-avoidance tactic. Developers argued that redirecting users to “consult professionals” undermines ChatGPT’s utility in drafting or research workflows, even if not providing direct advice.
AI vs. Human Judgment:
- Sarcastic remarks compared ChatGPT to WebMD’s infamous overdiagnosis tendencies. Users warned against relying on AI for mental health advice, citing risks of flawed self-diagnoses or sycophantic responses that confirm biases.
Ethical and Practical Implications:
- Some highlighted the absurdity of using ChatGPT for legal strategies, given its potential to generate flawed or “guardrail-free” arguments. Others defended OpenAI’s restrictions as necessary to avoid misuse in high-stakes domains like medicine or law.
Community Reactions:
- Mixed responses: Some praised the caution, calling it “long overdue,” while others dismissed it as “lazy lawyering” that stifles innovation. A few suggested OpenAI should offer certified/licensed versions for regulated industries instead of blanket bans.
Key Takeaway:
The discussion reflects skepticism about AI’s reliability in critical domains and frustration over regulatory hurdles, balanced by acknowledgment of the need for safeguards. Most agree that human expertise remains irreplaceable in high-risk contexts.
I’m worried that they put co-pilot in Excel
Submission URL | 455 points | by isaacfrond | 316 comments
Simon Willison highlights a viral TikTok by Ada James celebrating “Brenda,” the unsung mid-level finance pro who actually knows how to tame Excel—the “beast that drives our entire economy.” The punchline with teeth: adding Copilot to Excel may tempt higher-ups to bypass experts like Brenda, trust an AI they don’t understand, and ship hallucinated formulas they can’t spot. The core argument isn’t anti-AI; it’s a warning about overconfidence and invisible errors in mission-critical spreadsheets. Takeaway: AI can assist, but in Excel—where small mistakes move real money—human expertise, review, and accountability still matter. Respect your Brendas.
Summary of Discussion:
The Hacker News debate around Simon Willison’s “Brenda vs. AI in Excel” submission centers on determinism vs. probabilistic systems, human expertise, and accountability. Key arguments include:
Deterministic vs. Probabilistic Systems
- Traditional code (Brenda’s Excel macros) is praised for being deterministic: repeatable, debuggable, and predictable. AI (like Copilot), by contrast, is probabilistic—even if correct once, its outputs may vary unpredictably, raising risks in critical applications like finance.
- Critics argue AI’s probabilistic nature amplifies the “invisible error” problem: outputs may appear correct but propagate subtle mistakes (e.g., hallucinated formulas) that humans must catch.
Human vs. Machine Reliability
- Human expertise: Brenda represents domain knowledge and accountability. While humans make errors, they can reason about why a mistake occurred and iterate. AI, as a “black box,” lacks transparency.
- Software isn’t perfect: Participants note that even deterministic systems fail (e.g., calculator bugs, Excel crashes), but their predictability allows for audits and fixes. AI’s errors are harder to trace.
Corporate Incentives & Overconfidence
- Skepticism arises about AI’s valuation hype: companies may push AI as a cost-saving “innovation” while downplaying risks. TradFi processes, where small errors can “move real money,” demand rigor that probabilistic AI may not yet offer.
- Accountability matters: In regulated fields like finance, Brenda’s work is auditable and traceable. AI’s decision-making process is often opaque, complicating compliance.
Practical Compromise?
- Several suggest AI could augment experts like Brenda (e.g., speeding up drafts), but only if paired with human validation and deterministic guardrails.
- A recurring analogy: AI is like a junior analyst who might get things right but lacks Brenda’s experience to foresee edge cases or contextual pitfalls.
Final Takeaway:
The thread rejects a “Brenda vs. AI” dichotomy, instead emphasizing collaboration—AI as a tool to assist experts, not replace them. The real danger isn’t AI itself but organizational overconfidence in deploying it without safeguards. In mission-critical systems, deterministic processes and human oversight remain irreplaceable. As one commenter quipped: “Respect your Brendas, or pay the price.”
Apple nears $1B Google deal for custom Gemini model to power Siri
Submission URL | 62 points | by jbredeche | 39 comments
Apple reportedly nears $1B/year deal for custom Google Gemini to power revamped Siri
- Bloomberg (via 9to5Mac) says Apple is finalizing a roughly $1B annual agreement for a custom 1.2T-parameter Google Gemini model to handle Siri’s “summarizer” and “planner” functions in a major revamp slated for next spring.
- Google beat Anthropic largely on price, not performance, per the report.
- Apple will keep some Siri features on its own models (currently ~150B parameters in the cloud and 3B on-device). Gemini will run on Apple’s servers within Private Cloud Compute, meaning user data won’t go to Google.
- Internally dubbed project Glenwood and led by Mike Rockwell after a Siri shake-up, the deal is framed as a bridge: Apple is still building its own large cloud model (~1T parameters) and aims to replace Gemini over time despite recent AI talent departures.
Why it matters: $1B/year underscores the escalating cost of state-of-the-art AI, while Apple’s privacy-first deployment and parallel in-house push signal a pragmatic, transitional reliance on Google rather than a long-term shift. Source: Bloomberg via 9to5Mac.
Hacker News Discussion Summary: Apple's $1B Google Gemini Deal for Siri
Key Themes:
Trust & Implementation Concerns:
- Users debate whether Apple’s privacy-focused deployment (via Private Cloud Compute) truly prevents Google from accessing data. Skepticism arises about Apple’s ability to replicate Gemini’s integration quality internally, especially after AI talent departures.
Technical & Financial Pragmatism:
- Some argue Apple’s reliance on Gemini is a cost-driven “bridge” until its in-house 1T-parameter cloud model matures. Others question if on-device models (3B parameters) are sufficient compared to cloud-based alternatives.
Siri’s Current Limitations:
- Frustration with Siri’s performance (“Siri sucks”) contrasts with praise for GPT/Claude. Critics suggest Apple’s focus should be UX integration rather than raw model performance.
Antitrust & Market Dynamics:
- The deal’s size ($1B/year) sparks discussions about antitrust risks, given Apple’s existing search revenue agreements with Google.
Privacy vs. Practicality:
- While Apple’s Private Cloud Compute is touted as privacy-first, users speculate whether data might still indirectly reach Google. Others highlight the challenge of balancing privacy with state-of-the-art AI costs.
Developer & Ecosystem Impact:
- Comments note potential lock-in effects if Apple prioritizes Gemini over open-source models (e.g., Llama) and the broader implications for AI commoditization.
Notable Takes:
- “Google beat Anthropic on price, not performance” underscores cost as a key factor.
- “Apple’s $359B cash reserves make this a transitional bet, not a long-term shift.”
- “Gemini’s integration might be good, but Apple’s closed ecosystem risks repeating Siri’s stagnation.”
Conclusion: The discussion reflects cautious optimism about Apple’s strategic bridge to in-house AI, tempered by skepticism over execution, privacy, and market power. Users emphasize that success hinges on seamless UX integration and Apple’s ability to innovate beyond reliance on external models.
Code execution with MCP: Building more efficient agents
Submission URL | 29 points | by pmkelly4444 | 5 comments
Anthropic’s Engineering blog explains how to scale agents connected to many tools by shifting from direct tool calls to code execution over MCP (Model Context Protocol). Since MCP’s launch in Nov 2024, the community has built thousands of MCP servers and SDKs across major languages, letting agents tap into hundreds or thousands of tools. But loading every tool definition into the model and piping every intermediate result through the context window explodes token usage and latency—especially with large documents.
Their fix: present MCP servers as code APIs and let the agent write small programs that call those APIs in a sandboxed execution environment. Instead of stuffing tool definitions and big payloads into the model’s context, the model imports only the wrappers it needs (e.g., a generated TypeScript file tree per server/tool), then executes code that moves data directly between tools.
Why it matters
- Cuts context bloat: Only the needed tool interfaces are loaded; large intermediate data never hits the model’s token window.
- Reduces cost and latency: Fewer tokens processed, fewer round-trips.
- Improves reliability: Avoids error-prone copy/paste of large payloads between tool calls.
- Scales better: Practical to connect “hundreds or thousands” of tools across many MCP servers.
Concrete example
- Old way: Ask the model to get a long Google Drive transcript and paste it into a Salesforce record → the full transcript flows through the model twice.
- New way: The model writes a short script that imports gdrive.getDocument and salesforce.updateRecord wrappers and moves the transcript directly in code—no giant payloads in the model context.
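A minimal sketch of that "new way" in TypeScript, assuming generated wrapper modules at ./servers/gdrive and ./servers/salesforce that expose getDocument and updateRecord (the module paths and argument shapes are illustrative, not the actual generated code):

```typescript
// Hypothetical generated MCP wrappers; paths and signatures are assumptions.
import { getDocument } from "./servers/gdrive";
import { updateRecord } from "./servers/salesforce";

export async function attachTranscript(docId: string, recordId: string) {
  // The transcript moves tool-to-tool inside the execution sandbox and never
  // enters the model's context window.
  const transcript = await getDocument({ documentId: docId });

  await updateRecord({
    objectType: "SalesMeeting",
    recordId,
    data: { transcript },
  });

  // Return only a small summary for the model to reason about.
  return { ok: true, transcriptChars: transcript.length };
}
```

Only the compact return value re-enters the model loop; the bulky transcript stays inside the runtime.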
Takeaway: Treat MCP tools as code libraries, not direct model tools. Let the model discover/import only what it needs and do the heavy lifting in an execution runtime. The result is more efficient, cheaper, and more reliable agents as MCP ecosystems grow.
Summary of Discussion:
The discussion around Anthropic's MCP (Model Context Protocol) approach highlights mixed reactions, focusing on its efficiency, practicality, and innovation:
Key Points:
Efficiency vs. Tool Design:
- While MCP reduces token bloat by avoiding large payloads in the context window, poorly designed tools (e.g., verbose SQL scripts) can still inflate token usage. Users stress that tool quality matters—badly written tools undermine MCP’s benefits.
CLI Tools as Reliable Alternatives:
- Participants argue that existing CLI tools (e.g., Atlassian CLI) are already reliable for integrations. Leveraging ecosystems like npm or CLI-focused solutions avoids reinventing the wheel and simplifies deterministic tool installation.
Tool Discovery Challenges:
- Some note that MCP’s approach to tool discovery isn’t a registry but resembles CLI-centric patterns. This raises questions about scalability and ease of adoption compared to established package managers.
Skepticism About Innovation:
- Critics compare MCP to traditional API orchestration tools, calling it a step backward. One user dismisses it as "depressing," arguing that modern coding practices (e.g., Rust/Python) and workflow diagrams should suffice without new protocols.
Community Sentiment:
- Pragmatic Optimism: Recognition of MCP’s potential to cut costs and latency, but emphasis on tool design and integration with existing systems.
- Criticism: Viewed by some as reinventing existing solutions, lacking novelty compared to Web API orchestration or CLI ecosystems.
Takeaway: MCP’s success hinges on balancing innovation with practical tooling and leveraging established ecosystems to avoid redundancy.
Kosmos: An AI Scientist for Autonomous Discovery
Submission URL | 53 points | by belter | 13 comments
TL;DR: A large multi-institution team introduces Kosmos, an autonomous research agent that loops through data analysis, literature search, and hypothesis generation for up to 12 hours. It maintains coherence via a shared “world model,” cites every claim to code or primary literature, and reportedly surfaces findings that collaborators equate to months of work.
What’s new
- Long-horizon autonomy: Runs up to 12 hours over ~200 agent rollouts without losing the plot, thanks to a structured world model shared by a data-analysis agent and a literature-search agent (see the conceptual sketch after this list).
- Scale of activity: On average per run, executes ~42,000 lines of code and reads ~1,500 papers.
- Traceable outputs: Final scientific reports cite every statement to either executable code or primary sources.
- External checks: Independent scientists judged 79.4% of report statements accurate. Collaborators said a single 20-cycle run matched ~6 months of their own research time, with valuable findings increasing roughly linearly up to the 20 cycles tested.
- Results: Seven showcased discoveries across metabolomics, materials science, neuroscience, and statistical genetics; three independently reproduced preprint/unpublished results not accessed at runtime; four are claimed as novel.
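To make the "structured world model" idea above concrete, here is a purely conceptual sketch (the abstract does not publish Kosmos's code, so every name and type below is hypothetical): each cycle dispatches one of the two agents, and only citation-backed findings are merged back into the shared state.

```typescript
// Conceptual sketch only; not Kosmos's actual implementation.
type Citation = { kind: "code" | "paper"; ref: string };
type Finding = { claim: string; citations: Citation[] };

interface WorldModel {
  hypotheses: string[];
  findings: Finding[];
}

// The two agent roles described above, behind a common interface.
interface Agent {
  name: "data-analysis" | "literature-search";
  run(world: WorldModel): Promise<Finding[]>;
}

async function researchLoop(agents: Agent[], cycles: number): Promise<WorldModel> {
  const world: WorldModel = { hypotheses: [], findings: [] };
  for (let i = 0; i < cycles; i++) {
    // Every agent reads the same world model, so later cycles build on earlier
    // results instead of drifting over a long run.
    const agent = agents[i % agents.length];
    const proposed = await agent.run(world);
    // Keep only claims that cite executable code or primary literature,
    // mirroring the per-statement traceability requirement.
    world.findings.push(...proposed.filter((f) => f.citations.length > 0));
  }
  return world;
}
```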
Why it matters
- Pushes beyond short “demo” agents: Most research agents degrade as actions accumulate; Kosmos targets sustained, multi-step scientific reasoning.
- Reproducibility and auditability: Per-claim citations to code or literature address a core criticism of agentic systems.
- Potential acceleration: If results generalize, this looks like a force multiplier for literature-heavy, code-driven science.
Caveats and open questions
- 79.4% accuracy leaves meaningful room for error in high-stakes domains.
- Compute/cost and generality across fields aren’t detailed here.
- Access and openness (code, models, datasets) aren’t specified in the abstract.
Paper: “Kosmos: An AI Scientist for Autonomous Discovery” (arXiv:2511.02824v2, 5 Nov 2025) DOI: https://doi.org/10.48550/arXiv.2511.02824 PDF: https://arxiv.org/pdf/2511.02824v2
Hacker News Discussion Summary: Kosmos AI Scientist Submission
Validity & Novelty of Claims:
- Users debated whether Kosmos truly achieves "Autonomous Discovery" or overstates its capabilities. Some praised its ability to reproduce human scientists' conclusions faster ("79.4% accuracy"), while others questioned if this genuinely accelerates scientific discovery or merely automates literature review.
- Example: A user ("grntbl") found it impressive but noted that the roughly 20% error rate could be critical in high-stakes fields. Others argued novelty hinges on whether findings were pre-existing or truly new.
Technical Implementation:
- Interest focused on Kosmos’s "world model" architecture, which combines data analysis and literature-search agents. Discussions compared learned relationships vs. human-specified rules, with speculation on whether ML-driven approaches outperform traditional methods.
- Example: User "andy99" questioned if learned models surpass human-coded rules, referencing past ML limitations.
Reproducibility & Transparency:
- While traceable outputs (citations to code/sources) were praised, users highlighted missing details on compute costs, code availability, and cross-field applicability. Skepticism arose around whether Kosmos’s "novel discoveries" were preprints or truly unpublished results.
- Example: A subthread ("svnt") linked to prior research, questioning if Kosmos’s findings were genuinely novel.
Skepticism & Humor:
- Some comments humorously dismissed claims ("sttstcl vbs" = "statistical vibes") or veered into jokes about extraterrestrial life. Others ("lptns") mocked the hype with "slp dscvr" ("sleep discover").
Key Takeaway:
The community acknowledged Kosmos’s potential as a literature/code-driven research tool but emphasized caveats—error rates, transparency gaps, and unclear novelty—while debating its true impact on accelerating science.
“Artificial intelligence” is a failed technology - time we described it that way
Submission URL | 9 points | by ChrisArchitect | 4 comments
Title: Treating AI as a Failed Technology
A widely shared essay argues large language models have failed as a product class despite relentless hype. Key points:
- Consumers distrust AI features and don’t want them; brands that add them erode trust. After ~3 years, LLMs haven’t met normal success markers.
- The tech’s social and ecological costs are high: energy use, copyright violations, low-paid data labor, and alleged real-world harms—making this more than a good product without a market.
- Corporate adoption is often top-down and defensive (“fear of falling behind”), with many pilots failing (citing an MIT report). The author argues LLM ubiquity is propped up by investment capital and government contracts.
- Example: Zapier’s “AI-first” push now ties hiring and performance to “AI fluency,” with a rubric where skepticism is “Unacceptable” and “Transformative” means rethinking strategy via AI. The author critiques “adoption” as a success metric that says nothing about quality or value.
- Reframing AI as a failure, the piece suggests, could help surface the narrow use cases that actually work and spur better alternatives.
Why HN cares: sharp critique of AI ROI, product-market fit, and the labor/culture impact of AI mandates inside tech companies.
Summary of Hacker News Discussion:
The discussion reflects divided opinions on the critique of AI as a "failed technology":
Technical Counterarguments:
- User symbolicAGI challenges the failure narrative, citing tools like Anthropic’s Claude for coding efficiency, claiming AI can perform tasks "10,000x cheaper" than humans. This rebuts the essay’s claims about lack of ROI or utility.
Skepticism Toward AI Hype:
- User mnky9800n mocks AI enthusiasm with a sarcastic analogy ("smoking crack"), aligning with the essay’s critique of inflated expectations and corporate FOMO driving adoption.
Corporate Dynamics & Alternatives:
- User p3opl3 discusses technical challenges (e.g., hallucinations, data labor) and advocates decentralized, open-source approaches (e.g., OpenCog), criticizing OpenAI’s profit-driven model. This echoes the essay’s concerns about capital-driven ubiquity.
Dismissal of Critique:
- User MrCoffee7 dismisses the essay as "Clickbait," reflecting broader polarization in tech circles between AI optimists and skeptics.
Key Themes:
- Debate over AI’s practical value vs. hype.
- Corporate adoption driven by fear vs. genuine utility.
- Calls for transparent, decentralized alternatives to dominant models.
The discussion mirrors broader tensions in tech: balancing innovation with ethical and economic realities.
Flock haters cross political divides to remove error-prone cameras
Submission URL | 46 points | by Bender | 10 comments
Flock Safety’s ALPR empire faces federal scrutiny and local pushback amid error, privacy, and policing concerns
- Federal heat: Sen. Ron Wyden and Rep. Raja Krishnamoorthi urged a federal investigation, alleging Flock “negligently” handles Americans’ data and fails basic cybersecurity. Wyden warned abuse is “inevitable” and urged communities to remove Flock cameras.
- Expanding backlash: Campaigns across at least seven states have succeeded in removing Flock systems, with organizers sharing playbooks for others. Critics cite both privacy risks and the tech’s error rate.
- Documented misuse: Texas authorities reportedly ran more than 80,000 plate scans tied to a suspected self-managed abortion “wellness check.” ICE has accessed Flock data via local police partnerships; Flock says such access is up to local policy.
- Error-prone tech, real harms: EFF has tracked ALPR misreads (e.g., “H” vs. “M,” “2” vs. “7,” wrong state), leading to wrongful stops and even guns-drawn detentions.
- Policing by “hits”: A Colorado incident shows overreliance on Flock data. A Bow Mar-area officer accused Chrisanna Elser of a $25 package theft largely because her car passed through town. He refused to show alleged video evidence and issued a summons. Elser compiled GPS, vehicle, and business surveillance proving she never neared the address; charges were dropped with no apology. Quote from the officer: “You can’t get a breath of fresh air, in or out of that place, without us knowing.”
- Scope creep: As Flock rolls out an audio-based “human threat” detection product, critics warn the error surface—and incentives for shortcut policing—will grow.
Why it matters: The country’s largest ALPR network is becoming default infrastructure for local policing. Between alleged security lapses, expansive data sharing, and documented false positives, the risks aren’t just theoretical—they’re producing bad stops and brittle investigations. The fight is shifting from policy tweaks to outright removal at the municipal level.
Here’s a concise summary of the Hacker News discussion about Flock Safety’s ALPR system and its controversies:
Key Themes from the Discussion
Community-Led Removal Efforts
- Users highlighted grassroots campaigns to remove Flock cameras, citing success in at least seven states. Organizers share "playbooks" for others, though cities reinstating cameras (e.g., Evanston renewing contracts) remains a setback for those efforts.
- Example: A Colorado resident, Chrisanna Elser, disproved false theft allegations using GPS/data evidence after Flock errors led to a wrongful summons.
Privacy & Legal Concerns
- Critics emphasized risks like data sharing with ICE, misuse (e.g., Texas abortion-related "wellness checks"), and cybersecurity flaws. The EFF noted ALPR misreads (e.g., misidentifying letters/numbers) causing wrongful detentions.
- Legal liabilities: Municipalities face exposure under Illinois laws for Flock data misuse.
Corporate Partnerships
- Lowe’s and Home Depot were flagged for installing Flock cameras in parking lots, sparking debates about boycotts. Some users questioned the practicality of consumer-driven activism against corporate-police partnerships.
Technical & Systemic Flaws
- ALPR errors were criticized as systemic, with examples of "hits"-driven policing leading to overreach (e.g., guns-drawn stops over misreads). Expansion into audio-based surveillance raised alarms about compounded risks.
Ideological Debates
- Libertarian-leaning users criticized Flock’s growth as antithetical to privacy ideals. Others dismissed boycotts as ineffective, advocating instead for policy changes or municipal-level removals.
Notable Quotes & References
- "You can’t get a breath of fresh air without us knowing": A police quote underscoring pervasive surveillance concerns.
- Project 2025: Mentioned as a potential framework for "deflocking" towns.
- EFF’s Role: Highlighted for tracking ALPR errors and advocating against shortcut policing.
Conclusion
The discussion reflects mounting skepticism toward Flock’s ALPR infrastructure, blending technical criticism with activism-focused strategies. While some push for outright removal, others stress the need for stronger regulation and accountability amid corporate-police collaboration.
What Happened to Piracy? Copyright Enforcement Fades as AI Giants Rise
Submission URL | 111 points | by walterbell | 57 comments
- Thesis: Fang argues that as AI companies race to train models, federal zeal for anti-piracy enforcement has cooled—just as tech giants themselves are accused of using pirated material at scale.
- Then vs. now: In the 1990s–2000s, firms like Microsoft bankrolled aggressive anti-piracy campaigns (e.g., Business Software Alliance) and pushed criminal enforcement; the DOJ’s 2011 Aaron Swartz case is cited as emblematic of that era.
- The pivot: Today, Microsoft, OpenAI, Meta, Google, Anthropic, and others face civil suits from authors and publishers alleging their models were trained on copyrighted books and articles without permission or payment.
- Discovery details: In Kadrey v. Meta, court filings allege Meta used a Library Genesis mirror and torrents; internal emails reportedly show employees uneasy about “torrenting from a corporate laptop” and note the decision was escalated to—and approved by—leadership.
- Big claim: Fang frames this as a stark double standard—after decades of warning about piracy’s harms, the same companies allegedly turned to illicit sources for prized training data.
- Enforcement shift: He says criminal enforcement has largely given way to private litigation, reflecting the industry’s clout in Washington and leaving courts to decide if mass training on copyrighted works is fair use or requires licensing.
- Stakes: Outcomes could reset norms for AI training data, compensation for creators, and how copyright law is applied in the age of foundation models.
The Hacker News discussion on the submission "What Happened to Piracy? Copyright Enforcement Fades as AI Giants Rise" reveals heated debates and key themes:
Key Arguments & Themes
Hypocrisy & Double Standards:
- Users highlight the irony of tech giants (e.g., Meta, Microsoft) once aggressively opposing piracy but now allegedly using pirated content (e.g., Library Genesis, torrents) to train AI models. Internal Meta emails reportedly show employees uneasy about torrenting, yet leadership approved it.
- Comparisons are drawn to historical crackdowns (e.g., Aaron Swartz) versus today’s leniency toward AI companies.
Legal Shifts & Enforcement:
- Criminal vs. Civil: The DOJ’s past focus on criminal enforcement (e.g., piracy prosecutions) has shifted to civil lawsuits (e.g., Kadrey v. Meta), reflecting industry lobbying power.
- Fair Use Debate: Users clash over whether AI training constitutes transformative "fair use" or requires licensing. Some cite court summaries (e.g., California District ruling) to argue cases are decided procedurally, not on merits.
Power Imbalance:
- Small entities/individuals face harsh penalties (e.g., YouTube takedowns, SciHub bans), while AI firms operate with impunity.
- Critics accuse courts and lawmakers of favoring corporations (e.g., Disney, Google) over creators, undermining copyright’s original purpose.
Technical & Ethical Concerns:
- Data Sources: Meta’s alleged use of pirated books contrasts with platforms like YouTube enforcing strict anti-piracy rules.
- Compensation: Calls for AI companies to pay creators for training data, mirroring systems like ASCAP for music licensing.
Cynicism Toward Systems:
- Users argue copyright law is weaponized against individuals while tech giants exploit loopholes (e.g., “transformative use” claims).
- Mentions of Anna’s Archive being targeted, while AI firms use similar data without repercussions.
Notable Quotes
- On Hypocrisy: “After decades of warning about piracy’s harms, the same companies turned to illicit sources for training data.”
- On Legal Bias: “Big AI companies have a legal blind spot—what’s theft for us is ‘innovation’ for them.”
- On Fair Use: “Training LLMs on copyrighted books isn’t transformative—it’s theft with extra steps.”
Conclusion
The discussion underscores frustration with systemic inequities, where AI giants leverage legal and financial clout to sidestep accountability, while creators and smaller entities bear enforcement’s brunt. The outcome of ongoing lawsuits could redefine copyright norms in the AI era, balancing innovation with creator rights.