AI Submissions for Tue Jun 02 2026
AI outperforms law professors in Stanford Law study
Submission URL | 381 points | by berlianta | 332 comments
- In a blind test of nearly 3,000 head-to-head comparisons, 16 U.S. law professors judged AI-generated responses to contract-law questions better than answers written by other professors in 75% of cases.
- The questions were realistic “office hours” prompts (40 in total), emphasizing judgment and nuanced reasoning rather than right-or-wrong facts. AI performance was comparable to the best human instructor.
- Professors flagged AI answers as pedagogically harmful only 3.5% of the time, versus 12% for human-written responses. Researchers matched response length/structure and used multiple evaluation methods to reduce bias.
- Multiple models were tested (including commercial tutors and Google’s NotebookLM); performance varied, but AI was still often preferred even with context constraints.
- Caveats: small sample, single domain (contracts), and the study evaluates answer quality—not the broader impacts of integrating AI into legal education. Authors caution against wholesale adoption and urge focus on responsible deployment.
Why it matters: Prior AI benchmarks skew toward domains with clear right answers; this suggests LLMs can perform strongly in judgment-heavy fields like law, potentially reshaping tutoring and access to expert guidance—if implemented thoughtfully.
Here is a summary of the Hacker News discussion surrounding the Stanford study on AI and law professors:
The Hacker News community had mixed reactions to the study, balancing curiosity about AI’s capabilities with heavy skepticism regarding the methodology and the real-world implications of using LLMs in high-consequence environments.
Here are the central debates and takeaways from the thread:
- Skepticism Over Methodology and Sample Size: Some users, including one identifying as both a lawyer and a statistician, called out "red flags" in the study's data. They argued that with only 16 professors making nearly 3,000 comparisons, the results could vary widely: a few unusually weak human instructors could heavily skew the outcome in the AI's favor, undermining the headline conclusion.
- The "Human-in-the-Loop" Paradox: A major debate emerged over how AI will actually be used by professionals. While some argued that AI will drastically lower the required skill level for legal or medical work, experts pushed back. They noted that reviewing AI-generated text often requires more expertise, not less. Because AI hallucinations can be incredibly subtle, spotting a legal error in an AI-drafted contract requires a highly trained eye to avoid multi-million-dollar consequences.
- The "Self-Driving Car" Problem for Knowledge Work: Several users compared LLMs in law to autonomous vehicles. Just as humans struggle to passively monitor a self-driving car and react perfectly in the split-second it makes a catastrophic error, professionals will struggle to stay alert while babysitting an AI. Even if an AI is 80% to 95% accurate, the remaining gap in high-risk environments (like law or medicine) makes full autonomy a distant frontier.
- Engineering vs. "Shooting Straight" from an LLM: Rather than just using raw prompts, some users pointed out that the real future of AI in law relies on building robust, deterministic pipelines. By using multiple models to check each other's work and implementing independent quality gates (similar to AlphaCodium), the reliability of AI dramatically increases.
- Existential Dread, UBI, and the Loss of Purpose: The thread naturally veered into the socio-economic impacts of AI replacing knowledge workers. While some optimists hoped that falling labor costs would usher in Universal Basic Income (UBI) and free humanity from "soul-sucking" jobs, pessimists argued that modern capitalism and corporate structures make UBI highly unlikely in places like the US. Interestingly, a psychological argument was raised: pointing to historical precedents, one user argued that stripping people of their professional craft and fundamental purpose (even in stressful jobs like law or investment banking) could lead to an epidemic of societal psychological breakdown.
The TL;DR: While the community largely agrees that AI has made massive leaps in domain-specific reasoning, HN remains highly skeptical of using this study to predict imminent job replacement. In high-risk fields like law, an AI that is almost right is still too dangerous to operate without highly skilled human oversight.
How we index images for RAG
Submission URL | 193 points | by mooreds | 26 comments
How we index images for RAG (Kapa)
Core idea: Don’t send images to the model at query time. At indexing, run a cheap vision model once to caption each image; store those captions as text and retrieve them like any other chunk. The model sees text-only context and cites the original image URL.
Why this beats query-time multimodal
- Cost: Raw images added 27% per-query cost on GPT 5.1 and 51% on Claude 4.6 Sonnet (Claude ~975 tokens/image vs GPT ~716).
- Fit: Typical queries touch 20–30 images (tail >130). With 30–50 MB payload caps (Claude/OpenAI), you quickly hit limits.
- Retrieval: CLIP-style embeddings blur the fine detail that matters in tables, charts, and annotated screenshots; short technical queries don’t give enough signal to match image vectors.
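To make the cost and fit bullets concrete, here is a rough back-of-the-envelope sketch that uses only the per-image token counts and per-query image counts quoted above. The function name is illustrative, and the result ignores text tokens and pricing, which the summary does not give.

```python
# Approximate per-image token costs quoted in the summary above.
TOKENS_PER_IMAGE = {"claude": 975, "gpt": 716}


def image_tokens_per_query(model: str, images: int) -> int:
    """Extra input tokens spent on raw images for a single query."""
    return TOKENS_PER_IMAGE[model] * images


if __name__ == "__main__":
    for model in TOKENS_PER_IMAGE:
        # 20 and 30 are the typical range; 130 is the long tail.
        for images in (20, 30, 130):
            print(model, images, image_tokens_per_query(model, images))
```

Under these figures, a 30-image query adds tens of thousands of input tokens before any text context is counted, which is the overhead the caption-at-indexing approach avoids.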
What images do in docs
- Illustrative: Screenshots make textual instructions immediately actionable (which icon, where to click).
- Load-bearing: Tables, matrices, schematics often contain values found only in the figure.
Impact
- With image-derived text available, an LLM judge preferred answers across three customer projects and two models (McNemar’s test, p < 0.05).
- Query costs rise only 1–6% vs text-only.
- Users get more specific, actionable answers (e.g., exact UI path plus screenshot reference), reducing support escalations.
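For readers unfamiliar with McNemar's test, the sketch below shows one common exact (binomial) form of it for paired comparisons. It is not Kapa's evaluation code, and the counts in the usage note are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs where only the text-only answer was judged acceptable
    c: pairs where only the answer with image captions was judged acceptable
    Pairs where both or neither were acceptable do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With hypothetical discordant counts of b = 2 and c = 12, `mcnemar_exact(2, 12)` falls below 0.05, while an even split such as b = 5, c = 5 gives a p-value of 1.0.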
How it works
- Ingestion: A vision-language model captions each image.
  - For screenshots: descriptive captions.
  - For figures/tables: transcriptions of actual content (values, labels, structure).
- Retrieval/Generation: Pure text flow; captions are retrieved with normal chunks; the answer cites the original image URL.
- Note: Preserving table/matrix structure at ingestion avoids the “flattened text” errors that lead to wrong claims. Microsoft research arrived at a similar “describe at ingestion” approach.
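The flow above can be sketched in a few lines. This is a minimal illustration, not Kapa's implementation: `caption_image` is a hypothetical stand-in for a vision-language model call, the URL is made up, and keyword overlap stands in for whatever text retrieval a real system would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    image_url: Optional[str] = None  # set only for image-derived chunks


def caption_image(image_url: str) -> str:
    """Hypothetical stand-in for a vision-language model call at ingestion."""
    canned = {
        "https://example.com/limits.png":
            "Table: plan Free rate limit 10 req/s, plan Pro rate limit 100 req/s",
    }
    return canned.get(image_url, "Screenshot of the settings page")


def index_images(image_urls: list[str]) -> list[Chunk]:
    """Caption each image once and store the caption as an ordinary text chunk."""
    return [Chunk(text=caption_image(u), image_url=u) for u in image_urls]


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Toy keyword-overlap retrieval; a real system would use text embeddings."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

At query time only `Chunk.text` would be passed to the model; `image_url` is carried along so the answer can cite the original image.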
Production lessons called out
- Filtering is essential: most images are junk (logos, avatars, banners, social cards).
  - First-pass heuristics: drop unsupported formats, tiny images, extreme aspect ratios.
  - Then a cheap zero-shot classifier over multimodal embeddings to decide what’s worth captioning.
- Indexing is a one-time cost; the rest of the pipeline stays text-only.
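A first-pass filter of the kind described could look like the sketch below. The supported formats and both thresholds are assumptions chosen for illustration; the post does not state the values Kapa uses, and the zero-shot classifier stage is omitted.

```python
from dataclasses import dataclass

# All three constants are assumed values, not taken from the post.
SUPPORTED_FORMATS = {"png", "jpeg", "webp"}
MIN_SIDE_PX = 64
MAX_ASPECT_RATIO = 5.0


@dataclass
class ImageMeta:
    url: str
    fmt: str
    width: int
    height: int


def passes_heuristics(img: ImageMeta) -> bool:
    """Cheap checks that run before any model is called."""
    if img.fmt.lower() not in SUPPORTED_FORMATS:
        return False
    short_side = min(img.width, img.height)
    if short_side < MIN_SIDE_PX:  # icons, avatars, tracking pixels
        return False
    ratio = max(img.width, img.height) / short_side
    return ratio <= MAX_ASPECT_RATIO  # banners and dividers fail here
```

Images that survive these checks would then go to the classifier, so the more expensive captioning model only sees plausible candidates.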
Bottom line: Describe images once, as text, and treat them like any other chunk. You get most of the value of vision in RAG without paying a vision tax on every query—and answers get measurably better.
Here is a summary of the Hacker News discussion regarding Kapa's approach to indexing images for RAG:
Overall Sentiment: The Hacker News community largely validates Kapa’s “describe at ingestion” approach, with many developers noting they have successfully used this exact "eager processing" pattern in their own pipelines or personal knowledge management systems. While a few cynics dismissed the original post as AI-generated marketing fluff, the majority engaged in a highly practical technical discussion about the nuances of image-to-text retrieval.
Key Themes & Takeaways from the Comments:
- Validation from the Trenches:
  - Several developers chimed in to say they have been doing this for years with great success. One user detailed how they handle personal infodumps in Obsidian: whenever an image is important, they generate a text description upfront to ensure it surfaces during text-based searches.
  - Another user working on enterprise RAG systems with dense PDFs and PowerPoints noted that text-based retrieval (aided by generated descriptions or OCR) often works "leaps and bounds" better than pure image-based retrieval.
- The Power of Mermaid.js for Diagrams:
  - A highly discussed tactic is converting block-and-arrow diagrams or flowcharts directly into Mermaid.js code during the ingestion phase. Users noted that models like ChatGPT are surprisingly good at translating arbitrary structural diagrams into Mermaid facsimiles, which perfectly preserves the structural layout of the image in a highly searchable, pure-text format.
- The Drawbacks of Pre-Processing (Vagueness & Non-Determinism):
  - While the "eager processing" approach saves money at query time, commenters pointed out its main flaw: you are entirely relying on the AI to guess what will be important about the image later. Because LLM outputs can be non-deterministic, long-winded, or overly vague, the ingestion model might miss a tiny visual nuance that a user will specifically query for in the future.
- Multimodal Embeddings vs. Text Embeddings:
  - The thread expanded on why pure multimodal retrieval struggles here. Commenters agreed that models using CLIP-style multimodal embeddings fail to capture the fine, granular details found in data tables, charts, or annotated screenshots. Furthermore, short, textual, technical queries simply don't provide enough signal to accurately match against dense image vectors. Ultimately, the community agreed that deciding between text-captioning vs. pure multimodal retrieval comes down to latency, cost, and specific product use cases.
Minor Notes: A user linked to open-source frameworks utilizing similar logic, and the original author was present in the thread to field questions and fix UI bugs on their website.
Adafruit receives demand letter from Fenwick legal counsel on behalf of Flux.ai
Submission URL | 665 points | by semanser | 273 comments
Adafruit says Flux.AI’s lawyers sent a demand letter; proposes open dialogue instead of litigation
- What happened: Adafruit says it received a May 22 demand letter from Fenwick & West partner Jonathan F. Lenzner, counsel for Defy Gravity, Inc. (Flux.AI), warning Adafruit not to publish an article the firm claims contains false and potentially defamatory statements about Flux’s IP, traction, and user base, and raising Computer Fraud and Abuse Act (CFAA) claims.
- Adafruit’s position: The company says it accessed only information Flux’s own systems made publicly available due to a server misconfiguration, framed its work as responsible disclosure on a matter of public security interest, and “vigorously rejects” the letter’s assertions. Adafruit has temporarily paused blog publishing while considering next steps.
- Update: On June 2, founder Limor “Ladyada” Fried says she reached out directly to Flux founder Matthias Wagner (not via lawyers), proposing a live podcast with open Q&A to address concerns transparently, even offering to keep cofounder Phil off the podcast. She characterized it as an attempt to de-escalate and “build rather than litigate.” The offer remains open.
- Why it matters: The dispute sits at the intersection of security research, responsible disclosure, and legal risk. The CFAA mention and defamation claims could chill reporting; Adafruit’s push for a public conversation highlights a contrasting openness ethos in the hardware/maker community.
- What’s next: Adafruit says it will update the community after deciding on a response; no public reply from Flux noted in the post.
Based on the Hacker News discussion, the community's reaction to the dispute between Adafruit and Flux.AI was lively, heavily in Adafruit's favor, and sharply critical of Flux.AI's product and legal tactics.
Here is a summary of the main themes and arguments from the comment section:
1. Overwhelming Support for Adafruit’s Track Record: The majority of the thread is filled with praise for Adafruit. Users from various backgrounds (from hobbyists to Amazon Devices engineers) commended Adafruit for its massive contributions to the open-source hardware community. Commenters specifically highlighted Adafruit's top-notch customer service, reliable shipping, high-quality electronics, and dedication to maintaining documentation and software for legacy parts years after they are no longer sold.
2. Direct Engagement from Adafruit's Founder: Limor “Ladyada” Fried participated directly in the thread. She reiterated her position, stating she reached out directly to Flux’s founder, Matthias Wagner, to bypass the lawyers. She emphasized her desire to de-escalate the situation, resolve the issue peacefully for the good of the electronics community, and hash out their differences transparently on a live podcast.
3. Debate Over Publishing the Demand Letter: A vocal debate sparked regarding Adafruit's decision not to publish the legal demand letter.
- The Skeptic: One user argued that Adafruit was being "passive-aggressive" and manipulative by announcing the legal threat to gain sympathy without actually sharing the letter's contents to prove they have "clean hands."
- The Defense: Multiple users strongly pushed back against this critique. They argued that unilaterally publishing a demand letter while actively trying to negotiate an out-of-court, public resolution with the Flux founder would be a massive, hostile escalation. Furthermore, they pointed out that Adafruit likely had to post something to explain the sudden pause in their normally active blog, and that any competent legal counsel would advise against publicly posting the letter while the dispute is pending.
4. Sharp Criticism of Flux.AI’s Product and Practices: Several users shared negative firsthand experiences with Flux.AI. Commenters described the software as frustrating and expensive, noting that its AI-driven component placement and routing algorithms handled simple boards poorly compared to standard software like KiCad. Furthermore, one user claimed that after spending money on Flux, the founder's automated system scheduled a meeting with them, only for the founder to "ghost" (fail to show up to) the meeting. Commenters criticized Flux for prioritizing "lawfare" over addressing core product issues and customer service.
5. Tangent: AI vs. Deterministic Engineering Tools: Triggered by the critiques of Flux.AI, the thread briefly veered into a highly technical discussion about the current limitations of AI tools in precise engineering environments. Hardware engineers and software developers debated the pitfalls of trying to replace classical, deterministic methods (like simulated annealing for PCB routing) with LLMs and AI agents, drawing parallels to current frustrations with GitHub Copilot and complex code-base management (LSP integration).
Bringing Up DeepSeek-V4-Flash on AMD MI300X
Submission URL | 117 points | by kkm | 20 comments
Bringing up DeepSeek-V4-Flash on AMD’s MI300X: great hardware, messy software
Doubleword documents the gritty path to running DeepSeek-V4-Flash on AMD’s MI300X using vLLM—and why this underutilized accelerator is tempting despite the pain. With 192GB of HBM3 (vs 80GB on H100), comparable FP8 throughput, and roughly half the list price, MI300X is still rentable on-demand at lower prices while NVIDIA capacity is scarce and climbing. The catch is software: out-of-the-box, DeepSeek on MI300X didn’t work as of early May 2026.
Key hurdles and fixes
- FP8 dialect mismatch: MI300X uses AMD/Graphcore’s older “fnuz” FP8 (no -0/inf), while newer AMD parts (MI325/350/355X) and NVIDIA follow the OCP FP8. vLLM knew about e4m3 vs e5m2 but not fnuz vs OCP; the exponent bias differs by one, so values can be off by exactly 2x if decoded with the wrong dialect. They patched vLLM so compressors, fused quant/cache writes, and sliding-window K-cache all use the platform’s FP8 type and fnuz-aware paths.
- Sparse attention fast paths: DeepSeek v4 uses a learned top‑k indexer plus a sliding window, FP8 KV caches, and compression—lots of kernels to tune. AMD’s AITER library provides the tuned paths, but coverage is spotty on MI300X’s CDNA3 (gfx942); where missing or broken, vLLM falls back to generic Triton, which is much slower.
  - Missing on gfx942: paged MQA logits, sparse MLA prefill/decode → added ROCm helpers that call AITER when available, else Triton.
  - Present but broken on gfx942: AITER prefill MQA logits and sparse prefill logits → guarded off on gfx942 to force Triton fallbacks.
- HIP graphs: To cut Python overhead from hundreds of per-token launches, they lean on HIP graphs (AMD’s CUDA-graph analog). Graph capture demands a “pure” region, so DeepSeek’s many moving parts required careful plumbing to make decode graphable.
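The FP8 dialect mismatch in the first bullet is easy to reproduce by hand. The sketch below decodes the same E4M3 byte under both conventions in pure Python; it is an illustration of the two number formats, not vLLM or ROCm code, and the function names are made up for this example.

```python
import math


def _decode(byte: int, bias: int) -> float:
    """Decode a 1-4-3 (sign, exponent, mantissa) byte with the given bias."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:  # subnormal: no implicit leading 1
        return sign * (mantissa / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)


def decode_e4m3_ocp(byte: int) -> float:
    """OCP E4M3 (bias 7): 0x7F and 0xFF are NaN; there are no infinities."""
    if byte & 0x7F == 0x7F:
        return math.nan
    return _decode(byte, bias=7)


def decode_e4m3_fnuz(byte: int) -> float:
    """E4M3 fnuz (bias 8): 0x80 is NaN; no negative zero, no infinities."""
    if byte == 0x80:
        return math.nan
    return _decode(byte, bias=8)


if __name__ == "__main__":
    raw = 0x38  # the same bits read under each dialect
    print(decode_e4m3_ocp(raw), decode_e4m3_fnuz(raw))
```

Because only the bias differs, every byte that is finite and nonzero in both dialects decodes to exactly twice the value under OCP as under fnuz, which is why a cache written in one dialect and read in the other is silently scaled rather than obviously corrupted.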
Why it matters
- If you can afford engineering time, MI300X can unlock cheaper, immediately available inference capacity with huge memory headroom—valuable for large KV caches and longer contexts.
- The work highlights where AMD’s software stack still lags, especially for older parts: FP8 standardization pitfalls, uneven tuned-kernel coverage, and the importance of graph execution for LLM decode.
- Doubleword published demo PRs showing the concrete vLLM changes (fnuz-aware FP8 handling, ROCm/AITER guards, Triton fallbacks), a useful roadmap for anyone trying to stand up DeepSeek or similar sparse-attention models on MI300X.
Here is a summary of the Hacker News discussion regarding Doubleword’s efforts to run DeepSeek-V4-Flash on AMD’s MI300X:
Validation of AMD's Software Struggles: Commenters largely agreed with the article’s core premise: AMD hardware is incredibly capable, but the software stack requires heavy lifting. One user shared a similar experience getting Gemma 4 31B to run on older AMD MI250X hardware, noting it required a massive amount of software-side engineering. Another user linked directly to the vLLM GitHub patches Doubleword created to make DeepSeek work, highlighting the necessity of manual community fixes.
Bullish on AMD Despite the Pain: Despite the software headaches, the sentiment in the thread is largely "long AMD." Users noted that the market desperately needs a viable alternative to NVIDIA's monopoly in AI hardware. Commenters agreed with Doubleword that AMD is highly attractive for "low-interactivity" or bulk inference tasks where upfront engineering costs can be amortized over cheaper hardware. The thirst for compute was further highlighted by a side-discussion where users were tracking down obscure Asrock BC-250 mining cards just to get their hands on affordable AMD silicon.
The Economics of Inference and Memory: The discussion touched on the steep costs of running large models. One user noted the growing trend of providers charging distinct prices for cached versus uncached inputs, given that workloads like DeepSeek are highly dependent on large memory caches. There is a strong consensus that the industry needs cheaper memory to bring down the monthly costs of running models of this size, with some expressing hope that emerging competitors (such as Chinese memory manufacturers) will eventually drive down component prices.
Alternative Ecosystems and Providers: Different layers of the AI stack chimed in on the thread. The CEO of Hot Aisle (a compute provider) praised the work, noting they support self-hosted MI300X instances for these exact types of deployments. Meanwhile, some users wondered if alternative software stacks, specifically Modular's Mojo, could be partnered with to bypass the ROCm/vLLM bottleneck entirely and provide smoother out-of-the-box inference on alternative hardware.
GitHub Copilot App
Submission URL | 116 points | by theanonymousone | 75 comments
GitHub unveils Copilot app (technical preview): an agent-driven desktop client for the full dev cycle
What’s new
- End-to-end workflow in one place: Pick issues/PRs from an inbox, have agents implement changes, review diffs, and even merge. You can also let an agent “close the loop.”
- Parallel agents: Run multiple, isolated agent sessions across repos with real-time tracking.
- Extensible automation: Automate recurring workflows and extend agents with MCP servers and custom skills.
- Built natively on GitHub: Designed around issues and PRs rather than just code completion inside an editor.
Availability
- Install now if you’re on Copilot Pro, Pro+, Max, Business, or Enterprise.
- Others can join a waitlist; access for Copilot Free/new customers is “coming soon.”
- Business/Enterprise require org-level opt-in for the preview and Copilot CLI enabled.
Why it matters
- Pushes Copilot beyond in-IDE suggestions into autonomous, multi-step agents integrated directly with GitHub’s PR/issue flow.
- A bid to own the “issue-to-merge” surface outside traditional IDEs—positioning against tools like Cursor, Windsurf, and Cody.
- Transparency (session isolation/tracking) may help trust, but agent-led merges raise process and governance questions for teams.
What to watch
- How well MCP-based extensions and custom skills integrate with real-world workflows.
- Enterprise controls, auditing, and safety around automated merges.
- Lock-in concerns for orgs standardized on other platforms or IDE-centric agents.
While the announcement of GitHub's new standalone Copilot app promises an autonomous "issue-to-merge" workflow, the Hacker News community's reaction was a mix of technical curiosity, UI sleuthing, and existential reflection on GitHub's changing identity.
Here are the main themes from the discussion:
- Under the Hood with git worktree: The feature that allows parallel agents to work on multiple repositories at once sparked a deep technical discussion. Many developers praised GitHub’s apparent reliance on git worktree to handle session isolation, noting it is an elegant solution to the age-old problem of stashing work and context-switching. However, a minor debate ensued about whether spinning up Docker containers would have provided a cleaner, more isolated environment than local worktrees.
- UI Origins and the "Cursor Killer" Strategy: Eagle-eyed users immediately recognized the app’s interface. Many pointed out that its design and left-sidebar structure heavily borrow from GitHub Next's earlier experimental "Project Ace" and the Codex UI. Commenters widely view this as GitHub's strategic (and somewhat belated) response to rising competitors like Cursor and Windsurf.
- Nostalgia vs. The C-Suite: A philosophical thread emerged contrasting GitHub's origins with its current trajectory. Veteran developers lamented that the GitHub of 2008 was built around making human collaboration easy and beautiful. Today, they feel the platform is catering increasingly to C-suite executives who want automation over human interaction, leading to concerns that GitHub is losing its "community hub" soul.
- Pricing, Tokens, and Competitors: Several users expressed frustration with Copilot's pricing structure and perceived token limits. A vocal segment of commenters shared that they have already moved away from Copilot in favor of custom or alternative setups powered by cheaper or superior models, specifically citing Anthropic's Claude 3.5 Sonnet and the highly cost-effective DeepSeek models.
- Frustrations with Core Reliability: The polished launch of this AI tool rubbed some the wrong way given GitHub's recent streak of site availability issues. Others contrasted the highly refined Copilot app with recently released core-platform features—like native Stacked PRs—which users felt were clunky, poorly designed, and treated as an afterthought in comparison to AI initiatives.
- Automated Merges & Security Risks: A few commenters expressed anxiety around the governance of agent-driven merges. They cautioned that allowing AI agents to fully close the loop on pull requests could introduce new supply chain vulnerabilities and security risks if not carefully monitored.
How is Groq raising more money?
Submission URL | 152 points | by hasheddan | 68 comments
Groq is reportedly raising $650M. Wait, didn’t Nvidia acquire them? Not exactly. Axios’s scoop comes with a key nuance: Nvidia licensed Groq’s tech and hired its core chip/compiler/software teams, but the Groq corporate entity survived and now runs four AI inference datacenters plus an API focused on smaller models.
Why this makes sense:
- Datacenters are the new bottleneck. Power, permits, and expertise are slowing new builds, so existing, running AI DCs are scarce and valuable. For investors who want direct DC exposure, there aren’t many startup options.
- Comparables are eye-popping: CoreWeave ($50B/43 DCs) and Nebius ($50B/11 DCs). By that yardstick, Groq’s four DCs could justify a multibillion valuation on infrastructure alone.
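The "yardstick" arithmetic can be spelled out as below. This is a deliberately naive sketch that scales the quoted valuations linearly by datacenter count; it assumes every datacenter is worth the same, which ignores size, power capacity, and hardware age.

```python
# (valuation in $B, number of datacenters), as quoted in the summary.
COMPARABLES = {"CoreWeave": (50.0, 43), "Nebius": (50.0, 11)}
GROQ_DATACENTERS = 4


def implied_valuation(valuation_b: float, dcs: int, target_dcs: int) -> float:
    """Scale a comparable's per-datacenter value to a target DC count."""
    return valuation_b / dcs * target_dcs


implied = {
    name: implied_valuation(val, dcs, GROQ_DATACENTERS)
    for name, (val, dcs) in COMPARABLES.items()
}

if __name__ == "__main__":
    for name, value in implied.items():
        print(f"{name} yardstick: ~${value:.1f}B for {GROQ_DATACENTERS} DCs")
```

On these numbers the two comparables imply roughly $4.7B and $18.2B for four datacenters, a wide spread that shows how sensitive the "multibillion" claim is to which comparable is chosen.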
But there are real caveats:
- Hardware age: Groq’s facilities run on seven-year-old LPUv1s. Nvidia is now selling LPUv3s (derived from Groq’s architecture) broadly, eroding Groq’s speed-as-differentiator.
- Strategy fit: Groq’s all-SRAM design delivers blazing tokens-per-second on smaller models (up to ~120B params) but worse tokens-per-dollar, and can’t economically host frontier-scale models without HBM.
- Brand confusion: “Acquired” yet still operating, and tightly associated with ultra-fast, high-cost inference—at a time when many buyers prefer cheaper, batched throughput and are balking at AI tool costs.
What to watch: Can Groq cheaply refit its DCs with new hardware (possibly via a sweetheart Nvidia deal), and does the market reward high-speed, high-cost tokenomics—or shift decisively to lower-cost, batched inference? The raise is essentially a bet that datacenter scarcity beats chip uniqueness.
Here is a summary of the Hacker News discussion regarding Groq, its recent funding, and its complex relationship with Nvidia:
The "Shadow Acquisition" and Anti-Trust Dodging: A major portion of the thread is dedicated to untangling the exact nature of Nvidia’s relationship with Groq. Several commenters accuse tech media of "group psychosis" for framing the deal merely as a "white-glove product rental" or a non-exclusive licensing agreement. Users argue this was essentially an end-run around SEC and FTC anti-monopoly regulators. One commenter pointed to a (purported) Nvidia 10-K filing showing a $1.3 billion cash flow outlay for the Groq transaction, arguing that Nvidia effectively bought out the tech and talent to neutralize a competitor, even if Groq was left as a nominally independent corporate shell to raise this new $650M.
Blistering Speed vs. Secret Quantization: Developers uniformly praise Groq’s raw speed (often citing 200 to 1,000 tokens per second), noting that instantaneous responses fundamentally change the programming experience by eliminating "context switching" while waiting for a model to reply.
However, a fierce debate is brewing over how Groq achieves these speeds. Multiple users claim Groq secretly quantizes the models underneath to achieve low latency, which degrades performance on precision-heavy tasks like tool-calling. Some developers reported that Groq-hosted models performed noticeably worse on verification programs compared to traditional hosts, citing "random errors and silly quirks," while others pushed back, asking for systemic proof of these degradations.
The Architecture War: SRAM vs. Batching. The thread dives deep into the technical limitations of Groq’s SRAM-heavy chips versus Nvidia’s GPUs.
- The Groq Advantage: Proponents argue Groq is a fundamentally superior product for low-latency inference because it doesn't rely on the bulk/batch processing necessary to make Nvidia’s architecture economically viable.
- The Hardware Reality: Critics point out that Groq's low memory density means it takes massive amounts of hardware (e.g., 6 whole server racks) to serve a model like Llama3-70B, making it economically unviable for frontier-scale models.
Enter Cerebras: With Groq reportedly retiring support for large models like the 1-trillion parameter Kimi K2, the community is looking toward competitors. Cerebras was frequently mentioned as the next hope for ultra-fast, large-model inference, though commenters noted Cerebras appears to be pivoting away from regular developer APIs toward lucrative $5M+ dedicated enterprise endpoints.
The Takeaway: The Hacker News consensus is a mix of awe at Groq's latency and skepticism about its business model. While Groq's architecture opens up new paradigms for developer workflow, questions remain about its hardware efficiency, hidden quantization trade-offs, and what its future actually looks like living in Nvidia's shadow.
Show HN: Paseo – Beautiful open-source coding agent interface
Submission URL | 80 points | by timhigins | 47 comments
Paseo: one interface to run Claude Code, Copilot, Codex, OpenCode, and Pi agents locally, in parallel, across devices
What it is
- A self-hosted “daemon” that orchestrates coding agents and exposes a WebSocket/MCP API, with clients for desktop (Electron), mobile/web (Expo), and CLI.
- Unified, provider-agnostic UI/CLI so you can pick the best model per task and hand work between them.
Why it matters
- Multi-agent workflows without vendor lock-in: plan with one model, implement with another, verify in a loop.
- Truly cross-device: start at your desk, continue from your phone, or drive everything from the terminal.
- Privacy-first posture: no telemetry or forced logins; your agents run against your local dev environment.
How it works
- You install at least one provider’s agent CLI (e.g., Claude Code, Copilot, OpenCode, Pi), then run Paseo’s daemon.
- The CLI mirrors the app’s capabilities: e.g., paseo run --provider claude/opus-4.6 "implement user authentication"; attach to live output; send follow-ups; target specific worktrees; run against a remote daemon.
- “Skills” extend orchestration inside any agent chat:
  - /paseo-handoff: pass work between agents (e.g., Claude plans → Codex implements)
  - /paseo-loop: iterate until acceptance criteria are met (Ralph loops), with optional verifier
  - /paseo-advisor: spin up a second-opinion advisor without delegating work
  - /paseo-committee: two contrasting agents do root-cause analysis and produce a plan
- Remote access via a self-hosted relay (Go), with optional TLS and an example nginx WebSocket proxy.
Getting started
- Desktop app: download from paseo.sh/download (daemon starts automatically; scan a QR in Settings to pair your phone).
- CLI/headless: npm install -g @getpaseo/cli, then run paseo to launch and display a QR code for client pairing.
Notable details
- Monorepo packages include server (daemon, MCP), app, CLI, desktop, relay, and website.
- License: AGPL-3.0.
- Repo stats at time of posting: 7.6k stars, 715 forks; 111 releases (latest v0.1.89 on Jun 2, 2026).
Caveats and considerations
- You still rely on provider credentials/APIs for the underlying models; “no telemetry” refers to Paseo itself.
- Granting agents write access to your repo is powerful—use branches/worktrees and reviews, especially with loops.
- Exposing the relay requires careful TLS and network configuration to avoid leaking access to your dev environment.
HN angle
- A timely alternative to IDE-tied assistants (Cursor, Cody, etc.) and single-model tools: self-hosted, multi-provider, mobile-friendly agent orchestration with opinionated workflows (handoff, loops, committees).
Here is a daily digest summary of the Hacker News discussion surrounding Paseo, the self-hosted orchestrator for local and cloud AI coding agents:
Paseo Hits HN: The Community Debates Mobile Coding, Open Source Monetization, and UI Aesthetics
Paseo, a self-hosted daemon that orchestrates coding agents (like Claude Code, Copilot, and open-source models) across desktop, terminal, and mobile interfaces, generated a lot of buzz on Hacker News. Users were largely impressed by its multi-provider orchestration and anti-vendor lock-in approach, but the discussion quickly branched out into philosophical and technical debates.
Here are the top takeaways from the comment section:
1. The Great "Coding on Mobile" Debate
One of the most active threads revolved around Paseo’s mobile and web capabilities.
- The Skeptics: Some users expressed dismay at the idea of coding on a smartphone, viewing it as a depressing symptom of hustle culture and the inability to disconnect from work.
- The Defenders: The backlash to the skepticism was swift. Several developers—particularly parents—praised the mobile functionality as life-changing. Users shared stories of pushing code while at their kids' swimming lessons, brainstorming architecture while taking walks, and using iPad setups to escape heavy laptops. One user even mentioned using voice control on bike rides to draft React components, arguing that mobile access actually gets developers away from their desks.
2. Maintainer Insights & Business Model
The project's maintainer (bdr) was highly active in the thread, answering questions and clarifying Paseo's direction:
- Monetization: When asked about how Paseo plans to compete with well-funded tools like Cursor, the maintainer noted they are currently focusing on building a great "local-first" open-source layer. Any future monetization will likely come from building an enterprise convenience layer on top of the free FOSS base.
- API vs. Subscriptions: Users asked how this interacts with Claude’s billing. The maintainer clarified that using Claude Code programmatically through Paseo relies on API credits, not a standard Claude Pro subscription.
- Comparisons: When asked how Paseo differs from competitors like Conductor, the maintainer highlighted Paseo's FOSS/AGPL license, local-first architecture (daemon/client), mobile apps, and lack of forced logins or telemetry.
3. The Semantics of a "Beautiful" UI
In classic Hacker News fashion, users got caught up in a semantic debate over the word "beautiful" (which the original poster used in the HN title).
- UI designers and developers debated whether a clean, unopinionated Tailwind CSS and Lucide Icon design could actually be called "beautiful," or if it was simply "utilitarian" and "convenient."
- The maintainer stepped in to clarify that they didn't actually use the word "beautiful" in any of their marketing materials or docs, preferring users to judge the interface for themselves. Regardless, many users praised the interface for being remarkably clean and functional for an open-source project.
4. Technical Features and Alternatives
- Capabilities: Users were impressed that Paseo supports embedded images, charts, and MDX rendering directly in the chat.
- Token Management: Technical questions popped up regarding token caching—specifically how Paseo handles the potentially high API costs of 30-step multi-agent loops if large context windows aren't cached properly.
- Comparisons to Other Workflows: The thread served as a good discovery zone for similar tools, with users trading notes on alternatives like Punchimber, Conductor, Superconductor, and even custom Gitea interfaces where users @mention AI agents directly in pull requests.
The Verdict: The community sees Paseo as a highly timely and capable open-source alternative to IDE-locked tools like Cursor, particularly for developers who want flexibility in which models they use and portability in where they use them.
Now AI agents need what RSS does
Submission URL | 82 points | by julienreszka | 61 comments
RSS isn’t dead; it’s what AI agents actually need
TL;DR: Google Reader’s demise killed human discovery via feeds, not RSS itself. Podcasts quietly proved RSS’s durability, and AI agents now prefer exactly what RSS offers: deterministic, structured, pull-based access without auth or rate limits. If you want agents to reliably ingest your content, publish an RSS feed.
What’s new
- The argument: Social algorithms beat RSS for human attention because they trade on surprise. Agents don’t want surprise—they want a predictable list of new items in a stable, parseable format with no platform gatekeeping.
- RSS fits that brief: deterministic chronology, machine-readable structure, no ad-tied rate limits, no login for public content. Social APIs offer the opposite—and change or get paywalled.
- Proof it never died: The $25B podcast industry still runs on RSS (Apple, Spotify, Overcast, Pocket Casts all pull feeds). No middleman to negotiate, just URLs that keep working.
- Trend watch: After years of decline in Google Trends, “RSS” spikes in 2025—framed as the agent era rebooting feeds for written content, filings, and newsletters.
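To make the "deterministic, structured, pull-based" point concrete, here is a minimal sketch of how an agent might read a feed using only Python's standard library. The sample feed, the example.com URLs, and the helper names `parse_items` and `new_items` are illustrative assumptions, not taken from the article; a real agent would fetch the XML from a publisher's feed URL.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed (RSS 2.0); stands in for a fetched document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
      <guid>https://example.com/posts/2</guid>
      <pubDate>Tue, 02 Jun 2026 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
      <guid>https://example.com/posts/1</guid>
      <pubDate>Mon, 01 Jun 2026 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_xml: str) -> list[dict]:
    """Return every <item> in the feed as a plain dict."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "guid": item.findtext("guid"),
            # pubDate uses the RFC 822 date format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def new_items(items: list[dict], seen_guids: set[str]) -> list[dict]:
    """Keep only items the agent has not already processed."""
    return [item for item in items if item["guid"] not in seen_guids]


if __name__ == "__main__":
    seen = {"https://example.com/posts/1"}
    for item in new_items(parse_items(SAMPLE_FEED), seen):
        print(item["published"].date(), item["title"], item["link"])
```

The same few lines work for any feed that follows the RSS 2.0 layout, which is the low-upkeep property the article contrasts with per-site scrapers.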
Why it matters
- Agents that monitor competitors, track regulations, or fetch research need reliability more than recommendation. RSS minimizes breakage and ongoing maintenance versus scraping.
- For publishers, an RSS feed can be a distribution channel to agent aggregators—reach without platform dependency.
Debate from the comments
- Pro-RSS (dev viewpoint): Wiring in sites with feeds takes seconds; scrapers are brittle, break on redesigns, CAPTCHAs, and bot blocks. Maintenance cost scales with number of sources.
- Skepticism (publisher viewpoint): Machine-digestible feeds can commodify content; some may take their feeds down to prevent AI reuse.
- Counterpoint (scraping works now): Agents can parse HTML too. Rebuttal: they can—until they can’t. Determinism and low upkeep win at scale.
- Anecdote: Adding a newsletter RSS led to automatic pickup by niche aggregators—“the feed was the distribution.”
Takeaway
- If you want AI agents and aggregators to find and trust your updates, ship an RSS feed. Keep it structured, stable, and public. In an agent-first world, open pull beats closed push.
Here is a summary of the Hacker News discussion:
The Conversation: Are AI Agents the Saviors or Destroyers of RSS?
While the original article argued that RSS is the perfect, machine-readable format for AI agents, the Hacker News community immediately dug into the technical and economic realities of this "agent-first" world.
Here are the top takeaways from the comment section:
- The Publisher’s Dilemma (Ad Revenue vs. AI): Several users pointed out a glaring economic flaw. Publishers rely on pageviews and ad revenue to survive. If AI agents pull a tiny, cheap 50KB RSS feed, summarize it, and deliver it to the end-user, the publisher makes $0. Many argued that publishers have no incentive to maintain rich RSS feeds for AI unless those endpoints are licensed and paid. Otherwise, AI is just slurping content without providing backlinks or traffic.
- The "DDoS by Polling" Problem: A fascinating technical debate emerged around rate limits and server load. Users referenced Rachel by the Bay’s famous complaints about poorly built RSS readers hammering web servers. If millions of personalized AI bots constantly pull feeds to stay updated, it could overwhelm publishers. Some argued that "pull-based" RSS needs to be replaced by "push-based" protocols (like ActivityPub or PubSubHubbub), or we need a giant centralized cache—ironically, exactly what Google Reader used to do.
- Users Are Already Building This: The theory is already a reality for many HN readers.
- Users shared how they use local LLMs and tools like Claude to aggressively filter Hacker News down to their specific interests, essentially building their own tailored briefings.
- One user (dchk) shared an open-source Rails project they built that acts exactly like a Hacker News clone, but is entirely powered by RSS feeds and AI-generated bullet points. (They even ended up live-debugging a Firefox user-agent bug with the HN community right in the thread.)
- Hidden RSS Gems & Defenses: In a classic HN tangent, users traded their favorite RSS tips. A highly upvoted revelation for some was that every YouTube channel natively has an RSS feed (just grab the channel ID). Conversely, for publishers wanting to keep AI out of their feeds, users noted that standard robots.txt practices still apply.
- The Circle of Life: One commenter perfectly summed up the irony of the situation: "We spent a decade killing structured feeds in favor of algorithmic timelines, and now we’re rebuilding algorithms on top of structured feeds. It’s the circle of tech life."
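On the "DDoS by polling" concern raised above: one common mitigation, not mentioned in the thread summary and offered here only as an illustration, is the HTTP conditional GET. The client replays the `ETag` and `Last-Modified` values from its previous fetch, and a server that supports them can answer `304 Not Modified` with no body. The sketch below uses Python's standard library; the function names and the `example-feed-agent/0.1` User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None) -> dict:
    """Build request headers that let the server skip an unchanged feed."""
    headers = {"User-Agent": "example-feed-agent/0.1"}  # hypothetical UA string
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Fetch a feed politely.

    Returns (body, etag, last_modified). body is None when the server
    answers 304, in which case the cached validators are returned unchanged.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as "nothing new".
        if err.code == 304:
            return None, etag, last_modified
        raise
```

An agent would store the returned validators per feed and pass them back on the next poll. This reduces bandwidth per request but not the number of requests, so it complements rather than replaces the push-based or shared-cache ideas debated in the thread.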
More than 6 out of 10 people turn to AI for psychological support
Submission URL | 80 points | by mgh2 | 84 comments
AXA/Ipsos: 61% use AI for mental health; 28% of those users say AI advice led to harmful behavior
What’s new
- Mental health is sliding: 46% of respondents say they’re “struggling or languishing.” In 10 of 16 tracked countries, mental health scores hit their lowest since 2021.
- Screen time vs. well-being: People report 5.1 hours/day of screen time on weekdays, excluding work and study; two-thirds say screens negatively affect their mental health.
- Turning to AI: 61% already use AI for mental health questions; 42% of those users say they “almost always” follow the advice (≈26% of all adults surveyed).
- Mixed outcomes: 55% are satisfied with AI advice, but 32% felt uncomfortable with it at least once, and 28% say AI recommendations led them to harmful behavior (≈17% overall).
- Trust still favors humans: Only 38% trust AI platforms more than mental health professionals.
- Care gap: 43% of people flagged as potentially in “mental suffering” didn’t see a professional in the past year—citing lack of perceived need first, then cost and time.
- Workplace angle: 84% (88% of 18–24-year-olds) would join employer mental health programs. AXA cites WHO’s $1T annual productivity loss from depression/anxiety; in France, these are now the top cause of long-term sick leave, especially among under-30s.
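The "≈26%" and "≈17%" figures above are shares of a share, so they can be checked with a quick multiplication. This sketch assumes, as the summary's wording "of those users" suggests, that the 42% and 28% figures apply only to the 61% who use AI for mental health questions; the function name is illustrative.

```python
def share_of_all_adults(share_using_ai: float, share_of_ai_users: float) -> float:
    """Convert a share of AI users into a share of all adults surveyed."""
    return share_using_ai * share_of_ai_users


AI_USERS = 0.61  # share of respondents who use AI for mental health questions

almost_always_follow = share_of_all_adults(AI_USERS, 0.42)
led_to_harm = share_of_all_adults(AI_USERS, 0.28)

print(f"Almost always follow AI advice: {almost_always_follow:.1%} of all adults")  # 25.6%
print(f"Report harmful behavior:        {led_to_harm:.1%} of all adults")  # 17.1%
```

Both results round to the values quoted in the summary.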
Why it matters
- AI has quickly become a first-line, always-on support channel for mental health—reducing access barriers but introducing new safety risks.
- A sizable minority following AI advice almost by default raises design, disclosure, and guardrail questions for consumer AI and health-adjacent products.
- Employers are seen as key distribution/activation points for prevention and early support.
Methodology
- Ipsos surveyed 19,000 adults (18–75) across 18 countries, Jan 12–Feb 16, 2026.
Caveats
- Corporate press release (AXA) with a clear employer/insurer lens.
- Self-reported metrics; “harmful behavior” and “mental suffering” aren’t fully defined here.
- Minor inconsistency: results cite “10 of 16 countries” while the study spans 18 countries.
Here is a daily digest summary of the Hacker News discussion regarding the AXA/Ipsos study on AI and mental health.