AI Submissions for Sat Nov 22 2025
Show HN: I built a wizard to turn ideas into AI coding agent-ready specs
Submission URL | 24 points | by straydusk | 12 comments
AI-Powered Spec Generator is a landing-page pitch for a tool that turns rough product ideas into full technical specs via a single structured chat. It promises to replace back-and-forth prompt tinkering with a guided flow that produces production-ready documentation and plans.
What it generates:
- ONE_PAGER.md: product definition, MVP scope, user stories
- DEV_SPEC.md: schemas, API routes, security protocols, architecture diagrams
- PROMPT_PLAN.md: stepwise, LLM-testable prompt chains (see the sketch after this list)
- AGENTS.md: system prompts/directives for autonomous coding agents
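The landing page does not show what these files actually contain, so the following is a purely hypothetical sketch of how a stepwise, testable prompt chain of the kind PROMPT_PLAN.md describes could be driven from code. The step names, prompts, and checks are invented for illustration and are not the tool's real format.

```python
# Hypothetical sketch of driving a PROMPT_PLAN.md-style chain.
# Step names, prompts, and checks are invented; the tool's real format is not public.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptStep:
    name: str
    prompt: str
    check: Callable[[str], bool]  # cheap, LLM-free test of the agent's output

plan = [
    PromptStep(
        name="scaffold",
        prompt="Create a FastAPI project skeleton with a /health route.",
        check=lambda out: "/health" in out,
    ),
    PromptStep(
        name="schema",
        prompt="Add a SQLAlchemy model for User(id, email, created_at).",
        check=lambda out: "created_at" in out,
    ),
]

def run_agent(prompt: str) -> str:
    """Placeholder for whatever coding agent ultimately consumes the plan."""
    return f"(agent output for: {prompt})"

for step in plan:
    output = run_agent(step.prompt)
    status = "ok" if step.check(output) else "needs review"
    print(f"{step.name}: {status}")
```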
Who it’s for: founders, PMs, and tech leads who want faster idea-to-MVP handoff, plus teams experimenting with agent-based development.
Why it matters: centralizes product, architecture, and prompt-engineering into a consistent spec bundle, aiming to cut planning time and reduce ambiguity between stakeholders and AI agents.
Caveats to keep in mind: like any LLM-driven planning tool, outputs will need human review for feasibility, security, and scope creep; “architecture diagrams” and protocols are only as solid as the inputs and model.
Discussion Summary:
The discussion focuses on the tool's user experience, specific output bugs, and its role in the "AI coding agent" ecosystem.
- UX and Copy Critique: One user (nvdr) praised the "slick" styling but encountered a bug where the tool returned placeholder text ("Turn messy ideas...") instead of a generated spec. This sparked a debate about the homepage copy: some users suggested that calling ideas "messy" has negative connotations, though the creator (straydusk) noted it was intended to highlight the tool's clarity.
- Technical Glitches: The creator attributed some of the erratic behavior (specifically "jumping steps" in the wizard) to a regression or to API-level usage limits while using gpt-4o-mini.
- Role in Workflow: Users sought clarity on how this links to actual coding. The consensus, confirmed by the creator, is that the tool creates the roadmap (specs and plans), which is then fed into separate autonomous coding agents; the tool itself is not an agent that indexes codebases.
- Feature Requests: There was a strong suggestion for a distinct "Plan Mode" to help users evaluate the strategy before generation; the creator agreed this was a key differentiator and provided a sample "Prompt Plan" output in response.
New Apple Study Shows LLMs Can Tell What You're Doing from Audio and Motion Data
Submission URL | 68 points | by andrewrn | 29 comments
Apple says LLMs can infer your activity from sensor summaries—without training on your data
- What’s new: An Apple research paper explores “late multimodal fusion” using LLMs to classify activities from short text summaries of audio and IMU motion data (accelerometer/gyroscope), not the raw signals.
- How it works: Smaller models first convert audio and motion streams into captions and per-modality predictions. An LLM (tested: Gemini 2.5 Pro and Qwen-32B) then fuses those textual hints to decide what you're doing (see the sketch after this list).
- Data: Curated 20-second clips from the Ego4D dataset across 12 everyday activities (e.g., cooking, vacuuming, laundry, weights, reading, watching TV, using a computer, sports, pets, dishes, eating).
- Results: Zero- and one-shot classification F1 scores were “significantly above chance” with no task-specific training; one-shot examples improved accuracy. Tested in both closed-set (12 known options) and open-ended settings.
- Why it matters: LLM-based late fusion can boost activity recognition when aligned multimodal training data is scarce, avoiding bespoke multimodal models and extra memory/compute for each app.
- Privacy angle: The LLM sees only short text descriptions, not raw audio or continuous motion traces—potentially less sensitive and lighter-weight to process.
- Reproducibility: Apple published supplemental materials (segment IDs, timestamps, prompts, and one-shot examples) to help others replicate the study.
- Big picture: Expect more on-device orchestration where compact sensor models summarize streams and a general-purpose LLM does the reasoning—useful for health, fitness, and context-aware features without deep per-task retraining.
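To make the late-fusion step concrete, here is a minimal sketch under the assumptions above: the per-modality models have already produced short text summaries, and the LLM only ever sees that text. The prompt wording, helper names, and activity list below are illustrative; the actual prompts and label set are in Apple's supplemental materials.

```python
# Minimal sketch of LLM-based late fusion: the LLM never sees raw audio or IMU
# signals, only short text produced by smaller per-modality models.
# Prompt wording and helper names are assumptions, not Apple's published prompts.

# Example activities from the summary above; the paper's exact 12-way set may differ.
ACTIVITIES = [
    "cooking", "vacuuming", "doing laundry", "lifting weights", "reading",
    "watching TV", "using a computer", "playing sports", "playing with pets",
    "washing dishes", "eating",
]

def build_fusion_prompt(audio_caption: str, imu_summary: str) -> str:
    """Combine per-modality text outputs into a closed-set classification prompt."""
    options = ", ".join(ACTIVITIES)
    return (
        "You are given text descriptions of a 20-second clip.\n"
        f"Audio caption: {audio_caption}\n"
        f"Motion (IMU) summary: {imu_summary}\n"
        f"Which one of these activities is most likely: {options}?\n"
        "Answer with a single activity name."
    )

prompt = build_fusion_prompt(
    audio_caption="running water and clinking dishes",
    imu_summary="repetitive small arm movements while standing roughly in place",
)
print(prompt)  # this string would be sent to a model such as Gemini 2.5 Pro or Qwen-32B
```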
Here is a summary of the discussion on Hacker News:
Privacy and Surveillance Concerns
A significant portion of the discussion focused on the privacy implications of "always-on" sensing. Users drew parallels to 1984 and "telescreens," with some arguing that modern smartphones already surpass those dystopian surveillance tools in capability. Commenters expressed concern that even if data is encrypted or anonymized now, companies may hoard it to decrypt later when technology advances (e.g., quantum computing). Others noted that this granular activity tracking poses specific dangers to high-risk individuals like activists and journalists, regardless of how benign the consumer feature appears.
Apple Watch Performance & UX
The conversation shifted to current Apple Watch capabilities, with users debating the reliability of existing activity detection. Some complained that newer models feel slower to detect workouts (like running) compared to older generations or competitors. Others defended this as a design choice, suggesting the system now requires a larger data window to ensure "confidence" and prevent false positives, though they noted Apple communicates this mechanism poorly to users.
Technical Implementation and Ethics
Technically-minded commenters clarified the paper's distinction: the LLM does not process raw data but relies on smaller, intermediate models to generate text captions first. Some questioned the efficiency of this, suggesting standard analytics might be sufficient without adding an LLM layer. While some acknowledged the positive potential, such as distinguishing a senior citizen falling from a parent playing with their kids, others argued that beneficial use cases (like nuclear power) do not automatically justify the existence of the underlying dangerous capabilities (like nuclear weapons).
Show HN: PolyGPT – ChatGPT, Claude, Gemini, Perplexity responses side-by-side
Submission URL | 17 points | by ncvgl | 12 comments
A new open‑source desktop app promises to end tab‑hopping between AI chats by letting you type once and query multiple models—like ChatGPT, Gemini, and Claude—simultaneously. It mirrors your prompt to all connected model interfaces and shows responses side‑by‑side in real time, making it handy for prompt crafting, QA, and quick model comparisons.
Highlights:
- Cross‑platform downloads: Mac, Windows, Linux; code available on GitHub
- Supports “4+” models (including ChatGPT, Gemini, Claude)
- One prompt, mirrored to all interfaces; live, side‑by‑side outputs
- Free, open source, and positioned as privacy‑focused
Good fit for teams and tinkerers who routinely compare models or iterate on prompts. Practical caveats remain (provider logins/API keys, rate limits, usage costs, and provider ToS), but the friction reduction and real‑time comparison view are the draw.
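For readers curious about the mechanics, here is a minimal sketch of the "one prompt, many models" fan-out pattern using placeholder clients. PolyGPT itself reportedly mirrors the prompt into the providers' own web interfaces rather than necessarily calling APIs, so this illustrates the general idea, not the app's implementation.

```python
# Generic "one prompt, many models" fan-out, shown with placeholder callables.
# PolyGPT mirrors the prompt into provider web UIs; this is only an illustration.
from concurrent.futures import ThreadPoolExecutor

def ask_chatgpt(prompt: str) -> str:  # placeholder, not a real client
    return f"[ChatGPT] answer to: {prompt}"

def ask_claude(prompt: str) -> str:   # placeholder
    return f"[Claude] answer to: {prompt}"

def ask_gemini(prompt: str) -> str:   # placeholder
    return f"[Gemini] answer to: {prompt}"

MODELS = {"ChatGPT": ask_chatgpt, "Claude": ask_claude, "Gemini": ask_gemini}

def mirror(prompt: str) -> dict[str, str]:
    """Send the same prompt to every model concurrently and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in MODELS.items()}
        return {name: fut.result() for name, fut in futures.items()}

for name, answer in mirror("Summarize the CAP theorem in one sentence.").items():
    print(f"{name}: {answer}")
```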
The discussion focuses on the delivery method, technical implementation, and potential features for evaluating the AI outputs:
- Web vs. Native: Several users requested a web-based version, expressing strong reluctance to install native applications (specifically Electron wrappers) from unknown sources. They cited security concerns and a preference for the control and accessibility tools available in their own customized browsers.
- Alternatives: One commenter pointed out that Open WebUI already has this functionality built‑in.
- Implementation details: There was a brief debate on the underlying mechanics, specifically comparing the use of API keys versus embedding web apps, and how those choices affect context handling.
- "AI Judge" Feature: A significant portion of the thread explored adding a feature where a "Judge" model compares the parallel responses to determine the best output. Ideas included using a "blind jury" of models or a democratic voting system among the agents, though one user noted past experiments where agent democracy led to AI models "conspiring" against the rules.
Google tells employees it must double capacity every 6 months to meet AI demand
Submission URL | 46 points | by cheshire_cat | 28 comments
Google says it must double AI serving capacity every six months—targeting a 1000x scale-up in 4–5 years—while holding cost and, increasingly, power flat. In an internal all-hands seen by CNBC, Google Cloud VP Amin Vahdat framed the crunch as an infrastructure race that can’t be won by spending alone: the company needs more reliable, performant, and scalable systems amid GPU shortages and soaring demand.
Key points:
- Bottlenecks: Nvidia’s AI chips are “sold out,” with data center revenue up $10B in a quarter. Compute scarcity has throttled product rollouts—Sundar Pichai said Google couldn’t expand access to its Veo video model due to constraints.
- Google’s plan: build more physical data centers, push model efficiency, and lean on custom silicon. Its new TPU v7 “Ironwood” is claimed to be nearly 30x more power efficient than Google’s first Cloud TPU (2018).
- Competitive backdrop: OpenAI is pursuing a massive US buildout (reported six data centers, ~$400B over three years) to reach ~7 GW, serving 800M weekly ChatGPT users who still hit usage caps.
- The bet: Despite widespread “AI bubble” chatter (Pichai acknowledges it), Google views underinvesting as riskier than overcapacity. Pichai warned 2026 will be “intense” as AI and cloud demand collide.
Why it matters: A six‑month doubling cadence implies a Moore’s Law–style race, but with power and cost ceilings that force co-design across chips, models, and data centers. If demand holds, winners will be those who align compute, energy, and reliability; if it doesn’t, capex-heavy bets could sting.
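A quick back-of-the-envelope check on that cadence (arithmetic added here, not from the article): doubling every six months is two doublings per year, which compounds to roughly 1000x around year five.

```python
# Doubling every 6 months = 2 doublings per year, so capacity grows by 2**(2*t) after t years.
for years in (4, 4.5, 5):
    print(f"{years} years -> {2 ** (2 * years)}x")
# 4 years -> 256x, 4.5 years -> 512x, 5 years -> 1024x,
# consistent with the "1000x in 4-5 years" framing.
```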
Here is a summary of the discussion:
Demand: Organic vs. Manufactured
A major point of contention was the source of the "demand" driving this infrastructure buildup. While some users pointed to ChatGPT's high global ranking (5th most popular website) as evidence of genuine consumer interest, others argued Google's specific demand metrics are inflated. Skeptics noted that "shimming" AI into existing products such as Gmail, Docs, and Search creates massive internal query volume without necessarily reflecting user intent or willingness to pay. Several commenters likened these forced features to a modern "Clippy," expressing annoyance at poor-quality AI summaries in Search.
Feasibility and Physics
Commenters expressed deep skepticism regarding the technical feasibility of Google's roadmap. Users argued that doubling capacity every six months is "simply infeasible" given that semiconductor density and power efficiency gains are slowing (the end of Moore's Law), not accelerating. Critics noted that optimizations like custom silicon and co-design can't fully overcome the physical constraints of raw materials, construction timelines, and energy availability needed to sustain 1000x growth in five years.
The "Bubble" and Post-Crash Assets The discussion frequently drifted toward the "AI bubble" narrative. Users speculated on the consequences of a market correction, comparing it to the housing crash.
- Hardware Fallout: Many hoped a crash would result in cheap hardware for consumers, specifically discounted GPUs for gamers, inexpensive RAM, and rock-bottom inference costs that could make "AI wrapper" business models viable.
- Infrastructure: There was debate over what happens to specialized data centers if the tenants fail; while some suggested conversion to logistics centers (e.g., Amazon warehouses), others noted that the electrical and HVAC infrastructure in AI data centers is too over-engineered to be cost-effective for standard storage.
Comparison to Past Tech Shifts
Users debated whether AI is a frantic infrastructure race or a true paradigm shift. Some questioned if AI has reached "mass consumer" status comparable to the PC or smartphone, citing older generations who still don't use it. Conversely, others argued that student adoption of LLMs indicates a permanent shift in how the future workforce will operate, justifying the massive investment.
Google begins showing ads in AI Mode (AI answers)
Submission URL | 19 points | by nreece | 7 comments
- What’s new: Google is rolling out “Sponsored” ads directly within AI Mode (its answer-engine experience). Until now, AI answers were ad-free.
- How it appears: Ads are labeled “Sponsored” and currently show at the bottom of the AI-generated answer. Source citations mostly remain in a right-hand sidebar.
- Who sees it: AI Mode is free for everyone; Google One subscribers can switch between advanced models like Gemini 3 Pro, which can generate interactive UIs for queries.
- Why it matters: This is a clear monetization step for Google’s AI answers and a test of whether users will click ads in conversational results as much as in classic search. Placement at the bottom suggests Google is probing for higher CTR without disrupting the main answer.
- The backdrop: Google has been nudging users into AI Mode over the past year. Keeping it ad-free likely helped adoption; adding ads tests the business model—and could reshape SEO, publisher traffic, and ad budgets if performance holds.
- What to watch:
- Do AI answer ads cannibalize or complement traditional search ads?
- Changes in ad load/placement over time.
- Regulatory scrutiny around disclosures and ranking in AI experiences.
- Publisher referral impacts as AI answers absorb more user intent.
Discussion prompt: Will users click “Sponsored” links in AI answers at rates comparable to top-of-page search ads—or does the chat-style format depress ad engagement?
Hacker News Discussion Summary:
The introduction of ads into Google's AI Mode sparked a discussion regarding user interface comparisons, the potential for "extortionary" business models, and the future of ad blocking.
- Perplexity vs. Google: Users compared the new layout to Perplexity. While some find Perplexity superior for semantic understanding and source checking, others analyzed Google’s specific UI choices (blocks of links vs. scrolling), with one user describing the integration of irrelevant or cluttered link blocks as "embarrassing" compared to organic layouts.
- Monetization Concerns: Several comments expressed cynicism regarding the intent behind these ads.
- One user theorized that AI might eventually refuse to answer "DIY" questions (e.g., plumbing instructions) to force users toward paid local service ads, comparing the model to "mugshot publishing" extortion.
- Others noted that Google already forces brands to bid on their own trademarks (like Nike or Adidas) to secure top slots; embedding ads in AI is seen as a way to maintain this gatekeeper status and potentially bypass current ad-blocking technologies.
- Ad Blocking: The conversation inevitably touched on countermeasures, with users predicting the immediate rise of "AI ad blockers" designed specifically to scrub sponsored content from generated answers.