AI Submissions for Tue Nov 11 2025
We ran over 600 image generations to compare AI image models
Submission URL | 190 points | by kalleboo | 99 comments
LateNiteSoft (makers of Camera+, Photon, REC) ran 600+ AI image edits to see which models work best for everyday photo tweaks—and to decide what to support in their new MorphAI app. Because they refuse “unlimited” AI pricing with fair‑use gotchas, they built CreditProxy, a pay‑per‑generation billing layer (and are inviting trial users).
How they tested
- Realistic use cases: pets, kids, landscapes, cars, product shots
- Naive prompts (what typical users actually type), not prompt‑engineered
- Tracked speed and behavior across models
Latency (avg during tests)
- OpenAI gpt-image-1: ~80s at High quality (~36s at Medium)
- Gemini: ~11s
- Seedream: ~9s
- Times were stable across prompts
Findings
- Classic/photoreal filters: Gemini best preserves original detail and resists hallucinations, but often weakens or refuses edits—especially on people. OpenAI applies stronger looks but introduces “AI slop,” notably on faces. Seedream had some odd shortcuts.
- Long exposure: OpenAI did best when the effect made sense (landscapes, cars) but failed on cats/product and got trippy on portraits. Gemini often did nothing. Seedream leaned on generic light streaks.
- Heat map: None showed real “heat” understanding; Seedream mostly assumed humans emit heat.
- Creative effects (vintage, kaleidoscope, etc.): Gemini is conservative; OpenAI more creative but less faithful.
Why it matters
- Model choice should be task‑driven: Gemini for faithful edits, OpenAI for bold stylization (with risk), Seedream for speed and low cost but less grounding.
- For consumer photo apps, predictable costs, latency, and “do no harm” edits often beat raw creativity.
There’s a big, flip‑throughable comparison gallery on the post (with keyboard shortcuts).
Summary of Hacker News Discussion:
- Model Comparisons & Quirks:
- Gemini is praised for preserving details and refusing risky edits (e.g., faces), but often returns unchanged images or weakens edits.
- OpenAI (GPT-image-1) is seen as more creative but introduces "AI slop," altering faces/objects and applying a yellow tint. Users debate whether this tint is intentional (e.g., vintage styling) or a technical flaw.
- Seedream excels in speed and cost but sacrifices detail, using shortcuts like generic light streaks.
- Technical Insights:
- OpenAI’s pipeline regenerates images semantically rather than editing pixels directly, leading to unintended changes (e.g., altered faces). This is attributed to tokenization and latent space architecture.
- Gemini’s conservatism, especially with people, may stem from safety filters.
- Practical Challenges:
- Users report frustration with models ignoring prompts (e.g., Gemini refusing edits) or altering unintended areas, necessitating manual checks.
- Cost and latency matter: Seedream’s speed appeals to small creators, while OpenAI’s pricing and reliability raise concerns.
- Community Reactions:
- Skepticism about whether "AI slop" complaints are hype or substance, along with critiques of AI's impact on the stock-photo industry.
- Debate over whether OpenAI’s yellow tint is a feature (stylistic choice) or a bug.
- Interest in hybrid workflows (e.g., SDXL, LoRAs) for better control, highlighting a gap in commercial SaaS offerings.
- Notable Quotes:
- "Models sometimes alter objects they weren’t supposed to touch… a complete failure."
- "Peak quality in realistic rendering might already be behind us." (referring to DALL-E 3’s trade-offs).
Key Takeaway:
The discussion underscores the need for task-specific model selection, transparency in AI editing behavior, and tools that balance creativity with fidelity. Community sentiment leans toward cautious adoption, emphasizing manual oversight and hybrid approaches for professional use.
Scaling HNSWs
Submission URL | 206 points | by cyndunlop | 42 comments
Scaling HNSWs: antirez’s hard-won lessons from bringing HNSW to Redis
- Not just another intro: After a year building an HNSW-based “Redis experience,” antirez shares advanced, practical insights—what it takes to make HNSW low-latency and production-ready, not just paper-correct.
- HNSW isn’t the final form: The original paper is excellent but incomplete for real systems. He added true deletions (beyond tombstones), and questions how necessary the hierarchy (“H”) really is—early results suggest a flat, single-layer variant can work but with higher seek time. The sweet spot may be modified level selection rather than all-or-nothing.
- Memory is the real enemy: HNSW is pointer-heavy and multi-level; vectors are big. Extra layers cost ~1.3x space on average (with p=0.25), so hierarchy isn’t the main bloat—vector storage is.
- Biggest win: 8‑bit quantization by default. Per-vector max-abs scaling delivers roughly 4x faster search and ~4x smaller vectors with near-identical recall in practice. Pointers still dominate some footprint, but this is the low-hanging fruit that moves the needle in Redis.
- Why this quantization: Using a single max-abs per vector keeps cosine similarity fast—compute a simple scale factor and do the heavy lifting in the integer domain with unrolled loops and multiple accumulators for modern CPUs. It’s faster than min/max quantization while preserving accuracy (a minimal sketch follows this list).
- Tradeoffs he didn’t take (yet): Pointer compression could save memory (upper bytes often identical on 64-bit) but may cost latency; he hasn’t adopted it given Redis’s performance bar.
- Direction of travel: Don’t assume “evolution” just means on-disk HNSW. There’s room for fresh data-structure ideas around hierarchy, level selection, deletions, and quantization that can beat conventional wisdom.
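To make the per-vector max-abs scheme concrete, here is a minimal NumPy sketch (an illustration, not the Redis C implementation, which uses unrolled integer loops and multiple accumulators): each vector keeps int8 codes plus one scale, and because cosine similarity is scale-invariant, the per-vector scales cancel and the comparison can stay in the integer domain.

```python
import numpy as np

def quantize_maxabs_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float vector to int8 codes plus a single per-vector scale."""
    m = float(np.abs(v).max())
    scale = m / 127.0 if m > 0 else 1.0
    return np.round(v / scale).astype(np.int8), scale

def cosine_int8(qa: np.ndarray, qb: np.ndarray) -> float:
    """Cosine similarity computed in the integer domain; the scales cancel out."""
    a, b = qa.astype(np.int64), qb.astype(np.int64)  # widen to avoid overflow
    return int(a @ b) / (np.sqrt(a @ a) * np.sqrt(b @ b))

# Sanity check against full-precision cosine
rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
y = rng.standard_normal(1024).astype(np.float32)
qx, _ = quantize_maxabs_int8(x)
qy, _ = quantize_maxabs_int8(y)
ref = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(cosine_int8(qx, qy), ref)  # the two should agree to a few decimal places
```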
Why it matters: If you’re building vector search in latency-sensitive systems, quantization and careful algorithmic choices can deliver big wins without killing recall—and some revered parts of HNSW may be optional with the right tweaks. Redis Vector Sets ships with 8-bit quantization on by default for exactly these reasons.
Summary of Discussion:
The discussion around antirez's insights on scaling HNSW for Redis highlights several technical challenges, trade-offs, and alternative approaches in vector search systems:
1. Filtered Searches & Performance
- Applying metadata filters (e.g., regional constraints) during HNSW traversal can degrade performance, as it requires checking each candidate vector against filter criteria. Solutions like Turbopuffer (200ms latency for 100B vectors) and Vespa’s hybrid search were cited as addressing this, though antirez notes Redis prioritizes low latency by limiting graph traversal depth early if filters are restrictive.
- Lucene/Elasticsearch shortcuts filtering by pre-determining eligible nodes, but worst-case scenarios still involve brute-force distance comparisons.
2. Quantization & Efficiency
- Redis’s 8-bit per-vector quantization (using max-abs scaling) was praised for reducing memory usage by ~4x and speeding up searches while preserving recall. Critics noted that DiskANN and other systems achieve similar gains via int8/binary quantization but require trade-offs in recall.
- antirez clarified that Redis’s approach prioritizes CPU-friendly integer operations and avoids complex schemes like product quantization (PQ), balancing practicality with near-identical recall for most use cases.
3. Hierarchy in HNSW
- Debate arose over whether HNSW’s hierarchical layers ("H") are essential. antirez’s experiments suggest a flat, single-layer variant could suffice with higher seek times, proposing modified level selection as a middle ground. Academic references (e.g., "Hubs in HNSW") were shared, underscoring ongoing research into hierarchical efficiency.
4. Implementation Challenges
- Memory vs. Latency: Pointer compression was discussed but deemed risky for Redis’s strict latency goals.
- Single-Threaded Design: Redis’s single-threaded model influenced HNSW implementation choices, favoring simplicity and deterministic performance over parallelism.
5. Alternative Approaches
- Vespa and SPFresh were highlighted for hybrid search optimizations.
- Broader themes emerged on system design philosophy: Simplicity and "good enough" solutions (e.g., 60th vs. 72nd recall percentile) often trump theoretical perfection, especially in latency-sensitive applications like RAG.
Key Takeaway:
The discussion underscores that real-world vector search systems require pragmatic trade-offs—quantization, filtered search shortcuts, and hierarchy adjustments—to balance speed, memory, and recall. Redis’s choices reflect a focus on practical, low-latency solutions over algorithmic purity.
Adk-go: code-first Go toolkit for building, evaluating, and deploying AI agents
Submission URL | 80 points | by maxloh | 23 comments
Google open-sources ADK for Go: a code-first toolkit for building and deploying AI agents
What it is: ADK (Agent Development Kit) for Go is a modular, model-agnostic framework focused on building, evaluating, and orchestrating AI agents using idiomatic Go. It’s optimized for Gemini but works with other models and frameworks.
Why it matters: Go is a natural fit for cloud-native, concurrent systems. ADK brings a strongly typed, testable, versionable approach to agent development—aimed at production-grade workloads and multi-agent orchestration—without locking you into a specific model or deployment target.
Highlights
- Code-first and idiomatic Go: define agent logic, tools, and orchestration in code for flexibility and testability.
- Rich tool ecosystem: use prebuilt tools or wire in custom functions to extend agent capabilities.
- Multi-agent systems: compose specialized agents into larger workflows.
- Deploy anywhere: easy containerization; strong fit for Cloud Run and cloud-native environments.
- Model-agnostic, Gemini-optimized: integrates with Gemini while staying portable.
Quick start: go get google.golang.org/adk
Details: Apache-2.0 licensed, ~2.8k GitHub stars, with companion ADKs for Python and Java plus docs and samples at google.github.io/adk-docs/.
Summary of Hacker News Discussion:
Key Themes
- Go’s Strengths for AI Agents:
- Concurrency & Performance: Users highlight Go’s native concurrency (goroutines/channels) as ideal for AI agents handling parallel tasks (e.g., HTTP requests, database operations) without serialization bottlenecks. Its compiled binaries and efficiency suit cloud/serverless deployments (e.g., Cloud Run).
- Type Safety & Testability: Go’s strong typing and idiomatic design enable reliable, maintainable agent code. Some contrast this with Python’s flexibility, which can lead to runtime errors in complex systems.
- Comparison with Python/Java:
- Python ADK: Praised for simplicity (e.g., defining agents as objects with tools) and built-in features (debugging, session management). However, Go is seen as better for production-scale systems requiring strict concurrency and type safety.
- Java: Noted for enterprise-grade performance but seen as less agile for rapid agent development. Go strikes a balance between performance and developer ergonomics.
- Use Cases & Skepticism:
- Production Readiness: Users see ADK-Go as promising for multi-agent orchestration in cloud-native environments, especially with Gemini optimizations. Some question if inference latency (often model-dependent) negates Go’s runtime advantages.
- Model Agnosticism: While Gemini-optimized, the framework’s portability across models (e.g., OpenAI, Claude) is appreciated, though integration efforts vary.
- Tooling & Ecosystem:
- Prebuilt Tools: The ADK’s tool ecosystem (e.g., HTTP/SQLite connectors) simplifies agent development. Custom tool integration via Go functions is seen as a plus.
- Debugging/Orchestration: Features like session management and callbacks for data anonymization are highlighted as valuable for complex workflows.
Notable Opinions
- Rust vs. Go: A user notes Rust’s popularity but argues Go’s concurrency model is more approachable for agent development.
- Python’s Dominance: Some acknowledge Python’s hold on AI prototyping but see Go as better for scaling “script-like” agents into robust applications.
- Deployment Flexibility: Go’s compiled binaries are praised for serverless/edge deployments, with one user sharing success in production serverless functions.
Criticisms & Questions
- Learning Curve: A few users express surprise at Go’s type-driven agent definitions (similar to TypeScript) but find it manageable.
- Gemini Lock-In?: Clarified that ADK is model-agnostic, though Gemini optimizations are a focus.
Miscellaneous
- Community Excitement: Several users express enthusiasm for Go’s role in advancing multi-agent systems and cloud-native AI.
- References: Links to prior HN posts about agents and Claude’s Python implementation are shared for comparison.
Overall Sentiment: Positive, with developers seeing ADK-Go as a compelling option for building scalable, type-safe AI agents in production, particularly where concurrency and cloud-native deployment matter. Python remains favored for prototyping, but Go’s strengths in reliability and performance are seen as filling a critical niche.
Xortran - A PDP-11 Neural Network With Backpropagation in Fortran IV
Submission URL | 46 points | by rahen | 10 comments
XOR Neural Network in FORTRAN IV (RT-11, PDP-11/34A) — A delightful retrocomputing crossover: a tiny multilayer perceptron that learns XOR, written in 1970s-era FORTRAN IV and run under RT-11 on a PDP‑11/34A (via the SIMH emulator). It’s a legit backprop network: 1 hidden layer (4 neurons, leaky ReLU), MSE loss, tanh output, “He-like” Gaussian init via a Box–Muller variant, and learning-rate annealing. The whole thing trains 17 parameters and converges in minutes on real hardware (or at a realistic 500K throttle in SIMH), printing loss every 100 epochs and nailing the XOR targets. It compiles with the original DEC FORTRAN IV compiler and needs just 32 KB plus an FP11 floating-point unit. Includes an RT‑11 disk image, so you can attach it in SIMH and run, or build with .FORTRAN and .LINK. A neat proof that backprop doesn’t require modern frameworks—just patience, floating point, and a 1970s minicomputer.
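For readers who would rather skim the architecture than boot SIMH, here is a compact NumPy re-sketch of the network described above (2 inputs → 4 leaky-ReLU hidden units → 1 tanh output, MSE loss, Gaussian init, learning-rate annealing, 17 trainable parameters). It is a modern illustration, not the FORTRAN IV source; the leaky-ReLU slope, learning rate, and annealing factor are guesses.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])              # XOR targets

# He-like Gaussian init; 17 parameters total (2*4 + 4 + 4*1 + 1)
W1 = rng.normal(0, np.sqrt(2 / 2), (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, np.sqrt(2 / 4), (4, 1)); b2 = np.zeros(1)

def leaky_relu(z, a=0.01): return np.where(z > 0, z, a * z)
def leaky_relu_grad(z, a=0.01): return np.where(z > 0, 1.0, a)

lr = 0.5
for epoch in range(5000):
    # forward pass
    z1 = X @ W1 + b1; h = leaky_relu(z1)
    out = np.tanh(h @ W2 + b2)
    loss = np.mean((out - Y) ** 2)

    # backward pass (MSE loss, tanh output, leaky-ReLU hidden layer)
    d_out = 2 * (out - Y) / len(X) * (1 - out ** 2)
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * leaky_relu_grad(z1)
    dW1, db1 = X.T @ d_h, d_h.sum(0)

    # plain gradient descent with learning-rate annealing
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    lr *= 0.9995
    if epoch % 100 == 0:
        print(f"epoch {epoch:5d}  loss {loss:.6f}")

print(np.round(out, 3))  # should approach [0, 1, 1, 0]
```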
The discussion highlights a mix of nostalgia, technical insights, and historical context around retrocomputing and early neural networks:
- Retro Hardware & Neural Networks: Users reminisce about professors implementing neural networks on PDP-11s in the 1980s, noting limitations like the PDP-11/34A’s modest power (roughly comparable to an IBM XT) but praising its ability to handle sustained workloads with its FPU. References are made to historical models like the Neocognitron (1980s) and the role of VAX systems in later backpropagation research.
- FORTRAN IV Nuances: Debate arises around FORTRAN IV’s features, including its use of FORTRAN 66 extensions, lack of modern constructs (e.g., structured If/Then/Else), and reliance on hardware FPUs or software emulation. The project’s compatibility with the original DEC compiler and constraints (32 KB memory, FP11 support) sparks appreciation for its efficiency.
- Humor & Corrections: A lighthearted thread corrects Fortran version timelines (Fortran IV in 1966 vs. Fortran 77 in 1977), jokingly referencing Charles Babbage’s Analytical Engine. Another user points out the ironic overlap between PDP-11 hardware and the “Parallel Distributed Processing” (PDP) connection in neural network literature.
- Appreciation for Simplicity: Commentators laud the project for demonstrating core concepts without modern frameworks, emphasizing the value of understanding fundamentals over today’s complexity.
Overall, the exchange blends technical admiration for early computing with wry humor about its historical quirks.
AI documentation you can talk to, for every repo
Submission URL | 161 points | by jicea | 118 comments
Devin DeepWiki: turn any repo into an AI‑generated code wiki
A new tool called Devin DeepWiki promises “index your code with Devin,” letting you add a GitHub repo and get a browsable, wiki‑style view of the codebase with AI summaries and search. The demo shows a catalog of popular projects (VS Code, Transformers, Express, SQLite, React, Kubernetes, etc.) you can pick to “understand,” suggesting it pre‑indexes large OSS repos for instant exploration. The pitch is faster onboarding and code comprehension: instead of hopping across files, you get cross‑linked context and natural‑language explanations.
Why it’s interesting
- Speaks to the growing demand for AI‑first code navigation and docs, competing with tools like Sourcegraph/Cody, CodeSee, and auto‑docs generators.
- Could be useful for due diligence, learning popular frameworks, or ramping onto large legacy codebases.
What to watch
- Accuracy and hallucinations in summaries; keeping the wiki in sync with fast‑moving repos.
- Privacy/security for private code and indexing scope.
- How it handles truly large monorepos and language/tooling diversity.
The discussion around Devin DeepWiki highlights skepticism and critical feedback, focusing on accuracy, documentation integrity, and practical usability:
- Accuracy Concerns:
- Users criticize AI-generated summaries and diagrams for being outdated, incorrect, or misleading. For example, the tool inaccurately claims a VS Code extension exists, but the linked repository shows it’s experimental/unreleased.
- Debate arises over whether AI can reliably handle subjective or nuanced topics (e.g., React vs. functional frameworks, OOP vs. FP), with concerns that LLMs might reinforce biases or misinterpretations instead of clarifying them.
- Documentation Frustrations:
- The project’s own documentation is flagged as confusing or incomplete, such as installation instructions for an unreleased VS Code extension. Users note that incomplete or incorrect docs waste time and erode trust, especially for contributors trying to build/use the tool.
- A meta-point emerges: If AI-generated docs (like DeepWiki’s) are error-prone, they risk creating a “hallucination spiral” where future AI models train on flawed data, worsening accuracy over time.
- Project Transparency:
- Critics argue the demo’s pre-indexed OSS repos (e.g., VS Code, React) mask the tool’s limitations. The maintainer admits parts are experimental but defends the approach as a calculated risk.
- Some users question the ethics of promoting unfinished tools, suggesting it prioritizes hype over practicality, especially for private codebases.
- Mixed Reactions to AI’s Role:
- While some acknowledge AI’s potential to surface high-level insights, others stress that human-curated documentation remains irreplaceable for precision.
- A recurring theme: AI-generated docs might complement but not replace manual efforts, particularly in filling gaps for legacy/unmaintained projects.
Key Takeaway:
The discussion reflects cautious interest in AI-powered code navigation tools but emphasizes the need for accuracy, transparency, and human oversight. DeepWiki’s current implementation raises red flags, but its concept sparks debate about balancing automation with reliability in developer tools.
How to Train an LLM: Part 1
Submission URL | 15 points | by parthsareen | 3 comments
What it is
- A hands-on series documenting the author’s attempt to build a domain-specific LLM from scratch. Part 1 sets a clean, “boring” Llama 3–style baseline and maps out the training math, memory, and token budgeting before getting fancy.
Model and data
- Architecture: ~1.24B params, Llama 3–ish
- 16 layers, hidden size 2048, SwiGLU (×4), 32 heads with 8 KV heads (GQA), RoPE theta 500k, vocab 2^17, tied embeddings, no attention/MLP bias, norm_eps 1e-5.
- Context: targeting 4096 at the end, but trains mostly at 2048 (common practice: short context for 80–90% of steps, extend near the end).
- Data: Karpathy’s fine-web-edu-shuffled.
- No cross-document masking (for now).
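As a sanity check, the hyperparameters above can be plugged into a rough Llama-3-style parameter count (a reconstruction, not the author's code; it assumes a SwiGLU intermediate size of 4×hidden, GQA with 8 KV heads, tied embeddings, and no biases) and land close to the stated 1.24B:

```python
# Rough parameter count from the stated hyperparameters, assuming a standard
# Llama-3-style layout (this is a reconstruction, not the author's code).
layers, hidden, heads, kv_heads, vocab = 16, 2048, 32, 8, 2 ** 17
head_dim = hidden // heads                        # 64
kv_dim = kv_heads * head_dim                      # 512
ffn = 4 * hidden                                  # SwiGLU intermediate size, 8192

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q/O projections + K/V projections
mlp = 3 * hidden * ffn                            # gate, up, down
norms = 2 * hidden                                # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

embeddings = vocab * hidden                       # tied with the LM head
total = layers * per_layer + embeddings + hidden  # + final norm
print(f"{total / 1e9:.2f}B parameters")           # ≈ 1.24B
```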
Compute plan
- Hardware: 8×H100 80GB.
- Token budget: Chinchilla-style 1:20 params:tokens → ~20B tokens for a 1B model.
- Global batch target: 1M tokens (GPT-3 XL–style).
- With FP32 ballpark estimates and a 5GB “misc” reserve per GPU, each H100 fits ~7×[2048] sequences per step.
- Across 8 GPUs: micro-batch ≈ [56, 2048] = 114,688 tokens/step.
- Gradient accumulation: ceil(1,048,576 / 114,688) = 10 micro-steps per global batch.
- Steps: 20B / 1M = 20,000 optimizer updates; with accumulation, ≈200,000 forward/backward micro-steps.
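These batch and step numbers reproduce in a few lines (figures taken directly from the post; note the post rounds the 2^20-token global batch to 1M when counting optimizer updates):

```python
import math

seq_len, seqs_per_gpu, gpus = 2048, 7, 8
global_batch = 2 ** 20                           # the "1M-token" global batch (1,048,576)
target_tokens = 20_000_000_000                   # ~20B tokens (Chinchilla-style 1:20)

micro_batch = seqs_per_gpu * gpus * seq_len      # 7 * 8 * 2048 = 114,688 tokens per micro-step
accum = math.ceil(global_batch / micro_batch)    # ceil(9.14) = 10 micro-steps per update
updates = target_tokens // 1_000_000             # 20,000 optimizer updates ("20B / 1M")
micro_steps = updates * accum                    # ~200,000 forward/backward micro-steps
print(micro_batch, accum, updates, micro_steps)  # 114688 10 20000 200000
```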
Memory insights (intuition, FP32, unfused)
- Rough peaks by phase:
- Forward: Weights + Activations.
- Backward: Weights + Activations + Gradients (often the peak).
- Optimizer step: Weights + Gradients + Optimizer states (~4× params in bytes).
- Activation memory dominates at realistic batch sizes due to unfused ops saving intermediates.
- Empirical activation cost scales linearly with batch size; ~7.95GB per [1,2048] sequence in this setup.
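One set of assumptions (mine, not necessarily the author's exact accounting) that reconciles these rules of thumb with the ~7 sequences per GPU quoted in the compute plan: FP32 weights and gradients at 4 bytes/param each, Adam-style optimizer states kept resident at 8 bytes/param, the 5 GB misc reserve, and ~7.95 GB of activations per [1, 2048] sequence.

```python
# Hypothetical per-GPU memory budget; the bytes-per-parameter figures and the
# assumption that optimizer states stay resident are mine, not the author's.
GiB = 1024 ** 3
params = 1.24e9

hbm = 80 * GiB
reserved = 5 * GiB                                 # "misc" reserve
weights = grads = 4 * params                       # FP32
adam_states = 8 * params                           # m and v in FP32
act_per_seq = 7.95 * GiB                           # measured activation cost per [1, 2048]

free_for_activations = hbm - reserved - weights - grads - adam_states
print(int(free_for_activations // act_per_seq))    # -> 7 sequences per step per GPU
```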
Immediate optimizations planned
- torch.compile and FlashAttention to fuse ops and slash activations.
- Gradient accumulation (already used).
- More to come (mixed precision, custom kernels, infra upgrades).
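Gradient accumulation itself is only a few lines of PyTorch; a generic sketch with a toy model (not the author's training loop—the model, optimizer, and data here are placeholders) looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the real model: accumulate gradients over `accum` micro-steps,
# then take one optimizer step, so the effective batch is accum * micro-batch.
vocab, hidden, seq_len, micro_bs, accum = 256, 64, 32, 4, 10
model = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
optimizer.zero_grad(set_to_none=True)
for step in range(1, 101):
    tokens = torch.randint(0, vocab, (micro_bs, seq_len))   # fake token batch
    logits = model(tokens[:, :-1])                          # next-token prediction
    loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    (loss / accum).backward()          # scale so gradients average over the global batch
    if step % accum == 0:              # one optimizer update per `accum` micro-steps
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```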
Why it matters
- Clear, number-first walkthrough of how far 8×H100 can push a 1B Llama-style pretrain without exotic tricks.
- Sets a reproducible baseline before exploring “BLASphemous” optimizations, longer context, inference-friendly tweaks, and a custom “token farm.”
What’s next
- Improving training infra, scaling token throughput, extending context efficiently, and architectural changes aligned with the final task. The domain target is still under wraps.
The discussion touches on contrasting perspectives about LLM deployment and hardware requirements:
- Mobile vs. Server Debate: One user argues LLMs should prioritize optimization for mobile/portable devices (cheaper, easier maintenance) rather than expensive server infrastructure. They suggest deploying LLMs directly on phones or edge devices.
- Counterexample with Laptops: A reply highlights running a 70B-parameter LLM on a $300 laptop with 96GB RAM using tools like llama.cpp, implying powerful models can already operate on consumer-grade hardware. The user mentions purchasing the laptop for non-AI reasons, suggesting incidental compatibility with AI workloads.
- Unclear Contribution: A third comment ("cdcntnt nwbrrwd") appears fragmented or mistyped, offering no clear insight.
Key Takeaway: The exchange reflects ongoing tensions in the AI community between centralized (server-based) and decentralized (edge/mobile) LLM deployment strategies, with practical examples demonstrating feasibility on modest hardware.