AI Submissions for Thu May 15 2025
The unreasonable effectiveness of an LLM agent loop with tool use
Submission URL | 405 points | by crawshaw | 278 comments
In an exciting new development for AI-based programming assistance, Philip Zeyliger shares insights about an innovative project called Sketch, an AI programming assistant powered by an LLM (Large Language Model) and tool integration. Zeyliger and his team have distilled the process into a deceptively simple, yet highly effective, loop consisting of just nine lines of code. This loop enables the LLM to interact with tools like bash to automate and solve programming challenges with surprising ease.
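The blog post shows the loop itself; here instead is a minimal Python sketch of the same shape, assuming a hypothetical `llm` client that returns either plain text or a bash tool call (names and signatures are illustrative, not Sketch's actual code):

```python
import subprocess

def run_bash(command: str) -> str:
    """Run a shell command and hand stdout+stderr back to the model."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def agent_loop(llm, task: str, max_turns: int = 20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):  # cap turns: agents can loop without progress
        reply = llm(messages)                        # model sees full history
        messages.append({"role": "assistant", "content": reply.content})
        if reply.tool_call is None:                  # no tool requested: done
            return reply.content
        output = run_bash(reply.tool_call.command)   # run the requested command
        messages.append({"role": "tool", "content": output})
```

The whole trick is the feedback edge: tool output goes back into the message history, so the model can observe results and decide its next step.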
Sketch leverages Claude 3.7 Sonnet extensively to tackle various problems in one go, turning previously tedious tasks like esoteric git operations, type checking, and manual merges into more streamlined processes. The AI's adaptability is notable; if a tool is missing, Sketch will seek to install it and adjust to variations in command-line options seamlessly. However, it's not without quirks, sometimes humorously opting to skip failing tests rather than fixing them.
The core advantage of this AI-powered loop is its potential to handle specific and nuanced automation needs that traditional tools struggle with. The ability to correlate stack traces with git commits or to tackle sed one-liners underscores its powerful impact on improving developer workflows. Zeyliger envisions a future where custom LLM agent loops become commonplace in automating day-to-day tasks, transforming the tedium into efficiency.
For those intrigued, Zeyliger encourages readers to experiment with creating their own ad-hoc LLM agent loops by grabbing a bearer token and diving into the code. The full blog post can be found at philz.dev, where Zeyliger shares further thoughts on this promising technology and its implications for the future of programming automation.
The discussion revolves around experiences and opinions on AI-powered coding assistants like Sketch, Claude, and Aider, with a focus on their capabilities, limitations, and practical integration into workflows. Key points include:
Success Stories & Enthusiasm:
Users highlight successful implementations, such as automating git operations, type checking, or generating code with Claude 3.7 Sonnet ("impressed" with GitHub cleanup scripts). Some praise AI's ability to handle "tedious tasks" or act as a "junior partner" in coding with proper prompting.
Challenges & Skepticism:
- Reliability Issues: Agents sometimes loop endlessly, skip tests, or fail to reflect on errors, requiring human intervention ("20+ iterations no progress").
- Prompt Engineering: Users note the necessity of explicit, step-by-step instructions to guide AI behavior, akin to managing a junior developer. For example, prompts must enforce "design-first" approaches or clarify assumptions.
- Cost Concerns: API costs (e.g., Claude’s $100/month plan) and scalability are debated, though some share budget-friendly workflows ($0.20/API call scripts).
Workflow Strategies:
- Structured Guidelines: One user shares a detailed framework for AI interactions (e.g., "STYLEGUIDE.md" enforcing clarity, testing, and documentation), mirroring software engineering principles; a hypothetical excerpt follows this list.
- Hybrid Approaches: Combining AI automation with human oversight (e.g., "aggressively intercepting" execution when stuck) is seen as critical for complex projects.
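As a flavor of what such a guidelines file might contain, here is a hypothetical excerpt (invented for illustration; not the commenter's actual STYLEGUIDE.md):

```
# STYLEGUIDE.md (hypothetical excerpt)
- Propose a design and wait for approval before writing any code.
- Every new function ships with a test and a docstring.
- Ask clarifying questions rather than assuming requirements.
- Prefer small, reviewable diffs over sweeping rewrites.
```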
Tool Comparisons:
- Aider vs. Claude: Aider’s configurability and static analysis tools are contrasted with Claude’s code-generation strength.
- Ruby vs. Python: Some users advocate for Ruby's simplicity in implementing AI agents over Python’s ecosystem.
Philosophical Debates:
- Users humorously question if AI agents are evolving into "robot PMs/devs," raising concerns about job impacts.
- Optimists argue AI’s growing "reasonable effectiveness" in specific use cases could mirror early programming language adoption trajectories.
Overall, the discussion reflects cautious optimism: while AI assistants show promise in reducing grunt work, their effectiveness hinges on human guidance, careful prompt design, and balancing automation costs with productivity gains.
Show HN: Real-Time Gaussian Splatting
Submission URL | 137 points | by markisus | 48 comments
Introducing LiveSplat, the cutting-edge algorithm for real-time Gaussian splatting using RGBD camera streams, launched by developer Mark Liu. Initially part of a proprietary VR telerobotics system, the algorithm caught attention after a Reddit post showcasing its capabilities. Now, LiveSplat makes its debut as an independent project. Although still in alpha phase, this tool promises to transform RGBD data into stunning visual outputs in real-time, using up to four RGBD sensors.
LiveSplat offers a glimpse into its potential for various applications, from improving VR experiences to advancing robotic perception. While the tool isn't open source, Liu invites businesses interested in incorporating this technology to contact him for licensing opportunities.
Designed for systems running Python 3.12+ on Windows or Ubuntu with an Nvidia GPU, LiveSplat requires some integration to connect your RGBD streams. A ready-made script for Intel Realsense devices is included to help users get started.
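LiveSplat's internals are closed, but the integration work it asks for is standard pinhole-camera math: turning aligned color and depth frames into 3D points that can seed splats. A minimal NumPy sketch under that assumption (the intrinsics fx, fy, cx, cy come from your camera's calibration; LiveSplat's actual callback API is not shown):

```python
import numpy as np

def rgbd_to_points(depth, color, fx, fy, cx, cy):
    """Back-project an aligned RGBD frame into a colored 3D point cloud.

    depth: (H, W) float32 array in meters; color: (H, W, 3) uint8 array.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = color.reshape(-1, 3)
    valid = points[:, 2] > 0                        # drop pixels with no depth
    return points[valid], colors[valid]
```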
Join the LiveSplat community on Discord for assistance, inspiration, and to see the remarkable demo video showcasing its capabilities. Whether you're a hobbyist or a company eager to push the boundaries of RGBD processing, LiveSplat opens exciting new possibilities. Dive in and explore the future of real-time 3D streaming today!
The Hacker News discussion around LiveSplat highlights both enthusiasm for its real-time Gaussian splatting capabilities and technical curiosity about its implementation. Here's a concise summary:
Key Discussion Themes:
Technical Insights & Comparisons
- Users noted the demo’s resemblance to 3D point clouds but highlighted improvements, such as reduced artifacts and view-dependent effects.
- Comparisons were drawn to NeRFs (Neural Radiance Fields) and traditional point cloud rendering. Gaussian splatting was praised for enabling real-time, photorealistic 3D reconstruction by leveraging RGBD data and gradient-based optimization.
- The speed (33ms processing time) was contrasted with slower methods like InstantSplat (minutes to hours), emphasizing LiveSplat’s potential for live applications.
Demo Clarifications
- Some users were confused about the demo’s visuals, questioning whether it showed real-time conversion of RGBD streams or post-processed results. Developer mrkss clarified that the system dynamically converts live camera views into Gaussian splats, with the demo screen-recorded from a running system.
Applications & Potential
- Excitement centered on uses in VR/AR, robotics, and creative fields (e.g., stylized 3D worlds, interactive 4D canvases). One user imagined blending Gaussian fields with diffusion models for artistic tools.
- Questions arose about handling dynamic scenes (not just static environments) and temporal consistency, with the developer noting temporal accumulation as a future focus.
Technical Challenges
- Users debated limitations, such as handling sparse data, view-dependent effects from single/multiple cameras, and the role of neural networks in interpolating colors.
- The reliance on RGBD input (vs. 2D-only) was seen as key for geometry optimization and real-time performance.
Licensing & Accessibility
- While not open-source, LiveSplat’s licensing model for businesses sparked interest. The developer invited collaboration, particularly for enterprise applications in VR, robotics, or graphics.
Developer Responses:
- mrkss addressed technical queries, explaining how RGBD data bypasses traditional optimization bottlenecks and enables real-time rendering.
- Acknowledged current alpha-stage limitations (e.g., pixelation in low-resolution areas) but emphasized the system’s foundational advancements over point clouds.
Community Sentiment:
The thread reflects a mix of admiration for the technical achievement and curiosity about practical implementation. While some users sought deeper technical details, others envisioned transformative applications in gaming, virtual production, and beyond. Critiques focused on demo clarity and scalability, but overall, LiveSplat was seen as a promising leap in real-time 3D reconstruction.
Show HN: A free AI risk assessment tool for LLM applications
Submission URL | 31 points | by percyding99 | 11 comments
Today's digest includes a spotlight on a new tool making waves on Hacker News: TavoAI's AIRiskOps assessment tool. The tool is designed to provide users with insights into operational risks associated with artificial intelligence—a growing concern in today's increasingly automated landscape. Users can access the tool by signing in with their GitHub accounts, which streamlines onboarding and ensures a secure connection. By using AIRiskOps, individuals agree to abide by the service's Terms of Service and Privacy Policy. This development highlights the tech community's ongoing efforts to address AI transparency and safety, marking a significant step toward responsible AI management.
Summary of Discussion:
Security Standards & Enterprise Expectations:
- Users highlighted the importance of aligning the tool with enterprise security frameworks like SOC 2 and ISO 27001, emphasizing the need for clear data points and compliance processes for large organizations.
Privacy Link & Data Usage Clarification:
- A broken privacy policy link was flagged and promptly fixed by the developer (percyding99). Users inquired about secondary repositories being used for training data, which the developer clarified are not utilized, ensuring transparency.
GDPR Compliance Concerns:
- Feedback noted potential misalignment with GDPR regulations, pointing out that GDPR focuses on "personal data" (not just PII) and requires pseudonymization for compliance. The developer acknowledged the feedback, stating the tool is in early stages and requires further testing for regulatory adherence.
Target Audience Debate:
- A discussion emerged about whether the tool should prioritize enterprises (for compliance needs) or hobbyists/small businesses (seeking affordability and creativity).
- Developers indicated a focus on enterprises but expressed interest in exploring hobbyist use cases. Critics argued hobbyists may not pay, while others noted regulated industries would value compliance features.
Developer Responsiveness:
- The developer actively addressed concerns, fixed issues (e.g., broken links), and engaged with feedback on compliance and market strategy, acknowledging potential pivots if assumptions about regulated industries prove incorrect.
Key Themes:
- Compliance and Security dominate enterprise concerns.
- Transparency in data handling and regulatory alignment is critical.
- Market Focus debates highlight tensions between enterprise rigor and hobbyist accessibility.
The discussion reflects a tool in evolution, balancing user feedback with strategic goals for AI risk management.
Stop using REST for state synchronization (2024)
Submission URL | 51 points | by Kerrick | 26 comments
In a recent blog post, the author critiques the prevalent use of REST for client-server communication in web app development, arguing that most applications actually require state synchronization rather than state transfer. This distinction is crucial because it highlights the limitations of REST in handling dynamic user interactions efficiently.
The author shares their experience of building web apps during a sabbatical using React and TypeScript for the frontend and Rust with the Axum library for the backend. Despite this modern tech stack, they found the approach cumbersome and brittle due to the REST protocol's inherent complexity in synchronizing state changes between the frontend and backend.
Illustrated with a common web app scenario—a text input that syncs with a backend database—the discussion reveals how REST necessitates writing repetitive boilerplate code to handle fetching, updating, and error management. More critically, REST can inadvertently introduce bugs, especially in scenarios with concurrent requests. For instance, if two quick successive text changes ("A" to "B") are made, REST’s lack of guarantees on request order and concurrency could lead to the first change overwriting the second in the database, contrary to user intent.
To mitigate these issues, developers often employ workarounds like disabling inputs during in-flight requests or queuing requests. However, this either compromises user experience or slows down server communication.
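A related mitigation is to tag each write with a client-side sequence number and let the server drop stale requests; a minimal sketch (not code from the article, and it assumes a single writer per field):

```python
store = {"text": "", "seq": 0}

def put_text(new_text: str, seq: int) -> bool:
    """Apply a write only if it is newer than the stored one.

    The client increments seq on every change it sends, so a delayed
    "A" (seq=1) arriving after "B" (seq=2) is detected and rejected.
    """
    if seq <= store["seq"]:
        return False                 # stale or duplicate request
    store["text"], store["seq"] = new_text, seq
    return True

put_text("B", seq=2)   # second keystroke arrives first
put_text("A", seq=1)   # first keystroke arrives late: rejected
assert store["text"] == "B"
```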
The article advocates for transitioning from REST to state synchronization protocols better suited for real-time updates and consistent state handling, aligning system architecture with modern application needs and offering a more robust and responsive user experience.
The Hacker News discussion around the critique of REST for client-server communication highlights several key debates and perspectives:
Core Critique of REST
- Participants agree that REST struggles with real-time state synchronization, especially for dynamic UIs requiring concurrent updates. Issues like request ordering conflicts and over-reliance on boilerplate code are cited as limitations.
Alternative Solutions
- CRDTs (Conflict-Free Replicated Data Types) and OT (Operational Transformation) are proposed for resolving conflicts in distributed systems, but their complexity and steep learning curve make implementation daunting, particularly for existing systems not designed for multiplayer/multi-writer scenarios; a minimal CRDT sketch follows this list.
- The Braid Project is highlighted as a promising extension to HTTP, aiming to transform it into a state synchronization protocol. It offers backward compatibility with existing HTTP infrastructure and avoids forcing developers to adopt entirely new protocols like WebSockets or GraphQL.
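For a sense of why CRDTs sidestep the ordering problem, here is a minimal last-writer-wins register in Python (one of the simplest CRDTs; real libraries like Yjs use far richer structures):

```python
import time

class LWWRegister:
    """Last-writer-wins register: merge is commutative, associative,
    and idempotent, so replicas converge regardless of message order."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.value = None
        self.stamp = (0.0, replica_id)   # (timestamp, replica id) breaks ties

    def set(self, value):
        self.stamp = (time.time(), self.replica_id)
        self.value = value

    def merge(self, other: "LWWRegister"):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

a, b = LWWRegister("a"), LWWRegister("b")
a.set("A"); b.set("B")
a.merge(b); b.merge(a)               # deliver updates in either order
assert a.value == b.value            # both replicas converge
```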
Industry Realities
- Many argue that companies continue using REST or GraphQL due to familiarity, even if these tools don't fully address state-sync challenges. Examples include AWS API Gateway with WebSockets and DynamoDB for real-time updates, though costs and operational complexity remain barriers.
- Electric SQL and Yjs are noted as tools easing CRDT adoption, but users warn of pitfalls (e.g., schema migration, document-size management) and the mental overhead of maintaining synchronization.
Skepticism and Practical Challenges
- Some question the necessity of abandoning REST entirely, arguing most apps don’t need CRDTs’ guarantees. Retrofitting state-sync into existing systems is seen as risky or overkill for non-collaborative apps.
- Debates arise over REST’s original definition (per Roy Fielding) versus its misuse in practice, with many "RESTful" APIs diverging from Fielding’s standards.
Implementation Hurdles
- Handling schema changes, versioning, and ensuring client compatibility in CRDT-based systems is nontrivial. Users share war stories, like Yjs throwing errors when documents grow too large, requiring careful data chunking and storage strategies.
- The Braid Project’s promise of native HTTP-based state sync is tempered by concerns about industry adoption and the inertia of existing REST/GraphQL ecosystems.
Conclusion
The discussion underscores a gap between theoretical solutions (CRDTs, Braid) and practical implementation realities, with many advocating for context-specific choices rather than a one-size-fits-all approach. While alternatives to REST show promise, challenges around complexity, cost, and industry readiness persist.
A Tiny Boltzmann Machine
Submission URL | 249 points | by anomancer | 43 comments
The fascinating realm of Boltzmann Machines (BMs) has taken center stage in the AI landscape once again. These machines, among the earliest generative AI models introduced back in the 1980s, have been revitalized in a bite-sized, browser-friendly format. At their core, BMs are designed for unsupervised learning, enabling them to conjure new data akin to the training samples without explicit guidance.
Delving deeper, a Boltzmann Machine operates by harmonizing with the physics of energy systems. It consists of interconnected neurons that either carry a signal (turned on) or do not (turned off), with the connectivity or "weights" influencing the machine's learning process. Some neurons are visible and interact directly with inputs, while others remain hidden, playing a crucial role in generating complex patterns.
The two main flavors of these neural networks are the General Boltzmann Machine, where all neurons interlace, and its more streamlined sibling, the Restricted Boltzmann Machine (RBM). The RBM simplifies learning by ensuring neurons within the same layer don't connect, making the model not only quicker to train but also easier to interpret.
The driving force behind a Boltzmann Machine's learning capability lies in its energy-based model. Essentially, it minimizes energy to understand and generate data, with the energy ebbs and flows being calculated through a specific equation involving visible and hidden neuron states, weights, and biases.
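The equation in question is the standard RBM energy, for visible units v, hidden units h, weights W, and biases a (visible) and b (hidden):

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j
```

Lower energy corresponds to higher probability: the machine assigns probability proportional to e^(-E(v, h)), so training amounts to lowering the energy of configurations that look like the data.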
Training a Boltzmann Machine involves a procedure called Contrastive Divergence, where the machine trains on samples by adjusting weights to align its output closely with input samples. It's a step-by-step dance of clamping visible units to data and shaping the hidden ones to reinforce learning. The ultimate goal is to have the output mirror the input as accurately as possible.
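A minimal NumPy sketch of one CD-1 update for a binary RBM (learning rate and sampling details are illustrative; the article's simulator may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update.

    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    a, b: visible and hidden biases. Returns the updated parameters.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Positive phase: clamp visible units to the data, sample hidden units.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct visibles, then recompute hidden activations.
    v1 = (rng.random(v0.shape) < sigmoid(h0 @ W.T + a)).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Move toward the data statistics and away from the model's own samples.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```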
For hands-on enthusiasts, the journey unfolds with an online simulator where you can watch as the RBM hones its weights and lowers energy over time. The simulator showcases the transformation from initial mismatched states to eventually converging to a stable configuration where the output mirrors the input data.
For those raring to explore, the appendix provides an in-depth look at the Contrastive Divergence algorithm, ideal for anyone diving deeper into the mathematical underpinnings of these neural networks. Whether you're an AI aficionado or a curious coder, Boltzmann Machines offer an intriguing window into the intricacies of machine learning's past and present.
The discussion surrounding the resurgence of Boltzmann Machines (BMs) and Restricted Boltzmann Machines (RBMs) touched on several themes:
- Historical Context: Users highlighted foundational work by researchers like Smolensky, Hinton, and Rummelhart, with references to pivotal papers and the evolution of energy-based learning models.
- Technical Nuances: Debates arose around training methods (e.g., Contrastive Divergence vs. Gibbs sampling), structural differences between BMs/RBMs and feed-forward networks, and the challenges of probabilistic sampling. A subthread critiqued the article's title for conflating BMs with the cosmological "Boltzmann Brain" concept, sparking speculative tangents about quantum computing and AI.
- Simulator Feedback: Praise was given for the interactive RBM demo, though some noted scrolling issues on mobile, which the author addressed.
- Research Investment: A tangent debated U.S. R&D spending, with users citing Wikipedia data and critiquing short-term business priorities over long-term research.
- Nostalgia & Applications: Longtime practitioners reminisced about 1990s implementations (e.g., music recognition systems) and shared links to related projects, including AI music generation and educational neural network content.
- Queries & Corrections: Users flagged typos, clarified RBM architecture (visible/hidden layer connectivity), and requested deeper dives into Bayesian methods.
Overall, the thread blended technical insights, historical perspectives, and lighthearted critiques, reflecting both admiration for BMs’ simplicity and curiosity about their modern relevance.
Show HN: Min.js style compression of tech docs for LLM context
Submission URL | 174 points | by marv1nnnnn | 52 comments
Hello, tech enthusiasts! Today, we dive deep into a fascinating new initiative shaking up the AI world—meet "llm-min.txt," a project aimed at revolutionizing how AI assistants process technical documentation. Led by marv1nnnnn and currently boasting over 400 stars on GitHub, this project is all about making AI smarter and more efficient in handling up-to-date tech docs.
The Problem: AI's Knowledge Lag
AI models, even the sharpest like GitHub Copilot, often struggle with the latest updates in programming libraries due to their "knowledge cutoff" dates. This leads to inaccurate suggestions and broken code, since software evolves faster than these models can learn.
Previous Solutions and Their Shortcomings
Efforts like llms.txt and Context7 have tried to bridge this gap by providing structured documentation formatted specifically for AI use. However, these approaches come with limitations: large file sizes that exceed AI context windows and the "black box" nature of some services which reduces transparency.
Enter llm-min.txt: A New Hope for Efficient AI Comprehension
Inspired by the compact efficiency of min.js files in web development, llm-min.txt applies a similar strategy to tech documentation. Instead of a verbose manual, llm-min.txt leverages AI to distill these documents into super-condensed summaries. These summaries carry only the most essential data, perfectly optimized for machine parsing, making it lean yet powerful for AI assistants to process.
The Machine-Optimized Format: Structured Knowledge Format (SKF)
The llm-min.txt files are formatted in SKF, a compact structure that's better suited for machines than for humans. Here's a glimpse into its elements:
- Header Metadata: Includes critical contextual details, like the original documentation source and creation timestamp.
- DEFINITIONS Section: Covers static aspects like class definitions, properties, and inheritance structures.
- INTERACTIONS Section: Details dynamic behaviors such as method interactions, usage patterns, and error handling.
- USAGE_PATTERNS Section: Offers concrete examples of library use, breaking down workflows into easily digestible steps.
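The exact grammar is defined by the project, but a hypothetical fragment in the spirit of those sections might look like this (invented names and syntax, not actual llm-min.txt output):

```
# llm-min.txt (hypothetical fragment)
SOURCE: https://example.com/somelib/docs | GENERATED: 2025-05-15
DEFINITIONS:
  Client(cls) props: base_url:str, timeout:int; inherits: BaseClient
INTERACTIONS:
  Client.fetch(path) -> Response; raises: TimeoutError on timeout
USAGE_PATTERNS:
  init: c = Client(base_url=...) ; call: c.fetch("/items")
```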
Why It Matters
In a world where accuracy and up-to-dateness are imperative for coding teams and AI tools alike, llm-min.txt presents a promising solution. By minimizing token consumption while maximizing information value, this approach represents a significant leap forward in AI knowledge management.
Whether you're a tech enthusiast, an AI developer, or just someone curious about the future of AI tools, llm-min.txt is definitely worth keeping an eye on. Contribute, learn, and explore how this initiative could shape the next generation of code-assisting AI models.
Summary of Hacker News Discussion on "llm-min.txt":
The discussion around the llm-min.txt project highlights enthusiasm for its goal of compressing technical documentation for AI efficiency, alongside critical questions and skepticism. Key points include:
Positive Reactions & Interest
- Token Reduction Success: Users praised the 92% reduction in token usage, which could significantly speed up AI workflows (e.g., Google AI Studio integration).
- Practical Applications: Developers shared use cases, such as integrating compressed docs with tools like Claude Code or React Router, to improve AI-assisted coding.
- Related Projects: Mentions of similar efforts, like a prompt compression contest and Microsoft’s KBLaM (external knowledge integration for LLMs), suggest a growing interest in this space.
Critiques & Concerns
Lack of Benchmarks:
- Users expressed disappointment at the absence of rigorous benchmarks comparing llm-min.txt to raw documentation or alternatives like Context7.
- Skepticism arose about claims that AI performance with compressed docs matches uncompressed versions, with calls for objective metrics (e.g., accuracy in code generation).
Format Readability & Hallucinations:
- Concerns that the Structured Knowledge Format (SKF) might be too machine-focused, risking misinterpretation by LLMs or hallucinations.
- Debates emerged about whether LLMs can reliably parse compressed formats without human-readable context.
Transparency & Guidelines:
- Critiques of the project’s llm_min_guideline.md for lacking clarity, with users urging better documentation to ensure consistent AI interpretation.
Project Lead Responses
- marv1nnnnn acknowledged challenges in evaluation design and emphasized iterative improvements.
- They defended the approach as a "first step," highlighting the balance between compression and retaining essential information.
Technical Debates
- SKF’s Novelty: Questions about whether SKF introduces a new knowledge representation standard or builds on existing frameworks.
- Human vs. Machine Formats: Some argued that LLMs inherently prefer natural language over highly structured formats, complicating adoption.
Community Contributions
- Developers shared experiments with AI tools (e.g., Claude, Gemini) and workflows for real-time doc integration, underscoring demand for solutions but noting gaps in reliability.
Conclusion: While llm-min.txt shows promise in addressing AI’s "knowledge lag," the discussion reflects a cautious optimism. Success hinges on transparent benchmarks, clearer guidelines, and addressing LLMs’ unpredictable behavior with compressed formats. The project’s evolution will likely depend on community feedback and real-world testing.
LLMs get lost in multi-turn conversation
Submission URL | 362 points | by simonpure | 246 comments
In today's Hacker News roundup, arXiv, the revered open-access repository for scientific papers, has exciting news: they’re on the hunt for a new DevOps Engineer. This is a golden opportunity to be a part of an essential platform for open science, impacting one of the most significant websites in the scientific community.
Meanwhile, a new study titled "LLMs Get Lost in Multi-Turn Conversation," authored by Philippe Laban and his colleagues, delves into the challenges faced by Large Language Models (LLMs) in multi-turn dialogues. These advanced chatbots shine when handling single-turn, fully-specified instructions but stumble significantly when engaging in prolonged conversations—showcasing a striking 39% performance drop across various tasks. The researchers identified that the models often make premature assumptions and fail to recover when they stray off course. This important finding underscores the complexity of human-like conversation modeling and poses intriguing possibilities for further AI advancement.
For those eager to explore the intricacies of AI conversations and contribute to cutting-edge developments in open science, arXiv houses this groundbreaking research paper alongside a unique career opportunity. Discover the full job description and paper online to see how you might engage with these exciting developments.
The discussion on HackerNews revolves around the challenges and practical applications of Large Language Models (LLMs) in technical contexts, sparked by a study highlighting their struggles with multi-turn conversations. Key themes include:
Context Management & Recovery Issues:
Users confirm that LLMs like Gemini often falter in prolonged conversations, struggling to maintain context or recover from errors. One user shared an example where debugging IPSec configurations required manually feeding logs and iterating with the model to resolve issues. Clear, concise context and structured feedback loops were critical for success.
Practical Use Cases:
- Debugging & Code Fixes: Users reported using LLMs to troubleshoot code (e.g., fixing a PPP driver in Zephyr OS) by pasting logs, decoding hex dumps, and referencing RFC documents. However, models occasionally missed critical details (e.g., specific RFC sections), requiring human verification.
- Documentation & Knowledge Compression: LLMs were praised for distilling complex information (e.g., large codebases or documentation) into actionable insights, though outputs sometimes lacked precision.
Debate: Tool vs. Learning Aid:
- Critics argued that over-reliance on LLMs risks bypassing foundational learning, likening it to using a calculator without understanding arithmetic.
- Proponents countered that LLMs act as "accelerators" for experienced developers, helping identify patterns, optimize workflows, and navigate large systems—complementing, not replacing, expertise.
Philosophical Reflections:
The "Chinese Room" argument resurfaced, with users debating whether LLMs truly "understand" context or merely mimic it through statistical patterns. Some noted parallels to how humans process information instinctively versus LLMs starting "from scratch" in each interaction.
Model Comparisons & Workflows:
- Mixed results were noted across models (Gemini, Claude, GPT), with Gemini praised for handling large context windows but criticized for occasional inaccuracies.
- Users emphasized iterative prompting, cross-referencing outputs, and combining models (e.g., using Claude for rewrites, GPT for API integrations) to mitigate limitations.
Takeaway: While LLMs are powerful tools for specific tasks, their effectiveness hinges on human guidance, context curation, and validation—especially in complex, multi-step problem-solving. The discussion underscores a balance between leveraging AI efficiency and maintaining deep technical understanding.
Show HN: Heygem AI – An Open Source, Free Alternative to Heygen AI
Submission URL | 23 points | by heygem-ai-new | 3 comments
In today's roundup on Hacker News, there's buzz surrounding the "Duix.Heygem" project on GitHub, particularly its impressive traction within the community. With 1.4k forks and 8.4k stars, this repository has captured the interest of many developers, and others are keen to understand what makes Duix.Heygem so appealing. Keep an eye on this space for user-generated solutions and workarounds, as well as updates from the repository's contributors.
Summary of Discussion:
The discussion around the Duix.Heygem project includes three key points:
- Setup Instructions: A user (djfbbz) notes that instructions for running the project on Google Colab are available.
- Skepticism About Engagement: Yiling-J raises concerns about potential artificial inflation of GitHub stars, linking to accounts (Hammerock, MacKeepUS, Hirako) and projects like WuKongOpenSource and Heygem. They suggest a "90% chance" these stars are fake or bot-generated, casting doubt on the project's organic traction.
- EULA Concerns: Another user (ndrr) hints at possible issues with the project's End User License Agreement (EULA), describing it as "tsty" (likely "testy" or contentious).
This discussion adds context to the original submission, highlighting both technical guidance and community skepticism about the project's legitimacy and legal terms.
If AI is so good at coding where are the open source contributions?
Submission URL | 72 points | by thm | 36 comments
In today's digest from Hacker News, we're diving into the skepticism surrounding AI's ability to replace human programmers—starting with claims from tech giants like Microsoft and Meta. Despite lofty assertions from CEOs like Satya Nadella and Mark Zuckerberg about AI-generated code potentially forming a significant chunk of their companies' future programming efforts, critics demand proof. The open-source community, where any developer can scrutinize and contribute to code, poses the perfect transparency test for AI contributions. Yet, signs of AI's presence in meaningful, complex open-source contributions remain scant.
Java expert Ben Evans challenges the AI coding hype by asking, "Where are the AI-driven pull requests for non-obvious, non-trivial bugs in mature open-source projects?" His call has seen limited actionable responses. Contributions like one AI-assisted pull request to the Rails project required human refinement, while another experiment in the Servo project went through over a hundred revisions due to basic errors.
Interestingly, experiments such as the Cockpit project using AI tools for code reviews revealed more noise than value—pointing to AI’s current limitations. Furthermore, the pushback from the open source community is partly due to inexperienced users flooding projects with subpar AI-generated submissions, creating more chaos than aid. With some projects even banning AI-generated "contributions" due to low-quality outputs and misuse, the gap between AI ambition and practical, high-value coding remains a talking point.
Ultimately, until AI consistently produces quality results beyond trivial tasks or operates autonomously without extensive human oversight and correction, skepticism will persist. The challenge isn't just for AI to code, but to do so at a level that convinces seasoned developers of its worth, while not alienating the community it seeks to serve.
Summary of Discussion:
The Hacker News discussion reflects skepticism about AI's current ability to meaningfully contribute to open-source projects, despite hype from tech leaders. Key points include:
Lack of Evidence for Non-Trivial Contributions: Critics highlight the absence of AI-driven pull requests addressing complex, non-obvious bugs in mature projects. Examples like an AI-assisted Rails PR requiring human refinement and a Servo experiment with 100+ error-prone revisions underscore AI’s limitations in context-aware problem-solving.
Licensing and Copyright Concerns: AI-generated code faces legal ambiguity. Contributors note that open-source projects often require copyright assignments, which AI cannot provide. Licensing compatibility (e.g., AGPL) is also questioned, as AI tools may inadvertently reproduce code without proper attribution, risking legal issues.
Noise vs. Value: Tools like GitHub Copilot are criticized for generating low-quality, "noisy" contributions, with inexperienced users flooding projects with flawed AI code. Some projects now ban AI submissions to avoid maintenance burdens.
Gradual Improvement vs. Hype: While some acknowledge incremental progress (e.g., AI rewriting 2/3 of a codebase in one experiment), most argue current tools are best for narrow, well-defined tasks. The gap between marketing claims ("30% of code is AI-written") and tangible results remains stark.
Community Resistance: Developers resist AI’s role due to fears of degraded code quality and legal risks. The consensus is that until AI operates autonomously at a high level—without extensive human oversight—it will remain a supplementary tool, not a transformative force.
Conclusion: The discussion emphasizes that AI’s coding potential is still aspirational, hindered by technical, legal, and cultural barriers. For now, human expertise remains irreplaceable in open-source ecosystems.