Hacker News
Daily AI Digest

Welcome to the Hacker News Daily AI Digest, where you will find a daily summary of the latest and most intriguing artificial intelligence news, projects, and discussions among the Hacker News community. Subscribe now and join a growing network of AI enthusiasts, professionals, and researchers who are shaping the future of technology.

Brought to you by Philipp Burckhardt

AI Submissions for Wed Jul 09 2025

Perplexity launches Comet, an AI-powered web browser

Submission URL | 14 points | by gniting | 3 comments

Perplexity has just launched Comet, its ambitious new AI-powered web browser designed to give Google Search a run for its money. As the latest in a series of bold initiatives from the startup, Comet debuts with its AI search engine at the forefront, alongside Comet Assistant—an AI agent keen on streamlining everyday digital tasks. Initially available to those on the $200-per-month Max plan and select waitlist invitees, Comet intends to empower users by summarizing emails, organizing calendar events, and smoothly managing web browsing.

At the heart of Comet is Perplexity’s AI search engine, delivering concise summaries of search results directly to users. The browser further integrates the Comet Assistant, a persistent AI companion capable of managing tabs, summarizing inboxes, and even guiding users through web navigation without the hassle of jumping between windows. This potentially robust AI assistant, however, requires significant access permissions to perform effectively, a factor that may cause some users to hesitate.

Despite the challenges, CEO Aravind Srinivas has high hopes for Comet, viewing it as crucial in Perplexity's quest to bypass Google Chrome’s dominance and courageously step into the competitive world of browsers. This move aligns with the overarching goal of developing a browser that could become the primary platform for user activities—a vision of "infinite retention" by embedding the AI deeply into the daily digital routine.

But the journey won't be easy, as the browser arena is already packed with strong contenders like Google Chrome and Apple’s Safari. Even rivals like The Browser Company with its AI-powered Dia browser and speculated ventures from OpenAI make the space highly competitive. Though Comet hopes to build momentum on Perplexity’s recent traction, convincing users to switch browsers and abandon the familiarity of Google presents a formidable challenge.

In early tests, Comet Assistant shines on straightforward queries, but its performance dims as tasks grow more complex, and the privacy it asks users to trade for functionality may deter some. Still, its tightly integrated browsing assistance is notably beneficial, particularly for email and calendar management—a step forward for those accustomed to manually relaying information to an AI like ChatGPT.

As Comet steps into this lively ecosystem, its innovation and expanded tools offer a fresh take on web browsing, although persuading users to fully embrace it remains a daunting task. Nonetheless, Perplexity’s robust approach and fast-paced developments hint at a spirited fight ahead in the browser battleground.

The discussion around Perplexity’s new Comet browser highlights a mix of cautious optimism and skepticism. Users note that Comet appears to be a Chromium-based wrapper enhanced with AI features, raising questions about its innovation compared to existing browsers.

Key points from the conversation include:

  • YouTubers promoting Comet for simplifying tasks like meal planning, grocery-list generation, and research automation, though actual user testing remains limited.
  • Skepticism about whether the AI can consistently deliver on these promises, with one user admitting they haven’t personally tested it but expressing doubts about reliability (e.g., "things done automatically [are] supposedly successful... but haven’t tested").
  • Speculation about AI’s broader potential to transform daily workflows and productivity, coupled with uncertainty about whether Comet’s implementation lives up to the hype.
  • Comparisons to Chromium underscore debates about whether Comet offers meaningful differentiation in a crowded market.

Overall, while there’s interest in Comet’s AI-driven vision, users remain hesitant until real-world performance verifies its utility and reliability.

Biomni: A General-Purpose Biomedical AI Agent

Submission URL | 215 points | by GavCo | 32 comments

In an exciting development from Stanford University, Biomni has emerged as a versatile game-changer in the biomedical research landscape. Described as a "general-purpose biomedical AI agent," Biomni is a powerful tool tailored to revolutionize research by autonomously executing a wide array of complex tasks across various biomedical fields.

Key to Biomni's prowess is its integration of cutting-edge large language model (LLM) reasoning with retrieval-augmented planning and code-based execution. This combination significantly amplifies research productivity and assists scientists in formulating testable hypotheses with increased efficiency.
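The digest stays high-level, but the retrieve-plan-execute pattern it describes can be sketched generically. The Python below is a hypothetical illustration of that loop; the llm, retrieve_tools, and run_code helpers are placeholders for the pattern, not Biomni's actual API.

```python
import io
import contextlib

def run_code(code: str) -> str:
    """Toy executor: run generated Python and capture stdout (a real agent would sandbox this)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def retrieve_tools(task: str, tool_index: dict[str, str], k: int = 3) -> list[str]:
    """Toy retrieval: rank tool descriptions by keyword overlap with the task."""
    def overlap(desc: str) -> int:
        return len(set(task.lower().split()) & set(desc.lower().split()))
    ranked = sorted(tool_index, key=lambda name: overlap(tool_index[name]), reverse=True)
    return ranked[:k]

def run_agent(task: str, llm, tool_index: dict[str, str], max_steps: int = 5) -> str:
    """Plan with an LLM over retrieved tools, then execute the code it writes."""
    history = []
    for _ in range(max_steps):
        tools = retrieve_tools(task, tool_index)              # retrieval-augmented planning
        prompt = (f"Task: {task}\nRelevant tools: {tools}\n"
                  f"History: {history}\nWrite Python to make progress, or reply DONE.")
        code = llm(prompt)                                    # LLM reasoning step
        if code.strip() == "DONE":
            break
        history.append({"code": code, "result": run_code(code)})  # code-based execution
    return llm(f"Summarize the outcome of: {task}\nHistory: {history}")
```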

For those eager to dive in, the environment setup is conveniently streamlined through a single script, preparing users to harness Biomni's capabilities right away. Example tasks include planning CRISPR screens or predicting the ADMET properties of compounds, demonstrating the tool’s broad scope and utility.

Engagement with the community is a vital aspect of Biomni's ecosystem, welcoming contributions ranging from new tools and datasets to software integrations and performance benchmarks. A collaborative spirit is particularly encouraged with the upcoming development of Biomni-E2, envisioned to push the boundaries of what's possible in the biomedical domain. Notably, contributors making substantial impacts may receive co-authorship on future scholarly work.

Biomni is openly licensed under Apache-2.0, although users should be vigilant about the licensing of specific integrated tools. As it stands, Biomni represents a leap forward in AI-driven biomedical innovation, poised to streamline and enhance scientific discovery processes. For more on how to get involved or use Biomni, the community can explore detailed tutorials and engage with the AI through its web interface.

The Hacker News discussion around Biomni highlights a mix of enthusiasm, skepticism, and critical questions about its implications and technical approach:

Praise and Excitement

  • Several users (e.g., frdmbn, pnb, pstss) express optimism about AI's potential to accelerate biomedical research, particularly in identifying patterns, genomic analysis, and drug discovery. Biomni’s integration of RAG (Retrieval-Augmented Generation) and code-based execution is seen as a promising step.
  • Tools like PaperAI and PaperETL are referenced as complementary projects for literature review, suggesting interest in AI-driven research pipelines.

Skepticism and Concerns

  • Misuse Risks: User andy99 raises ethical concerns about AI enabling bioweapon development, though grzy counters that technical barriers (e.g., specialized skills, equipment) and real-world failures (e.g., the Tokyo sarin attack) make large-scale threats unlikely.
  • Utility Debate: Some question Biomni’s practicality. SalmoShalazar dismisses it as "needless wrappers around LLM API calls," sparking debate about whether domain-specific wrappers (e.g., legal or biomedical workflows) constitute meaningful innovation. teenvan_1995 questions the utility of 150+ tools without real-world validation.
  • Technical Limitations: Critiques focus on potential hallucinations, data formatting challenges, and reliance on LLMs’ reliability, with examples from legal AI tools producing flawed outputs (mrlngrts, slacktivism123).

Comparative Perspectives

  • Projects like ToolRetriever and domain-specific SaaS tools are cited as alternatives, emphasizing the importance of context-aware tool selection and integration.
  • ImaCake and others caution against hype-driven adoption, framing Biomni as part of a trend where institutions prioritize marketing over substance.

Broader Implications

  • Discussions highlight divergent views: Optimists see AI democratizing research (gronky_), while skeptics stress the need for verifiable results and domain expertise. Mixed reactions reflect the broader AI community’s tensions around innovation versus practicality.

In summary, Biomni sparks hope for a biomedical AI revolution but faces scrutiny over ethics, technical execution, and whether its approach transcends existing tools. The debate underscores the challenges of balancing ambition with real-world applicability in AI-driven research.

HyAB k-means for color quantization

Submission URL | 41 points | by ibobev | 16 comments

Pekka Väänänen of 30fps.net dives into a fascinating exploration of color quantization using an intriguing twist on the traditional algorithm: the HyAB distance formula in CIELAB color space. At the heart of this exploration is the quest for enhanced image quality by converting the RGB values of an image into CIELAB space, where color differences can be calculated more in line with human perception.

Väänänen is inspired by the FLIP error metric and a 2019 paper that introduces an alternative method for large color differences—HyAB, a hybrid distance formula combining "city block" and Euclidean distances. This method aims to improve perceptual accuracy by treating lightness and chroma as separate when calculating color differences.

The real clincher in Väänänen’s research is applying HyAB to k-means clustering, the workhorse algorithm of color quantization. The idea is to select a suitable palette for a high-color image by clustering similar colors together; by using the HyAB formula in place of the standard Euclidean distance within CIELAB space, the quantization should better reflect actual visual differences.

The results of implementing this method show promise: images processed with the HyAB-adjusted k-means retain hues more accurately than those quantized with traditional methods, like sRGB or pure CIELAB with Euclidean distance. This method particularly shines in maintaining distinct hues in challenging colors like magenta and green, though with some caveats, such as a halo effect around red hues.

Väänänen explores further refinements, such as weighting the luminance differently in the HyAB formula, which offers more control over the final appearance without distorting hues, a common issue when other weights are adjusted in sRGB or CIELAB spaces. This weighting flexibility adds a layer of customization to how images can be processed under specific aesthetic goals or constraints.
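To make the method concrete, here is a small NumPy sketch of HyAB-based k-means over CIELAB pixels. It assumes the image has already been converted to CIELAB (for instance with scikit-image), and the lightness_weight parameter corresponds to the luminance weighting described above; this is an illustration of the idea, not Väänänen's own code.

```python
import numpy as np

def hyab(pixels: np.ndarray, centers: np.ndarray, lightness_weight: float = 1.0) -> np.ndarray:
    """HyAB distance in CIELAB: city-block on L*, Euclidean on (a*, b*).
    pixels: (N, 3) Lab values, centers: (K, 3) Lab values -> (N, K) distances."""
    dL = np.abs(pixels[:, None, 0] - centers[None, :, 0])
    dab = np.sqrt(
        (pixels[:, None, 1] - centers[None, :, 1]) ** 2
        + (pixels[:, None, 2] - centers[None, :, 2]) ** 2
    )
    return lightness_weight * dL + dab

def kmeans_hyab(pixels: np.ndarray, k: int, iters: int = 20,
                lightness_weight: float = 1.0, seed: int = 0):
    """k-means where the assignment step uses HyAB instead of Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = hyab(pixels, centers, lightness_weight).argmin(axis=1)  # assignment step
        for j in range(k):                                               # update step (mean in Lab)
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```

Lowering lightness_weight de-emphasizes lightness relative to chroma in the assignment step, which is the kind of control the article describes achieving without distorting hues.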

While there's still ongoing debate about whether this method surpasses all traditional techniques, Väänänen’s experiment stands out by making the k-means clustering more adaptable through HyAB. It highlights how understanding and manipulating the theory behind color perception can translate into practical improvements in digital image processing, a critical concern in many fields including graphic design, printing, and digital media.

In summary, Väänänen's work is a testament to the power of rethinking established formulas with a perception-centric approach. It's an encouraging invitation for other developers and researchers to further explore color quantization's possibilities for more visually authentic and nuanced digital images.

The Hacker News discussion explores the trade-offs between color spaces like OKLab, CIELAB, CAM16-UCS, and HyAB for tasks like color quantization, gradient rendering, and dynamic design systems. Here's a distilled summary:

Key Points of Debate:

  1. OKLab vs. CAM16-UCS:

    • OKLab is praised for its simplicity, speed, and smoother gradients (e.g., in CSS), avoiding grays in blue-yellow transitions. Critics argue it’s a simplified, "good enough" model but lacks the perceptual rigor of CAM16-UCS, which is derived from complex color appearance models.
    • CAM16-UCS is considered more accurate but computationally intensive (e.g., converting 16M RGB colors to CAM16 takes ~6 seconds in Dart/JS), making it impractical for real-time applications.
  2. Performance vs. Accuracy:

    • For web and design tools (e.g., CSS gradients), OKLab’s speed and deterministic results are prioritized. Real-time systems need conversions in milliseconds, not seconds.
    • Material 3’s dynamic color system uses clustering (Celebi’s K-Means) for accessibility and contrast, emphasizing deterministic outcomes over perfect perceptual accuracy.
  3. Perceptual Uniformity:

    • OKLab claims perceptual uniformity but faces skepticism. Critics highlight edge cases (e.g., blue-yellow gradients) where CAM16-UCS might better model human vision. Proponents argue OKLab’s simplicity and smoother gradients suffice for most design needs.
  4. Gamut Mapping:

    • OKLab’s approach (e.g., Oklch in CSS) is noted for smoother gamut mapping compared to CIE Lch, though some confusion arises about whether this is due to the color space or the mapping algorithm itself.
  5. Industry Use:

    • Tools like Google’s Material Design balance theory with practicality. While CAM16 is scientifically robust, OKLab’s ease of implementation makes it a pragmatic choice for workflows requiring speed and simplicity.

Conclusion:

The thread underscores the tension between scientific rigor (CAM16-UCS) and practical application (OKLab). Design systems prioritize speed and deterministic results, while academic contexts favor accuracy. OKLab’s adoption in CSS and tools highlights its niche as a "good enough" solution, even as debates about its perceptual fidelity persist.

Is the doc bot docs, or not?

Submission URL | 188 points | by tobr | 111 comments

In a candid exploration of the challenges faced while modernizing Shopify email notification templates, Robin Sloan highlights a curious encounter with Shopify's LLM-powered developer documentation bot. The issue centers on figuring out how to detect if an order includes items fulfilled through Shopify Collective, a task that led Sloan to seek advice from the doc bot after traditional search methods fell short.

The bot's initial suggestion seemed plausible, proposing a Liquid syntax solution that should have worked. However, real-world testing (which involved repeated order placements and refunds) revealed that the requisite "Shopify Collective" tag wasn't attached to the order until after the confirmation email was sent. This delay in tagging, a nuance not documented, rendered the bot's advice ineffective.

Sloan questions the reliability of AI-powered documentation that may resort to educated guesses rather than providing infallible insights, especially when official documentation stakes are high. Despite some past successes in quick queries, this incident underscores the critical need for precise and dependable guidance in tech environments.

Ultimately, Sloan found a workaround by adapting existing code, checking product-level tags available at the email's generation time, successfully identifying Shopify Collective orders. This tale not only warns of the pitfalls of over-relying on AI but also celebrates the ingenuity required to navigate around them when they fall short.

The discussion revolves around the challenges and limitations of using AI, particularly Retrieval-Augmented Generation (RAG) systems, for technical documentation like Shopify's LLM-powered bot. Key points include:

  1. AI vs. Human Judgment: While AI can quickly generate plausible answers, it often struggles with nuance and accuracy in complex technical contexts. Users note that AI may confidently provide incorrect or incomplete solutions (e.g., missing real-world timing issues like delayed order tagging), highlighting the need for human oversight.
  2. RAG System Limitations: Technical hurdles with RAG—such as context window constraints, degradation in accuracy with larger documents, and inefficiency in filtering relevant information—make it unreliable for intricate queries.
  3. Cost and Scalability: Some argue AI documentation tools are cost-effective and faster than human efforts, but skeptics warn hidden costs (e.g., error correction) and context-handling flaws undermine scalability.
  4. Human-Curated Documentation: Participants stress that structured, human-written documentation remains critical, as AI cannot yet match the reliability, contextual awareness, and adaptability of expert-driven content.
  5. Workarounds and Adaptability: The incident underscores the necessity of developer ingenuity (e.g., using product tags) to bypass AI shortcomings when official documentation fails.

Overall, the consensus leans toward cautious integration of AI—valuing its speed but recognizing its fallibility—while advocating for hybrid approaches that prioritize human expertise in critical technical domains.

Using MPC for Anonymous and Private DNA Analysis

Submission URL | 36 points | by vishakh82 | 18 comments

Monadic DNA embarked on a unique project earlier this year, aiming to demonstrate how individuals could access and interact with their genetic data while maintaining privacy through cutting-edge technology. At an event in Denver, thirty pioneering participants provided saliva samples, which were processed using Multi-Party Computation (MPC) technology developed by Nillion. This ensured participants could analyze their genotyping results without ever exposing sensitive raw data.
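The write-up doesn't detail Nillion's protocol, but the core idea of MPC (computing an aggregate result while no single party ever sees the raw values) can be illustrated with textbook additive secret sharing. The sketch below is a generic example for intuition only, not Nillion's MPC.

```python
import secrets

PRIME = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value: int, n_parties: int) -> list[int]:
    """Split a value into n additive shares; any n-1 of them reveal nothing."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Toy example: three servers jointly count risk-allele hits across participants
# without any single server seeing an individual's genotype values.
genotypes = [2, 0, 1, 1, 2]                   # per-participant allele counts (toy data)
shared = [share(g, 3) for g in genotypes]     # each participant splits their value
per_server = [sum(s[i] for s in shared) % PRIME for i in range(3)]  # each server sums locally
total = reconstruct(per_server)
assert total == sum(genotypes)                # the aggregate emerges; raw data is never pooled
```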

The sample collection took place during the ethDenver conference, drawing a lively crowd at Terminal Bar thanks to perfect weather and a bit of social media buzz. Though the turnout was higher than anticipated, the team managed the rush effectively. Participants signed forms, selected kit IDs and PINs, and submitted their samples, being rewarded with both a drink and an optional digital token, known as a POAP, marking their participation.

The samples were then handled by Autogen, a lab chosen for their ability to manage both timelines and the privacy needs of the project. Despite only needing basic metadata like kit IDs, many labs expressed a willingness to work with anonymized samples, underscoring a trend towards privacy-respectful genomic research.

The data processing used the Global Screening Array for genotyping, providing participants with insights from around 500,000 genetic markers. This choice struck a balance between cost and data richness; full-genome sequencing was ruled out because of its high cost and limited added value for current consumer applications.

Once processed, the anonymized data was shared securely via standard cloud storage solutions, enabling participants to claim and analyze their genetic information confidentially. This project not only underscored the potential of MPC technology in safeguarding genetic data but also laid the groundwork for more private consumer genomic products in the future. The participants' enthusiasm, even months after the event, highlighted a growing trust in secure, privacy-focused genomic technologies.

Hacker News Discussion Summary:
The discussion on Monadic DNA’s privacy-focused genomic project highlighted a mix of technical curiosity, skepticism, and enthusiasm. Here are the key points:

  1. Terminology & Humor

    • Users joked about the overlap between “Multi-Party Computation (MPC)” and “Media Player Classic,” with playful confusion over abbreviations [wckgt].
  2. Technical Debates

    • Encryption & Trust: While krnck praised FHE (Fully Homomorphic Encryption) for securing results, others raised concerns about trusting external labs with raw data. mbvtt questioned whether encryption truly removes reliance on labs, since interpreting the markers still depends on them.
    • Molecular Cryptography: Projects like cryptographic DNA molecules were suggested as future solutions [Real_S], with vishakh82 (likely a team member) acknowledging ongoing work but emphasizing current regulatory realities.
  3. Philosophy & Scope

    • The term "monadic" sparked discussion, with odyssey7 linking it to self-contained encrypted insights. vishakh82 clarified the goal: personalized genetic insights via aggregated, consented data, avoiding centralized models.
  4. Cost & Practicality

    • Critics like gpypp queried legal/logistical risks of anonymization, while vishakh82 explained challenges with "de-anonymized" metadata and budget constraints, noting their project’s experimental nature vs. production-scale feasibility.
  5. Future Implications

    • phrnxrly critiqued cloud storage (S3) reliance, prompting vishakh82 to outline MPC/FHE for access control and ambitions to build a decentralized model akin to 23andMe, but centered on user consent.
  6. Broader Context

    • Links to newborn screening practices [vishakh82] and academic papers on genomic data privacy [Real_S] contextualized challenges like industrial trust and regulatory hurdles.

Conclusion: The thread reflects excitement for cryptographic privacy in genomics, tempered by realism around costs, trust in labs, and regulatory complexity. The project’s team actively addressed concerns, positioning MPC/FHE as foundational tools for future ethical, user-centric genomic services.

Springer Nature book on machine learning is full of made-up citations

Submission URL | 130 points | by ArmageddonIt | 50 comments

In an unexpected twist fit for a sci-fi drama, one of the latest machine learning resources might be taking some creative liberties with the truth—when it comes to citations, at least. The book "Mastering Machine Learning: From Basics to Advanced" by Govindakumar Madhavan is raising eyebrows—and not just for its $169 price tag. Published by Springer Nature, it turns out that many of the book's citations might be more fiction than fact.

Retraction Watch, tipped off by a concerned reader, dug into this mystery and discovered a murky world of missing or incorrect citations. An analysis of 18 out of 46 references revealed that an astonishing two-thirds weren't quite what they seemed. Some researchers even found themselves surprisingly cited for works they never wrote, with one paper cited being no more than an unpublished arXiv preprint inaccurately referred to as an IEEE publication.

This citation conundrum hints at the possible use of AI-style generation methods, reminiscent of those employed by large language models (LLMs) like ChatGPT. These models, while proficient in creating human-like text, can sometimes fall prey to fabricating references, creating fictitious citations that look realistic but don't hold up under scrutiny.

Madhavan hasn't fully offered clarification on whether AI played a role in crafting his book, but he acknowledged the growing difficulty in distinguishing between AI- and human-generated content. As the debate over the use of AI in academia continues, this case underscores the importance of rigorous verification, lest we end up with scholarly versions of "alternative facts." The mystery deepens, awaiting further comment from the author, who is no stranger to the tech world, leading SeaportAi and creating an array of educational resources. Stay tuned as this tale of academic intrigue unfolds!

The Hacker News discussion revolves around the implications of AI-generated content in academia, sparked by a book published by Springer Nature containing fabricated citations. Key points include:

  1. AI’s Role in Content Creation:
    Users debate the difficulty of distinguishing AI-generated text from human writing, especially as LLMs advance. Some suspect the book’s citations were AI-generated, highlighting issues like "confabulation" (mixing real and invented references) and overconfident but inaccurate outputs.

  2. Publisher Accountability:
    Springer is criticized for damaging its reputation by failing to verify content. Commenters note a trend of declining textbook quality, with publishers prioritizing profit (e.g., high prices for poorly reviewed books) over rigorous peer review. References to past publishing errors (e.g., typos, incorrect images) suggest systemic issues.

  3. Verification Challenges:

    • Existing tools like DOI links and AI detectors are deemed insufficient, as they can’t always validate context or prevent circular dependencies (e.g., GPT-4 generating valid-looking but fake citations).
    • Suggestions include manual checks, cross-referencing summaries with source material, and better institutional incentives for thorough peer review.
  4. Broader Academic Concerns:

    • Fear that AI could exacerbate problems like paper mills, fraudulent research, and "citation stuffing" to game academic metrics.
    • Jokes about a future where AI reviews AI-written content, creating a self-referential loop of unverified information.
    • Nostalgia for traditional, human-curated resources and lament over the erosion of trust in educational materials.
  5. Cultural Shifts:
    Mention of "Sturgeon's Law" (90% of content is "crap") underscores worries that AI might flood academia with low-quality work. Commenters stress the need for vigilance, better tools, and a return to quality-focused publishing practices to preserve scholarly integrity.

In summary, the discussion reflects skepticism about AI's unchecked use in academia, frustration with profit-driven publishing, and calls for more robust validation mechanisms to combat misinformation.

AI Submissions for Tue Jul 08 2025

Smollm3: Smol, multilingual, long-context reasoner LLM

Submission URL | 350 points | by kashifr | 70 comments

Exciting news from the world of language models! Meet SmolLM3, a cutting-edge multilingual, small-scale language model with big ambitions. Developed collaboratively by a team of experts, SmolLM3 is designed with efficiency and long-context reasoning in mind, aiming to outperform its peers like Llama-3.2-3B and Qwen2.5-3B, while being a worthy competitor to the larger 4B models such as Qwen3 & Gemma3.

Boasting a 3 billion parameter design, SmolLM3 is built to support six major languages, including English, French, and Spanish, making it an attractive option for global applications. Capable of handling long contexts up to 128,000 tokens, SmolLM3 promises breakthrough performance with its novel attention mechanisms like Grouped Query Attention (GQA) and the innovative NoPE hybrid attention strategy.
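For readers unfamiliar with the term, Grouped Query Attention lets several query heads share each key/value head, shrinking the KV cache that dominates memory at long context lengths. The PyTorch sketch below shows just that mechanism; it is not SmolLM3's actual attention code.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_q_heads: int, n_kv_heads: int):
    """q: (batch, n_q_heads, seq, d); k, v: (batch, n_kv_heads, seq, d).
    Each KV head serves n_q_heads // n_kv_heads query heads."""
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)   # expand KV heads to match the query heads
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Toy shapes: 16 query heads sharing 4 KV heads means a 4x smaller KV cache.
b, s, d = 1, 8, 64
q = torch.randn(b, 16, s, d)
k = torch.randn(b, 4, s, d)
v = torch.randn(b, 4, s, d)
out = grouped_query_attention(q, k, v, 16, 4)   # -> (1, 16, 8, 64)
```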

The creators are not just sharing a final product; they're offering an open-source blueprint of how they constructed this marvel from the ground up. This transparency allows enthusiasts and developers to understand the intricacies behind achieving such performance at a smaller scale. The model is trained on a whopping 11 trillion tokens with a three-stage approach focusing on datasets from diverse domains such as web, math, and code.

SmolLM3 uses advanced techniques like intra-document masking and improved stability strategies akin to its predecessor, SmolLM2, while introducing tweaks for superior stability and performance during training. The model's robustness was ensured through numerous validations using massive computing power—an awe-inspiring setup involving 384 H100 GPUs over 24 days.

For those curious about the finer points of SmolLM3, the project offers a goldmine of engineering insights and methodologies, making it a remarkable reference for anyone looking to elevate their understanding or build upon this foundation. Whether you're interested in language models' architecture or aiming to push the boundaries of machine learning capabilities, SmolLM3 paints an inspiring picture of what skilled and thoughtful engineering can achieve in the AI landscape.

Hacker News Discussion Summary: SmolLM3 Release

Cost & Training Resources:
The discussion highlights the significant computational cost of training SmolLM3, initially cited as using 384 H100 GPUs over 24 days. Users debated the exact cost, with estimates ranging from $28k to over $500k, depending on GPU rental pricing (e.g., $2–$3/hour on cloud platforms like Runpod). Corrections clarified the math, emphasizing the high barrier to entry for reproducing such training without corporate-scale resources.

Open-Source Debate:
Participants questioned whether SmolLM3 is "truly open-source," comparing it to models like OLMo, which provide full training code, datasets, and weights. Some users expressed skepticism, noting that many "open" models omit critical details like training data or infrastructure. The SmolLM3 team clarified they are releasing a full engineering blueprint, including architecture and dataset mixes, to aid reproducibility.

Technical Challenges & Local Deployment:
Users shared mixed success running SmolLM3 on local hardware (e.g., Macs). Issues with inference engines like llama.cpp and Ollama were noted, though workarounds using MLX-LM or Transformers libraries were suggested. Quantization (e.g., 4-bit GGUF) was discussed to reduce VRAM usage, with some achieving 128k-token contexts on 24GB GPUs like the RTX 4090. The Mac community faced hurdles but found partial success with PyTorch and Metal GPU acceleration.
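For anyone wanting to try the Transformers route mentioned above, a minimal loading script looks roughly like the following; the model id used here (HuggingFaceTB/SmolLM3-3B) is an assumption and should be checked against the official model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed id; verify against the official release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # place layers on available GPU(s) or fall back to CPU
)

prompt = "Explain grouped query attention in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```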

Use Cases & Small Model Potential:
The conversation pivoted to practical applications for 3B-scale models, such as edge devices (Jetson, mobile) and RAG systems. Participants debated whether small models can compete with larger ones in reasoning tasks, emphasizing domain-specific fine-tuning and hybrid approaches (e.g., combining vector search and keyword retrieval). Some shared success stories with Mistral 7B for specialized tasks, while others stressed the need for rigorous benchmarking.

Community Reception:
The release of SmolLM3’s detailed methodology was praised as a valuable resource for engineers and researchers. However, skepticism lingered about its benchmark claims and whether it is genuinely "state-of-the-art," with calls for independent validation. Developers expressed enthusiasm for testing the model, particularly its multilingual and long-context capabilities, despite deployment challenges.

Key Takeaways:

  • SmolLM3’s engineering transparency is a standout feature, though its open-source credentials face scrutiny.
  • Costs and infrastructure requirements limit reproducibility for individuals.
  • Local deployment remains tricky but feasible with community-driven tools.
  • Small models like SmolLM3 show promise for niche applications but require careful optimization and benchmarking.

The Tradeoffs of SSMs and Transformers

Submission URL | 64 points | by jxmorris12 | 8 comments

In the world of machine learning, a fascinating discussion is taking place between enthusiasts and experts alike about State Space Models (SSMs) and Transformers. A blog post has been adapted from a popular talk, aimed at making this complex subject accessible to everyone, from casual readers to dedicated researchers. The crux of the conversation lies in understanding how SSMs are evolving as a strong contender in sequence modeling, traditionally dominated by Transformers, particularly in language processing.

State Space Models have come a long way, derived from a lineage of work that culminated in the development of models like Mamba. At their core, SSMs can be conceptualized as modern successors to recurrent neural networks (RNNs), with distinct advantages that help them rival the performance of Transformers.

Three essential ingredients characterize the success of SSMs:

  1. State Size: SSMs feature a hidden state with a larger size than the inputs and outputs, enabling the model to store more context-rich information—a crucial trait for handling complex modalities like language.

  2. State Expressivity: The recursive update functions in SSMs are expressive enough to store and selectively access needed information, akin to the gating mechanisms in classic RNNs like LSTMs and GRUs. This flexibility allows the model to handle sequences with varying information rates, a key requirement for language modeling.

  3. Training Efficiency: While having a larger recurrent state boosts performance, it also increases computational complexity. Innovations like parallel scan algorithms have been employed to enhance the feasibility of training SSMs on modern hardware, balancing memory usage and computational workload.

The blog highlights that these strategies, though not entirely new, when combined effectively, bring SSMs to the forefront, demonstrating near-equivalence to Transformers in language modeling tasks.
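All three ingredients revolve around a simple linear recurrence over a hidden state. The NumPy sketch below shows that recurrence in its plain sequential form; real SSMs such as Mamba use input-dependent parameters and parallel scans, so treat this only as intuition.

```python
import numpy as np

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Sequential state-space recurrence:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    x: (T, d_in), A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state)."""
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):          # the sequential loop a parallel scan would replace
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return np.stack(ys)

# Toy usage: a hidden state larger than the input/output, as the post emphasizes.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))             # 10 timesteps, 4-dim input
A = 0.9 * np.eye(16)                     # 16-dim hidden state (larger than the input)
B = rng.normal(size=(16, 4)) * 0.1
C = rng.normal(size=(4, 16)) * 0.1
y = ssm_scan(x, A, B, C)                 # -> (10, 4)
```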

The landscape of modern machine learning is rapidly shifting as researchers continuously seek to improve recurrent models. SSMs and other models like RWKV and Griffin are explored further, depicting diverse approaches in state expressivity and parallel training efficiency. The post delves into the nuances of linearity, selectivity, and the theoretical underpinnings of these models, underscoring a vibrant research area ripe with potential.

In sum, while Transformers have been the rockstars of sequence modeling, the advancements in SSMs suggest that the spotlight may start to share its focus, prompting an exciting era of innovation and rediscovery in the field.

The discussion around State Space Models (SSMs) versus Transformers reflects a mix of skepticism, optimism, and technical debate:

Key Themes:

  1. Tokenization Debate:

    • Some users argue that replacing tokenization schemes like BPE with raw bytes could simplify representations and better align with linguistic fundamentals. For example, one user claims raw bytes (as in Chinese characters or English letters) might offer a more basic, language-agnostic alternative to BPE.
    • Counterarguments suggest Transformers require preprocessing to compress dense information efficiently, particularly for video/audio tasks. Current architectures still depend heavily on tokenization despite its limitations.
  2. SSMs vs. Transformers:

    • Skeptics (mbowcut2) question whether SSMs justify significant R&D investment compared to optimized Transformer-based LLMs. They argue that established methods (like Transformers) dominate benchmarks, making SSMs a risky bet without clear evidence of outperformance.
    • Proponents (vsrg, nxts) highlight SSMs’ potential differentiation (e.g., efficiency gains, novel architectures like xLSTM) and niche applications (time-series forecasting). Some cite models like xLSTM as proof that alternative architectures can rival Transformers in specific domains.
  3. Practical Challenges:

    • Training costs and scalability remain barriers. While SSMs might theoretically reduce bottlenecks like "information density," users note current SSMs lack competitive benchmarks and struggle to match Transformer-scale datasets.
    • Hybrid approaches (Herring) and incremental innovations (e.g., DeepSeek’s models) are seen as safer bets than full SSM overhauls.
  4. Broader Research Landscape:

    • Comments hint at parallel efforts (LiquidAI, Griffin) exploring lightweight architectures or alternatives that blend SSM concepts with Transformers. However, the dominance of Transformers in industry R&D (e.g., Llama, Gemma) makes radical shifts unlikely in the short term.

Conclusion:

The discussion underscores cautious interest in SSMs as a complement or niche alternative to Transformers, but few see them as an imminent replacement. Technical challenges, entrenched infrastructure for Transformers, and high costs of experimentation temper enthusiasm, even as theoretical advantages (efficiency, differentiation) keep SSMs on the radar.

Google can now read your WhatsApp messages

Submission URL | 448 points | by bundie | 309 comments

This week, Google stirred the pot in the Android community with an unexpected announcement regarding its AI-powered Gemini service. Starting July 7, Gemini is now integrated with popular apps like Phone, Messages, and WhatsApp, allowing users to command tasks like sending messages without needing to toggle Gemini Apps Activity. However, this convenience comes with a catch. While Google reassures users that Gemini won’t read or summarize WhatsApp messages under normal circumstances, integration with Google Assistant or Utility apps could enable access to your messages and notifications.

Naturally, privacy-conscious users were quick to act, with many opting to disable Gemini’s connected apps to safeguard their data. Even with Gemini Apps Activity turned off, Google retains data for 72 hours to "ensure safety and security." Those hoping to completely extricate Gemini from their devices face a more complex endeavor, as Google representatives artfully dodged direct inquiries about permanent removal. An arduous path does exist via ADB (Android Debug Bridge) to uninstall Gemini, albeit with mixed results due to its ties to the main Google app.

Tech enthusiasts looking to eliminate Gemini altogether are advised to roll back all updates and disable the Google app entirely, a move that effectively removes the AI agent but also disables Google’s broader functionalities.

This development has led to broader discussions about privacy, with questions circling the necessity and implications of AI’s growing integration into daily tech interactions. Amidst these concerns, users are reminded of the broader trade-offs involved when weighing functionality against privacy. Stay tuned to the ever-evolving landscape of tech privacy and AI integration—your voice, and vigilance, matter.

The discussion revolves around Google's integration of Gemini as an OS-level feature on Android, raising significant privacy and antitrust concerns. Key points include:

  1. Privacy Concerns: Users worry Gemini could access sensitive data (e.g., WhatsApp messages) via integrations like Google Assistant, despite Google’s assurances. Disabling Gemini is complicated, requiring ADB or disabling the Google app entirely, which breaks core functionalities.

  2. Antitrust Parallels: Comparisons are drawn to Microsoft’s Internet Explorer case and Apple’s ecosystem control. Critics argue Google’s OS-level integration stifles competition, echoing historical antitrust issues. The EU’s Digital Markets Act (DMA) is cited as a regulatory counterforce.

  3. OS Alternatives: Some advocate for alternatives like GrapheneOS or LineageOS to escape Google’s control, though practical hurdles (e.g., banking app compatibility) persist. Others mention decentralized projects like ApostrophyOS or Purism’s Librem 5.

  4. Apple Comparisons: Debates arise over Apple’s Siri and privacy reputation, with skepticism about both companies’ motives. While Apple is seen as more privacy-focused, critics note its ecosystem’s closed nature mirrors Google’s control.

  5. Broader Skepticism: Users express distrust in tech giants, emphasizing data monetization and AI overreach. Concerns about centralized control of personal information and AI’s role in daily tech interactions dominate.

The discussion highlights tensions between innovation/convenience and privacy/control, reflecting broader debates about corporate power and regulatory adequacy in the AI era.

AI Submissions for Mon Jul 07 2025

Mercury: Ultra-fast language models based on diffusion

Submission URL | 526 points | by PaulHoule | 217 comments

In the cutting-edge realm of language models, Inception Labs and a team of innovative researchers present "Mercury," a revolutionary line of large language models (LLMs) engineered on diffusion principles. This marks a significant leap in computational speed and efficiency, particularly emphasized in their first iteration, Mercury Coder, crafted specifically for coding tasks.

Mercury takes a novel approach, using a Transformer architecture within a diffusion process to predict multiple tokens in parallel, demonstrating its prowess in both speed and quality. Tested rigorously, the Mercury Coder Mini and Small models blaze through at staggering rates of 1,109 and 737 tokens per second, respectively, on NVIDIA H100 GPUs. These rates represent up to a tenfold speed improvement over existing frontier models, all while maintaining competitive output quality.
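Mercury's exact algorithm isn't reproduced in the paper summary here, but the general flavor of diffusion-style parallel decoding (start from masked positions, score every position at once, commit the most confident predictions each step) can be sketched in a few lines. The toy example below uses random logits in place of a real model and is purely illustrative.

```python
import numpy as np

MASK = -1
VOCAB = 100

def toy_logits(tokens: np.ndarray, rng) -> np.ndarray:
    """Stand-in for a transformer that scores every position in parallel."""
    return rng.normal(size=(len(tokens), VOCAB))

def parallel_denoise(length: int, steps: int = 4, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(steps):
        logits = toy_logits(tokens, rng)                        # predict all positions at once
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        confidence, choices = probs.max(-1), probs.argmax(-1)
        masked = np.where(tokens == MASK)[0]
        if masked.size == 0:
            break
        keep = max(1, masked.size // (steps - step))            # unmask a fraction each step
        best = masked[np.argsort(-confidence[masked])[:keep]]   # most confident masked positions
        tokens[best] = choices[best]                            # commit several tokens per step
    return tokens

print(parallel_denoise(16))
```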

Beyond sheer technical achievements, Mercury models have already proved their mettle on an array of coding benchmarks across various languages and applications. They're winning real-world tests too, securing a high ranking on Copilot Arena's quality charts and currently holding the title as the fastest model.

For those eager to dive into Mercury's capabilities, public access is facilitated via a newly released API, and a free playground offers hands-on exploration opportunities. This paper isn't just about numbers and metrics; it's a showcase that beckons developers and researchers alike to witness and participate in the evolving narrative of language model advancements.

Discover more about Mercury's transformative potential, delve into their data, and perhaps join the conversation on arXiv to stay at the forefront of this technological frontier.

The discussion revolves around frustrations with slow Continuous Integration (CI) pipelines and testing bottlenecks in software development. Key points include:

  1. CI Pain Points
    Developers express annoyance with delays in PR validation, flaky tests, resource constraints, and inefficient caching systems. Some note feeling "collectively stuck" despite years of CI optimization efforts.

  2. Corporate vs. Small Teams
    Participants contrast Google’s massive parallel testing infrastructure (e.g., launching 10,000 machines) with smaller companies’ struggles to afford equivalent resources. High costs for cloud VMs and hardware are cited as limiting factors.

  3. Mercury’s Promise
    Mercury’s speed (e.g., 1,109 tokens/sec) sparks optimism for accelerating test execution and code generation. Users hope it could reduce CI bottlenecks, though skepticism exists around infrastructure/resource limitations.

  4. Technical Trade-offs
    Comments debate deterministic CI steps, caching reliability, and concurrency challenges in Java/testing. Tools like Bazel and cloud CI solutions are mentioned but critiqued for complexity/cost.

  5. Organizational Issues
    Some argue slow CI processes stem from organizational problems (e.g., prioritization, tooling choices) as much as technical ones; even a Google employee cites lags in dependency management, highlighting internal inefficiencies.

  6. Cloud/Security Concerns
    Side discussions touch on cloud providers' security models (e.g., Azure’s confidential computing) and whether they truly mitigate risks like code theft.

Overall: The discourse underscores a tension between cutting-edge speed (Mercury) and systemic CI/CD challenges rooted in cost, complexity, and organizational inertia. While Mercury’s performance impresses, adoption hurdles remain for teams lacking Google-scale resources.

LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping

Submission URL | 117 points | by jw1224 | 23 comments

LookingGlass is taking the world of optical illusions to a new level by merging traditional art forms with cutting-edge technology. Created by a team of researchers from DisneyResearch|Studios and ETH, this groundbreaking work introduces the concept of generative anamorphosis. Unlike traditional anamorphic images, which require specific angles or devices for interpretation, LookingGlass leverages latent rectified flow models to produce images that can also be appreciated directly from the front.

The secret sauce? A new technique called Laplacian Pyramid Warping. This approach is frequency-aware, allowing for the creation of high-quality, visually stunning illusions. The method extends the reach of anamorphic images by integrating them with advanced latent space models and spatial transformations, offering an impressive array of new generative perceptual illusions.
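For readers unfamiliar with the building block in the name, a Laplacian pyramid decomposes an image into band-pass (frequency) layers plus a low-resolution residual, which is what makes a frequency-aware warp possible. The OpenCV sketch below shows plain pyramid construction and reconstruction, not the paper's warping method; the image path is a placeholder.

```python
import cv2
import numpy as np

def build_laplacian_pyramid(img: np.ndarray, levels: int = 4) -> list[np.ndarray]:
    """Each level stores the detail lost when downsampling; the last entry is the low-res base."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.resize(cv2.pyrUp(down), (current.shape[1], current.shape[0]))
        pyramid.append(current - up)          # band-pass layer at this frequency
        current = down
    pyramid.append(current)                   # low-frequency residual
    return pyramid

def reconstruct(pyramid: list[np.ndarray]) -> np.ndarray:
    img = pyramid[-1]
    for layer in reversed(pyramid[:-1]):
        img = cv2.resize(cv2.pyrUp(img), (layer.shape[1], layer.shape[0])) + layer
    return img

img = cv2.imread("input.png").astype(np.float32)   # replace with a real image path
layers = build_laplacian_pyramid(img)
assert np.allclose(reconstruct(layers), img, atol=1e-2)  # recovered up to float error
```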

The research holds significant implications for both artistic and scientific communities, offering fresh ways to create engaging visual experiences. It's a fascinating intersection of art, mathematics, and technology, all aimed at expanding the scope of how we view and interpret the world through images. Keep an eye out for this innovation as it brings the enchanting world of illusions to our everyday visual experiences.

The discussion around the LookingGlass submission highlights both technical fascination and broader reflections on innovation:

  1. Related Work & Comparisons: Users note similarities to existing projects like visual anagrams and diffusion illusions, referencing work by creators like Steve Mould, Daniel Geng, and others. Techniques such as pixel swapping, reflection-based puzzles, and multi-layer image transformations are seen as precursors to this research.

  2. Technical Nuances:

    • Compression artifacts (e.g., in low-detail areas like skies) are acknowledged as visible trade-offs.
    • The Laplacian Pyramid Warping method is contextualized within historical concepts like anamorphic encryption, with some pointing to EUROCRYPT 2022 research and morphic techniques spanning centuries.
  3. Artistic & Corporate Implications:

    • Praise for Disney Research’s involvement underscores the blend of art and tech driving progress.
    • Skeptical remarks liken Disney's innovation to a "Silicon Valley burnout" story, reflecting tension between corporate scale and grassroots creativity. Others defend the project’s achievements despite its small-team origins.
  4. AI’s Role Debated: A subthread questions whether such breakthroughs depend on AI, with one user cautioning against dismissing non-AI scientific contributions.

Overall, the discussion balances admiration for LookingGlass’s technical ingenuity with critiques of its novelty and reflections on corporate-driven innovation. The intersection of historical methods, modern generative models, and artistic application emerges as a key theme.

Adding a feature because ChatGPT incorrectly thinks it exists

Submission URL | 1110 points | by adrianh | 386 comments

At Soundslice, the company renowned for its sheet music scanner, something unexpected yet oddly intriguing occurred. Adrian Holovaty, the man behind the operation, noticed an unusual trend surfacing within the error logs. Instead of dealing exclusively with faulty images of sheet music, they were flooded with screenshots of ChatGPT sessions. These weren't traditional music notations but ASCII tablature—guitar music’s rather rudimentary notation style. Unlike other types of uploads, these images weren't supported by their system.

So, why were these ASCII tab screenshots gaining traction on their platform? The mystery unraveled when Holovaty delved into ChatGPT himself. The AI was erroneously advising users to visit Soundslice for audio playback of ASCII tabs, a feature that didn't actually exist. This miscommunication had inadvertently turned into a stream of users seeking a solution that the platform didn’t offer.

Faced with a unique challenge–a steady influx of users misled by AI—and an unconventional market demand, Soundslice had a choice. They could dismiss the misconceptions with disclaimers or innovate by meeting this unforeseen demand head-on. In a twist to the tale, they opted for the latter, developing an ASCII tab importer—a feature Holovaty humorously admitted was at the bottom of his 2025 expectation list.
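For those who haven't run into it, ASCII tablature encodes each guitar string as a line of dashes with fret numbers marking time positions, which is what makes it machine-readable at all. The toy parser below is purely illustrative and is not Soundslice's importer.

```python
import re

TAB = """\
e|--0--3--|
B|--1--0--|
G|--0--0--|
D|--2--0--|
A|--3--2--|
E|--------|"""

def parse_ascii_tab(tab: str) -> list[tuple[str, int, int]]:
    """Return (string_name, column, fret) events, ordered by column (i.e., by time)."""
    events = []
    for line in tab.splitlines():
        name, _, body = line.partition("|")
        for match in re.finditer(r"\d+", body):
            events.append((name.strip(), match.start(), int(match.group())))
    return sorted(events, key=lambda e: e[1])

for string, col, fret in parse_ascii_tab(TAB):
    print(f"col {col}: string {string}, fret {fret}")
```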

This situation presents an intriguing conundrum for modern businesses: How should companies respond when misinformation about their product inadvertently creates customer demand? Should strategic decisions be influenced by incorrect external narratives? While Holovaty finds satisfaction in creating a tool beneficial to users, there’s a lingering ambivalence—was their hand forced into development by AI misinformation? A quandary that sparks broad reflection on the ethical interplay between AI, misinformation, and product development.

The discussion revolves around AI's impact on technology, society, and ethics, sparked by the Soundslice case where users followed ChatGPT’s misleading advice, leading the company to adapt by adding unsupported features. Key themes include:

  • AI’s Role in Development: Users noted GPT-4’s ability to guess API code, but emphasized that opaque APIs and AI’s unpredictability risk confusion.
  • Content Quality Concerns: Tools like Grammarly, while helpful, were criticized for stripping human nuance (e.g., passive voice fixes harming stylistic intent). AI-generated text’s reliability was debated—praised for SEO efficiency but derided for “hallucinations” and threats to authenticity.
  • Job Displacement Fears: Many worried AI could rapidly replace jobs, disproportionately affecting workers without political safeguards (e.g., universal basic income). Historical parallels to the Luddite movement and industrial revolution underscored resistance to disruptive tech.
  • Corporate Responsibility: Critics blamed “greedy business managers” for prioritizing short-term cost-cutting via AI over long-term societal health, risking job markets and quality outputs.
  • Ethical Regulation: Calls for frameworks to ensure AI benefits society, not just corporations. Some argued societal structures, not tech itself, are the root issue (e.g., unnecessary jobs vs. equitable redistribution).
  • Irony and Paradox: The Soundslice incident exemplified unintended demand creation via AI errors. Others humorously likened unchecked reliance on AI to worshipping a deity, highlighting unease with its growing power.

Overall, the comments reflect cautious pessimism about AI’s rapid integration without ethical guardrails, stressing the need for human oversight, adaptive policies, and proactive regulation to mitigate disruption.

AI cameras change driver behavior at intersections

Submission URL | 51 points | by sohkamyung | 107 comments

In an effort to make roads safer and reduce traffic fatalities, U.S. cities are adopting Vision Zero, a strategy originally from Sweden that aims to eliminate road deaths by employing AI-driven camera systems. These systems, powered by companies like Stop for Kids and Obvio.ai, are being deployed at intersections to catch drivers who ignore stop signs and engage in risky behavior. Intersections are a critical focus since they are the site of about half of all car accidents, often resulting in severe outcomes.

One poignant story fueling this technological push is that of Kamran Barelli, CEO of Stop for Kids. Barelli founded the company after his wife and young son were nearly killed by an inattentive driver. Dissatisfied with traditional speed signs and intermittent police presence, Barelli and his team designed a more sophisticated solution. Their AI cameras, capable of operating around the clock and in all lighting conditions, automatically issue citations for violations while respecting driver privacy by not recording faces.

The system has shown promising results in pilot programs, such as in Saddle Rock, N.Y., where compliance with stop signs soared from 3% to 84% following implementation. These AI cameras not only encourage safer driving but also offer incentives like potential reductions in insurance rates, making them both a carrot and a stick for promoting road safety. As these technologies gain traction, they offer a glimpse into a future where AI plays a crucial role in transforming driver behavior and contributing to public safety.

Summary of Discussion:

The discussion revolves around the efficacy of stop signs vs. rolling stops, enforcement challenges, and comparisons of traffic safety infrastructure across regions (e.g., U.S., EU). Key points include:

  1. Stop Sign Compliance vs. Rolling Stops:

    • Critics argue many drivers perform "rolling stops" (slowing but not stopping completely), risking pedestrian safety, especially at intersections with poor visibility.
    • Some defend rolling stops in low-traffic scenarios (e.g., empty intersections), claiming full stops waste time and energy. Others counter that incremental injuries from non-compliance add up over time.
  2. Technology & Enforcement:

    • AI-driven cameras and computer vision are supported for 24/7 enforcement, particularly near schools/residential areas. Critics caution against over-reliance on tech without addressing systemic issues like road design.
    • Mixed opinions exist on automated citations: supporters emphasize fairness and deterrence, while skeptics highlight privacy concerns and potential misuse.
  3. Infrastructure Comparisons:

    • European designs (e.g., roundabouts, priority rules in Netherlands/Germany) are praised for reducing conflicts, contrasting with U.S. reliance on 4-way stops.
    • Debate on signage clarity: U.S. "yield" conventions differ from EU road markings, impacting driver predictability.
  4. Pedestrian Safety:

    • Poorly marked crosswalks, driver distraction, and lax enforcement create risks. Suggestions include better lighting, redesigned intersections, and stricter penalties for ignoring stop signs.
  5. Statistics & Cultural Factors:

    • Higher U.S. traffic fatalities (vs. EU) are linked to urban sprawl, longer driving distances, and car-dependent lifestyles. Calls for public transit investment and reduced sprawl to mitigate risks.
    • Cultural attitudes (e.g., prioritizing convenience over pedestrian safety) are seen as barriers to Vision Zero goals.

Conclusion: The thread highlights tensions between pragmatic driving habits and stringent enforcement, advocating for balanced solutions combining AI enforcement, infrastructure redesign, and systemic shifts toward pedestrian-centric planning.

tinymcp: Let LLMs control embedded devices via the Model Context Protocol

Submission URL | 49 points | by hasheddan | 10 comments

Are you ready for a technological leap that combines the power of AI with the physical world? Meet tinymcp, an experimental project that allows Large Language Models (LLMs) to control embedded devices through the Model Context Protocol (MCP). The project, hosted on GitHub by Golioth, demonstrates how this can be done seamlessly with existing microcontrollers using the Golioth management API.

Tinymcp is designed to work with Golioth's LightDB State and Remote Procedure Calls (RPCs), making it possible to expose device functionalities without modifying device firmware, a boon for developers looking to integrate AI into embedded systems efficiently. The project comes packed with handy examples, like the "blinky" demonstration, which shows how to manage LED control via LLMs.
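The repository itself is the reference, but the general shape of exposing a device action as an MCP tool can be sketched with the Python MCP SDK's FastMCP helper. Every identifier below (the server name, blink_led, and the stand-in for a Golioth RPC call) is a placeholder, not tinymcp's actual code.

```python
# Hypothetical sketch: expose a device RPC as an MCP tool.
# Assumes the Python MCP SDK (`pip install mcp`); all identifiers are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("device-control")

def call_device_rpc(device_id: str, method: str, **params) -> str:
    """Placeholder for a call to a device-management API (e.g., a Golioth RPC)."""
    return f"sent {method}({params}) to {device_id}"

@mcp.tool()
def blink_led(device_id: str, times: int = 3) -> str:
    """Blink the device LED the given number of times."""
    return call_device_rpc(device_id, "blink", times=times)

if __name__ == "__main__":
    mcp.run()   # serve the tool so an MCP-capable LLM client can invoke it
```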

For the curious developer ready to dive in, setting up tinymcp requires connecting a device to the Golioth platform, running a local MCP server, and configuring environment variables. A wealth of resources, including documentation for setup and interaction using different tools like MCP Inspector, Claude Code, and Gemini CLI, are provided.

Remember, while the integration of AI with physical devices holds immense potential, it also demands caution due to the experimental nature of the project and the capability delegation involved. Join the cutting edge of tech by exploring tinymcp, which seeks to unlock the full potential of LLMs in the world of microcontrollers. For further insights and to get started, head to the detailed guide available on Golioth's blog.

The Hacker News discussion explores a mix of technical, philosophical, and sci-fi-inspired reactions to tinymcp, a project enabling LLMs to control embedded systems. Key themes include:

  1. Sci-Fi Parallels: Users humorously reference Hal 9000 (from 2001: A Space Odyssey) and Ubik (Philip K. Dick’s novel), drawing parallels to scenarios where AI-controlled systems malfunction or refuse commands (e.g., "the doctor refused to open the door"), highlighting concerns about autonomous decision-making in physical devices.

  2. Token Limitations: Comments touch on LLM token and context-window constraints, noting challenges in prompt efficiency when integrating AI with microcontrollers.

  3. Metaphors for Control: Discussions metaphorically compare device operations to industrial processes (e.g., docking tankers, pumping oil), underscoring the complexity and potential risks of delegating physical control to AI models, and hint at debates around access control and deterministic outcomes.

  4. Caution & Humor: While some users joke about AI "freezing" or behaving unpredictably, others raise implicit concerns about reliability, security, and the ethics of embedding LLMs in physical systems. The fragmented, coded language reflects playful experimentation aligned with the project’s experimental nature.

Overall, the discussion blends curiosity about tinymcp’s innovation with wariness about its implications, anchored in cultural references and technical critiques of AI determinism in embedded contexts.

I am uninstalling AI coding assistants from my personal computer

Submission URL | 75 points | by ssutch3 | 39 comments

In a heartfelt post, Samuel Sutch opens up about his decision to uninstall AI coding assistants from his personal workflow, marking a significant shift in his approach to coding. After spending months using AI tools like Claude Code to rapidly develop features for his startup, Roam, Samuel found himself in a coding frenzy reminiscent of a high-speed race fueled by these digital assistants. While the initial thrill was addictive and allowed for impressive productivity, he soon discovered a sense of emptiness that followed—an artistic void rooted in the lack of personal involvement in the coding process.

Samuel expresses concerns about how this AI-enhanced workflow impacted his psychology and creativity. As someone who views coding as a personal art form and a core expression of his values, he found himself missing the hands-on, deeply engaging process that coding once represented for him. The realization that he hadn't written a single line of code himself in weeks alarmed him, prompting a reconsideration of his methods.

In his essay, Samuel also acknowledges the broader industry pressure to adopt AI tools for increased efficiency but draws a personal line for his projects. Emphasizing the intrinsic satisfaction of creating without intermediaries, he decides to return to a more traditional approach for his personal endeavors. While recognizing the inevitability of AI in professional settings, he is committed to maintaining a hands-on relationship with code in his own ventures, recapturing the connection between mind and machine.

It's a thought-provoking reminder of the balance between innovation and authenticity in the tech world, leaving readers to ponder the true essence of creativity amid advancing technology. Samuel invites feedback and interaction from the community, indicating his openness to discussions around this evolving dynamic.

Summary of Discussion:
The discussion reflects polarized views on AI coding tools, with technical, creative, and workplace implications debated.

Key Themes:

  1. Creativity vs. Productivity:

    • Many relate to Samuel’s artistic dissatisfaction. NathanKP compares AI-assisted coding to gardening or raising children—structured but lacking deeper fulfillment. Others find AI stifles craftsmanship, likening it to outsourcing creativity.
    • Critics argue AI disrupts the personal connection to code, while proponents highlight efficiency gains (e.g., automating repetitive tasks).
  2. Technical Challenges:

    • Users like blfrbrnd describe inconsistent results with tools like Claude or Gemini, struggling to maintain code quality and control.
    • Debates arise about AI’s suitability for niche tasks (e.g., React development, GLSL shaders) versus mainstream coding.
  3. Workplace Pressures:

    • Employers increasingly mandate AI adoption for productivity metrics, creating tension with developers who prioritize hands-on work.
    • Concerns about AI replacing junior roles ("disposable interns") or enabling management to devalue skilled labor emerge.
  4. Debate Over Execution:

    • While tools like Aider or Claude streamline workflows for some, others criticize hallucinations, context limitations, and hidden costs (e.g., API expenses).
    • Skepticism persists around measuring ROI ("AI metrics feel performative") and ethical concerns (e.g., code security, reliance on training data).

Notable Perspectives:

  • NathanKP: Advocates for explicit prompt engineering but acknowledges AI’s "toddler phase" limitations.
  • clown_strike: Blasts AI metrics as gaslighting, fearing erosion of professional standards and job security.
  • geoka9: Shares mixed success with AI tools, praising efficiency but lamenting dwindling control.
  • 20after4: Questions whether AI aids or hampers experienced programmers, suggesting a "cop-out" for subpar outcomes.

Conclusion:

The thread underscores a tension between embracing AI’s potential and preserving the craft of coding. While some herald efficiency gains, others warn of creative stagnation and workplace commodification. Samuel’s essay resonated as a catalyst for broader reflection on balancing innovation with authenticity.