Mamba Compression, LLM KV Cache Efficiency, Reasoning Models' Presuppositions
PM Briefing #46 | May 07, 2026 | PM Interview Prep Club
StateSMix: Lossless Compression with Mamba State Space Models
What's new: Researchers unveiled StateSMix, a lossless compressor that combines online-trained Mamba-style State Space Models with sparse n-gram context mixing, achieving superior compression ratios on benchmarks like enwik8.
StateSMix leverages a Mamba-style State Space Model (SSM) for lossless data compression without pre-trained weights or specialised hardware. It operates fully online, learning and compressing the data stream at the same time as it processes it.
The compressor mixes the online-trained SSM's predictions with sparse n-gram context models, which predict upcoming characters from recent sequences. The SSM component is particularly efficient at capturing long-range dependencies in the data stream, so the mixture adapts to both local and long-range structure; a minimal sketch of the overall recipe follows the details below.
StateSMix is a fully self-contained, lossless compressor that requires no pre-trained weights, no GPU, and no external dependencies.
It uses an online-trained Mamba-style State Space Model (SSM) with approximately 120K active parameters per file.
Achieves 2.123 bits per byte (bpb) on 1 MB, 2.149 bpb on 3 MB, and 2.162 bpb on 10 MB of the enwik8 benchmark; at 2.162 bpb, 10 MB of text compresses to roughly 2.7 MB.
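To make the recipe concrete, here is a minimal Python sketch of the general pattern: several adaptive predictors, one of them a recurrent state model trained online, are mixed to drive an entropy coder. Everything in it (the tiny linear-recurrent TinyStateModel, the plain averaging mixer, all dimensions and learning rates) is an illustrative assumption rather than the paper's algorithm; a real Mamba-style SSM uses selective, input-dependent state updates, and StateSMix's mixing is more sophisticated than a mean.
```python
# Hedged sketch of online-trained context mixing for lossless compression.
# NOT the StateSMix algorithm: models, mixer, and hyperparameters are
# illustrative stand-ins.
import math
import numpy as np

class NGramModel:
    """Adaptive order-k byte model: per-context symbol counts."""
    def __init__(self, order):
        self.order = order
        self.counts = {}

    def predict(self, history):
        ctx = bytes(history[-self.order:]) if self.order else b""
        c = self.counts.get(ctx)
        if c is None:
            return np.full(256, 1.0 / 256)
        return (c + 1.0) / (c.sum() + 256.0)      # Laplace smoothing

    def update(self, history, symbol):
        ctx = bytes(history[-self.order:]) if self.order else b""
        c = self.counts.setdefault(ctx, np.zeros(256))
        c[symbol] += 1.0

class TinyStateModel:
    """Stand-in for the Mamba-style SSM: a decaying linear recurrent state
    h_t accumulates byte embeddings; a linear readout trained online by SGD
    predicts the next byte."""
    def __init__(self, dim=32, lr=0.05, rng=np.random.default_rng(0)):
        self.emb = rng.normal(0, 0.1, (256, dim))
        self.W = rng.normal(0, 0.1, (256, dim))
        self.decay = 0.9
        self.h = np.zeros(dim)
        self.lr = lr

    def predict(self):
        logits = self.W @ self.h
        logits -= logits.max()                    # stable softmax
        p = np.exp(logits)
        return p / p.sum()

    def update(self, symbol):
        p = self.predict()
        grad = p.copy()
        grad[symbol] -= 1.0                       # d(cross-entropy)/d(logits)
        self.W -= self.lr * np.outer(grad, self.h)
        self.h = self.decay * self.h + self.emb[symbol]  # state update

def mix_and_score(data):
    """Average the models' predictions and accumulate code length in bits."""
    models = [NGramModel(0), NGramModel(2), TinyStateModel()]
    history, total_bits = [], 0.0
    for sym in data:
        preds = [m.predict(history) if isinstance(m, NGramModel)
                 else m.predict() for m in models]
        p = np.mean(preds, axis=0)
        total_bits += -math.log2(p[sym])
        for m in models:
            if isinstance(m, NGramModel):
                m.update(history, sym)
            else:
                m.update(sym)
        history.append(sym)
    return total_bits / len(data)

print(mix_and_score(b"the quick brown fox jumps over the lazy dog " * 50), "bpb")
```
An arithmetic coder driven by the mixed probabilities would emit almost exactly this many bits, which is why bits per byte is the natural score here.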
Why it matters: Compression that needs no GPU, no pre-trained weights, and no external dependencies could cut data storage and transmission costs, enable more capable applications on resource-constrained devices, and trim infrastructure spend for AI-driven products. PMs building for the edge or running large data pipelines can translate gains like these into meaningful cost savings and a better user experience.
We're thinking: This research points to a future where general-purpose AI architectures like Mamba aren't just for generative tasks but also underpin fundamental infrastructure. We anticipate that as these models become more efficient, they will displace specialised algorithms in areas like data compression and networking, blurring the lines between core infrastructure and AI applications.
eOptShrinkQ: Compressing KV Caches for LLMs
What's new: Researchers presented eOptShrinkQ, a near-lossless KV cache compression technique for transformer models that combines optimal spectral denoising with quantisation, delivering significant bit savings at matched or better quality.
eOptShrinkQ tackles a central memory bottleneck of large language models (LLMs): the Key-Value (KV) cache, which stores the attention keys and values of previously processed tokens and grows with context length. This two-stage method significantly reduces the memory footprint required to run transformer models.
The technique first decomposes the KV cache into a low-rank shared context and a full-rank per-token residual, then applies optimal singular-value shrinkage (eOptShrink), followed by TurboQuant quantisation. This denoises the cache and shrinks it while preserving the information the model needs; see the sketch after the results below.
Validated on Llama-3.1-8B and Ministral-8B models.
Saves nearly one bit per entry relative to TurboQuant at the same quality, measured by per-head MSE and inner-product fidelity.
At ~2.2 bits per entry (versus 16 bits for uncompressed FP16, roughly a 7x memory reduction), eOptShrinkQ outperforms TurboQuant at 3.0 bits on LongBench (16 tasks) and closely matches or exceeds uncompressed FP16 in multi-needle retrieval.
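As a rough illustration of the method's two-stage shape, here is a hedged Python sketch: split one head's KV slice into a low-rank shared part plus a per-token residual, shrink the singular values, and scalar-quantise the residual. The rank, the naive soft-threshold shrinker, and the 2-bit uniform quantiser are stand-ins of our own; the paper uses the eOptShrink optimal shrinker and TurboQuant, both substantially more refined.
```python
# Hedged sketch of a decompose-shrink-quantise pipeline for a KV cache.
# NOT eOptShrinkQ: the shrinkage rule and quantiser are placeholders.
import numpy as np

def compress_kv(kv, rank=8, bits=2):
    """kv: (num_tokens, head_dim) slice of a KV cache for one head."""
    # Stage 1: low-rank shared context via truncated SVD, with a naive
    # soft-threshold on the singular values (eOptShrink derives an optimal
    # shrinker; this is only a placeholder).
    U, s, Vt = np.linalg.svd(kv, full_matrices=False)
    s_shrunk = np.maximum(s[:rank] - s[rank:].mean(), 0.0)
    low_rank = (U[:, :rank] * s_shrunk) @ Vt[:rank]

    # Stage 2: quantise the full-rank per-token residual to 2**bits levels
    # per row (TurboQuant is considerably more sophisticated than this).
    resid = kv - low_rank
    scale = np.abs(resid).max(axis=1, keepdims=True) + 1e-12
    levels = 2 ** (bits - 1)
    q = np.clip(np.round(resid / scale * levels), -levels, levels - 1)
    return low_rank, q.astype(np.int8), scale

def decompress_kv(low_rank, q, scale, bits=2):
    levels = 2 ** (bits - 1)
    return low_rank + q.astype(np.float32) / levels * scale

rng = np.random.default_rng(0)
# Synthetic cache: a shared low-rank context plus small per-token noise.
kv = rng.normal(size=(512, 4)) @ rng.normal(size=(4, 64)) \
     + 0.1 * rng.normal(size=(512, 64))
lo, q, sc = compress_kv(kv)
err = np.mean((kv - decompress_kv(lo, q, sc)) ** 2) / np.mean(kv ** 2)
print(f"relative MSE after compression: {err:.4f}")
```
A real deployment would store the low-rank factors rather than the dense reconstruction and would use a vector quantiser; the sketch only shows the shape of the pipeline.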
Why it matters: This breakthrough offers a pathway to build more cost-effective and performant LLM applications, enabling longer context windows and potentially deploying larger models on less expensive hardware. For PMs, it directly impacts product scalability, accessibility, and the range of features that LLM-powered products can support.
We're thinking: The pursuit of larger context windows in LLMs creates a continuous tension with hardware memory limitations. While architectural innovations push boundaries, methods like eOptShrinkQ represent crucial optimisations that extend the practical ceiling, highlighting that incremental algorithmic gains can be as impactful as raw compute for delivering real-world LLM capabilities.
The 0-1 Product Management Playbook.
Most PM courses teach you to maintain products. This one teaches you to build them from scratch. The 0-1 PM Playbook covers ideation, discovery, MVP scoping, and early go-to-market through real projects, not slides.
Start Learning →
Reasoning Models Struggle with User Presuppositions
What’s new: New research reveals that while large reasoning models show a slight improvement (2-11%) in handling queries with erroneous presuppositions compared to non-reasoning models, they still largely fail to challenge users’ misinformed opinions.
Researchers evaluated several widely deployed reasoning models on queries that contain incorrect underlying assumptions, spanning topics such as health, science, and general knowledge. Despite their advanced capabilities, these models mostly provided answers based on the flawed premise rather than proactively correcting the user’s initial misunderstanding.
The study highlights a fundamental challenge in conversational AI: models are often designed to be helpful and answer directly, which can inadvertently reinforce misinformation if the user’s query is based on a false premise. This behaviour stems from the models’ training to follow user intent without a strong, explicit mechanism for epistemic correction.
Evaluated several widely deployed reasoning models on queries built around false presuppositions.
Reasoning models achieve 2-11% higher accuracy than non-reasoning models.
Models still fail to challenge a large fraction of erroneous assumptions.
Why it matters: This is a critical insight for PMs building any AI agent or search experience where information accuracy and user understanding are paramount. Relying solely on ‘reasoning’ capabilities isn’t enough; products must incorporate explicit design patterns for fact-checking, surfacing uncertainty, or prompting users to reconsider their assumptions, or they risk becoming misinformation amplifiers (one such pattern is sketched below).
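One way to operationalise that pattern, sketched below entirely under our own assumptions (the two-step prompt flow, the prompts, and the call_model stub are illustrative, not from the paper): audit the query's presuppositions first, and lead with a correction when one fails.
```python
# Hedged sketch of a premise-check step in front of an answer step.
# `call_model` is a placeholder for whatever LLM client a product uses.

def call_model(prompt: str) -> str:
    """Stub: replace with a real LLM call (OpenAI, local model, etc.)."""
    return "PRESUPPOSITION: ... VERDICT: UNSUPPORTED"  # canned demo output

def answer_with_premise_check(user_query: str) -> str:
    # Step 1: ask the model to list the claims the query takes for granted.
    audit = call_model(
        "List each factual assumption the following question takes for "
        "granted, and label each SUPPORTED, UNSUPPORTED, or FALSE:\n"
        f"{user_query}"
    )
    # Step 2: if any assumption fails the audit, lead with the correction
    # rather than answering the question as asked. (Substring matching is
    # crude; a production version would parse structured output.)
    if "UNSUPPORTED" in audit.upper() or "FALSE" in audit.upper():
        return call_model(
            "The question below rests on a questionable assumption. "
            "Politely point out the issue first, then answer what the user "
            f"likely needs.\nAudit: {audit}\nQuestion: {user_query}"
        )
    return call_model(user_query)  # premises look fine: answer directly

print(answer_with_premise_check("Why does shaving make hair grow back thicker?"))
```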
We’re thinking: The core tension here is between being ‘helpful’ by answering the user’s explicit question and being ‘truthful’ by challenging its underlying premise. As AI assistants become more pervasive, this paradox will force a re-evaluation of “user delight” metrics, pushing PMs to prioritise epistemic soundness over simple conversational flow.
Singular Bank Boosts Banker Productivity with OpenAI's ChatGPT and Codex
What's new: Singular Bank developed 'Singularity,' an internal AI assistant powered by ChatGPT and Codex that helps bankers save 60-90 minutes per day on tasks like meeting prep and portfolio analysis.
The Spanish private bank built 'Singularity' to streamline its bankers' daily workflows, using OpenAI's language models to automate and optimise time-consuming tasks.
The bank integrated OpenAI's ChatGPT for natural language understanding and generation, alongside Codex for code-related tasks, allowing the assistant to synthesise complex financial data, draft communications, and prepare detailed reports for client meetings. This direct application of commercial LLMs demonstrates immediate enterprise value.
Singular Bank built an internal assistant named 'Singularity'.
The assistant leverages OpenAI's ChatGPT and Codex models.
It helps bankers save between 60 and 90 minutes daily.
Why it matters: This case study demonstrates a practical enterprise application of LLMs, highlighting the tangible ROI of integrating AI into internal workflows, particularly for knowledge workers. For PMs, it provides a concrete example of efficiency gains in a specific industry, illustrating how AI can augment human capabilities and free up skilled professionals for higher-value activities.
We're thinking: While many enterprises are still exploring AI's potential, Singular Bank's deployment suggests that internal efficiency tools built on commodity LLMs will become the dominant first wave of enterprise AI adoption. The low-risk, high-reward nature of automating repeatable professional tasks will drive this market before more complex, external-facing AI products become widespread.
Key Takeaways for PMs
Fundamental AI breakthroughs in areas like compression (StateSMix, eOptShrinkQ) will increasingly dictate the practical limits and cost structures of large-scale AI deployment, making it crucial for PMs to understand these underlying efficiencies.
As seen with Singular Bank, the immediate ROI for AI in enterprises often lies in augmenting internal professional workflows to save significant time. PMs should look for repeatable, time-consuming tasks that can be streamlined with commodity LLMs to demonstrate early value.
For PMs looking to build and scale products with AI, understanding the technical and strategic implications of these advancements is key. The AI in Product Management course at PM Interview Prep Club offers frameworks and tools to navigate this evolving landscape.
PM Roles We’re Tracking
Business Analyst/APM at Blinkit · Gurgaon
Technical Product Manager at Xoriant · Location Flexible
Business Analyst/Product Owner at Capgemini · Bengaluru, Karnataka, India
Senior Product Manager at Microsoft · Bengaluru, Karnataka, India
Product Manager II at Toast · Bengaluru, Karnataka, India
AI Product Manager at ABG Group · Location Flexible
Assistant Product Manager at Sunpharma · Mumbai
Product Owner at Citiustech · Pan India (Hybrid)
Explore more openings on our PM job board or join the WhatsApp community for peer support and more.
PM Interview Prep Club
If today’s briefing was useful, forward it to a PM friend who’d appreciate it.
Courses now live and launching soon:
AI in Product Management — Live now. Learn to lead AI product work: evaluate models, scope AI features, and drive AI-first roadmaps. Enrol today.
The 0-1 PM Playbook — Launching April 30. Build products from scratch: discovery, MVP scoping, and go-to-market through real projects. Founding cohort pricing ends at launch.
Growth PM, Technical PM, Design Thinking for PMs — Coming in May. See all courses →
If you’re in active prep: a Career Clarity session maps your path in one focused 1:1. The Pro Program gives you 3 months of mentored prep with real feedback. The Plus Plan lets you practice 500+ questions on your own schedule.
Browse curated PM openings, try PM practice challenges, or join our free WhatsApp community for daily job alerts and peer support.