Last 24hr

3 articles

๐Ÿ“ฐ
r/LocalLLaMA Aggregators Jun 02, 2026
Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks

For two weeks I ran my multi-agent orchestrator entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/โ€ฆ

For two weeks I ran my multi-agent orchestrator entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/sub-agent loop. Here's where it worked and where it broke. Setup: - RTX 3090, 24GB VRAM - Qwen3.6-27B at Q6_K (~22GB on-GPU), 32k effective context - Ollama as the inference engine - Multi-agent orchestrator with structured-JSON plans, plan-approval modal, auto-review pass after sub-agent completion - Tested across 47 multi-step coding workflows over two real repos What worked (the reasoning layer): - Plan generation. Qwen3.6 generated multi-step plans roughly as well as Claude on these tasks. Slightly more conservative (fewer unsolicited "let me also refactor X" steps), but coherent and schema-valid at ~95% after a few prompt tweaks. The remaining 5% were schema fixable with one re-prompt. - Memory extraction. Mem0-style fact extraction every 6 turns worked fine. Qwen pulled out the same kinds of facts Claude does ("user prefers no comments unless they explain a 'why'") and stored them cleanly in Qdrant. - Auto-review of sub-agent output. A second Qwen instance reviewing the first one's code caught roughly 60% of the bugs Claude's review caught on the same set. Less savage. Still useful and free. Where it broke: - Tool-call reliability. Qwen3.6's JSON tool-call output had a ~12% format error rate across the 47 tasks. Claude was ~0.5% on the same workload. The errors weren't malformed JSON they were wrong field names, wrong types, hallucinated tool signatures. Outlines / strict-output mode reduced it but didn't kill it. - Long-context drift. Past ~14k tokens of accumulated session context, Qwen started misremembering decisions it had made earlier ("you said use Postgres" no, I said the opposite). Hard practical limit ~12k tokens, then aggressive summarize-and-reset. - Cascade-failure handling. When a sub-agent failed, Claude's planner usuall

๐Ÿ“ฐ
r/MachineLearning Aggregators Jun 02, 2026
MeshFlow: production-safe multi-agent orchestration โ€” SHA-256 audit chain, HIPAA/SOX/GDPR built in, 70-85% token cost reduction [Open Source][D]

79% of enterprises have adopted AI agents. Only 11% run them in production. We've spent the past year building agent systems for banks, clinical operations teams, and engineering orgs. The problem isโ€ฆ

79% of enterprises have adopted AI agents. Only 11% run them in production. We've spent the past year building agent systems for banks, clinical operations teams, and engineering orgs. The problem isn't that agents don't work โ€” they work fine. The problem is that every framework leaves compliance, cost governance, and crash recovery as exercises for the team. After the framework fails them in production. We built MeshFlow to close that gap. **The core idea:** treat governance as infrastructure, not middleware. Every agent step passes through a 15-step kernel that handles identity, rate limiting, budget enforcement, compliance profiles, input/output guardrails, PII detection, risk classification, tool permission, the LLM call itself, audit ledger write, and SLA recording โ€” in that order, always, without configuration. ```python from meshflow import Workflow, CostCap, Agent wf = Workflow(cost_cap=CostCap(usd=5.00)) wf.add(Agent('researcher'), Agent('analyst'), Agent('writer')) result = wf.run('Write a competitive analysis of our market') # Compliant. Durable. Audited. Cost-capped. Done. ``` ```bash pip install meshflow ``` **What's technically interesting:** **Token optimization layer** โ€” five compounding mechanisms that reduce LLM spend 70-85%: - `cache_control` on every system prompt and tool definition (Anthropic: 10% of normal price on cached tokens) - `ModelRouter`: task-type classification routes simple tasks to nano models (keyword + token-count heuristic, zero LLM call) - `ContextCompactor`: sliding window summarization activates at configurable token threshold - `RAGTokenBudget`: hard `max_chars` cap on knowledge injection with truncate/drop/tail strategies - `ContextDeduplicator`: shared context sent once for N parallel agents, not N times **SHA-256 audit chain** โ€” each step record stores `prev_hash` (SHA-256 of the previous record) and `entry_hash` (SHA-256 of its own canonical fields). Modify any log entry and `verify_chain()` breaks. This is the artifact

๐Ÿ“ฐ
r/MachineLearning Aggregators Jun 02, 2026
MeshFlow: An open-source orchestrator for governed, cost-optimized multi-agent workflows [D]

Hey ML community, Weโ€™ve just open-sourced **MeshFlow** , a code-first, framework-agnostic runtime designed for governing and optimizing multi-agent systems in production. Most agent frameworks focus โ€ฆ

Hey ML community, Weโ€™ve just open-sourced **MeshFlow** , a code-first, framework-agnostic runtime designed for governing and optimizing multi-agent systems in production. Most agent frameworks focus on rapid prototyping, but ML and platform engineering teams usually run into hard bottlenecks around LLM cost scaling, evaluation alignment, and execution safety. MeshFlow tackles these from a runtime/infrastructure perspective. Here are the key ML and system features: * **Task-Based Model Routing** : Before an agent executes a node, MeshFlow runs an evaluation on task complexity, routing the execution to one of four model tiers (`nano`, `small`, `medium`, `large`). This cuts overall API costs by 50-60% by utilizing smaller local models (e.g. LLaMA-3-8B) for standard formatting or extraction and reservation of frontier models (e.g. Claude Opus) for high-complexity reasoning. * **Context Compactor & Summary Pruning Middleware** : Implements sliding window summarization and context deduplication across parallel agent teams to limit prompt length growth. * **System Prompt Caching** : Native injection of Anthropic `cache_control` tags when system prompts exceed 1024 tokens. * **Cost Regression Evaluation Gate** : Integrates with CI pipelines to evaluate agent changes against a golden scenario baseline, throwing failures if code updates introduce token cost regressions. * **Resilient State Persistence** : Multi-backend state serialization (Redis, PostgreSQL, S3) that preserves checkpoint frames and allows resuming paused workflows. Here is the basic API contract: ```python from meshflow import Workflow, Agent, CostCap wf = Workflow(cost_cap=CostCap(usd=5.00)) wf.add(Agent('researcher'), Agent('critic'), Agent('writer')) result = wf.run('Compile comparative literature review of LLM reasoning pathways') print(result) ``` We'd love to discuss: 1. How do you handle token budget enforcement and model routing in your agent loops? 2. What evaluation pipelines do you use to detec