Last 24hr

6 articles

r/LocalLLaMA Aggregators Jun 02, 2026
1-bit Bonsai Image 4B and Ternary Bonsai Image 4B: image generation for local devices with diffusion transformer footprints of just 0.93 GB and 1.21 GB, respectively. So tiny!

https://prismml.com/news/bonsai-image-4b
submitted by /u/Addyad

📰
Hugging Face AI Labs Jun 02, 2026
Holo3.1: Fast & Local Computer Use Agents
📰
r/LocalLLaMA Aggregators Jun 02, 2026
What are you using to preprocess PDFs before feeding them to a local model?

I have been running a local setup for document QA, and the output quality varies a lot depending on what the PDF looks like when it hits the LLM. Clean prose docs are fine, but anything with tables or multi-column layouts comes out garbled, and the model just works with whatever broken input it got (no complaints, no demands sort of thing). I tried PyMuPDF and pdfplumber, and both were decent for simple stuff. Now I'm stuck trying to figure out whether to go with Docling or LlamaParse for the messier docs. Both keep coming up, but I can't tell which actually makes sense for my setup, or whether there's something else people are using locally that holds up better. What's your take on these? Which one would be more practical?
submitted by /u/TangeloOk9486
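The post contains no code, but the table problem it describes is often handled by extracting tables as rows and serializing them as Markdown before they reach the model. pdfplumber's `Page.extract_tables()` returns each table as a list of rows, with `None` for empty cells. Below is a minimal sketch, assuming that row format, of a hypothetical `rows_to_markdown` helper; it is not how Docling or LlamaParse work internally.

```python
from typing import Optional, Sequence


def _clean_cell(cell: Optional[str]) -> str:
    """Normalize one cell: None becomes '', whitespace/newlines collapse, pipes are escaped."""
    if cell is None:
        return ""
    text = " ".join(cell.split())
    return text.replace("|", "\\|")


def rows_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render table rows as a Markdown table, treating the first row as the header.

    Assumes the shape pdfplumber's Page.extract_tables() yields per table:
    a list of rows, each a list of cell strings or None.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad ragged rows so every line has the same number of columns.
    cleaned = [[_clean_cell(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = cleaned[0], cleaned[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

With pdfplumber installed, each `table` from `page.extract_tables()` can be passed to `rows_to_markdown` and the result placed in the prompt next to the page's prose text, so the model sees explicit rows and columns instead of interleaved cell fragments.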

📰
r/LocalLLaMA Aggregators Jun 02, 2026
Ignoring benchmarks, how do the newest local models (Gemma 4 31B, 26BA4B, Qwen 3.6) "feel" to you? What do you think they compare to?

I use local AI mainly for creative writing, and I feel benchmarks are a bit iffy for that. I'd mainly like to compare Gemma to Gemini, since I like their writing the best. I do know that Qwen 3.6 is amazing, but mostly for coding and agentic work. I'd like to ask everyone how the new(er?) models feel to you personally, rather than looking at benchmarks they are likely optimized for. For me, Gemma 4 31B (even at Q4) still falls short of Gemini 2.5 Pro. I'm most familiar with 2.5 Pro since I used so much of it for free on AI Studio when it was in preview. The style and prose are there, but at long context it still misremembers minor details. I think it's actually better than GPT-4.5, but that could be personal preference since, again, I mostly do creative writing.
submitted by /u/opoot_

📰
r/LocalLLaMA Aggregators Jun 02, 2026
Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks

For two weeks I ran my multi-agent orchestrator entirely on Qwen3.6-27B via Ollama, on a single 3090. The goal: see if a local model could replace Claude as the reasoning layer for the lead/manager/sub-agent loop. Here's where it worked and where it broke.

Setup:
- RTX 3090, 24GB VRAM
- Qwen3.6-27B at Q6_K (~22GB on-GPU), 32k effective context
- Ollama as the inference engine
- Multi-agent orchestrator with structured-JSON plans, a plan-approval modal, and an auto-review pass after sub-agent completion
- Tested across 47 multi-step coding workflows over two real repos

What worked (the reasoning layer):
- Plan generation. Qwen3.6 generated multi-step plans roughly as well as Claude on these tasks. It was slightly more conservative (fewer unsolicited "let me also refactor X" steps), but coherent and schema-valid ~95% of the time after a few prompt tweaks. The remaining 5% were fixable with one re-prompt.
- Memory extraction. Mem0-style fact extraction every 6 turns worked fine. Qwen pulled out the same kinds of facts Claude does ("user prefers no comments unless they explain a 'why'") and stored them cleanly in Qdrant.
- Auto-review of sub-agent output. A second Qwen instance reviewing the first one's code caught roughly 60% of the bugs Claude's review caught on the same set. Less savage, but still useful and free.

Where it broke:
- Tool-call reliability. Qwen3.6's JSON tool-call output had a ~12% format error rate across the 47 tasks; Claude was at ~0.5% on the same workload. The errors weren't malformed JSON; they were wrong field names, wrong types, and hallucinated tool signatures. Outlines / strict-output mode reduced the rate but didn't eliminate it.
- Long-context drift. Past ~14k tokens of accumulated session context, Qwen started misremembering decisions it had made earlier ("you said use Postgres"; no, I said the opposite). Hard practical limit: ~12k tokens, then an aggressive summarize-and-reset.
- Cascade-failure handling. When a sub-agent failed, Claude's planner usually…
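The ~22GB on-GPU figure for Q6_K in the setup is consistent with back-of-the-envelope arithmetic. In llama.cpp, Q6_K packs 256 weights into a 210-byte super-block, which works out to 6.5625 bits per weight. The sketch below assumes a nominal 27e9 parameters, all stored at that rate. Real GGUF files keep some tensors at other precisions, and the KV cache for the 32k context comes on top of the weights, so treat this as a sanity check rather than a sizing tool.

```python
# Back-of-the-envelope weight footprint for a 27B model at Q6_K.
# Assumptions: a nominal 27e9 parameters, every weight stored at Q6_K's rate
# (real GGUF files mix tensor types, so actual file sizes differ somewhat).

Q6_K_BITS_PER_WEIGHT = 210 * 8 / 256  # 210 bytes per 256-weight super-block = 6.5625
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bits per weight."""
    return n_params * bits_per_weight / 8


weights = weight_bytes(27e9, Q6_K_BITS_PER_WEIGHT)
card = 24 * GIB  # RTX 3090: 24 GB of VRAM (24 GiB as the driver reports it)

print(f"weights: {weights / 1e9:.1f} GB ({weights / GIB:.1f} GiB)")
print(f"left for KV cache and runtime buffers: {(card - weights) / GIB:.1f} GiB")
```

Under these assumptions it prints about 22.1 GB (20.6 GiB) of weights and roughly 3.4 GiB of headroom, which is what the 32k-context KV cache and Ollama's runtime buffers have to fit into.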
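The post distinguishes malformed JSON from tool calls that parse but use wrong field names, wrong types, or hallucinated tool signatures, and reports that most schema failures were fixed by one re-prompt. Below is a minimal sketch of a validate-then-re-prompt-once loop. The `{"tool": ..., "args": {...}}` call shape, the `TOOLS` registry, and the `call_model` callable (standing in for the actual Ollama request) are all hypothetical, not the poster's orchestrator code. This is post-hoc validation, a different technique from the constrained decoding that Outlines performs at generation time.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
TOOLS: Dict[str, Dict[str, type]] = {
    "read_file": {"path": str},
    "run_tests": {"target": str, "timeout_s": int},
}


def tool_call_errors(raw: str) -> List[str]:
    """Return the problems with a JSON tool call; an empty list means it is valid.

    Assumed shape: {"tool": <name>, "args": {<field>: <value>, ...}}.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]
    name = call.get("tool")
    spec = TOOLS.get(name) if isinstance(name, str) else None
    if spec is None:  # hallucinated or missing tool name
        return [f"unknown tool {name!r}; available: {sorted(TOOLS)}"]
    args = call.get("args")
    if not isinstance(args, dict):
        return ["'args' must be a JSON object"]
    errors = []
    for field in sorted(args.keys() - spec.keys()):  # wrong or invented field names
        errors.append(f"unexpected field {field!r} for {name}")
    for field, expected in spec.items():
        if field not in args:
            errors.append(f"missing field {field!r} for {name}")
        elif type(args[field]) is not expected:  # exact check, so True is not accepted as int
            errors.append(
                f"field {field!r} should be {expected.__name__}, got {type(args[field]).__name__}"
            )
    return errors


def call_with_one_retry(
    prompt: str, call_model: Callable[[str], str]
) -> Tuple[Optional[dict], List[str]]:
    """Ask for a tool call; if it fails validation, re-prompt once with the error list."""
    raw = call_model(prompt)
    errors = tool_call_errors(raw)
    if errors:
        feedback = (
            prompt
            + "\n\nYour previous tool call was invalid:\n- "
            + "\n- ".join(errors)
            + "\nReturn a corrected JSON tool call only."
        )
        raw = call_model(feedback)
        errors = tool_call_errors(raw)
    return (json.loads(raw) if not errors else None), errors
```

Feeding the specific errors back, rather than just "invalid JSON", gives the model a concrete target; calls that still fail after one retry come back as `None` with their errors so the orchestrator can decide what to do next.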
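The summarize-and-reset workaround for long-context drift can be sketched as a context object that folds older turns into a single summary once an estimated token count passes the ~12k practical limit. The `summarize` callable (a model call in practice), the rough 4-characters-per-token estimate, and keeping the last few turns verbatim are assumptions for illustration, not details from the post.

```python
from typing import Callable, List

TOKEN_BUDGET = 12_000  # the practical limit reported before drift set in


def estimate_tokens(text: str) -> int:
    """Very rough estimate (~4 characters per token); use the model's tokenizer for real budgets."""
    return max(1, len(text) // 4)


class SessionContext:
    """Accumulates turns and compacts older ones into a summary when over budget."""

    def __init__(
        self,
        summarize: Callable[[List[str]], str],  # hypothetical: asks the model to condense turns
        budget: int = TOKEN_BUDGET,
        keep_recent: int = 4,
    ):
        self.summarize = summarize
        self.budget = budget
        self.keep_recent = keep_recent
        self.turns: List[str] = []

    def tokens(self) -> int:
        return sum(estimate_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if self.tokens() > self.budget and len(self.turns) > self.keep_recent:
            cut = len(self.turns) - self.keep_recent
            old, recent = self.turns[:cut], self.turns[cut:]
            # Replace the older turns (including any earlier summary) with one summary turn.
            # The result can still exceed the budget if the recent turns alone are large.
            self.turns = ["SUMMARY: " + self.summarize(old)] + recent

    def prompt(self) -> str:
        return "\n".join(self.turns)
```

Keeping the most recent turns verbatim preserves the details the agent is actively working with, while decisions made earlier survive only as far as the summary captures them, so the summarization prompt should explicitly ask for decisions like the database choice to be carried over.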

📰
r/LocalLLaMA Aggregators Jun 02, 2026
Man trains local model to detect and kill mosquitos with a laser

Now this is local AI innovation we can all get behind. https://x.com/stevencheng/status/2059836738449854898
submitted by /u/No_Information9314