Last 24hr

8 articles

📰
The Verge General Tech Jun 02, 2026
Trump signs executive order to review AI models before they’re released

President Donald Trump signed an executive order Tuesday creating a "voluntary framework" for AI companies to share their frontier models with the federal government before they're released "to promote secure innovation and strengthen the cybersecurity of critical infrastructure." The order says the US AI industry has succeeded in part "because we refuse to stifle this […]

r/LocalLLaMA Aggregators Jun 02, 2026
Tiny LLM Benchmark: Jetson Orin Nano Super 8GB - Four Power Modes × Eight Models

Just released a deep benchmark of 8 tiny LLMs (135M → ~1B) on a $250 Jetson Orin Nano Super 8GB using llama.cpp's CUDA backend, across all 4 power modes: 7W, 15W, 25W, and MAXN.

Hardware:
- NVIDIA Ampere GPU: 1024 CUDA cores, 32 Tensor cores
- 6× Arm Cortex-A78AE CPU @ 1.728 GHz
- 8 GB LPDDR5 @ 204.8 GB/s (unified CPU + GPU memory, no VRAM split)
- Active fan cooling; peak junction temperature stayed ≤ 73 °C across every run

Stack:
- JetPack R36.4.7 (Ubuntu 22.04), CUDA 12.6
- llama.cpp CUDA backend, all layers on GPU (-ngl 99)
- Load: NVIDIA aiperf, 20 requests per combo, 12 prompt × gen combos per model
- Power measured via the tegrastats VDD_CPU_GPU_CV rail at 500 ms intervals

Brief methodology:
- Sweep: prompt ∈ {128, 512, 1024, 2048} tokens × gen ∈ {64, 128, 256} tokens × 4 power modes = 48 benchmark cells per model, 384 cells across all 8 models.
- Key metric: output tok/J = tokens generated per joule of compute energy.

Findings:
Key finding: 25W is the Pareto-optimal mode for every model we tested.
- 36–47% more tok/s than 15W
- 3–26% better output tok/J than 15W
- 8–35% better output tok/J than MAXN

More clocks ≠ more efficiency: MAXN costs ~17% more power for marginal throughput gains.

Sub-1B standouts at 25W:
- SmolLM2-135M: 165 tok/s, 22.6 tok/J (best in suite), 101 MB, ~5.4W.
- LFM2.5-350M: 120 tok/s in 219 MB. Matches SmolLM2-360M (369 MB) at about 60% of the size.

~1B class at 25W (ctx=2048, gen=256):
- LFM2.5-1.2B: 54.1 tok/s, 5.26 tok/J, 698 MB; fastest and best output tok/J in the ~1B class.
- Gemma3-1B: edges ahead on total tok/J (118.5 vs 116.2); its lower power draw (6.87W vs 8.46W) compensates for slower decode.
- Llama3.2-1B: 47.0 tok/s, 4.67 tok/J.

The full blog has all the charts, heatmaps, and latency tables, plus links to the raw HuggingFace datasets (384 cells across all 4 modes). Do check it out, and if you have a Jetson, what are you running on it? Would love to know!
submitted by /u/East-Muffin-6472
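The post's key metric, output tok/J, is output tokens divided by the energy consumed while producing them. As a minimal sketch, assuming tegrastats lines report the rail as `VDD_CPU_GPU_CV <instant>mW/<average>mW` (check the output on your JetPack version) and that energy is the trapezoidal integral of the 500 ms samples (the post does not say which integration it uses), the calculation and the sweep size could look like this. All helper names are hypothetical, not the author's code.

```python
import re
from itertools import product

# Sweep grid from the post: 4 prompt lengths x 3 generation lengths x 4 power modes.
PROMPT_TOKENS = [128, 512, 1024, 2048]
GEN_TOKENS = [64, 128, 256]
POWER_MODES = ["7W", "15W", "25W", "MAXN"]
NUM_MODELS = 8

# Assumed tegrastats rail format: "VDD_CPU_GPU_CV <instant>mW/<average>mW".
RAIL_RE = re.compile(r"VDD_CPU_GPU_CV (\d+)mW/(\d+)mW")


def sweep_cells():
    """Every (prompt, gen, power mode) cell benchmarked for one model."""
    return list(product(PROMPT_TOKENS, GEN_TOKENS, POWER_MODES))


def rail_watts(line):
    """Instantaneous rail power in watts from one tegrastats line, or None if absent."""
    match = RAIL_RE.search(line)
    return int(match.group(1)) / 1000.0 if match else None


def energy_joules(samples_w, interval_s=0.5):
    """Trapezoidal integral of evenly spaced power samples (W) over time (s)."""
    return sum((a + b) / 2.0 * interval_s for a, b in zip(samples_w, samples_w[1:]))


def output_tok_per_joule(tokens_generated, samples_w, interval_s=0.5):
    """Output tokens per joule; NaN if there are too few samples to integrate."""
    energy = energy_joules(samples_w, interval_s)
    return tokens_generated / energy if energy > 0 else float("nan")


if __name__ == "__main__":
    cells = len(sweep_cells())
    print(cells, cells * NUM_MODELS)  # 48 384
    # Made-up log lines, only to exercise the parser:
    log = [
        "VDD_CPU_GPU_CV 5400mW/5400mW VDD_SOC 1500mW/1500mW",
        "VDD_CPU_GPU_CV 5600mW/5500mW VDD_SOC 1500mW/1500mW",
        "VDD_CPU_GPU_CV 5400mW/5466mW VDD_SOC 1500mW/1500mW",
    ]
    samples = [w for w in map(rail_watts, log) if w is not None]
    print(f"{output_tok_per_joule(128, samples):.2f} tok/J")
```

The grid works out to 4 × 3 × 4 = 48 cells per model and 384 across the 8 models. Integrating only the VDD_CPU_GPU_CV rail matches the post's "compute energy" framing; whole-board energy (e.g. the input rail) would give lower tok/J figures.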

📰
TechCrunch General Tech Jun 02, 2026
ZeroDrift raises $10 million to protect AI models from themselves

A new AI compliance service sits between AI models and end users to flag and replace any messages that might present a compliance problem.
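The blurb describes the general pattern, not ZeroDrift's implementation. Purely as a hypothetical illustration of a layer that "sits between AI models and end users", a wrapper can run every model reply through a checker and swap in a replacement when it is flagged. The rules, the replacement text, and the `compliance_proxy` helper are all made-up assumptions, not ZeroDrift's product or API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    flagged: bool
    reason: str = ""


# Hypothetical example rules; a real service would use its own policies and classifiers.
RULES = [
    ("possible US SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("guaranteed-returns claim", re.compile(r"\bguaranteed returns?\b", re.IGNORECASE)),
]

# Hypothetical text shown to the user instead of a flagged reply.
REPLACEMENT = "This response was withheld for compliance review."


def check(message: str) -> Verdict:
    """Return a flagged Verdict for the first rule the message trips, if any."""
    for reason, pattern in RULES:
        if pattern.search(message):
            return Verdict(True, reason)
    return Verdict(False)


def compliance_proxy(model: Callable[[str], str]) -> Callable[[str], Tuple[str, Verdict]]:
    """Wrap a model so every reply passes through check() before reaching the user."""
    def wrapped(prompt: str) -> Tuple[str, Verdict]:
        reply = model(prompt)
        verdict = check(reply)
        return (REPLACEMENT if verdict.flagged else reply), verdict
    return wrapped


if __name__ == "__main__":
    echo = compliance_proxy(lambda prompt: prompt)  # stand-in "model" that echoes its input
    print(echo("Our fund has guaranteed returns!"))
    print(echo("Hello there"))
```

Returning the verdict alongside the text lets the surrounding system log why a message was replaced, which is the "flag" half of the flag-and-replace behavior the article describes.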

📰
r/LocalLLaMA Aggregators Jun 02, 2026
Ignoring benchmarks, how do the newest local models (Gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to?

I use local AI mainly for creative writing, and I feel like benchmarks are a bit iffy for that. I’d mainly like to compare Gemma to Gemini, since I like their writing the best; I do know that Qwen 3.6 is amazing, but mostly for coding and agentic work. I’d like to ask everyone how the new(er?) models feel to you personally, rather than looking at benchmarks, which they are likely optimised for. For me, Gemma 4 31B (even at Q4) still falls short of 2.5 Pro; I’m most familiar with 2.5 Pro since I used so much of it for free on AI Studio when it was in preview. The style and prose are there, but in long contexts it still misremembers minor details. I think it’s actually better than GPT-4.5, but that could be personal preference since, again, I mostly do creative writing.
submitted by /u/opoot_

📰
India AI Policy India Tech Jun 02, 2026
Why is Gautam Adani betting so heavily on data centers right now, and what does he see coming that others might be missing? As AI reshapes industries, the real race may not be about building models, but building the infrastructure that powers them. From m - LinkedIn

📰
r/LocalLLaMA Aggregators Jun 01, 2026
I hate to be this guy but: Any good, recent CODING models in the 70-80B range?

3x 24GB VRAM. Qwen-coder-next is not bad; I'll continue to use it if you yell at me enough. I do a lot of front-end work, which moves fast, so the more recent the model, the better. Anything larger than 80B and I'll have to sacrifice the decent-ish Q6 quant or the minimum (for coding) 256k context. I do NOT believe that the latest 27-31B dense models can realistically beat an 80B model, even if I stomach the slowness, but change my mind. Slowness is an issue since I do NOT YOLO: I micro-manage the heck out of the agent. That's actually more efficient than letting it rip, then having it rip again the next day because it had been climbing the wrong ladder.
submitted by /u/ParaboloidalCrest
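The "larger than 80B means giving up Q6 or 256k context" trade-off is a VRAM budget. As a rough sketch, assuming llama.cpp's Q6_K averages 6.5625 bits per weight (real GGUF files mix tensor types, so this is approximate) and a plain KV cache of K and V per full-attention layer per token, the arithmetic looks like this. The attention shape in the last line is a hypothetical example, not any particular model's config.

```python
# Rough VRAM budget for the setup in the post: 3 x 24 GB cards.
GIB = 1024 ** 3
TOTAL_VRAM_GIB = 3 * 24

# Assumption: Q6_K averages ~6.5625 bits per weight.
Q6_K_BPW = 6.5625


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(ctx_tokens: int, attn_layers: int, kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors for every full-attention layer and every cached token (fp16 by default)."""
    return 2 * attn_layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / GIB


if __name__ == "__main__":
    for size in (80, 100, 120):
        w = weights_gib(size, Q6_K_BPW)
        left = TOTAL_VRAM_GIB - w
        print(f"{size}B @ Q6_K: ~{w:.1f} GiB weights, {left:.1f} GiB left for KV cache and buffers")
    # Hypothetical attention shape (12 full-attention layers, 2 KV heads, head_dim 256):
    print(f"256k-token KV cache: ~{kv_cache_gib(262_144, 12, 2, 256):.1f} GiB")
```

An 80B model at Q6_K leaves roughly 11 GiB of the 72 GiB for context, while a 100B model no longer fits at Q6_K even before any KV cache. KV cost scales with the number of full-attention layers and KV heads, so architectures with fewer of either can hold long contexts in far less memory.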

📰
r/MachineLearning Aggregators Jun 01, 2026
Full duplex vs half duplex - the spectrum of AI voice models [D]

It seems that there are two ways to build voice AI:
- Half-duplex: strict turn-taking. You speak, and the other side waits until you're done; speech flows in one direction at a time. This is how almost every voice assistant works today.
- Full-duplex: two channels, so both sides can talk at any time, with no waiting for your "turn". This is how humans actually talk.

In fact, there are three crucial things half-duplex voice models can't really do:
- Overlap: talking and listening at the same time without falling apart
- Backchannels: the "mhms," "rights," and "yeahs" you drop in while the other person is still going
- Barge-in: getting interrupted mid-sentence and recovering gracefully

These three behaviors are a big reason why voice agents still feel "robotic" today. But what exactly is the spectrum from half-duplex to full-duplex? Is a Moshi-style architecture the only way to approach natural full-duplex voice conversation? How could half-duplex systems imitate full-duplex? Would love to hear others' thoughts on this.
submitted by /u/Chilly5
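To make the half-duplex side of the spectrum concrete, here is a minimal, hypothetical sketch of a turn-taking loop with the crudest form of barge-in bolted on: any user voice activity while the agent is talking stops playback and hands the floor back. The event names (`on_user_turn_end`, `on_user_speech_start`, `on_playback_done`) are illustrative assumptions, not any framework's API.

```python
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()  # the user has the floor
    SPEAKING = auto()   # the agent's TTS audio is playing


class HalfDuplexAgent:
    """Strict turn-taking plus a crude barge-in: user speech cuts off agent playback."""

    def __init__(self) -> None:
        self.turn = Turn.LISTENING
        self.events: list[str] = []

    def on_user_turn_end(self, transcript: str) -> None:
        # Endpointing (e.g. silence detection) decided the user is done; the agent takes the turn.
        if self.turn is Turn.LISTENING:
            self.events.append(f"reply to: {transcript}")
            self.turn = Turn.SPEAKING

    def on_user_speech_start(self) -> None:
        # Voice activity while the agent is talking is treated as a barge-in: yield the floor.
        if self.turn is Turn.SPEAKING:
            self.events.append("barge-in: stop playback")
            self.turn = Turn.LISTENING

    def on_playback_done(self) -> None:
        # The agent finished its reply without interruption.
        if self.turn is Turn.SPEAKING:
            self.turn = Turn.LISTENING


if __name__ == "__main__":
    agent = HalfDuplexAgent()
    agent.on_user_turn_end("what's the weather?")
    agent.on_user_speech_start()  # even a backchannel "mhm" lands here as an interruption
    print(agent.events)
```

The sketch also shows the limits the post lists: there is only one floor, so no overlap; a backchannel is indistinguishable from a real interruption; and "recovery" is just stopping, with no notion of where the reply was cut off.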

📰
HN 100+ points Aggregators Jun 01, 2026
OpenAI frontier models and Codex are now available on AWS

Article URL: https://openai.com/index/openai-frontier-models-and-codex-are-now-available-on-aws/
Comments URL: https://news.ycombinator.com/item?id=48363132
Points: 130
Comments: 45