REDDIT·1H AGO·r/StableDiffusion
Obviously I don't want to inject spyware/malware in anything.
On the contrary, I want to protect myself from spyware/malware since AI is the hottest thing now so that's a major potential for abuse.
So going into the shoes of a bad guy, what would they attempt and how can it be avoided?
A typical user downloads ComfyUI, downloads models, then prompts. What weaknesses can a bad guy exploit?
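One obvious angle I can think of is pickle-based checkpoints. A quick sketch of the safer habit (my own example, not tied to any specific attack):

```python
"""Sketch: why the model file format matters (illustrative only)."""
from safetensors.torch import load_file
import torch

# .safetensors stores raw tensors only, so loading it cannot run attacker code.
weights = load_file("model.safetensors")

# Pickle-based .ckpt/.pt files CAN execute arbitrary code on load; if you must
# load one, newer PyTorch lets you restrict it to plain tensors:
# state = torch.load("model.ckpt", weights_only=True)
```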
0 pts·5 comments
REDDIT·1H AGO·r/ClaudeAI
Hi everyone,
I’m not a native English speaker and still have some trouble with the language, especially when it comes to speaking fluently in everyday conversation.
After numerous attempts to learn vocabulary, I figured it would be much more efficient to use AI to create a “conversation partner” who could guide me specifically through certain topics and grammar rules. However, I’d like to have a healthy mix of written conversation and verbal exchange. I imagine it working like this: I provide my input, and then it corrects me or gives me tips on which phrasing would be better.
First, a very general question: are there perhaps already projects that have implemented something like this and are freely available on Claude or other systems?
If not: what would be the best way for me to go about setting something like this up? I don’t need a fancy GUI or anything like that; the CLI would almost be enough for written communication.
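For what I have in mind, even a minimal CLI loop like this would probably do (a sketch using the Anthropic Python SDK; the model name and system prompt are just placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = ("You are a patient conversation partner for an English learner. "
          "Reply briefly, then correct my mistakes and suggest more natural phrasing.")

history = []
while True:
    user = input("you> ")
    history.append({"role": "user", "content": user})
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",   # model name is an assumption
        max_tokens=500,
        system=SYSTEM,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    print("partner>", text)
```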
Thanks!
3 pts·2 comments
REDDIT·2H AGO·r/LocalLLaMA
[https://arxiv.org/pdf/2507.17702](https://arxiv.org/pdf/2507.17702)
15 pts·14 comments
REDDIT·2H AGO·r/StableDiffusion
We use the Wan 2.1 video inpainting workflow to replace a character's clothes or body parts, erase objects from a shot, or even change the background based on a reference image. Is there a way to do this the way Beeble's Switch X does it? With Wan right now, the limitation is that the kept parts don't pick up the new lighting from the reference image: if you change the time of day or the lighting, it only applies to the generated body parts or background, not to the kept parts. I've always wondered whether there's a way to apply the reference image's lighting to the kept parts as well, with the same likeness, rather than only to the generated parts, the way Switch X does. Switch X lets you create an alpha mask to keep the face and hands, but it still relights the kept parts while preserving the exact likeness, and everything matches pixel for pixel. I heard there was a way to do this with normals, but the videos I saw on YouTube don't look very good.
2 pts·1 comments
REDDIT·2H AGO·r/OpenAI
I'm a dad of two (8 and 10). Whenever my oldest struggles with his homework, he asks to go on ChatGPT for help. The model serves up the answer, nods at whatever guess he throws at it, and moves on. Pedagogically, that's the inverse of what a 10-year-old needs.
So I've been building Pebble. It's a voice-first learning companion for kids 6-12, runs on OpenAI under the hood, Carmen-Sandiego-style: the kid steps into an adventure, talks to characters, solves the plot, and the agent is designed to withhold the answer, push them to think, and reward real effort.
OpenAI is what I've landed on for both the pedagogy layer and the image gen, and image gen is where I hit a wall last week. While testing it with my 8-year-old (half-French, obsessed with the Concorde), he asked the agent to draw "the real Concorde." The image came back with five engines. He caught it in two seconds: "there's only four engines. not five. the real life concorde. really existed." He was right: the real Concorde had four Olympus 593 engines, two under each wing.
The wall: when image gen hallucinates a numerical fact, the kid who already knows catches it. The kid who doesn't, absorbs it as truth. For a learning product, that's the inverse of what we want.
Why I'm posting here: how are people grounding multimodal output for factual accuracy when kids are the user? Retrieval-grounded image gen and a fact-lookup pass before generation are the two paths I'm weighing. Neither feels clean yet.
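For the fact-lookup path, the rough shape I'm considering looks like this (just a sketch; model names and the checking prompt are placeholders, not what Pebble actually ships):

```python
from openai import OpenAI

client = OpenAI()

def grounded_image_prompt(kid_request: str) -> str:
    # Ask a text model to rewrite the request with verifiable physical facts
    # (counts, configurations) stated explicitly, so the image model has less
    # room to improvise.
    check = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Rewrite this image request for a child, "
             "stating explicitly correct factual details (e.g. number of engines, "
             "wings, wheels) for any real object mentioned."},
            {"role": "user", "content": kid_request},
        ],
    )
    return check.choices[0].message.content

prompt = grounded_image_prompt("draw the real Concorde")
image = client.images.generate(model="gpt-image-1", prompt=prompt)
```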
The ask: I'm opening 200 founding family seats, free, to test this with kids. If you're a parent (or a parent-engineer) and want a learning tool built on the opposite philosophy of commercial chat LLMs, sign up [Pebble here](https://withpebble.com/?utm_campaign=openai).
Feedback/questions welcome - thanks!
1 pts·0 comments
REDDIT·2H AGO·r/LocalLLaMA
I'm just looking for some advice on optimally setting up Qwen3.6 27B for OpenCode. The VRAM is a little bit scarce, but I ended up with this so far:
llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf \
--port 8080 \
--host 127.0.0.1 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--temperature 0.6 \
--flash-attn on \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--presence-penalty 0.0 \
--repeat-penalty 1.0 \
--ctx-size 65536 \
--chat-template-kwargs '{"preserve_thinking": true}'
With this my VRAM usage is around 18.6/20 GB. So potentially I could stretch it by about 0.5GB.
Of course there is Qwen3.6 35B, which thanks to MoE can fit without KV-cache quantization in Q4_K_M, K_XL, or maybe even Q5, but I don't think it would be a benefit over 27B for this use case.
7 pts·3 comments
HN·2H AGO
I've been scraping 241 UK council planning portals – 2.6M decisions so far

UK planning data is technically public. In practice it's locked behind 400+ different council portals, some still running bespoke ASP.NET that looks like it dates from 2004, some behind AWS WAF, all with subtly different schemas. I've spent four months scraping them. I'm now at 241 councils and 2.6 million decisions across England, Scotland and Wales.

The scraping problem

Most UK councils run one of a handful of portal systems, Idox being the most common. In theory this makes things easy. In practice every council has configured theirs differently, some block non-browser requests via TLS fingerprinting, some have rate limits that will get you banned inside 10 minutes, and a handful are running the aforementioned bespoke ASP.NET.

I ended up writing several scrapers: a standard requests-based one, a Playwright-based one for councils that block anything that doesn't look like a real browser, and a curl_cffi one for TLS fingerprinting. Some councils I still can't get. Liverpool's portal sits behind AWS WAF with a JavaScript challenge. I have a working Playwright-based scraper that solves the challenge once and reuses cookies, but the WAF rate-limits the IP after about 10 requests and then blocks me for a day. So I have 60k Liverpool decisions from an old scrape and no easy way to add more.

What I found

The approval rate stuff is what most people come for. Nationally it's around 88%, but it varies wildly by ward within a council, not just between councils.

The more interesting finding came from the time-to-decision data. Across 119 English and Welsh councils, 36.5% of home extension applications missed the statutory 8-week target in 2025, up from 27.9% in 2019. Guildford is the worst at scale: 66% of decisions over target, averaging 13.3 weeks.

What it is now

A postcode checker (free) and paid PDF reports (£19/£79). Zero paying customers so far, which is fine. I've been heads down on data quality and coverage.

Site is planninglens.co.uk if you want to poke around. AMA on the scraping side – that's where the interesting problems are.
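For anyone curious what the TLS-fingerprint workaround looks like, here's a minimal sketch (the portal URL and params are made up, not a real council):

```python
from curl_cffi import requests

# impersonate="chrome" gives the request a real browser's TLS/JA3 fingerprint,
# which gets past councils that block plain python-requests clients.
resp = requests.get(
    "https://planning.example-council.gov.uk/online-applications/search.do",
    params={"action": "weeklyList"},
    impersonate="chrome",
    timeout=30,
)
print(resp.status_code, len(resp.text))
```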
21 pts·15 comments
REDDIT·2H AGO·r/OpenAI
Here are some of the best images GPT Image 2 has produced from my prompts. Let me know what you think.
17 pts·10 comments
REDDIT·2H AGO·r/OpenAI
I know OpenAI needs to make money,
but still, leaving people with the 5.3 version is kind of a shitty move.
If they meaningfully upgrade the instant model and bump the version to 5.5, that'd be great.
0 pts·12 comments
REDDIT·2H AGO·r/ClaudeAI
It's a recurring pattern that my Claude Code agent repeatedly takes the shortcut solution instead of the right-but-more-work solution. I tried building my instruction into a skill, but now it's gotten to the point where I run /loop 30m "please apply the /take-no-shortcut skill" while coding. It's funny to see what slackers agents can be.
6 pts·12 comments
X·2H AGO·@
Keep your eyes on the road buddy

but if you're using Claude code, you might just hit a rate limit before you hit the curb.

😂😂😂

Quoting arjun chintapalli 🌐👾: @jxnlco 1 up this bro
https://x.com/i/status/2042868559068696622
0 pts·0 comments
X·2H AGO·@
I had gpt-5.5 review something I vibe-coded with 5.2-5.4 and, I swear this is not staged, it said: [screenshot]
0 pts·0 comments
REDDIT·2H AGO·r/OpenAI
Just want to get into agents a bit more. How do you create and deploy agents? Can I use Visual Studio Code? Do I need the Codex app?
I'm using Codex in VS Code, but not multiple agents.
Just want to get into it more.
1 pts·1 comments
REDDIT·3H AGO·r/singularity
78 pts·13 comments
REDDIT·3H AGO·r/LocalLLaMA
The Gemma 4 E4B and E2B models have built-in multimodal capabilities. However, as far as I am aware, llama.cpp does not yet have proper support for vision and audio inputs (especially audio) for these models.
I was able to extract the audio encoder from the official model repository on Hugging Face and vibe-code a bridge that passes the audio embeddings directly to the model, and it actually works. This setup uses Unsloth's GGUF version at Q4 and the audio encoder at full precision (PyTorch), and takes up about 5.5-6GB VRAM.
The thing is, this whole setup feels like a workaround for something that should be readily available and built in a more robust way, not vibe-coded by someone like me.
Maybe I am just unaware, but I am looking for a more complete, non-hacky way of using the model's multimodal capabilities under 6GB VRAM. If anyone can guide me on this, it would be awesome!
P.S.: I tried mistral.rs, but for multimodal it seems to take a lot of extra VRAM for some reason?
4 pts·4 comments
REDDIT·3H AGO·r/OpenAI
1 pts·1 comments
HN·3H AGO
42 pts·31 comments
HN·3H AGO
134 pts·41 comments
REDDIT·3H AGO·r/OpenAI
Please explain in simple terms, thank you.
3 pts·9 comments
X·3H AGO·@
An important message from @thsottiaux [video]
0 pts·0 comments
X·3H AGO·@
When you gotta bike home from work but your codex needs to finish a task. [video]
0 pts·0 comments
REDDIT·4H AGO·r/LocalLLaMA
I recently contributed an experimental HFQ4-G256 MMQ prefill path to hipfire, an RDNA-focused LLM inference engine.
**Disclaimer: I authored the PR, so this is partly a contribution note, but I am mainly looking for independent validation from other AMD users.**
Before this PR, HFQ4 prefill in hipfire was going through a more generic/slower path. On my Strix Halo system, prompt processing was clearly the bottleneck: longer prefills were around ~310–340 tok/s.
The new path adds an opt-in MMQ-style prefill implementation. In this context, MMQ means a specialized quantized matrix-multiplication path: instead of treating prefill like a less optimized sequence of operations, it packs the work into tiled matrix-matrix kernels that are better suited for GPU execution. The implementation pre-quantizes prefill activations into a Q8_1 MMQ layout and uses i8 WMMA over 128×128 output/batch tiles with LDS staging.
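To make the pre-quantization step concrete, here's an illustrative NumPy sketch of the blockwise Q8_1-style layout (this is not the PR's kernel code, just the idea; the exact block layout in hipfire may differ):

```python
import numpy as np

def quantize_q8_1_block(x: np.ndarray):
    # One block of 32 fp32 activations -> int8 values plus a per-block scale.
    amax = np.abs(x).max()
    d = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / d), -127, 127).astype(np.int8)
    s = d * q.sum()          # Q8_1-style kernels also keep d * sum(q) around
    return q, np.float16(d), np.float16(s)

# Activations for one prefill position, split into blocks of 32.
acts = np.random.randn(4096).astype(np.float32)
blocks = [quantize_q8_1_block(b) for b in acts.reshape(-1, 32)]
```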
After enabling it with:
`HIPFIRE_MMQ=1`
I see longer-prefill throughput around **~1140–1260 tok/s** on Strix Halo / `gfx1151`.
What changed:
* Adds an opt-in `HIPFIRE_MMQ=1` path for HFQ4-G256 prefill.
* Targets RDNA3 / RDNA3.5 for now: `gfx1100`, `gfx1101`, `gfx1102`, `gfx1103`, `gfx1150`, `gfx1151`.
* Pre-quantizes prefill activations into a Q8_1 MMQ layout.
* Uses i8 WMMA over 128×128 output/batch tiles with LDS staging.
* Similar in shape to llama.cpp’s AMD MMQ prompt-processing path.
* Not enabled by default.
Benchmark: Qwen3.5 9B HFQ4/MQ4 on Strix Halo / `gfx1151`
|KV mode|pp|MMQ off, tok/s|MMQ on, tok/s|Speedup|
|:-|:-|:-|:-|:-|
|q8|256|363.1|1127.6|3.11x|
|q8|512|352.0|1179.8|3.35x|
|q8|1024|328.9|1222.7|3.72x|
|q8|2048|318.2|1168.5|3.67x|
|asym4|256|368.6|1108.8|3.01x|
|asym4|512|360.7|1173.3|3.25x|
|asym4|1024|333.9|1223.0|3.66x|
|asym4|2048|312.3|1151.7|3.69x|
|asym3|256|361.4|1124.5|3.11x|
|asym3|512|359.8|1187.3|3.30x|
|asym3|1024|329.9|1259.1|3.82x|
|asym3|2048|314.1|1216.5|3.87x|
|asym2|256|374.0|1116.2|2.98x|
|asym2|512|356.6|1173.2|3.29x|
|asym2|1024|340.1|1208.5|3.55x|
|asym2|2048|311.4|1142.9|3.67x|
So on longer prefills, this moved my Strix Halo results from roughly ~311–340 tok/s to ~1143–1259 tok/s.
Correctness validation so far:
* batched prefill compared against sequential token-by-token forward pass
* final prefill top token match
* selected-logit drift within tolerance
* next decode step after prefill also checked, to catch KV-cache write problems
* tested across `q8`, `asym4`, `asym3`, `asym2` KV modes
**Caveats:**
* validated by me mainly on one Strix Halo / `gfx1151` system
* the path is experimental
* it is not enabled by default
* I would not call this the final/canonical MMQ implementation yet
* more coherence and long-context testing would be useful
The maintainer also tested the merged path on `gfx1100` and reported that `HIPFIRE_MMQ=1` runs cleanly there, with a smaller but still positive result: +19.8% on 4B pp256.
What I would especially like to check now is whether this implementation generalizes well across other AMD GPUs and APUs, or whether the current tuning is mostly favorable to Strix Halo / `gfx1151`.
The basic correctness checks pass, but I am not yet fully confident that the KV-cache behavior is completely bulletproof. Subtle KV-cache issues might only appear in longer real workloads, so I would especially appreciate validation on long-context and multi-turn runs.
I would be very interested in results from people with:
* 7900 XTX / `gfx1100`
* other RDNA3 cards
* Strix Halo / `gfx1151`
* RDNA3.5 APUs
* and more
* long-context agentic workloads where prefill matters more than short chat decode
PR: [https://github.com/Kaden-Schutt/hipfire/pull/73](https://github.com/Kaden-Schutt/hipfire/pull/73)
25 pts·10 comments
REDDIT·4H AGO·r/LocalLLaMA
159 pts·47 comments
REDDIT·4H AGO·r/singularity
53 pts·7 comments
X·4H AGO·@
RT Tibo
Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

IYKYK
0 pts·0 comments
REDDIT·5H AGO·r/StableDiffusion
This is available in the SenseNova release:
[https://huggingface.co/sensenova/SenseNova-U1-8B-MoT](https://huggingface.co/sensenova/SenseNova-U1-8B-MoT)
And I have to say I am quite excited to see that Z Image Edit is doing so well too. Just waiting for that team to open source the Z Image Edit.
Any news on this ?
Also, how does it compare to Flux Klein, which is currently the best image edit model we are using?
18 pts·8 comments
REDDIT·5H AGO·r/MachineLearning
elementary-data (used in data pipelines feeding ML systems) was compromised via a GitHub Actions flaw that allowed a forged PyPI release. The malicious version used a .pth file to execute code automatically on Python startup—no import needed. Any environment with unpinned dependencies or latest pulls was exposed, highlighting supply chain risk in MLOps stacks.
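For context on the mechanism, a small sketch (my own, not from the linked write-up) that lists .pth files whose lines site.py will execute at interpreter startup:

```python
"""Scan site-packages for .pth files with executable lines.

site.py runs any line in a *.pth file that starts with "import", which is
exactly how a malicious release can execute code on every Python start
without ever being imported.
"""
import site
from pathlib import Path

for sp in site.getsitepackages():
    for pth in Path(sp).glob("*.pth"):
        for line in pth.read_text(errors="ignore").splitlines():
            if line.startswith(("import ", "import\t")):
                print(f"{pth}: executes code at startup -> {line[:80]}")
```

Pinning dependencies (ideally with hashes) rather than pulling latest is the other half of the defense mentioned above.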
More info: https://thecybersecguru.com/news/elementary-data-pypi-hack-infostealer/
2 pts·0 comments
REDDIT·5H AGO·r/LocalLLaMA
Trying to find the sweet spot in the trade-off between power and tg/s.
250W seems to be the sweet spot for Qwen3.6-27B.
Interestingly, I got higher tg/s at 275W with 1 concurrent request.
VLLM-server-config from [tedivm](https://github.com/tedivm/qwen36-27b-docker#server-flags)
```
vllm serve /models/Qwen3.6-27B-int4-AutoRound --tensor-parallel-size 2 --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder --gpu-memory-utilization 0.85 --served-model-name Qwen3.6-27B-int4-AutoRound --host 0.0.0.0 --port 8000 --enable-prefix-caching --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' --max-num-seqs 8 --quantization auto_round --kv-cache-dtype fp8 --enable-chunked-prefill --max-num-batched-tokens 4128 --disable-custom-all-reduce
```
Benchmark-cmd
```
vllm bench serve --backend openai --dataset-name sharegpt --max-concurrency 1 --num-prompts 100 --base-url http://192.168.254.10:8000 --tokenizer Lorbus/Qwen3.6-27B-int4-AutoRound --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json --seed 777
```
12 pts·9 comments
REDDIT·5H AGO·r/OpenAI
267 pts·341 comments
HN·5H AGO
21 pts·3 comments
REDDIT·5H AGO·r/singularity
Can AI Chatbots become like Miss Minutes from the Loki TV Series?
https://preview.redd.it/i12jcgakzuxg1.png?width=2006&format=png&auto=webp&s=5d26071875ea6aa28d798652ce4b1052a2d43917
0 pts·1 comments
X·5H AGO·@
Foul, only a man could send me something so cruel. [image]
0 pts·0 comments
REDDIT·5H AGO·r/StableDiffusion
GitHub Link: https://github.com/OpenSenseNova/SenseNova-U1
Huggingface Repo: https://huggingface.co/sensenova/SenseNova-U1-8B-MoT
107 pts·31 comments
REDDIT·6H AGO·r/LocalLLaMA
I think I gave it a fair shot over the past few weeks, forcing myself to use local models for non-work tech tasks. I use Claude Code at my job, so that's what I'm comparing to.
I used Qwen 27B and Gemma 4 31B, which are considered the best local models below the multi-hundred-billion-parameter ones. I also tried multiple agentic apps. My verdict is that the loss of productivity is not worth the advantages.
I'll give a brief overview of my main issues.
**Shitty decision-making and tool-calls**
This is a big one. Claude seems to read my mind in most cases, but Qwen 27B makes me give it the Carlo Ancelotti eyebrow more often than not. The LLM just isn't proceeding how I would proceed.
I was mainly using local LLMs for OS/Docker tasks. Is this considered much harder than coding or something?
To give an example, tasks like *"Here's a Github repo, I want you to Dockerize it."* I'd expect any dummy to follow the README's instructions and execute them. (EDIT: full prompt here: https://reddit.com/r/LocalLLaMA/comments/1sxqa2c/im_done_with_using_local_llms_for_coding/oiowcxe/ )
Issues like having a 'docker build' that takes longer than the default timeout, which sends them on unrelated follow-ups (as if the task failed), instead of checking if it's still running. I had Qwen try to repeat the installation commands on the host (also Ubuntu) to see what happens. It started assuming "it must have failed because of torchcodec" just like that, pulling this entirely out of its ass, instead of checking output.
I tried to meet the models half-way. Having this in AGENTS.md: *"If you run a Docker build command, or any other command that you think will have a lot of debug output, then do the following: 1. run it in a subagent, so we don't pollute the main context, 2. pipe the output to a temporary file, so we can refer to it later using tail and grep."* And yet twice in a row I came back to a broken session with 250k input tokens because the LLM is reading all the output of 'docker build' or 'docker compose up'.
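To be clear, the pattern I wanted from them is roughly this (an illustrative sketch, not my exact setup):

```python
"""Run a noisy command, keep its output in a temp file, surface only the tail."""
import collections
import subprocess
import tempfile

log = tempfile.NamedTemporaryFile("w+", suffix=".log", delete=False)

# All build output goes to the temp file instead of the agent's context.
proc = subprocess.run(["docker", "build", "."], stdout=log, stderr=subprocess.STDOUT)
log.flush()

# Only the last few lines get read back, like `tail -n 40`.
with open(log.name) as f:
    tail = collections.deque(f, maxlen=40)
print(f"exit code {proc.returncode}, full log at {log.name}")
print("".join(tail), end="")
```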
I know there are huge AGENTS.md files that treat the LLM like a programmable robot, giving it long, elaborate protocols because they don't expect it to have decent self-guidance. I didn't try those, tbh, and none of them go into details like not reading the output of 'docker build' anyway. I stuck to the default prompts of the agentic apps I used, plus a few guidelines in my AGENTS.md.
**Performance**
Not only are the LLMs slow, but no matter which app I'm using, the prompt cache frequently seems to break. Translation: long pauses where nothing seems to happen.
For Claude Code specifically, this is made worse by the fact that it doesn't print the LLM's output to the user. It's one of the reasons I often preferred Qwen Code. It's very frustrating when not only is the outcome looking bad, but I'm not getting rapid feedback.
**I'm not learning anything**
Other than changing the URL of the Chat Completions server, there's no difference between using a local LLM and a cloud one, just more grief.
There's definitely experience to be gained in learning how to prompt an LLM. But I think coding tasks are just too hard for the small ones; it's like playing a game on Hardcore. I'm looking for a sweet spot on the learning curve, and this is just not worth it.
**What now**
For my coding and OS stuff, I'm gonna put some money on OpenRouter and exclusively use big boys like Kimi. If one model pisses me off, move on to the next one. If I find a favorite, I'll sign up to its yearly plan to save money.
I'll still use small local models for automation, basic research, and language tasks. I've had fun writing basic automation skills/bots that run stuff on my PC, and these will always be useful.
I also love using local LLMs for writing or text games. Speed isn't an issue there, since the prompt cache is always being hit. Technically you could use a cloud model for this too, but you'd be paying out the ass because after a while each new turn is sending like 100k tokens.
Thanks for reading my blog.
328 pts·329 comments
REDDIT·6H AGO·r/ClaudeAI
37 pts·6 comments
REDDIT·6H AGO·r/singularity
The Crowded Interior Of A Cell:
It displays a bustling metropolis of cellular components, including mitochondria (left), the nucleus (bottom), and a complex cytoskeleton.
Model synthesizes real data from x-ray crystallography, NMR, and cryo-electron microscopy.
Artist/creator: developed by scientific animator Evan Ingersoll and Gael McGill at Digizyme, inspired by the work of David Goodsell.
(Re-upload as the original cross post was deleted)
179 pts·35 comments
REDDIT·6H AGO·r/ClaudeAI
I've been using Claude basically since it launched, and use Claude Code extensively (Swift, C++, Shaders, TS, AWS, etc)...
Maybe this is just tech twitter / LinkedIn garbage, but how on earth are people using so many tokens...
I use maybe ~20M tokens per month, with multiple sessions per day, across my 3-4 code bases. I'm very explicit with what I want, and take the time to think through the architecture, code styling, etc. I make heavy use of CLAUDE.md for code style, rules, etc.
I have about 12 years of software engineering experience, and Claude certainly makes me 10x more productive... No doubt.
However, even still, I cannot understand what on earth people are building where you're into the hundreds of millions or billions of tokens. Is this just extreme outliers, or am I the crazy one?
Like how many tokens do you need to use per month?????
100 pts·96 comments
REDDIT·6H AGO·r/singularity
Just a 3D representation of my DIY Home Assistant agent's memory. The graph visuals can be changed. Thought it was cool. Not a lot of memories yet, but I've tested with larger setups I have, and it starts to look epic, especially if you use the same memory setup with your code projects.
Here's the repo in case anyone else gets inspired to see their agent's memory and connections. I use Neo4j/Graphiti to manage memory; it's a "temporal knowledge graph" setup.
Also, this is just a phone video; it's not quite mobile friendly yet.
https://github.com/cl0ud6uru/HestiaClaw
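If you want to poke at the underlying graph directly, a rough query sketch with the Neo4j Python driver (node properties and credentials here are placeholders, not what the repo actually uses):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Pull a sample of memory nodes and their relationships for inspection.
with driver.session() as session:
    result = session.run(
        "MATCH (a)-[r]->(b) RETURN a.name AS src, type(r) AS rel, b.name AS dst LIMIT 100"
    )
    for rec in result:
        print(rec["src"], f"-[{rec['rel']}]->", rec["dst"])

driver.close()
```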
0 pts·3 comments
REDDIT·6H AGO·r/ClaudeAI
43 pts·5 comments
HN·6H AGO
26 pts·2 comments
REDDIT·6H AGO·r/LocalLLaMA
MIT license and fully open source. MiMo-V2.5-Pro is just 3 points off Opus 4.7 Max, and the normal V2.5 is only a step behind SOTA. They produce 75% and 68% non-hallucination rates respectively. Best intel/hallucination model yet.
V2.5 FP8 is around 316GB; you *might* be able to run a tight 3-bit quant on a 128GB M5 Max.
From Gemma to Qwen3.6 to Kimi2.6 to Deepseek v4 to MiMo2.5, this is probably the best April.
https://preview.redd.it/fvurbt2ekuxg1.png?width=1076&format=png&auto=webp&s=a62fa83e39d723a7e31c505e516f18074c90a186
https://preview.redd.it/s1vygazekuxg1.png?width=2093&format=png&auto=webp&s=51924f7a0bca951190395ee0d12405f6f1dc7089
17 pts·8 comments
X·7H AGO·@
in-app browser is all you need

1. remotion
2. annotation tool
3. gpt 5.5
0 pts·0 comments
REDDIT·7H AGO·r/singularity
AI researchers (Nick Levine, David Duvenaud, Alec Radford) just released “talkie,” a 13B language model trained on 260B tokens of text from before 1931, so it basically talks like someone whose worldview is stuck around 1930. The point is to study how LLMs actually generalize vs just memorize, since this model wasn’t trained on the modern web. They trained it on old books, newspapers, scientific journals, patents, and other historical text, then test things like whether it can come up with ideas that were discovered later, forecast future events, or learn bits of Python from examples. Early results seem pretty interesting too, with the model doing surprisingly well on core language/numeracy tasks and showing early signs of learning simple Python despite not being pretrained on modern code.
951 pts·193 comments
X·7H AGO·@
RT Tibo
We will ship again this week. Codex has achieved escape velocity and will keep improving rapidly.
0 pts·0 comments
HN·7H AGO
Give your user a game while they wait for the LLM to return a result.
7 pts·4 comments
REDDIT·7H AGO·r/OpenAI
9 pts·3 comments
REDDIT·7H AGO·r/LocalLLaMA
How are you finding these models? Which one do you find better for real use cases?
So far we're finding Kimi k2.6 better for coding, but want to hear your thoughts.
39 pts·19 comments
X·8H AGO·@
RT Romain Huet
The Realtime API is fast.

This demo by @jxnlco is running at real speed! No computer use, just realtime speech + function calling.

The model is listening, deciding when to call functions to update state, and yes, it doesn't always have to talk back!

Quoting OpenAI Developers: You can build interactive applications with gpt-realtime-1.5, so users can control app state more naturally with voice.

Hi Chappy 👋

[video]
0 pts·0 comments
REDDIT·8H AGO·r/StableDiffusion
Has anyone heard/seen anything re: what may be next for Black Forest Labs? Not to be greedy, but they've been such a great open-source friend that I was curious whether they have anything in the works to complement their already great models.
10 pts·19 comments
REDDIT·8H AGO·r/LocalLLaMA
Built this because I wanted to hear what my coding agent was doing without (a) sending agent output to a third party or (b) staring at a terminal all day.
It's a small Python daemon + macOS app that hooks into Claude Code, Codex, or anything via `heard run <command>`, and speaks intermediate output as it streams. Tool calls, status lines, failures — not just final summaries.
- Default backend is Kokoro: runs on-device, no key, no network calls for TTS
- Optional ElevenLabs if you want the premium voices
- Optional Anthropic key for in-character persona rewrites via Haiku; skip it and you get neutral local templates
- Zero telemetry. No analytics, no crash reporters, no phone-home. Verifiable in source.
- Apache 2.0
Would love as much feedback as possible, DM or PR!
Repo: [https://github.com/heardlabs/heard](https://github.com/heardlabs/heard)
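If you want a feel for the core loop before cloning, here's a very rough sketch (not the project's actual code; macOS `say` stands in for the Kokoro backend):

```python
"""Stream a command's output and speak each non-empty line as it arrives."""
import subprocess

def speak(line: str) -> None:
    subprocess.run(["say", line])          # placeholder TTS; heard uses Kokoro

proc = subprocess.Popen(
    ["echo", "build finished"],            # stand-in for `claude` / `codex`
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
for line in proc.stdout:
    line = line.strip()
    if line:
        speak(line)
```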
4 pts·0 comments