Nvidia delivers first Vera Rubin AI GPU samples to customers — 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece

RegularJoe@lemmy.world · 2 months ago

Nvidia delivers first Vera Rubin AI GPU samples to customers — 88-core Vera CPU paired with Rubin GPUs with 288 GB of HBM4 memory apiece

in_my_honest_opinion@piefed.social · edit-2 2 months ago

will never achieve AGI or anything like it

On this we absolutely agree. I’m targeting a more efficient interactive wiki essentially. Something you could package and have it run on local consumer hardware. Similar to this https://codeberg.org/BobbyLLM/llama-conductor but it would be fully transform native and there would only need to be one LLM for interaction with the end user. Everything else would be done in machine code behind the scenes.

I was unclear I guess, I was talking about injecting other models, running their prediction pipeline for the specific topic, and then dropped out of the window to be replaced by another expert. This functionality handled by a larger model that is running the context window. Not nested models, but interchangeable ones dependent on the vector of the tokens. So a qwq RAG trained on python talking to a qwen3 quant4 RAG trained on bash wrapped in deepseekR1 as the natural language output to answer the prompt “How do I best package a python app with uv on a linux server to run a backend for a …”

Currently this type of workflow is often handled with MCP servers from some sort of harness and as I understand it those still use natural language as they are all separate models. But my proposal leverages the stagnation in the field and leverages it as interoperability.