• 7 Posts
  • 340 Comments
Joined 7 months ago
Cake day: March 22nd, 2024


  • Wouldn’t Turkey or someone sour this?

    But if it’s actually possible, that’s fascinating… if Ukraine can’t push back quickly, wouldn’t it “force” an end to the war? Russia would have a red line it absolutely can’t cross, no hope of advancement, and would likely just claim everything on the other side. Surely they wouldn’t continue a grinding stalemate where Ukraine has a “safe zone” to operate out of.

    If Ukraine does retain its ability to push back hard by the time this happens, and doesn’t go for a truce, then that’s especially peculiar. Walling off a part of their territory as actually untouchable seems like a massive strategic advantage for Ukraine.



  • Well, it’s not over.

    This is coming next week. The path is unclear, and it’s not as big as Helene, but anything near 930 mb in Tampa Bay and plowing over Orlando at 950 mb, especially at this angle, is a catastrophe.

    Katrina was 920 mb at landfall, and these intensity forecasts have been undershooting hurricanes recently.

    And there’s another low pressure system at the edge of the GFS that I don’t like, taking a similar path to Helene:

    This is what the upcoming hurricane looked like a few days ago.


  • Pretty much everything has an API :P

    ollama is OK because it’s easy and automated, but you can get higher performance, better VRAM efficiency, and better samplers from either kobold.cpp or tabbyAPI, with the catch being that more manual configuration is required. But this is good, as it “forces” you to pick and test an optimal config for your system.

    I’d recommend kobold.cpp for very short context (like 6K or less) or if you need to partially offload the model to the CPU because your GPU has relatively little VRAM. Use a good IQ quantization (IQ4_M, for instance).

    Otherwise use TabbyAPI with an exl2 quantization, as it’s generally faster (but GPU only) and much better at long context thanks to its k/v cache quantization.

    They all have OpenAI APIs, though kobold.cpp also has its own web UI.
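
    To make the “OpenAI API” point concrete, here’s a minimal Python sketch against a locally running server. The port (kobold.cpp defaults to 5001, tabbyAPI to 5000, llama.cpp’s server to 8080) and the model name are assumptions, so adjust them to whatever your instance reports.

    ```python
    # Minimal sketch: query a local kobold.cpp / tabbyAPI / llama.cpp server
    # through its OpenAI-compatible endpoint. Port and model name are
    # assumptions; change them to match your own setup.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:5001/v1",  # kobold.cpp default port; tabbyAPI is usually 5000
        api_key="not-needed",                 # local servers typically ignore the key
    )

    response = client.chat.completions.create(
        model="local-model",  # many local servers accept any name here
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Give me a one-line status check."},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
    ```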





  • I have an old Lenovo laptop with an NVIDIA graphics card.

    @[email protected] The biggest question I have for you is which graphics card, but generally speaking this is… less than ideal.

    To answer your question, Open Web UI is the new hotness: https://github.com/open-webui/open-webui

    I personally use exui for a lot of my LLM work, but that’s because I’m an uber minimalist.

    And on your setup, I would host the best model you can on kobold.cpp or the built-in llama.cpp server (just not Ollama) and use Open Web UI as your front end. You can also use llama.cpp to host an embeddings model for RAG, if you wish.

    This is a general ranking of the “best” models for document answering and summarization: https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard

    …But generally, I prefer not to mess with RAG and just slap the context I want into the LLM myself, and for this, the performance of your machine is kind of critical (depending on just how much “context” you want it to cover). I know this is !selfhosted, but once you get your setup dialed in, you may consider making calls to an API like Groq, Cerebras, or whatever, or even renting a Runpod GPU instance if that’s in your time/money budget.
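
    As a rough illustration of that “slap the context in myself” workflow, here’s a hedged Python sketch that reads a document and pastes it directly into the prompt for a local llama.cpp or kobold.cpp server. The file path, port, and model name are placeholders; how much text actually fits depends on the context length you launched the server with.

    ```python
    # Sketch of manual context stuffing (no RAG): read a document and send it
    # verbatim inside the prompt to a local OpenAI-compatible server.
    # The path, port, and model name below are placeholders, not real values.
    from pathlib import Path

    import requests

    document = Path("notes.txt").read_text(encoding="utf-8")  # placeholder document

    payload = {
        "model": "local-model",
        "messages": [
            {"role": "system", "content": "Answer using only the provided document."},
            {"role": "user", "content": f"Document:\n{document}\n\nQuestion: What are the key points?"},
        ],
        "max_tokens": 512,
    }

    # llama.cpp's server defaults to port 8080; kobold.cpp to 5001. Adjust as needed.
    resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    ```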




  • I somehow didn’t get a notification for this post, but that’s a terrible idea lol.

    We already have AI Horde, and it has nothing to do with blockchain. We also have APIs and GPU services… that have nothing to do with blockchain, and have no need for it.

    Someone apparently already tried the scheme you are describing, and absolutely no one in the wider AI community uses it.



  • I would only use the open source models anyway, but it just seems rather silly from what I can tell.

    I feel like the last few months have been an inflection point, at least for me. Qwen 2.5 and the new Command-R really make a 24GB GPU feel “dumb, but smart”: useful enough that I pretty much always keep Qwen 32B loaded on the desktop for its sheer utility.

    It’s still in the realm of enthusiast hardware (aka a used 3090), but hopefully that’s about to be shaken up with bitnet and some stuff from AMD/Intel.

    Altman is literally a vampire though, and thankfully I think he’s going to burn OpenAI to the ground.


  • Ideally they would subscribe and then watch a different service.

    That’s so cynical and self-defeating. “They’ll use our competition and save us money.” But you’re not wrong, they could totally be thinking that rofl.

    Or maybe it’s a retroactive contract negotiation tactic. Basically negotiate or you won’t get any residuals.

    Very possible. I guess all that is even more behind-the-curtain than cable: when shows disappear, no reason is given, and there’s no “protest” the way some channels will do.

    I feel like streaming has made all this stuff even more opaque.