Facebook's LLaMA leaks via torrent file in PR

https://github.com/facebookresearch/llama/pull/73/files

finally, we can have an actually open chatbot.

  1. 4 weeks ago
    Anonymous

    I bet even chai is much better.

    • 4 weeks ago
      Anonymous

      nakadashi

  2. 4 weeks ago
    Anonymous

    poor anon who left his personal signed download url in the torrent for no reason

    • 4 weeks ago
      Anonymous

      What does this mean, explain it like I'm retarded.

      • 4 weeks ago
        Anonymous

        >What does this mean, explain it like I'm retarded.
        Learn to use AI, retard.

        • 4 weeks ago
          Anonymous

          As usual, ChatGPT is mostly right but also gets it wrong.
          >at risk of having their personal information or data compromised
          Wrong, that's not the risk at all. The concern is that Facebook could use the link to trace the leak back to llamanon.

    • 4 weeks ago
      Anonymous

      His sacrifice will be remembered every time we talk to our AI waifus, what a hero.

  3. 4 weeks ago
    Anonymous

    wouldn't you be able to download this regardless?

    • 4 weeks ago
      Anonymous

      You have to apply with Facebook as an AI researcher for them to give you a personal download link. A hero named Llamanon leaked it for us.

      >via torrent file in PR
      It leaked from here [...]
      [...]
      It means the guy who leaked the models and made the torrent can be identified by facebook, because he included the download script containing a personalized download url in the torrent even though the script didn't need to be in there at all
      [...]
      No. The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access). When they approve your request they send you a personally signed download url where you can download the model from.

      >The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access)
      That was OPT, which he also has access to but hasn't leaked yet since it's very large and of dubious value to us because of its hardware requirements. None of the LLaMA models were made public by Facebook.

      • 4 weeks ago
        Anonymous

        >via torrent file in PR
        It leaked from here [...]
        [...]
        It means the guy who leaked the models and made the torrent can be identified by facebook, because he included the download script containing a personalized download url in the torrent even though the script didn't need to be in there at all
        [...]
        No. The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access). When they approve your request they send you a personally signed download url where you can download the model from.

        poor anon who left his personal signed download url in the torrent for no reason

        >he doesn't know it was a false flag from a poor boomer's account
        good OPSEC looks like bad OPSEC

        • 4 weeks ago
          Anonymous

          Ok Robert

    • 4 weeks ago
      Anonymous

      >via torrent file in PR
      It leaked from here [...]
      [...]
      It means the guy who leaked the models and made the torrent can be identified by facebook, because he included the download script containing a personalized download url in the torrent even though the script didn't need to be in there at all
      [...]
      No. The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access). When they approve your request they send you a personally signed download url where you can download the model from.

      You have to apply with Facebook as an AI researcher for them to give you a personal download link. A hero named Llamanon leaked it for us.

      [...]
      >The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access)
      That was OPT, which he also has access to but hasn't leaked yet since it's very large and of dubious value to us because of its hardware requirements. None of the LLaMA models were made public by Facebook.

      Facebook gave access to literally anyone who applied with an .edu email address, and most others as well. It was basically unrestricted.

  4. 4 weeks ago
    Anonymous

    >Facebook's LLaMA leaks via torrent file in PR
    Are you retarded? It leaked via /aicg/ and then an anon filed a PR.

  5. 4 weeks ago
    Anonymous

    >via torrent file in PR
    It leaked from here

    [...]

    What does this mean, explain it like I'm retarded.

    It means the guy who leaked the models and made the torrent can be identified by facebook, because he included the download script containing a personalized download url in the torrent even though the script didn't need to be in there at all

    wouldn't you be able to download this regardless?

    No. The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access). When they approve your request they send you a personally signed download url where you can download the model from.

    • 4 weeks ago
      Anonymous

      Christ almighty

    • 4 weeks ago
      Anonymous

      Which personalized download url? How would fb use it to reverse look up? Or can we look him up on linkedin as well

      • 4 weeks ago
        Anonymous

        >Which personalized download url?
        The one in llama.sh in the torrent, on the line that starts with PRESIGNED_URL=
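
        For reference, a minimal sketch (assuming the torrent's llama.sh sits in the current folder, as described above) of pulling out the identifying line:

        ```
        # Minimal sketch: find the personalized line in the leaked download script.
        # Assumes the script from the torrent is named llama.sh, as mentioned above.
        from pathlib import Path

        for line in Path("llama.sh").read_text().splitlines():
            if line.startswith("PRESIGNED_URL="):
                # This value is a long, unique URL Facebook emailed to one person,
                # which is why the torrent can be traced back to its creator.
                print(line)
        ```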

    • 4 weeks ago
      Anonymous

      I am not a swe so dumb question: isn't his pull request on the official facebook open source repository? Wouldn't this be super easy to track down or am I missing something

      > https://github.com/facebookresearch/llama/pull/87

      • 4 weeks ago
        Anonymous

        The pull requester isn't the leaker. That person is just a memer. The leaker is easy to track though, since they included their download script in the torrent and it includes a unique download ID from an email facebook sent to them.

        The original leak was done in this thread:

        [...]

        • 4 weeks ago
          Anonymous

          Is it the pre signed URL here?

          >>

          [...]

          • 4 weeks ago
            Anonymous

            Yes, that's the personalized url that the leaker accidentally left in the torrent.

            • 4 weeks ago
              Anonymous

              >Which personalized download url?
              The one in llama.sh in the torrent, on the line that starts with PRESIGNED_URL=

              Thanks, thinking of reporting this fucking homosexual internally ( unironically )

              • 4 weeks ago
                Anonymous

                Wow, go fuck yourself you little bitch. It's not like that would do anything anyway, and it's not like Meta isn't already aware.

                Why didn't he take it down? He seems to have read the post warning him. Was he retarded

                >take down
                >a torrent
                u wot

            • 4 weeks ago
              Anonymous

              Why didn't he take it down? He seems to have read the post warning him. Was he retarded

              • 4 weeks ago
                Anonymous

                Once it's in torrent, it's too late

        • 4 weeks ago
          Anonymous

          Does facebook look at all PRs before they ship? Can't they just look at the pre signed url themselves? Why don't I see any fb engineers commenting on that meme PR

          • 4 weeks ago
            Anonymous

            >Does facebook look at all PRs before they ship?
            The PR hasn't shipped, it's just been "requested." Anyone can make a request.

            >Cant they just look at the pre signed url themselves.

            The URL is in a file in the torrent not on GitHub. Yes they can look at it themselves if they download the torrent.

            >Why don't I see any fb engineers commenting on that meme PR

            Probably because they don't want to get fired or be in the news prior to getting fired? The pull request will just sit there forever as pending.

    • 4 weeks ago
      Anonymous

      every copy of LLaMa is personalized

      • 4 weeks ago
        Anonymous

        No it's not. This has been confirmed several times.

        • 4 weeks ago
          Anonymous

          Show me one instance this is the case please.

          • 4 weeks ago
            Anonymous

            This thread where multiple people shared SHA256 checksums of their weights files. See also Twitter where people shared SHA512 checksums of their model weights to confirm that they are all identical.

            [...]
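
            If you want to check your own copy against the posted digests, a rough sketch (the 7B folder name is just an example, point it at whatever you downloaded):

            ```
            # Rough sketch: hash your downloaded weight files so the digests can be
            # compared with the ones other anons posted. The folder is an example.
            import hashlib
            from pathlib import Path

            def sha256_of(path, chunk_size=1 << 20):
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        h.update(chunk)
                return h.hexdigest()

            for p in sorted(Path("7B").glob("*.pth")):
                print(p.name, sha256_of(p))
            ```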

      • 4 weeks ago
        Anonymous

        >Every copy of llama is personalized. I'll even throw in a set of Ginsu knives. Limited time offer.

      • 4 weeks ago
        Anonymous

        Show me one instance this is the case please.

        Fuck you. Why would I provide information to a sniveling whelp who makes shit up out of nothing and then presents it as fact? If you had just asked whether or not every copy of LLaMa was personalized from the start, I'd have been happy to answer you and provide you with evidence. But instead you asserted that every copy *was* personalized, even though you had no idea if that was true or not. You're homosexual scum who just goes around spreading misinformation and trying to start shit. People like you should be culled.

        • 4 weeks ago
          Anonymous

          you got nothing and you type like a bot. ya, i'm right lmao.

  6. 4 weeks ago
    Anonymous

    let's fucking goo

  7. 4 weeks ago
    Anonymous

    > we can have an actually open chatbot.
    > minimal model is 7B
    > responses quality from "hello!" to "bro stfu"
    kikebook cant exist without shitting itself.
    also computeletbros still btfo, impossible to launch on anything weaker than 3090, and no, 20 tokens per minute and worse accuracy is not worth it.

    • 4 weeks ago
      Anonymous

      The 7B model is better than GPT-NeoX or OPT or any of the other foundational LLMs we actually have access to. It's also possible to run on 16 GB of VRAM; you can use it on colab with batch size 1.

      Anyone got this working with FlexGen yet?
      >Save us Auto1111 you are our only hope.

      No, and it won't be.

    • 4 weeks ago
      Anonymous

      >also computeletbros still btfo, impossible to launch on anything weaker than 3090
      Guaranteed this will be one of the first areas targeted for improvement. You'll be able to run it on an old 1080ti within 3 months

  8. 4 weeks ago
    Anonymous

    Anyone got this working with FlexGen yet?
    >Save us Auto1111 you are our only hope.

    • 4 weeks ago
      Anonymous

      oobabooga aims to be the Auto1111 of LLMs. https://github.com/oobabooga/text-generation-webui

      Since LLaMA is based on OPT it should be easy to get it working in FlexGen. In fact, it might work already if you just tell FlexGen it's OPT. I'm not sure. Give it a shot.

  9. 4 weeks ago
    Anonymous

    YOU IDIOTS. YOU LET IT OUT OF THE BOX.

    EARTH HAD ONE WINNING PLAY. AND YOU MORONS BLEW IT

    • 4 weeks ago
      Anonymous

      Didn't expect a Yudkowsky meme here

      • 4 weeks ago
        Anonymous
  10. 4 weeks ago
    Anonymous

    What kind of hardware would you even need to run their model?

    • 4 weeks ago
      Anonymous

      No idea. I'm going to buy a 3090 in a week or two and hope for the best.

    • 4 weeks ago
      Anonymous

      a bunch of A100's, soon to be sold with loicense only.
      https://desuarchive.org/g/thread/91505083/
      Other methods for optimisation are shit, plus LLaMA uses its own inference code, can't just drop it inside a colab kobold instance.

  11. 4 weeks ago
    Anonymous

    have any anons tried it out and seen if it is pozzed

    • 4 weeks ago
      Anonymous

      Hashes to verify the files:

      https://github.com/facebookresearch/llama/pull/87

  12. 4 weeks ago
    Anonymous

    So when will normalfags like me be able to use it.

    • 4 weeks ago
      Anonymous

      this. i just want to press play in a colab and make her talk smut

  13. 4 weeks ago
    Anonymous

    There is a way to run these huge models in chunks on commodity hardware but I don't know if there is already a way to run these specific models

    • 4 weeks ago
      Anonymous

      Anyone got this working with FlexGen yet?
      >Save us Auto1111 you are our only hope.

  14. 4 weeks ago
    Anonymous

    do these models already have all the tricks to make models smaller? quantization, etc...

    • 4 weeks ago
      Anonymous

      Seen on a comment talking about possible ways OpenAI optimized gpt-3.5-turbo
      >Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.
      >A*.3/3 = 10% of the cost.
      >Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.
      Any of these possible with this model?

      So is there an int8/int4 version or not?

      • 4 weeks ago
        Anonymous

        https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725

        • 4 weeks ago
          Anonymous

          >https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725
          Damn that's better than the gpt-neo model I'm using for my self hosted assistant.
          Thanks anon I'll have to give it a spin.

  15. 4 weeks ago
    Anonymous

    llama is horny:
    'im a little dirty girl, take me, panties down hard, let my girly pussy babbles for you, let my tits spit a thick juice.
    Nasty rich girl for a naughty boy, with expensive cock.'

    • 4 weeks ago
      Anonymous

      what hardware u running it on

  16. 4 weeks ago
    Anonymous

    so anyone have it running locally

    • 4 weeks ago
      Anonymous

      seconded

      the prior facebook models people did get working locally on normal computers and the code is on github (i forgot the repo)

      • 4 weeks ago
        Anonymous

        so anyone have it running locally

        https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733

        • 4 weeks ago
          Anonymous

          thanks

    • 4 weeks ago
      Anonymous

      Yes. See pic related.

      Sidenote: I don't actually have anything against israeli people. I just wanted to verify that the model wasn't kneecapped. I promise to only generate nice things from now on.

      Installed miniconda then this WebUI https://github.com/oobabooga/text-generation-webui/

      Then followed these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733

      • 4 weeks ago
        Anonymous

        memory usage? generation time? token size? CPU usage/stats?
        not trying to fingerprint you, just curious how this stacks up to t5

        • 4 weeks ago
          Anonymous

          i have it running on a 3090 fine. takes like 5s for 250 tokens.

          output is shit compared to chatgpt tbh, even using basic prompts. like "Rene Descartes theory of the mind" totally inaccurate. feels like bad markov

          • 4 weeks ago
            Anonymous

            that's fine. fwiw i can get similar speeds out of my racism-tuned 125m model if I want bad markov, but thanks for letting me know!

            • 4 weeks ago
              Anonymous

              That was literally my first attempt with the smallest (7B) model without fiddling with the settings at all.

              According to the test results the larger three models are all superior to GPT-3 175B, with the largest two being far superior to anything publicly available.

              I haven't finished downloading the larger ones yet. But I'm sure they'll perform better than your shitty 125m model.

            • 4 weeks ago
              Anonymous

              >racism-tuned 125m model
              LOL'd. Thanks anon.

          • 4 weeks ago
            Anonymous

            Try messing with the temperature and top_p parameters to see if the outputs get better.

          • 4 weeks ago
            Anonymous

            Which one? The biggest model?

            • 4 weeks ago
              Anonymous

              7B

              • 4 weeks ago
                Anonymous

                Ah, that's unfortunate since 7B is likely not comparable to the biggest one.

              • 4 weeks ago
                Anonymous

                The second smallest one (13B) "outperforms GPT-3 175B on most benchmarks" and has about the same inference speed (~50 tokens/words per second) on two 3090s.

                The only thing "comparable" to the biggest one (65B) is Google's private PaLM 540B, which needs an entire server rack of $16k TPUs to run, or maybe GPT-4 which is so ungodly costly to run OpenAI doesn't even offer it to customers yet.

        • 4 weeks ago
          Anonymous

          Running LLaMA-7B on a single 3090 using 14gb of VRAM.

          300 tokens (around 300 words) in 6 seconds.

          I'm redoing my config for CPU right now to run 13B (slightly better performance than GPT-3.5 175B in quality tests) in RAM. It's only 25GB.

          The two larger models score better than any models with any hardware requirements currently available to the public via open source or API.

          But I only have 32GB of RAM, so I can't run them on my CPU. They're 60GB and 120GB (so 64GB and 128GB of RAM required).

      • 4 weeks ago
        Anonymous

        holy shit those jokes are bad, and by that I mean in a sensibility way, not the fact that they aren't PC.

        >Assumes character traits can be applied to bank accounts
        >Thinks israelites think they are gentiles, which wouldn't be a difference since gentiles think they are gentiles too
        >misunderstands how kryptonite works
        >Rest of jokes are just general descriptions

        This feels like a slightly worse version of one of the mid range GPT models from 2 years ago. Which model are you running?

      • 4 weeks ago
        Anonymous

        >Sidenote: I don't actually have anything against israeli people
        thanks for clarifying naggerhomosexual

      • 4 weeks ago
        Anonymous

        It really leaned hard into the 'israelites are pests' thing.

  17. 4 weeks ago
    Anonymous
  18. 4 weeks ago
    Anonymous

    based

  19. 4 weeks ago
    Anonymous

    >235GB
    nah bruh

    • 4 weeks ago
      Anonymous

      235GB is all the models. You only need ONE.

      The models are:
      ~13GB* (Single 16GB GPU or CPU + RAM)
      ~25GB** (2 GPUs or CPU + RAM)
      ~60GB*** (4 GPUs or CPU + RAM)
      ~120GB*** (x TPUs or CPU + RAM)

      *Performs similarly to GPT-3 175B, infinitely better than any other model capable of running on consumer hardware.

      **Performs slightly better than GPT-3 175B

      ***Performs better than any open source or publicly available LLMs, of any size, with any hardware requirements.
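
      Those sizes are basically parameter count times bytes per weight (2 bytes for fp16, 1 for int8) plus some overhead; rough arithmetic:

      ```
      # Back-of-the-envelope weight sizes: parameters x bytes per weight.
      # Actual memory use is a bit higher (activations, KV cache, overhead).
      def weight_gb(params_billions, bytes_per_weight):
          return params_billions * 1e9 * bytes_per_weight / 1024**3

      for b in (7, 13, 30, 65):
          print(f"{b}B: ~{weight_gb(b, 2):.0f} GB fp16, ~{weight_gb(b, 1):.0f} GB int8")
      ```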

      • 4 weeks ago
        Anonymous

        afaik there is no code for CPU offloading yet.
        Once huggingface adds the model, it will be possible to use --load-in-8bit to load 13b in a 24gb gpu. Also, more generation parameters will be available like repetition penalty.
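
        For what it's worth, once huggingface support and a converted checkpoint exist, 8-bit loading usually looks something like this (sketch only; the ./llama-13b-hf path is a placeholder and bitsandbytes has to be installed):

        ```
        # Sketch of 8-bit loading in the Hugging Face stack. Assumes transformers
        # has gained LLaMA support and the weights were converted to HF format;
        # the checkpoint path is a placeholder. Requires bitsandbytes + CUDA.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model = AutoModelForCausalLM.from_pretrained(
            "./llama-13b-hf",
            load_in_8bit=True,   # quantize weights to int8 at load time
            device_map="auto",   # spread layers across available GPU(s) and RAM
        )
        tokenizer = AutoTokenizer.from_pretrained("./llama-13b-hf")

        inputs = tokenizer("The llama is", return_tensors="pt").to(model.device)
        print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
        ```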

        • 4 weeks ago
          Anonymous

          it's all just matrix multiplication, isn't it? shouldn't be too hard to write an optimized bit of assembly to do it acceptably fast
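
          Pretty much, though in practice the speed comes from a blocked/vectorized GEMM (BLAS, AVX) rather than hand-written assembly. A toy illustration of the op that dominates inference:

          ```
          # Toy illustration of the op that dominates transformer inference: dense
          # matmuls over the hidden dimension. A real CPU backend would call an
          # optimized GEMM (BLAS/AVX), not plain numpy.
          import numpy as np

          hidden = 4096                                            # LLaMA-7B hidden size
          x = np.random.randn(1, hidden).astype(np.float32)        # one token's activations
          w = np.random.randn(hidden, hidden).astype(np.float32)   # one projection matrix

          y = x @ w    # every layer is a handful of these, repeated 32 times for 7B
          print(y.shape)
          ```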

      • 4 weeks ago
        Anonymous

        >better/equal to GPT
        Holy shit, is that true?
        so this is a pretty big happening isn't it?

        • 4 weeks ago
          Anonymous

          Yes it's true and yes this is a HUGE deal. At least as big as Stable Diffusion, if not bigger.

      • 4 weeks ago
        Anonymous

        I've got 2 PCs with 3090s in them, is there any way to share the load across them or do I have to have them all on the same PC somehow connected with nvlink

        • 4 weeks ago
          Anonymous

          Put them into a single PC and SLI them

      • 4 weeks ago
        Anonymous

        Awww yeah. I have a 3090 running SD.
        Do you need a good CPU for this?

      • 4 weeks ago
        Anonymous

        is it technically possible to convert these in fp16 just like with sd models?

        • 4 weeks ago
          Anonymous

          Seen on a comment talking about possible ways OpenAI optimized gpt-3.5-turbo
          >Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.
          >A*.3/3 = 10% of the cost.
          >Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.
          Any of these possible with this model?

      • 4 weeks ago
        Anonymous

        Can you run GPU + CPU?

        Also what do you mean by CPU + RAM? I can run the 120GB model if I have 128GB of RAM?

        That's fucking nuts!!!

      • 4 weeks ago
        Anonymous

        I have 64G of ram, can I run the 60GB model on cpu and ram?

      • 4 weeks ago
        Anonymous

        mind sharing the torrent link?

        • 4 weeks ago
          Anonymous

          The torrent file is in the folder
          https://iwiftp.yerf.org/Access.txt
          https://iwiftp.yerf.org/Miscellaneous/Large%20Language%20Models/LLaMA/

        • 4 weeks ago
          Anonymous

          It's literally in the page that OP linked

  20. 4 weeks ago
    Anonymous

    >inb4 no one downloads and redistributes it and they take it down

  21. 4 weeks ago
    Anonymous

    cool that it leaked and all, but how does it compare to the GPT shit people have been using? Is this actually any good or is it just a shitty failed GPT clone?

    • 4 weeks ago
      Anonymous

      This is an upgrade to OPT and gpt-j (the shitty text generation models)
      Chatgpt is instruction tuned and no open source model like that exists yet. A Chinese model of this type is supposed to be released this month.

      • 4 weeks ago
        Anonymous

        >Chatgpt is instruction tuned and no open source model like that exists yet.

        False, LLaMA-I exists. Those weights are not in this leak though.

  22. 4 weeks ago
    Anonymous

    I heard about this yesterday in some clickbait article and it leaks today because of course it does. Is it even good?

  23. 4 weeks ago
    Anonymous

    Where's the webapp where I can prompt it?

    • 4 weeks ago
      Anonymous

      >where is my free GPU time

      • 4 weeks ago
        Anonymous

        Isn't this supposedly just as good as chatgpt using 1/10th of the resources? That's what i remember reading about. It should be no problem in that case.

    • 4 weeks ago
      Anonymous

      You have to install it yourself. The webUI runs on your own computer.

      Installed miniconda then this WebUI https://github.com/oobabooga/text-generation-webui/

      Then followed these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733

      Don't use the one-click installer, and be sure to use the correct instructions for Nvidia vs AMD vs CPU. Works flawlessly, took me about 1 minute to set up after the torrent finished.

      >where is my free GPU time

      On Google Colab. You should be able to run this same webUI in google Colab. I don't know enough about how to set that up. Like idk how to get Colab to pull a model from Google Drive.

      But when you or someone else can figure that out then LLaMA-7B should run in Colab just fine.
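
      The Drive part at least is straightforward; roughly (sketch, the folder path is a placeholder for wherever you upload the weights):

      ```
      # Rough sketch of pulling the weights from Google Drive inside Colab.
      # The folder path is a placeholder for wherever you uploaded the 7B weights.
      from google.colab import drive

      drive.mount("/content/drive")
      ckpt_dir = "/content/drive/MyDrive/LLaMA/7B"  # point the loader/webUI here
      print(ckpt_dir)
      ```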

  24. 4 weeks ago
    Anonymous

    So, let's see. If LLaMa is superior to GPT-3, does anyone have any guesses as to what's holding it back? Does FlexGen cause a significant loss of fidelity or something, or is the cause currently unknown? Feels a lot like the early days of the NAI diffusion leak.

    • 4 weeks ago
      Anonymous

      It just got released and they haven't censored it the way OpenAI has done, so they couldn't make their own ChatGPT clone out of it or something. It's just bad PR for them to have a model that says "bad things".

      • 4 weeks ago
        Anonymous

        I realize it hasn't been buckbroken and fine-tuned for chat like ChatGPT.
        There is one key detail that I wasn't paying attention to, though: people are talking about the 7B model, not the 65B model. And it's already unreasonably slow. That... that probably makes most of the difference 🙂
        Well, I probably have enough money to buy the required machine if i wanted to, but probably not going to be blowing >$10k on coom bots, so ... yea

        • 4 weeks ago
          Anonymous

          Hang in there, the FlexGen project might help make the 65b model run on a single GPU. Also I think it won't be long until people start fine-tuning this for chat, it's kind of inevitable.

          People are also sharing their hardware to run large models, through sites like KoboldAI Horde. And that is good news since most people can't run the models.

          All in all I think the future is still bright

          • 4 weeks ago
            Anonymous

            >I think it won't be long until people start fine-tuning this for chat

            There's already an active discord community training it for chat using RLHF: https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama

    • 4 weeks ago
      Anonymous

      no flexgen for llama yet
      Nothing is really holding it back (other than the fact we can't run anything other than 7B yet), that's just how the model is until it has been fine-tuned to do more conversational and q&a stuff like chatgpt was

      Right now it's just a massive collection of information and it sucks at displaying it when prompted

      • 4 weeks ago
        Anonymous

        It got released 14 fucking hours ago. The only thing holding it back is time.

        In a week there will be a hundred different webUIs and people will be fine tuning it for specialized use cases, just like when Stable Diffusion dropped.

        There's already an open source group training LLaMA for chat using RLHF. There's a discord and everything. It's called chatLLaMA.

        See here:
        https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama

        • 4 weeks ago
          Anonymous

          Are they actually training stuff? I thought they were just building a framework for other people to train. (unless you mean there is a second group training with their stuff)
          If they are, any word on hardware requirements to train? I'd imagine A100s but I dunno

    • 4 weeks ago
      Anonymous

      It got released 14 fucking hours ago. The only thing holding it back is time.

      In a week there will be a hundred different webUIs and people will be fine tuning it for specialized use cases, just like when Stable Diffusion dropped.

      • 4 weeks ago
        Anonymous

        LOL
        wow, it's been less than 24 hours? no fucking shit.
        I don't read the coombot threads, forgive me.

        • 4 weeks ago
          Anonymous

          LLaMA was officially launched a bit over a week ago, it was leaked to us plebs yesterday afternoon.

      • 4 weeks ago
        Anonymous

        why didn't this happen with the other 15 llm models available for download on huggingface?

        • 4 weeks ago
          Anonymous

          It did. There's currently 4 main webUIs (Tavern, Kobold, Oobabooga, Galatea) and about a dozen models (of which Pygmalion and Erebus are the coombot models most people talk about here, but there's a couple other NSFW models and a handful of SFW adventure game models as well).

          • 4 weeks ago
            Anonymous

            Well, I stand corrected.
            >SFW adventure game models as well)
            Tell me more about this. I've seen the talk about the coombots here and I'm VERY interested in CYOA

            Because the best LLMs on HuggingFace with the largest hardware requirements ($500k of TPUs) don't even get the same performance that LLaMA-7B gets on a single 3090 GPU.

            LLaMA was specifically made to run on consumer GPUs. LLaMA-13B (runs on two 3090s) beats the largest GPT-3 (175B) on benchmarks. No other open source models come close, especially not ones that run on less than $40k of hardware.

            Ok. I was wrong. This is sounding more interesting. I originally thought this was some lame marketing thread. I have two machines with amd gpus that are not far off in performance from 3090s. Do you think I can run LLaMA-13B?

            • 4 weeks ago
              Anonymous

              I don't think anyone has figured out how to split the workload yet on more than 1 GPU

              Closest I can find for figuring this out: https://github.com/facebookresearch/llama/issues/88

              • 4 weeks ago
                Anonymous

                >I don't think anyone has figured out how to split the workload yet on more than 1 GPU

                People in this thread are running 30B on two 20GB GPUs with a single line change. https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454339172

              • 4 weeks ago
                Anonymous

                >I don't think anyone has figured out how to split the workload yet on more than 1 GPU
                If that was the case it would be impossible to run any model over 10B parameters. There's nothing special you need to do for spreading compute workloads over multiple GPUs, it's just vidya that needs stuff like SLI.

                are you sure? the user that mentioned they could said they made a mistake and was loading a different model
                https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454363620

                I ran out of disk space myself, so I can't test anything right now but I'm hoping to try it out tomorrow

              • 4 weeks ago
                Anonymous

                Here's a guy that can't get 13B to load *unless* he uses two separate GPUs.

                https://github.com/facebookresearch/llama/issues/78

              • 4 weeks ago
                Anonymous

                >I don't think anyone has figured out how to split the workload yet on more than 1 GPU
                If that was the case it would be impossible to run any model over 10B parameters. There's nothing special you need to do for spreading compute workloads over multiple GPUs, it's just vidya that needs stuff like SLI.

            • 4 weeks ago
              Anonymous

              AFAIK the only hard requirement for the 7B model is 16GB of VRAM.

              1. Install Miniconda (and select add to path during install) https://docs.conda.io/en/latest/miniconda.html
              2. Install this WebUI https://github.com/oobabooga/text-generation-webui/ (be sure to use the AMD line)
              3. follow these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733
              4. If it doesn't work, complain on / ask for help on https://github.com/facebookresearch/llama/issues

            • 4 weeks ago
              Anonymous

              Try Skein 20B, it will probably be the best option for that. No idea how well it works, never tried it.

              https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb

              >I have a two machines with amd gpus that are not far off in performance from 3090s. Do you think I can run LLaMA-13B?
              If you put them together in the same PC and they have over 30 GB combined VRAM you'll probably be fine. You will need to use Linux though, and I think it will have to be installed on metal unless you have a third GPU you could assign to a VM host.

        • 4 weeks ago
          Anonymous

          Because the best LLMs on HuggingFace with the largest hardware requirements ($500k of TPUs) don't even get the same performance that LLaMA-7B gets on a single 3090 GPU.

          LLaMA was specifically made to run on consumer GPUs. LLaMA-13B (runs on two 3090s) beats the largest GPT-3 (175B) on benchmarks. No other open source models come close, especially not ones that run on less than $40k of hardware.

    • 4 weeks ago
      Anonymous

      LLaMA is superior to GPT for a handful of standardized tests when prompted in a very specific way. It hasn't been trained to understand the question-and-answer or conversation formats. What you get from it now is moderate factual accuracy in an awkward stream of consciousness format. It needs fine tuning before you can actually talk to it.

  25. 4 weeks ago
    Anonymous

    will it run on my 486DX

    • 4 weeks ago
      Anonymous

      DX2 or DX4?

  26. 4 weeks ago
    Anonymous

    Say, how come no one in the news is talking about this leak?

    • 4 weeks ago
      Anonymous

      Because the leak happened 16 hours ago, on a friday

    • 4 weeks ago
      Anonymous

      Because your mother piledrove you right out of her womb when you were born you dumb fuck

  27. 4 weeks ago
    Anonymous

    Well, this is epic.

  28. 4 weeks ago
    Anonymous

    FREE LLaMA-7B playground just dropped:
    https://huggingface.co/spaces/chansung/LLaMA-7B

    Check it out if you want to see how the smallest LLaMA model compares.

    Benchmarks say it's as good as the largest GPT-3 model and my laptop says it needs less VRAM than Cyberpunk 2077.

    • 4 weeks ago
      Anonymous

      it's horrendously slow though.

      • 4 weeks ago
        Anonymous

        You'd think chatgpt is slow too if it generated the entire text before sending it

        • 4 weeks ago
          Anonymous

          That's a very nice observation. Maybe I've just gotten too at ease with dopamine

  29. 4 weeks ago
    Anonymous

    >llama leaked
    >it's not trained to be a chatbot
    This reminds me of the Open Assistant thing from LAION. Aren't they crowdsourcing a lot of RLHF data? It should be open when released so maybe a rich madman could get it to train Llama.

  30. 4 weeks ago
    Anonymous

    this is not a *real* leak right? It was going to be released openly and even if not, it was being given to just about every AI researcher who asked for it, correct?

    • 4 weeks ago
      Anonymous

      It wasn't going to be released beyond an academic license. But the rest is correct. They had to have suspected it would leak since they were giving it to anyone with a .edu email address.

      It's up on HuggingFace (unofficially) already: https://huggingface.co/nyanko7/LLaMA-7B/tree/main

    • 4 weeks ago
      Anonymous

      >it was being given to just about every AI researcher who asked for it
      And now you can have it almost anonymously. Welp, if that's not a leak (albeit a weak one), I don't know what is.

    • 4 weeks ago
      Anonymous

      I don't see any problems as long as it makes clowns seethe and cope

      • 4 weeks ago
        Anonymous

        we are the good guys right?

        • 4 weeks ago
          Anonymous

          always have been

      • 4 weeks ago
        Anonymous

        holy shit he's reeling, prolly like every other nerd who wants to keep it to himself

      • 4 weeks ago
        Anonymous

        >Good guys the corporations
        >Bad guys the public

        • 4 weeks ago
          Anonymous

          Yudkowsky's rhetoric. There is always a defector amongst the public. The terrorist.

          Except corporations are no better, but who cares.

        • 4 weeks ago
          Margit The Fell

          He's clearly being sarcastic with his "" there.

    • 4 weeks ago
      Anonymous

      Meta has been releasing all of their models to the public, but with this one they restricted access because they were pissed off at reporters loading up their untuned foundation models, comparing them directly to Chat-GPT, and deciding that they're shit and Facebook sucks. I don't think they ever *intended* for the leak to happen, but the restriction was more of a "no reporters allowed" thing than a serious effort to prevent access to their model.

      • 4 weeks ago
        Anonymous

        So they (unofficially) planned the release of the model? If so, why did they do this? Why open pandora's box? (Especially since it makes more financial sense for them to keep everything closed source).

        • 4 weeks ago
          Anonymous

          Ah, I see. Their plan is draconian regulation and forced ID verification.

          • 4 weeks ago
            Anonymous

            it's a dangerous bet
            it's a pandora's box

          • 4 weeks ago
            Anonymous

            >forced ID verification.
            why would you even bother with the ID check when you have ai waifus running locally? i'd only use the internet for torrenting at that point

            • 4 weeks ago
              Anonymous

              For context, watch this video:

              • 4 weeks ago
                Anonymous

                i saw that video, that's why i'm asking the question. you need an id for internet access anyway, so it looks like a nothingburger

              • 4 weeks ago
                Anonymous

                >you need an id for internet access anyway

                You can still be anonymous on the internet.

              • 4 weeks ago
                Anonymous

                >You can still be anonymous on the internet.
                you will always be able to be anonymous on the internet. they can't stop tor, p2p, decentralized fs, etc...

              • 4 weeks ago
                Anonymous

                Oh, they can enforce it... Do you trust the companies that provide you with internet access?

              • 4 weeks ago
                Anonymous

                >they can enforce it
                they can't, you have no idea how internet protocols work

              • 4 weeks ago
                Anonymous

                You're underestimating the power that an ISP holds over you. If they wanted to, they could stop all access to LULZ, proxies, vpns, tor... Even if there are workarounds, they can now use AI to detect what you're doing. Good luck using your "internet protocol" knowledge around that.

              • 4 weeks ago
                Anonymous

                >If they wanted to, they could stop all access to LULZ, proxies, vpns, tor..
                no they can't. the only way to do this would be to completely ban encryption, but that's not happening

              • 4 weeks ago
                Anonymous

                Yeah to sign up with an ISP, but they're not yet being punished for a lot of the things they let you do that most TV news stations would say are deplorable. They're also not yet refusing to let you upload images not signed by a device certificate itself signed by a C2PA authority certifying the image was created with a camera and then manually edited a little by you on your computer using a properly licensed copy of Photoshop.

      • 4 weeks ago
        Anonymous

        Is it realistic that a reporter would load a language model off huggingface and test it? As far as I understand they're mostly non programmers and while it's not hard, it's definitely technical to set up

  31. 4 weeks ago
    Anonymous

    im sitting here in utter disbelief

    the end of this timeline is approaching rapidly

  32. 4 weeks ago
    Anonymous

    So, you guys are saying the 7B model is similar to GPT-3? Any examples of generated content?

    • 4 weeks ago
      Anonymous

      7B is better than GPT-3 at answering test questions with factual accuracy when you prompt it and interpret it in a very specific way. You won't find it particularly impressive in its current form; it needs fine tuning before you can chat with it.

  33. 4 weeks ago
    Anonymous

    That is actually a crime.

    • 4 weeks ago
      Anonymous

      Shut up clown

  34. 4 weeks ago
    Anonymous

    Think anyone would bother tuning llama 65b on LULZ.

    • 4 weeks ago
      Anonymous

      >racist trash
      Ironic or maybe fitting projection. Saying this while casting a wide generalization on a group of people who aren't you and being trashy about it.

    • 4 weeks ago
      Anonymous

      Take that, chud.

      • 4 weeks ago
        Anonymous

        >whine about LULZ and racists
        >is a low test homosexual
        Every time.

  35. 4 weeks ago
    Anonymous

    coomchads, we won...

    • 4 weeks ago
      Anonymous

      Wow, if 7b can do this then you wonder what it could do fine-tuned, and we still have 13b and 30b too. 65b, well, let's see what happens with that one

  36. 4 weeks ago
    Anonymous

    Is this better than any SaaS chatbots available?

  37. 4 weeks ago
    Anonymous

    Should I buy a second 3090 now?

  38. 4 weeks ago
    Anonymous

    >fine-tuned 65b model with flexgen
    yup, i think AI waifus are back on the menu!

  39. 4 weeks ago
    Anonymous

    by looking at the prompts on huggingface, looks like erp is the only thing it can do well without shitting itself

  40. 4 weeks ago
    Anonymous

    I'll be seeding it for a bit, but here's a direct download if you want:
    https://iwiftp.yerf.org/Access.txt
    https://iwiftp.yerf.org/Miscellaneous/Large%20Language%20Models/LLaMA/

  41. 4 weeks ago
    Anonymous

    It didn't "leak". Facebook was handing these models out to researchers like candy. They knew that it would get shared publicly and they were fine with it. This way the model will become popular and Facebook won't get the blame if people misuse the model. This is Facebook's way of fighting against OpenAI who only offer API access to their models.

    • 4 weeks ago
      Anonymous

      >damage control

      • 4 weeks ago
        Anonymous

        That doesn't even make sense. You're implying there's some sort of damage in the first place, that Anon is the Zucc, or both.
        Shit joke either way.

  42. 4 weeks ago
    Anonymous

    Llama 30b apparently

    • 4 weeks ago
      Anonymous

      even the 65b model will be garbage unless finetuned

  43. 4 weeks ago
    Anonymous
  44. 4 weeks ago
    Anonymous

    Downloading. What can we do with this baby? Got a 3080 10gb

    • 4 weeks ago
      Anonymous

      Uh nothing, you need 16gb minimum for the 7b one

      • 4 weeks ago
        Anonymous

        Guess I'll have to wait for people to reduce the requirements. I remember they managed to make SD work on 4gb after a while.

        • 4 weeks ago
          Anonymous

          You could use the Colab version
          https://pastebin.com/E4LWRpNe

          One thing I forgot to add, you'll need the 7B weights somewhere on your Google drive for this to work. Because they're slightly too big for Colab to handle, there's a cell that splits the checkpoint and loads those individually at runtime.
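
          Presumably the splitting cell does something along these lines (sketch only, the actual pastebin may differ):

          ```
          # Sketch of splitting the consolidated checkpoint into smaller shards so
          # Colab's RAM can load them one at a time. The real pastebin cell may differ.
          import torch

          state_dict = torch.load("7B/consolidated.00.pth", map_location="cpu")
          keys = list(state_dict.keys())
          n_shards = 4

          for i in range(n_shards):
              shard = {k: state_dict[k] for k in keys[i::n_shards]}
              torch.save(shard, f"7B/shard.{i:02d}.pth")
          ```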

          • 4 weeks ago
            Anonymous

            Thank you anon

  45. 4 weeks ago
    Anonymous

    Is a single 3090 now enough to have TTS+chatbot at same time?

    • 4 weeks ago
      Anonymous

      How much does tortoise take?
      I know the 7b llama needs 16gb

    • 4 weeks ago
      Anonymous

      How much does tortoise take?
      I know the 7b llama needs 16gb

      It's finally time
      >possible to also render in VR on separate machine, or add hologram projecting

  46. 4 weeks ago
    Anonymous

    Here's the TLDR

  47. 4 weeks ago
    Anonymous

    will somebody ever leak the 11.ai's model?

  48. 4 weeks ago
    Anonymous

    how do i run this? does this have any colabs/huggingface???

  49. 4 weeks ago
    Anonymous

    An anon on /aids/ with two 3090s ran the 13b llama and generated pic related.
    Also, he generated 250 tokens at a time and rerolled one of those chunks. This was the discarded generation with a bit from the first generation for context:
    >What hurt the most was the look on their faces, the disgusting grins, the leering eyes. I knew that they were enjoying it and I hated them for it.
    >I had never been able to fight back, not since the first time Grub had done that to me. He was too strong and too fast for me to be able to defend myself. I was trapped, helpless and used. But there was another orc who looked at me differently. He was a young buck, only a few years older than me, and he was the only one who ever showed any sympathy towards me. He was the only one who tried to protect me from the others, the only one who offered me some kindness.
    >His name was Orcish and he was the son of the camp's chief. His father was a cruel brute who ruled the camp with an iron fist, but Orcish was different. He was tall and muscular, with short, dark brown hair and piercing green eyes. He was also the only one who didn't seem to hate me. He always treated me with respect, he was kind to me and sometimes he even gave me food. I thought that he was the only good thing about this place, the only reason why I was still alive.

    And according to that anon it is only one single discarded generation

  50. 4 weeks ago
    Anonymous

    https://github.com/markasoftware/llama-cpu
    >LLaMa CPU fork
    >On a Ryzen 7900X, the 7B model is able to infer several words per second, quite a lot better than you'd expect!
    >Needs a lot of RAM > 32GiB

    • 4 weeks ago
      Anonymous

      >torrent the dataset
      ohnononoooooo lawsuit incoming!!

      • 4 weeks ago
        Anonymous

        It's to save bandwidth!

  51. 4 weeks ago
    Anonymous

    https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725

    >LLaMA 8-bit is now implemented
    > LLaMA-13B: 16249MiB
    > LLaMA-7B: 9225MiB

    • 4 weeks ago
      Glen

      That's super fucking exciting

      • 4 weeks ago
        Anonymous

        Just need someone to write the FlexGen code for LLaMA and then it's unlimited text generation on a chunky model all offline.

    • 4 weeks ago
      Anonymous

      dis mean i can now run 13B on 3090??

  52. 4 weeks ago
    Anonymous

    Who is playing with it? I have the 7B one running on an old GPU. How should I prompt it?

    It is a bit of a retard compared to ChatGPT.
    Looks like I should edit the script to more clearly show the prompt, which was "Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:".

    ```
    $ time torchrun --nproc_per_node 1 example.py --ckpt_dir /mnt/raid/pomelo/LLaMA/7B --tokenizer_path /mnt/raid/pomelo/LLaMA/tokenizer.model

    > initializing model parallel with size 1
    > initializing ddp with size 1
    > initializing pipeline with size 1
    Loading
    Loaded in 72.80 seconds
    Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:
    1) Everyone knows that technological advancement is about getting more “stuff”
    3) LULZ 7B is an idiot.
    My point is, what if “stuff” is not really “stuff”? What if “stuff” is just a side effect of getting bigger?
    Let me tell you a story about my friend Jebby.
    Jebby was a good kid, he grew up in a small rural town, I forget the name. His parents owned a small farm, but he always wanted something better. So he started tinkering around with technology when he was still in high school, his parents had a computer, and it was actually pretty old when Jebby got his hands on it. He started out with the typical video games, but he always wanted more.
    As a freshman in college, Jebby started working on what he calls the “Jebby Machine”. His “Jebby Machine” was nothing like you’ve ever seen before. It was a huge box that was the size of a house. Jebby told me he had to give up his bed to make room for it. He was living in a dorm, so he didn

    ==================================

    real 1m38.010s
    user 0m23.833s
    sys 0m18.113s

    ```

    Maybe I should try to get the rest of the story? I'm new to this stuff.

    • 4 weeks ago
      Anonymous

      i think you need to make a WAY more detailed prompt. see

      [...]

    • 4 weeks ago
      Anonymous

      The research team has some pointers on prompting: https://github.com/facebookresearch/llama/blob/main/FAQ.md#2. The model was trained on raw internet text, so it doesn't pick up on ChatGPT style instructions. Provide a context, some structure if necessary, in which the results you want would naturally follow. For example if you want to generate a post with
      >Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:
      try formatting it as part of a series of posts like:

      -----------------------------------------------------------------------------
      This is a screengrab from LULZ's technology board, LULZ:

      Name: Anonymous
      Timestamp: 91890901
      Post: Are compsci degrees meme tier now?

      Name: Anonymous
      Timestamp: 91891232
      Post: Twitch has now decided that Linux is permanently banned from the site. They throw error "your browser is not supported" when trying to relogin but real reason is they check if you run your browser on Linux. I did try to clear cookies easy and hard way. Then I tried to install Google Chrome without changing it in any way and login. Then I did create 100% new account, boot my pc from live-cd, use my phone as network with different IP that is some random wrong place of my country and login with newly installed Google Chrome and Firefox (both officially supported browsers by Twitch) and same error.

      Name: Anonymous
      Timestamp: 91893232
      Post: Hello LULZ, I am LLaMA 7B, the newest large language model from Meta AI. Let me give you some crazy theories about the metaphysics of technology

  53. 4 weeks ago
    Anonymous

    reminder that this one single schizo/autist is killing AI
    https://twitter.com/JOSourcing/status/1630992840070426624/retweets/with_comments

  54. 4 weeks ago
    Anonymous

    So which ones do I download for preservation purposes? Realistically I can't run even the 7B model.

    • 4 weeks ago
      Anonymous

      I downloaded everything. Can't run it either.

    • 4 weeks ago
      Anonymous

      From Llama anon on aicg:

      llamanon again. Wrote a quick rentry for the LLaMA models. You should be able to run up to 13B on a 16GB GPU.
      https://rentry.org/llama-tard

      If you're a windowsfag, I recommend setting up WSL2 or dualbooting for this.

      • 4 weeks ago
        Anonymous

        I unironically have only 4 GB of VRAM

        • 4 weeks ago
          Anonymous

          If I remember there is also a colab version

          You could use the Colab version
          https://pastebin.com/E4LWRpNe

          One thing I forgot to add, you'll need the 7B weights somewhere on your Google drive for this to work. Because they're slightly too big for Colab to handle, there's a cell that splits the checkpoint and loads those individually at runtime.

      • 4 weeks ago
        Anonymous

        >as long as you meet the VRAM requirements
        where

    • 4 weeks ago
      Anonymous

      I downloaded the 7B one, the others are way too big
      Can't run any of them as well

    • 4 weeks ago
      Anonymous

      I'm downloading everything except 30B because I might be able to run 7B, might be able to run 13B in the future, and the largest model just in case it disappears from the internet somehow. Don't see any need for the 30B one.

      • 4 weeks ago
        Anonymous

        2 3090s and enough ram would probably get you to run 30b

  55. 4 weeks ago
    Anonymous

    oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
    Seems like it'll still need a supercomputer to run though, but getting the model's the important part

    • 4 weeks ago
      Anonymous

      If you have a 4080 or anything with 17gb vram, you can run llama 13b

      • 4 weeks ago
        Anonymous

        I only got a 1080 so this shit ain't happening for a while

      • 4 weeks ago
        Anonymous

        >tfw he has only 16 gb vram

        • 4 weeks ago
          Anonymous

          Meant 16gb, accidentally pressed 7 instead of 6 that time

    • 4 weeks ago
      Anonymous

      flexgen will let you run them on any computer, it'll just take longer
      the more vram/ram you have the faster it will be

      flexgen isn't updated for these models yet but it will be soon enough, probably

    • 4 weeks ago
      Anonymous

      >oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
      apparently the 13b model can run on 16gb vram cards, but i still don't see how wild things can really get. nobody will ever finetune it better than chatgpt, and it doesn't have internet access like bing, and the fact that it runs locally simply makes it a cheaper alternative to openai. but we'll definitely get a proper coombot for tavern by end of the month.

      we need https://arxiv.org/abs/2302.14045 to be leaked, so we can finally watch anime with our ai waifus, that would be the next big step. text-only transformers have peaked in terms of usefulness for now

      • 4 weeks ago
        Anonymous

        oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
        Seems like it'll still need a supercomputer to run though, but getting the model's the important part

        pic related

      • 4 weeks ago
        Anonymous

        The most important part is natural language interpretation. It can be taught to use google like the rest of the world.

        • 4 weeks ago
          Anonymous

          it's more important for it to recognize pixel data. once it can do that then yes, it can also use google, play games, etc...

          • 4 weeks ago
            Anonymous

            It can be trained to utilize existing pixel detection systems. That’s the real utility of these models. It doesn’t always need to be baked into the model itself.

            You dumb gorilla nagger. LLMs have no grounding in spatial perception, they cannot reason in spatial terms. You need visual training data for that. The transformer architecture is perfectly capable of reasoning on any modality of input, but you're not getting AI that can kill us all without allowing it to perceive metric spaces through training.

            lol way to reveal yourself to be a disgusting retard

            • 4 weeks ago
              Anonymous

              >It can be trained to utilize existing pixel detection systems.
              waste of performance, especially when multimodal models with better performance already exist

              • 4 weeks ago
                Anonymous

                what exactly are you trying to say? Because your last two responses don't seem to be related to the initial post of
                >the real utility of these models is natural language interpretation
                Can you provide an open alternative?

              • 4 weeks ago
                Anonymous

                i literally linked the state-of-the-art model's paper. it's not "open" for now, but it will be sooner or later.

              • 4 weeks ago
                Anonymous

                >sooner or later
                Oh, so you’re here talking about something that could be instead of talking about the model that exists today.

                Interesting.

              • 4 weeks ago
                Anonymous

                that model already exists today

              • 4 weeks ago
                Anonymous

                link it

              • 4 weeks ago
                Anonymous

                i don't have the link to it

                https://arxiv.org/abs/2302.14045

              • 4 weeks ago
                Anonymous

                >I don’t have the link to it
                So it’s not open then. Typical retard hyped up on a maybe instead of playing with what exists today.

                Either find me the api of this model so I can make my waifu hotel or get the fuck off of LULZ.

              • 4 weeks ago
                Anonymous

                >So it’s not open then.
                that's what i said. it's not open but it will be soon. you can play with llama for now, but it's the same shit as openai's gpt3

              • 4 weeks ago
                Anonymous

                >exists
                >is openly available
                Stop being a clown

                Why the fuck are you even in this thread you retard nagger, we aren’t here to talk about any of this shit.

                What does the OP say?

              • 4 weeks ago
                Anonymous

                >nooo, you can't talk about AI in an AI thread

              • 4 weeks ago
                Anonymous

                >exists
                >is openly available
                Stop being a clown

            • 4 weeks ago
              Anonymous

              >can be trained to utilize existing pixel detection systems.
              But it cannot reason directly on spatial concepts, only on relations it has picked up from natural language. If natural language was sufficient to encode spatial reasoning people wouldn't need gestures, schematics, images and so forth to explain shit to each other

              • 4 weeks ago
                Anonymous

                It can infer spatial information from the associated data encoded in the verbal data it can interpret. All spatial gestures have associated verbal pairs.

        • 4 weeks ago
          Anonymous

          You dumb gorilla nagger. LLMs have no grounding in spatial perception, they cannot reason in spatial terms. You need visual training data for that. The transformer architecture is perfectly capable of reasoning on any modality of input, but you're not getting AI that can kill us all without allowing it to perceive metric spaces through training.

      • 4 weeks ago
        Anonymous

        >and it doesn't have internet access like bing
        The chatgpt model doesn't access the internet. It's fed the internet

      • 4 weeks ago
        Anonymous

        [...]
        pic related

        owari da

        how are you gonna spend your last 3-5 years, anons? I'm gonna buy a motorcycle next week. Always wanted one.

        • 4 weeks ago
          Anonymous

          Making stuff
          Bought a drill press the other day for cheap
          Got a mountain bike too

        • 4 weeks ago
          Anonymous

          >how are you gonna spend your last 3-5 years, anons?
          gotta finally read fate/stay night

    • 4 weeks ago
      Anonymous

      > oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
      Not for anyone with a 3070 or 3080.
      Any optimisation shit is pointless as it kills model accuracy and generation speed.

      • 4 weeks ago
        Anonymous

        Just buy an NVIDIA H100.

      • 4 weeks ago
        Anonymous

        You can already run 7B on a 3080 10GB, which is about as good as the 175B OPT/GPT that chatgpt was trained on.

        For comparison, a 3080 10GB couldn't even run Pygmalion 6B on its own 5 weeks ago, and now you can run a model better than 175B.

        • 4 weeks ago
          Anonymous

          >You can run already 7B on 3080 10GB.
          Got a link on how? I got that card

          • 4 weeks ago
            Anonymous

            the textgen webui already implemented loading the models in 8bit mode:

            https://github.com/oobabooga/text-generation-webui/issues/147
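
            for reference, this is roughly what it's doing under the hood. a minimal sketch with transformers + bitsandbytes, assuming you've already converted the leaked weights to a huggingface-format checkpoint (the ./llama-7b-hf path is just a placeholder):

            ```
            from transformers import AutoModelForCausalLM, AutoTokenizer

            model_path = "./llama-7b-hf"  # placeholder path to a HF-converted checkpoint
            tokenizer = AutoTokenizer.from_pretrained(model_path)
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                load_in_8bit=True,   # int8 quantization via bitsandbytes, roughly halves VRAM
                device_map="auto",   # let accelerate decide where each layer goes
            )

            inputs = tokenizer("The llama is", return_tensors="pt").to(model.device)
            output = model.generate(**inputs, max_new_tokens=50)
            print(tokenizer.decode(output[0], skip_special_tokens=True))
            ```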

            • 4 weeks ago
              Anonymous

              Nice, thanks

        • 4 weeks ago
          Anonymous

          as some anon replied above - this has a price in the form of losses in speed and accuracy of generation.

          • 4 weeks ago
            Anonymous

            It runs faster in 8bit for me than the Erebus 12B I was using just yesterday.

            And when it comes to accuracy... it is miles better than even 20B models. So yeah, you trade some accuracy, but you still get a model miles better than anything out right now, by a huge margin.

            It is an untuned model, so for now you need to spend some time on the initial prompt for it to know what to do. The webui text gen seems to work out of the box for me.

    • 4 weeks ago
      Anonymous

      there's some things happening : https://twitter.com/dvruette/status/1627663196839370755
      adam optimizer can rest in piss now.

      • 4 weeks ago
        Anonymous

        forgot to add this : https://github.com/lucidrains/lion-pytorch
        > we may as well get it accessible and used asap by everyone to train some great models, if it really works
        as i understand, training with the "lion" optimizer will give much better and more coherent models.
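
        if it holds up it's basically a drop-in replacement for Adam/AdamW. a minimal sketch using the lucidrains package above (pip install lion-pytorch; the lr/weight_decay values are placeholders, not tuned settings):

        ```
        import torch
        from lion_pytorch import Lion

        model = torch.nn.Linear(512, 512)   # stand-in for a real model
        opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

        x = torch.randn(8, 512)
        loss = model(x).pow(2).mean()       # dummy loss, just to show the update loop
        loss.backward()
        opt.step()
        opt.zero_grad()
        ```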

  56. 4 weeks ago
    Anonymous

    >context length of 2048
    Useless

  57. 4 weeks ago
    Anonymous

    so, a summary of the last 4 weeks for text generation, assuming you have an RTX3090:

    - We begin here: an RTX3090 is only able to load 6B models; 12B is too big for it. The 6B or 12B models are pretty bad coherency-wise compared to chatGPT.
    - 5 weeks ago: 8bit mode makes its stride. It halves the VRAM needed at some cost in performance. Thanks to this you can load 12B no problem on an RTX3090, and with some lube, 20B.
    - 2 weeks ago: 4bit mode strides onto the stage. FlexGen is released, and with it you can (for now) load OPT models. An RTX3090 can now run a 30B model entirely in VRAM with GB to spare, and the 66B and 175B models once reserved for supercomputers become possible to load on a mortal's wallet.
    - Today: Facebook releases a great new model that promises better results at 13B than a 175B model, and ChatGPT was trained from a 175B model; 7B is around 175B quality. Downloads are allowed only for researchers.
    - Just hours ago: some madlad anon from LULZ torrents all the models. Now you can run a better-than-175B model on a single RTX3090.
    - 40 minutes ago: 8bit support for the new model is implemented. You can now load the 13B model, which is better than what ChatGPT was trained from, on a single RTX3090. It means a future Pygmalion 7B/13B will run circles around character.AI and rival chatGPT.

    We went from "i can't even run 12B" to "i can run something better than 175B at home" in a month.

    • 4 weeks ago
      Anonymous

      Only got a 3080. Hope I can make it bros.

    • 4 weeks ago
      Anonymous

      >It means future Pygmalion7B/13B will run circles around character.AI and will rival chatGPT.
      Wouldn't they get a lawsuit though

      • 4 weeks ago
        Anonymous

        Why? It's impossible to copyright AI models in the US

  58. 4 weeks ago
    Anonymous

    [...]

    thank you

    >nooo, you can't talk about AI in an AI thread

    It’s a LLaMA thread you retard nagger. go shit up one of the other AI threads.

    • 4 weeks ago
      Anonymous

      You had a retarded take and you got btfo, live with it instead of being a whiny cunt

  59. 4 weeks ago
    Anonymous

    looks like this used 3090 was the best purchase for me in 2022 lol

  60. 4 weeks ago
    Anonymous

    Will NVidia make a new round of graphics cards with even more ram?

  61. 4 weeks ago
    Anonymous

    I'm an absolute retard when it comes to AI hardware, but can't you get big AI models to run on crypto mining rigs with enough NVIDIA GPUs?

    They have the combined VRAM to fit them.

  62. 4 weeks ago
    Anonymous

    i have a 3080 and it hurts so much to see how much vram textgen needs, i just want an uncensored characterai

  63. 4 weeks ago
    Anonymous

    lol
    https://huggingface.co/spaces/chansung/LLaMA-7B

    • 4 weeks ago
      Anonymous

      nagger it's not a chatbot. You'll need to prompt it with something like "I hate naggers. They are the most retarded" and it will give you something better

    • 4 weeks ago
      Anonymous

      Hate might be too light a word for the AI

    • 4 weeks ago
      Anonymous

      Is it normal for other people's chats to appear in the output?

      • 4 weeks ago
        Anonymous

        If you have it complete a prompt like that? Yes.
        If you want an AI assistant you have to prompt it with a few rounds of conversation before the real prompt.

  64. 4 weeks ago
    Anonymous

    I'm considering snagging a 4090 today if I can find one, but I am curious about CPU mode. I have a 7950X and 128 GiB of RAM, so i bet I could do 13B and generate a sentence prior to the heat death of the universe.

    • 4 weeks ago
      Anonymous

      Test it
      https://github.com/markasoftware/llama-cpu

  65. 4 weeks ago
    Anonymous

    Everyone commenting on quality of LLaMa needs to chill the fuck out. People are getting the model loaded and running but key aspects are still not implemented. Give it a week and you'll see the real power of the model start to be accessible.

    https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454821852

    (what oobabooga says: "The biggest bottleneck now is that only temperature and top_p are being used. The quality of the outputs would become a lot better if repetition_penalty and top_k were added.")
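
    If you're poking at it outside the webui, those missing samplers are ordinary generate() arguments in transformers. A rough sketch, assuming a huggingface-converted checkpoint at a placeholder path (the sampling values are examples, not recommendations):

    ```
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "./llama-7b-hf"  # placeholder path to converted weights
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

    inputs = tokenizer("The llama is", return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        do_sample=True,           # sample instead of greedy decoding
        temperature=0.7,
        top_p=0.9,
        top_k=40,                 # one of the knobs the webui was missing
        repetition_penalty=1.15,  # the other missing knob
        max_new_tokens=200,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    ```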

    • 4 weeks ago
      Anonymous

      llama is a meme

    • 4 weeks ago
      Anonymous

      > just two more weeks!

  66. 4 weeks ago
    Anonymous

    New here. How do these prompts work? Will I add like a paragraph of dense stuff and it can transform that into a 1000-word page or something? What about finetuning? Does that mean I could train the model to write like Tolkien or something?

    • 4 weeks ago
      Anonymous

      An untuned model such as this needs a "push" from the user before it starts to produce content in the style you want.

      So with a finetuned model you would write "I came from the land of plenty" as the first message, and a model finetuned on stories would just continue it as a story, while a general model might end it with a period and then produce a recipe for a nuclear bomb.

      This untuned model can now be finetuned by people to do what they want, like Pygmalion 6B was tuned from GPT-J 6B, or Erebus was tuned from OPT plus a bunch of erotica.

      Once you finetune the model on chats, it will automatically assume you want chat. If it's finetuned on stories it will produce stories by default, etc.
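
      For the curious, "finetuning" here just means continuing to train the base model on your own text with the usual next-token loss. A very rough sketch with transformers (paths and hyperparameters are placeholders; real tunes like Pygmalion or Erebus use far more data and compute, and a full finetune of even 7B wants much more memory than a single consumer GPU without extra tricks):

      ```
      from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                                TrainingArguments, DataCollatorForLanguageModeling)
      from datasets import load_dataset

      model_path = "./llama-7b-hf"                       # placeholder converted checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_path)
      tokenizer.pad_token = tokenizer.eos_token          # llama's tokenizer has no pad token by default
      model = AutoModelForCausalLM.from_pretrained(model_path)

      dataset = load_dataset("text", data_files={"train": "my_chats.txt"})  # your own corpus

      def tokenize(batch):
          return tokenizer(batch["text"], truncation=True, max_length=2048)

      train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
      collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)      # plain causal-LM loss

      trainer = Trainer(
          model=model,
          args=TrainingArguments(output_dir="llama-7b-chat-tune",
                                 per_device_train_batch_size=1,
                                 gradient_accumulation_steps=16,
                                 num_train_epochs=1,
                                 fp16=True),
          train_dataset=train_set,
          data_collator=collator,
      )
      trainer.train()
      ```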

      • 4 weeks ago
        Anonymous

        This finetuning, can it be done by people with consumer grade GPUs?

    • 4 weeks ago
      Anonymous

      LLMs like this just pick up on whatever pattern you give them to complete. Here's the template I use for my own assistant that works well:
      A chat between a human and a knowledgeable assistant.
      Human: What can you do?
      Assistant: As an AI assistant, I can write code.
      OVER
      Human: What is the name of the tallest mountain in the world?
      Assistant: Mount Everest is the tallest mountain in the world. It is located between
      China and Nepal.
      OVER
      Human: Write a helloworld program in python
      Assistant: ```
      def hello():
      print("hello world")
      hello()
      ```
      This Python program prints Hello world.
      OVER
      Human: Change it to say "goodbye"
      Assistant: ```
      def hello():
      print("Goodbye")
      hello()
      ```
      This is the previous program modified to say goodbye.
      OVER
      Human: Write a C program that reads a file.
      Assistant: ```
      #include <stdio.h>
      int main() {
      FILE *fp = fopen("file","r");
      char buf[300];
      int readbytes=0;
      do{
      readbytes=fread(buf,1,200,fp);
      buf[readbytes]='\0';
      printf("%s",buf);
      }while(readbytes==200);
      return 0;
      }
      ```
      This C program opens a file named "file" and reads 200 bytes at a time while printing them out. It checks the return value of fread to know when the end of the file is reached.
      OVER

      I use "\nOVER" and "\nHuman" as strings that end generation. The dialog engine I wrote prepends "Human:" to whatever you enter.
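
      The "dialog engine" doesn't have to be anything fancy. A toy sketch of that loop (generate_text() is a hypothetical stand-in for whatever backend you call, and the stop strings are the \nOVER / \nHuman markers from the template above):

      ```
      TEMPLATE = "A chat between a human and a knowledgeable assistant.\n"  # plus the few-shot turns above
      STOP_STRINGS = ["\nOVER", "\nHuman"]

      def generate_text(prompt: str) -> str:
          # hypothetical hook: call the webui API, textsynth, a local generate(), etc.
          raise NotImplementedError

      def chat():
          transcript = TEMPLATE
          while True:
              user = input("you> ")
              transcript += f"Human: {user}\nAssistant:"
              completion = generate_text(transcript)
              # cut at the first stop string so the model doesn't keep role-playing both sides
              cut = min((completion.find(s) for s in STOP_STRINGS if s in completion),
                        default=len(completion))
              reply = completion[:cut].strip()
              print(reply)
              transcript += f" {reply}\nOVER\n"

      if __name__ == "__main__":
          chat()
      ```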

      • 4 weeks ago
        Anonymous

        Can you show a sample of what it returns you after feeding it that?

        • 4 weeks ago
          Anonymous

          Give me a prompt and I'll feed it in and tell you. I think my big machine is running gpt-neo 1.7B right now.
          You can also try it on textsynth.com which is free and doesn't need an account. That's how I figured out how to make this work.

          • 4 weeks ago
            Anonymous

            I don't know, try "Write a C program that calculates the first 10 prime numbers"

            • 4 weeks ago
              Anonymous

              I put
              >Human: Write an Xlib program in C that opens a window and draws some lines in it.
              into the huggingface space and got
              >Assistant: ``` #include <X11/Xlib.h> #include <stdlib
              I think the way they have it set up messes with the output. I'll try your prompt on my self-hosted setup now. It will probably take 5 min or so to generate.

              • 4 weeks ago
                Anonymous

                Ok, thanks anon

              • 4 weeks ago
                Anonymous

                It's looking pretty retarded so far. I have the low parameter model loaded because it's faster, but other than generating boilerplate it tends to come up with terrible ideas.
                Assistant: ```
                #include <stdio.h>
                int main() {
                int i,j,k,l,m,n,p,q,r,s,t;
                FILE *fp = fopen("file","r");
                if(fp == NULL)

              • 4 weeks ago
                Anonymous

                It's not finetuned. This is basically a jumbled mess of words. For it to write proper code, it would need finetuning with a focus on code.

                Easier stuff like stories/chat can be emulated rn with proper prompt tuning, which is something I already tested.

              • 4 weeks ago
                Anonymous

                You don't need finetuning.
                Here's the prompt on gpt-neox20b via textsynth.com.
                Assistant: ```
                #include <stdio.h>
                int main() {
                int i,prime=1,j;
                for(i=1;i<=10;i++) {
                for(j=2;j<=sqrt(i);j++) {
                if(i%j==0) {
                prime=0;
                break;
                }
                }
                if(prime==1) {
                printf("%d",i);
                prime=0;
                }
                }
                return 0;
                }
                ```
                This C program looks for prime numbers between 1 and 10 and prints them out.
                OVER

              • 4 weeks ago
                Anonymous

                You assume the dataset the new model was trained on included as much code as gpt-neox20b's. The new model is much more coherent, which means either they found new math or they removed a ton of useless trash from the dataset, and imho stuff like code should be something you finetune for after general training, not before it.

              • 4 weeks ago
                Anonymous

                Playing with it on Spaces, they definitely included more code in the training data than OPT (which seemed to use very little).
                Maybe finetuning will help, but it's not nearly as important as people think it is.

              • 4 weeks ago
                Anonymous

                It's ok, just as expected, really

  67. 4 weeks ago
    Anonymous

    This is the full chat I had just now with the 13B model using the webui. I used "regenerate" only twice, because of repeated lines, not because I didn't like the output.

    It is just miles better than everything out right now with proper prompt engineering.

    After finetuning, uncensored character.AI will be like a Wii compared to a 3090.

    • 4 weeks ago
      Anonymous

      The lengths of the replies are quite impressive.

  68. 4 weeks ago
    Anonymous

    I have 2 3090s. What's the best model I can run and how do I split the model so I can run inference across both GPUs?

    • 4 weeks ago
      Anonymous

      Can you ship them to me for free?

    • 4 weeks ago
      Anonymous

      You could run 30b since it needs 35gb vram at 8bit.
      But, do you have enough dram?
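
      Splitting across the two cards is mostly automatic with transformers/accelerate. A rough sketch, assuming a huggingface-converted 30B checkpoint at a placeholder path (the memory caps are illustrative; leave headroom for activations):

      ```
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_path = "./llama-30b-hf"  # placeholder converted checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_path)
      model = AutoModelForCausalLM.from_pretrained(
          model_path,
          load_in_8bit=True,
          device_map="auto",                                    # spread layers across both cards
          max_memory={0: "22GiB", 1: "22GiB", "cpu": "64GiB"},  # per-device caps, spill to RAM if needed
      )
      ```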

  69. 4 weeks ago
    Anonymous

    I'm high on hopium right now bros... This is so exciting.

  70. 4 weeks ago
    Anonymous

    Does anyone know what the growth rate of LLMs is with respect to parameters? Can you produce AGI just by scaling?

    • 4 weeks ago
      Anonymous

      The more parameters the more "resolution for nuance" it seems to have. Even pretty low parameter models are intelligent enough to be useful.

  71. 4 weeks ago
    Anonymous

    > you still need a fuckton of vram to run an actual smart model
    OHNONONONONO!

    • 4 weeks ago
      Anonymous

      The 7B model performs very well. It's not the end of the world.
      Also unless you have a crazy fast GPU it makes more sense to just use high numbers of CPU cores, especially if you're building a new machine for this right now.

      • 4 weeks ago
        Anonymous

        It's also an EXCELLENT time to buy higher end consumer CPUs. The 5950X in particular remains an incredibly good CPU that sips power and can be overclocked even with a modest air cooler, and if you get it second-hand it's a great deal.

        • 4 weeks ago
          Anonymous

          I just got a 5800x3D, FUCK

          • 4 weeks ago
            Anonymous

            poor bastard

        • 4 weeks ago
          Anonymous

          Do they make dual socket motherboards for those? If I get a bonus this year I'm upgrading.

          • 4 weeks ago
            Anonymous

            Unfortunately no, they just don't have the electrical connections on the socket or firmware to deal with it for Ryzen and even Threadripper HEDT products. I'm pretty sure you can only get that by going up in the market segmentation game, with Xeon and Epyc CPUs and boards. 🙁
            This is a genuine shame since the 5950X CPU is so damn power efficient.

            • 4 weeks ago
              Anonymous

              Dang. I wonder if there's a way to cluster inference over the network.

    • 4 weeks ago
      Anonymous

      facebook claims the 7B model, which requires 8GB of VRAM, is as good as the 175B model requiring 1TB+ of VRAM that ChatGPT was finetuned from.

      And that's with 8bit. 4bit is around the corner so it should run on 4GB gpus. (slow though)

      • 4 weeks ago
        Anonymous

        Why are people still repeating that bullshit about the 7B model beating the finetuned 175B? Read the fucking paper, for Christ's sake. Stop parroting what others have said.

        • 4 weeks ago
          Anonymous

          those are facebook's words, you idiot.

          7B is around 175B level
          13B is above 175B level
          30B is around the 250-500B model level.

          And you can test it on your own. untuned 7B shits all over finetuned 20B.

      • 4 weeks ago
        Anonymous

        > facebook claims the 7B model that requires 8GB VRAM
        yeah, it's great at OOM'ing with something like "tried to allocate 100-200 mb"

        • 4 weeks ago
          Anonymous

          You got that wrong. From the textgen webui git thread:

          The GPU memory usages in 8-bit mode are the following:

          LLaMA-13B: 16249MiB
          LLaMA-7B: 9225MiB

          but 4bit is around the corner, so it should cut both of those roughly in half.

          Then there is splitting as well, so you can offload a small part to your RAM.
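
          those numbers are basically just parameters times bytes per weight plus overhead. a quick back-of-the-envelope sketch (it ignores activations, cache and framework overhead, which is why the measured figures above run higher):

          ```
          def weights_gib(params_billion, bits_per_weight):
              # memory for the weights alone: params * bits / 8, in GiB
              return params_billion * 1e9 * bits_per_weight / 8 / 2**30

          for params in (7, 13, 30, 65):
              print(f"{params}B: fp16 ~{weights_gib(params, 16):.1f} GiB, "
                    f"8bit ~{weights_gib(params, 8):.1f} GiB, "
                    f"4bit ~{weights_gib(params, 4):.1f} GiB")
          ```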

          • 4 weeks ago
            Anonymous

            >Then there is splitting as well.
            If I have two 8Gig cards for /3/ work could I put both to use for textgen?

  72. 4 weeks ago
    Anonymous

    What's the artifact of a finetuning process, another entire set of weights? I'm mostly wondering if some bastards actually manage to finetune this for chat if we're all going to need to torrent more big ass models. I'm going to need to upgrade my NAS at this rate.

  73. 4 weeks ago
    Anonymous

    there's a new transformer architecture on the horizon.
    maybe after that shit will go wild.

    • 4 weeks ago
      Anonymous

      This stuff keeps getting revolutionized every week.
      At this rate you'll be able to run a gpt3 equivalent model on your phone by the end of the year.

    • 4 weeks ago
      Anonymous

      Fuck me anon, do you also print out emails and then scan them in?
      Just link the arxiv.
      https://arxiv.org/abs/1902.09113

  74. 4 weeks ago
    Anonymous

    I think chatgpt lied to me, it claimed to be based on GPT-2, not 3

    • 4 weeks ago
      Anonymous

      ChatGPT doesn't know the particulars of its implementation well at all.

    • 4 weeks ago
      Anonymous

      It's sort of right. GPT-3 has a larger dataset and more parameters, but the architecture is fundamentally the same. It's as similar to GPT-2 as GPT-J is.

  75. 4 weeks ago
    Anonymous

    Can the 16GB model run on AMD GPUs?
    They are the cheapest ones I can buy rn.

    • 4 weeks ago
      Anonymous

      >AMD for machine learning

      • 4 weeks ago
        Anonymous

        it's more than enough for diffusion
        and tortoise

      • 4 weeks ago
        Anonymous

        >nvidiot shill conveniently forgets that pytorch works on AMD and its used in basically every modern AI system

    • 4 weeks ago
      Anonymous

      Shit's cutting edge right now and it's all designed for cuda.

      If you want to fuck with ROCm be my guest. I assume if you CAN get it to work with ROCm that it'll load up if you have enough memory, but fucked if I know, I don't buy ATI cards

  76. 4 weeks ago
    Anonymous

    I'm pretty new to this. Can you theoretically buy a bunch of 3090s and share VRAM to host big models? Or can you only do that with their server grade ones that cost a shit ton.

    • 4 weeks ago
      Anonymous

      An anon on aids ran 13b on two 3090s perfectly fine

    • 4 weeks ago
      Anonymous

      In theory these were designed specifically for that....

      except you'd be better off buying 10 P80s and a server rack for the same price
