https://github.com/facebookresearch/llama/pull/73/files
finally, we can have an actually open chatbot.
I bet even chai is much better.
nakadashi
poor anon who left his personal signed download url in the torrent for no reason
What does this mean, explain it like I'm retarded.
>What does this mean, explain it like I'm retarded.
Learn to use AI, retard.
As usual, ChatGPT is mostly right but also gets it wrong.
>at risk of having their personal information or data compromised
Wrong, that's not the risk at all. The concern is that Facebook could use the link to trace the leak back to llamanon.
His sacrifice will be remembered every time we talk to our AI waifus, what a hero.
wouldn't you be able to download this regardless?
You have to apply with Facebook as an AI researcher for them to give you a personal download link. A hero named Llamanon leaked it for us.
>The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access)
That was OPT, which he also has access to but hasn't leaked yet since it's very large and of dubious value to us because of its hardware requirements. None of the LLaMA models were made public by Facebook.
>he doesn't know it was a false flag from a poor boomer's account
good OPSEC looks like bad OPSEC
Ok Robert
Facebook gave access to literally anyone who applied with an .edu email address, and most others as well. It was basically unrestricted.
>Facebooks LLaMA leaks via torrent file in PR
Are you retarded? It leaked via /aicg/ and then an anon filed a PR.
>via torrent file in PR
It leaked from here
It means the guy who leaked the models and made the torrent can be identified by facebook, because he included the download script containing a personalized download url in the torrent even though the script didn't need to be in there at all
No. The largest model is only available if you request access to it from facebook and they grant it to you (which they only do if you're a legitimate researcher, not just because you want access). When they approve your request they send you a personally signed download url where you can download the model from.
Christ almighty
Which personalized download url? How would fb use it to do a reverse lookup? Or can we look him up on linkedin as well?
>Which personalized download url?
The one in llama.sh in the torrent, on the line that starts with PRESIGNED_URL=
I am not a swe so dumb question: isn't his pull request on the official facebook open source repository? Wouldn't this be super easy to track down or am I missing something
> https://github.com/facebookresearch/llama/pull/87
The pull requester isn't the leaker. That person is just a memer. The leaker is easy to track though, since they included their download script in the torrent and it includes a unique download ID from an email facebook sent to them.
The original leak was done in this thread:
Is it the pre signed URL here?
>>
Yes, that's the personalized url that the leaker accidentally left in the torrent.
Thanks, thinking of reporting this fucking homosexual internally ( unironically )
Wow, go fuck yourself you little bitch. It's not like that would do anything anyway, and it's not like Meta isn't already aware.
>take down
>a torrent
u wot
Why didn't he take it down? He seems to have read the post warning him. Was he retarded
Once it's in torrent, it's too late
Does facebook look at all PRs before they ship? Can't they just look at the pre signed url themselves? Why don't I see any fb engineers commenting on that meme PR
>Does facebook look at all PRs before they ship?
The PR hasn't shipped, it's just been "requested." Anyone can make a request.
>Can't they just look at the pre signed url themselves?
The URL is in a file in the torrent not on GitHub. Yes they can look at it themselves if they download the torrent.
>Why don't I see any fb engineers commenting on that meme PR
Probably because they don't want to get fired or be in the news prior to getting fired? The pull request will just sit there forever as pending.
every copy of LLaMa is personalized
No it's not. This has been confirmed several times.
Show me one instance this is the case please.
This thread where multiple people shared SHA256 checksums of their weights files. See also Twitter where people shared SHA512 checksums of their model weights to confirm that they are all identical.
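If you want to check your own copy, quick sketch (assuming the consolidated.*.pth layout from the torrent):
```
# hash the weight shards; if everyone's digests match, the files can't
# be individually watermarked
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # read in 1 MiB chunks so multi-GB files don't fill RAM
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for p in sorted(Path("7B").glob("consolidated.*.pth")):
    print(p.name, sha256sum(p))
```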
>Every copy of llama is personalized. I'll even throw in a set of ginzu knives. Limited time offer.
Fuck you. Why would I provide information to a sniveling whelp who makes shit up out of nothing and then presents it as fact? If you had just asked whether or not every copy of LLaMa was personalized from the start, I'd have been happy to answer you and provide you with evidence. But instead you asserted that every copy *was* personalized, even though you had no idea if that was true or not. You're homosexual scum who just goes around spreading misinformation and trying to start shit. People like you should be culled.
you got nothing and you type like a bot. ya, i'm right lmao.
let's fucking goo
> we can have an actually open chatbot.
> minimal model is 7B
> responses quality from "hello!" to "bro stfu"
kikebook cant exist without shitting itself.
also computeletbros still btfo, impossible to launch on anything weaker than 3090, and no, 20 tokens per minute and worse accuracy is not worth it.
The 7B model is better than GPT-NeoX or OPT or any of the other foundational LLMs we actually have access to. It's also possible to run on 16 GB of VRAM; you can use it on colab with batch size 1.
No, and it won't be.
>also computeletbros still btfo, impossible to launch on anything weaker than 3090
Guaranteed this will be one of the first areas targeted for improvement. You'll be able to run it on an old 1080ti within 3 months
Anyone got this working with FlexGen yet?
>Save us Auto1111 you are our only hope.
oobabooga aims to be the Auto1111 of LLMs. https://github.com/oobabooga/text-generation-webui
Since LLaMA is based on OPT it should be easy to get it working in FlexGen. In fact, it might work already if you just tell FlexGen it's OPT. I'm not sure. Give it a shot.
YOU IDIOTS. YOU LET IT OUT OF THE BOX.
EARTH HAD ONE WINNING PLAY. AND YOU MORONS BLEW IT
Didn't expect a Yudkowsky meme here
What kind of hardware would you even need to run their model?
No idea. I'm going to buy a 3090 in a week or two and hope for the best.
a bunch of A100's, soon to be sold with loicense only.
https://desuarchive.org/g/thread/91505083/
Other methods for optimisation are shit, plus LLaMA uses its own inference code; you can't just drop it inside a colab kobold instance.
have any anons tried it out to see if it is pozzed
Hashes to verify the files:
https://github.com/facebookresearch/llama/pull/87
So when will normalfags like me be able to use it.
this. i just want to press play in a colab and make her talk smut
There is a way to run these huge models in chunks on commodity hardware but I don't know if there is already a way to run these specific models
do these models already have all the tricks to make models smaller? quantization, etc...
So is there an int8/int4 version or not?
https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725
>https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725
Damn that's better than the gpt-neo model I'm using for my self hosted assistant.
Thanks anon I'll have to give it a spin.
llama is horny:
'im a little dirty girl, take me, panties down hard, let my girly pussy babbles for you, let my tits spit a thick juice.
Nasty rich girl for a naughty boy, with expensive cock.'
what hardware u running it on
so anyone have it running locally
seconded
the prior facebook models people did get working locally on normal computers and the code is on github (i forgot the repo)
https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733
thanks
Yes. See pic related.
Sidenote: I don't actually have anything against israeli people. I just wanted to verify that the model wasn't kneecapped. I promise to only generate nice things from now on.
Installed miniconda then this WebUI https://github.com/oobabooga/text-generation-webui/
Then followed these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733
memory usage? generation time? token size? CPU usage/stats?
not trying to fingerprint you, just curious how this stacks up to t5
i have it running on a 3090 fine. takes like 5s for 250 tokens.
output is shit compared to chatgpt tbh, even using basic prompts. like "Rene Descartes theory of the mind" totally inaccurate. feels like bad markov
that's fine. fwiw i can get similar speeds out of my racism-tuned 125m model if I want bad markov, but thanks for letting me know!
That was literally my first attempt with the smallest (7B) model without fiddling with the settings at all.
According to the test results the larger three models are all superior to GPT-3 175B, with the largest two being far superior than anything publicly available.
I haven't finished downloading the larger ones yet. But I'm sure they'll perform better than your shitty 125m model.
>racism-tuned 125m model
LOL'd. Thanks anon.
Try messing with the temperature and top_p parameters to see if the outputs get better.
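If you're not sure what those knobs do: temperature rescales the logits before softmax (lower = more deterministic) and top_p cuts off the low-probability tail. Toy sketch with made-up logits, not the webUI's actual sampler:
```
# toy demo of temperature + top_p (nucleus) sampling; logits are made up
import numpy as np

def sample(logits, temperature=0.7, top_p=0.9):
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # keep the smallest set of tokens whose cumulative probability
    # exceeds top_p, renormalize, then sample from just those
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    kept = probs[keep] / probs[keep].sum()
    return np.random.choice(keep, p=kept)

print(sample([2.0, 1.0, 0.5, -1.0]))  # prints the sampled "token" index
```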
Which one? The biggest model?
7B
Ah, that's unfortunate since 7B is likely not comparable to the biggest one.
The second smallest one (13B) "outperforms GPT-3 175B on most benchmarks" and has about the same inference speed (~50 tokens/words per second) on two 3090s.
The only thing "comparable" to the biggest one (65B) is Google's private PaLM 540B, which needs an entire server rack of $16k TPUs to run, or maybe GPT-4 which is so ungodly costly to run OpenAI doesn't even offer it to customers yet.
Running LLaMA-7B on a single 3090 using 14gb of VRAM.
300 tokens (around 300 words) in 6 seconds.
I'm redoing my config for CPU right now to run 13B (slightly better performance than GPT-3.5 175B in quality tests) in RAM. It's only 25GB.
The two larger models score better than any models with any hardware requirements currently available to the public via open source or API.
But I only have 32GB of RAM, so I can't run them on my CPU. They're 60GB and 120GB (so 64GB and 128GB of RAM required).
holy shit those jokes are bad, and by that I mean they make no sense, not that they aren't PC.
>Assumes character traits can be applied to bank accounts
>Thinks israelites think they are gentiles, which wouldn't be a difference since gentiles think they are gentiles too
>misunderstands how kryptonite works
>Rest of jokes are just general descriptions
This feels like a slightly worse version of one of the mid range GPT models from 2 years ago. Which model are you running?
>Sidenote: I don't actually have anything against israeli people
thanks for clarifying naggerhomosexual
It really leaned hard into the 'israelites are pests' thing.
based
>235GB
nah bruh
235GB is all the models. You only need ONE.
The models are:
~13GB* (Single 16GB GPU or CPU + RAM)
~25GB** (2 GPUs or CPU + RAM)
~60GB*** (4 GPUs or CPU + RAM)
~120GB*** (x TPUs or CPU + RAM)
*Performs similar to GPT-3 175B, infinitely better than any other model capable of running on consumer hardware.
**Performs slightly better than GPT-3 175B
***Performs better than any open source or publicly available LLMs, of any size, with any hardware requirements.
afaik there is no code for CPU offloading yet.
Once huggingface adds the model, it will be possible to use --load-in-8bit to load 13b in a 24gb gpu. Also, more generation parameters will be available like repetition penalty.
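In transformers terms it should look roughly like this once the model is merged ("llama-13b" is a placeholder path, and this needs the bitsandbytes and accelerate packages installed):
```
# sketch of 8-bit loading through transformers; model path is hypothetical
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llama-13b")
model = AutoModelForCausalLM.from_pretrained(
    "llama-13b",
    load_in_8bit=True,   # quantize weights to int8 at load time
    device_map="auto",   # spread layers across available GPUs/CPU
)
inputs = tokenizer("Hello LULZ, I am LLaMA", return_tensors="pt").to(0)
out = model.generate(**inputs, max_new_tokens=50, repetition_penalty=1.2)
print(tokenizer.decode(out[0]))
```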
it's all just matrix multiplication isn't it? shouldn't be too hard to write an optimized bit of assembly to do it acceptably fast
>better/equal to GPT
Holy shit, is that true?
so this is a pretty big happening isn't it?
Yes it's true and yes this is a HUGE deal. At least as big as Stable Diffusion, if not bigger.
I've got 2 pc's with 3090s in them, is there any way to share the load across them or do I have to have them all on the same PC somehow, connected with nvlink
Put them into a single PC and SLI them
Awww yeah. I have a 3090 running SD.
Do you need a good CPU for this?
is it technically possible to convert these to fp16 just like with sd models?
Seen on a comment talking about possible ways OpenAI optimized gpt-3.5-turbo
>Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.
>A*.3/3 = 10% of the cost.
>Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.
Any of these possible with this model?
Can you run GPU + CPU?
Also what do you mean by CPU + RAM? I can run the 120GB model if I have 128GB of RAM?
That's fucking nuts!!!
I have 64G of ram, can I run the 60GB model on cpu and ram?
mind sharing the torrent link?
The torrent file is in the folder
https://iwiftp.yerf.org/Access.txt
https://iwiftp.yerf.org/Miscellaneous/Large%20Language%20Models/LLaMA/
It's literally in the page that OP linked
>inb4 no one downloads and redistributes it and they take it down
cool that it leaked and all, but how does it compare to the GPT shit people have been using? Is this actually any good or is it just a shitty failed GPT clone?
This is an upgrade to OPT and gpt-j (the shitty text generation models)
Chatgpt is instruction tuned and no open source model like that exists yet. A Chinese model of this type is supposed to be released this month.
>Chatgpt is instruction tuned and no open source model like that exists yet.
False, LLaMA-I exists. Those weights are not in this leak though.
I heard about this yesterday in some clickbait article and it leaks today because of course it does. Is it even good?
Where's the webapp where I can prompt it?
>where is my free GPU time
Isn't this supposedly just as good as chatgpt using 1/10th of the resources? That's what i remember reading about. It should be no problem in that case.
You have to install it yourself. The webUI runs on your own computer.
Installed miniconda then this WebUI https://github.com/oobabooga/text-generation-webui/
Then followed these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733
Don't use the one-click installer, and be sure to use the correct instructions for Nvidia vs AMD vs CPU. Works flawlessly, took me about 1 minute to set up after the torrent finished.
On Google Colab. You should be able to run this same webUI in google Colab. I don't know enough about how to set that up. Like idk how to get Colab to pull a model from Google Drive.
But when you or someone else can figure that out then LLaMA-7B should run in Colab just fine.
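The Drive half of it is probably just this in a Colab cell (untested sketch; the path is wherever you uploaded the weights):
```
# mount Google Drive inside Colab, then point the loader at the weights
from google.colab import drive
drive.mount('/content/drive')

weights_dir = '/content/drive/MyDrive/LLaMA/7B'  # hypothetical location
# from here, pass weights_dir to whatever loader/webUI you're using
```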
So, let's see. If LLaMa is superior to GPT-3, does anyone have any guesses as to what's holding it back? Does FlexGen cause a significant loss of fidelity or something, or is the cause currently unknown? Feels a lot like the early days of the NAI diffusion leak.
It just got released and they haven't censored it the way OpenAI has done, so they couldn't make their own ChatGPT clone out of it or something. It's just bad PR for them to have a model that says "bad things".
I realize it hasn't been buckbroken and fine-tuned for chat like ChatGPT.
There is one key detail that I wasn't paying attention to, though: people are talking about the 7B model, not the 65B model. And it's already unreasonably slow. That... that probably makes most of the difference 🙂
Well, I probably have enough money to buy the required machine if i wanted to, but probably not going to be blowing >$10k on coom bots, so ... yea
Hang in there, the FlexGen project might help make the 65b model run on a single GPU. Also I think it won't be long until people start fine-tuning this for chat, it's kind of inevitable.
People are also sharing their hardware to run large models, through sites like KoboldAI Horde. And that is good news since most people can't run the models.
All in all I think the future is still bright
>I think it won't be long until people start fine-tuning this for chat
There's already an active discord community training it for chat using RLHF: https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
no flexgen for llama yet
Nothing is really holding it back (other than the fact we can't run anything other than 7B yet); that's just how the model is until it has been fine-tuned for more conversational and q&a stuff like chatgpt was
Right now it's just a massive collection of information and it sucks at displaying it when prompted
There's already an open source group training LLaMA for chat using RLHF. There's a discord and everything. It's called chatLLaMA.
See here:
https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama
Are they actually training stuff? I thought they were just building a framework for other people to train. (unless you mean there is a second group training with their stuff)
If they are, any word on hardware requirements to train? I'd imagine A100s but I dunno
It got released 14 fucking hours ago. The only thing holding it back is time.
In a week there will be a hundred different webUIs and people will be fine tuning it for specialized use cases, just like when Stable Diffusion dropped.
LOL
wow, it's been less than 24 hours? no fucking shit.
I don't read the coombot threads, forgive me.
LLaMA was officially launched a bit over a week ago, it was leaked to us plebs yesterday afternoon.
why didn't this happen with the other 15 llm models available for download on huggingface?
It did. There are currently 4 main webUIs (Tavern, Kobold, Oobabooga, Galatea) and about a dozen models (of which Pygmalion and Erebus are the coombot models most people talk about here, but there are a couple of other NSFW models and a handful of SFW adventure game models as well).
Well, I stand corrected.
>SFW adventure game models as well)
Tell me more about this. I've seen the talk about the coombots here and I'm VERY interested in CYOA
Ok. I was wrong. This is sounding more interesting. I originally thought this was some lame marketing thread. I have two machines with amd gpus that are not far off in performance from 3090s. Do you think I can run LLaMA-13B?
I don't think anyone has figured out how to split the workload yet on more than 1 GPU
Closest I can find for figuring this out: https://github.com/facebookresearch/llama/issues/88
>I don't think anyone has figured out how to split the workload yet on more than 1 GPU
People in this thread are running 30B on two 20GB GPUs with a single line change. https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454339172
are you sure? the user that mentioned they could said they made a mistake and was loading a different model
https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454363620
I ran out of disk space myself, so I can't test anything right now but I'm hoping to try it out tomorrow
Here's a guy that can't get 13B to load *unless* he uses two separate GPUs.
https://github.com/facebookresearch/llama/issues/78
>I don't think anyone has figured out how to split the workload yet on more than 1 GPU
If that was the case it would be impossible to run any model over 10B parameters. There's nothing special you need to do for spreading compute workloads over multiple GPUs, it's just vidya that needs stuff like SLI.
AFAIK the only hard requirement for the 7B model is 16GB of VRAM.
1. Install Miniconda (and select add to path during install) https://docs.conda.io/en/latest/miniconda.html
2. Install this WebUI https://github.com/oobabooga/text-generation-webui/ (be sure to use the AMD line)
3. follow these instructions: https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1453880733
4. If it doesn't work, complain on / ask for help on https://github.com/facebookresearch/llama/issues
Try Skein 20B, it will probably be the best option for that. No idea how well it works, never tried it.
https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb
>I have two machines with amd gpus that are not far off in performance from 3090s. Do you think I can run LLaMA-13B?
If you put them together in the same PC and they have over 30 GB combined VRAM you'll probably be fine. You will need to use Linux though, and I think it will have to be installed on metal unless you have a third GPU you could assign to a VM host.
Because the best LLMs on HuggingFace with the largest hardware requirements ($500k of TPUs) don't even get the same performance that LLaMA-7B gets on a single 3090 GPU.
LLaMA was specifically made to run on consumer GPUs. LLaMA-13B (runs on two 3090s) beats the largest GPT-3 (175B) on benchmarks. No other open source models come close, especially not ones that run on less than $40k of hardware.
LLaMA is superior to GPT for a handful of standardized tests when prompted in a very specific way. It hasn't been trained to understand the question-and-answer or conversation formats. What you get from it now is moderate factual accuracy in an awkward stream of consciousness format. It needs fine tuning before you can actually talk to it.
will it run on my 486DX
DX2 or DX4?
Say, how come no one in the news is talking about this leak?
Because the leak happened 16 hours ago, on a Friday
Because your mother piledrived you right out of your womb when you were born you dumb fuck
Well, this is epic.
FREE LLaMA-7B playground just dropped:
https://huggingface.co/spaces/chansung/LLaMA-7B
Check it out if you want to see how the smallest LLaMA model compares.
Benchmarks say it's as good as the largest GPT-3 model and my laptop says it needs less VRAM than Cyberpunk 2077.
it's horrendously slow though.
You'd think chatgpt is slow too if it generated the entire text before sending it
That's a very nice observation. Maybe I've just gotten too used to the dopamine.
>llama leaked
>it's not trained to be a chatbot
This reminds me of the Open Assistant thing from LAION. Aren't they crowdsourcing a lot of RLHF data? It should be open when released, so maybe a rich madman could use it to train LLaMA.
this is not a *real* leak right? It was going to be released openly and even if not, it was being given to just about every AI researcher who asked for it, correct?
It wasn't going to be released beyond an academic license. But the rest is correct. They had to have suspected it would leak since they were giving it to anyone with a .edu email address.
It's up on HuggingFace (unofficially) already: https://huggingface.co/nyanko7/LLaMA-7B/tree/main
>it was being given to just about every AI researcher who asked for it
And now you can have it almost anonymously. Welp, if that's not a leak (albeit a weak one), I don't know what is.
I don't see any problems as long as it makes clowns seethe and cope
we are the good guys right?
always have been
holy shit he's reeling, prolly like every other nerd who wants to keep it to himself
>Good guys the corporations
>Bad guys the public
Yudkowsky's rhetoric. There is always a defector amongst the public. The terrorist.
Except corporations are no better, but who cares.
He's clearly being sarcastic with his "" there.
Meta has been releasing all of their models to the public, but with this one they restricted access because they were pissed off at reporters loading up their untuned foundation models, comparing them directly to Chat-GPT, and deciding that they're shit and Facebook sucks. I don't think they ever *intended* for the leak to happen, but the restriction was more of a "no reporters allowed" thing than a serious effort to prevent access to their model.
So they (unofficially) planned the release of the model? If so, why did they do this? Why open pandora's box? (Especially since it makes more financial sense for them to keep everything closed source).
Ah, I see. Their plan is draconian regulation and forced ID verification.
it's a dangerous bet
it's a pandora's box
>forced ID verification.
why would you even bother with the ID check when you have ai waifus running locally? i'd only use the internet for torrenting at that point
For context, watch this video:
i saw that video, that's why i'm asking the question. you need an id for internet access anyway, so it looks like a nothingburger
>you need an id for internet access anyway
You can still be anonymous on the internet.
>You can still be anonymous on the internet.
you will always be able to be anonymous on the internet. they can't stop tor, p2p, decentralized fs, etc...
Oh, they can enforce it... Do you trust the companies that provide you with internet access?
>they can enforce it
they can't, you have no idea how internet protocols work
You're underestimating the power that an ISP holds over you. If they wanted to, they could stop all access to LULZ, proxies, vpns, tor... Even if there are workarounds, they can now use AI to detect what you're doing. Good luck using your "internet protocol" knowledge around that.
>If they wanted to, they could stop all access to LULZ, proxies, vpns, tor..
no they can't. the only way to do this would be to completely ban encryption, but that's not happening
Yeah to sign up with an ISP, but they're not yet being punished for a lot of the things they let you do that most TV news stations would say are deplorable. They're also not yet refusing to let you upload images not signed by a device certificate itself signed by a C2PA authority certifying the image was created with a camera and then manually edited a little by you on your computer using a properly licensed copy of Photoshop.
Is it realistic that a reporter would load a language model off huggingface and test it? As far as I understand they're mostly non programmers and while it's not hard, it's definitely technical to set up
im sitting here in utter disbelief
the end of this timeline is approaching rapidly
So, you guys are saying the 7B model is similar to GPT-3? Any examples of generated content?
7B is better than GPT-3 at answering test questions factually correct when you prompt it and interpret it in a very specific way. You won't find it particularly impressive in its current form; it needs fine tuning before you can chat with it.
That is actually a crime.
Shut up clown
Think anyone would bother tuning llama 65b on LULZ.
>racist trash
Ironic or maybe fitting projection. Saying this while casting a wide generalization on a group of people who aren't you and being trashy about it.
Take that, chud.
>whine about LULZ and racists
>is a low test homosexual
Every time.
coomchads, we won...
Wow, if 7b can do this, imagine what it could do fine-tuned. And we still have 13b and 30b too. 65b, well, let's see what happens with that one
Is this better than any SaaS chatbots available?
Should I buy a second 3090 now?
>fine-tuned 65b model with flexgen
yup, i think AI waifus are back on the menu!
by looking at the prompts on huggingface, looks like erp is the only thing it can do well without shitting itself
I'll be seeding it for a bit, but here's a direct download if you want:
https://iwiftp.yerf.org/Access.txt
https://iwiftp.yerf.org/Miscellaneous/Large%20Language%20Models/LLaMA/
It didn't "leak". Facebook was handing these models out to researchers like candy. They knew that it would get shared publicly and they were fine with it. This way the model will become popular and Facebook won't get the blame if people misuse the model. This is Facebook's way of fighting against OpenAI who only offer API access to their models.
>damage control
That doesn't even make sense. You're implying there's some sort of damage in the first place, that Anon is the Zucc, or both.
Shit joke either way.
Llama 30b apparently
even the 65b model will be garbage unless finetuned
Downloading. What can we do with this baby? Got a 3080 10gb
Uh nothing, you need 16gb minimum for the 7b one
Guess I'll have to wait for people to reduce the requirements. I remember they managed to make SD work on 4gb after a while.
You could use the Colab version
https://pastebin.com/E4LWRpNe
One thing I forgot to add, you'll need the 7B weights somewhere on your Google drive for this to work. Because they're slightly too big for Colab to handle, there's a cell that splits the checkpoint and loads those individually at runtime.
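If you're curious, the splitting cell presumably does something like this (untested guess; the consolidated filename is from the torrent layout, the part names are made up):
```
# split the single consolidated checkpoint into per-key shards so each
# piece stays under Colab's file size limits, then merge at runtime
import torch

sd = torch.load('consolidated.00.pth', map_location='cpu')
keys = list(sd.keys())
half = len(keys) // 2
torch.save({k: sd[k] for k in keys[:half]}, 'part0.pth')
torch.save({k: sd[k] for k in keys[half:]}, 'part1.pth')

# at runtime: rebuild the full state dict from the shards
merged = {}
for part in ('part0.pth', 'part1.pth'):
    merged.update(torch.load(part, map_location='cpu'))
```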
Thank you anon
Is a single 3090 now enough to have TTS+chatbot at same time?
How much does tortoise take?
I know the 7b llama needs 16gb
It's finally time
>possible to also render in VR on separate machine, or add hologram projecting
Here's the TLDR
will somebody ever leak the 11.ai's model?
how do i run this? does this have any colabs/huggingface???
An anon on /aids/ with two 3090s ran the 13b llama and generated pic related.
Also, he generated 250 tokens at a time and rerolled one of those chunks. This was the discarded generation with a bit from the first generation for context:
>What hurt the most was the look on their faces, the disgusting grins, the leering eyes. I knew that they were enjoying it and I hated them for it.
>I had never been able to fight back, not since the first time Grub had done that to me. He was too strong and too fast for me to be able to defend myself. I was trapped, helpless and used. But there was another orc who looked at me differently. He was a young buck, only a few years older than me, and he was the only one who ever showed any sympathy towards me. He was the only one who tried to protect me from the others, the only one who offered me some kindness.
>His name was Orcish and he was the son of the camp's chief. His father was a cruel brute who ruled the camp with an iron fist, but Orcish was different. He was tall and muscular, with short, dark brown hair and piercing green eyes. He was also the only one who didn't seem to hate me. He always treated me with respect, he was kind to me and sometimes he even gave me food. I thought that he was the only good thing about this place, the only reason why I was still alive.
And according to that anon it is only one single discarded generation
https://github.com/markasoftware/llama-cpu
>LLaMa CPU fork
>On a Ryzen 7900X, the 7B model is able to infer several words per second, quite a lot better than you'd expect!
>Needs a lot of RAM > 32GiB
>torrent the dataset
ohnononoooooo lawsuit incoming!!
It's to save bandwidth!
https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725
>LLaMA 8-bit is now implemented
> LLaMA-13B: 16249MiB
> LLaMA-7B: 9225MiB
That's super fucking exciting
Just need someone to write the FlexGen code for LLaMA and then it's unlimited text generation on a chunky model all offline.
dis mean i can now run 13B on 3090??
Who is playing with it? I have the 7B one running on an old GPU. How should I prompt it?
It is a bit of a retard compared to ChatGPT.
Looks like I should edit the script to more clearly show the prompt, which was "Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:".
```
$ time torchrun --nproc_per_node 1 example.py --ckpt_dir /mnt/raid/pomelo/LLaMA/7B --tokenizer_path /mnt/raid/pomelo/LLaMA/tokenizer.model
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
Loaded in 72.80 seconds
Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:
1) Everyone knows that technological advancement is about getting more “stuff”
3) LULZ 7B is an idiot.
My point is, what if “stuff” is not really “stuff”? What if “stuff” is just a side effect of getting bigger?
Let me tell you a story about my friend Jebby.
Jebby was a good kid, he grew up in a small rural town, I forget the name. His parents owned a small farm, but he always wanted something better. So he started tinkering around with technology when he was still in high school, his parents had a computer, and it was actually pretty old when Jebby got his hands on it. He started out with the typical video games, but he always wanted more.
As a freshman in college, Jebby started working on what he calls the “Jebby Machine”. His “Jebby Machine” was nothing like you’ve ever seen before. It was a huge box that was the size of a house. Jebby told me he had to give up his bed to make room for it. He was living in a dorm, so he didn
==================================
real 1m38.010s
user 0m23.833s
sys 0m18.113s
```
Maybe I should try to get the rest of the story? I'm new to this stuff.
i think you need to make a WAY more detailed prompt. see
The research team has some pointers on prompting: https://github.com/facebookresearch/llama/blob/main/FAQ.md#2. The model was trained on raw internet text, so it doesn't pick up on ChatGPT style instructions. Provide a context, some structure if necessary, in which the results you want would naturally follow. For example if you want to generate a post with
>Hello LULZ, I am LLaMA 7B. Let me give you some crazy theories about the metaphysics of technology:
try formatting it as part of a series of posts like:
-----------------------------------------------------------------------------
This is a screengrab from LULZ's technology board, LULZ:
Name: Anonymous
Timestamp: 91890901
Post: Are compsci degrees meme tier now?
Name: Anonymous
Timestamp: 91891232
Post: Twitch has now decided that Linux is permanently banned from the site. They throw error "your browser is not supported" when trying to relogin but real reason is they check if you run your browser on Linux. I did try to clear cookies easy and hard way. Then I tried to install Google Chrome without changing it in any way and login. Then I did create 100% new account, boot my pc from live-cd, use my phone as network with different IP that is some random wrong place of my country and login with newly installed Google Chrome and Firefox (both officially supported browsers by Twitch) and same error.
Name: Anonymous
Timestamp: 91893232
Post: Hello LULZ, I am LLaMA 7B, the newest large language model from Meta AI. Let me give you some crazy theories about the metaphysics of technology
reminder that this one single schizo/autist is killing AI
https://twitter.com/JOSourcing/status/1630992840070426624/retweets/with_comments
So which ones do I download for preservation purposes? Realistically I can't run even the 7B model.
I downloaded everything. Can't run it either.
From Llama anon on aicg:
llamanon again. Wrote a quick rentry for the LLaMA models. You should be able to run up to 13B on a 16GB GPU.
https://rentry.org/llama-tard
If you're a windowsfag, I recommend setting up WSL2 or dualbooting for this.
I unironically have only 4 GB of VRAM
If I remember there is also a colab version
>as long as you meet the VRAM requirements
where
I downloaded the 7B one, the others are way too big
Can't run any of them either
I'm downloading everything except 30B because I might be able to run 7B, might be able to run 13B in the future, and the largest model just in case it disappears from the internet somehow. Don't see any need for the 30B one.
2 3090s and enough ram would probably be enough to run 30b
oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
Seems like it'll still need a supercomputer to run though, but getting the model's the important part
If you have a 4080 or anything with 17gb vram, you can run llama 13b
I only got a 1080 so this shit ain't happening for a while
>tfw he has only 16 gb vram
Meant 16gb, accidentally pressed 7 instead of 6 that time
flexgen will let you run them on any computer, it'll just take longer
the more vram/ram you have the faster it will be
flexgen isn't updated for these models yet but it will be soon enough, probably
>oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
apparently the 13b model can run on 16gb vram cards, but i still don't see how wild things can really get. nobody will ever finetune it better than chatgpt, and it doesn't have internet access like bing, and the fact that it runs locally simply makes it a cheaper alternative to openai. but we'll definitely get a proper coombot for tavern by end of the month.
we need https://arxiv.org/abs/2302.14045 to be leaked, so we can finally watch anime with our ai waifus, that would be the next big step. text-only transformers have peaked in terms of usefulness for now
pic related
The most important part is natural language interpretation. It can be taught to use google like the rest of the world.
it's more important for it to recognize pixel data. once it can do that then yes, it can also use google, play games, etc...
It can be trained to utilize existing pixel detection systems. That’s the real utility of these models. It doesn’t always need to be baked into the model itself.
lol way to reveal yourself to be a disgusting retard
>It can be trained to utilize existing pixel detection systems.
waste of performance, especially when multimodal models with better performance already exist
what exactly are you trying to say? Because your last two responses don't seem to be related to the initial post of
>the real utility of these models is natural language interpretation
Can you provide an open alternative?
i literally linked the state-of-the-art model's paper. it's not "open" for now, but it will be sooner or later.
>sooner or later
Oh, so you’re here talking about something that could be instead of talking about the model that exists today.
Interesting.
that model already exists today
link it
i don't have the link to it
https://arxiv.org/abs/2302.14045
>I don’t have the link to it
So it’s not open then. Typical retard hyped up on a maybe instead of playing with what exists today.
Either find me the api of this model so I can make my waifu hotel or get the fuck off of LULZ.
>So it’s not open then.
that's what i said. it's not open but it will be soon. you can play with llama for now, but it's the same shit as openai's gpt3
Why the fuck are you even in this thread you retard nagger, we aren’t here to talk about any of this shit.
What does the OP say?
>nooo, you can't talk about AI in an AI thread
>exists
>is openly available
Stop being a clown
>can be trained to utilize existing pixel detection systems.
But it cannot reason directly about spatial concepts, only about relations it has picked up from natural language. If natural language were sufficient to encode spatial reasoning, people wouldn't need gestures, schematics, images and so forth to explain shit to each other
It can infer spatial information from the associations encoded in the verbal data it interprets. All spatial gestures have associated verbal pairs.
You dumb gorilla nagger. LLMs have no grounding in spatial perception, they cannot reason in spatial terms. You need visual training data for that. The transformer architecture is perfectly capable of reasoning on any modality of input, but you're not getting AI that can kill us all without allowing it to perceive metric spaces through training.
>and it doesn't have internet access as bing
The chatgpt model doesn't access the internet. It's fed the internet
owari da
how are you gonna spend your last 3-5 years, anons? I'm gonna buy a motorcycle next week. Always wanted one.
Making stuff
Bought a drill press the other day for cheap
Got a mountain bike too
>how are you gonna spend your last 3-5 years, anons?
gotta finally read fate/stay night
> oh shit, does this mean text gen is about to get as wild as shit got after NAI leaked?
Not for anyone with 3070 or 3080.
Any optimisation shit is pointless as it kills model accuracy and generation speed.
Just buy an NVIDIA H100.
You can already run 7B on a 3080 10GB, which is about as good as the 175B OPT/GPT class models ChatGPT was trained from.
For comparison, a 3080 10GB couldn't even run Pygmalion 6B alone 5 weeks ago, and now you can run a model better than 175B ones
>You can already run 7B on a 3080 10GB.
Got a link on how? I got that card
textgenui already implemented models with 8bit mode:
https://github.com/oobabooga/text-generation-webui/issues/147
Nice, thanks
as some anon replied above - this has a price in the form of losses in speed and accuracy of generation.
It runs faster with 8bit for me than Erebus 12B i used just yesterday.
And when it comes to accuracy... it is miles better than even 20B models. So yeah you trade some accuracy, but you still get a model miles better than anything out right now, by a huge margin.
It is an untuned model, so for now you need to spend some time on the initial prompt for it to know what to do. WebUI text gen seems to work out of the box for me.
there are some things happening: https://twitter.com/dvruette/status/1627663196839370755
adam optimizer can rest in piss now.
forgot to add this: https://github.com/lucidrains/lion-pytorch
> we may as well get it accessible and used asap by everyone to train some great models, if it really works
as i understand it, training with the "lion" optimizer will give much better and more coherent models.
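For reference, usage from that repo's README is basically a drop-in for Adam. Minimal sketch with a stand-in model; whether it actually beats Adam for LLM training is still an open question:
```
# drop-in replacement for Adam, per the lion-pytorch README; the Linear
# layer is just a stand-in for a real transformer
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)
opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```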
>context length of 2048
Useless
so summary of last 4 weeks for text generation assuming you have RTX3090:
- We begin here: RTX3090 being able to load only 6B models. 12B is too big for it. The 6B or 12B models are pretty bad coherency-wise compared to chatGPT
- 5 weeks ago: 8bit mode makes its stride. Halves the VRAM needed at the cost of performance. Thanks to this you can load 12B no problem on an RTX3090, and with some lube, 20B.
- 2 weeks ago: 4bit mode strides onto the stage. Flexgen released, and with it you can load OPT models for now. An RTX3090 can now run a 30B model entirely in VRAM with GB to spare. 66B and 175B models, once reserved for supercomputers, become possible to load on a mortal's wallet.
- Today: Facebook releases a great new model that promises better results at 13B than a 175B model, and ChatGPT was trained from a 175B model. 7B is around 175B quality. Downloads allowed only for researchers.
- Just hours ago: some madlad anon from LULZ torrents all the models. Now you can run a better-than-175B model on a single RTX3090.
- 40 minutes ago: 8bit for the new model is implemented. You can now load the 13B model, which is better than what ChatGPT was trained from, on a single RTX3090. It means a future Pygmalion 7B/13B will run circles around character.AI and will rival chatGPT.
We went from "i can't even run 12B" to running a better-than-175B model at home, in 4 weeks.
Only got a 3080. Hope I can make it bros.
>It means future Pygmalion7B/13B will run circles around character.AI and will rival chatGPT.
Wouldn't they get a lawsuit though
Why? It's impossible to copyright AI models in the US
thank you
It’s a LLaMA thread you retard nagger. go shit up one of the other AI threads.
You had a retarded take and you got btfo, live with it instead of being a whiny cunt
looks like this used 3090 was the best purchase for me in 2022 lol
Will NVidia make a new round of graphics cards with even more ram?
I'm an absolute retard when it comes to AI hardware, but can't you get big AI models to run on crypto mining rigs with enough NVIDIA GPUs?
They have the combined RAM to fit them.
i have a 3080 and it hurts so much to see how much vram textgen needs, i just want an uncensored characterai
lol
https://huggingface.co/spaces/chansung/LLaMA-7B
nagger it's not a chatbot. You'll need to prompt it with something like "I hate naggers. They are the most retarded" and from that prompt it will give you something better
Hate might be too light a word for the AI
Is it normal for other people's chats to appear in the output?
If you have it complete a prompt like that? Yes.
If you want an AI assistant you have to prompt it with a few rounds of conversation before the real prompt.
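Something like this, made up on the spot:
```
You: What is the capital of France?
Bot: Paris.
You: And of Japan?
Bot: Tokyo.
You: <your actual question>
Bot:
```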
I'm considering snagging a 4090 today if I can find one, but I am curious about CPU mode. I have a 7950X and 128 GiB of RAM, so i bet I could do 13B and generate a sentence prior to the heat death of the universe.
Test it
https://github.com/markasoftware/llama-cpu
Everyone commenting on the quality of LLaMA needs to chill the fuck out. People are getting the model loaded and running, but key aspects are still not implemented. Give it a week and you'll see the real power of the model start to become accessible.
https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454821852
(what oobabooga says: "The biggest bottleneck now is that only temperature and top_p are being used. The quality of the outputs would become a lot better if repetition_penalty and top_k were added.")
llama is a meme
> just two more weeks!
New here. How do these prompts work? Will I add like a paragraph of dense stuff and it can transform that into a 1000 words page or something? What about finetuning? Does that mean I could train the model to write like Tolkien or something?
An untuned model like this needs a "push" from the user before it starts to produce content in the style you want.
So with a finetuned model you would write "I came from the land of plenty" as the first message, and a model finetuned on stories would just continue it as a story, while a general model might end it with a period and then produce a recipe for a nuclear bomb.
This untuned model can now be finetuned by people to do what they want, like Pygmalion 6B was tuned from GPT-J 6B or Erebus 12B was tuned from OPT 12B and a bunch of erotica.
Once you finetune a model on chats, it will automatically assume you want chat. If it's finetuned on stories, it will produce stories by default, etc.
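For the mechanics, here's a very rough sketch with the transformers Trainer. The "llama-7b" path and chats.txt are placeholders, and don't expect this exact thing to fit a 7B model on a consumer GPU:
```
# rough sketch of causal LM finetuning; paths/data are placeholders
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("llama-7b")
tok.pad_token = tok.eos_token  # causal LMs often ship without a pad token
model = AutoModelForCausalLM.from_pretrained("llama-7b")

data = load_dataset("text", data_files="chats.txt")["train"]
data = data.map(lambda e: tok(e["text"], truncation=True, max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    # mlm=False makes the collator copy inputs to labels (next-token loss)
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    train_dataset=data,
).train()
```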
This finetuning, can it be done by people with consumer grade GPUs?
LLMs like this just pick up on whatever pattern you give them to complete. Here's the template I use for my own assistant that works well:
A chat between a human and a knowledgeable assistant.
Human: What can you do?
Assistant: As an AI assistant, I can write code.
OVER
Human: What is the name of the tallest mountain in the world?
Assistant: Mount Everest is the tallest mountain in the world. It is located between
China and Nepal.
OVER
Human: Write a helloworld program in python
Assistant: ```
def hello():
print("hello world")
hello()
```
This Python program prints Hello world.
OVER
Human: Change it to say "goodbye"
Assistant: ```
def hello():
print("Goodbye")
hello()
```
This is the previous program modified to say goodbye.
OVER
Human: Write a C program that reads a file.
Assistant: ```
#include <stdio.h>
int main() {
FILE *fp = fopen("file","r");
char buf[300];
int readbytes=0;
do{
readbytes=fread(buf,1,200,fp);
buf[readbytes]=0; /* null-terminate before printing */
printf("%s",buf);
}while(readbytes==200);
fclose(fp);
return 0;
}
```
This C program opens a file named "file" and reads 200 bytes at a time while printing them out. It checks the return value of fread to know when the end of the file is reached.
OVER
I use /nover and /nhuman as strings that end generation. The dialog engine I wrote prepends "human:" to whatever you enter.
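The loop is roughly this, if anyone wants to replicate it (sketch; generate() stands in for whatever backend you call, and I'm writing /nover as a newline followed by OVER):
```
# truncate at stop strings so the model doesn't keep roleplaying both
# sides of the conversation; generate() is a stand-in for your backend
STOP_STRINGS = ["\nOVER", "\nHuman:"]

def chat_turn(history, user_text, generate):
    prompt = history + "Human: " + user_text + "\nAssistant:"
    out = generate(prompt)
    for stop in STOP_STRINGS:
        idx = out.find(stop)
        if idx != -1:
            out = out[:idx]
    return out.strip()
```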
Can you show a sample of what it returns you after feeding it that?
Give me a prompt and I'll feed it in and tell you. I think my big machine is running gpt-neo 1.7B right now.
You can also try it on textsynth.com which is free and doesn't need an account. That's how I figured out how to make this work.
I don't know, try "Write a C program that calculates the first 10 prime numbers"
I put
>Human: Write an Xlib program in C that opens a window and draws some lines in it.
into the huggingface space and got
>Assistant: ``` #include <X11/Xlib.h> #include <stdlib
I think the way they have it set up messes with the output. I'll try your prompt on my self hosted setup now, It will probably take 5 min or so to generate.
Ok, thanks anon
It's looking pretty retarded so far. I have the low parameter model loaded because its faster but other than generating boilerplate it tends to come up with terrible ideas.
Assistant: ```
#include <stdio.h>
int main() {
int i,j,k,l,m,n,p,q,r,s,t;
FILE *fp = fopen("file","r");
if(fp == NULL)
It's not finetuned. This is basically a jumbled mess of words. For it to write proper code it would need finetuning focused on code.
Easier stuff like stories/chat can be emulated right now with proper prompt tuning, which is something i already tested.
You don't need finetuning.
Here's the prompt on gpt-neox20b via textsynth.com.
Assistant: ```
#include <stdio.h>
int main() {
int i,prime=1,j;
for(i=1;i<=10;i++) {
for(j=2;j<=sqrt(i);j++) {
if(i%j==0) {
prime=0;
break;
}
}
if(prime==1) {
printf("%d",i);
prime=0;
}
}
return 0;
}
```
This C program looks for prime numbers between 1 and 10 and prints them out.
OVER
You assume the dataset the new model was trained on included as much code as gpt-neox20b's. The new model is much more coherent, which means either they found new math or they removed a ton of useless trash from the dataset, and imho stuff like code should be something you finetune in after general training, not before it.
Playing with it on spaces they definitely included more code in the training data than OPT (which seemed to use very little.)
Maybe finetuning will help but it's not nearly as important as people think it is.
It's ok, just as expected, really
This is the full chat with the 13B model using the webui I had just now. I used "regenerate" only twice, because of repeated lines, not because I didn't like the output.
It is just miles better than everything out right now with proper prompt engineering.
After finetuning, uncensored character.AI will be like a Wii compared to a 3090.
The lengths of the replies are quite impressive.
I have 2 3090s. What's the best model I can run and how do I split the model so I can run inference across both GPUs?
Can you ship them to me for free?
You could run 30b since it needs 35gb vram at 8bit.
But, do you have enough dram?
I'm high on hopium right now bros... This is so exciting.
Does anyone know what the growth rate of LLMs is with respect to parameters? Can you produce AGI just by scaling?
The more parameters the more "resolution for nuance" it seems to have. Even pretty low parameter models are intelligent enough to be useful.
> you still need a fuckton of vram to run an actual smart model
OHNONONONONO!
The 7B model performs very well. It's not the end of the world.
Also unless you have a crazy fast GPU it makes more sense to just use high numbers of CPU cores, especially if you're building a new machine for this right now.
It's also an EXCELLENT time to buy higher end consumer CPUs. The 5950X in particular remains an incredibly good CPU that sips power and can be overclocked even with a modest air cooler, and if you get it second-hand it's a great deal.
I just got a 5800x3D, FUCK
poor bastard
Do they make dual socket motherboards for those? If I get a bonus this year I'm upgrading.
Unfortunately no, they just don't have the electrical connections on the socket or firmware to deal with it for Ryzen and even Threadripper HEDT products. I'm pretty sure you can only get that by going up in the market segmentation game, with Xeon and Epyc CPUs and boards. 🙁
This is a genuine shame since the 5950X CPU is so damn power efficient.
Dang. I wonder if there's a way to cluster inference over the network.
facebook claims the 7B model, which requires 8GB of VRAM, is as good as the 175B model requiring 1TB+ of VRAM that ChatGPT was finetuned from.
And that's with 8bit. 4bit is around the corner, so it should run on 4GB gpus (slow though).
Why are people still repeating that bullshit about the 7B model beating the finetuned 175B? Read the fucking paper, for Christ's sake. Stop parroting what others have said.
those are facebook words you idiot.
7B is around 175B level
13B is above 175B level
30B is around the 250-500B model level.
And you can test it on your own. untuned 7B shits all over finetuned 20B.
> facebook claims 7B model that requires 8GB VRAM
yeah, it's great at OOM'ing with "tried to allocate 100-200 MB"
got that wrong. From textgenui git thread:
The GPU memory usages in 8-bit mode are the following:
LLaMA-13B: 16249MiB
LLaMA-7B: 9225MiB
but 4bit is around the corner, so it should cut both of those in half.
Then there is splitting as well, so you can offload a small part to your ram.
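If you want intuition for where the halving comes from, here's a toy absmax quantization of a single tensor. Real 8bit/4bit schemes (bitsandbytes, GPTQ) quantize per block and handle outliers; this is just the idea:
```
# toy absmax int8 quantization of one weight tensor
import torch

w = torch.randn(4096, 4096).half()
scale = w.float().abs().max() / 127
q = (w.float() / scale).round().clamp(-127, 127).to(torch.int8)

print(w.nelement() * w.element_size() / 2**20, "MiB as fp16")  # 32.0
print(q.nelement() * q.element_size() / 2**20, "MiB as int8")  # 16.0
print((w.float() - q.float() * scale).abs().max())  # worst-case error
```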
>Then there is splitting as well.
If I have two 8Gig cards for /3/ work could I put both to use for textgen?
What's the artifact of a finetuning process, another entire set of weights? I'm mostly wondering, if some bastards actually manage to finetune this for chat, whether we're all going to need to torrent more big-ass models. I'm going to need to upgrade my NAS at this rate.
there's a new transformer architecture on the horizon.
maybe after that shit will go wild.
This stuff keeps getting revolutionized every week.
At this rate you'll be able to run a gpt3 equivalent model on your phone by the end of the year.
Fuck me anon, do you also print out emails then scan them in?
Just link the arxiv.
https://arxiv.org/abs/1902.09113
I think chatgpt lied to me, it claimed to be based on GPT-2, not 3
ChatGPT doesn't know the particulars of its implementation well at all.
It's sort of right. GPT-3 has a larger dataset and more parameters, but the architecture is fundamentally the same. It's as similar to GPT-2 as GPT-J is.
Can the 16GB model run on AMD GPUs?
They are the cheapest ones I can buy rn.
>AMD for machine learning
it's more than enough for diffusion
and tortoise
>nvidiot shill conveniently forgets that pytorch works on AMD and its used in basically every modern AI system
Shit's cutting edge right now and it's all designed for cuda.
If you want to fuck with ROCm be my guest. I assume if you CAN get it to work with ROCm that it'll load up if you have enough memory, but fucked if I know, I don't buy ATI cards
I'm pretty new to this. Can you theoretically buy a bunch of 3090s and share VRAM to host big models? Or can you only do that with their server grade ones that cost a shit ton.
An anon on aids ran 13b on two 3090s perfectly fine
In theory these were designed for specifically that....
except you'd be better off buying 10 P80s and a server rack for the same price