Is there a currently an accurate way to say how much power per prompt LLMs use?

SnausagesinaBlanket@lemmy.world · 2 days ago

Is there a currently an accurate way to say how much power per prompt LLMs use?

chicken@lemmy.dbzer0.com · 1 day ago

You can use a wattage meter between your outlet and computer. I’ve tried that, and the usage is around the same as a graphically intensive videogame while it is generating.

SnausagesinaBlanket@lemmy.world · 14 hours ago

How does a wattage meter on my computer measure power used by an LLM server someplace else per prompt?

chicken@lemmy.dbzer0.com · 14 hours ago

I have a LLM server on my computer, so I can tell how much electricity it is using this way. It’s not somewhere else is how

SnausagesinaBlanket@lemmy.world · 14 hours ago

Cool but that is not the question I asked is it then mate.

chicken@lemmy.dbzer0.com · 14 hours ago

You asked if there’s a way to tell how much power LLMs use, you didn’t specify LLMs on a server you don’t have physical access to.

SnausagesinaBlanket@lemmy.world · 13 hours ago

It says LLMs meaning in general so you can take it any way you like.

Michal@programming.dev · 1 day ago

There’s a huge difference between a model you can run locally and a chatgpt model.

Tangent5280@lemmy.world · 1 day ago

Sure, but without actually knowing what kind of hardware the servers are running, what kind of software too, and what their service backend looks like we can’t say whether it is going to be higher or lower.

Michal@programming.dev · 1 day ago

I think we can assume it’s nvidia H200 which peaks at 700W what what I saw on Google. Multiply that by the turnaround time from your prompt to full response and you have a ceiling value. There’s probably some queueing and other delays so in reality the time GPU spends on your query will be much less. If you use the API, it may include the timing information.