• fizzle@quokk.au · 30 points · 2 days ago

    Most of the power consumption comes from training and optimising models. You only interact with the finished product, so power per query is very low compared to that required to develop the LLM.

    • spectrums_coherence@piefed.social · 1 point · 7 hours ago

      As far as I know it is still much more expensive compared to alternatives like a grammar checker and web search. Especially given that the model already searches the web on its own for many queries.

      Just because inference is more efficient than training, which consumes energy on the scale of nation states, doesn’t mean inference itself is economical.

    • lime!@feddit.nu · +12 / −1 · 2 days ago

      while this is true in isolation, the number of users means that inference now uses more power than training for the large actors.
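      The crossover claim above can be sketched as a back-of-envelope calculation; every figure below is a made-up placeholder, not a measured number for any real model.

```python
# Hypothetical crossover: at what total query volume does cumulative
# inference energy overtake the one-off training energy?
# All constants are assumed placeholders for illustration only.

TRAINING_MWH = 10_000      # assumed one-off training cost, in MWh
WH_PER_QUERY = 0.3         # assumed inference cost per query, in Wh

# Convert training cost to Wh, then divide by per-query cost.
queries_to_match_training = TRAINING_MWH * 1_000_000 / WH_PER_QUERY
print(f"{queries_to_match_training:.2e} queries")  # ~3.3e10
```

      Under these placeholder numbers the break-even point is tens of billions of queries, which a large provider can plausibly reach, so the totals can flip even while each individual query stays cheap.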

      • Michal@programming.dev · 8 points · 1 day ago

        The question is about per-prompt, so number of users is not relevant. What may be more relevant is number of tokens in and out.

        If anything, number of users will decrease power use per prompt due to economy of scale.
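        The per-prompt view above can be sketched the same way: amortized energy per prompt is the shared training cost spread over all prompts, plus a per-token inference term. All constants are hypothetical placeholders.

```python
# Hypothetical sketch of amortized energy per prompt.
# Every constant is an assumed placeholder, not a real measurement.

TRAINING_WH = 1e10      # assumed total training energy, in Wh
WH_PER_TOKEN = 3e-4     # assumed inference energy per token, in Wh

def energy_per_prompt(total_prompts: int, tokens: int) -> float:
    """Amortized Wh per prompt: training share + per-token inference."""
    return TRAINING_WH / total_prompts + WH_PER_TOKEN * tokens

# Few users: the amortized training share dominates each prompt.
print(energy_per_prompt(1_000_000, 1000))
# Many users: the training share vanishes; token count dominates.
print(energy_per_prompt(100_000_000_000, 1000))
```

        This mirrors both points in the comment: more total prompts shrinks the training term per prompt (economy of scale), while tokens in and out set the floor that amortization can never remove.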