DeepSeek Permanently Reduces The Price Of Its Flagship V4 Model By 75 Percent (tech.yahoo.com)
Posted by jaykrown@lemmy.world to Technology@lemmy.world · English · 1 day ago · 144 comments
Tja@programming.dev · 1 day ago
How are they running it? Doesn’t the model have to fit in (V)RAM? Does Nvidia have such huge memories in the H cards?

BlackLaZoR@lemmy.world · 23 hours ago
There’s tech for splitting a model to run on multiple cards, but it requires a really fast interconnect between GPUs.

Taasz/Woof@lemmy.blahaj.zone · 23 hours ago
Lots of GPUs together.

boonhet@sopuli.xyz · 23 hours ago
For self-hosting it essentially needs to fit in VRAM + RAM, but it’ll take a lot of CPU for the part in RAM.
DeepSeek probably uses those big fancy H cards, and not one but several together, to increase VRAM.
FYI the flash model is ~158 GB
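For a rough feel of what that ~158 GB figure means in hardware terms, here is a back-of-the-envelope sketch. The numbers in it are assumptions, not anything DeepSeek has confirmed: 80 GB per card (the common H100 configuration), a 24 GB consumer card for the self-hosting case, and a made-up 80% "usable" fraction to leave headroom for KV cache and activations. The function names are hypothetical.

```python
import math


def min_gpus(model_gb: float, vram_per_gpu_gb: float = 80.0,
             usable_fraction: float = 0.8) -> int:
    """Smallest GPU count whose combined usable VRAM holds the weights.

    usable_fraction is an assumed headroom factor, not a measured value.
    """
    if model_gb <= 0 or vram_per_gpu_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes must be positive and usable_fraction in (0, 1]")
    usable_per_gpu = vram_per_gpu_gb * usable_fraction
    return math.ceil(model_gb / usable_per_gpu)


def ram_spill_gb(model_gb: float, vram_gb: float,
                 usable_fraction: float = 0.8) -> float:
    """Weights (in GB) that would not fit in one GPU and land in system RAM."""
    return max(0.0, model_gb - vram_gb * usable_fraction)


if __name__ == "__main__":
    print("GPUs needed (80 GB cards):", min_gpus(158))
    print("GB spilled to RAM with one 24 GB card:", round(ram_spill_gb(158, 24), 1))
```

This only counts weight storage. It ignores interconnect speed, which is the real constraint mentioned above when a model is split across cards, and it says nothing about throughput for the part served from RAM.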
The distilled models?