
8 GPU Server Setup for AI/ML/DL: Supermicro SuperServer 4028GR-TRT 

TheDataDaddi
4.4K subscribers
11K views

In Part 2 of our series, we're diving into the nuts and bolts of setting up the Supermicro SuperServer SYS-4028GR-TRT for optimal AI/ML/DL performance. Discover the step-by-step process to configure this powerhouse server, from installing the GPUs to optimizing software and hardware settings. Learn how to harness the full potential of up to 8 dual-slot GPUs for unparalleled computational power in your AI projects. We'll also cover best practices for maintaining and monitoring your server to ensure peak performance. Whether you're a seasoned professional or new to the world of AI, this guide will equip you with the knowledge to build and manage your own high-performance AI/ML/DL rig. Subscribe for more insights and tips on leveraging cutting-edge technology in your AI endeavors.
Specific Topics Covered:
Installing hardware (CPU, RAM, GPU, drives, etc.)
Racking the server
Optimizing BIOS configurations
Setting up RAID Arrays
Installing and setting up the OS (Ubuntu 22.04)
GUI setup with KDE plasma
Setting up relevant software for AI/ML/DL (Including NVIDIA drivers)
Configuring IPMI for remote management
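Once the NVIDIA drivers from the software-setup step are in place, the monitoring mentioned above can be scripted. Below is a minimal Python sketch that shells out to `nvidia-smi` (assumed to be on the PATH after driver install); the `parse_gpu_csv` helper is a name of my own, written here for illustration:

```python
import subprocess

def parse_gpu_csv(text):
    """Parse nvidia-smi CSV rows: index, utilization %, memory used/total (MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        idx, util, used, total = (f.strip() for f in line.split(","))
        gpus.append({
            "index": int(idx),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus

def query_gpus():
    """Query every installed GPU via nvidia-smi's machine-readable CSV output."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_csv(out)
```

On an 8-GPU box, `query_gpus()` returns one dict per card, which can feed a cron job or alert when `mem_used_mib` approaches `mem_total_mib`.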
#SupermicroServerSetup #AIServerConfiguration #MLServerSetup #DlServerConfiguration #SuperServerSYS4028GRTRT #AIMLServerGuide #HighPerformanceAIServer #GPUServerSetup #SupermicroGuide #AIHardwareSetup #MachineLearningServer #DeepLearningServer #BIOSOptimization #ServerSetupForAI #AIServerOptimization #SupermicroConfiguration #AIInfrastructure #MLDLHardware #ServerInstallation
🎥 Other Videos in the Series:
Part 1 | Introduction | Your New 8 GPU AI Daily Driver Rig: Supermicro SuperServer 4028GR-TRT | • Your New 8 GPU AI Dail...
Part 3 | External GPU Setup | Setting Up External Server GPUs for AI/ML/DL - RTX 3090 | • Setting Up External Se...
📚 Additional Resources:
Relevant Parts To Increase Number of Drives
SSD Caddy Mount Screws - a.co/d/3D3rMnD
Assorted SSD Caddy Mount Screws - a.co/d/80JB8fU
Supermicro 8 Ports 6Gb/s PCI-E RAID Controller: AOC-S2208L-H8IR
www.ebay.com/i...
Cross-Over MiniSAS HD to 4 SATA: CBL-SAST-0591
www.ebay.com/i...
ISO Files For Flash Drive Toolkit
Clonezilla - clonezilla.org...
GParted - gparted.org/do...
Ubuntu 22.04 Server - ubuntu.com/dow...
Software Setup Commands
drive.google.c...
IPMI/BMC Password Docs
www.supermicro...
Link to Cost Breakdown Spreadsheet
docs.google.co...
Supermicro SuperServer 4028GR-TRT Specs
www.supermicro...
AI/ML/DL GPU Buying Guide 2023: Get the Most AI Power for Your Budget
• AI/ML/DL GPU Buying Gu...
HOW TO GET IN TOUCH WITH ME
For the most up-to-date contact details, please visit my RU-vid bio. Open to any and all inquiries, collaborations, questions, or feel free just to say hello! 👋 Thanks for your interest!
HOW TO SUPPORT MY CHANNEL
If you found this content useful, please consider buying me a coffee at the link below. This goes a long way in helping me through grad school and allows me to continue making the best content possible.
Buy Me a Coffee
www.buymeacoff...
As a cryptocurrency enthusiast, I warmly welcome donations in crypto. If you're inclined to support my work this way, please feel free to use the following addresses:
Bitcoin (BTC) Address: bc1q3hh904l4uttmge6p58kjhrw4v9clnc6ec0jns7
Ethereum (ETH) Address: 0x733471ED0A46a317A10bf5ea71b399151A4bd6BE
Should you prefer to donate in a cryptocurrency other than Bitcoin or Ethereum, please don't hesitate to reach out, and I'll provide you with the appropriate wallet address.
Thanks for your support!

Published: 20 Oct 2024

Comments: 98
@jaredisaacs7626 · 6 months ago
I have yet to find better content on hardware for AI/LLM!!! I love how to-the-point yet detailed the information is. You don't feel like you're being sold something. Instead you're being empowered through knowledge to make better decisions for yourself!!! Keep it up man!!!!
@TheDataDaddi · 6 months ago
Hi there! Thank you so much for your positive feedback. Yeah, I really feel like AI hardware is a topic that is under-addressed in general and on RU-vid specifically. Hardware is such an integral part of AI/ML/DL that many people don't think about. I suppose that it is a bit less interesting (to some) than the software side and a bit more nuanced, but I am surprised there is not more content being created here. I guess that is what I have to offer the community. Anyway, thank you so much again for your kind words, and I am so glad you are getting a lot out of the content.
@bradt5426 · 3 months ago
@@TheDataDaddi Seriously, great videos and couldn't agree more. That rig is 700 on eBay now, crazy! Would you go for this rig again at this point in time, or would you get something newer, PCIe 4+? Also, crazy this was sent freight. Do you remember the shipping weight?
@rapalstudios63 · 6 months ago
I build tons of workstations for AI/ML; this is a good use of old hardware. I would love to see EPYC 96-core builds with 8 GPUs.
@TheDataDaddi · 6 months ago
Oh that is awesome! What is your most common build, if you don't mind me asking? I'd be super curious to know what most people are wanting these days. Man, that would be so awesome! I am not sure if it would be compatible with the mobo, but if so, that would be insane! Absolute beast of a CPU for sure. Maybe one day, when/if the channel grows, I will be able to do that and test it out.
@smoofwah3552 · 5 months ago
Bump
@DrDipsh1t · 3 months ago
Colton from hardware haven showed a nice trick in his videos for removing thermal paste: use a coffee filter instead of paper towel. It's more abrasive, durable, and doesn't leave any flakes on whatever you're cleaning!
@TheDataDaddi · 3 months ago
Ah this is an excellent tip! Will definitely come in handy in the future. Thanks so much for the comment here!
@avinash0072355 · 1 month ago
Thank you so much for the video!
@TheDataDaddi · 1 month ago
Of course! So glad you enjoyed the content!
@rapalstudios63 · 6 months ago
NVLink won't work on the 4090, will it? At 4:30 you clearly said 4090s.
@TheDataDaddi · 6 months ago
Yep you are totally right. I have no idea why I said 4090s. Those are 3090s. One of the main reasons I went with 3090s was because of NVLink. Please excuse my mistake. I will see if I can go back and make that more clear in the video. Thanks so much for catching that.
@bashwang · 1 month ago
Great video and an inspiring build. Couple of questions if you don’t mind. Where can I find information on the onboard SATA configuration for the front drive bays? You’ve gone over this in the video but I do not see it in the server manual - did you pretty much figure it out yourself? Second question is what AI software/flavor of AI do you plan to run on the server? Thank you!
@TheDataDaddi · 1 month ago
Hey there! Thanks so much for the kind words. So, it's really tough to find info on this. I don't think I could find anything either. I am pretty sure I figured this part out through trial and error. Lol. Right now I am mainly using PyTorch to run computer vision models like ResNet-50, multimodal models like CLIP and FLAVA, and graph databases and GNNs with Neo4j, and I plan very soon to start working on an anomaly detection research direction and also work with local LLMs (probably Llama to start).
@gdmax5 · 3 months ago
Great job dude, one of the cleanest explanations I have seen lately ❤
@TheDataDaddi · 3 months ago
Hi there. So glad you enjoy the content! Really appreciate the kind words!
@VenturaPiano · 6 months ago
How loud is this server? Would I be able to be within 5 feet of such a setup?
@TheDataDaddi · 6 months ago
Hi there. Thanks so much for the comment. On boot this guy runs at over 90 dB. For reference, that is equivalent to standing next to a lawnmower. It is the loudest server I have ever been around on boot. Also, when running the GPUs full blast it can get loud, though not as bad as on boot. However, most of the time during normal operation it is not bad. You could work next to it no problem. Just be aware you will always hear it. Hope this helps!
@VenturaPiano · 6 months ago
@@TheDataDaddi thanks, that helps a lot. I am working remotely and didn't want to go deaf :). Thanks for the guidance, I followed your guide (mostly) to a T.
@TheDataDaddi · 6 months ago
@@VenturaPiano Glad I could help! Honestly, if you work with headphones in it should be no problem. Personally, I like a little bit of noise when I work, so working next to servers has never really bothered me. I would just leave the room on boot.
@GrossGeneralization · 4 months ago
Did you look into whether the 1U bump lid (Supermicro part number MCP-230-41806-0N) clears an RTX 3090? Looks like that Zotac card is a lot lower than some other 3090 cards; you might get by with the MCP-230-41803-0N, which was designed to clear the GTX cables. (Note that these are part numbers for the 4029GP-TRT2, but the chassis looks pretty much the same.)
@TheDataDaddi · 4 months ago
Hi there. Thank you so much for the comment! So, I have not looked into this, but this is a great idea. I do not know 100% if it would work, but my gut tells me it would. I think this would be a great way to keep the GPUs in the server. One problem for me is that I do not have 1U worth of space in my rack because it is completely full. However, for others it would likely be a great option. I might order one just to try it and let viewers know for sure whether it works. Also, if you go this route please let me know the results. Also, I heard from a viewer recently that the RTX 3090 Founders Edition fits in the chassis with the standard lid. Might be worth checking that out as well.
@oscarmejia2174 · 3 months ago
This is great. What other GPUs are supported on this platform? Can you run any modern NVIDIA card like a 3090?
@TheDataDaddi · 3 months ago
Hi there. Thanks so much for the positive feedback! Yes, you can run more modern GPUs. In fact, I use the RTX 3090 in the server currently. In another video, I go over how I got the RTX working with this server. Check it out here: Setting Up External Server GPUs for AI/ML/DL - RTX 3090 ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zrcKGF156bA.html
@stunchbox7564 · 2 months ago
I love the dad part
@TheDataDaddi · 1 month ago
Haha. Yeah. He is awesome. He helps me with some of my projects occasionally. I am lucky to have him around.
@birdstrikes · 6 months ago
Ventoy USBs work on Mac as long as they're created elsewhere. You can use Lima, a VM, or boot the ISO. I know you're not enabling PXE, but iVentoy is pretty nifty!
@TheDataDaddi · 6 months ago
Man, this is awesome. I was not aware that iVentoy was even a thing. I toyed with the idea of setting up a TFTP server, but I figured it was not worth the extra effort. This looks easy and effective. I will check this out. Thanks so much for the suggestion!
@chimpera1 · 4 months ago
I'm thinking of building one of these with 8x P100. My knowledge is fairly limited but I want to explore LLMs. Would you recommend it?
@TheDataDaddi · 4 months ago
Hi there! I think this would be a good, cheaper way to start experimenting! Do be aware though that training or fine-tuning most of the larger open-source LLMs will be out of reach even with a setup of this magnitude. However, you could likely host some of the smaller ones or quantized versions locally. Hope this helps. Cheers!
@marcsanmillan3165 · 2 months ago
Really enjoyed the video! Do you know if the drive bays on that server are compatible with NVMe drives? I’m thinking about getting that server but I can’t find that piece of information. Thanks!
@TheDataDaddi · 2 months ago
Hey there! Thanks so much for your question. So glad to hear you enjoyed the video! Unfortunately, I am not sure if the drive bays in the server are compatible with NVMe drives. I have used a PCIe NVMe adapter card to add 4 NVMe drives to the server, but I have not tried to use any of the drive bays for this. My gut says that the backplane would not support it, but I am really not sure. I tried doing some research on this and found absolutely nothing. I wish I could be of more help. If you find the answer here, please do let me know. This would be great info to know going forward.
@emiribrahimbegovic813 · 5 months ago
What kind of AI work do you do? Are you able to run and/or train Llama 3 with 70B parameters? Also, wow, that is loud. Is it this server only or all your servers together? I was thinking of doing the same thing in my garage (which coincidentally is the place I do most of my work from), but this might be too loud for me.
@TheDataDaddi · 5 months ago
Hi there. Thanks so much for the question. Currently, for my main body of research around browser security, I mainly work in computer vision (ConvNets) and also with some of the newer multimodal models like CLIP and FLAVA. In a separate project, I am working with graph neural nets and graph databases to ingest and analyze the entire BTC ledger. As of yet, I have not played around much with any of the open-source LLMs on this machine because frankly I have not had the time. It is high on my list of things to do though. So, unfortunately, I cannot tell you exactly what works and what doesn't because I do not have real data to support it. What I can tell you is that even 8 24GB GPUs will only get you to 192 GB of VRAM. Llama 3 70B will need about 140GB at half precision (back of the envelope here) for the model parameters alone, so pre-training or fine-tuning are likely out of reach. Honestly, if you really wanted to pre-train or fine-tune any of the really large LLMs, you would need a distributed cluster. I would love to do this one day, but that will require a lot more time and funds. Lol. That said, I do believe that hosting Llama 3 70B locally for inference with model sharding (splitting parts of the model across different GPUs) should be possible with the setup in the video (depending on what GPUs you choose to add, of course). Especially if you are okay with some level of quantization, it should be highly manageable on this rig. I will try my best to do some experimentation here very soon and make a video on the results. You are not the only one who is very interested in this use case. Yes, lol, this server is loud indeed. It is probably the loudest one I have ever worked with. The noise is not unbearable (for me at least) unless the machine is booting or under heavy load (running 4+ GPUs at max capacity at once).
I would definitely not suggest putting it anywhere you will be working all the time, as it could get annoying, especially if you are constantly running experiments. What I normally do if I need to work next to the server for extended periods is just wear headphones (mine are over-ear and noise-canceling, but any should probably help). This might help if you don't have any other option but to work in the garage with the servers frequently.
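The back-of-the-envelope VRAM math in that reply is easy to script. A rough sketch using the same numbers (2 bytes per parameter at fp16, eight 24 GB cards; the 4-bit quantization figure is my own assumption, not from the reply):

```python
def model_weights_gb(params_billion, bytes_per_param=2.0):
    """Decimal GB needed for the weights alone -- ignores activations,
    KV cache, and optimizer state, so real usage is higher."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9

total_vram_gb = 8 * 24              # eight 24 GB GPUs = 192 GB across the rig
fp16_gb = model_weights_gb(70)      # 140 GB: sharded inference can fit
int4_gb = model_weights_gb(70, 0.5) # 35 GB at 4-bit quantization (assumed)
```

Training roughly triples-to-quadruples the weight footprint once gradients and optimizer state are added, which is why fine-tuning 70B stays out of reach even though inference fits.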
@emiribrahimbegovic813 · 5 months ago
@@TheDataDaddi Thank you so much for responding. Great stuff, great channel.
@novantha1 · 5 months ago
You know, I'd love to do a proper machine learning server, but there's a couple of things I was a little bit scared of. I've heard horror stories about people who purchased servers, only to find out that certain components (CPU is a big one) are "vendorlocked", or that the server needs to be booted first from a VGA port and accessed with a specific sign in key to access the BIOS and so on... To say nothing of the fact that certain GPUs will have "HPE" "Lenovo" and so on in the name in the listing, which leads me to believe there might be certain GPUs which are vendorlocked as well. It just seems like there's a lot of major issues you can run into that are absolute showstoppers, and it really does frighten me a bit to be caught out by things of this nature, because I haven't personally set up a server build before. Have you run into any issues like this?
@TheDataDaddi · 5 months ago
Hi there. Thanks so much for your comment! Interesting. I have never encountered any of these issues. One caveat here is that I have always bought refurbed servers, so they may have been unlocked prior to being resold. That part I am unfamiliar with. I have also never encountered a vendor-locked CPU, but again, I have only ever bought used server CPUs. If you plan on going the used/refurbed route, I would say there is a low probability of this being an issue. Also, to further mitigate risk, I would advise buying from a platform like eBay that offers a 30-day return policy. This should give you enough time to get all the kinks worked out, and if there is something wrong you can always send it back. As for the GPUs, they may have certain manufacturer names in the title/description for a couple of reasons. 1) The GPU might be manufactured by that particular vendor. For example, GEFORCE and MSI both make a version of RTX GPUs: same chip, similar if not identical performance, often similar form factor. 2) The GPU could be manufactured to fit a particular server type natively (like maybe Lenovo or Cisco or similar). This might mean it is harder to fit into a different server, but it might still work. This is one reason why AI hardware is so difficult: there is so much nuance, competition, and a huge lack of clear documentation. Anyway, all of that to say, I would not worry too much about those things. I have built and worked on a good number of servers at this point, and I have never run into any of those issues (knock on wood). If you would like some help getting a build together, feel free to reach out to me. You can find all my contact info in my RU-vid bio.
@kellyheflin5931 · 5 months ago
What dimensions are necessary for the Supermicro SuperServer 4028GR-TRT to fit in a mid-sized server rack? I'm grateful. Thank you.
@TheDataDaddi · 5 months ago
Hi there. Thank you so much for the comment! So, I forget what depth I set my rack at, but the Supermicro SuperServer 4028GR-TRT is 29" long, so I would set it about 3" to 5" deeper than that to comfortably fit the server. To my knowledge, servers are pretty much all the same width. One thing to keep in mind, though, is that not all servers are the same length, so if you ever buy others in the future they may be longer. It is a good idea to set your rack a few inches deeper than the longest server you plan on housing.
@wasifmasood969 · 5 months ago
Really awesome stuff! Can you please also recommend which type of server rack should be bought for this?
@TheDataDaddi · 5 months ago
Hi there. So glad to hear you are enjoying the content! This is the rack I purchased (link below). I have 3 2U servers, the 4U 4028GR-TRT, a 1U switch, and a 1U PSU. It has worked really well and been very solid. However, it is a bit expensive. You could probably find something cheaper on eBay or Facebook Marketplace. www.ebay.com/itm/134077093756
@wasifmasood969 · 5 months ago
@@TheDataDaddi Thanks, can you please recheck the link you shared?
@TheDataDaddi · 5 months ago
@@wasifmasood969 Oops, sorry. Wrong link. Here is the correct one: a.co/d/18HuxQG
@GrossGeneralization · 4 months ago
You probably found it already, but it looks like one of your RAM modules isn't fully seated (approximately halfway between the two CPUs).
@TheDataDaddi · 4 months ago
Hi there. Yep, I did catch that and have since fixed it. Thank you so much for letting me know though. Really appreciate it!
@optiondrone5468 · 5 months ago
Man, this was an excellent video, both the build and the software setup. Now please make a video showing us how you use this system for ML, from setting up your venv and your ML coding setup to doing some ML training to stress test your system. How about trying some SDXL image upscaling workflow?
@TheDataDaddi · 5 months ago
Hi there. Thanks so much for the comment! Very glad to hear that the content was helpful to you. Okay, great feedback. I will try to do a video soon on my workflow and how I manage multiple projects. I will also be making some videos soon to benchmark the system and GPUs, so stay tuned for that. I am not familiar with the SDXL image upscaling workflow, but I can certainly take a look when I have some time and see if I can make one on that as well. Thanks so much again for the suggestions! Really appreciate it.
@optiondrone5468 · 5 months ago
@@TheDataDaddi Thanks mate, your content is top notch. Keep up the good work.
@TheDataDaddi · 5 months ago
@@optiondrone5468 Appreciate the kind words! Cheers.
@GravitoRaize · 6 months ago
In your "software setup" documentation there's a gap where it doesn't have your Python and Node commands and install prep stuff. It also doesn't really have the firewall stuff for XRDP, though I suppose people could follow the video to get that info. Probably not a huge deal, but I figured I'd mention it in case anyone is trying to follow along using your included documentation. I did want to ask if you were using a specific UPS or pair of UPS devices with this particular setup, as this is a dual-PSU system? I didn't see anywhere where you mention what you are using for power backup. The IPMI stuff changed just a few years ago and the password is almost always somewhere on the device physically; for mine it was near the network card, but for the 4028GR-TRT I wasn't able to find a reference online for where it is. I suppose it might have been inside somewhere instead, but fortunately ipmicfg exists, I guess.
@lofasz_joska · 6 months ago
The man was using Chrome as his default browser; that tells you everything you need to know....
@TheDataDaddi · 6 months ago
@@lofasz_joska Which browser do you prefer?
@TheDataDaddi · 6 months ago
Hi there. Thanks so much for letting me know. I will go edit that file so people can better follow along. Yes, I am using UPS units: 4 APC Back-UPS RS 1500, MPN BR1500LCD. Max watt output is 865 per unit. I am going to do a whole video soon around power considerations in my home lab. Please stay tuned for that if you are interested! Yeah, I checked literally everywhere I could think of and could not for the life of me find the password. Glad you had an easier time than me though. I'll go check near the network card like you suggested and see if maybe I missed something. Was this on the inside or outside near the network card?
@GravitoRaize · 6 months ago
@@TheDataDaddi Mine was outside, on the "lip" of metal near the network card, though it didn't say password or anything; it's just a serial number. I've seen some models that have it right above the card in an easy-to-see place, but it's not always in an easy-to-spot location. Plus, the label under the CPU doesn't do any good for people who've already seated the CPU; not sure what their thinking was there. Yeah, I'd definitely be interested in what you do for UPS. I imagine a lot of people looking at getting servers like this probably don't really consider it when they make their purchase. I know I didn't at the time I purchased my AI setup. Keep up the good content, it's always good to see what others are doing!
@TheDataDaddi · 6 months ago
@@GravitoRaize Got it. I will definitely go check and see if I can find it now. Knowing me, it is probably there plain as day and I just missed it the first time. Lol. Yep, I have had several people ask me about this. I will try to make a video on general power considerations for a home lab and discuss different UPS options in it. I am not sure when I will be able to make that video, so just in case you need this information more quickly, here is a summary: I found a great deal on eBay for some APC Back-UPS RS 1500 (BR1500-LCD) units. Max output of each unit is 865 W. So I bought 5 units and have distributed power consumption as evenly as possible across all 5. It has been working really well so far. Hopefully this gives you a decent starting place.
@SanjaySharma-ov1kf · 5 months ago
This looks very impressive. Do we need that much RAM and storage for AI model training? And can we install an NVIDIA 3080 GPU internally?
@TheDataDaddi · 5 months ago
Hi there! Thanks for the question. So, you definitely do not need this much RAM or storage. I just have some particularly heavy workloads, and having excess storage and RAM makes my life much easier. I would say try to get a little bit more than what you think you will need for your intended use case. Also, I would spend a bit extra on more memory-dense modules. This way you leave yourself open slots for expansion in the future if needed. As for the 3080: it is going to depend a lot on the form factor of the particular GPU variant (ASUS, Gigabyte, etc.), but overall I would say it is a safe bet to assume that it will not fit. Part of the issue is that the 8-pin power connectors are on the top of the GPU, so there is no space to plug anything in between the top of the GPU and the lid. You could get creative and buy a 90-degree riser so the GPU would fit sideways, or cut a hole in the lid. However, without any modification you can be pretty sure it will not fit.
@SanjaySharma-ov1kf · 5 months ago
@@TheDataDaddi Thank you for the quick reply. I have NVIDIA Founders Edition GPUs, which are smaller compared to other brands. Otherwise I will have to purchase an NVIDIA P40 GPU; I am not sure about the performance of the P40 compared to the 3080. Also, is it fine if I go with only 2 RAM modules, like 64 GB x 2, or do I need to install a minimum of 4 RAM modules per memory bank for it to work?
@TheDataDaddi · 5 months ago
@@SanjaySharma-ov1kf The Founders Editions may actually work. Just be mindful that you will need enough room to actually connect the power cables. The 3080 will be about 2.5 times as performant as the P40 on paper, so it would certainly be worth using it if it is physically possible. Another route you could go is making an external rig for the 3080s. I have a video on that if you are interested in going that route: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zrcKGF156bA.html The 4028GR-TRT supports dual Intel Xeon E5-2600 v3 or v4 series processors, which support quad-channel memory configurations. Each processor has four memory channels, and ideally you should populate one DIMM per channel to maximize memory bandwidth. Using only two DIMMs might limit the memory bandwidth because only two of the eight channels (four per processor) would be utilized. However, it would still function correctly.
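The bandwidth effect of DIMM population can be put in rough numbers. A sketch assuming DDR4-2400 (the fastest speed the v4 Xeons officially support) and the standard 8 bytes per transfer per channel:

```python
def ddr4_peak_gbs(mt_per_s, channels):
    """Peak theoretical bandwidth: transfers/s * 8 bytes per channel."""
    return mt_per_s * 8 * channels / 1000  # MT/s -> GB/s

two_dimms  = ddr4_peak_gbs(2400, 2)  # only 2 of the 8 channels populated
one_per_ch = ddr4_peak_gbs(2400, 8)  # one DIMM on every channel
```

That is roughly 38.4 GB/s versus 153.6 GB/s peak, a 4x difference even when total capacity is identical, which is why one-DIMM-per-channel is the usual advice.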
@SanjaySharma-ov1kf · 5 months ago
@@TheDataDaddi Thank you. I will try out the 3080 and 3070 GPUs which I already have; if not, I will buy a P40. I would prefer not to have an external GPU, as it will be a little risky with kids at home, even though the server will be in the garage. For RAM, I will go with 8 modules to get bandwidth efficiency. I am placing the orders on eBay; hopefully I will get it by the coming weekend :) Appreciate your help and sharing of information. Keep up the good work.
@TheDataDaddi · 5 months ago
@@SanjaySharma-ov1kf Yeah, if you already have a 3080 and 3070, definitely try those first. Please let me know if you can make them work internally. Great! I think this is what I would do personally. If you are patient you can find some pretty good deals on eBay for relatively cheap RAM modules. Of course! I am always happy to share what knowledge I have gained. Cheers and good luck!
@stunchbox7564 · 2 months ago
What is the size of the flash drive used for the Ventoy setup?
@TheDataDaddi · 1 month ago
This is a great question. I honestly don't remember off the top of my head. I think the one I used was 32GB, but that is just because I had some extras laying around. You could probably get by with 16GB honestly, and even less depending on what tools or other OSes you want to have available.
@rohithkyla7595 · 6 months ago
Great video! I'm looking to replicate this :) Quick question about the PCIe lanes: it looks like all v3 and v4 gen Xeons support 40 lanes each, so 80 total. How will 8x 16 (= 128 lanes) fit into the 80 lanes supported by the CPUs?
@TheDataDaddi · 6 months ago
Hey there. Thanks so much for the comment. Great question! Yes, it's absolutely true that each CPU only supports 40 PCIe lanes. However, the 4028GR-TRT employs something called PLX technology. These are basically PCIe switches that manage and distribute the 80 physically available lanes to provide higher connectivity, effectively multiplying the number of PCIe lanes coming from the CPUs and thereby increasing the bandwidth available to each GPU. It also provides intelligent data routing between the CPUs and GPUs, which allows for dynamic optimization of PCIe lane usage. Exactly how this works I am honestly not sure; I should probably research this more. All that to say, it should be theoretically possible with this technology to support 8 GPUs at the full x16 bandwidth. However, while it may certainly increase the bandwidth available to each GPU, I highly doubt that under full load with 8 GPUs you would get the full x16 bandwidth for each. This would actually be a really interesting thing to test! One day when I get 8 of the same GPUs I will definitely have to try it. Lol. Anyway, I hope this answers your question. I wish I could give you a more concrete answer, but this is the best I can do for now.
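The lane arithmetic in that exchange can be made explicit. A small sketch of the oversubscription the PLX switches have to absorb (slot widths assumed to be x16, as in the question):

```python
LANES_PER_CPU = 40   # E5-2600 v3/v4: 40 PCIe 3.0 lanes per socket
NUM_CPUS = 2
NUM_GPUS = 8
LANES_PER_SLOT = 16  # each GPU slot is electrically x16

physical_lanes = LANES_PER_CPU * NUM_CPUS      # 80 lanes from the CPUs
slot_lanes = NUM_GPUS * LANES_PER_SLOT         # 128 lanes at the GPU slots
oversubscription = slot_lanes / physical_lanes # 1.6x, hidden by PLX switching
```

Each GPU still negotiates a x16 link to its switch, but if all 8 cards saturate their links simultaneously, the shared 80 CPU lanes become the bottleneck, which matches the reply's caveat about full load.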
@rohithkyla7595 · 6 months ago
@@TheDataDaddi Thanks a lot for your reply! Will look further into this :) I've ordered one of these servers, looking forward to getting it!
@rohithkyla7595 · 5 months ago
@@TheDataDaddi Hi mate, I've just received my 4028GR. I currently have 4 GPUs total (2 P100s, 1 P40, and an RTX 3080); the 2 P100s are in the Supermicro, and the P40 is in a separate desktop. In terms of splitting the 2 P100s across the 2 CPUs on the Supermicro, would you recommend having both on 1 CPU or splitting them between the 2 CPUs?
@SanjaySharma-ov1kf · 5 months ago
@@rohithkyla7595 Hey, I have ordered the same server, the 4028GR-TRT, and have 3060 and 3080 GPUs, but for some reason the server does not recognize the GPUs. I have noticed that the power adapter is a bit of a tight fit for these GPU cards. Did you install a 3080 on the 4028GR-TRT server? Do we need to change anything in the BIOS, or did you use a different power cable for the 3080 or 3060 GPU card? I am stuck without GPU detection on this server.
@rohithkyla7595 · 5 months ago
@@SanjaySharma-ov1kf Hey, I am currently only running the P40/P100 GPUs, which only need the one 8-pin connector that comes with the 4028GR-TRT. I do have the 3080, but it has power adapters at the top which make closing the server's lid difficult, so I'm currently not using the 3080 in the server. Regarding your issue, it could just be power related; I think a PCIe riser + outside power should let you know if it's the server's fault.
@SanjaySharma-ov1kf · 5 months ago
@TheDataDaddi Are you using an 8-pin GPU splitter to connect the 3090 card? Is it an 8-pin female to dual 8-pin male connector?
@TheDataDaddi 5 months ago
Hi there! Thanks for the comment. For the 3090s, it takes 2 male 8-pin connectors in addition to the PCIe slot to power the GPU. I have two PSUs like you would use for a regular consumer-grade motherboard, and I am using their 8-pin connectors to power the 3090s. I have one 500W PSU per 3090. For a more in-depth explanation, check out the video I have on setting the 3090s up: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zrcKGF156bA.html
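For anyone sizing the external PSUs, here is the rough arithmetic I'm working from (the 350 W board power, 75 W slot power, and 150 W-per-8-pin figures are assumptions from the usual published specs, not my own measurements):

```python
# Rough power-budget check for running one RTX 3090 from its own PSU.
# Assumed figures: 350 W board power for a 3090, up to 75 W supplied by
# the PCIe slot (from the server), 150 W rating per 8-pin PCIe connector.

BOARD_POWER_W = 350
SLOT_POWER_W = 75
PIN8_RATING_W = 150
PSU_W = 500

# Power the external PSU must deliver through the two 8-pin cables.
cable_power = BOARD_POWER_W - SLOT_POWER_W
assert cable_power <= 2 * PIN8_RATING_W   # two 8-pins are within spec

headroom = PSU_W - cable_power
print(f"8-pin draw: {cable_power} W, PSU headroom: {headroom} W")
```

With those numbers a 500 W unit per card runs at only a bit over half load on the cables, which is why one PSU per 3090 has been comfortable for me.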
@fernandotavarezfondeur3724 5 months ago
Hi TheDataDaddi, I really enjoyed your video! I’m planning to build a GPU server with a budget of $1,500-$2,000, mainly to rent out for AI processing and generate some income. Do you think this budget is realistic for my goals? Any recommendations for someone new to hardware, like me, would be incredibly helpful. Also, I'm from the Dominican Republic and curious about any advice you might have specific to my location. Is there another way to contact you for more detailed guidance? Thanks!
@TheDataDaddi 5 months ago
Hi there. Thanks so much for the comment! Really glad you are enjoying the content. This is actually something I have been looking into myself! I think it is possible on that budget, although you may have to be a bit patient and wait for some good deals. Feel free to reach out to me; my contact info should be in my YouTube bio. I would love the chance to get to know your situation a bit better and offer you more detailed guidance!
@dslkgjsdlkfjd 3 months ago
damn daddi that video was great
@TheDataDaddi 3 months ago
Hi there! So glad you found this video helpful. Really appreciate the kind words!
@choilive 6 months ago
Hm, if you got a slim right-angle PCIe cable, could you make those 3090s work?
@TheDataDaddi 6 months ago
Hi there. Thanks so much for the comment. Do you mean work inside of the server?
@choilive 6 months ago
@@TheDataDaddi Yeah, the inside of the server looks like there's a bit of clearance. Obviously with these 2.5-slot cards you can't fit 8 of them in there, but it could be a decent self-contained alternative to a mining rig if it worked.
@TheDataDaddi 6 months ago
@@choilive This is a great idea. I will see if I can find something like what you are talking about that will work. Thanks so much for the tip!
@SanjaySharma-ov1kf 5 months ago
Hi @TheDataDaddi, I have a couple of questions if you don't mind helping. First, I want to boot from a PCIe card with an NVMe SSD, but the NVMe SSD is not recognized in the BIOS, although it works when I access it from Ubuntu. Second, I am not able to use the 3060 and 3080 GPUs on this server; it seems that the server's GPU power cable is different and not compatible with the 3060 and 3080. I tried a new PCIe power cable from Amazon, but it didn't help. Can you please help with these two issues?
@TheDataDaddi 5 months ago
Hi there. Thanks so much for your questions! 1) For the NVMe issue, you will likely need to go into the BIOS and enable PCIe bifurcation (x4x4 or x4x4x4x4, depending on which PCIe slot you have it in). If I remember correctly, you boot into the BIOS, then go to Chipset Configuration > North Bridge, and there you can change the PCIe slot configurations. If you google the Supermicro 4028GR-TRT block diagram, it should show you which PCIe slots are which. Try this and let me know if it works for you. 2) The native power cable in the server is a male 12V EPS connector. This is designed to be used with Tesla series GPUs. However, you can buy an adapter that will convert it into the 8-pin connector that the RTX series GPUs expect. I believe the following should work for you: a.co/d/fQ9ncia. Please let me know if you have any other questions!
@SanjaySharma-ov1kf 5 months ago
@@TheDataDaddi Thank you for the quick response. Really appreciate your help and support. I did change the BIOS setting under North Bridge to x4x4 and x4x4x4x4, but I am still not able to see the NVMe SSD in the boot options. Can you please help resolve the issue? I have ordered the cable from Amazon; hopefully that will fix the 3080 GPU issue :)
@TheDataDaddi 5 months ago
@@SanjaySharma-ov1kf Sure! Also make sure that under the PCIE/PCI/PnP section the PCIe slot is enabled as UEFI. Another issue might be that you have set the bifurcation for the wrong PCIe slot; try it for the other possible candidates if you have not already. Even with the block diagrams it can be a bit confusing to figure out which PCIe slot is which. Unfortunately, it could also be bad hardware. I ordered a 4-slot PCIe-to-NVMe adapter once, and it did not work originally due to a hardware issue.
@SanjaySharma-ov1kf 5 months ago
@@TheDataDaddi Thanks for the help. I will try the other options for the bifurcation as well. Under PCIE/PCI/PnP I do not have an option called UEFI, only EFI. The PCIe slot itself is working, as I installed Ubuntu on the drive by booting from a Ventoy USB, but the NVMe drive is not visible while booting.
@TheDataDaddi 5 months ago
@@SanjaySharma-ov1kf Sorry, I meant EFI. How are you connecting the NVMe drive to the PCIe slot?
@AkhilBehl 2 months ago
Yo what’s your day job that you need and can afford this hardware? I’m so jelly.
@TheDataDaddi 1 month ago
Haha. Well I am getting my PhD in computer science and want to be able to experiment with the cutting edge so that is the need. At this point, it is mainly just because I love it and am interested, but I plan on eventually starting some businesses from all of this. As far as how I afford it, I don't. lol. I wait for deals and spend way too much of my tiny income on this stuff. That is one benefit of living with your parents. lol. Frees up income to do stuff like this.
@Gastell0 2 months ago
2:06:40 - that 25W-per-GPU idle is because Nvidia can't be bothered to add a power governor to the headless drivers, so the GPU can't get out of P0... P40s stay at 50W.
@TheDataDaddi 1 month ago
Interesting. This makes a ton of sense. I always noticed that and thought it was odd. Very surprised this issue has not been addressed. I also just checked for solutions, and there are none I particularly like. Apparently, you can try to manually cap the GPU's power draw with the following: nvidia-smi -pm 1 (enable persistence mode), then nvidia-smi -pl [power limit in watts] (set the power limit). Not sure if this works though; I haven't tried it yet.
@birdstrikes 6 months ago
NixOS!
@TheDataDaddi 6 months ago
Had quite a few people recommend NixOS to me recently. I am going to test it out soon when I get some time.