
We tour the world's fastest supercomputer at Oak Ridge National Laboratory!

The Art of Network Engineering
6K subscribers
48K views

Everything Art of Network Engineering: linktr.ee/artofneteng
In this video we get a tour of the world's fastest supercomputers, Frontier and Summit, at @OakRidgeNationalLab! Both of these High Performance Computing (HPC) environments have played significant roles in various areas of research.
Our tour guide, Daniel Pelfrey, Principal HPC Network Engineer, takes us through the challenges of networking in an HPC environment, and some of them might surprise you!
A huge thank you to our friends at the Knoxville Technology Council for connecting us with Oak Ridge National Laboratory.
Also, thank you to Kate and Daniel for the tour of ORNL, the supercomputers there, and for making this video possible!
Chapters
-------------------
0:00 Intro
00:58 What's the high level mission of ORNL?
02:07 What makes a High Performance Computing Environment different from Enterprise Networks?
04:09 HPC Network Design
05:09 Introduction to the Frontier Super Computer
06:40 Inside the Frontier Data Center!
09:54 We get to peek inside a Frontier cabinet!
15:12 The Summit Super Computer
17:03 HPC Environment Operations
19:24 The teams that keep these HPC environments going
20:20 The Why
21:43 Wrap up
22:36 Outro

Science

Published: May 31, 2024

Comments: 123
@martijnb3381 29 days ago
"And is 2 Exaflops" - big smile 😊 Nice to see someone who is passionate about his work!
@Alfred-Neuman 9 days ago
It's not even "a" computer, it's basically just a botnet installed locally... The only difference I can see between this and a botnet is that this is installed inside a single room, so they get better latency between the different RAM modules, CPUs and GPUs. What I'm trying to say is this technology is not very impressive; they're using pretty much the same CPUs and GPUs that we are using in our cheap desktops. Just a lot of them...
@NorbertKasko 5 days ago
@@Alfred-Neuman For a while they used special processors in these systems, developed directly for them. Cray comes to mind. When clock speeds became less scalable, they started to use consumer hardware. This one has 8.7 million processor cores instead of 16 or 64 (talking about high-end desktop machines).
@mikmoody3907 4 days ago
@@Alfred-Neuman Why don't you build a better one and impress us all...
@chuckatkinsIII 10 days ago
One of the coolest aspects of Frontier's network architecture is at the node level. Since all the compute is done on GPUs, the network fabric connects directly to the GPUs instead of something like a PCIe bus. So simulations can transfer directly between GPU memory and the network fabric without involving the CPU or having to move data on or off the GPU to get to the network. It allows for incredibly efficient internode communication.
@noth606 10 days ago
So the GPUs have NICs connected directly to them? With some sort of second MMU with its own NIC? It's a tad unclear from the way you describe it, but I wonder how it connects to the GPU since you say it's not using PCIe?
@chuckatkinsIII 10 days ago
@@noth606 I slightly misspoke. The NICs use PCIe ESM but are connected directly to a PCIe root complex on one of the GPUs. Each node has 4 GPUs, each with 2 dies (so 8 visible GPUs) and a dedicated NIC, so 4 NICs per node. Thus any CPU operation that has to use the fabric actually traverses one of the GPUs to get to a NIC. Source: you can find a bunch of architecture docs for Frontier, but I also worked for several years on developing some of the library and software stack for this machine and a few others that were just beginning to come online.
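For readers curious what this looks like from the application side, here is a minimal sketch of GPU-aware communication, where a device array is handed straight to the MPI library so the fabric can read and write GPU memory without a host copy. It assumes mpi4py built against a GPU-aware MPI and uses CuPy device arrays purely for illustration; Frontier itself is an AMD/ROCm system with its own Cray MPI stack, so treat this as a concept sketch rather than Frontier's actual code.

```python
# Concept sketch only: with a GPU-aware MPI, device buffers can be passed
# directly to MPI calls, so data moves GPU memory <-> network fabric without
# staging through the CPU. Assumes mpi4py built against a GPU-aware MPI and
# CuPy for device arrays (illustrative; Frontier uses AMD GPUs/ROCm).
# Run with e.g.: mpirun -n 2 python gpu_aware_mpi_sketch.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20                          # ~1M doubles, allocated on the GPU
buf = cp.zeros(n, dtype=cp.float64)

if rank == 0:
    buf += 42.0                      # filled on the GPU
    comm.Send(buf, dest=1, tag=7)    # NIC pulls straight from GPU memory
elif rank == 1:
    comm.Recv(buf, source=0, tag=7)  # lands straight in GPU memory
    print("rank 1 received, mean =", float(buf.mean()))
```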
@kellymoses8566 1 month ago
The biggest difference between HPC networks and corporate networks is the lack of security in favor of performance at all costs. The compute nodes directly access remote memory over the network using RoCE (RDMA over Converged Ethernet).
@JamiesHackShack 1 month ago
Great video. Thanks for taking us along with you all!
@alexanderahman4884 27 days ago
Sorry for nitpicking, but he got one thing wrong. The reason you don't use electrical network cables for longer distances is not primarily interference from the power cables; it has everything to do with attenuation. At these speeds it is very hard to get the signal more than a few meters: it will be heavily attenuated and very hard to distinguish a 1 from a 0. The solution to the problem is to use fibre optics instead.
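To put rough numbers on that point, here is a small sketch of how quickly signal power falls off when per-metre insertion loss is high. The loss figure is an assumed placeholder for illustration, not a value from any cable datasheet.

```python
# Rough illustration of why attenuation limits copper reach at high lane rates.
# The dB/m figure below is an assumed placeholder, not a real cable spec.

def remaining_power_fraction(loss_db_per_m: float, length_m: float) -> float:
    """Fraction of the launched signal power left after length_m of cable."""
    return 10 ** (-(loss_db_per_m * length_m) / 10)

ASSUMED_LOSS_DB_PER_M = 1.5  # hypothetical loss at the link's Nyquist frequency

for length in (2, 5, 10, 30):
    frac = remaining_power_fraction(ASSUMED_LOSS_DB_PER_M, length)
    print(f"{length:>3} m: {frac:.4%} of the signal power remains")

# A few metres is workable; tens of metres leaves almost nothing for the
# receiver to distinguish a 1 from a 0, which is where optics take over.
```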
@grandrapids57 6 days ago
This is correct.
@TaterRogers 6 days ago
I am in Security now but I really miss being a network engineer. Thank you for sharing on this platform.
@artysanmobile 28 days ago
Take note of the power cables for each rack - similar to what a large house might use, per rack. Removing the heat from those racks is a big part of the design. Air flows from the floor and out the top in active exhausts. A little hard to believe, but compactness is a top priority.
@mikepict9011 19 days ago
Could you imagine their HVAC systems!!!! Chillers rated in swimming pools per minute.
@artysanmobile 19 days ago
@@mikepict9011 They have so much heat to get rid of that the concept of blowing cold air is no longer valid. Fluid is far more effective at conducting heat away from a metal structure, and processors are manufactured with built-in liquid cooling. Each rack is built for purpose with an exchanger which takes the heat directly out of the room, then returns cold for the next batch. If you work on your home's HVAC unit, you're familiar. A widely distributed system like that can be monitored and adjusted for best efficiency.
@mikepict9011 19 days ago
@@artysanmobile Yeah, that's part of a larger cascading system when you consider the envelope, usually. The liquid usually and ultimately needs to be rejected outside. That's called a chiller in a liquid system and a condenser in a direct exchange system. But yeah, vapor compression, pipe joining. It's what I do.
@mikepict9011 19 days ago
@@artysanmobile I serviced the mini chillers that cool MRI machines; they still had one air-to-refrigerant DX HX and two coaxial heat exchangers (HX) with two pumps. Simple systems compared to real low-temp refrigeration.
@artysanmobile 28 days ago
The power supply behind them is unbelievable. Enough for a town.
@goutvols103 10 days ago
What happens to the hardware after it's removed from Oak Ridge? Is there still some value in it besides recycling?
@RyanAumiller 7 days ago
Why not check out the visualization suite? That's the coolest part.
@iamatt 18 days ago
You can tell when the machine is running some serious workloads because the lights flicker in the offices next to it.
@trebabcock 1 month ago
ORNL is my dream job. I'd honestly sweep floors just to be in the building.
@iamatt 27 days ago
It isn't all rainbows and unicorns
@jinchey 10 days ago
@@iamatt Did you have a traumatic experience at Oak Ridge National Laboratory?
@pipipip815 10 days ago
Good attitude. Doesn't matter where you get on the ladder, just get on, work hard, learn and be agile.
@stevestarcke 9 days ago
I visited there long ago. It was the most awesome place I have ever seen.
@iamatt 9 days ago
@@jinchey It's an interesting place to work when you get in the mix and actually see how the politics are, let's just say that.
@knewdist 1 month ago
Awesome tour/interview. Dan seems like a real genuine dude. 👍
@udirt 1 month ago
Incredibly good interview you did there.
@artofneteng 1 month ago
Thank you!
@calebwyman5510 9 days ago
Computers are like watches now; we need to start making computers that last hundreds of years, in my opinion.
@DirkLachowski 1 month ago
This is amazingly quiet for a system of that size
@artofneteng 1 month ago
Water cooled! The other half of the data center not shown in the video was all storage, and that side was LOUD!
@ssmith5048 27 days ago
Simply awesome!
@roberthealey7238 1 month ago
No matter how big or small: the network IS the computer… For the past few decades, outside of embedded applications (and even in many situations there), computers have had to be connected to a network to have any practical value; every piece of software, and most if not all of its data, is sent over a network at some point in its lifecycle.
@dougaltolan3017 28 days ago
Never underestimate the bandwidth of a FedEx truck.
@Terost36 8 days ago
I can't imagine working there with all these computers and so much electric field energy; hopefully it's not affecting people's health. Any EMI/EF Faraday cage?
@artysanmobile 28 days ago
I'm surprised they can even talk in there. I've been in some major data centers and communication can be difficult.
@artofneteng 28 days ago
They were water cooled, so no fans on that side of the DC. The other side was storage, which still had traditional cooling and was very loud!
@artysanmobile 28 days ago
@@artofneteng Ah, that makes sense.
@robertpierce1981 10 days ago
I've been in the computer rooms at Fort Meade. Awe-inspiring.
@ronaldckrausejr7762 3 days ago
Fort Meade is also volumes faster than this system. It's just that the specs are classified - someone will know those specs eventually (perhaps in 20-30 years). Even Snowden knew the NSA has had the best computer in the world since 2002.
@jfkastner 1 month ago
Great video, thank you. It would have been interesting to hear about the types of failures they see - overheating, bad solder, caps failing, fan/plumbing failures, etc.
@artofneteng 1 month ago
We did learn that they have a full service staff provided by the OEMs of the supercomputer. They were there performing maintenance that day. Our POC didn't have specifics on hardware failures of the HPC environment; I'll see if he has anything on the networking components.
@iamatt 27 days ago
L3 cache errors, for one
@iamatt 20 days ago
@@artofneteng MTBF was 1 hour at first
@iamatt 20 days ago
@@artofneteng Blue jackets are pushing carts all day
@blitzio 8 days ago
Amazing tour, mind blowing stuff
@artofneteng 2 days ago
Glad you enjoyed it!
@vanhetgoor 9 days ago
It would have been nice to know a few things about how that plethora of processors is organised, how they work together and, most of all, how the output from all the processors is combined into one meaningful result. I can imagine a number of cores where each core works on part of a programme, but with such an enormous number of processors that can't be done any more.
@glennosborne4555 1 month ago
After working with one, we heard the gruntiest one is in Japan now rather than Oak Ridge.
@mattaikay925 18 days ago
Did I see Cray - oh my - that is just awesome
@iamatt 14 days ago
And AMD not NVDA 😂
@bmurray330 5 days ago
The guy in light blue needs a trimmer wardrobe.
@BreakpointFun 9 days ago
7:50 his head got a head 😂 I can't stop seeing this
@grantwilcox330 17 days ago
Great video
@you2be839 3 days ago
Fascinating... I still don't understand much of what that "time machine" is all about, but fascinating nevertheless... even though I think a DMC DeLorean properly retrofitted for time travel offers a bit more practicality and excitement in terms of time travelling!! Haha
@artofneteng 2 days ago
The time machine reference was that the supercomputer has done in a shorter amount of time what would have taken us years to complete without it. It dramatically speeds up research.
@brookerobertson2951 9 days ago
We will have the same processing power in a phone in around 20 years... I watched a documentary about a supercomputer the size of a factory, and it wasn't as fast as a new phone 10-15 years later.
@bits2646 1 month ago
In supercomputing it's either Network or Notwork :DD
@tuneboyz5634 1 month ago
That's really funny lil buddy 😊
@jonshouse1 16 days ago
Not sure I understand the "noise" issue with copper Ethernet? It is transformer coupled at each end, self-balancing, with common-mode induced noise rejection via the twist. I've seen it run alongside the wiring for 3-phase CNC equipment with no issues. Even at those scales I am not sure I buy that explanation. Length would be a real issue at that scale rather than noise, I would have thought.
@switzerland3696 9 days ago
200Gb, lol, I have that between the switches at work, which I put in like 3 years ago.
@woodydrn 10 days ago
You know, if they switched off all those small diodes on each server, blinking all the time and consuming power, I wonder how many watts that is in total. You really only need those lights to debug if something is working, right? It could be a little switch instead to toggle those on and off.
@sky173 10 days ago
You can think of five LEDs as using about 1 watt of power. In the grand scheme of things, if they were switched off, most people would not know that any energy was saved. It costs (on average) $35-$40 +/- a year to run a home computer 8 hours a day (possibly much less). Those same five LEDs (diodes) that you mentioned would cost 35-40 cents to run 8 hours a day for a full year (or just over a dollar per year if running 24/7).
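A quick back-of-the-envelope check of those figures; the electricity price is an assumed round number, not a quoted rate.

```python
# Back-of-the-envelope check of the LED running-cost figures above.
# The electricity price is an assumed round number, not a quoted rate.
LED_POWER_W = 1.0        # roughly five indicator LEDs
PRICE_PER_KWH = 0.13     # assumed $/kWh

def yearly_cost(hours_per_day: float) -> float:
    kwh_per_year = LED_POWER_W / 1000 * hours_per_day * 365
    return kwh_per_year * PRICE_PER_KWH

print(f"8 h/day:  ${yearly_cost(8):.2f} per year")    # ~$0.38
print(f"24 h/day: ${yearly_cost(24):.2f} per year")   # ~$1.14
```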
@woodydrn 9 days ago
@@sky173 But it's quite redundant to have them, right? You don't need them at all, really.
@youtubeaccount931 8 days ago
How many instances of Doom can it load?
@ThisDJ808 6 days ago
Where are the NSA stickers?
@DMSparky 13 days ago
You're here to look at the networking the way an electrician looks at the electrical.
@robgandy4550 26 days ago
I would love to work there. Tired of making 10 Gb as fast as possible. Mind you, I got into a teraflop.
@minicoopertn 13 days ago
Are these supercomputers shielded against EMP?
@artofneteng 11 days ago
Great question! I don't recall if he said whether they are or not.
@dougaltolan3017 28 days ago
No way do you get access to the world's fastest computer... Hypersonic missile systems are classified :P
@iamatt 27 days ago
Open research; the classified stuff is in other DCs.
@brookerobertson2951 9 days ago
But can it run Doom?
@tironhawk1767 25 days ago
So SkyNet is a Tennessean.
@deeneyugn4824 1 month ago
Where does the old system go, eBay?
@olhoTron 23 days ago
I think it will go to auction
@eliasboegel 3 days ago
It's usually auctioned off.
@ml.2770 9 days ago
But can it run Crysis?
@Gumplayer2 8 days ago
Did he really explain what these machines are used for?
@bmiller949 4 days ago
I would hate to see their electric bill.
@waterdude123123 1 month ago
But can it run Crysis?
@tuneboyz5634 1 month ago
no
@drooplug 28 days ago
You spelled Doom wrong.
@munocat 28 days ago
How many Chrome tabs can it handle?
@TAP7a 15 days ago
In all seriousness, not very well. Games have such minuscule latency requirements that any distributed system is immediately going to fall on its face. Even chiplet-to-chiplet within the same CPU package has proven to be enough to affect the game experience - reviews of the Ryzen 9 7950X all identified that frame pacing was affected dramatically when threads moved between CCDs, let alone moving between entire racks. Now, playing Crysis on a single unit, especially if it has both CPU and GPU compute...
@id104335409 13 days ago
Nothing can.
@Derekbordeaux24 10 days ago
But can it run Doom?
@josephmills9031 9 days ago
Asks what an exaflop is, then proceeds not to explain an exaflop. EXA FLoating point OPerations per second: one quintillion (10^18) floating point operations per second.
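For scale, a short sketch relating Frontier's roughly 2 exaFLOPS (the figure quoted in the video) to an ordinary desktop; the desktop number is an assumed ballpark, not a benchmark result.

```python
# Putting 'exaflop' in perspective. The ~2 exaFLOPS figure is from the video;
# the desktop estimate below is an assumed ballpark, not a benchmark.
FRONTIER_FLOPS = 2e18   # ~2 exaFLOPS = 2 * 10**18 floating point ops/second
DESKTOP_FLOPS = 1e11    # assume ~100 GFLOPS for an ordinary desktop CPU

ratio = FRONTIER_FLOPS / DESKTOP_FLOPS
print(f"Frontier ~ {ratio:,.0f} such desktops")                       # ~20,000,000
print(f"1 second on Frontier ~ {ratio / 86400 / 365:.1f} years on one desktop")
```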
@djtomoy 9 days ago
Can it play Minecraft?
@shlompy7 26 days ago
OMG, he asks so many stupid and repeated questions about the network cables....
@artofneteng 25 days ago
Some clarifying questions never hurt, and this channel is Network Engineering focused.
@josephconsuegra6420 5 days ago
Quantum computers are exponentially faster.
@switzerland3696 9 days ago
Those poor bastards having to deal with AMD GPU drivers in HPC.
@evanstayuka381 8 days ago
Is that a problem? Can you expatiate?
@switzerland3696 8 days ago
@@evanstayuka381 Driver and firmware stability vs NVIDIA; look at the drama around Geohot / tinygrad / tinycorp having to abandon AMD as their primary platform due to the lack of stability.
@gunturbayu6779 7 days ago
None of that matters when they use custom software and write their own code; this HPC is used for open computing, so CUDA won't matter. Now you'd better go back to your GTX 1650, fanboy.
@switzerland3696 7 days ago
@@gunturbayu6779 Your statement makes no sense, and why the hate? As they say, when you do not have a good argument you resort to personal attacks.
@switzerland3696 7 days ago
@@evanstayuka381 I thought I had already replied to this, or perhaps the post got deleted because the truth was too hard to handle. The AMD driver and firmware are comparatively unstable next to the NVIDIA driver and firmware stack. Look at the drama that Geohot / Tinygrad / Tinycorp had trying to go with AMD GPUs: they had to abandon AMD as the tinybox standard and offer the NVIDIA option as the primary one, as they could not get the driver/firmware stability required for a shippable platform. Let's see if this post gets deleted.
@inseiin 9 days ago
The balder dude has been wearing headphones for toooooo long....
@TabulaRasa001 4 days ago
This guy doesn't seem like he's ever seen the inside of a data center before. What embarrassingly basic questions - they didn't even get to what's special about their setup or capability.
@detectiveinspekta 29 days ago
Panduit.
@Kenneth_James 9 days ago
Get that man clothes that don't look like he was just shrunk into them.
@antonjaden2482 13 days ago
Bitcoin miners 😂
@evileyemcgaming 17 days ago
Heheh, all I'm thinking is how cool it would be to play Minecraft on it.
@europana7 1 month ago
It should mine BTC :P
@kennethwren2379 8 days ago
Are you sure it's the fastest supercomputer in the world? China has come a long way in this field and would be very competitive.
@rtz549 12 days ago
They need to generate crypto to pay for future machines and upgrades.
@kilodeltaeight 9 days ago
lol. No. Who needs crypto when you can literally just print more dollars? DoE has a massive budget regardless.
@rtz549 9 days ago
@@kilodeltaeight Then they could construct a supercomputer that had no final size or limitations.
@inraid 5 days ago
Horrible soundtrack!
@EvoPortal 13 days ago
It's just a lot of servers clustered together.... What's the big deal? Server clustering has been around for decades....
@cod4volume 17 days ago
Chose AMD to save money; could be faster with Intel omegalul
@channel20122012 9 days ago
Faster with Intel? Are you living under a stone? Lol
@kilodeltaeight 9 days ago
The real crunching of data here is happening on the GPU cores, not the CPUs. Those are just managing the GPU cores, effectively. With a system like this, your biggest concern is power and cooling, so efficiency is what matters. AMD very much wins there, and has the experience of building large systems like this - ergo, they won the contract.
@gunturbayu6779 7 days ago
The funny thing is, this Oak Ridge machine will be number 2 once the El Capitan project is done, and AMD will have numbers 1 and 2 for the fastest supercomputer. Intel will have number 3 with 80% more power usage lmao.
@javiermac381 12 days ago
For playing games?
@ATomRileyA 21 days ago
What a great video, so informative. It must be a real privilege to work on that system. Reading about it here as well, so impressive: en.wikipedia.org/wiki/Frontier_(supercomputer)