We covered the GPU side of Intel’s Architecture Day here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-F3kE-3ZLA0Q.html We also previously discussed Intel’s process node naming shuffle here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wxKGFxmwcDo.html
When is magic going to be used in keeping the silicon from degrading when the cooler fans die and the ignoramus that owns the rig doesn't check that? (asking for a friend...)
The spark in the new Gamers Nexus intro logo reminded me of Gigabyte... Not sure why... Are you offering logo creation services to power supply manufacturers?
Couldn't help but notice the hammer next to the Gigabyte PSU on the desk there. So I guess you are already well on your way to blacksmithing together the worst, most lamentable clanger of a dodgy lemon PC of 2021, right? You combine that PSU with the NZXT case that catches fire, and then how about throwing in Buildzoid's RDNA2 Red Devil GPU that randomly blows a power stage during idle, when it isn't even doing anything. So what's left to collect? A motherboard, CPU, RAM... maybe some water cooling parts too. Perhaps the Enermax AIO will be permitted to return once more, if it still isn't fixed after all these years, eh?
Gigabyte declared war on Steve by pretending there were only problems with running those PSUs at their limits for extended periods of time, when Steve and Co. had been meticulous in their testing.
even 256 Int 8 operations per cycle implies feeding each core with a terabyte per second or so of data. Good for small data sets, 110% RAM bandwidth limited for anything else.
Not every single byte loaded from memory only needs one operation done on it. Long chains of instructions (i.e. algorithms) are much more common. Also overlooked is the fact that, just like with the 128-bit SSE and 256-bit AVX instructions, there are special MOV/GATHER/SCATTER commands that load the entire register in a single operation. It would *NOT* require reading 256 8-bit numbers from RAM one at a time; rather, you would use the prefetch instructions (e.g. something like VGATHERPF0QPS in AVX-512) to load the required data from RAM into the fast cache in a single instruction, and from there you could operate on them just like you would a single, x-bit-sized number.

That's the beauty of these SIMD architectures: it's not just mathematical calculations being accelerated, it's like having a 256/512/x-bit processor instead of a 64-bit one, but you actually put those bits to use, as a true >64-bit CPU would be extremely large and have a lot of overhead for very little benefit beyond very specific scientific applications.

As to the usefulness of 1024 Int8 operations per clock cycle, it would be a godsend for anything to do with video (yes, also AI, but that is more a gimmick for actual PC users than anything useful right now; other than the obvious largest use case of relieving human workers from the burden of having to look through/listen to all of your data, instead just flagging the interesting stuff for human review and long-term storage). AVX-512 also has the potential to literally quadruple gaming performance over the algorithms commonly used today (128-bit SSE2, sometimes also SSE4), although no one is going to selectively optimize and compile their code just for Intel CPUs any time soon, so I highly doubt we'll live to see the absolutely astounding performance improvements possible with it.
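The "many small lanes in one wide register" idea can be sketched even without real intrinsics, using a SWAR (SIMD-within-a-register) trick in plain Python. This is a toy illustration of the principle only, not how AVX actually implements packed arithmetic:

```python
L = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
H = 0x80808080  # high bit of each 8-bit lane

def packed_add_u8(a: int, b: int) -> int:
    """Add four unsigned bytes lane-wise inside one 32-bit word (SWAR).

    Adding only the low 7 bits of each lane can never carry across a
    lane boundary; the high bits are then patched back in with XOR.
    """
    return ((a & L) + (b & L)) ^ ((a ^ b) & H)

# Lanes 0x04, 0x03, 0x02, 0x01 each get +1 in a single "wide" operation:
print(hex(packed_add_u8(0x01020304, 0x01010101)))  # 0x2030405
# Lane overflow wraps within its lane (0xFF + 0x01 = 0x00), no carry leaks:
print(hex(packed_add_u8(0x000000FF, 0x00000001)))  # 0x0
```

Real SIMD hardware does this with dedicated lane boundaries in the ALU rather than bit masking, but the payoff is the same: one instruction, many small results.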
All of the common compilers (except for ICC, which it seems no one has the balls to develop games with, even before AMD became the leading gaming platform and grabbed a large chunk of gaming market share) have even started to preferentially compile to 128-bit SSE instructions by default, just to not trigger the few-hundred-MHz AVX-offset downclock, even though that reduction in clock speed doesn't come close to the performance gains that could be achieved, even if the programmers didn't specifically write assembly-optimized AVX algorithms but just used the AVX intrinsics and included a few compiler hints in their code. We could have had a revolution similar to the move from SISD to SIMD, when computing 4x 32-bit numbers with a single instruction arrived (think 'Pentium with SSE' levels of performance improvement). Those instructions that allowed going from single 32-bit operations to acting on 4 packed 32-bit numbers in a single 128-bit-wide SSE instruction are what made the games of the late 90's and early 2000's possible on the hardware available at that time. Sadly, such performance optimization seems to be a completely lost art in modern game engine development (compare the custom, very fast 1/sqrt() function in Quake III, as opposed to simply calling the built-in sqrt() function and relying on the CPU's FPU to do the work for you)...
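For reference, the Quake III inverse square root trick mentioned above can be reproduced in a few lines. This sketch uses Python's struct module to mimic the 32-bit float bit reinterpretation; the magic constant 0x5F3759DF is the one from the original released source:

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) using the Quake III bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The magic constant plus a shift yields a surprisingly good first guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration refines the estimate to ~0.2% error.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inv_sqrt(4.0))  # close to 0.5
```

On period hardware this avoided a slow divide and square root entirely; on a modern CPU the dedicated `rsqrtss`/`vrsqrtps` instructions make the trick obsolete, which is rather the commenter's point about lost-art optimizations.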
It's for block matrix multiplication. The number of operations grows as the 3/2 power of the data set size, so the bigger your matrix, the less memory-bandwidth limited you are. This was discussed in one of Jim Keller's Tenstorrent interviews.
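That scaling claim checks out with quick back-of-the-envelope arithmetic, assuming a plain n×n matrix multiply (~2n³ flops over ~3n² elements touched):

```python
def flops_per_element(n: int) -> float:
    """Arithmetic intensity of an n x n matrix multiply."""
    flops = 2 * n**3     # n^2 outputs, each a length-n dot product (mul + add)
    elements = 3 * n**2  # two input matrices plus one output matrix
    # Intensity grows linearly in n, i.e. ops grow as (data size)^(3/2).
    return flops / elements

print(flops_per_element(64))   # ~42.7 flops per element moved
print(flops_per_element(512))  # ~341.3 -- 8x the intensity for 64x the data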
I think I have heard this one before... one manufacturer is bested by another, then proceeds to cram lots of cores into their chips, crank the frequencies as high as they will possibly go, and come up with an unconventional chip design for the usual desktop consumer environment... Jokes aside, I hope this turns out to be good for Intel. It's always nice to have both them and AMD as viable options.
I was around doing SW development when threads became a thing. Do not expect the initial rollout of this new kind of thread scheduling to be smooth. It took years to get thread scheduling to be at all automatic.
@@Shapar95 ? I'm saying that people should have expected exactly what happened. P-core threads got dumped on e-cores seemingly at random. Threads stayed on e-cores even when p-core threads became available. Most people don't notice and mostly it doesn't matter but if you're benchmarking you'll get results that vary outside expected variance. If you use these in production environments you'll have weird efficiency issues.
@@severgun On the face of it I would agree, but if all background tasks are delegated to the little cores, then all your big cores can focus on latency-sensitive user tasks like gaming.
@@severgun Watch the whole Intel presentation. Raja did mention the hybrid architecture. People usually compare it with a hybrid car engine, which aims for maximum mileage per tank of gas. But there is another hybrid technology, in Formula One racing: a conventional turbocharged engine (the Golden Cove core) is given an additional electric boost (the Gracemont cores) to reach maximum speed. Alder Lake follows the second notion. In Alder Lake the decoder will decode three types of instructions: scalar, vector, and AI. Vector instructions get P-thread priority, but when an instruction is too small and short to need a P-core's execution units, the Thread Director will send it to an E-core, where 17 execution ports are available per core, giving it sufficient room to execute; as a result, the broader execution ports of the Golden Cove core stay traffic-free for scalar and AI instructions. That's why Alder Lake's Gracemont cores have 17x8 = 136 execution ports where the 11900K only has 80. On a traditional architecture, all execution ports are meshed together with scalar, vector, and AI instructions; on Alder Lake, the Thread Director sends the right instructions to the rightful threads.
The key to this new architecture will be how well operating systems can firmly assign tasks to the right core. There are some reviewers that seem to think the OS scheduler will NOT necessarily do this but that applications will use whatever cores they want at whatever time they want regardless of the combination of processes and tasks running. Seems to me that power savings alone do not justify creating an entirely new architecture. Unless the OS can assign "background" and "easy" applications to efficiency cores and ensure they do not spill over into performance cores, what's the point? But how will you guys test this? How will we know if it actually works?
11:45 "Creatively calling Fabrics" Steve, Fabric is actual terminology that's been around in the enterprise space for over a decade, long before AMD started using it in their Infinity Fabric. Cisco uses the term in their Fabric Interconnect, Dell EMC in their Smart Fabrics for their SANs, etc.....
Why doesn't Intel have a configurable number of threads like IBM POWER9? What I mean is, on most POWER9 systems, if a server is configured with 8 cores, the admins can decide whether they want SMT2 on 4 of the CPUs and SMT8 on the other 4, or SMT8 on all 8 CPUs for 64 threads, or SMT2 on all 8 cores for 16 threads. This is used to configure which kinds of workloads go on which CPUs.
14:20 Whenever I see three adjectives one right after the other used to describe a product, I can't seem to resist thinking that the person writing this was thinking dirty thoughts...
Benchmarks like Cinebench, Handbrake, etc. only reflect CPU performance at AVX instruction execution; they don't reflect CPU performance in other areas.
I'd love a computer with a super efficient core that just sips power like a Raspberry Pi while my system is idle, then the system roars to life when I hit "Render"
@@paco4756 Yes, but he said the magic word... RENDER. He can't do any significant rendering jobs on ARM-based systems. He, like many folks out there, wants those kinds of low-power efficiencies on x86-based systems.
The last true change Intel made was when they integrated the memory controller on the die and revived HT. After that it was all about nm and very small changes and fixes, and later adding cores, because AMD. Now it seems like a real overhaul, which is great.
Well, yes and no. While I agree on the last part, Sandy Bridge was none of the changes you described, and it vastly outperformed Nehalem (45nm) and its 6-core die shrink Gulftown (32nm) at a much better power envelope. Since then, yeah, pretty much stagnation and minor upgrades gen to gen. I would partially blame AMD as well, since Bulldozer and its "remixes" were quite weak to say the least, so no reason for Intel to push forward. Luckily, now the tables have turned, and both companies are improving vastly.
@@Igbf Of course a new process is better; that's kind of obvious. You could see that really clearly before the Core era, even with basically no changes at all. You are right, but that wasn't the point I was trying to make.
@@madson-web That is not what I meant. The manufacturing process was the same (both Sandy Bridge and Gulftown were built on 32nm), but due to architectural improvements both performance and efficiency were vastly improved. That was neither HT nor major changes to the memory controller; that is pure core architecture, from when "Tock" still meant something.
@@Igbf It was optimization within the same platform. I still wouldn't compare it to what is happening now, or to the other changes mentioned. If it was enough to bring more power, you can add it to the list; I see the reason, so that is fine. As you said, it was at a time when "Tock" meant something.
I was. But the more I look at this, most of these items won't be mature until 2023. DDR5 alone, at its introduction rate, really isn't outpacing DDR4. And this new architecture? I'll bet money that for the first gen it's probably going to keep pace with current AMD wares, not outpace them. Long term I'm guessing it will overtake AMD's consumer wares, until AMD fires back. *shrugs* We'll see what happens.
Smart man. Things are about to get serious and interesting. I'm waiting for Intel Raptor Lake to upgrade, so I'll just sit back, milk my current PC, and watch the tech videos.
It all depends how old your current system is, tbh. I personally am going to build a 10700K Z490 system soon, coming from my old 3770K DDR3 setup. I really can't be waiting for DDR5 to come out AND get ironed out until it's actually worth it; my current system would be way too old by then. But if you're coming from a much more recent system, then yeah, keep using that; no point in upgrading now.
Very interesting stuff. I was initially skeptical of the heterogeneous Perf/Efficient core setup for desktop. However I suppose if you think about it, at the top end this is basically an i9-9900k with up to 8 atom cores on the side. The atom cores handle all the OS, light tasks and background junk, totally freeing up the “9900k” performance cores to use all their resources and throughput for bigger tasks.
Another interesting note is that the atom cores aren't all that light themselves. Some have described Intel's big.LITTLE as more of a big.MEDIUM. In terms of MT performance, the atom cores are fairly hefty, and future Intel generations are rumored to increase the e-core count to continue improving MT performance (e.g. Raptor Lake top-end SKU is rumored 8 + 16). This is the opposite approach to Apple's M-series chips, which are rumored to increase their big core count, and where the Icestorm / e-cores are truly just there for power efficiency. In short, the little cores have quite a bit of throughput themselves, and if you were to run a highly multithreaded task (e.g. typical Blender test or something), the little cores will definitely make themselves count.
The best way to look at it would be to look at how iOS/iPadOS and Android (and to some extent Win10 on ARM) handle the big.LITTLE architecture that is prevalent in ARM CPUs today. If Intel can pull this off, it will make for some really performant AND efficient x86 chips. Even better if AMD joins in. I would love to see what Zen could do when shrunk down for almost no power consumption.
With this latest Intel offering I know of one Linux user that's more interested in what AMD is doing now. Me. I haven't run an AMD CPU since the original Athlon Thunderbird either.
Admittedly my PC purchases have probably not been the most considered, but the one I bought almost a year ago now was the first to have an AMD CPU. As with previous purchases (pentium II, core i3, core i5) Intel had a definite lead in performance. Ok, so the AMD rig I bought was slightly cheaper than the equivalent Intel rigs at the time. But having done some reading and watching videos it seemed that, unless you wanted to go with liquid cooling, AMD were better performing than Intel and didn't draw as much power. At the time I went with a nVidia 2060 Super. As I was cost conscious but not a psychopath. However since the latest Radeon GPUs were launched, I may have gone for an AMD graphics card if buying today. Can't say I've noticed any difference between AMD and Intel so far. At least with current gen chipsets.
@@Penfolduk001 lmfao. While I can agree that AMD CPUs were price/perf killers, I can't agree that Radeons are good; they're still trash. The 256-bit bus width on the 6900XT is like shooting your feet through your knee. Not to mention the funny Lanczos 2.0 release and poor RT performance (even though 4K 60FPS with RT isn't possible ATM, 2.5K still is). The Radeon group always was StinkyCheese, e.g. the R9 280 and 280X...
Well, Intel, try again. I hope you can compete with this one. I prefer AMD, 'cause you price gouged us for years with 2 and 4 cores. You got what you deserved. But I love competition; it drives prices lower. You greedy BASTARDS!!!
E and P cores sound like a pretty creative way to deal with yields and binning. Apple M1 does the same thing, right? And that worked out pretty well for Apple, so exciting times may be ahead.
It's something ARM has done in phones for the past 10 years: the "big.LITTLE architecture". This makes sense for mobile products that need to sip power routinely and rarely go to high power. I can see it working well for high-power desktops. For mid/low-range desktops, we'll see, although those are disappearing in favor of laptops.
Sounds promising! Great breakdown as usual. My current system is all AMD but I look forward to maybe putting an Intel product in there in the somewhat near-future!
I'm interested to know exactly what the Thread Director is, what it does, and how. Calling it "software transparent" while simultaneously saying "Windows 11 only" seem like contradictory statements to me. If it's so transparent, Win10, Win11, or Linux wouldn't even enter the conversation. It would "just work."
There is an old newspaper saying "Don't pick a fight with someone who buys ink by the barrel." RU-vid is world wide and not limited to newsstands and subscribers only. Gigabyte has "F'd Up" big time.
This is fantastic. For years they simply churned out the same CPUs, banking on their name to sell them, not their ability. Competition has always been good and this is the proof, with radical new designs and hopefully radically different results. I'll be looking forward to seeing how it turns out.
This means Intel is gunning for efficiency. But I think, for full performance, it will ultimately come down to the P-cores. Their 24 threads will be equivalent to AMD's 16 cores, but it should be more efficient.
@@mineturte It's not a big jump though. It's a big jump over AMD's own product stack, but compared to Intel they aren't much faster in real-world applications.
The E and P cores are a good demonstration of the transistor density part of Moore's law. Doubling the transistor budget (die area) on a single core is expected to yield a 1.4X performance gain. A second doubling, for 4X the area, is expected to give 2X performance. Hence four E-cores having the area of a single P-core means each E-core should be able to deliver 50% of the P-core's performance. In a highly-threaded workload, the 4 E-cores could be 2X a single P-core, depending. Hopefully future generations will have different mixes of E and P cores. I am leaning towards 4 P and 32-64 E cores.
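The area/performance trade-off described above is usually attributed to Pollack's rule (per-core performance scales roughly with the square root of die area). The comment's arithmetic works out like this, with assumed area ratios rather than measured Alder Lake figures:

```python
import math

def pollack_perf(area_ratio: float) -> float:
    """Pollack's rule: per-core performance ~ sqrt(relative die area)."""
    return math.sqrt(area_ratio)

p_core = pollack_perf(1.0)   # baseline P-core performance = 1.0
e_core = pollack_perf(0.25)  # 4 E-cores fit in one P-core's area -> 0.5 each
cluster = 4 * e_core         # quad-E-core cluster: 2.0x one P-core's MT throughput

print(e_core, cluster)
```

This is why the quad-E-core cluster is such a good deal for throughput workloads: the same silicon spent on one more big core buys only ~1.4x, while four small cores buy ~2x, at the cost of weaker single-thread performance.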
in a highly threaded workload, just run 6 E cores. most people aren't in that, so you get the fast core + 2 slow ones and 2 other slow ones that are off, saving power budget.
@@dercooney let me be specific. I am talking database transaction processing. serialized memory accesses - pointer chasing code. runs fine on a simple 1.5-2GHz core, could have hundreds of active threads - however, I don't think Intel should jump to hundred cores in the next step, just do 32-64, and keep memory latency low
P-cores will stay at 8, and after Alder Lake 8 E-cores will most likely be added every year (maybe even more often in some instances). Raptor Lake: 8+16 cores, 32 threads (same as Alder Lake, just with more E-cores and minor IPC, clock speed, and cache gains). Meteor Lake: 8+24 cores, 40 threads (new architectures for both core types). Arrow Lake: 8+32 cores, 48 threads (new architectures for both core types again). And so on.
@@robinkonig5828 Wouldn't the chip size increase with 16+ added E-cores, and if so, won't that affect yields negatively? I think we're only seeing the first move here; maybe the second move will be MCM Foveros 3D packaging, with P-cores on a different die from the E-cores but in the same package.
3:32 "Homogeneous core configuration to a Heterogeneous core configuration": that's what he meant. But to those who may nitpick, a heterogeneous multi-core configuration. Good luck to Intel; that's more complex and harder to accomplish for x86 than for ARM. If they one day decide this is a bad idea and drop it, Microsoft and Linux developers won't be too happy. TSMC could use some luck too; hope the yield rate is good, or the pricing will be fun... interesting to look forward to. But the lack of comparable numbers leaves ME with questions like "Aren't the E-cores and P-cores almost the same? Isn't it better to stick with the P-cores?" etc. Some comparable numbers would've helped, a lot.
You spoke about watching videos as being suited to the E-cores; but given what is going on in web design and JS engines, and just how many resources modern React/Vue/whatever websites take... that is going straight to the P-cores, fosho.
Linux might already have the scheduler mostly ready because of Arm's big.LITTLE architecture I guess?! Anyway knowing Intel they'll be ready for Linux.
After all, x86 CPUs like Intel's and AMD's are CISC machines, while ARM CPUs are RISC machines. The two use completely different instruction sets (Complex vs. Reduced Instruction Set Computer). There is no way Linux could have done any implementation based on ARM big.LITTLE.
@@Squilliam-Fancyson Parts of the scheduler might be in assembly for efficiency reasons, but it really helps to have the algorithms already written out and tested for a similar idea.
@@Squilliam-Fancyson no one's built a CISC core CPU in 20 years. CISC is just a front end. We do want more than just a few MHz out of CPUs today, you know? I don't think CISC ever cracked the triple digits.
@@1pcfred Linux having a HUGE servers marketshare I don't see Intel not being ready for it on launch. But if they're not, it will be their loss anyway. I'm fine with either Intel or AMD. I'm not a fanboy so I just buy the best for my machine and OS. Same with GPUs.
I have an old CPU, the Intel i5-2500K, one of the best processors for the money at that time. I am really excited to switch my old PC to the new architecture. I mainly use my PC as a web developer, and I think this kind of architecture will allow me to work on many things at the same time, with my main tasks running faster on the P-cores while Windows and other applications run on the E-cores.
Golden Cove with a 20%+ IPC improvement and 8C will make it the best gaming CPU (in 99% of games; the 1% are games that need 10+ cores). Meanwhile Intel's 8C Gracemont, with its insane performance efficiency, should lead to a 2X MT improvement over 11th gen, allowing Intel to not only catch up to AMD in multi-thread in one year but to surpass them by a few percent. Plus DDR5, plus PCIe 5.0. Alder Lake is INSANE. This is Intel's biggest launch in a decade, bigger than Zen 1 and chiplets.
I don’t believe their IPC gain claims. We saw what happened with rocket lake, they claimed a significant increase and then the benchmarks came out…. Disappointment would be an understatement.
@@Mojave_Ranger_NCR Rocket Lake did deliver on the IPC gains, though. The problem in gaming workloads was that its IMC was worse than previous generations' (due to it being a backport of a 10nm architecture), which led to an increase in latencies that nullified the increases in IPC. Alder Lake won't have (hopefully) any of those problems, so it should be great in gaming and productivity alike.
I think by *sticking with it for a decade* they meant the core idea of *performance cores & efficiency cores.* In fact the architecture leaks from MLID and others *point to a hybrid architecture at least until 2026-2027.* *Alder Lake, Raptor Lake, Meteor Lake, Arrow Lake, Lunar Lake, Nova Lake* are all big-core/little-core designs, where *each takes further the disaggregation from a monolithic design, not only of the core designs but also of I/O, memory, iGPU, manufacturing & assembly processes, etc.* Alder & Raptor ( *2021-22, Intel 7 / 7 enhanced* ) Lakes are *improvements in scheduling and in the use of big caches for gaming and workloads in hybrid core designs.* Meteor & Arrow ( *2023-2024, Intel 4 / 3 + TSMC 3 enhanced* ) Lakes are further improvements on them, plus *disaggregation of the main CPU systems into tiles ( heterogeneous cores, memory, I/O, iGPU, etc. )* so they can be made and assembled in different manufacturing places using the best technology possible. Lunar & Nova ( *2025-2026, Intel 20A / 18A + TSMC 2 Gate All Around* ) Lakes are actual attempts at *using all the above-mentioned improvements with an enhanced CPU instruction architecture* ( could be enhanced x86, could be hybrid x86 + RISC/ARM, with *3-4 threads instead of two per performance core* ) to tackle and fight ARM and other efficient microarchitecture designs in different sectors. Intel is foreseeing massive improvements in performance from Arrow Lake onwards. Whether they can succeed in doing all that in the next 6-7 years, we will have to wait and see. But it does look like Intel is sticking with the hybrid design for at least a decade.
Pray for Intel that this won't end like Lakefield. Intel already announced the end of supply of Sunny Cove-based chips for April 2022. To this day only two devices have adopted Lakefield CPUs, which does not look very promising. LF's performance is not that great, and even in its prime discipline, "saving power," it falls behind comparable ARM chips. The 24-thread i9 will have to compete with AMD's 5900X. I highly doubt this will end in a win for Intel; with only 16 high-performance threads, this i9 will definitely have a tough time against AMD's consumer primus. But let's be positive: it's nice to see Intel try something new and finally overhaul their architecture. For us gamers, though, this new gen won't add much benefit apart from the switch to 10nm. We are still limited to 8 cores / 16 threads, basically, unless we go for a 10900K/10850K, which are still Intel's most powerful (overall performance) consumer CPUs.
Intel needs to revise their instruction set. There are too many. Some (like string operations) are (perhaps) no longer used. There are a lot of shells and other buildup on this ship.
11:42 It must have been a typo; I'm sure they meant to use the word "Glue", not "Fabrics". How short memory can be. Still, it's cool to see Intel doing something new; it makes things more exciting now that hardware is getting more exotic! Any improvements to Windows can only be good for everyone, so I hope they're on time, or we may end up in more of a mess than we had with Zen. They may even help Zen CPUs.
On the E-cores, Intel states it's 40% more performance at the same power. But... they are efficiency cores; they will likely never run at that power level, otherwise there's no point in spending silicon space on them if you're just going to spend 65W of power on them. So more likely it'll be the same performance as Skylake at 40% less power. Or maybe even more likely, they may be capped to more like 80% of the performance of Skylake at ~60% less power. Intel can plot a perf/power curve to infinity, but they aren't necessarily running the cores at those levels. Now I'm curious whether it'll be possible to design benchmarks to really test the efficiency cores vs the performance cores.
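One way to attempt that (on Linux, at least) is to pin a workload to specific logical CPUs and time it on each core type separately. A minimal sketch, where the CPU IDs for P- vs E-cores are pure assumptions that vary per machine and must be checked against the actual topology:

```python
import os
import time

def timed_on_cpus(cpus: set, work) -> float:
    """Run work() pinned to the given logical CPUs; return elapsed seconds."""
    old = os.sched_getaffinity(0)   # remember original affinity (Linux-only API)
    os.sched_setaffinity(0, cpus)   # pin this process to the chosen CPUs
    t0 = time.perf_counter()
    work()
    elapsed = time.perf_counter() - t0
    os.sched_setaffinity(0, old)    # restore the original affinity
    return elapsed

# Hypothetical layout: P-core threads at 0-15, E-cores at 16-23 on an 8+8 part.
burn = lambda: sum(i * i for i in range(1_000_000))
print(timed_on_cpus({0}, burn))     # time on an assumed P-core
```

Comparing the same workload pinned to a P-core ID vs an E-core ID would give a crude per-core-type benchmark, though power draw would need external measurement (e.g. RAPL counters) to plot actual efficiency.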
It probably will be at some point. Doesn't really matter as long as it isn't easy to pull off. If it requires a lot of effort an attack using social engineering is likely going to be a lot more lucrative anyway (you'd be surprised how many people get fooled by fake login pages & the like).
I am the proud owner of a 3700X which makes no noise in daily use and gaming! You can’t do that nearly so easily with Intel’s equivalent processors. Hoping Alder Lake changes that.
The good thing is, big core has a massive 6-way instruction decoder array. Should be the largest ever seen in x86. AMD is still stuck at 4-way decode. Also there is an extra ALU and a loader and some gadgets like FADD. Not nearly a Netburst to Core level improvement(My estimate is about HSW to SKL) but will still make short work of anything AMD has, given they don't mess up something else, or running something that's cache bound only like CPUZ benchmark.
@@mtunayucer We'll see. Keep in mind that successfully extracting 6 non-vectorized instructions from two threads with any reliability is somewhere between "ideal case scenario" and "pie in the sky fantasy." That level of Instruction level parallelism isn't exactly reasonable.
@@mtunayucer Just say Sunny Cove(and other coves of the same design) is the most tragic architecture. It's strong but put off again and again, then abandoned midway on desktop, never reaching its true potential.
@@Alan_Skywalker i honestly got lost after skylake, cuz intel decided to put newest architectures on laptops and leave desktop in the dust. With alder lake, hopefully whole stack is on the latest architecture.
@@jebe4563 But don't forget branch prediction; it takes up a lot of otherwise unused slots. For example, an 11700K wastes about 20% of slots on mispredictions while rendering, and that's just the cost of the 4% of branches mispredicted. The other 96%+ correctly predicted instructions will execute in advance, occupying even more slots. Of course Intel can't keep all the slots populated all the time (that's why we don't see 50% improvements across the board every gen now, and why they've slowed their pace), but that doesn't mean widening won't relieve that 20%+ front-end bound when it's the bottleneck, and bring some good perf gains.
Thanks for all your hard work Steve (and GN team)! You’ve educated me immensely, and always kept us up to date on new and important news. I appreciate you!
I find amusement in how Intel is premiering what could be one of the most important architectural changes to the x86 microprocessor of the modern era ... and is still using Skylake era naming conventions which arguably reached their limits two generations ago for the processor itself. I also find amusement in the Gigabyte power supply on the desk.
Considering Intel renamed their future process nodes, I still hold out some hope that they aren't afraid to change this, but I also have no idea how you would name processors with two different core types.
Hi! I am not a hardcore gamer. I don't mind low performance while gaming. I want to build a Mini-ITX PC using Ryzen 5600g, for some graphic designing work, watching 4k movies, listening to songs and playing games occasionally. Which Mini ITX motherboard would you recommend? It would be great if the board supports future Ryzen APU with better integrated Graphics than the 5600g.
I won't lie, with all those big changes I expected more than a 19% IPC uplift, considering how behind they are from AMD on that front. Maybe they will also have a big frequency boost.
Nah, it's leaked to be about a 5GHz boost. Alder Lake is Intel's first leap into big.LITTLE. Until like 2023 it'll be a transition period till they can get it bang on; then we'll see some big-ass improvements. Remember, we still haven't seen what Jim Keller worked on while he was there.
I'd assume games should fall automatically into p-cores. I say should because third-party responsibilities can cause inconsistencies, especially in niche situations. My gigabyte laptop (intel/nvidia) will use the apu for games on an external monitor unless i extend displays.