Would you switch from Kubernetes Cluster Autoscaler to Karpenter (if you can)? IMPORTANT: For reasons I do not comprehend (and Google support could not figure out), YouTube tends to delete comments that contain links. Please do not use them in your comments.
Seems like severe vendor lock-in with using Karpenter. Can it be used with other cloud providers, for example Hetzner Cloud? How does it compare to keda.sh and Knative autoscaling? Thank you for sharing.
@@marymekins3546 It's not really about vendor lock-in. It is open source, and the major question is whether other providers will extend it or not. So, today it's only for EKS, and tomorrow... we do not yet know. KEDA, Knative, and similar tools are about horizontal scaling of applications. Karpenter is about scaling clusters/nodes. Those are very different goals, even though app scaling often results in cluster scaling.
@@DevOpsToolkit Thanks for the clarification. Also, do you consider Crossplane and Gardener's autoscaling components more relevant for node/cluster autoscaling? Thank you.
@@marymekins3546 Neither of those (Crossplane and Gardener) has its own cluster autoscaler, so they rely on those that are baked into managed Kubernetes offerings (e.g., GKE, AKS, etc.) or can apply cluster scalers (e.g., Kubernetes Cluster Autoscaler, Karpenter, etc.). What I'm trying to say is that Crossplane and, to an extent, Gardener, are orchestrating infra services rather than providing specific implementations of those services, including cluster scalers.
I saw Karpenter, I think, 30 minutes after it went GA. The reason other providers will probably not contribute is the bin-packing technique they used, which is tied to (and a horrible limitation of) AWS. Then I wanted to use it, but I can't utilize my launch templates because of custom pods-per-node, and I can't utilize Crossplane because of node groups (or that might also come down to me not knowing how). So for now it will be a no-go, but a project that steps in the right direction nonetheless. Thanks for the video.
I never had such a requirement so I never tried something like that. Why not weekends? Does that mean that you prefer having pods in the pending state instead?
As I understand it, it only scales by adding new nodes. Is there also a way, when I have a pod that gets utilized very heavily, for a new node to be created and the pod moved to that node? For example, when apps in pods can't scale by just adding more replicas and need to grow vertically instead.
Why would you move a pod to a new node? If you specified memory, CPU, and other constraints, it should be irrelevant where that pod runs as long as those constraints are met.
@@DevOpsToolkit Ah, I get it, so the resources are statically assigned and cannot dynamically grow with the pod's load. My assumption was that when there are no resource limits defined and, let's say, the pod normally runs with 2G of RAM but now the load gets quite high and the pod needs 4G of RAM, yet the current node it's running on can't provide more. So that the pod won't get "throttled" or the application gets slow, maybe there is a way that this pod gets restarted on another host which now has enough resources.
@maxmustermann9858 When resource requests are not specified, pods can use any amount of memory and CPU available on the nodes they are running on. However, when the collection of all the pods on a node consumes more memory and CPU than the node has, pods without requests are kicked out first to leave resources for pods that do have them specified. So, pods without resource requests are considered less important, and Kubernetes will sacrifice them before others. Check out the Quality of Service concept in Kubernetes. Also, Kubernetes will soon release the feature of dynamic resource allocation so that resource requests can change without restarting pods. That will be especially useful with vertical pod scalers.
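To illustrate the Quality of Service idea mentioned above: when requests equal limits for every container, Kubernetes assigns the pod the `Guaranteed` QoS class, making it the last candidate for eviction under node pressure. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo # hypothetical name
spec:
  containers:
    - name: app
      image: nginx # placeholder image
      resources:
        # requests == limits for every resource => QoS class "Guaranteed"
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "250m"
```

A pod with requests lower than limits gets `Burstable`, and one with no requests or limits at all gets `BestEffort`, which is evicted first.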
Not sure I understood the question. Are you asking how to run Karpenter pods on control plane nodes? If that's the case, you can't, at least when using managed Kubernetes such as EKS. You do not have write access to control plane nodes.
More good news! This looks great. Cluster Autoscaler was wrecking my head, so I'm trying Karpenter out now. Thanks for the video :) Also, I'm using the latest build (0.80-dev) of eksctl, which allows you to define a `karpenter` configuration value in `ClusterConfig`, so hopefully that takes most of the legwork out of the process. I believe all that's necessary after that is to create `Provisioner` resources as required.
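For anyone curious what that looks like, a sketch of a `ClusterConfig` with the `karpenter` field is below. The cluster name, region, and versions are placeholder assumptions; check the eksctl docs for the fields your eksctl build actually supports:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: karpenter-demo # hypothetical cluster name
  region: us-west-2    # placeholder region
iam:
  withOIDC: true # Karpenter needs an OIDC provider for IAM roles for service accounts
karpenter:
  version: "0.6.0" # placeholder; use a version supported by your eksctl build
managedNodeGroups:
  # Karpenter itself still needs at least one node to run on
  - name: initial
    instanceType: m5.large
    desiredCapacity: 1
```

After the cluster is up, `Provisioner` resources define what kinds of nodes Karpenter may create.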
That's not a surprise. Weaveworks is the company that is heavily involved in all AWS k8s OSS projects, so it was to be expected that they'd extend eksctl (they made the most contributions to it).
One question: Is Karpenter capable of vertical auto-scaling down? Typical example: Consider a new project started as a Monolith and one big pod is required for initial deployment. Karpenter allocates one big node to fit it. Now as project continues and grows, it is decomposed in Microservices and 10 small pods are used for full system. Is Karpenter capable of replacing a big node with say two much smaller nodes as that might be cheaper than the one big node?
Hi sir, if we use Karpenter and want to upgrade worker nodes to a new version, both those in a node group and the newly scaled groupless worker nodes, what would happen? Can you clear my doubt?
That's a bit "clunky" right now. You'd need to set a TTL on the nodes so that they "expire" and get replaced by new nodes, which will follow the version of your cluster. The good news is that improvements for that are coming. You might want to follow github.com/aws/karpenter/issues/1738. You'll see over there that some additional options are already added while others are in progress.
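The TTL approach can be expressed on the `Provisioner` itself. A sketch using the early (v1alpha5) Karpenter API; the exact values are arbitrary examples:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Remove nodes shortly after the last pod leaves them
  ttlSecondsAfterEmpty: 30
  # Expire nodes after ~7 days so replacements pick up the
  # current cluster version (and fresh AMIs)
  ttlSecondsUntilExpired: 604800
```

With `ttlSecondsUntilExpired` set, nodes are drained and replaced on a rolling basis, which is how upgrades propagate until better-dedicated upgrade options land.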
YouTube has a nasty tendency to remove comments without any obvious reason. Can you please send me the idea over Twitter (@vfarcic) or LinkedIn (www.linkedin.com/in/viktorfarcic/)?
How would one have both Cluster Autoscaler and Karpenter running in the same cluster? Is it just using a special nodeSelector for Karpenter to schedule those? I would like to try it out without committing to it the whole way.
I haven't tried using both so I'm not sure how it would work and what would need to be done to make that happen. I would rather experiment with it in a new temporary cluster.
You do need a node group for the cluster so that you get the initial node where you'll install karpenter. You do not have to use eksctl to create that group, but you do have to have it, even if it's for karpenter alone. That's why I complained in the video that it should run on control plane nodes.
Have I understood correctly? This is just for EKS, and not even just EKS, but just for EKS clusters created with eksctl? What about EKS clusters created with Terraform? Can they not be managed with Karpenter? Great video as usual, Víctor, by the way.
You're partly right. Currently, Karpenter works only with EKS. The initial examples use eksctl, and Terraform examples were added recently. That, however, does not mean that it does not work with other tools. You should be able to use it with EKS clusters no matter which tool you're using to manage them. A bigger problem is with other providers (e.g., GCP, Azure, etc.). The Karpenter project is hoping to attract contributions from others (currently it's mostly AWS folks), but that is yet to be seen.
Nice. Let us consider a cluster running HPA and Cluster Autoscaler outside of peak hours. If you have a good distribution of pods and HPA starts to decrease the number of replicas, you may end up having some nodes underutilized, with released capacity spread across some of the worker nodes. In such conditions I always find Cluster Autoscaler slow. Can we expect Karpenter to be more active or even do some optimization? By optimization I mean compaction of unused capacity (something that deschedulers try to achieve) or optimizing worker node sizes.
So far, I think that Karpenter is only marginally better at scaling down nodes that are underutilized. The part that works fairly well is when it scales up for a single pending pod and when that pod is removed, it removes the node almost instantly. That part looks very similar to what GKE Autopilot is doing. The project is still young so we'll see. It's better than Cluster Autoscaler in EKS but we're yet to see whether it will go beyond that (as it should).
Hello Victor, thank you for your video. Does it also consider multizone workloads and instantiate nodes in multiple zones per region? As far as I know, this is currently accommodated by the upstream cluster-autoscaler project. Thank you.
Yes. It does that :) The main advantage of Karpenter is that you have much more control over the relation between pending pods and the nodes that should be created to run them.
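As an illustration of that control, zone placement can be constrained directly on the `Provisioner` through `requirements`. A sketch using the early (v1alpha5) API; the zone names and discovery tag are placeholder assumptions:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # Allow Karpenter to place nodes in any of these zones
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["us-east-1a", "us-east-1b", "us-east-1c"] # placeholder zones
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster # hypothetical discovery tag
```

Pending pods with zone-related scheduling constraints (topology spread, node affinity) then drive which zone each new node lands in.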
I'm trying Karpenter, and I found a con. I'm still investigating, but... when the workload decreases, the nodes are not changed. I mean, for example, you scale to 10 replicas and Karpenter decides to provision 1 c5n.2xlarge instance. Some time later you scale your pods down from 10 to 6, and your instance could change to a t3.medium (for instance), but I've observed that Karpenter is not adjusting the instance for the current workload. I have to do more tests and experiments with Karpenter, but until now this is what I've seen. Thanks for the video and the channel, Victor/DevOps Toolkit. Kind regards.
@@DevOpsToolkit Hi Victor. But with Cluster Autoscaler, at least in my configuration, if replicas go down and the remaining pods can fit on other workers, the autoscaler evicts those pods, taints the node selected for deletion, and the pods are rescheduled on an existing worker. After that, the worker is deleted. It's slower and less accurate, and with Cluster Autoscaler it's easier to end up with more resources than you need, but with Karpenter I can see we have the same problem when scaling down. Maybe it will be resolved in the future, but I see some scopes where Karpenter can be more useful than Cluster Autoscaler and vice versa. Another point to take into consideration is how to configure the provisioners. Since Karpenter tries to pack all new pods onto as few workers as possible, there is a probability of putting all the pods in the same AZ. Today I'll probably try combining provisioners with node affinity and pod anti-affinity to see if I can spread pods across all my AZs. Again, thanks for the nice work, video, and channel. I really appreciate your answer.
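Besides pod anti-affinity, topology spread constraints are another way to force pods across AZs, which Karpenter then has to honor when it provisions nodes. A sketch (the app label is a placeholder):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app # hypothetical name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        # At most 1 replica of imbalance between zones;
        # pods stay Pending (triggering Karpenter) rather than pile into one AZ
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: nginx # placeholder image
```

With `whenUnsatisfiable: DoNotSchedule`, pods that would violate the spread stay pending, which is exactly the signal Karpenter reacts to by creating a node in the underrepresented zone.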
You're right. Karpenter solves some of the problems well while others are far from being solved. It's a new project so we're yet to see whether it will mature. My main concern right now, before other issues are solved, is whether other providers will pick it up and even whether AWS will include it into EKS. If neither of those happen, it's a sign that vendors do not trust it.
In some cases, I search for specific solutions that complement those I'm already using. In others, I hear about a tool and put it to my TODO list. In any case, I tend to spend a lot of time (including weekends and nights) on learning.
Those are very different tools that serve different objectives, so the fight would not be fair. Karpenter could be compared to Cluster Autoscaler or, even better, EKS with Karpenter could be compared with GKE Autopilot.
@@TheApeMachine They won't fight at all. If the cluster does not have the resources, the pods applied via Argo CD will stay pending. Argo CD will do its job; the cluster just won't be able to run it all if it's not big enough.
Karpenter and similar cluster autoscaler solutions add servers when you need them and shut them down when you don't. AWS, on the other hand, charges for the things you use. The more optimized the usage is, the less you pay.
Your cluster is still created with a node group, and you can always add additional node groups. It's just that the nodes Karpenter manages automatically are not part of any node group.
Can I use Karpenter with my clusters that are leveraging managed node groups or do I have to get rid of the node groups first? How would the cluster upgrade process change if I use Karpenter? (I assume I can still do rolling updates regardless). And finally, should I be deploying Karpenter as a DaemonSet?
Karpenter does not use managed node groups, which are essentially based on AWS auto-scaling groups (ASGs). It intentionally avoids ASGs because they are slow and because they manage instances of the same instance types and sizes. Karpenter avoids them so that the process is (much) faster and so that it can create VMs with sizes that fit the pending load. In other words, it's a good thing that it does not use ASGs. That being said, there is nothing preventing you from having a cluster based on managed node groups. It's just that the nodes created by Karpenter will not use them (they will NOT use the ASGs associated with managed node groups). There should be no difference in the upgrade process. New nodes will be created based on the new version and the old nodes will be shut down (rolling updates). There's no need to run Karpenter as a DaemonSet. It's not the type of service that needs to run on each node of the cluster.
Quick question out of curiosity: since Karpenter's autoscaling offering is groupless, can we spin up an EKS cluster without a node group definition, i.e., with zero worker nodes, and, based on the deployment resource requests, have Karpenter provision a groupless node with the appropriate capacity to run the requested application?
That would be possible if Karpenter could run on control plane nodes (as most other cluster scalers do). As it is now, it needs to run on worker nodes, and that means that the cluster needs to have at least one where Karpenter will be running before it starts scaling up (and down).
Never mind, you indeed mentioned that Karpenter can't be deployed on control plane nodes, so in order to implement cluster autoscaling we must have at least one node in a node group, which is kind of a waste from a node-group standpoint, but it's better than regular CA on EKS. I was curious to see if Karpenter could have been the solution for a truly fully managed serverless k8s solution on AWS.
@@shuc1935 Managed Kubernetes services like EKS, GKE, AKS, etc. do not allow users to access control planes. That means that AWS would need to bake Karpenter into EKS itself. I hope they'll do that. Ideally, it should be a single checkbox asking people to enable Autoscaling which, currently, does not exist in EKS in any form without using Fargate.
@@aswinkumar3396 That's not related to scaling of the cluster. Karpenter will increase (or decrease) the nodes of the cluster allowing Kubernetes to schedule pending pods in the same way those would be scheduled without Karpenter.