Another awesome video Vivek! Just one issue, I could be wrong but I think `Watch` does not send requests over and over; rather, it opens an HTTP text stream with the API server. It is the informer that opens a text stream and, at the same time, also sends requests at regular intervals. That costs performance, but it makes sure the controller is not only edge-triggered (reacting when an event comes in) but also level-triggered, which ensures that dropping events in a distributed environment doesn't affect data consistency.
Good point, I was thinking about the difference between the "watch" that the informer does vs the watch that we do. I mean, wasn't the whole point of the informer to avoid "watch mode" in the first place? Also great video Vivek, loving the series!
Hey Vivek. Thank you for making such quality content. I might be wrong, but I think `re-sync` works a bit differently. That is, it does not actually send List calls to the API server. `re-sync` and `re-listing` are two different things. `re-listing` is when List calls are made to the API server, and that happens when the `watch` connection the informer established breaks for some time and then re-connects. Since the informer might have missed some events, it performs `re-listing`. `re-sync` just re-adds everything from the local cache into the underlying DeltaFIFO for reprocessing by the event handlers. But when using a workqueue, we don't have the problem of losing events, so `re-sync` doesn't really make sense in my opinion. I am not sure whether client-go informers allow us to trigger `re-listing` manually. This is quite misleading, and the docs don't make it clear either.
Hi Nithish, thanks for this valuable comment. I will have to read up again to actually continue this conversation. Give me some time and we will continue this discussion here.
Thanks for the video. One question: how would you track the informer cache latency (freshness) in the event that a resync fails? For example, I set a pod informer's resync to 30 seconds. However, the resync after 30 seconds may fail, causing my informer cache to be stale (meaning some new pods are missed). Is there a way I can monitor this staleness?
I am not really sure about the answer. But would we really want to implement this, considering that k8s is designed in such a way that you don't have any guarantee about when a particular thing is going to happen? We could try to tune the resync time, maybe.
If I set the resync duration to a long time, does that mean I can't get the updated resource from the API server until the next resynchronization? When setting the timeout, should I trade off the pressure of calling the API server against getting the latest resource?
From what I remember when I last read about this, the resync period doesn't mean how often the apiserver is queried for the latest resources. There is a watch call against the apiserver irrespective of the resync period. The resync period helps handle cases where some of the list/watch calls to the apiserver failed and we don't have those resources; after the resync period we would be able to get those resources as well. I might be a bit off here, so please try to read more about it.
This video is really great. But I still have a question. The informer will ListAndWatch resources from the kube-apiserver every `time.Second*30`. If a user deletes a resource, will the informer (CR controller) be alerted to the event at most `time.Second*30` later? In a nutshell, are the Create, Update, and Delete events pushed to the CR controller, or pulled by the CR controller?
Hi, that's a great question. Off the top of my head: if 30 seconds is the resync time, it won't take 30 seconds for the event to reach the controller; the event should reach the controller almost instantly. The resync period has a different purpose: it is used to resync the cache so that we can recover any lost events, etc.
Hey Vivek, to my understanding there are two parts to how the informer talks to the apiserver. The first is a full-duplex connection (not sure if it is HTTP, gRPC, or some other protocol) which constantly updates the cache so that functions requesting information about a resource can be serviced. The second is the resync service, which queries the cluster at a set interval so that any updates lost on the full-duplex connection (for any reason, say the resource was deleted) can be course-corrected. Is my interpretation correct?
I think it’s almost accurate. Per my understanding, when the controller starts it makes a List call to the apiserver to initialise the cache, and then uses the Watch API to keep the initialised store/cache up to date. After the first List call succeeds, HasSynced() on the informer returns true; that’s why we wait for it, to make sure the cache is initialised. You are right about the resync period: after each resync period an attempt is made to make sure lost resources are accounted for (they might be lost for any reason, for example a Watch API call failure). I hope this makes sense, let me know if you have any follow-up questions.
@viveksinghggits Thanks for the reply, yeah, that makes it clear to me. The controller first builds the cache, then calls the Watch API, after which HasSynced() returns true. Also, again, fantastic content on k8s, I really appreciate you doing this.
How will the event handlers work on events? They run on the lister's cache data, which is updated only on a resync, right? Or do they use watches internally?
I might be a bit off here, but the cache gets updated using the watch method against the API server. So the first call, to initialize the cache, is a list, but the subsequent calls are watches.
@viveksinghggits Thanks for the reply. Yes, right, I see that via the ListWatcher we call the watch. A ListWatcher is a combination of a list function and a watch function for a specific resource in a specific namespace. I hope you can make a video explaining this too, and its benefits over a plain watcher.
Awesome video Vivek! Had a query: what's the best practice for instantiating informers when you want to watch k8s resources across multiple namespaces? Should we have one SharedInformerFactory across all namespaces and pass a label selector option to filter the required resources, OR one SharedInformerFactory per namespace? In the first option we would have one large cache managed by the SharedInformerFactory, whereas in the second option we would have multiple small caches. I believe the first option would be better if k8s can batch cache invalidation calls, as opposed to multiple API calls to multiple caches in the second option. WDYT? Are there any other factors to consider while choosing between these two options? TIA!
Hi Nikhil, creating one shared informer factory for all the namespaces should be fine. That's the pattern in most of the source code I have seen; here is an example: github.com/kubernetes-csi/lib-volume-populator/blob/master/populator-machinery/controller.go
I think there are ways to do this, Govind. I don't remember the details off the top of my head. Try to search for it, and if you are not able to find anything, let me know.
Thanks for the video Vivek. Can you please let me know how you started learning Golang this deeply? I have learned the basic concepts, but when I see Go code I am not able to understand such long, interdependent files.
Hi Shiva, thanks for the kind words. Golang is still not my expertise; I'm trying to get better at it. I am familiar with those source files because I am used to working with those APIs.
Vivek - great explanation. I have a question: for understanding the theory, are there any internet pages/docs/repos we can go through, so that we get an understanding of which method to use when and for what purpose? Or is this self-exploration? Thanks
That's a pretty good question, Prashant. I think we should at least roughly know which module contains which kinds of packages. For example, we know that apimachinery contains utility methods and types, so if we need something like that, we would search for it in apimachinery. Similarly, the types are in a separate module, etc. And we get to know these things, I think, just through practical examples and exploring the docs.