virtual-kubelet

Author	SHA1	Message	Date
Sargun Dhillon	c4582ccfbc	Allow providers to update pod statuses We had added an optimization that made it so we dedupe pod status updates from the provider. This ignored two subfields that could be updated along with status. Because the details of subresource updating is a bit API server centric, I wrote an envtest which checks for this behaviour. Signed-off-by: Sargun Dhillon <sargun@sargun.me>	2021-02-16 12:30:53 -08:00
wadecai	3ff1694252	Fix race between k8s and provider when deleting pod	2021-02-16 17:45:55 +08:00
Sargun Dhillon	3a361ebabd	queue: Add tracing This adds tracing throughout the queues, so we can determine what's going on.	2021-02-08 11:07:03 -08:00
Sargun Dhillon	ac9a1af564	Replace golang workqueue with our own This is a fundamentally different API than that of the K8s workqueue which is better suited for our needs. Specifically, we need a simple queue which doesn't have complex features like delayed adds that sit on "external" goroutines. In addition, we need deep introspection into the operations of the workqueue. Although you can get this on top of the K8s workqueue by implementing a custom rate limiter, the problem is that the underlying rate limiter's behaviour is still somewhat opaque. This basically has 100% code coverage.	2021-02-08 11:07:03 -08:00
Sargun Dhillon	82452a73a5	Split out rate limiter per workqueue If you share a ratelimiter between workqueues, it breaks. WQ1: Starts processing item (When) WQ1: Fails to process item (When) WQ1: Fails to process item (When) WQ1: Fails to process item (When) --- At this point we've backed off a bit --- WQ2: Starts processing item (with same key, When) WQ2: Succeeds at processing item (Forget) WQ1: Fails to process item (When) ---> THIS RESULTS IN AN ERROR This results in an error because it "forgot" the previous rate limit.	2021-02-02 11:40:58 -08:00
Sargun Dhillon	1b8597647b	Refactor queue code This refactor is a preparation for another commit. I want to add instrumentation around our queues. The code of how queues were handled was spread throughout the code base, and that made adding such instrumentation nice and complicated. This centralizes the queue management logic in queue.go, and only requires the user to provide a (custom) rate limiter, if they want to, a name, and a handler. The lease code is moved into its own package to simplify testing, because the goroutine leak tester was triggering incorrectly if other tests were running, and it was measuring leaks from those tests. This also identified buggy behaviour: wq := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultItemBasedRateLimiter(), "test") wq.AddRateLimited("hi") fmt.Printf("Added hi, len: %d\n", wq.Len()) wq.Forget("hi") fmt.Printf("Forgot hi, len: %d\n", wq.Len()) wq.Done("hi") fmt.Printf("Done hi, len: %d\n", wq.Len()) --- Prints all 0s because event non-delayed items are delayed. If you call Add directly, then the last line prints a len of 2. // Workqueue docs: // Forget indicates that an item is finished being retried. Doesn't matter whether it's for perm failing // or for success, we'll stop the rate limiter from tracking it. This only clears the `rateLimiter`, you // still have to call `Done` on the queue. ^----- Even this seems untrue	2021-01-08 00:56:05 -08:00
Sargun Dhillon	de7f7dd173	Fix issue #899 : Pod status out of sync after being marked as not ready by controller manager As described in the issue, if the following sequence happens, we fail to properly update the pod status in api server: 1. Create pod in k8s 2. Provider creates the pod and syncs its status back 3. Pod in k8s ready/running, all fine 4. Virtual kubelet fails to update node status for some time for whatever reason (e.g. network connectivity issues) 5. Virtual node marked as NotReady with message: Kubelet stopped posting node status 6. kube-controller-manager of k8s, goes and marks all pods as Ready = false: 7. Virtual kubelet never sync's status of pod in provider back to k8s	2020-12-07 16:50:00 -08:00
Sargun Dhillon	d29adf5ce3	Add Gocritic This also fixes the issues laid out by gocritic	2020-12-06 13:20:03 -08:00
Sargun Dhillon	d64d427ec8	Enable all linters by default This removes the directive from .golangci.yml to disable all linters, and fixes the relevant bugs / issues that are exposed.	2020-12-03 11:33:06 -08:00
Brian Goff	c0296b99fd	Support custom filter for pod event handlers This allows users who have a shared informer that is not filtering on node name to supply a filter for event handlers to ensure events do not fire for pods not scheduled to the node. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2020-07-30 17:17:42 -07:00
Sargun Dhillon	1e9e055e89	Address concerns with PR Also, just use Kubernetes waiter library.	2020-07-22 18:57:27 -07:00
Sargun Dhillon	12625131b5	Solve the notification on startup pod status notification race condition This solves the race condition as described in https://github.com/virtual-kubelet/virtual-kubelet/issues/836. It does this by checking two conditions when the possible race condition is detected. If we receive a pod notification from the provider, and it is not in our known pods list: 1. Is our cache in-sync? 2. Is it known to our pod lister? The first case can happen because of the order we start the provider and sync our caches. The second case can happen because even if the cache returns synced, it does not mean all of the call backs on the informer have quiesced. This slightly changes the behaviour of notifyPods to that it can block (especially at startup). We can solve this later by using something like a fair (ticket?) lock.	2020-07-22 18:57:27 -07:00
wadecai	ca417d5239	Expose the queue rate limiter	2020-06-26 10:45:41 +08:00
wadecai	fedffd6f2c	Add parameters to support change work queue qps	2020-06-26 10:44:09 +08:00
wadecai	3db9ab97c6	Avoid enqueue when status of k8s pods change	2020-06-13 13:19:55 +08:00
Brian Goff	31c8fbaa41	Apply suggestions from code review Typos and punctuation fixes. Co-Authored-By: Pires <1752631+pires@users.noreply.github.com>	2019-10-24 09:23:33 -07:00
Brian Goff	4ee2c4d370	Re-add support for sync providers This brings back support for sync providers by wrapping them in a provider that handles async notifications.	2019-10-24 09:23:28 -07:00
Sargun Dhillon	d22265e5f5	Do not delete pods in a non-graceful manner This moves from forcefully deleting pods to deleting pods in a graceful manner from the API Server. It waits for the pod to get to a terminal status prior to deleting the pod from api server.	2019-10-17 09:58:21 -07:00
Sargun Dhillon	cdc261a08d	Use go-cmp to compare pods to suppress duplicate updates Rather than copying the pods, this uses go-cmp and filters out the paths which should not be compared.	2019-10-10 13:25:27 -07:00
Sargun Dhillon	4202b03cda	Remove sync provider support This removes the legacy sync provider interface. All new providers are expected to implement the async NotifyPods interface. The legacy sync provider interface creates complexities around how the deletion flow works, and the mixed sync and async APIs block us from evolving functionality. This collapses in the NotifyPods interface into the PodLifecycleHandler interface.	2019-10-02 09:28:09 -07:00
Sargun Dhillon	ea8495c3a1	Wait for Workers to exit prior to returning from PodController.Run This changes the behaviour slightly, so rather than immediately exiting on context cancellation, this calls shutdown, and waits for the current items to finish being worked on before returning to the user.	2019-09-12 11:04:32 -07:00
Brian Goff	bb9ff1adf3	Adds Done() and Err() to pod controller (#735 ) Allows callers to wait for pod controller exit in addition to readiness. This means the caller does not have to deal handling errors from the pod controller running in a gorutine since it can wait for exit via `Done()` and check the error with `Err()`	2019-09-10 17:44:19 +01:00
Sargun Dhillon	33df981904	Have NotifyPods store the pod status in a map (#751 ) We introduce a map that can be used to store the pod status. In this, we do not need to call GetPodStatus immediately after NotifyPods is called. Instead, we stash the pod passed via notifypods as in a map we can access later. In addition to this, for legacy providers, the logic to merge the pod, and the pod status is hoisted up to the loop. It prevents leaks by deleting the entry in the map as soon as the pod is deleted from k8s.	2019-09-04 20:14:34 +01:00
Sargun Dhillon	7accddcaf4	Fix linting errors in node/podcontroller.go	2019-09-03 11:00:33 -07:00
Sargun Dhillon	ccb6713b86	Move location of eventhandler registration This moves the event handler registration until after the cache is in-sync. It makes it so we can use the log object from the context, rather than having to use the global logger The cache race condition of the cache starting while the reactor is being added wont exist because we wait for the cache to startup / go in sync prior to adding it.	2019-08-18 08:20:49 -07:00
Sargun Dhillon	3b3bf3ff20	Add documentation to the provider API about concurrency / mutability This adds documentation around what is allowed to be mutated and what may be accessed concurrently from the provider API. Previously, the API was ambigious, and that meant providers could return pods and change them. This resulted in data races occuring.	2019-08-13 10:29:12 -07:00
Sargun Dhillon	a28969355e	Fix race condition around worker ID generation in podcontroller.go	2019-08-12 10:27:21 -07:00
Sargun Dhillon	3efc9229ba	Add a little bit of documentation to NotifyPods As far as I can tell, based on the implementation in MockProvider NotifyPods is called with the mutated pod. This allows us to take a copy of the Pod object in NotifyPods, and make it so (eventually) we don't need to do a callback to GetPodStatus.	2019-08-06 20:20:59 -07:00
Sargun Dhillon	4d60fc2049	Setup event handler at Pod Controller creation time This seems to avoid a race conditions where at pod informer startup time, the reactor doesn't properly get setup. It also refactors the root command example to start up the informers after everything is wired up.	2019-07-26 13:57:00 -07:00
Sargun Dhillon	ce60fb81d4	Make NewPodController function validate that provider is set In NewPodController we validate that the rest of the config is set to non-nil values. The provider must be non-nil as well.	2019-07-21 16:19:00 -07:00
jerryzhuang	0ba0200067	fix several typo Signed-off-by: zhuangqh <zhuangqhc@gmail.com>	2019-07-17 10:36:17 +08:00
Brian Goff	f7fee27790	Move CLI related packages into internal (#697 ) We don't want people to import these packages, so move these out into private packages.	2019-07-04 10:14:38 +01:00
Brian Goff	9bcc381ca3	Use object informers instead of listers (#669 ) We'll need these informers for https://github.com/virtual-kubelet/virtual-kubelet/pull/586 Changing this now means we don't need to make API changes later.	2019-06-17 18:00:02 +01:00
Brian Goff	3346e9e28c	Remove resourcemanager from conroller public API (#664 ) We still use it internally, but this does not need to be part of the public API. Instead just have callers pass us the relevent listers and we create our own resource manager.	2019-06-12 13:42:03 -07:00
Brian Goff	a54753cb82	Move around some packages (#658 ) * Move tracing exporter registration This doesn't belong in the library and should be configured by the consumer of the opencensus package. * Rename `vkublet` package to `node` `vkubelet` does not convey any information to the consumers of the package. Really it would be nice to move this package to the root of the repo, but then you wind up with... interesting... import semantics due to the repo name... and after thinking about it some, a subpackage is really not so bad as long as it has a name that convey's some information. `node` was chosen since this package deals with all the semantics of operating a node in Kubernetes.	2019-06-12 13:11:49 +01:00

35 Commits