virtual-kubelet

Author	SHA1	Message	Date
Sargun Dhillon	c314045d60	Ensure that delete dangling pods which are still deleting at startup (#784) If a pod is being gracefully deleted at podcontroller startup, it will not get deleted via the deletedanglingpods code. This ensures the normal deletion loop covers the case.	2019-10-22 06:45:36 -04:00
Sargun Dhillon	d22265e5f5	Do not delete pods in a non-graceful manner This moves from forcefully deleting pods to deleting pods in a graceful manner from the API Server. It waits for the pod to get to a terminal status prior to deleting the pod from api server.	2019-10-17 09:58:21 -07:00
Sargun Dhillon	871424368f	Fix pod status updates for when pod is updated outside of VK Pods can be updated outside of VK. Right now, if this happens, pod status updates are dropped because the resourceversion from the provider will mismatch with what's on the server, breaking pod status updates. Since we're the only ones writing to the pod status, we can do a blind overwrite.	2019-10-11 16:32:48 -07:00
Sargun Dhillon	cdc261a08d	Use go-cmp to compare pods to suppress duplicate updates Rather than copying the pods, this uses go-cmp and filters out the paths which should not be compared.	2019-10-10 13:25:27 -07:00
Sargun Dhillon	4202b03cda	Remove sync provider support This removes the legacy sync provider interface. All new providers are expected to implement the async NotifyPods interface. The legacy sync provider interface creates complexities around how the deletion flow works, and the mixed sync and async APIs block us from evolving functionality. This collapses in the NotifyPods interface into the PodLifecycleHandler interface.	2019-10-02 09:28:09 -07:00
toshi0607	bcfc2accf8	misspell	2019-09-26 20:52:06 +09:00
toshi0607	b712751c6d	gofmt	2019-09-26 20:50:36 +09:00
Sargun Dhillon	82a430ccf7	Add unused code linter	2019-09-24 12:55:52 -07:00
Sargun Dhillon	ea8495c3a1	Wait for Workers to exit prior to returning from PodController.Run This changes the behaviour slightly, so rather than immediately exiting on context cancellation, this calls shutdown, and waits for the current items to finish being worked on before returning to the user.	2019-09-12 11:04:32 -07:00
Brian Goff	334baa73cf	Merge pull request #743 from chewong/pod-status-nil-pointer Add unit tests for #584	2019-09-11 14:49:55 -07:00
Brian Goff	bb9ff1adf3	Adds Done() and Err() to pod controller (#735) Allows callers to wait for pod controller exit in addition to readiness. This means the caller does not have to deal with handling errors from the pod controller running in a goroutine, since it can wait for exit via `Done()` and check the error with `Err()`.	2019-09-10 17:44:19 +01:00
Ernest Wong	fdb0c805f7	Add more unit tests for #584	2019-09-05 10:48:35 -07:00
Ernest Wong	dc7ff44303	Add unit tests for #584	2019-09-05 09:49:41 -07:00
Sargun Dhillon	da57373abb	Test pods going missing while they're running in legacy providers (#759) We poll legacy providers for their pod(s) status periodically. This is because we have no way of knowing when the pod is updated. If the pod somehow goes missing in the provider, that state must be handled. Currently, we update API server, and mark the pod as failed, or ignore it.	2019-09-04 22:16:14 +01:00
Sargun Dhillon	33df981904	Have NotifyPods store the pod status in a map (#751) We introduce a map that can be used to store the pod status. With this, we do not need to call GetPodStatus immediately after NotifyPods is called. Instead, we stash the pod passed via NotifyPods in a map we can access later. In addition to this, for legacy providers, the logic to merge the pod and the pod status is hoisted up to the loop. It prevents leaks by deleting the entry in the map as soon as the pod is deleted from k8s.	2019-09-04 20:14:34 +01:00
Sargun Dhillon	7133a372d6	Mark current linting errors as non-errors This is basically claiming linting bankruptcy. It marks all of the issues we had up until this point as nolint.	2019-09-03 11:00:33 -07:00
Sargun Dhillon	5949e6279d	Miscellaneous cleanup for linting	2019-09-03 11:00:33 -07:00
Sargun Dhillon	9cce8640a5	Fix linting errors in node/pod_test.go This moves away from defining pods independently. It moves pod (spec) generation to an independent function.	2019-09-03 11:00:33 -07:00
Sargun Dhillon	7accddcaf4	Fix linting errors in node/podcontroller.go	2019-09-03 11:00:33 -07:00
Brian Goff	2507f57f97	Merge pull request #732 from sargun/move-around-reactor Move location of eventhandler registration	2019-09-03 10:44:52 -07:00
Sargun Dhillon	43ee086360	Fix mock_test DeletePod to store updated pod status	2019-08-25 10:42:35 -07:00
Sargun Dhillon	ccb6713b86	Move location of eventhandler registration This moves the event handler registration until after the cache is in-sync. It makes it so we can use the log object from the context, rather than having to use the global logger. The race condition of the cache starting while the reactor is being added won't exist, because we wait for the cache to start up / go in sync prior to adding it.	2019-08-18 08:20:49 -07:00
Sargun Dhillon	69f1186713	Do not mutate pods, nor hand off pod references to provider This moves to a model where any time that pods are given to a provider, it uses a DeepCopy, as opposed to a reference. If the provider mutates the pod, it prevents it from causing issues with the informer cache. It has to use reflect instead of comparing the hashes because spew prints DeepCopy'd data structures ever so slightly differently.	2019-08-15 09:59:01 -07:00
Sargun Dhillon	89d88a17ed	Add a generic reactor to lifecycle_test to bump resource version (#733) All updates in our tests should have the behaviour that best reflects what API server does.	2019-08-15 08:46:38 +01:00
Sargun Dhillon	bc2f6e0dc4	Wait for the informer to become in sync before starting tests If the informers are starting at the same time as createPods, then we can get into a situation where the pod seems to get "lost". Instead, we wait for the informer to get into sync prior to the createpod event. This also moves to one informer as a microoptimization in the tests.	2019-08-14 07:03:53 -07:00
Brian Goff	47f5aa45df	Merge pull request #727 from ethan-daocloud/patch-2 cleanup: fix some typos in node.go	2019-08-13 12:00:43 -07:00
Brian Goff	569706f371	Merge branch 'master' into document-api	2019-08-13 11:47:04 -07:00
Guangming Wang	cb307df71e	cleanup: fix some typos in node.go Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>	2019-08-13 11:39:00 -07:00
Sargun Dhillon	edc0991c0c	Fix hotloop around scheduling in lifecycle_test Lifecycle test had a hotloop, where it would run a never-yielding function while processing was going on elsewhere. This inserts a sleep. A sleep is used rather than a yield to be kind to people's battery life.	2019-08-13 11:25:21 -07:00
Sargun Dhillon	fbed4ca702	Remove usage of atomics It turns out that running atomic.Read(...) in a tight loop breaks Golang. The goroutine would never yield control over the scheduler, so we ended up getting into a situation where the test would get stuck forever. This moves to a different model, in which there is a condition var, instead of atomics in loops.	2019-08-13 11:25:21 -07:00
Sargun Dhillon	9b27eb83fe	Make mock_test follow the aforementioned documentation	2019-08-13 10:30:02 -07:00
Sargun Dhillon	3b3bf3ff20	Add documentation to the provider API about concurrency / mutability This adds documentation around what is allowed to be mutated and what may be accessed concurrently from the provider API. Previously, the API was ambiguous, and that meant providers could return pods and change them. This resulted in data races occurring.	2019-08-13 10:29:12 -07:00
Pires	f0a0e8cbfe	Merge branch 'master' into upgrade-k8s-v2	2019-08-13 10:43:00 +01:00
Sargun Dhillon	5c2b682cdc	Array of minor fixups to lifecycle tests * Fix the deletion test to actually test the pod is deleted * Fix the update pods test to update a value which is allowed to be updated * Shut down watches after tests * Do not delete pod statuses on DeletePod in mock_test This intentionally leaks pod statuses, but it makes the situation a lot less complicated around handling race conditions with the GetPodStatus callback	2019-08-12 12:10:29 -07:00
Sargun Dhillon	5ac33e4b0a	Fix race conditions in node_test	2019-08-12 11:33:48 -07:00
Brian Goff	10b291dba1	Merge branch 'master' into patch-1	2019-08-12 10:48:15 -07:00
Sargun Dhillon	ad6cd7d552	Upgrade K8s * Upgrade k8s.io/api go get k8s.io/api@kubernetes-1.15.2 * Upgrade k8s.io/apimachinery go get k8s.io/apimachinery@kubernetes-1.15.2 * Upgrade kubernetes-1.15.2 go get k8s.io/client-go@kubernetes-1.15.2 * Upgrade k8s.io/kubernetes to v1.15.2 go get k8s.io/kubernetes@v1.15.2 This also locks the dependency for github.com/prometheus/client_golang/prometheus due to a golang bug, and to please the validation scripts. The replaces were generated by: go get k8s.io/kubernetes@v1.15.2 2> fail for i in $(cat fail|grep unknown|cut -f1 -d@|cut -f2 -d" ") do echo "replace ${i} => ${i} kubernetes-1.15.2" done	2019-08-12 10:29:19 -07:00
Sargun Dhillon	a28969355e	Fix race condition around worker ID generation in podcontroller.go	2019-08-12 10:27:21 -07:00
ethan	75a1877d9f	cleanup: fix misspelled words in error message Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>	2019-08-10 19:03:44 +08:00
Sargun Dhillon	3efc9229ba	Add a little bit of documentation to NotifyPods As far as I can tell, based on the implementation in MockProvider, NotifyPods is called with the mutated pod. This allows us to take a copy of the Pod object in NotifyPods, and make it so (eventually) we don't need to do a callback to GetPodStatus.	2019-08-06 20:20:59 -07:00
Sakura	7188238caa	fix a to an in annotation (#715)	2019-08-05 20:13:40 +01:00
Sargun Dhillon	50bbc3d1d4	Add tests around updates This makes sure the update function works correctly after the pod is running if the podspec is changed. Upon writing the test, I realized we were accessing the variables outside of the goroutine that the workers with tests were running in, and we had no locks. Therefore, I converted all of those numbers to use atomics.	2019-07-30 09:13:43 -07:00
Sargun Dhillon	bd8e39e3f9	Add a benchmark for pod creation This adds a benchmark for pod creation and makes the mock_test provider actually work correctly in concurrent situations.	2019-07-30 09:12:56 -07:00
Sargun Dhillon	ce38d72c0e	Add additional lifecycle tests * Don't schedule failed or succeeded pods * Delete dangling pods	2019-07-30 06:56:54 -07:00
Sargun Dhillon	4a270fea08	Add a test which tests the e2e lifecycle of the pod controller This uses the mock provider, so I moved the mock provider to a location where the node test can use it.	2019-07-30 06:56:54 -07:00
Sargun Dhillon	4d60fc2049	Setup event handler at Pod Controller creation time This seems to avoid a race condition where, at pod informer startup time, the reactor doesn't properly get set up. It also refactors the root command example to start up the informers after everything is wired up.	2019-07-26 13:57:00 -07:00
Sargun Dhillon	ce60fb81d4	Make NewPodController function validate that provider is set In NewPodController we validate that the rest of the config is set to non-nil values. The provider must be non-nil as well.	2019-07-21 16:19:00 -07:00
jerryzhuang	0ba0200067	fix several typos Signed-off-by: zhuangqh <zhuangqhc@gmail.com>	2019-07-17 10:36:17 +08:00
Brian Goff	8493cbb42a	Unexport node update helper functions (#701) Thinking these maybe should either not be exposed or in a separate package. For 1.0 let's unexport them and we may re-introduce later.	2019-07-05 19:24:46 +01:00
Brian Goff	f7fee27790	Move CLI related packages into internal (#697) We don't want people to import these packages, so move these out into private packages.	2019-07-04 10:14:38 +01:00

108 Commits