Commit Graph

133 Commits

Author SHA1 Message Date
Adrien Trouillaud
845b4cd409 upgrade k8s libs to 1.18.4 2020-07-07 21:00:56 -07:00
Sargun Dhillon
e805cb744a Introduce three-way patch for proper handling of out-of-band status updates
As described in the patch itself, if a node is updated out of band
(e.g. by node-problem-detector (https://github.com/kubernetes/node-problem-detector)),
our typical two-way strategic merge patch for node status updates will
overwrite that out-of-band change.

The standard kubelet can handle this because its flow goes:
apiserver->kubelet: Fetch current node
kubelet->kubelet: Update apiserver's snapshot with local state changes
kubelet->apiserver: patch

We don't have this luxury, as we rely on providers making a callback into us
in order to get the most recent node status. Providers have no way
to do that merge operation themselves, and a two-way merge doesn't
give us enough metadata.

In order to work around this, we perform a three-way merge on behalf of
the user. We do this by stashing the contents of the last applied update
on the node object itself. We then fetch that stashed status back and use
it as the original for the next update.

In the upgrade case, or the case where the node was created by
"someone else" rather than this VK instance, we do not know which
attributes were written by us, so we cannot generate a three-way patch.

In that case, we do our best never to delete attributes, only to
overwrite them: all current API server values are treated as if written
by "someone else" and are never removed. This is done
by treating the "old node" as empty.
2020-07-06 11:10:32 -07:00
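
In sketch form, the three-way merge described above can be built with apimachinery's strategic merge patch helpers; the helper name and wiring below are illustrative, assuming the stashed last-applied node has already been recovered:

```go
package node

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

// buildNodeStatusPatch builds a three-way strategic merge patch for a node
// status update. lastApplied is the stashed copy of what we last wrote,
// current is the object fetched from the API server, and desired carries the
// provider's newest status. Names are illustrative, not the repo's actual code.
func buildNodeStatusPatch(lastApplied, current, desired *corev1.Node) ([]byte, error) {
	originalJSON, err := json.Marshal(lastApplied)
	if err != nil {
		return nil, err
	}
	currentJSON, err := json.Marshal(current)
	if err != nil {
		return nil, err
	}
	desiredJSON, err := json.Marshal(desired)
	if err != nil {
		return nil, err
	}
	// A three-way merge only deletes fields we previously wrote that are now
	// absent, so out-of-band changes (e.g. conditions added by
	// node-problem-detector) survive the patch.
	return strategicpatch.CreateThreeWayMergePatch(originalJSON, desiredJSON, currentJSON, corev1.Node{}, false)
}
```
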
Brian Goff
5306173408 Merge pull request #846 from sargun/add-trace-to-updateStatus
Add instrumentation to node controller (tracing)
2020-07-01 12:53:27 -07:00
Sargun Dhillon
30aabe6fcb Add instrumentation to node controller (tracing)
This adds tracing to the node controller in several sections where
it was missing.
2020-07-01 12:40:09 -07:00
Sargun Dhillon
1e8c16877d Make node status updates non-blocking
There's a (somewhat) common case we can get into where the node
status update loop is busy while a provider is trying to send
a node status update. Right now, we block the provider from
creating a notification in this case.
2020-07-01 12:32:54 -07:00
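
A common way to make such a notification non-blocking is a capacity-one channel with a coalescing send; this is only a sketch of the idea, not the controller's actual code:

```go
package node

// statusNotifications has capacity 1, so a pending notification acts as a
// dirty flag for the node status update loop.
var statusNotifications = make(chan struct{}, 1)

// enqueueStatusUpdate records that a fresh node status is available without
// ever blocking the provider's callback; if an update is already pending,
// the new one coalesces with it.
func enqueueStatusUpdate() {
	select {
	case statusNotifications <- struct{}{}:
		// notification queued
	default:
		// an update is already pending; coalesce with it
	}
}
```
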
wadecai
ca417d5239 Expose the queue rate limiter 2020-06-26 10:45:41 +08:00
wadecai
fedffd6f2c Add parameters to support changing work queue QPS 2020-06-26 10:44:09 +08:00
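
These knobs presumably end up in client-go's workqueue rate limiter; a hedged example of wiring a configurable QPS/burst into a rate-limited queue (parameter names are illustrative):

```go
package queue

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// newRateLimitedQueue builds a work queue whose overall QPS and burst are
// configurable. The exponential limiter handles per-item retry backoff, while
// the bucket limiter caps the overall rate.
func newRateLimitedQueue(qps float64, burst int) workqueue.RateLimitingInterface {
	limiter := workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(qps), burst)},
	)
	return workqueue.NewRateLimitingQueue(limiter)
}
```
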
Weidong Cai
2398504d08 dedup in updatePodStatus (#830)
Co-authored-by: Brian Goff <cpuguy83@gmail.com>
2020-06-15 14:35:14 -07:00
wadecai
3db9ab97c6 Avoid enqueue when the status of k8s pods changes 2020-06-13 13:19:55 +08:00
Brian Goff
51b9a6c40d Fix stream timeout defaults
This was an unintentional breaking change in
0bdf742303

A timeout of 0 doesn't make any sense, so use the old value of 30s as the
default.
2020-06-03 10:01:34 -07:00
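
Restoring the old default could look roughly like this; the config struct and field names are placeholders, not the repository's actual types:

```go
package api

import "time"

const defaultStreamTimeout = 30 * time.Second

// streamConfig is a placeholder for the API config carrying the stream
// timeouts; the field names here are illustrative.
type streamConfig struct {
	StreamIdleTimeout     time.Duration
	StreamCreationTimeout time.Duration
}

// applyStreamTimeoutDefaults restores the old 30s default when the caller
// left a timeout at its zero value, since a timeout of 0 is meaningless.
func applyStreamTimeoutDefaults(cfg *streamConfig) {
	if cfg.StreamIdleTimeout == 0 {
		cfg.StreamIdleTimeout = defaultStreamTimeout
	}
	if cfg.StreamCreationTimeout == 0 {
		cfg.StreamCreationTimeout = defaultStreamTimeout
	}
}
```
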
Vilmos Nebehaj
3e0d03c833 Use errdefs.InvalidInputf() for formatting 2020-04-28 11:19:37 -07:00
Vilmos Nebehaj
7628c13aeb Add tests for parseLogOptions() 2020-04-28 11:19:37 -07:00
Vilmos Nebehaj
8308033eff Add support for v1.PodLogOptions 2020-04-28 11:19:37 -07:00
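
A hedged sketch of what parsing query parameters into a v1.PodLogOptions can look like (only a few fields shown; the repository's parseLogOptions validates more and wraps errors with errdefs):

```go
package api

import (
	"net/url"
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// parseLogOptionsSketch converts container-logs query parameters into a
// v1.PodLogOptions. Illustrative only.
func parseLogOptionsSketch(q url.Values) (*v1.PodLogOptions, error) {
	opts := &v1.PodLogOptions{
		Follow:     q.Get("follow") == "true",
		Timestamps: q.Get("timestamps") == "true",
		Previous:   q.Get("previous") == "true",
	}
	if tl := q.Get("tailLines"); tl != "" {
		n, err := strconv.ParseInt(tl, 10, 64)
		if err != nil {
			return nil, err
		}
		opts.TailLines = &n
	}
	return opts, nil
}
```
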
wadecai
30e31c0451 Check pod status equal before enqueue 2020-04-21 10:42:29 +08:00
Sargun Dhillon
5ad12cd476 Add /pods HTTP endpoint 2020-03-20 12:04:00 -07:00
guoliangshuai
554d30a0b1 Add 'GET' method to pod exec handler so it can support WebSocket 2020-03-09 14:16:49 +08:00
Vilmos Nebehaj
47112aa5d6 Use correct Flush() prototype from http.Flusher
When calling GetContainerLogs(), a type check is performed to see if the
http.ResponseWriter supports flushing. However, Flush() in http.Flusher
does not return an error, therefore the type check will always fail.

Fix the flushWriter helper interface so flushing the writer will work.
2020-01-20 13:27:36 -08:00
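
The fix boils down to asserting against the standard http.Flusher, whose Flush() returns nothing, instead of an interface whose Flush() returns an error; a sketch of such a flushWriter helper:

```go
package api

import (
	"io"
	"net/http"
)

// flushWriter writes to w and flushes after every write when the underlying
// writer supports it. Illustrative sketch of the helper described above.
type flushWriter struct {
	w io.Writer
	f http.Flusher
}

func newFlushWriter(w io.Writer) io.Writer {
	fw := &flushWriter{w: w}
	// http.Flusher's Flush() has no return value, so this assertion succeeds
	// for a normal http.ResponseWriter.
	if f, ok := w.(http.Flusher); ok {
		fw.f = f
	}
	return fw
}

func (fw *flushWriter) Write(p []byte) (int, error) {
	n, err := fw.w.Write(p)
	if fw.f != nil {
		fw.f.Flush()
	}
	return n, err
}
```
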
Weidong Cai
0bdf742303 Make exec timeout configurable (#803)
* make exec timeout configurable
2020-01-18 12:11:54 -08:00
wadecai
55f3f17ba0 add some event to pod 2019-11-29 14:33:00 +08:00
Brian Goff
6e33b0f084 [Sync Provider] Fix panic on not found pod status 2019-11-15 09:44:29 -08:00
Thomas Hartland
c258614d8f After handling status update, reset update timer with correct duration
If the ping timer is being used, it should be reset with the ping update
interval. If the status update interval is used instead, Ping stops being
called for long enough that Kubernetes marks the node as NotReady.
2019-11-11 14:29:52 +01:00
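
In sketch form, the fix is to re-arm each timer with its own interval after handling its event (illustrative, not the controller's actual loop):

```go
package node

import "time"

// updateLoopSketch resets each timer with its own interval after handling
// its event, instead of unconditionally using the status update interval.
func updateLoopSketch(done <-chan struct{}, pingInterval, statusInterval time.Duration) {
	pingTimer := time.NewTimer(pingInterval)
	statusTimer := time.NewTimer(statusInterval)
	defer pingTimer.Stop()
	defer statusTimer.Stop()

	for {
		select {
		case <-done:
			return
		case <-pingTimer.C:
			// ping the provider, then re-arm with the ping interval,
			// not the (much longer) status update interval
			pingTimer.Reset(pingInterval)
		case <-statusTimer.C:
			// push a full node status update
			statusTimer.Reset(statusInterval)
		}
	}
}
```
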
Thomas Hartland
3783a39b26 Add test for node ping interval 2019-11-11 14:29:52 +01:00
Brian Goff
0ccf5059e4 Put sync lifecycle tests behind the -short flag.
This lets you skip tests for the slower sync provider.
2019-10-29 15:05:35 -07:00
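
Gating a slow test behind -short is the standard testing.Short() pattern; the test name below is a placeholder:

```go
package node

import "testing"

// TestSyncProviderLifecycle is a placeholder name; the point is the standard
// -short gate for slow tests.
func TestSyncProviderLifecycle(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping slow sync-provider lifecycle test in -short mode")
	}
	// ... exercise the wrapped sync provider ...
}
```
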
Brian Goff
31c8fbaa41 Apply suggestions from code review
Typos and punctuation fixes.

Co-Authored-By: Pires <1752631+pires@users.noreply.github.com>
2019-10-24 09:23:33 -07:00
Brian Goff
4ee2c4d370 Re-add support for sync providers
This brings back support for sync providers by wrapping them in a
provider that handles async notifications.
2019-10-24 09:23:28 -07:00
Sargun Dhillon
c314045d60 Ensure dangling pods that are still deleting at startup get deleted (#784)
If a pod is being gracefully deleted at podcontroller startup,
it will not get deleted via the deletedanglingpods code. This
ensures the normal deletion loop covers that case.
2019-10-22 06:45:36 -04:00
Sargun Dhillon
d22265e5f5 Do not delete pods in a non-graceful manner
This moves from forcefully deleting pods to deleting them gracefully
from the API server. It waits for the pod to reach a terminal status
before deleting it from the API server.
2019-10-17 09:58:21 -07:00
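
A hedged sketch of the "wait for a terminal status" check (the helper is illustrative):

```go
package node

import corev1 "k8s.io/api/core/v1"

// podIsTerminal reports whether the pod has reached a terminal phase, meaning
// it is safe to remove it from the API server.
func podIsTerminal(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
}
```
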
Sargun Dhillon
871424368f Fix pod status updates for when pod is updated outside of VK
Pods can be updated outside of VK. Right now, if this happens, pod
status updates are dropped because the resourceVersion from the
provider no longer matches what's on the server, breaking
pod status updates.

Since we're the only ones writing to the pod status, we
can do a blind overwrite.
2019-10-11 16:32:48 -07:00
Sargun Dhillon
cdc261a08d Use go-cmp to compare pods to suppress duplicate updates
Rather than copying the pods, this uses go-cmp and filters out
the paths which should not be compared.
2019-10-10 13:25:27 -07:00
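
With go-cmp, suppressing duplicate updates amounts to comparing pods while filtering out paths that churn on every write; the ignored fields and comparers below are illustrative, not the repository's exact set:

```go
package node

import (
	"github.com/google/go-cmp/cmp"
	"github.com/google/go-cmp/cmp/cmpopts"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podsEffectivelyEqual compares two pods while ignoring metadata that changes
// on every write, so identical pods don't trigger duplicate updates.
func podsEffectivelyEqual(a, b *corev1.Pod) bool {
	return cmp.Equal(a, b,
		// these fields change on every write without meaning a real update
		cmpopts.IgnoreFields(metav1.ObjectMeta{}, "ResourceVersion", "Generation", "ManagedFields"),
		// compare quantities by value rather than by internal representation
		cmp.Comparer(func(x, y resource.Quantity) bool { return x.Cmp(y) == 0 }),
	)
}
```
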
Sargun Dhillon
4202b03cda Remove sync provider support
This removes the legacy sync provider interface. All new providers
are expected to implement the async NotifyPods interface.

The legacy sync provider interface creates complexities around
how the deletion flow works, and the mixed sync and async APIs
block us from evolving functionality.

This collapses the NotifyPods interface into the PodLifecycleHandler
interface.
2019-10-02 09:28:09 -07:00
toshi0607
bcfc2accf8 misspell 2019-09-26 20:52:06 +09:00
toshi0607
b712751c6d gofmt 2019-09-26 20:50:36 +09:00
Sargun Dhillon
82a430ccf7 Add unused code linter 2019-09-24 12:55:52 -07:00
Sargun Dhillon
ea8495c3a1 Wait for Workers to exit prior to returning from PodController.Run
This changes the behaviour slightly: rather than immediately exiting on
context cancellation, this calls shutdown and waits for the in-flight
items to finish being worked on before returning to the user.
2019-09-12 11:04:32 -07:00
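
The usual shape of "shut the queue down, then wait for the workers" uses a WaitGroup around the worker goroutines; a sketch under those assumptions:

```go
package node

import (
	"context"
	"sync"

	"k8s.io/client-go/util/workqueue"
)

// runWorkersSketch starts n workers and, on context cancellation, shuts the
// queue down and waits for every worker to finish its in-flight item before
// returning. Illustrative of the behaviour described above.
func runWorkersSketch(ctx context.Context, q workqueue.RateLimitingInterface, n int, process func(key string)) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				item, shutdown := q.Get()
				if shutdown {
					return
				}
				process(item.(string))
				q.Done(item)
			}
		}()
	}

	<-ctx.Done()
	q.ShutDown() // lets workers drain and then return from Get()
	wg.Wait()    // do not return to the caller until workers have exited
}
```
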
Brian Goff
334baa73cf Merge pull request #743 from chewong/pod-status-nil-pointer
Add unit tests for #584
2019-09-11 14:49:55 -07:00
Brian Goff
bb9ff1adf3 Adds Done() and Err() to pod controller (#735)
Allows callers to wait for pod controller exit in addition to readiness.
This means the caller does not have to deal with handling errors from the
pod controller running in a goroutine, since it can wait for exit via
`Done()` and check the error with `Err()`.
2019-09-10 17:44:19 +01:00
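
Caller-side usage then looks roughly like the following; the Run/Ready signatures are assumed from the description above rather than quoted from the code:

```go
package example

import (
	"context"

	"github.com/virtual-kubelet/virtual-kubelet/node"
)

// waitForPodController starts the controller and blocks until it is ready,
// or returns the controller's error if it exits first. Illustrative usage of
// Done()/Err().
func waitForPodController(ctx context.Context, pc *node.PodController, workers int) error {
	// Errors from Run are also surfaced via Err() once Done() is closed,
	// so the goroutine does not need its own error handling.
	go pc.Run(ctx, workers)

	select {
	case <-pc.Ready():
		return nil
	case <-pc.Done():
		return pc.Err()
	}
}
```
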
Ernest Wong
fdb0c805f7 Add more unit test to #584 2019-09-05 10:48:35 -07:00
Ernest Wong
dc7ff44303 Add unit tests for #584 2019-09-05 09:49:41 -07:00
Sargun Dhillon
da57373abb Test pods going missing while they're running in legacy providers (#759)
We poll legacy providers for their pods' status periodically, because
we have no way of knowing when a pod is updated. If a pod somehow goes
missing in the provider, that state must be handled. Currently, we either
update the API server and mark the pod as failed, or we ignore it.
2019-09-04 22:16:14 +01:00
Sargun Dhillon
33df981904 Have NotifyPods store the pod status in a map (#751)
We introduce a map that can be used to store the pod status. With this,
we do not need to call GetPodStatus immediately after NotifyPods
is called. Instead, we stash the pod passed via NotifyPods
in a map we can access later. In addition, for legacy
providers, the logic to merge the pod and the pod status is
hoisted up to the loop.

It prevents leaks by deleting the entry in the map as soon
as the pod is deleted from k8s.
2019-09-04 20:14:34 +01:00
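
A hedged sketch of such a status map, keyed by namespace/name and pruned when the pod is deleted from k8s:

```go
package node

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// knownPods stashes the last pod (status) received via NotifyPods so the
// controller does not need to call GetPodStatus again. Illustrative only.
type knownPods struct {
	mu   sync.Mutex
	pods map[string]*corev1.Pod
}

func key(pod *corev1.Pod) string { return pod.Namespace + "/" + pod.Name }

// store records the latest pod handed to us by the provider.
func (k *knownPods) store(pod *corev1.Pod) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.pods == nil {
		k.pods = make(map[string]*corev1.Pod)
	}
	k.pods[key(pod)] = pod
}

// forget removes the entry once the pod is deleted from k8s, preventing leaks.
func (k *knownPods) forget(pod *corev1.Pod) {
	k.mu.Lock()
	defer k.mu.Unlock()
	delete(k.pods, key(pod))
}
```
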
Sargun Dhillon
7133a372d6 Mark current linting errors as non-errors
This is basically claiming linting bankruptcy. It marks all of the
issues we had up until this point as nolint.
2019-09-03 11:00:33 -07:00
Sargun Dhillon
5949e6279d Miscellaneous cleanup for linting 2019-09-03 11:00:33 -07:00
Sargun Dhillon
9cce8640a5 Fix linting errors in node/pod_test.go
This moves away from defining pods independently in each test; pod (spec)
generation now lives in a separate function.
2019-09-03 11:00:33 -07:00
Sargun Dhillon
7accddcaf4 Fix linting errors in node/podcontroller.go 2019-09-03 11:00:33 -07:00
Brian Goff
2507f57f97 Merge pull request #732 from sargun/move-around-reactor
Move location of eventhandler registration
2019-09-03 10:44:52 -07:00
Sargun Dhillon
43ee086360 Fix mock_test DeletePod to store updated pod status 2019-08-25 10:42:35 -07:00
Sargun Dhillon
ccb6713b86 Move location of eventhandler registration
This moves the event handler registration to after the cache
is in sync.

It makes it so we can use the log object from the context,
rather than having to use the global logger.

The race condition of the cache starting while the reactor
is being added won't exist, because we wait for the cache
to start up and go in sync prior to adding it.
2019-08-18 08:20:49 -07:00
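
The ordering described here follows the familiar informer pattern of syncing the cache before wiring handlers; a sketch with client-go (function and parameter names are illustrative):

```go
package node

import (
	"context"
	"fmt"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// startPodInformerSketch starts the informer factory, waits for the cache to
// sync, and only then registers the event handler, avoiding the race between
// cache startup and handler/reactor registration.
func startPodInformerSketch(ctx context.Context, factory informers.SharedInformerFactory, handler cache.ResourceEventHandler) error {
	podInformer := factory.Core().V1().Pods().Informer()
	factory.Start(ctx.Done())

	if ok := cache.WaitForCacheSync(ctx.Done(), podInformer.HasSynced); !ok {
		return fmt.Errorf("failed to wait for pod cache to sync")
	}

	// Registering after sync avoids the race between cache startup and
	// handler registration described above.
	podInformer.AddEventHandler(handler)
	return nil
}
```
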
Sargun Dhillon
69f1186713 Do not mutate pods, nor hand off pod references to provider
This moves to a model where any time pods are given to a
provider, it uses a DeepCopy as opposed to a reference. If the
provider mutates the pod, this prevents the mutation from causing
issues with the informer cache.

It has to use reflect instead of comparing the hashes because
spew prints DeepCopy'd data structures ever so slightly differently.
2019-08-15 09:59:01 -07:00
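
The hand-off pattern is then simply a DeepCopy at the provider boundary; a minimal sketch, with the provider interface reduced to the one method needed here:

```go
package node

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// podCreator matches the provider's CreatePod surface for this sketch.
type podCreator interface {
	CreatePod(ctx context.Context, pod *corev1.Pod) error
}

// createPodInProvider hands the provider a deep copy so that any mutation it
// performs cannot corrupt the shared informer cache.
func createPodInProvider(ctx context.Context, p podCreator, pod *corev1.Pod) error {
	return p.CreatePod(ctx, pod.DeepCopy())
}
```
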
Sargun Dhillon
89d88a17ed Add a generic reactor to lifecycle_test to bump resource version (#733)
All updates in our tests should behave as closely as possible to what
the API server does.
2019-08-15 08:46:38 +01:00
Sargun Dhillon
bc2f6e0dc4 Wait for the informer to become in sync before starting tests
If the informers are starting at the same time as createPods,
we can get into a situation where the pod seems to get
"lost". Instead, we wait for the informer to sync
prior to the createPod event.

This also moves to one informer as a micro-optimization in
the tests.
2019-08-14 07:03:53 -07:00