Commit Graph

109 Commits

Author SHA1 Message Date
Hasan Turken
42f7c56d32 Don't skip pod status updates if podStatusReasonProviderFailed
Closes #399

Signed-off-by: Hasan Turken <turkenh@gmail.com>
2020-11-16 13:57:48 -08:00
Brian Goff
2716c38e1f Merge branch 'master' into upgrade-golint-to-v1.32.2 2020-11-06 16:08:09 -08:00
Sargun Dhillon
c437e05ad0 Move env var code into its own package
This creates a new package -- podutils. The env var related code
doesn't really have any business being part of the node package;
moving it into its own package gives us a separation of concerns,
faster tests, and general code isolation and cleanliness. This
change is purely hygiene, and not logic related.

For node, the package is under internal, because the constructor
references manager, which is an internal package.
2020-11-06 14:49:53 -08:00
Sargun Dhillon
b9303714de Upgrade to golangci-lint v1.32.2 2020-11-06 14:45:19 -08:00
chao zheng
b793d89c66 When initializing VK, use an empty clientcmd.ConfigOverrides instead of nil 2020-10-28 15:34:16 -07:00
Brian Goff
590d2e7f01 Merge pull request #862 from cpuguy83/node_helpers 2020-10-26 15:00:45 -07:00
Sargun Dhillon
84a169f25d Fix golangci-lint warning 2020-10-04 19:52:34 -07:00
Sargun Dhillon
946c616c67 Create stronger separation between provider node and server node
There were some (additional) bugs that were easy to introduce
by interleaving the provider-supplied node and the server-side
updated node. This removes the chance of that confusion.
2020-10-04 19:52:34 -07:00
Sargun Dhillon
1c32b2c8ee Fix data race in test 2020-09-21 23:38:48 -07:00
Sargun Dhillon
cf2d5264a5 Fix data race in node ping controller 2020-09-21 23:38:43 -07:00
Brian Goff
0c64171e85 Add v2 node provider for accepting status updates
This allows the use of a built-in provider to do things like mark a node
as ready once all the controllers are spun up.

The e2e tests now use this instead of waiting on the pod that the vk
provider is deployed in to be marked ready (this was waiting on
/stats/summary to be serving, which is racy).
2020-09-17 13:52:58 -07:00
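
A minimal Go sketch of the idea, loosely mirroring virtual-kubelet's NodeProvider shape; the type and method names below are illustrative assumptions, not the exact v2 API:

    package sketch

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
    )

    // readyingProvider is a built-in provider that can push node status
    // updates, e.g. to mark the node Ready once all controllers are up.
    type readyingProvider struct {
        notify func(*corev1.Node) // callback handed to us by the node controller
    }

    func (p *readyingProvider) Ping(ctx context.Context) error { return nil }

    // NotifyNodeStatus saves the controller's callback so the provider can
    // push status updates later instead of being polled.
    func (p *readyingProvider) NotifyNodeStatus(_ context.Context, cb func(*corev1.Node)) {
        p.notify = cb
    }

    // MarkReady pushes a Ready condition through the saved callback.
    func (p *readyingProvider) MarkReady(n *corev1.Node) {
        n.Status.Conditions = append(n.Status.Conditions, corev1.NodeCondition{
            Type:   corev1.NodeReady,
            Status: corev1.ConditionTrue,
            Reason: "KubeletReady",
        })
        if p.notify != nil {
            p.notify(n)
        }
    }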
Sargun Dhillon
3d1226d45d Fix logging when leases are mis-set
This fixes a small logic bug in the leases code that checks whether
owner references are set correctly, and makes it so that we properly
log when owner references are set, but not set to the node that
is "us".
2020-09-08 12:04:16 -07:00
Sargun Dhillon
cd059d9755 Fix node ping interval code / default setting code
Change the place where we set the defaults for the node ping
and node status intervals. This problem manifested itself as
the node ping interval being 0 when it was set to
the default.

This makes two changes:
1. Invalid ping intervals and ping timeouts will not
   allow VK to start up
2. We set the default values very early on in creation
   of the node controller -- where all the other values
   are set.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2020-08-18 00:39:14 -07:00
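
A sketch of the shape of this fix: defaults are applied once, early in node controller construction, and invalid values refuse startup (the constants are illustrative, not the project's actual defaults):

    package sketch

    import (
        "errors"
        "time"
    )

    const (
        defaultPingInterval   = 10 * time.Second // assumed default, for illustration
        defaultStatusInterval = time.Minute      // assumed default, for illustration
    )

    // applyIntervalDefaults runs where all the other values are set, so the
    // ping loop can never observe an unset interval as 0.
    func applyIntervalDefaults(pingInterval, statusInterval *time.Duration) error {
        if *pingInterval == 0 {
            *pingInterval = defaultPingInterval
        }
        if *statusInterval == 0 {
            *statusInterval = defaultStatusInterval
        }
        if *pingInterval < 0 || *statusInterval < 0 {
            // invalid intervals fail construction instead of misbehaving later
            return errors.New("node ping and status intervals must be positive")
        }
        return nil
    }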
Sargun Dhillon
6845cf825a Delete and recreate lease on conflict
This takes a somewhat ham-fisted approach to dealing with lease
conflicts. These can happen if "someone" changes the lease underneath
us. This should happen rarely, but it can happen (and does
happen in production systems).

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2020-08-17 11:54:43 -07:00
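
A hedged client-go sketch of the approach (signatures match recent client-go releases; error handling is trimmed for brevity):

    package sketch

    import (
        "context"

        coordv1 "k8s.io/api/coordination/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // updateLease deletes and recreates the lease when the update conflicts,
    // rather than trying to reconcile whatever changed it underneath us.
    func updateLease(ctx context.Context, cs kubernetes.Interface, lease *coordv1.Lease) (*coordv1.Lease, error) {
        leases := cs.CoordinationV1().Leases(lease.Namespace)
        updated, err := leases.Update(ctx, lease, metav1.UpdateOptions{})
        if err == nil {
            return updated, nil
        }
        if !apierrors.IsConflict(err) {
            return nil, err
        }
        // Conflict: "someone" changed the lease underneath us.
        if err := leases.Delete(ctx, lease.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
            return nil, err
        }
        lease.ResourceVersion = "" // must be empty when creating a fresh object
        return leases.Create(ctx, lease, metav1.CreateOptions{})
    }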
Sargun Dhillon
d390dfce43 Move node pinging to its own goroutine
This moves the job of pinging the node provider into its own
goroutine. If it takes a long time, it shouldn't slow down
leases, and vice versa.

It also adds timeouts for node pings. One of the problems
is that we don't know how long a node ping will take --
there could be a bunch of network calls underneath us.

The point of the lease is to indicate whether or not the
Kubelet is reachable, not whether or not the node
pings are "passing".

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
2020-08-03 10:57:37 -07:00
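
A sketch of the shape of that change: pings run on their own goroutine with a per-ping timeout, and each result is handed back to the lease loop (the intervals are illustrative):

    package sketch

    import (
        "context"
        "time"
    )

    // startPingLoop pings the provider on its own goroutine so a slow ping
    // cannot stall lease renewal, and vice versa.
    func startPingLoop(ctx context.Context, ping func(context.Context) error, results chan<- error) {
        go func() {
            ticker := time.NewTicker(10 * time.Second)
            defer ticker.Stop()
            for {
                select {
                case <-ctx.Done():
                    return
                case <-ticker.C:
                    // bound each ping: there may be network calls underneath us
                    pingCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
                    err := ping(pingCtx)
                    cancel()
                    select {
                    case results <- err:
                    case <-ctx.Done():
                        return
                    }
                }
            }
        }()
    }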
Sargun Dhillon
49c596c5ca Split waitableInt into its own test file
This is merely rearranging the deck chairs: it moves waitableInt
into its own file, since we intend to use it across multiple
tests.
2020-08-03 10:57:37 -07:00
Brian Goff
3fc79dc677 Merge pull request #871 from virtual-kubelet/set-node-lease-owner
Set Node Lease Owner Reference
2020-07-31 11:51:14 -07:00
Sargun Dhillon
4bdcba5b85 Set Node Lease Owner Reference
This sets / updates the node lease owner reference to the current
node. Previously, we did not set this, which had the interesting
problem of leaking node leases on clusters with node churn.
2020-07-31 11:23:47 -07:00
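
A sketch of setting that owner reference, so the lease is garbage-collected along with the node object instead of leaking on clusters with node churn:

    package sketch

    import (
        coordv1 "k8s.io/api/coordination/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // setLeaseOwner points the lease's owner reference at the current node.
    func setLeaseOwner(lease *coordv1.Lease, node *corev1.Node) {
        lease.OwnerReferences = []metav1.OwnerReference{{
            APIVersion: corev1.SchemeGroupVersion.String(), // "v1"
            Kind:       "Node",
            Name:       node.Name,
            UID:        node.UID,
        }}
    }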
Brian Goff
c0296b99fd Support custom filter for pod event handlers
This allows users who have a shared informer that is *not* filtering on
node name to supply a filter for event handlers to ensure events do not
fire for pods not scheduled to the node.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-30 17:17:42 -07:00
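
A sketch of such a filter using client-go's FilteringResourceEventHandler, which only forwards events for pods scheduled to this node:

    package sketch

    import (
        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/tools/cache"
    )

    // filteredHandler wraps pod event handlers for callers whose shared
    // informer is *not* already filtered on node name.
    func filteredHandler(nodeName string, h cache.ResourceEventHandler) cache.ResourceEventHandler {
        return cache.FilteringResourceEventHandler{
            FilterFunc: func(obj interface{}) bool {
                pod, ok := obj.(*corev1.Pod)
                // drop events for pods not scheduled to our node
                return ok && pod.Spec.NodeName == nodeName
            },
            Handler: h,
        }
    }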
Brian Goff
83f8cd1a58 Add helpers for common setup code
Create a clientset, set up pod informer filters, and set up the node lease client.
2020-07-27 14:51:02 -07:00
Brian Goff
af1df79088 Merge pull request #851 from virtual-kubelet/race-condition-2nd 2020-07-23 13:53:58 -07:00
Vilmos Nebehaj
56b248c854 Add GetStatsSummary to PodHandlerConfig
If both the metrics routes and the pod routes are attached to the same
mux with the pattern "/", it will panic. Instead, add the stats handler
function to PodHandlerConfig and set up the route if it is not nil.
2020-07-23 09:50:19 -07:00
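
A sketch of the resulting wiring with the standard library mux (the config field shape is illustrative); registering the same pattern twice is what panics, so the stats route is attached only when a handler is actually configured:

    package sketch

    import "net/http"

    type podHandlerConfig struct {
        GetStatsSummary http.HandlerFunc // nil means the route is not served
    }

    func attachRoutes(mux *http.ServeMux, cfg podHandlerConfig) {
        mux.HandleFunc("/containerLogs/", func(w http.ResponseWriter, r *http.Request) { /* ... */ })
        if cfg.GetStatsSummary != nil {
            // registering "/" for both the pod routes and the metrics routes
            // is what triggered the duplicate-registration panic
            mux.Handle("/stats/summary", cfg.GetStatsSummary)
        }
    }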
Sargun Dhillon
4258c46746 Enhance / clean up enqueuePodStatusUpdate polling in retry loop 2020-07-22 18:57:27 -07:00
Sargun Dhillon
1e9e055e89 Address concerns with PR
Also, just use the Kubernetes waiter library.
2020-07-22 18:57:27 -07:00
Sargun Dhillon
12625131b5 Solve the startup pod status notification race condition
This solves the race condition as described in
https://github.com/virtual-kubelet/virtual-kubelet/issues/836.

It does this by checking two conditions when the possible race condition
is detected.

If we receive a pod notification from the provider and it is not
in our known-pods list, we check:
1. Is our cache in sync?
2. Is the pod known to our pod lister?

The first case can happen because of the order in which we start the
provider and sync our caches. The second case can happen because
even if the cache reports synced, it does not mean all of the
callbacks on the informer have quiesced.

This slightly changes the behaviour of notifyPods so that it
can block (especially at startup). We can solve this later
by using something like a fair (ticket?) lock.
2020-07-22 18:57:27 -07:00
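
A sketch of those two checks, polling with the Kubernetes wait helpers until the informer cache has synced and the lister has observed the pod (names are illustrative):

    package sketch

    import (
        "context"
        "time"

        corev1 "k8s.io/api/core/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        "k8s.io/apimachinery/pkg/util/wait"
        listersv1 "k8s.io/client-go/listers/core/v1"
    )

    // resolveUnknownPod handles a provider notification for a pod that is not
    // in our known-pods list: wait for the cache to sync and the lister to
    // catch up before treating the pod as truly unknown.
    func resolveUnknownPod(ctx context.Context, synced func() bool, lister listersv1.PodLister, pod *corev1.Pod) (*corev1.Pod, error) {
        var known *corev1.Pod
        err := wait.PollImmediateUntil(100*time.Millisecond, func() (bool, error) {
            if !synced() {
                return false, nil // caches not in sync yet
            }
            p, err := lister.Pods(pod.Namespace).Get(pod.Name)
            if apierrors.IsNotFound(err) {
                return false, nil // informer callbacks may not have quiesced
            }
            if err != nil {
                return false, err
            }
            known = p
            return true, nil
        }, ctx.Done())
        return known, err
    }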
Brian Goff
bcb5dfa11c Fix running pods handler on nil lister
This follows suit with other handlers and returns a NotImplemented
http.HandlerFunc when the lister is nil.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-07-14 15:33:59 -07:00
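
A sketch of that guard; the handler name is illustrative:

    package sketch

    import (
        "net/http"

        listersv1 "k8s.io/client-go/listers/core/v1"
    )

    // runningPodsHandler returns a NotImplemented handler when no lister is
    // configured, instead of dereferencing nil at request time.
    func runningPodsHandler(lister listersv1.PodLister) http.HandlerFunc {
        if lister == nil {
            return func(w http.ResponseWriter, r *http.Request) {
                http.Error(w, "not implemented", http.StatusNotImplemented)
            }
        }
        return func(w http.ResponseWriter, r *http.Request) {
            // ... list pods from the lister and write the response ...
        }
    }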
Adrien Trouillaud
845b4cd409 upgrade k8s libs to 1.18.4 2020-07-07 21:00:56 -07:00
Sargun Dhillon
e805cb744a Introduce three-way patch for proper handling of out-of-band status updates
As described in the patch itself, there is a case where, if a node is updated out of
band (e.g. by node-problem-detector (https://github.com/kubernetes/node-problem-detector)),
we would overwrite that update with our typical two-way strategic patch for node status
updates.

The reason why the standard kubelet can do this is because the flow goes:
apiserver->kubelet: Fetch current node
kubelet->kubelet: Update apiserver's snapshot with local state changes
kubelet->apiserver: patch

We don't have this luxury, as we rely on providers making a callback into us
in order to get the most recent node status. They do not have a way
to do that merge operation themselves, and a two-way merge doesn't
give us enough metadata.

In order to work around this, we perform a three-way merge on behalf of
the user. We do this by stashing the contents of the last update inside
the node object itself. We then fetch that stashed status back and use
it as the base for the next update.

In the upgrade case, or the case where the VK node has been created by
"someone else", we do not know which attributes were created
or written by us, so we cannot generate a three-way patch.

In this case, we do our best to avoid deleting any attributes,
and only overwrite them. We consider all current API server
values to have been written by "someone else", and do not edit
them. This is done by considering the "old node" to be empty.
2020-07-06 11:10:32 -07:00
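
A sketch of that mechanism with apimachinery's strategic-merge helpers; the annotation key used to stash the last-applied status is an illustrative stand-in:

    package sketch

    import (
        "encoding/json"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/util/strategicpatch"
    )

    const lastAppliedAnnotation = "example.dev/last-applied-node-status" // illustrative key

    func threeWayNodeStatusPatch(apiServerNode, providerNode *corev1.Node) ([]byte, error) {
        // "old node": the update we last wrote, stashed on the object itself.
        // Empty in the upgrade / adopted-node case, which makes every current
        // API server value count as "someone else's" and never deleted.
        original := []byte(apiServerNode.Annotations[lastAppliedAnnotation])
        if len(original) == 0 {
            original = []byte("{}")
        }
        modified, err := json.Marshal(providerNode) // what the provider wants now
        if err != nil {
            return nil, err
        }
        current, err := json.Marshal(apiServerNode) // what the API server has now
        if err != nil {
            return nil, err
        }
        // deletions apply only to fields present in `original`, i.e. fields we set
        return strategicpatch.CreateThreeWayMergePatch(original, modified, current, &corev1.Node{}, false)
    }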
Brian Goff
5306173408 Merge pull request #846 from sargun/add-trace-to-updateStatus
Add instrumentation to node controller (tracing)
2020-07-01 12:53:27 -07:00
Sargun Dhillon
30aabe6fcb Add instrumentation to node controller (tracing)
This adds tracing to the node controller in several sections where
it was missing.
2020-07-01 12:40:09 -07:00
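
The pattern, sketched with virtual-kubelet's trace package (the span name and callback are illustrative):

    package sketch

    import (
        "context"

        "github.com/virtual-kubelet/virtual-kubelet/trace"
    )

    // instrumented wraps a previously untraced section in a span.
    func instrumented(ctx context.Context, doUpdate func(context.Context) error) error {
        ctx, span := trace.StartSpan(ctx, "node.UpdateNodeStatus")
        defer span.End()
        return doUpdate(ctx)
    }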
Sargun Dhillon
1e8c16877d Make node status updates non-blocking
There's a (somewhat) common case we can get into where the node
status update loop is busy while a provider is trying to send
a node status update. Right now, we block the provider from
creating a notification in this case.
2020-07-01 12:32:54 -07:00
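
A sketch of one non-blocking shape: a one-slot channel where a pending update is replaced by the newest one, so the provider never blocks while the status update loop is busy:

    package sketch

    import corev1 "k8s.io/api/core/v1"

    type statusNotifier struct {
        ch chan *corev1.Node // capacity 1: at most one pending update
    }

    func newStatusNotifier() *statusNotifier {
        return &statusNotifier{ch: make(chan *corev1.Node, 1)}
    }

    // Notify never blocks: if an update is already pending, the stale one is
    // dropped in favor of the newest status.
    func (s *statusNotifier) Notify(n *corev1.Node) {
        for {
            select {
            case s.ch <- n:
                return
            default:
            }
            select {
            case <-s.ch: // discard the stale pending update and retry
            default:
            }
        }
    }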
wadecai
ca417d5239 Expose the queue rate limiter 2020-06-26 10:45:41 +08:00
wadecai
fedffd6f2c Add parameters to support changing the work queue QPS 2020-06-26 10:44:09 +08:00
Weidong Cai
2398504d08 dedup in updatePodStatus (#830)
Co-authored-by: Brian Goff <cpuguy83@gmail.com>
2020-06-15 14:35:14 -07:00
wadecai
3db9ab97c6 Avoid enqueueing when the status of k8s pods changes 2020-06-13 13:19:55 +08:00
Brian Goff
51b9a6c40d Fix stream timeout defaults
This was an unintentional breaking change in
0bdf742303

A timeout of 0 doesn't make any sense, so use the old value of 30s as a
default.
2020-06-03 10:01:34 -07:00
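
A sketch of the default fallback (field names are illustrative):

    package sketch

    import "time"

    // defaultStreamTimeouts restores the old 30s defaults when the caller
    // leaves the timeouts unset; a timeout of 0 is meaningless here.
    func defaultStreamTimeouts(idle, creation *time.Duration) {
        if *idle == 0 {
            *idle = 30 * time.Second
        }
        if *creation == 0 {
            *creation = 30 * time.Second
        }
    }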
Vilmos Nebehaj
3e0d03c833 Use errdefs.InvalidInputf() for formatting 2020-04-28 11:19:37 -07:00
Vilmos Nebehaj
7628c13aeb Add tests for parseLogOptions() 2020-04-28 11:19:37 -07:00
Vilmos Nebehaj
8308033eff Add support for v1.PodLogOptions 2020-04-28 11:19:37 -07:00
wadecai
30e31c0451 Check pod status equal before enqueue 2020-04-21 10:42:29 +08:00
Sargun Dhillon
5ad12cd476 Add /pods HTTP endpoint 2020-03-20 12:04:00 -07:00
guoliangshuai
554d30a0b1 add 'GET' method to pod exec handler so it can support websockets 2020-03-09 14:16:49 +08:00
Vilmos Nebehaj
47112aa5d6 Use correct Flush() prototype from http.Flusher
When calling GetContainerLogs(), a type check is performed to see if the
http.ResponseWriter supports flushing. However, Flush() in http.Flusher
does not return an error, therefore the type check will always fail.

Fix the flushWriter helper interface so flushing the writer will work.
2020-01-20 13:27:36 -08:00
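
A sketch contrasting the broken and fixed assertions; flushWriter here is a simplified stand-in for the real helper:

    package sketch

    import "net/http"

    type flushWriter struct {
        w http.ResponseWriter
    }

    func (fw *flushWriter) Write(p []byte) (int, error) {
        n, err := fw.w.Write(p)
        // broken: http.Flusher's Flush() returns nothing, so an assertion
        // against an interface whose Flush() returns error never matches:
        //   _, ok := fw.w.(interface{ Flush() error })
        if f, ok := fw.w.(http.Flusher); ok { // correct prototype: Flush()
            f.Flush()
        }
        return n, err
    }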
Weidong Cai
0bdf742303 Make exec timeout configurable (#803)
* make exec timeout configurable
2020-01-18 12:11:54 -08:00
wadecai
55f3f17ba0 add some events to pods 2019-11-29 14:33:00 +08:00
Brian Goff
6e33b0f084 [Sync Provider] Fix panic on not found pod status 2019-11-15 09:44:29 -08:00
Thomas Hartland
c258614d8f After handling status update, reset update timer with correct duration
If the ping timer is being used, it should be reset with the ping update
interval. If the status update interval is used, then Ping stops being
called for long enough to cause Kubernetes to mark the node as NotReady.
2019-11-11 14:29:52 +01:00
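
A sketch of the corrected re-arm (interval plumbing simplified):

    package sketch

    import "time"

    // runTimers keeps Ping firing on the ping interval even after a status
    // update is handled; re-arming with the (much longer) status interval is
    // what let Kubernetes mark the node NotReady.
    func runTimers(pingInterval time.Duration, ping func(), handleStatusUpdate func(), updates <-chan struct{}) {
        t := time.NewTimer(pingInterval)
        defer t.Stop()
        for {
            select {
            case <-t.C:
                ping()
                t.Reset(pingInterval)
            case <-updates:
                handleStatusUpdate()
                if !t.Stop() { // drain a fired-but-unread timer before Reset
                    select {
                    case <-t.C:
                    default:
                    }
                }
                t.Reset(pingInterval) // previously reset with the status interval
            }
        }
    }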
Thomas Hartland
3783a39b26 Add test for node ping interval 2019-11-11 14:29:52 +01:00
Brian Goff
0ccf5059e4 Put sync lifecycle tests behind the -short flag.
This lets you skip tests for the slower sync provider.
2019-10-29 15:05:35 -07:00
Brian Goff
31c8fbaa41 Apply suggestions from code review
Typos and punctuation fixes.

Co-Authored-By: Pires <1752631+pires@users.noreply.github.com>
2019-10-24 09:23:33 -07:00