This drops another dependency on k8s.io/kubernetes.
This does have the unfortunate side effect that implementers will now
get a compile error until they update their code to use the new type.
Just as a note:
The stats types have moved to k8s.io/kubelet; however, they are only
available there as of v1.20.
We currently support versions older than v1.20, and our go.mod still
imports from v1.19.
For now we copy the types in. Later we can remove the type definitions and
change them to type aliases for the k8s.io/kubelet types (which avoids
another compile-time break).
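As a rough sketch of that later step (the exact package path here is an
assumption, not a committed plan), the swap from copied definitions to
aliases would look something like:

// Once we only need to support k8s >= v1.20, the copied definitions can
// become aliases to the upstream package (path assumed).
package stats

import (
	statsv1alpha1 "k8s.io/kubelet/pkg/apis/stats/v1alpha1"
)

// Summary becomes the identical type to the upstream one, so implementers
// compiled against either spelling keep working.
type Summary = statsv1alpha1.Summary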
Anything relying on type assertions to determine whether something implements
this method will, unfortunately, break, and the breakage will be hard to
notice until runtime. We need to make sure to call this out in the release
notes.
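To illustrate the failure mode (this is a sketch, with stats standing in for
whichever stats package the caller was compiled against):

// A capability check like this stops matching the moment GetStatsSummary's
// return type changes, and nothing complains at compile time.
func hasStatsSummary(p interface{}) bool {
	_, ok := p.(interface {
		GetStatsSummary(context.Context) (*stats.Summary, error)
	})
	return ok // silently false for providers built against the old type
}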
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Our current retry policy is naive and only does 20 retries. It is
also based on the rate limiter. If the user is somewhat aggressive in
rate limiting, but there is a temporary outage of the API server, they
may want to continue to delay.
In fact, K8s has a built-in function to suggest delays:
https://pkg.go.dev/k8s.io/apimachinery/pkg/api/errors#SuggestsClientDelay
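A sketch of how that could be wired into a retry loop (doUpdate and
defaultDelay are placeholders; apierrors is k8s.io/apimachinery/pkg/api/errors):

// Retry the operation, preferring the delay the API server suggests over a
// fixed default, and stopping when the context is cancelled.
func retryWithSuggestedDelay(ctx context.Context, defaultDelay time.Duration, doUpdate func(context.Context) error) error {
	for {
		err := doUpdate(ctx)
		if err == nil {
			return nil
		}
		delay := defaultDelay
		if seconds, ok := apierrors.SuggestsClientDelay(err); ok {
			delay = time.Duration(seconds) * time.Second
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}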
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This starts watching for events prior to the start of the controller.
This smells like a bug in the fakeclient bits, but it seems to fix
the problem.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
We had added an optimization to dedupe pod status updates from the provider.
It ignored two subfields that can be updated along with status.
Because the details of subresource updating are a bit API-server-centric,
I wrote an envtest which checks for this behaviour.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This is a fundamentally different API from that of the K8s workqueue,
and one better suited to our needs. Specifically, we need a simple
queue which doesn't have complex features like delayed adds that
sit on "external" goroutines.
In addition, we need deep introspection into the operations of the
workqueue. Although you can get this on top of the K8s workqueue
by implementing a custom rate limiter, the problem is that
the underlying rate limiter's behaviour is still somewhat
opaque.
This basically has 100% code coverage.
If you share a rate limiter between workqueues, it breaks:
WQ1: Starts processing item (When)
WQ1: Fails to process item (When)
WQ1: Fails to process item (When)
WQ1: Fails to process item (When)
--- At this point we've backed off a bit ---
WQ2: Starts processing item (with same key, When)
WQ2: Succeeds at processing item (Forget)
WQ1: Fails to process item (When) ---> THIS RESULTS IN AN ERROR
This results in an error because it "forgot" the previous
rate limit.
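The sequence above can be reproduced directly against a shared rate limiter
from client-go's workqueue package:

package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	rl := workqueue.DefaultItemBasedRateLimiter() // shared by WQ1 and WQ2

	// WQ1 fails three times; the per-item backoff grows with each When call.
	rl.When("key")
	rl.When("key")
	fmt.Println("delay after WQ1's failures:", rl.When("key"))

	// WQ2 processes the same key successfully and calls Forget,
	// which resets the item's failure count.
	rl.Forget("key")

	// WQ1 fails again, but the suggested delay is back at the minimum.
	fmt.Println("delay after WQ2's Forget:", rl.When("key"))
}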
Copy/paste some more kubernetes code. This is to remove the dep on
kubernetes/kubernetes from within exec.go
See #940
Signed-off-by: Miek Gieben <miek@miek.nl>
This refactor is a preparation for another commit. I want to add instrumentation
around our queues. The queue-handling code was spread throughout the code base,
and that made adding such instrumentation more complicated than it needed to be.
This centralizes the queue management logic in queue.go, and only requires
the user to provide a name, a handler, and, if they want one, a custom
rate limiter.
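The shape of the resulting API is roughly the following (names are
illustrative, not the exact queue.go surface):

// ItemHandler is invoked once per dequeued key.
type ItemHandler func(ctx context.Context, key string) error

type Queue struct {
	name        string
	ratelimiter workqueue.RateLimiter
	handler     ItemHandler
}

// New is all a caller needs: a rate limiter (optional, with a default),
// a name, and the handler.
func New(ratelimiter workqueue.RateLimiter, name string, handler ItemHandler) *Queue {
	if ratelimiter == nil {
		ratelimiter = workqueue.DefaultItemBasedRateLimiter()
	}
	return &Queue{name: name, ratelimiter: ratelimiter, handler: handler}
}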
The lease code is moved into its own package to simplify testing, because
the goroutine leak tester was triggering incorrectly if other tests
were running, and it was measuring leaks from those tests.
This also identified buggy behaviour:
wq := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultItemBasedRateLimiter(), "test")
wq.AddRateLimited("hi")
fmt.Printf("Added hi, len: %d\n", wq.Len())
wq.Forget("hi")
fmt.Printf("Forgot hi, len: %d\n", wq.Len())
wq.Done("hi")
fmt.Printf("Done hi, len: %d\n", wq.Len())
---
Prints all 0s because even non-delayed items are delayed. If you call Add
directly, then the last line prints a len of 2.
// Workqueue docs:
// Forget indicates that an item is finished being retried. Doesn't matter whether it's for perm failing
// or for success, we'll stop the rate limiter from tracking it. This only clears the `rateLimiter`, you
// still have to call `Done` on the queue.
^----- Even this seems untrue
This refactors the v1 lease controller. It makes two functional differences
to the lease controller:
* It no longer ties lease updates to node pings or node status updates
* There is no fallback mechanism to status updates
This also moves vk_envtest, allowing for future brown-box testing of the
lease controller with envtest
As described in the issue, if the following sequence happens, we fail to properly
update the pod status in api server:
1. Create pod in k8s
2. Provider creates the pod and syncs its status back
3. Pod in k8s ready/running, all fine
4. Virtual kubelet fails to update node status for some time for whatever reason (e.g. network connectivity issues)
5. Virtual node marked as NotReady with message: Kubelet stopped posting node status
6. The k8s kube-controller-manager goes and marks all of the node's pods as Ready = false
7. Virtual Kubelet never syncs the pod's status in the provider back to k8s
We were having issues with golint not properly reporting declarations of functions
without proper documentation (comments). This is due to a configuration issue with golangci-lint.
See: https://github.com/golangci/golangci-lint/issues/456
This moves node ping controller to using the new internal lock
API.
The reason for this is twofold:
* The channel approach that was used to notify other
controllers of changes could only be used once (at startup),
and couldn't be used in the future to broadcast node
ping status. The idea here is that we could move
to a sync.Cond style API and only wake up other controllers
on change, as opposed to constantly polling each other
* The problem with sync.Cond is that it's not context-friendly.
If we want to do things like wait on a sync.Cond and use a context
or a timer or similar, it doesn't work, whereas this API allows
context cancellation on condition change.
The idea is that as we have more controllers that act as centralized
sources of authority, they can broadcast out their state.
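For reference, a generic, context-friendly stand-in for sync.Cond's
Broadcast/Wait looks something like this (a sketch, not the actual internal
lock API; imports context and sync):

type notifier struct {
	mu sync.Mutex
	ch chan struct{}
}

func newNotifier() *notifier { return &notifier{ch: make(chan struct{})} }

// Broadcast wakes all current waiters by closing the channel, then installs
// a fresh channel for future waiters.
func (n *notifier) Broadcast() {
	n.mu.Lock()
	close(n.ch)
	n.ch = make(chan struct{})
	n.mu.Unlock()
}

// Wait blocks until the next Broadcast or until ctx is cancelled, which is
// the part a bare sync.Cond cannot do.
func (n *notifier) Wait(ctx context.Context) error {
	n.mu.Lock()
	ch := n.ch
	n.mu.Unlock()
	select {
	case <-ch:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}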
For example:
If the provider is a K8s provider, a pod created by a deployment would be evicted when the node is not ready.
If we do not delete the pod in K8s, the deployment will not create a new one.
Add some tests for updateStatus
This creates a new package -- podutils. The env var related code
doesn't really have any business being part of the node package,
and to create a separation of concerns, faster tests, and just
general code isolation and cleanliness, we can move the env
var related code into this package. This change is purely hygiene,
and not logic related.
For node, the package is under internal, because the constructor
references manager, which is an internal package.
There were some (additional) bugs that were easy-ish to introduce
by interleaving the provider-provided node and the server-provided
updated node. This removes the chance of that confusion.
This allows the use of a built-in provider to do things like mark a node
as ready once all the controllers are spun up.
The e2e tests now use this instead of waiting for the pod that the vk
provider is deployed in to be marked ready (which waited on
/stats/summary to be serving, and was racy).
This fixes a small logic bug in the lease code's check for owner
references that are not set correctly, and makes it so that we properly
log when owner references are set, but not set to the node that
is "us".
Change the place where we set the defaults for the node ping
and node status intervals. This problem manifested itself
as the node ping interval being 0 when it was left at
the default.
This makes two changes:
1. Invalid ping values and ping timeouts will not
allow VK to start up
2. We set the default values very early on in creation
of the node controller -- where all the other values
are set.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This takes a somewhat ham-fisted approach to dealing with lease
conflicts. These can happen if "someone" changes the lease underneath
us. Again, this should happen rarely, but it can happen (and does
happen in production systems).
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This moves the job of pinging the node provider into its own
goroutine. If it takes a long time, it shouldn't slow down
leases, and vice-versa.
It also adds timeouts for node pings. One of the problems
is that we don't know how long a node ping will take --
there could be a bunch of network calls underneath us.
The point of the lease is to say whether or not the
Kubelet is unreachable, not whether or not the node
pings are "passing".
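Roughly, the ping loop now has this shape (ping and record are placeholders
for the provider call and the shared result the lease controller reads):

// Runs in its own goroutine; each ping gets its own timeout so a slow
// provider cannot stall lease renewal.
func runPingLoop(ctx context.Context, interval, timeout time.Duration, ping func(context.Context) error, record func(error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pingCtx, cancel := context.WithTimeout(ctx, timeout)
			record(ping(pingCtx)) // leases read the recorded result; they never block on the ping
			cancel()
		}
	}
}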
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This sets / updates the node lease owner reference to the current
node. Previously, we did not set this, which had the interesting
problem of leaking node leases on clusters with node churn.
This allows users who have a shared informer that is *not* filtering on
node name to supply a filter for event handlers to ensure events do not
fire for pods not scheduled to the node.
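Such a filter can be as simple as comparing spec.nodeName (a sketch; the
exact filter-func signature is assumed, with corev1 being k8s.io/api/core/v1):

// Only let events through for pods actually scheduled to this node.
func podsOnNode(nodeName string) func(ctx context.Context, pod *corev1.Pod) bool {
	return func(_ context.Context, pod *corev1.Pod) bool {
		return pod.Spec.NodeName == nodeName
	}
}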
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
If both the metrics routes and the pod routes are attached to the same
mux with the pattern "/", it will panic. Instead, add the stats handler
function to PodHandlerConfig and set up the route if it is not nil.
This solves the race condition as described in
https://github.com/virtual-kubelet/virtual-kubelet/issues/836.
It does this by checking two conditions when the possible race condition
is detected.
If we receive a pod notification from the provider, and it is not
in our known pods list:
1. Is our cache in-sync?
2. Is it known to our pod lister?
The first case can happen because of the order we start the
provider and sync our caches. The second case can happen because
even if the cache returns synced, it does not mean all of the
callbacks on the informer have quiesced.
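An illustrative shape of that double check (not the actual code; apierrors is
k8s.io/apimachinery/pkg/api/errors, cache is k8s.io/client-go/tools/cache, and
corev1listers is k8s.io/client-go/listers/core/v1):

// Returns true only when the pod really is unknown to the API server view.
func podTrulyUnknown(key string, synced func() bool, pods corev1listers.PodLister) (bool, error) {
	// 1. If our caches have not synced yet, we cannot trust the known-pods
	//    list, so do not treat the pod as unknown.
	if !synced() {
		return false, nil
	}
	ns, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return false, err
	}
	// 2. Even with synced caches, ask the lister: informer callbacks may not
	//    have quiesced, so the pod can exist without being "known" to us yet.
	if _, err := pods.Pods(ns).Get(name); err != nil {
		if apierrors.IsNotFound(err) {
			return true, nil
		}
		return false, err
	}
	return false, nil
}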
This slightly changes the behaviour of notifyPods so that it
can block (especially at startup). We can solve this later
by using something like a fair (ticket?) lock.
This follows suit with the other handlers and returns a NotImplemented
http.HandlerFunc when the lister is nil.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>