Our current retry policy is naive and only does 20 retries. It is
also based entirely on the rate limiter. If the user is somewhat aggressive in
their rate limiting, but there is a temporary outage on the API server, they
may want to continue to delay.
In fact, K8s has a built-in function to suggest delays:
https://pkg.go.dev/k8s.io/apimachinery/pkg/api/errors#SuggestsClientDelay
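Something like the following could use it to pick the retry delay (this is a
sketch; only SuggestsClientDelay is the real API, the rest is illustrative):

package queue

import (
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/client-go/util/workqueue"
)

// retryDelay prefers the delay suggested by the API server error (if any)
// over the rate limiter's backoff for the given item.
func retryDelay(limiter workqueue.RateLimiter, key interface{}, err error) time.Duration {
    if seconds, ok := apierrors.SuggestsClientDelay(err); ok {
        return time.Duration(seconds) * time.Second
    }
    return limiter.When(key)
}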
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
We had added an optimization that made it so we dedupe pod status updates
from the provider. This ignored two subfields that could be updated along
with the status.
Because the details of subresource updating are a bit API-server-centric,
I wrote an envtest which checks for this behaviour.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
This refactor is preparation for another commit: I want to add instrumentation
around our queues. The queue-handling code was spread throughout the
code base, and that made adding such instrumentation needlessly complicated.
This centralizes the queue management logic in queue.go, and only requires
the user to provide a name, a handler, and (optionally) a custom rate
limiter.
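Roughly, I expect the constructor to take this shape (names here are
illustrative, not necessarily the final API):

package queue

import (
    "context"

    "k8s.io/client-go/util/workqueue"
)

// ItemHandler processes a single queue item.
type ItemHandler func(ctx context.Context, key string) error

// Queue wraps a named rate-limiting workqueue plus the caller's handler,
// so instrumentation can live in one place.
type Queue struct {
    wq      workqueue.RateLimitingInterface
    handler ItemHandler
}

// New takes an optional custom rate limiter, a name, and a handler.
func New(rateLimiter workqueue.RateLimiter, name string, handler ItemHandler) *Queue {
    if rateLimiter == nil {
        rateLimiter = workqueue.DefaultControllerRateLimiter()
    }
    return &Queue{
        wq:      workqueue.NewNamedRateLimitingQueue(rateLimiter, name),
        handler: handler,
    }
}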
The lease code is moved into its own package to simplify testing, because
the goroutine leak tester was triggering incorrectly if other tests
were running, and it was measuring leaks from those tests.
This also identified buggy behaviour:
wq := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultItemBasedRateLimiter(), "test")
wq.AddRateLimited("hi")
fmt.Printf("Added hi, len: %d\n", wq.Len())
wq.Forget("hi")
fmt.Printf("Forgot hi, len: %d\n", wq.Len())
wq.Done("hi")
fmt.Printf("Done hi, len: %d\n", wq.Len())
---
Prints all 0s because even non-delayed items are delayed. If you call Add
directly, then the last line prints a len of 2.
// Workqueue docs:
// Forget indicates that an item is finished being retried. Doesn't matter whether it's for perm failing
// or for success, we'll stop the rate limiter from tracking it. This only clears the `rateLimiter`, you
// still have to call `Done` on the queue.
^----- Even this seems untrue
As described in the issue, if the following sequence happens, we fail to properly
update the pod status in api server:
1. Create pod in k8s
2. Provider creates the pod and syncs its status back
3. Pod in k8s ready/running, all fine
4. Virtual kubelet fails to update node status for some time for whatever reason (e.g. network connectivity issues)
5. Virtual node marked as NotReady with message: Kubelet stopped posting node status
6. kube-controller-manager of k8s goes and marks all pods as Ready = false
7. Virtual kubelet never syncs the status of the pod in the provider back to k8s
For example:
If the provider is a K8s provider, a pod created by a deployment would be evicted when the node is not ready.
If we do not delete the pod in K8s, the deployment would not create a new one.
Add some tests for updateStatus
This creates a new package -- podutils. The env var related code
doesn't really have any business being part of the node package,
and to create a separation of concerns, faster tests, and just
general code isolation and cleanliness, we can move the env
var related code into this package. This change is purely hygiene,
and not logic related.
For node, the package is under internal, because the constructor
references manager, which is an internal package.
This solves the race condition as described in
https://github.com/virtual-kubelet/virtual-kubelet/issues/836.
It does this by checking two conditions when the possible race condition
is detected.
If we receive a pod notification from the provider, and it is not
in our known pods list:
1. Is our cache in-sync?
2. Is it known to our pod lister?
The first case can happen because of the order we start the
provider and sync our caches. The second case can happen because
even if the cache returns synced, it does not mean all of the callbacks
on the informer have quiesced.
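A minimal sketch of the two checks (the function name and plumbing are made
up; only the client-go types are real):

package node

import (
    "k8s.io/apimachinery/pkg/api/errors"
    corev1listers "k8s.io/client-go/listers/core/v1"
    "k8s.io/client-go/tools/cache"
)

// podVanishedFromAPIServer reports whether a pod notified by the provider,
// but missing from our known pods list, can safely be treated as deleted.
// Both checks must pass; otherwise we may just be racing the informer.
func podVanishedFromAPIServer(synced cache.InformerSynced, lister corev1listers.PodLister, namespace, name string) (bool, error) {
    // 1. If our cache has not finished its initial sync, the pod may simply
    //    not have been observed yet.
    if !synced() {
        return false, nil
    }
    // 2. Even when the cache reports synced, informer callbacks may not have
    //    quiesced, so also ask the pod lister directly.
    if _, err := lister.Pods(namespace).Get(name); err == nil {
        return false, nil
    } else if !errors.IsNotFound(err) {
        return false, err
    }
    return true, nil
}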
This slightly changes the behaviour of notifyPods so that it can block
(especially at startup). We can solve this later by using something like
a fair (ticket?) lock.
This moves from forcefully deleting pods to deleting them gracefully from
the API server. It waits for the pod to reach a terminal status before
deleting it from the API server.
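A terminal-status check could look roughly like this (sketch only; the real
code may key off different fields):

package node

import corev1 "k8s.io/api/core/v1"

// podIsTerminal reports whether the pod has reached a state from which it
// will not run again, and can therefore be removed from the API server.
func podIsTerminal(pod *corev1.Pod) bool {
    return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
}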
Pods can be updated outside of VK. Right now, if this happens, pod
status updates are dropped because the resourceVersion from the
provider will mismatch what's on the server, breaking pod
status updates.
Since we're the only ones writing to the pod status, we
can do a blind overwrite.
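Something like the following sketch (the helper name and plumbing are
assumptions, and it assumes a client-go recent enough to take a context in
UpdateStatus):

package node

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    corev1listers "k8s.io/client-go/listers/core/v1"
)

// overwritePodStatus fetches the freshest copy of the pod so the
// resourceVersion matches the API server, then blindly writes the status
// reported by the provider onto it.
func overwritePodStatus(ctx context.Context, client kubernetes.Interface, lister corev1listers.PodLister, fromProvider *corev1.Pod) error {
    latest, err := lister.Pods(fromProvider.Namespace).Get(fromProvider.Name)
    if err != nil {
        return err
    }
    updated := latest.DeepCopy()
    updated.Status = fromProvider.Status
    _, err = client.CoreV1().Pods(updated.Namespace).UpdateStatus(ctx, updated, metav1.UpdateOptions{})
    return err
}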
This removes the legacy sync provider interface. All new providers
are expected to implement the async NotifyPods interface.
The legacy sync provider interface creates complexities around
how the deletion flow works, and the mixed sync and async APIs
block us from evolving functionality.
This collapses the NotifyPods interface into the PodLifecycleHandler
interface.
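Roughly, the collapsed interface looks like this (signatures are
approximations, not the exact API):

package node

import (
    "context"

    corev1 "k8s.io/api/core/v1"
)

// PodLifecycleHandler sketch: the sync CRUD methods plus the async
// notification hook that used to live on a separate interface.
type PodLifecycleHandler interface {
    CreatePod(ctx context.Context, pod *corev1.Pod) error
    UpdatePod(ctx context.Context, pod *corev1.Pod) error
    DeletePod(ctx context.Context, pod *corev1.Pod) error
    GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
    GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
    GetPods(ctx context.Context) ([]*corev1.Pod, error)

    // NotifyPods was previously its own interface; the provider invokes the
    // callback whenever a pod changes.
    NotifyPods(ctx context.Context, cb func(*corev1.Pod))
}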
We poll legacy providers for their pod statuses periodically, because
we have no way of knowing when a pod is updated. If the pod somehow goes
missing in the provider, that state must be handled. Currently, we update
the API server and mark the pod as failed, or ignore it.
We introduce a map that can be used to store the pod status. With it,
we do not need to call GetPodStatus immediately after NotifyPods
is called. Instead, we stash the pod passed via NotifyPods
in a map we can access later. In addition to this, for legacy
providers, the logic to merge the pod and the pod status is
hoisted up to the loop.
Leaks are prevented by deleting the entry in the map as soon
as the pod is deleted from k8s.
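A minimal sketch of that map, assuming a namespace/name key (the field and
method names are made up):

package node

import (
    "sync"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/types"
)

// knownPodStatuses stashes the most recent pod seen via NotifyPods so we do
// not need to call GetPodStatus right away.
type knownPodStatuses struct {
    mu    sync.Mutex
    items map[types.NamespacedName]*corev1.Pod
}

func (k *knownPodStatuses) store(pod *corev1.Pod) {
    k.mu.Lock()
    defer k.mu.Unlock()
    k.items[types.NamespacedName{Namespace: pod.Namespace, Name: pod.Name}] = pod
}

// remove is called once the pod is deleted from the API server, so the map
// does not leak entries.
func (k *knownPodStatuses) remove(namespace, name string) {
    k.mu.Lock()
    defer k.mu.Unlock()
    delete(k.items, types.NamespacedName{Namespace: namespace, Name: name})
}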
This moves to a model where, any time pods are given to a
provider, it uses a DeepCopy as opposed to a reference. If the
provider mutates the pod, this prevents it from causing issues
with the informer cache.
The comparison has to use reflect instead of comparing hashes because
spew prints DeepCopy'd data structures ever so slightly differently.
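For illustration (names are made up; only reflect.DeepEqual is the real call):

package node

import (
    "reflect"

    corev1 "k8s.io/api/core/v1"
)

// statusChanged compares with reflect.DeepEqual rather than hashing a spew
// dump, which renders DeepCopy'd structures slightly differently.
func statusChanged(oldPod, newPod *corev1.Pod) bool {
    return !reflect.DeepEqual(oldPod.Status, newPod.Status)
}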
* Fix error handling for delete pod
- Error handling was looking for a k8s error from the provider, but
providers should be using errdefs.
- Error handling was returning early if pod was not found and deleting
from k8s in all other cases.
* Don't run unit tests twice
* Move tracing exporter registration
This doesn't belong in the library and should be configured by the
consumer of the opencensus package.
* Rename `vkubelet` package to `node`
`vkubelet` does not convey any information to the consumers of the
package.
Really it would be nice to move this package to the root of the repo,
but then you wind up with... interesting... import semantics due to the
repo name... and after thinking about it some, a subpackage is really
not so bad as long as it has a name that conveys some information.
`node` was chosen since this package deals with all the semantics of
operating a node in Kubernetes.