This allows the use of a built-in provider to do things like mark a node
as ready once all the controllers are spun up.
The e2e tests now use this instead of waiting for the pod the VK
provider is deployed in to be marked ready (which waited on
/stats/summary to be serving, and was racy).
If both the metrics routes and the pod routes are attached to the same
mux with the pattern "/", the mux panics on the duplicate registration.
Instead, add the stats handler function to PodHandlerConfig and only set
up the route when the handler is non-nil.
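Roughly, a sketch of the fix; the field and function names here are
illustrative, not the actual virtual-kubelet code:

    import "net/http"

    type PodHandlerConfig struct {
        // ... existing pod-route handlers elided ...

        // StatsSummaryHandler is nil when the provider has no stats to serve.
        StatsSummaryHandler http.HandlerFunc
    }

    func AttachPodRoutes(cfg PodHandlerConfig, mux *http.ServeMux) {
        // Registering the stats routes at "/" on a mux that already has the
        // pod routes at "/" panics with "http: multiple registrations for /",
        // so register a concrete path, and only when a handler was supplied.
        if cfg.StatsSummaryHandler != nil {
            mux.HandleFunc("/stats/summary", cfg.StatsSummaryHandler)
        }
    }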
This solves the race condition as described in
https://github.com/virtual-kubelet/virtual-kubelet/issues/836.
It does this by checking two conditions when the possible race condition
is detected.
If we receive a pod notification from the provider, and it is not
in our known pods list:
1. Is our cache in-sync?
2. Is it known to our pod lister?
The first case can happen because of the order in which we start the
provider and sync our caches. The second case can happen because even
when the cache reports synced, that does not mean all of the callbacks
on the informer have quiesced.
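A sketch of the two checks; the names are illustrative and the real
code is shaped differently:

    import (
        corev1 "k8s.io/api/core/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        listers "k8s.io/client-go/listers/core/v1"
        "k8s.io/client-go/tools/cache"
    )

    // notificationIsRacy reports whether a provider notification for a pod
    // we do not yet track might have raced with startup.
    func notificationIsRacy(pod *corev1.Pod, informer cache.SharedIndexInformer, lister listers.PodLister) bool {
        // 1. The informer cache may not have completed its initial sync.
        if !informer.HasSynced() {
            return true
        }
        // 2. Even a synced cache does not mean the informer's callbacks have
        //    quiesced, so ask the lister whether it knows the pod.
        if _, err := lister.Pods(pod.Namespace).Get(pod.Name); apierrors.IsNotFound(err) {
            return true
        }
        return false
    }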
This slightly changes the behaviour of notifyPods in that it can now
block (especially at startup). We can solve this later by using
something like a fair (ticket?) lock.
This follows suit with the other handlers and returns a NotImplemented
http.HandlerFunc when the lister is nil.
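For example, with an illustrative constructor and helper name:

    import (
        "net/http"

        listers "k8s.io/client-go/listers/core/v1"
    )

    // NotImplemented mirrors what the other handlers return when they have
    // no backing implementation.
    func NotImplemented(w http.ResponseWriter, r *http.Request) {
        http.Error(w, "501 not implemented", http.StatusNotImplemented)
    }

    // HandleRunningPods is an illustrative name: with a nil lister there is
    // nothing to serve, so return NotImplemented instead of failing later.
    func HandleRunningPods(lister listers.PodLister) http.HandlerFunc {
        if lister == nil {
            return NotImplemented
        }
        return func(w http.ResponseWriter, r *http.Request) {
            // ... list pods from the lister and write the response ...
        }
    }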
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
As described in the patch itself, if a node is updated out of band
(e.g. by node-problem-detector (https://github.com/kubernetes/node-problem-detector)),
we will overwrite that change in our typical two-way strategic patch for node
status updates.
The standard kubelet can do this because its flow goes:
apiserver->kubelet: Fetch current node
kubelet->kubelet: Update apiserver's snapshot with local state changes
kubelet->apiserver: patch
We don't have this luxury, as we rely on providers making a callback into us
in order to get the most recent node status. They do not have a way
to do that merge operation themselves, and a two-way merge doesn't
give us enough metadata.
In order to work around this, we perform a three-way merge on behalf of
the user. We do this by stashing the contents of the last update inside
the node object itself. We then fetch that stashed copy back and use it
as the base for the next update.
In the upgrade case, or the case where the VK node has been created by
"someone else", we do not know which attributes were created or
written by us, so we cannot generate a three-way patch.
In this case, we do our best to avoid deleting any attributes
and only overwrite them: we consider all current API server
values to have been written by "someone else", and do not edit them.
This is done by considering the "old node" to be empty.
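A sketch of the patch computation, assuming the stashed last-applied
node is available as oldNode (an empty &corev1.Node{} in the upgrade
case). This uses the strategic patch helpers from k8s.io/apimachinery;
the function name is illustrative:

    import (
        "encoding/json"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/util/strategicpatch"
    )

    func threeWayNodePatch(oldNode, newNode, apiServerNode *corev1.Node) ([]byte, error) {
        oldJS, err := json.Marshal(oldNode) // empty in the upgrade case
        if err != nil {
            return nil, err
        }
        newJS, err := json.Marshal(newNode)
        if err != nil {
            return nil, err
        }
        curJS, err := json.Marshal(apiServerNode)
        if err != nil {
            return nil, err
        }
        // Unlike a two-way patch, the three-way merge preserves fields set
        // out of band (e.g. by node-problem-detector) unless we changed them
        // too; overwrite=true lets our own changes win on conflict, as
        // kubectl apply does.
        return strategicpatch.CreateThreeWayMergePatch(oldJS, newJS, curJS, &corev1.Node{}, true)
    }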
There's a (somewhat) common case where the node status update loop is
busy while a provider is trying to send a node status update. Right
now, we block the provider from creating a notification in this case.
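For illustration only, one common way to avoid blocking the sender is
to coalesce pending updates into a 1-buffered channel; this is an
assumption about the approach, not a description of the actual fix:

    import corev1 "k8s.io/api/core/v1"

    // coalescingNotify never blocks the provider: if the update loop has
    // not consumed the previous node yet, replace it with the newer one.
    // pending must be created with capacity 1, and this sketch assumes a
    // single notifying goroutine.
    func coalescingNotify(pending chan *corev1.Node, n *corev1.Node) {
        for {
            select {
            case pending <- n: // update loop will pick it up
                return
            default:
            }
            select {
            case <-pending: // drop the stale pending update
            default:
            }
        }
    }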
When calling GetContainerLogs(), a type check is performed to see whether the
http.ResponseWriter supports flushing. However, Flush() in http.Flusher
does not return an error, so a type assertion against an interface
whose Flush() does can never succeed.
Fix the flushWriter helper interface so flushing the writer actually works.
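A sketch of the corrected helper, with illustrative names:

    import (
        "io"
        "net/http"
    )

    // flushWriter flushes the response after each write so log streaming
    // is not buffered away from the client.
    type flushWriter struct {
        w io.Writer
        f http.Flusher
    }

    func (fw *flushWriter) Write(p []byte) (int, error) {
        n, err := fw.w.Write(p)
        if fw.f != nil {
            fw.f.Flush() // http.Flusher.Flush takes and returns nothing
        }
        return n, err
    }

    func newFlushWriter(w io.Writer) io.Writer {
        fw := &flushWriter{w: w}
        // Assert against http.Flusher itself, not an interface declaring
        // Flush() error, which no http.ResponseWriter satisfies.
        if f, ok := w.(http.Flusher); ok {
            fw.f = f
        }
        return fw
    }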
If the ping timer is being used, it should be reset with the ping update
interval. If the status update interval is used instead, Ping stops being
called for long enough that Kubernetes marks the node as NotReady.
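A sketch of the corrected loop, with illustrative interval names:

    import (
        "context"
        "time"
    )

    func pingLoop(ctx context.Context, ping func(context.Context) error, pingInterval time.Duration) {
        t := time.NewTimer(pingInterval)
        defer t.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-t.C:
                _ = ping(ctx)
                t.Reset(pingInterval) // previously re-armed with the status update interval
            }
        }
    }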
If a pod is already being gracefully deleted at pod controller startup,
it will not get deleted via the deleteDanglingPods code. This
ensures the normal deletion loop covers that case.
This moves from forcefully deleting pods to deleting them gracefully
via the API server. It waits for the pod to reach a terminal status
before deleting the pod from the API server.
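Roughly, with illustrative names and a recent client-go API:

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // isTerminal mirrors the "terminal status" check described above.
    func isTerminal(pod *corev1.Pod) bool {
        return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
    }

    // finalizeDelete removes the pod from the API server once it is
    // terminal; a grace period of 0 is safe at this point because the
    // provider has already stopped the workload.
    func finalizeDelete(ctx context.Context, c kubernetes.Interface, pod *corev1.Pod) error {
        if !isTerminal(pod) {
            return nil // keep waiting for the provider to report a terminal phase
        }
        grace := int64(0)
        return c.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{
            GracePeriodSeconds: &grace,
        })
    }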
Pods can be updated outside of VK. Right now, if this happens, pod
status updates are dropped because the ResourceVersion from the
provider mismatches what is on the server, breaking
pod status updates.
Since we're the only ones writing to the pod status, we
can do a blind overwrite.
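A sketch of the blind overwrite, with illustrative names and a recent
client-go API:

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // blindStatusUpdate refetches the server's copy of the pod, swaps in
    // the provider's status, and writes it back. The server copy carries
    // the current ResourceVersion, so out-of-band updates no longer make
    // the status write conflict.
    func blindStatusUpdate(ctx context.Context, c kubernetes.Interface, fromProvider *corev1.Pod) error {
        current, err := c.CoreV1().Pods(fromProvider.Namespace).Get(ctx, fromProvider.Name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        // We are the only writer of pod status, so overwriting is safe.
        current.Status = fromProvider.Status
        _, err = c.CoreV1().Pods(current.Namespace).UpdateStatus(ctx, current, metav1.UpdateOptions{})
        return err
    }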
This removes the legacy sync provider interface. All new providers
are expected to implement the async NotifyPods interface.
The legacy sync provider interface creates complexities around
how the deletion flow works, and the mixed sync and async APIs
block us from evolving functionality.
This collapses the NotifyPods interface into the PodLifecycleHandler
interface.
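A sketch of the collapsed interface, with the method set abbreviated:

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
    )

    // NotifyPods moves from an optional extension interface into
    // PodLifecycleHandler itself, so every provider must push pod status
    // changes asynchronously.
    type PodLifecycleHandler interface {
        CreatePod(ctx context.Context, pod *corev1.Pod) error
        UpdatePod(ctx context.Context, pod *corev1.Pod) error
        DeletePod(ctx context.Context, pod *corev1.Pod) error
        // ... GetPod/GetPods-style accessors elided ...

        // NotifyPods asks the provider to invoke cb whenever a pod's
        // status changes.
        NotifyPods(ctx context.Context, cb func(*corev1.Pod))
    }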
This changes the behaviour slightly: rather than immediately exiting on
context cancellation, this calls shutdown and waits for the current
items to finish being worked on before returning to the user.
Allows callers to wait for pod controller exit in addition to readiness.
This means the caller does not have to handle errors from the pod
controller running in a goroutine, since it can wait for exit via
`Done()` and check the error with `Err()`.
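A sketch of the caller side, assuming only the Ready()/Done()/Err()
surface described here:

    import "context"

    // Controller captures just the surface this sketch relies on.
    type Controller interface {
        Run(ctx context.Context, workers int) error
        Ready() <-chan struct{}
        Done() <-chan struct{}
        Err() error
    }

    func runPodController(ctx context.Context, pc Controller, workers int) error {
        go pc.Run(ctx, workers) // its error surfaces via Err(), not the goroutine

        select {
        case <-pc.Ready():
        case <-pc.Done():
            return pc.Err() // exited before ever becoming ready
        }

        // ... controller is ready; do work ...

        <-pc.Done() // returns only once in-flight items have drained
        return pc.Err()
    }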
We poll legacy providers for their pods' status periodically, because
we have no way of knowing when a pod is updated. If a pod somehow goes
missing in the provider, that state must be handled. Currently, we update
the API server and mark the pod as failed, or ignore it.
We introduce a map that can be used to store the pod status. With this,
we do not need to call GetPodStatus immediately after NotifyPods
is called. Instead, we stash the pod passed via NotifyPods
in a map we can access later. In addition, for legacy
providers, the logic to merge the pod and the pod status is
hoisted up to the loop.
Leaks are prevented by deleting a pod's entry in the map as soon
as the pod is deleted from k8s.
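A minimal sketch of such a map, keyed by namespace/name, with
illustrative names:

    import (
        "sync"

        corev1 "k8s.io/api/core/v1"
    )

    // podStatusCache stashes the pods handed to us via NotifyPods. Entries
    // are written by the notification callback and removed when the pod is
    // deleted from k8s, so the map cannot leak.
    type podStatusCache struct {
        mu sync.Mutex
        m  map[string]*corev1.Pod
    }

    func (c *podStatusCache) Store(pod *corev1.Pod) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if c.m == nil {
            c.m = make(map[string]*corev1.Pod)
        }
        c.m[pod.Namespace+"/"+pod.Name] = pod
    }

    func (c *podStatusCache) Delete(namespace, name string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        delete(c.m, namespace+"/"+name)
    }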