It turns out that polling an atomic load (atomic.Load*) in a tight
loop starves the Go scheduler: the goroutine never yields, so we
ended up in a situation where the test would get stuck forever. This
moves to a different model that waits on a condition variable instead
of spinning on atomics in loops.
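Roughly, the pattern looks like this; a minimal sketch with
illustrative names, not the actual virtual-kubelet code:

    package sketch

    import "sync"

    var (
        mu    sync.Mutex
        cond  = sync.NewCond(&mu)
        ready bool
    )

    // wait blocks until ready is true. cond.Wait releases mu and
    // parks the goroutine, so the scheduler can run other work
    // instead of spinning.
    func wait() {
        mu.Lock()
        for !ready {
            cond.Wait()
        }
        mu.Unlock()
    }

    // set flips the flag and wakes every waiter.
    func set() {
        mu.Lock()
        ready = true
        mu.Unlock()
        cond.Broadcast()
    }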
This adds documentation around what may be mutated and what may be
accessed concurrently through the provider API. Previously the API
was ambiguous, which meant providers could return pods and then keep
changing them. This resulted in data races.
Right now, if the tests get stuck on CI, they are terminated after
10 minutes, and we get no output about what went wrong. Instead, this
triggers a panic after 9 minutes on CI, so the run dies with a
traceback before the hard kill.
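A minimal sketch of such a watchdog; the CI env var check and the
helper shape are illustrative assumptions:

    package sketch

    import (
        "os"
        "time"
    )

    // armWatchdog panics before CI's 10-minute kill so the run fails
    // with a traceback. Call it from TestMain.
    func armWatchdog() *time.Timer {
        if os.Getenv("CI") == "" { // assumed CI detection
            return nil
        }
        return time.AfterFunc(9*time.Minute, func() {
            panic("tests did not finish within 9 minutes")
        })
    }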
* Fix the deletion test to actually verify that the pod is deleted
* Fix the update pods test to update a field that is allowed
  to be updated
* Shut down watches after tests
* Do not delete pod statuses on DeletePod in mock_test
  This intentionally leaks pod statuses, but it makes handling race
  conditions with the GetPodStatus callback a lot less complicated
  (see the sketch after this list)
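A sketch of that mock_test idea; the types and the key helper are
illustrative, not the real test code:

    package sketch

    import (
        "context"
        "sync"

        corev1 "k8s.io/api/core/v1"
    )

    type mockProvider struct {
        mu       sync.Mutex
        pods     map[string]*corev1.Pod
        statuses map[string]*corev1.PodStatus
    }

    func key(pod *corev1.Pod) string { return pod.Namespace + "/" + pod.Name }

    // DeletePod removes the pod but keeps its status, so a concurrent
    // GetPodStatus never sees a status vanish mid-test.
    func (m *mockProvider) DeletePod(ctx context.Context, pod *corev1.Pod) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        delete(m.pods, key(pod))
        // Intentionally no delete(m.statuses, key(pod)); the leak is
        // bounded by the test's lifetime.
        return nil
    }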
* Upgrade k8s.io/api
go get k8s.io/api@kubernetes-1.15.2
* Upgrade k8s.io/apimachinery
go get k8s.io/apimachinery@kubernetes-1.15.2
* Upgrade k8s.io/client-go
go get k8s.io/client-go@kubernetes-1.15.2
* Upgrade k8s.io/kubernetes to v1.15.2
go get k8s.io/kubernetes@v1.15.2
This also pins the dependency for
github.com/prometheus/client_golang/prometheus due to a Go bug, and
to satisfy the validation scripts.
The replaces were generated by:
    go get k8s.io/kubernetes@v1.15.2 2> fail
    for i in $(cat fail | grep unknown | cut -f1 -d@ | cut -f2 -d" "); do
      echo "replace ${i} => ${i} kubernetes-1.15.2"
    done
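Each emitted line is a go.mod replace directive; for example (the
module path shown here is just an illustration):

    replace k8s.io/apiserver => k8s.io/apiserver kubernetes-1.15.2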
As far as I can tell, based on the implementation in MockProvider,
NotifyPods is called with the mutated pod. This lets us take a copy
of the Pod object in the NotifyPods callback, so that (eventually) we
don't need to call back into GetPodStatus.
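A sketch of that copy-on-notify idea; the interface shape is
approximate and the subscribe helper is an assumption:

    package sketch

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
    )

    // PodNotifier matches the shape described above; the real
    // interface in virtual-kubelet may differ slightly.
    type PodNotifier interface {
        NotifyPods(context.Context, func(*corev1.Pod))
    }

    // subscribe copies each pod as soon as the callback fires, since
    // the provider may keep mutating its own object afterwards.
    func subscribe(ctx context.Context, p PodNotifier, out chan<- *corev1.Pod) {
        p.NotifyPods(ctx, func(pod *corev1.Pod) {
            out <- pod.DeepCopy()
        })
    }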
This makes sure the update function works correctly when the pod spec
is changed after the pod is running. While writing the test, I
realized we were accessing variables outside of the goroutine the
test workers were running in, with no locks, so I converted those
counters to atomics.
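A minimal sketch of the counter conversion; the counter name is
illustrative:

    package sketch

    import "sync/atomic"

    // updates is shared between test worker goroutines and the
    // test's assertions.
    var updates int64

    func onUpdate() { atomic.AddInt64(&updates, 1) }

    func updateCount() int64 { return atomic.LoadInt64(&updates) }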
This seems to avoid a race condition where, at pod informer startup
time, the reactor doesn't get set up properly. It also refactors the
root command example to start the informers only after everything is
wired up.
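A sketch of that ordering using client-go's informer factory; the
function shape and resync period are illustrative:

    package sketch

    import (
        "context"
        "time"

        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
    )

    // run registers handlers and reactors first and starts informers
    // last, so nothing misses early events. Client setup is omitted.
    func run(ctx context.Context, client kubernetes.Interface) {
        factory := informers.NewSharedInformerFactory(client, time.Minute)
        podInformer := factory.Core().V1().Pods().Informer()
        _ = podInformer // event handlers would be registered here
        factory.Start(ctx.Done())
        factory.WaitForCacheSync(ctx.Done())
    }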
Otherwise the variable is evaluated before it is even set (assuming
it is not set from the CLI).
skaffold also requires bin/e2e/virtual-kubelet to work on its own.