This changes the behaviour slightly: rather than exiting immediately on
context cancellation, this calls shutdown and waits for the items currently
being worked on to finish before returning to the user.
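A minimal sketch of that pattern, assuming a client-go workqueue and a single worker goroutine; the function and helper names here are illustrative, not the actual pod controller code:

```go
package controller

import (
	"context"
	"sync"

	"k8s.io/client-go/util/workqueue"
)

// runController processes queue items until the context is cancelled, then
// drains in-flight work instead of returning immediately.
func runController(ctx context.Context, q workqueue.Interface, process func(item interface{})) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			item, shutdown := q.Get()
			if shutdown {
				return
			}
			process(item)
			q.Done(item)
		}
	}()

	<-ctx.Done()
	// Rather than returning here, shut the queue down and wait for the
	// item currently being worked on to finish.
	q.ShutDown()
	wg.Wait()
}
```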
Allows callers to wait for pod controller exit in addition to readiness.
This means the caller does not have to handle errors from the pod
controller running in a goroutine, since it can wait for exit via `Done()`
and check the error with `Err()`.
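A sketch of the caller-side pattern this enables; the interface below only mirrors the methods named here (`Run`'s exact signature is an assumption), it is not the real `PodController` type:

```go
package example

import "context"

// podController captures the surface this change describes: readiness,
// exit notification, and error retrieval.
type podController interface {
	Run(ctx context.Context, workers int) error
	Ready() <-chan struct{}
	Done() <-chan struct{}
	Err() error
}

// waitForController starts the controller and blocks until it is ready,
// it exits, or the context is cancelled. Because the controller exposes
// Done() and Err(), the caller never has to collect the error from the
// goroutine itself.
func waitForController(ctx context.Context, pc podController, workers int) error {
	go pc.Run(ctx, workers) // the run error is surfaced via Err() below

	select {
	case <-pc.Ready():
		return nil
	case <-pc.Done():
		return pc.Err()
	case <-ctx.Done():
		return ctx.Err()
	}
}
```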
This caches the downloaded Go modules. The cache is invalidated based on
a hash of go.mod and go.sum. The test step showed a reduction
from 1:30 to 1:00, and the e2e tests went from 8:30 to 5:00.
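To illustrate the invalidation rule only (the CI config itself uses its own checksum helpers rather than this code), a cache key derived from both files might look like:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// cacheKey derives a module-cache key from go.mod and go.sum, so the
// cache is invalidated whenever either file changes.
func cacheKey() (string, error) {
	h := sha256.New()
	for _, f := range []string{"go.mod", "go.sum"} {
		b, err := os.ReadFile(f)
		if err != nil {
			return "", err
		}
		h.Write(b)
	}
	return fmt.Sprintf("go-mod-%x", h.Sum(nil)), nil
}

func main() {
	key, err := cacheKey()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(key)
}
```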
* Rename VK to chewong for development purposes
* Rename basic_test.go to basic.go
* Add e2e.go and suite.go
* Disable tests in node.go
* End to end tests are now importable as a testing suite (see the sketch after this list)
* Remove 'test' from test files
* Add documentation
* Rename chewong back to virtual-kubelet
* Change 'Testing Suite' to 'Test Suite'
* Add the ability to skip certain tests
* Add unit tests for suite.go
* Add README.md for importable e2e test suite
* VK implementation has to be based on VK v1.0.0
* Stricter checks on validating test functions
* Move certain files back to internal folder
* Add WatchTimeout as a config field
* Add slight modifications
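A hypothetical sketch of what an importable suite like this can look like: a provider repository imports the package, supplies a config (WatchTimeout is the field mentioned above; everything else is assumed), and test methods are discovered and validated via reflection, with the option to skip some. None of the identifiers below are the actual virtual-kubelet e2e API.

```go
package e2e

import (
	"reflect"
	"strings"
	"testing"
	"time"
)

type Config struct {
	// WatchTimeout bounds how long the suite waits on pod watches.
	WatchTimeout time.Duration
}

type Suite struct {
	Config Config
	Skip   map[string]bool
}

// Run discovers exported methods named Test* with the signature
// func(*testing.T) and runs each as a subtest. Methods with any other
// signature are rejected, which is the stricter validation step.
func (s *Suite) Run(t *testing.T) {
	v := reflect.ValueOf(s)
	for i := 0; i < v.NumMethod(); i++ {
		name := v.Type().Method(i).Name
		if !strings.HasPrefix(name, "Test") || s.Skip[name] {
			continue
		}
		fn, ok := v.Method(i).Interface().(func(*testing.T))
		if !ok {
			t.Fatalf("%s does not have signature func(*testing.T)", name)
		}
		t.Run(name, fn)
	}
}

// TestPodLifecycle is a placeholder showing the shape the suite expects
// from its test methods.
func (s *Suite) TestPodLifecycle(t *testing.T) {
	_ = s.Config.WatchTimeout // a real test would use this for its watches
}
```

A consuming provider would then drive it from an ordinary Go test, e.g. `(&e2e.Suite{Config: e2e.Config{WatchTimeout: 5 * time.Minute}}).Run(t)`.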
We poll legacy providers for their pods' status periodically, because
we have no way of knowing when a pod is updated. If a pod somehow goes
missing in the provider, that state must be handled: currently, we update
the API server and mark the pod as failed, or ignore it.
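A sketch of that poll loop, under the assumptions that a nil status means the pod went missing in the provider and that the callbacks stand in for the real API-server update logic:

```go
package poll

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
)

// legacyProvider captures the one call the poll loop needs; returning a nil
// status is used here to mean "the pod went missing in the provider".
type legacyProvider interface {
	GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
}

// pollPods periodically asks the provider for the status of each known pod.
func pollPods(
	ctx context.Context,
	p legacyProvider,
	interval time.Duration,
	knownPods func() []*corev1.Pod,
	markFailed func(*corev1.Pod),
	updateStatus func(*corev1.Pod, *corev1.PodStatus),
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, pod := range knownPods() {
				status, err := p.GetPodStatus(ctx, pod.Namespace, pod.Name)
				if err != nil || status == nil {
					// Pod went missing in the provider: mark it failed
					// in the API server (or ignore it, per the above).
					markFailed(pod)
					continue
				}
				updateStatus(pod, status)
			}
		}
	}
}
```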
We introduce a map that can be used to store the pod status. With this,
we do not need to call GetPodStatus immediately after NotifyPods
is called. Instead, we stash the pod passed via NotifyPods
in a map we can access later. In addition, for legacy
providers, the logic to merge the pod and the pod status is
hoisted up to the loop.
Leaks are prevented by deleting the map entry as soon
as the pod is deleted from Kubernetes.
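A minimal sketch of such a map; the key format and method names are illustrative, not the actual implementation:

```go
package statuscache

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// knownPods stashes the last pod passed via NotifyPods so GetPodStatus does
// not have to be called again immediately afterwards.
type knownPods struct {
	mu   sync.Mutex
	pods map[string]*corev1.Pod
}

func key(namespace, name string) string { return namespace + "/" + name }

// notify records the pod handed to us by the provider's NotifyPods callback.
func (k *knownPods) notify(pod *corev1.Pod) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.pods == nil {
		k.pods = make(map[string]*corev1.Pod)
	}
	k.pods[key(pod.Namespace, pod.Name)] = pod
}

// get returns the last stashed pod, if any, for later status merging.
func (k *knownPods) get(namespace, name string) (*corev1.Pod, bool) {
	k.mu.Lock()
	defer k.mu.Unlock()
	pod, ok := k.pods[key(namespace, name)]
	return pod, ok
}

// forget removes the entry as soon as the pod is deleted from Kubernetes,
// which is what prevents the map from leaking.
func (k *knownPods) forget(namespace, name string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	delete(k.pods, key(namespace, name))
}
```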
This moves the event handler registration to after the cache is in sync.
It also makes it so we can use the log object from the context,
rather than having to use the global logger.
The race condition of the cache starting while the reactor
is being added won't exist, because we wait for the cache
to start up and go in sync prior to adding it.
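The ordering itself can be expressed with the standard client-go helpers; this is only a sketch of the pattern, not the pod controller's actual code:

```go
package informersync

import (
	"context"
	"errors"

	"k8s.io/client-go/tools/cache"
)

// registerAfterSync waits for the shared informer's cache to sync before
// adding the event handler, so there is no window where a handler (or a
// test reactor) is attached while the cache is still starting.
func registerAfterSync(ctx context.Context, informer cache.SharedIndexInformer, handler cache.ResourceEventHandler) error {
	if ok := cache.WaitForCacheSync(ctx.Done(), informer.HasSynced); !ok {
		return errors.New("failed to wait for cache to sync")
	}
	informer.AddEventHandler(handler)
	return nil
}
```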
This moves to a model where, any time pods are given to a
provider, a DeepCopy is used as opposed to a reference. If the
provider mutates the pod, this prevents it from causing issues
with the informer cache.
It has to use reflect instead of comparing hashes because
spew prints DeepCopy'd data structures ever so slightly differently.
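A sketch of both points, using a stand-in provider interface (the real one is larger) and reflect.DeepEqual to detect whether the provider mutated its copy:

```go
package deepcopy

import (
	"context"
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// provider is a stand-in for the provider interface; CreatePod matches the
// general shape described above, not the exact virtual-kubelet type.
type provider interface {
	CreatePod(ctx context.Context, pod *corev1.Pod) error
}

// createPod hands the provider a DeepCopy so that, if the provider mutates
// the pod, the informer cache's copy is untouched. Whether anything was
// changed is detected with reflect.DeepEqual rather than by comparing
// pretty-printed hashes.
func createPod(ctx context.Context, p provider, cached *corev1.Pod) (mutated bool, err error) {
	podCopy := cached.DeepCopy()
	if err := p.CreatePod(ctx, podCopy); err != nil {
		return false, err
	}
	return !reflect.DeepEqual(cached, podCopy), nil
}
```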
If the informers start at the same time as createPods,
we can get into a situation where the pod seems to get
"lost". Instead, we wait for the informer to sync
prior to the createPod event.
This also moves to one informer as a micro-optimization in
the tests.
The lifecycle test had a hot loop, where it would run a never-yielding
function while processing was going on elsewhere. This inserts
a sleep. A sleep is used rather than a yield to be kind to
people's battery life.
It turns out that running atomic.Read(...) in a tight loop breaks
Go: the goroutine would never yield control to the scheduler,
so we ended up in a situation where the test would get
stuck forever. This moves to a different model, in which
there is a condition variable instead of atomics in loops.
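A minimal sketch of the condition-variable approach (not the actual test code): the waiter parks on `sync.Cond` instead of spinning on an atomic, so it always yields to the scheduler.

```go
package waiter

import "sync"

// doneSignal replaces a busy-wait on an atomic with a condition variable,
// so the waiting goroutine parks instead of spinning forever.
type doneSignal struct {
	mu   sync.Mutex
	cond *sync.Cond
	done bool
}

func newDoneSignal() *doneSignal {
	d := &doneSignal{}
	d.cond = sync.NewCond(&d.mu)
	return d
}

// set marks the work as done and wakes any waiters.
func (d *doneSignal) set() {
	d.mu.Lock()
	d.done = true
	d.mu.Unlock()
	d.cond.Broadcast()
}

// wait blocks until set is called; cond.Wait releases the lock and parks
// the goroutine, so there is no tight loop that never yields.
func (d *doneSignal) wait() {
	d.mu.Lock()
	for !d.done {
		d.cond.Wait()
	}
	d.mu.Unlock()
}
```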
This adds documentation around what is allowed to be mutated and
what may be accessed concurrently from the provider API. Previously,
the API was ambiguous, which meant providers could return pods
and then change them. This resulted in data races occurring.
Right now, if the tests get stuck on CI, they are terminated
after 10 minutes, which also means we get zero output about
what went wrong.
Instead, this triggers a panic after 9 minutes on CI.
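One way to wire that up (a sketch, assuming a `CI` environment variable and a call from `TestMain`; the real change may differ) is a watchdog timer that panics before the CI-level kill, so the run ends with a stack trace instead of silence:

```go
package watchdog

import (
	"os"
	"time"
)

// startCIWatchdog panics shortly before the CI-level kill so that, instead
// of being terminated with zero output, the test binary dies with a panic
// and a stack trace. Returns the timer so a caller can stop it after the
// tests finish.
func startCIWatchdog() *time.Timer {
	if os.Getenv("CI") == "" {
		return nil
	}
	return time.AfterFunc(9*time.Minute, func() {
		panic("tests still running after 9 minutes; panicking to get a stack trace")
	})
}
```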
* Fix the deletion test to actually test the pod is deleted
* Fix the update pods test to update a value which is allowed
to be updated
* Shut down watches after tests
* Do not delete pod statuses on DeletePod in mock_test
This intentionally leaks pod statuses, but it makes handling race
conditions with the GetPodStatus callback a lot less complicated
(see the sketch below)
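A sketch of that mock behaviour, with illustrative field and method names rather than the actual mock_test code:

```go
package mock

import (
	"context"
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// mockProvider keeps pods and their statuses in separate maps. DeletePod
// removes the pod but deliberately leaves its status behind, so a racing
// GetPodStatus callback never observes a half-deleted state.
type mockProvider struct {
	mu       sync.Mutex
	pods     map[string]*corev1.Pod
	statuses map[string]*corev1.PodStatus
}

func (m *mockProvider) DeletePod(ctx context.Context, pod *corev1.Pod) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pods, pod.Namespace+"/"+pod.Name)
	// Intentionally do NOT delete from m.statuses: the status entry is
	// leaked for the lifetime of the test to keep GetPodStatus race-free.
	return nil
}

func (m *mockProvider) GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.statuses[namespace+"/"+name], nil
}
```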