814 Commits

Author SHA1 Message Date
Brian Goff
bb9ff1adf3 Adds Done() and Err() to pod controller (#735)
Allows callers to wait for pod controller exit in addition to readiness.
This means the caller does not have to deal with handling errors from the pod
controller running in a goroutine, since it can wait for exit via `Done()`
and check the error with `Err()`.
2019-09-10 17:44:19 +01:00
Brian Goff
db146a0e01 Merge pull request #761 from sargun/cache-deps
Cache Downloaded Go Modules
2019-09-06 15:20:37 -07:00
Ernest Wong
fdb0c805f7 Add more unit tests for #584 2019-09-05 10:48:35 -07:00
Ernest Wong
dc7ff44303 Add unit tests for #584 2019-09-05 09:49:41 -07:00
Sargun Dhillon
e7a36c3505 Cache Downloaded Go Modules
This caches the downloaded Go modules. It invalidates them based on
a hash of go.mod and go.sum. The test step showed a reduction
from 1:30 to 1:00, and the e2e tests from 8:30 to 5 minutes.
2019-09-05 09:23:13 -07:00
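The invalidation idea in the commit above (a cache key derived from the contents of go.mod and go.sum) can be illustrated with a small sketch. The commit itself relies on the CI system's caching; the `cacheKey` function, the `go-mod-` prefix, and the sample file contents below are assumptions made purely for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a cache key from the contents of go.mod and go.sum.
// Any change to either file yields a different key, so a stale cache
// entry is simply never looked up again.
func cacheKey(goMod, goSum []byte) string {
	h := sha256.New()
	h.Write(goMod)
	h.Write([]byte{0}) // separator between the two files
	h.Write(goSum)
	return "go-mod-" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical file contents; a real job would read them from disk.
	goMod := []byte("module example.com/demo\n\ngo 1.12\n")
	goSum := []byte("example.com/dep v1.0.0 h1:placeholder=\n")
	fmt.Println(cacheKey(goMod, goSum))
}
```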
Ernest Wong
f10a16aed7 Importable End-To-End Test Suite (#758)
* Rename VK to chewong for development purpose

* Rename basic_test.go to basic.go

* Add e2e.go and suite.go

* Disable tests in node.go

* End to end tests are now importable as a testing suite

* Remove 'test' from test files

* Add documentation

* Rename chewong back to virtual-kubelet

* Change 'Testing Suite' to 'Test Suite'

* Add the ability to skip certain tests

* Add unit tests for suite.go

* Add README.md for importable e2e test suite

* VK implementation has to be based on VK v1.0.0

* Stricter checks on validating test functions

* Move certain files back to internal folder

* Add WatchTimeout as a config field

* Add slight modifications
2019-09-04 22:25:43 +01:00
Sargun Dhillon
da57373abb Test pods going missing while they're running in legacy providers (#759)
We poll legacy providers for their pod(s) status periodically. This is because
we have no way of knowing when the pod is updated. If the pod somehow goes
missing in the provider, that state must be handled. Currently, we update
API server, and mark the pod as failed, or ignore it.
2019-09-04 22:16:14 +01:00
Sargun Dhillon
33df981904 Have NotifyPods store the pod status in a map (#751)
We introduce a map that can be used to store the pod status. With this,
we do not need to call GetPodStatus immediately after NotifyPods
is called. Instead, we stash the pod passed via NotifyPods
in a map we can access later. In addition, for legacy
providers, the logic to merge the pod and the pod status is
hoisted up to the loop.

It prevents leaks by deleting the entry in the map as soon
as the pod is deleted from k8s.
2019-09-04 20:14:34 +01:00
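The map-based approach in the commit above can be sketched as follows. This is a simplified illustration under stated assumptions: the `pod` struct, `statusStore`, and its methods are hypothetical and much smaller than the real Kubernetes types; only the idea (store on notify, read later, delete on pod deletion) comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// pod is a hypothetical, simplified stand-in for a Kubernetes pod object.
type pod struct {
	Namespace string
	Name      string
	Phase     string
}

// statusStore keeps the most recent pod reported by the provider.
type statusStore struct {
	mu   sync.Mutex
	pods map[string]pod
}

func newStatusStore() *statusStore {
	return &statusStore{pods: make(map[string]pod)}
}

func podKey(namespace, name string) string { return namespace + "/" + name }

// notify is what a NotifyPods-style callback would invoke: it stashes the pod
// so that no immediate status lookup against the provider is needed.
func (s *statusStore) notify(p pod) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pods[podKey(p.Namespace, p.Name)] = p
}

// lookup returns the stashed pod, if one exists.
func (s *statusStore) lookup(namespace, name string) (pod, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.pods[podKey(namespace, name)]
	return p, ok
}

// forget removes the entry once the pod is deleted, so the map does not leak.
func (s *statusStore) forget(namespace, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.pods, podKey(namespace, name))
}

func main() {
	store := newStatusStore()
	store.notify(pod{Namespace: "default", Name: "web", Phase: "Running"})

	p, ok := store.lookup("default", "web")
	fmt.Println(p.Phase, ok) // prints "Running true"

	store.forget("default", "web")
	_, ok = store.lookup("default", "web")
	fmt.Println(ok) // prints "false"
}
```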
Brian Goff
ecf6e45bfc Merge pull request #755 from sargun/fix-golang-lint
Fix golang lint
2019-09-03 11:25:21 -07:00
Sargun Dhillon
3f85705461 Upgrade linter, and move away from incremental linting
Incremental linting doesn't seem to catch issues correctly. This
runs the linters in a more standard way.
2019-09-03 11:00:33 -07:00
Sargun Dhillon
7133a372d6 Mark current linting errors as non-errors
This is basically claiming linting bankruptcy. It marks all of the
issues we had up until this point as nolint.
2019-09-03 11:00:33 -07:00
Sargun Dhillon
5949e6279d Miscellaneous cleanup for linting 2019-09-03 11:00:33 -07:00
Sargun Dhillon
9cce8640a5 Fix linting errors in node/pod_test.go
This moves away from defining pods independently. It moves pod (spec)
generation to an independent function.
2019-09-03 11:00:33 -07:00
Sargun Dhillon
7accddcaf4 Fix linting errors in node/podcontroller.go 2019-09-03 11:00:33 -07:00
Ernest Wong
ee31118596 Update docs on virtual-kubelet.io (#754)
* Update website content

* Add PodLifecycleHandler
2019-09-03 10:52:23 -07:00
Brian Goff
2507f57f97 Merge pull request #732 from sargun/move-around-reactor
Move location of eventhandler registration
2019-09-03 10:44:52 -07:00
Sargun Dhillon
9a461a61ad Bump the Circle CI build job to a resource_class of xlarge (#722) 2019-09-02 07:11:11 +01:00
Sargun Dhillon
9443e32ae7 Merge pull request #742 from sargun/fix-mock-provider
Fix mock_test DeletePod to store updated pod status
2019-08-25 10:52:56 -07:00
Sargun Dhillon
43ee086360 Fix mock_test DeletePod to store updated pod status 2019-08-25 10:42:35 -07:00
Sargun Dhillon
0c6de30684 Merge pull request #746 from 928234269/patch2
fix typo in doc.go
2019-08-21 08:29:46 -07:00
928234269
7305c08d7e fix typo in doc.go
Signed-off-by: 928234269 <longfei.shang@daocloud.io>
2019-08-20 18:44:11 +08:00
Sargun Dhillon
ccb6713b86 Move location of eventhandler registration
This moves the event handler registration until after the cache
is in-sync.

It makes it so we can use the log object from the context,
rather than having to use the global logger.

The race condition in which the cache starts while the reactor
is being added can no longer occur, because we wait for the cache
to start up and sync before adding the reactor.
2019-08-18 08:20:49 -07:00
Brian Goff
2f2625c8e2 Merge pull request #734 from sargun/do-not-change-pods
Do not mutate pods, nor hand off pod references to provider
2019-08-15 10:58:39 -07:00
Sargun Dhillon
69f1186713 Do not mutate pods, nor hand off pod references to provider
This moves to a model where any time that pods are given to a
provider, a DeepCopy is passed, as opposed to a reference. If the
provider mutates the pod, the mutation cannot cause issues
with the informer cache.

It has to use reflect instead of comparing the hashes because
spew prints DeepCopy'd data structures ever so slightly differently.
2019-08-15 09:59:01 -07:00
Sargun Dhillon
89d88a17ed Add a generic reactor to lifecycle_test to bump resource version (#733)
All updates in our tests should have the behaviour that best
reflects what API server does.
2019-08-15 08:46:38 +01:00
Brian Goff
cad19238fd Merge pull request #736 from sargun/fix-race
Wait for the informer to become in sync before starting tests
2019-08-14 11:44:21 -07:00
Sargun Dhillon
bc2f6e0dc4 Wait for the informer to become in sync before starting tests
If the informers are starting at the same time as createPods,
then we can get into a situation where the pod seems to get
"lost". Instead, we wait for the informer to get into sync
prior to the createpod event.

This also moves to one informer as a micro-optimization in
the tests.
2019-08-14 07:03:53 -07:00
Brian Goff
47f5aa45df Merge pull request #727 from ethan-daocloud/patch-2
cleanup: fix some typos in node.go
2019-08-13 12:00:43 -07:00
Sargun Dhillon
de238ee280 Merge pull request #731 from sargun/document-api
Add documentation to the provider API about concurrency / mutability
2019-08-13 11:58:00 -07:00
Brian Goff
569706f371 Merge branch 'master' into document-api 2019-08-13 11:47:04 -07:00
Guangming Wang
cb307df71e cleanup: fix some typos in node.go
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-13 11:39:00 -07:00
Sargun Dhillon
40a4b54ca7 Merge pull request #728 from sargun/im-an-idiot
Remove usage of atomics in tests
2019-08-13 11:34:55 -07:00
Sargun Dhillon
edc0991c0c Fix hotloop around scheduling in lifecycle_test
Lifecycle test had a hotloop, where it would run a never-yielding
function while processing was going on elsewhere. This inserts
a sleep. A sleep is used rather than a yield to be kind to
people's battery life.
2019-08-13 11:25:21 -07:00
Sargun Dhillon
fbed4ca702 Remove usage of atomics
It turns out that running an atomic load in a tight loop breaks
Golang. The goroutine would never yield control to the scheduler,
so we ended up getting into a situation where the test would get
stuck forever. This moves to a different model, in which
there is a condition var, instead of atomics in loops.
2019-08-13 11:25:21 -07:00
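The move from spinning on atomics to a condition variable, as described in the commit above, can be sketched with `sync.Cond` from the standard library. The `counter` type and its methods are hypothetical; the tests in the repository may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared value that a test waits on.
type counter struct {
	mu   sync.Mutex
	cond *sync.Cond
	n    int
}

func newCounter() *counter {
	c := &counter{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// inc increments the counter and wakes every waiter.
func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
	c.cond.Broadcast()
}

// waitFor blocks until the counter reaches target and returns its value.
// Wait releases the mutex and parks the goroutine, so nothing spins.
func (c *counter) waitFor(target int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.n < target {
		c.cond.Wait()
	}
	return c.n
}

func main() {
	c := newCounter()
	for i := 0; i < 3; i++ {
		go c.inc()
	}
	fmt.Println(c.waitFor(3)) // prints "3"
}
```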
Sargun Dhillon
9b27eb83fe Make mock_test follow the aforementioned documentation 2019-08-13 10:30:02 -07:00
Sargun Dhillon
3b3bf3ff20 Add documentation to the provider API about concurrency / mutability
This adds documentation around what is allowed to be mutated and
what may be accessed concurrently from the provider API. Previously,
the API was ambiguous, and that meant providers could return pods
and change them. This resulted in data races occurring.
2019-08-13 10:29:12 -07:00
Sargun Dhillon
75a399f6f4 Merge pull request #724 from sargun/upgrade-k8s-v2
Upgrade k8s
2019-08-13 03:08:37 -07:00
Pires
f0a0e8cbfe Merge branch 'master' into upgrade-k8s-v2 2019-08-13 10:43:00 +01:00
Sargun Dhillon
32ff40eb56 Merge pull request #720 from sargun/set-test-timeout
Set timeout for tests on CI to 9 minutes
2019-08-12 14:53:09 -07:00
Sargun Dhillon
65c5446c94 Set timeout for tests on CI to 9 minutes
Right now, if the tests get stuck (on CI), they are terminated
after 10 minutes. This also means that we get no output about
what went wrong.

Instead, this triggers a panic after 9 minutes on CI.
2019-08-12 13:45:30 -07:00
Brian Goff
cafcdeeefa Merge pull request #723 from sargun/lifecycle-test-fixes
Array of minor fixups to lifecycle tests
2019-08-12 13:22:51 -07:00
Sargun Dhillon
5c2b682cdc Array of minor fixups to lifecycle tests
* Fix the deletion test to actually test the pod is deleted
* Fix the update pods test to update a value which is allowed
  to be updated
* Shut down watches after tests
* Do not delete pod statuses on DeletePod in mock_test

This intentionally leaks pod statuses, but it makes the situation
a lot less complicated around handling race conditions with
the GetPodStatus callback
2019-08-12 12:10:29 -07:00
Sargun Dhillon
e1c3bc3151 Merge pull request #725 from sargun/fix-race-conditions-in-node-test
Fix race conditions in node_test
2019-08-12 11:43:06 -07:00
Sargun Dhillon
5ac33e4b0a Fix race conditions in node_test 2019-08-12 11:33:48 -07:00
Sargun Dhillon
42656aae2f Merge pull request #719 from ethan-daocloud/patch-1
cleanup: fix misspelled words in error message
2019-08-12 11:09:35 -07:00
Brian Goff
10b291dba1 Merge branch 'master' into patch-1 2019-08-12 10:48:15 -07:00
Brian Goff
9d90c599e7 Merge pull request #721 from sargun/fix-race-condition
Fix race condition around worker ID generation in podcontroller.go
2019-08-12 10:43:32 -07:00
Sargun Dhillon
82de7f02c4 Upgrade Kubernetes e2e test cluster to 1.15.2 2019-08-12 10:30:04 -07:00
Sargun Dhillon
ad6cd7d552 Upgrade K8s
* Upgrade k8s.io/api
  go get k8s.io/api@kubernetes-1.15.2
* Upgrade k8s.io/apimachinery
  go get k8s.io/apimachinery@kubernetes-1.15.2
* Upgrade k8s.io/client-go
  go get k8s.io/client-go@kubernetes-1.15.2
* Upgrade k8s.io/kubernetes to v1.15.2
  go get k8s.io/kubernetes@v1.15.2

This also locks the dependency for
github.com/prometheus/client_golang/prometheus due to a golang bug, and to
please the validation scripts.

The replaces were generated by:
go get k8s.io/kubernetes@v1.15.2 2> fail
for i in $(cat fail|grep unknown|cut -f1 -d@|cut -f2 -d" ")
  do echo "replace ${i} => ${i} kubernetes-1.15.2"
done
2019-08-12 10:29:19 -07:00
Sargun Dhillon
a28969355e Fix race condition around worker ID generation in podcontroller.go 2019-08-12 10:27:21 -07:00