Commit Graph

998 Commits

Author SHA1 Message Date
Sargun Dhillon
9a461a61ad Bump the Circle CI build job to an resource_class of xlarge (#722) 2019-09-02 07:11:11 +01:00
Sargun Dhillon
9443e32ae7 Merge pull request #742 from sargun/fix-mock-provider
Fix mock_test DeletePod to store updated pod status
2019-08-25 10:52:56 -07:00
Sargun Dhillon
43ee086360 Fix mock_test DeletePod to store updated pod status 2019-08-25 10:42:35 -07:00
Sargun Dhillon
0c6de30684 Merge pull request #746 from 928234269/patch2
fix tyop in doc.go
2019-08-21 08:29:46 -07:00
928234269
7305c08d7e fix tyop in doc.go
Signed-off-by: 928234269 <longfei.shang@daocloud.io>
2019-08-20 18:44:11 +08:00
Sargun Dhillon
ccb6713b86 Move location of eventhandler registration
This moves the event handler registration until after the cache
is in-sync.

It makes it so we can use the log object from the context,
rather than having to use the global logger

The race condition in which the cache starts while the reactor
is being added won't exist, because we wait for the cache
to start up / go in sync prior to adding it.
2019-08-18 08:20:49 -07:00
Brian Goff
2f2625c8e2 Merge pull request #734 from sargun/do-not-change-pods
Do not mutate pods, nor hand off pod references to provider
2019-08-15 10:58:39 -07:00
Sargun Dhillon
69f1186713 Do not mutate pods, nor hand off pod references to provider
This moves to a model where any time that pods are given to a
provider, it uses a DeepCopy, as opposed to a reference. If the
provider mutates the pod, the copy prevents the mutation from
causing issues with the informer cache.

It has to use reflect instead of comparing the hashes because
spew prints DeepCopy'd data structures ever so slightly differently.
2019-08-15 09:59:01 -07:00
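
The copy-before-hand-off model described in 69f1186713 can be illustrated with a minimal Go sketch. Everything named here is hypothetical: `Pod` is a stand-in struct rather than the Kubernetes pod type, and `handOff` and `mutatingProvider` are invented for illustration. It only shows why giving the provider a deep copy keeps the cached object intact, and uses `reflect.DeepEqual` for the comparison, as the commit message suggests.

```go
package main

import (
	"fmt"
	"reflect"
)

// Pod is a hypothetical stand-in for a Kubernetes pod object.
type Pod struct {
	Name   string
	Labels map[string]string
}

// DeepCopy returns a copy that shares no memory with the original.
func (p *Pod) DeepCopy() *Pod {
	if p == nil {
		return nil
	}
	out := &Pod{Name: p.Name}
	if p.Labels != nil {
		out.Labels = make(map[string]string, len(p.Labels))
		for k, v := range p.Labels {
			out.Labels[k] = v
		}
	}
	return out
}

// mutatingProvider is a hypothetical provider that modifies the pod it receives.
func mutatingProvider(p *Pod) {
	if p.Labels == nil {
		p.Labels = map[string]string{}
	}
	p.Labels["touched"] = "true"
}

// handOff passes a deep copy of the cached pod to the provider and
// reports whether the cached pod is unchanged afterwards.
func handOff(cached *Pod, provider func(*Pod)) bool {
	before := cached.DeepCopy()
	provider(cached.DeepCopy())
	return reflect.DeepEqual(before, cached)
}

func main() {
	cached := &Pod{Name: "web", Labels: map[string]string{"app": "web"}}
	fmt.Println("cache unchanged:", handOff(cached, mutatingProvider))
}
```

Passing `cached` directly instead of `cached.DeepCopy()` would let the provider's write show up in the cached object, which is the situation the commit is avoiding.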
Sargun Dhillon
89d88a17ed Add a generic reactor to lifecycle_test to bump resource version (#733)
All updates in our tests should have the behaviour that best
reflects what API server does.
2019-08-15 08:46:38 +01:00
Brian Goff
cad19238fd Merge pull request #736 from sargun/fix-race
Wait for the informer to become in sync before starting tests
2019-08-14 11:44:21 -07:00
Sargun Dhillon
bc2f6e0dc4 Wait for the informer to become in sync before starting tests
If the informers are starting at the same time as createPods,
then we can get into a situation where the pod seems to get
"lost". Instead, we wait for the informer to get into sync
prior to the createpod event.

This also moves to one informer as a microoptimization in
the tests.
2019-08-14 07:03:53 -07:00
Brian Goff
47f5aa45df Merge pull request #727 from ethan-daocloud/patch-2
cleanup: fix some typos in node.go
2019-08-13 12:00:43 -07:00
Sargun Dhillon
de238ee280 Merge pull request #731 from sargun/document-api
Add documentation to the provider API about concurrency / mutability
2019-08-13 11:58:00 -07:00
Brian Goff
569706f371 Merge branch 'master' into document-api 2019-08-13 11:47:04 -07:00
Guangming Wang
cb307df71e cleanup: fix some typos in node.go
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-13 11:39:00 -07:00
Sargun Dhillon
40a4b54ca7 Merge pull request #728 from sargun/im-an-idiot
Remove usage of atomics in tests
2019-08-13 11:34:55 -07:00
Sargun Dhillon
edc0991c0c Fix hotloop around scheduling in lifecycle_test
Lifecycle test had a hotloop, where it would run a never-yielding
function while processing was going on elsewhere. This inserts
a sleep. A sleep is used rather than a yield to be kind to
people's battery life.
2019-08-13 11:25:21 -07:00
Sargun Dhillon
fbed4ca702 Remove usage of atomics
It turns out that running atomic.Read(...) in a tight loop breaks
Golang. The goroutine would never yield control over the scheduler,
so we ended up getting into a situation where the test would get
stuck forever. This moves to a different model, in which
there is a condition var, instead of atomics in loops.
2019-08-13 11:25:21 -07:00
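
The switch from polling atomics to a condition variable, as described in fbed4ca702, might look roughly like the following. This is a sketch under assumptions, not the code from the commit: the `counter` type and its methods are hypothetical. The waiter blocks in `sync.Cond.Wait` until it is woken, instead of spinning on a load in a tight loop.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical test helper: a count guarded by a mutex,
// with a condition variable used to wake waiting goroutines.
type counter struct {
	mu   sync.Mutex
	cond *sync.Cond
	n    int
}

func newCounter() *counter {
	c := &counter{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// inc increments the count and wakes every waiter so it can re-check.
func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
	c.cond.Broadcast()
}

// waitFor blocks until the count is at least target and returns the count.
// Wait releases the mutex while blocked, so other goroutines can call inc.
func (c *counter) waitFor(target int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.n < target {
		c.cond.Wait()
	}
	return c.n
}

func main() {
	c := newCounter()
	for i := 0; i < 3; i++ {
		go c.inc()
	}
	fmt.Println("observed:", c.waitFor(3))
}
```

The `for` loop around `Wait` matters: a woken goroutine re-checks the condition under the lock before proceeding.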
Sargun Dhillon
9b27eb83fe Make mock_test follow the aformentioned documentation 2019-08-13 10:30:02 -07:00
Sargun Dhillon
3b3bf3ff20 Add documentation to the provider API about concurrency / mutability
This adds documentation around what is allowed to be mutated and
what may be accessed concurrently from the provider API. Previously,
the API was ambiguous, and that meant providers could return pods
and change them. This resulted in data races occurring.
2019-08-13 10:29:12 -07:00
Sargun Dhillon
75a399f6f4 Merge pull request #724 from sargun/upgrade-k8s-v2
Upgrade k8s
2019-08-13 03:08:37 -07:00
Pires
f0a0e8cbfe Merge branch 'master' into upgrade-k8s-v2 2019-08-13 10:43:00 +01:00
Sargun Dhillon
32ff40eb56 Merge pull request #720 from sargun/set-test-timeout
Set timeout for tests on CI to 9 minutes
2019-08-12 14:53:09 -07:00
Sargun Dhillon
65c5446c94 Set timeout for tests on CI to 9 minutes
Right now, if the tests get stuck (on CI), they are terminated
after 10 minutes. This means as well that we get 0 output about
what went wrong.

Instead, this triggers a panic after 9 minutes on CI.
2019-08-12 13:45:30 -07:00
Brian Goff
cafcdeeefa Merge pull request #723 from sargun/lifecycle-test-fixes
Array of minor fixups to lifecycle tests
2019-08-12 13:22:51 -07:00
Sargun Dhillon
5c2b682cdc Array of minor fixups to lifecycle tests
* Fix the deletion test to actually test that the pod is deleted
* Fix the update pods test to update a value which is allowed
  to be updated
* Shut down watches after tests
* Do not delete pod statuses on DeletePod in mock_test

This intentionally leaks pod statuses, but it makes the situation
a lot less complicated around handling race conditions with
the GetPodStatus callback
2019-08-12 12:10:29 -07:00
Sargun Dhillon
e1c3bc3151 Merge pull request #725 from sargun/fix-race-conditions-in-node-test
Fix race conditions in node_test
2019-08-12 11:43:06 -07:00
Sargun Dhillon
5ac33e4b0a Fix race conditions in node_test 2019-08-12 11:33:48 -07:00
Sargun Dhillon
42656aae2f Merge pull request #719 from ethan-daocloud/patch-1
cleanup: fix misspelled words in error message
2019-08-12 11:09:35 -07:00
Brian Goff
10b291dba1 Merge branch 'master' into patch-1 2019-08-12 10:48:15 -07:00
Brian Goff
9d90c599e7 Merge pull request #721 from sargun/fix-race-condition
Fix race condition around worker ID generation in podcontroller.go
2019-08-12 10:43:32 -07:00
Sargun Dhillon
82de7f02c4 Upgrade Kubernetes e2e test cluster to 1.15.2 2019-08-12 10:30:04 -07:00
Sargun Dhillon
ad6cd7d552 Upgrade K8s
* Upgrade k8s.io/api
   go get k8s.io/api@kubernetes-1.15.2
* Upgrade k8s.io/apimachinery
   go get k8s.io/apimachinery@kubernetes-1.15.2
* Upgrade k8s.io/client-go
   go get k8s.io/client-go@kubernetes-1.15.2
* Upgrade k8s.io/kubernetes to v1.15.2
   go get k8s.io/kubernetes@v1.15.2

This also locks the dependency for
github.com/prometheus/client_golang/prometheus due to a golang bug, and to
please the validation scripts.

The replaces were generated by:
go get k8s.io/kubernetes@v1.15.2 2> fail
for i in $(cat fail|grep unknown|cut -f1 -d@|cut -f2 -d" ")
  do echo "replace ${i} => ${i} kubernetes-1.15.2"
done
2019-08-12 10:29:19 -07:00
Sargun Dhillon
a28969355e Fix race condition around worker ID generation in podcontroller.go 2019-08-12 10:27:21 -07:00
ethan
75a1877d9f cleanup: fix misspelled words in error message
Signed-off-by: Guangming Wang <guangming.wang@daocloud.io>
2019-08-10 19:03:44 +08:00
Sargun Dhillon
a87af0818f Merge pull request #708 from sargun/better-docs
Add a little bit of documentation to NotifyPods
2019-08-08 03:10:15 -07:00
Sargun Dhillon
3efc9229ba Add a little bit of documentation to NotifyPods
As far as I can tell, based on the implementation in MockProvider
NotifyPods is called with the mutated pod. This allows us to
take a copy of the Pod object in NotifyPods, and make it so
(eventually) we don't need to do a callback to GetPodStatus.
2019-08-06 20:20:59 -07:00
choury
d0c91a1933 Fix log.Infof in mock (#714) 2019-08-05 20:30:59 +01:00
Sakura
7188238caa fix a to an in annotation (#715) 2019-08-05 20:13:40 +01:00
Brian Goff
9a7698b09f Merge pull request #706 from virtual-kubelet/better-test
Add a test which tests the e2e lifecycle of the pod controller
2019-07-31 11:05:29 -07:00
Sargun Dhillon
50bbc3d1d4 Add tests around updates
This makes sure the update function works correctly after the pod
is running if the podspec is changed. Upon writing the test, I realized
we were accessing the variables outside of the goroutine that the
workers with tests were running in, and we had no locks. Therefore,
I converted all of those numbers to use atomics.
2019-07-30 09:13:43 -07:00
Sargun Dhillon
bd8e39e3f9 Add a benchmark for pod creation
This adds a benchmark for pod creation and makes the mock_test
provider actually work correctly in concurrent situations.
2019-07-30 09:12:56 -07:00
Sargun Dhillon
ce38d72c0e Add additional lifecycle tests
* Don't schedule failed or succeeded pods
* Delete dangling pods
2019-07-30 06:56:54 -07:00
Sargun Dhillon
4a270fea08 Add a test which tests the e2e lifecycle of the pod controller
This uses the mock provider, so I moved the mock provider to a
location where the node test can use it.
2019-07-30 06:56:54 -07:00
Sargun Dhillon
2974de3961 Merge pull request #711 from sargun/avoid-startup-race
Setup event handler at Pod Controller creation time
2019-07-29 09:37:28 -07:00
Sargun Dhillon
4d60fc2049 Setup event handler at Pod Controller creation time
This seems to avoid a race condition where, at pod informer
startup time, the reactor doesn't properly get set up.

It also refactors the root command example to start up
the informers after everything is wired up.
2019-07-26 13:57:00 -07:00
Brian Goff
28dac027ce Merge pull request #700 from cpuguy83/jaeger_exporter_import
Update jaeger exporter import path
2019-07-24 08:44:58 -07:00
Brian Goff
732c0a82d6 Merge branch 'master' into jaeger_exporter_import 2019-07-23 11:15:42 -07:00
Brian Goff
b056ac08bb Merge pull request #705 from virtual-kubelet/fix-new-pod-controller
Make NewPodController function validate that provider is set
2019-07-23 11:15:01 -07:00
Sargun Dhillon
ce60fb81d4 Make NewPodController function validate that provider is set
In NewPodController we validate that the rest of the config is
set to non-nil values. The provider must be non-nil as well.
2019-07-21 16:19:00 -07:00