Commit Graph

99 Commits

Author SHA1 Message Date
Brian Goff
7dd49516d8 Decouple vkubelet/* packages from providers (#626)
This makes the concept of a `Provider` wholly implemented in the CLI
implementation in cmd/virtual-kubelet.

It allows us to slim down the interfaces used in vkubelet (and
vkubelet/api) to what is actually used there rather than a huge
interface that is only there to serve the CLI's needs.
2019-05-17 17:01:05 -07:00
Jeremy Rickard
87e72bf4df Light up UpdatePod (#613)
* Light up UpdatePod

This PR updates the vkubelet/pod.go createOrUpdate(..) method to actually handle
updates. It gets the pod from the provider as before, but now, if the pod exists, the method
compares the hash of the existing pod's spec against that of the new pod. If they differ, it
calls UpdatePod(..).

Also makes a small change to the Server struct to swap from kubernetes.Clientset to kubernetes.Interface
to better facilitate testing with the fake clientset.

Co-Authored-By: Brian Goff <cpuguy83@gmail.com>
2019-05-17 11:14:29 -07:00
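The update path described in 87e72bf4df fetches the provider's copy, compares spec hashes, and calls UpdatePod only when they differ. It can be sketched as follows. `PodSpec`, `Pod`, `Provider`, `memProvider` and `hashSpec` are simplified hypothetical stand-ins, not the real Kubernetes or virtual-kubelet types. Hashing the JSON encoding with SHA-256 is an assumption about how such a hash might be computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// PodSpec and Pod are simplified, hypothetical stand-ins for the Kubernetes types.
type PodSpec struct {
	Image string   `json:"image"`
	Args  []string `json:"args,omitempty"`
}

type Pod struct {
	Name string
	Spec PodSpec
}

// Provider is a hypothetical, minimal provider interface.
type Provider interface {
	GetPod(name string) *Pod // nil when the provider has no such pod
	CreatePod(p *Pod)
	UpdatePod(p *Pod)
}

// hashSpec returns a SHA-256 digest of the JSON-encoded spec.
func hashSpec(s PodSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // cannot fail for this plain struct
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// createOrUpdate creates the pod if the provider does not have it yet and
// calls UpdatePod only when the spec hash has changed.
func createOrUpdate(p Provider, pod *Pod) string {
	existing := p.GetPod(pod.Name)
	if existing == nil {
		p.CreatePod(pod)
		return "created"
	}
	if hashSpec(existing.Spec) != hashSpec(pod.Spec) {
		p.UpdatePod(pod)
		return "updated"
	}
	return "unchanged"
}

// memProvider is an in-memory Provider used for the demo.
type memProvider struct{ pods map[string]*Pod }

func (m *memProvider) GetPod(name string) *Pod { return m.pods[name] }
func (m *memProvider) CreatePod(p *Pod)        { m.pods[p.Name] = p }
func (m *memProvider) UpdatePod(p *Pod)        { m.pods[p.Name] = p }

func main() {
	prov := &memProvider{pods: map[string]*Pod{}}
	fmt.Println(createOrUpdate(prov, &Pod{Name: "web", Spec: PodSpec{Image: "nginx:1.15"}})) // created
	fmt.Println(createOrUpdate(prov, &Pod{Name: "web", Spec: PodSpec{Image: "nginx:1.15"}})) // unchanged
	fmt.Println(createOrUpdate(prov, &Pod{Name: "web", Spec: PodSpec{Image: "nginx:1.16"}})) // updated
}
```

Comparing hashes rather than calling UpdatePod unconditionally keeps resyncs of unchanged pods from turning into provider writes.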
Sargun Dhillon
63fa4e124b Add the /runningpods/ api endpoint (#611)
* Add the /runningpods/ api endpoint

This adds a kubelet API endpoint (/runningpods/). On the kubelet it is
considered a "debug" endpoint, so it might be worth gating it behind an
option, but AFAICT it is exposed by default in most k8s configs.
2019-05-13 15:10:31 -07:00
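A handler for an endpoint like /runningpods/ might look roughly like the sketch below. It assumes the response is a JSON-encoded pod list, simplified here to the hypothetical `RunningPod` and `PodList` structs instead of the real `v1.PodList`. `getPods` is a hypothetical hook standing in for the provider call.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// RunningPod is a simplified, hypothetical stand-in for v1.Pod.
type RunningPod struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// PodList is a simplified stand-in for v1.PodList.
type PodList struct {
	Items []RunningPod `json:"items"`
}

// runningPodsHandler serves the pods the provider reports as running.
// getPods is a hypothetical hook standing in for the provider call.
func runningPodsHandler(getPods func() ([]RunningPod, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		pods, err := getPods()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// Encode fully before writing so an encoding failure can still return a 500.
		b, err := json.Marshal(PodList{Items: pods})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(b)
	}
}

// demo registers the handler on a mux and performs an in-memory GET.
func demo() (int, PodList) {
	mux := http.NewServeMux()
	mux.Handle("/runningpods/", runningPodsHandler(func() ([]RunningPod, error) {
		return []RunningPod{{Namespace: "default", Name: "web"}}, nil
	}))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/runningpods/", nil))
	var list PodList
	if err := json.Unmarshal(rec.Body.Bytes(), &list); err != nil {
		panic(err)
	}
	return rec.Code, list
}

func main() {
	code, list := demo()
	fmt.Println(code, list.Items)
}
```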
Brian Goff
3cc051f7c2 Use I/O stream for provider logs interface
Providers must still update their implementation to actually gain any
benefit here, but this makes the provider interface a bit more sane.
2019-05-08 09:17:29 -07:00
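With logs returned as a stream, a handler can copy the provider's output straight to the client without buffering the whole log in memory. The sketch below assumes a provider method shaped like `GetContainerLogs(ctx, namespace, pod, container) (io.ReadCloser, error)`. The `LogsProvider` interface, `staticLogs` and `fetchLogs` are hypothetical, and the real virtual-kubelet signature may take additional options.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// LogsProvider is a hypothetical provider interface that returns a stream
// instead of a fully buffered string.
type LogsProvider interface {
	GetContainerLogs(ctx context.Context, namespace, pod, container string) (io.ReadCloser, error)
}

// staticLogs is a demo provider that serves a fixed log body.
type staticLogs struct{ body string }

func (s staticLogs) GetContainerLogs(ctx context.Context, namespace, pod, container string) (io.ReadCloser, error) {
	return io.NopCloser(strings.NewReader(s.body)), nil
}

// logsHandler streams the provider's logs to the client and always closes
// the provider's stream when done.
func logsHandler(p LogsProvider, namespace, pod, container string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		rc, err := p.GetContainerLogs(r.Context(), namespace, pod, container)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer rc.Close()
		w.Header().Set("Content-Type", "text/plain")
		// A copy error here usually means the client went away.
		_, _ = io.Copy(w, rc)
	}
}

// fetchLogs runs the handler in memory against a provider serving body.
func fetchLogs(body string) string {
	rec := httptest.NewRecorder()
	h := logsHandler(staticLogs{body: body}, "default", "web", "app")
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/containerLogs/default/web/app", nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(fetchLogs("line 1\nline 2\n"))
}
```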
Sargun Dhillon
f1cb6a7bf6 Add the concept of startup timeout (#597)
This adds two concepts, where one encompasses the other.

Startup timeout
Startup timeout is how long to wait for the entire kubelet
to get into a functional state. Right now, this only waits
for the pod controller's pod informer cache to become
in sync with the API server, but this could be extended to other
informers (like secrets informer).

Wait For Startup
This changes the behaviour of the virtual kubelet to wait
for the pod controller to start before registering the node.

This avoids the race condition where the node is registered
but we cannot actually perform any pod operations.
2019-05-06 09:25:00 -07:00
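The two concepts in f1cb6a7bf6 fit together as shown in this sketch: wait (bounded by a startup timeout) for the cache to report synced, and only then register the node. `hasSynced`, `register`, `startNode` and the 10 ms poll interval are hypothetical. client-go offers `cache.WaitForCacheSync` for the waiting part, but it is not used here so the example stays dependency-free.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errStartupTimeout is returned when the caches do not sync in time.
var errStartupTimeout = errors.New("timed out waiting for the pod informer cache to sync")

// waitForStartup polls hasSynced until it reports true, the startup timeout
// elapses, or ctx is cancelled. hasSynced is a hypothetical hook standing in
// for an informer's HasSynced check.
func waitForStartup(ctx context.Context, timeout, poll time.Duration, hasSynced func() bool) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		if hasSynced() {
			return nil
		}
		select {
		case <-ctx.Done():
			if errors.Is(ctx.Err(), context.DeadlineExceeded) {
				return errStartupTimeout
			}
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// startNode registers the node only once startup has completed, so the node
// never shows up in the cluster before pods can actually be handled.
// register is a hypothetical node-registration hook.
func startNode(ctx context.Context, timeout time.Duration, hasSynced func() bool, register func()) error {
	if err := waitForStartup(ctx, timeout, 10*time.Millisecond, hasSynced); err != nil {
		return err
	}
	register()
	return nil
}

// demo simulates a cache that reports synced after syncAfterChecks checks
// and reports whether the node got registered within timeoutMs milliseconds.
func demo(syncAfterChecks, timeoutMs int) (bool, error) {
	checks := 0
	hasSynced := func() bool { checks++; return checks > syncAfterChecks }
	registered := false
	err := startNode(context.Background(), time.Duration(timeoutMs)*time.Millisecond,
		hasSynced, func() { registered = true })
	return registered, err
}

func main() {
	fmt.Println(demo(3, 1000))
	fmt.Println(demo(1000000, 50))
}
```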
Brian Goff
d809dff289 Refactor exec interface (#578)
This removes the dependence on remotecommand in providers as well as the
need to expose provider IDs for the sake of the ExecInContainer API.
2019-04-26 12:57:56 -07:00
Brian Goff
449eb3bb7d Fix exec parameter parsing (#580)
Exec appears to have been broken by ad6fbba806.
This change basically copies what's in remotecommand.NewOptions, just
without the logging.
2019-04-25 15:51:53 -07:00
Yash Desai
de32752395 Set container env var using services. (#573)
* Introduce service env vars.
2019-04-17 11:30:39 -07:00
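Kubernetes exposes services to containers through variables named `<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT`. The name is upper-cased, with dashes turned into underscores, and services without a cluster IP are skipped. The sketch below builds only that subset (the kubelet also sets Docker-link-style `<NAME>_PORT_*` variables). `Service` is a hypothetical simplified type, not `v1.Service`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Service is a simplified, hypothetical stand-in for v1.Service.
type Service struct {
	Name      string
	ClusterIP string
	Ports     []int
}

// envName converts a service name to the env var prefix Kubernetes uses:
// upper case, with dashes replaced by underscores.
func envName(svc string) string {
	return strings.ToUpper(strings.ReplaceAll(svc, "-", "_"))
}

// serviceEnv builds <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT for each
// service that has a cluster IP. The port is the service's first port.
func serviceEnv(services []Service) map[string]string {
	env := map[string]string{}
	for _, s := range services {
		// Skip headless services; the length check just avoids indexing an empty slice.
		if s.ClusterIP == "" || s.ClusterIP == "None" || len(s.Ports) == 0 {
			continue
		}
		prefix := envName(s.Name)
		env[prefix+"_SERVICE_HOST"] = s.ClusterIP
		env[prefix+"_SERVICE_PORT"] = strconv.Itoa(s.Ports[0])
	}
	return env
}

func main() {
	env := serviceEnv([]Service{
		{Name: "redis-master", ClusterIP: "10.0.0.11", Ports: []int{6379}},
		{Name: "headless", ClusterIP: "None", Ports: []int{80}},
	})
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```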
Brian Goff
99c07d487e Fix node create after delete
node.ResourceVersion must not be set when creating a node.
Setting it prevented VK from recovering after the VK node object
had been deleted (for whatever reason).
2019-04-03 22:57:11 -07:00
Yash Desai
85292ef4ef Patch the node status instead of updating it. (#557)
* Patch the node status instead of updating it.

Virtual-kubelet updates the node status periodically.
This change proposes using the `Patch` API instead of `Update`
to update the node status.
This avoids overwriting any node updates made by other controllers
in the system, for example the attach-detach controller.
The Patch API performs a strategic merge instead of overwriting
the entire object, which ensures parallel updates don't overwrite
each other.

Note: `PatchNodeStatus` reduces the time precision to seconds,
so I adjusted the test accordingly.

Consider two controllers:
CONTROLLER 1 (virtual kubelet)                       | CONTROLLER 2
oldNode := Nodes.Get(nodename)                       |
                                                     | node := Nodes.Get(nodename)
                                                     | // update status with attached volumes info
                                                     | updateNode := Nodes.UpdateStatus(node)
// update vkubelet info on node status               |
latestNode := Nodes.UpdateStatus(oldNode)            |
<-- latestNode does not contain the volume info added by second controller.

With the patch change:

CONTROLLER 1 (virtual kubelet)                       | CONTROLLER 2
oldNode := Nodes.Get(nodename)                       |
                                                     | node := Nodes.Get(nodename)
                                                     | // update status with attached volumes info
                                                     | updateNode := Nodes.UpdateStatus(node)
node := oldNode.DeepCopy()                           |
// update vkubelet info on node status               |
latestNode := util.PatchNodeStatus(oldNode, node)    |
<-- latestNode contains the volume info added by second controller.

Testing Done: make test

* Introduce PatchNodeStatus into vkubelet.

* Pass only the node interface.
2019-04-03 10:40:57 -07:00
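The interleaving in the diagrams above can be replayed with a deliberately simplified model. Node status is a flat map, and the "patch" is a two-way diff between `oldNode` and `node` applied onto the server's latest copy. This is only an illustration of why sending changed fields preserves concurrent writes. The real `PatchNodeStatus` builds a strategic merge patch via the Kubernetes apimachinery packages. `Status`, `diff`, `apply` and `race` are hypothetical, and the model ignores resourceVersion conflict checks.

```go
package main

import "fmt"

// Status is a hypothetical, flattened model of a node's status fields.
type Status map[string]string

// copyStatus returns a shallow copy of s (a stand-in for DeepCopy).
func copyStatus(s Status) Status {
	c := make(Status, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// diff returns only the fields that changed from old to updated;
// a nil value marks a field that was removed.
func diff(old, updated Status) map[string]*string {
	patch := map[string]*string{}
	for k, v := range updated {
		if ov, ok := old[k]; !ok || ov != v {
			v := v
			patch[k] = &v
		}
	}
	for k := range old {
		if _, ok := updated[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// apply merges patch into a copy of the server's latest status and leaves
// every field not mentioned in the patch alone.
func apply(latest Status, patch map[string]*string) Status {
	out := copyStatus(latest)
	for k, v := range patch {
		if v == nil {
			delete(out, k)
		} else {
			out[k] = *v
		}
	}
	return out
}

// race replays the interleaving from the commit message; usePatch selects
// between a full update and a patch for controller 1's write.
func race(usePatch bool) Status {
	server := Status{"ready": "False"}
	oldNode := copyStatus(server) // controller 1: Get

	node2 := copyStatus(server) // controller 2: Get, add volume info, UpdateStatus
	node2["volumesAttached"] = "vol-1"
	server = node2

	node := copyStatus(oldNode) // controller 1: DeepCopy, then update its own fields
	node["ready"] = "True"
	if usePatch {
		server = apply(server, diff(oldNode, node))
	} else {
		server = node // in this model a full update replaces controller 2's change
	}
	return server
}

func main() {
	fmt.Println("update:", race(false))
	fmt.Println("patch: ", race(true))
}
```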
Brian Goff
80de7fd566 Fix typo 2019-04-01 11:43:54 -07:00
Brian Goff
1942522cf6 Add async provider pod status updates (#493)
This adds a new interface that a provider can implement which enables
async notifications of pod status changes rather than the existing loop
which goes through every pod in k8s and checks the status in the
provider.
In practice this should be significantly more efficient since we are not
constantly listing all pods and then looking up the status in the
provider.

For providers that do not support this interface, the old method is
still used to sync state from the provider.

This commit does not update any of the providers to support this
interface.
2019-04-01 09:07:26 -07:00
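An optional interface like the one described in 1942522cf6 is typically detected with a type assertion. When the provider implements it, status changes are pushed through a callback. Otherwise the daemon falls back to checking every pod. `Provider`, `PodNotifier`, `NotifyPods`, `PodStatus` and the demo providers are hypothetical, simplified names chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// PodStatus is a hypothetical, simplified pod status.
type PodStatus struct {
	Name  string
	Phase string
}

// Provider is the minimal, hypothetical interface every provider implements.
type Provider interface {
	GetPodStatus(ctx context.Context, name string) (*PodStatus, error)
}

// PodNotifier is an optional interface: providers that implement it push
// status changes instead of being polled.
type PodNotifier interface {
	NotifyPods(ctx context.Context, cb func(*PodStatus))
}

// startStatusSync picks push or poll mode depending on whether the provider
// implements PodNotifier, and returns the mode it chose.
func startStatusSync(ctx context.Context, p Provider, pods []string, update func(*PodStatus)) string {
	if n, ok := p.(PodNotifier); ok {
		n.NotifyPods(ctx, update)
		return "async"
	}
	// Fallback: check every known pod (a real loop would repeat this periodically).
	for _, name := range pods {
		if st, err := p.GetPodStatus(ctx, name); err == nil && st != nil {
			update(st)
		}
	}
	return "poll"
}

// pollingProvider only supports the basic interface.
type pollingProvider struct{}

func (pollingProvider) GetPodStatus(ctx context.Context, name string) (*PodStatus, error) {
	return &PodStatus{Name: name, Phase: "Running"}, nil
}

// pushingProvider also implements PodNotifier.
type pushingProvider struct{ pollingProvider }

func (pushingProvider) NotifyPods(ctx context.Context, cb func(*PodStatus)) {
	// A real provider would call cb from its own event stream; this emits once.
	cb(&PodStatus{Name: "web", Phase: "Succeeded"})
}

// demo runs one sync against p and records the updates it received.
func demo(p Provider) (string, []string) {
	var seen []string
	mode := startStatusSync(context.Background(), p, []string{"web"}, func(s *PodStatus) {
		seen = append(seen, s.Name+"="+s.Phase)
	})
	return mode, seen
}

func main() {
	fmt.Println(demo(pollingProvider{}))
	fmt.Println(demo(pushingProvider{}))
}
```

Keeping the push API optional lets existing providers keep working unchanged, which matches the commit's note that no providers were updated.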
Brian Goff
947b530f1e Replace testify with gotest.tools (#553)
* vendor gotest.tools

* Run gotest.tools migration tools

* Fixup tests that were improperly converted

* Remove unused testify package vendors
2019-03-28 17:08:12 -07:00
Brian Goff
10430f0b7f Add node provider interface (#526)
This starts the work of having a `NodeProvider` which is responsible for
providing node details.
It splits the responsibilities of node management off to a new
controller.

The primary change here is to add the framework pieces for node
management and move the VK CLI to use this new controller.

It also adds support for node leases where available. This can be
enabled via the command line (disabled by default), but may fall back if
we find that leases aren't supported on the cluster.
2019-03-25 15:02:40 -07:00
Vineeth Reddy
5cea3e7ea8 FieldRef feature for DownwardAPI (#534)
* FieldRef feature for DownwardAPI

Signed-off-by: VineethReddy02 <vineethpothulapati@outlook.com>

* Unit tests for FieldRef

Signed-off-by: VineethReddy02 <vineethpothulapati@outlook.com>
2019-03-08 11:15:08 -08:00
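A downward API `fieldRef` names a pod field by path, such as `metadata.name`, `status.podIP` or `metadata.labels['app']`, and the env var takes that field's value. The resolver below is a sketch over a hypothetical simplified `Pod` struct, not the real `v1.Pod`. It covers a common subset of paths and returns an error for anything else.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a simplified, hypothetical stand-in for v1.Pod holding the fields
// that downward API fieldRefs typically read.
type Pod struct {
	Name, Namespace, UID         string
	NodeName, ServiceAccountName string
	HostIP, PodIP                string
	Labels, Annotations          map[string]string
}

// resolveFieldRef returns the value a fieldRef with the given fieldPath
// expands to.
func resolveFieldRef(p *Pod, fieldPath string) (string, error) {
	// Subscripted paths, e.g. metadata.labels['app'].
	subscripted := map[string]map[string]string{
		"metadata.labels":      p.Labels,
		"metadata.annotations": p.Annotations,
	}
	for prefix, m := range subscripted {
		rest := strings.TrimPrefix(fieldPath, prefix+"['")
		if rest != fieldPath && strings.HasSuffix(rest, "']") {
			return m[strings.TrimSuffix(rest, "']")], nil
		}
	}
	switch fieldPath {
	case "metadata.name":
		return p.Name, nil
	case "metadata.namespace":
		return p.Namespace, nil
	case "metadata.uid":
		return p.UID, nil
	case "spec.nodeName":
		return p.NodeName, nil
	case "spec.serviceAccountName":
		return p.ServiceAccountName, nil
	case "status.hostIP":
		return p.HostIP, nil
	case "status.podIP":
		return p.PodIP, nil
	}
	return "", fmt.Errorf("unsupported fieldPath %q", fieldPath)
}

func main() {
	p := &Pod{Name: "web", Namespace: "default", PodIP: "10.1.2.3",
		Labels: map[string]string{"app": "nginx"}}
	for _, path := range []string{"metadata.name", "status.podIP", "metadata.labels['app']", "spec.containers"} {
		v, err := resolveFieldRef(p, path)
		fmt.Printf("%s -> %q (err: %v)\n", path, v, err)
	}
}
```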
Brian Goff
1bfffa975e Make tracing interface to coalesce logging/tracing (#519)
* Define and use an interface for logging.

This allows alternative implementations to use whatever logging package
they want.

Currently the interface just mimics what logrus already implements,
with minor modifications to not rely on logrus itself. I think the
interface is pretty solid in terms of logging implementations being able
to do what they need to.

* Make tracing interface to coalesce logging/tracing

Allows us to share data between the tracer and the logger so we can
simplify log/trace handling where we generally want data to go to both
places.
2019-02-22 11:36:03 -08:00
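The logging half of 1bfffa975e, an interface that mimics logrus without depending on it, could look roughly like this sketch backed by the standard library's `log` package. The `Logger` and `Fields` names and the method set are assumptions modelled loosely on logrus's `FieldLogger`. They are not the project's actual interface, and the tracing side is not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"sort"
	"strings"
)

// Fields holds structured key/value pairs attached to log entries.
type Fields map[string]interface{}

// Logger is a hypothetical logging interface modelled loosely on logrus,
// without importing logrus.
type Logger interface {
	Info(args ...interface{})
	Error(args ...interface{})
	WithField(key string, value interface{}) Logger
	WithFields(f Fields) Logger
}

// stdLogger implements Logger on top of the standard library's log package.
type stdLogger struct {
	l      *log.Logger
	fields Fields
}

func newStdLogger(w io.Writer) Logger {
	return &stdLogger{l: log.New(w, "", 0), fields: Fields{}}
}

func (s *stdLogger) WithField(key string, value interface{}) Logger {
	return s.WithFields(Fields{key: value})
}

// WithFields returns a new logger; the receiver's fields are left untouched.
func (s *stdLogger) WithFields(f Fields) Logger {
	merged := make(Fields, len(s.fields)+len(f))
	for k, v := range s.fields {
		merged[k] = v
	}
	for k, v := range f {
		merged[k] = v
	}
	return &stdLogger{l: s.l, fields: merged}
}

// emit writes one line: level, message, then fields in sorted key order.
func (s *stdLogger) emit(level string, args ...interface{}) {
	keys := make([]string, 0, len(s.fields))
	for k := range s.fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("level=" + level + " msg=" + fmt.Sprint(args...))
	for _, k := range keys {
		fmt.Fprintf(&b, " %s=%v", k, s.fields[k])
	}
	s.l.Println(b.String())
}

func (s *stdLogger) Info(args ...interface{})  { s.emit("info", args...) }
func (s *stdLogger) Error(args ...interface{}) { s.emit("error", args...) }

// demo logs one entry with two fields into a buffer and returns it.
func demo() string {
	var buf bytes.Buffer
	logger := newStdLogger(&buf).WithField("pod", "web").WithField("namespace", "default")
	logger.Info("creating pod")
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```

Because callers only see `Logger`, an alternative implementation (logrus, or one that also annotates the current trace span) can be swapped in without touching call sites.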
Fei Xu
ad6fbba806 parse the exec flags from request query (#510) 2019-02-01 17:05:37 -08:00
Paulo Pires
323c02d468 env: fix resource reference Optional nil pointer (#491)
Signed-off-by: Paulo Pires <pjpires@gmail.com>
2019-01-08 10:52:56 -08:00
Brian Goff
5796be449b Adds some package docs (#479)
Was just browsing godoc and noticed we are missing some docs that would
be quite useful.
2019-01-07 11:03:35 -08:00
Brian Goff
3ab101da00 Use timer instead of ticker (#477)
Tickers always tick, so if we tick every 5 seconds and the work that we
perform at each tick takes 5 seconds, we end up just looping with no
sleep period.

Instead this is using a timer to ensure we actually get a full 5 second
sleep between loops.

We should consider an async API instead of polling the provider like
this.
2018-12-21 15:48:47 -08:00
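The timer-based loop from 3ab101da00 re-arms the timer only after the work returns, so every iteration is followed by a full interval of sleep no matter how long the work took. A minimal sketch follows. `pollLoop`, `minIdleGap` and `minIdleGapMs` are hypothetical helpers, and the measurement harness exists only to demonstrate the idle gap.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollLoop runs work, then waits a full interval before running it again.
// The timer is re-armed only after work returns, so slow work cannot shrink
// (or eliminate) the sleep between iterations the way a time.Ticker would.
func pollLoop(ctx context.Context, interval time.Duration, work func()) {
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
			work()
			// Safe to Reset: the timer fired and its channel has been drained.
			timer.Reset(interval)
		}
	}
}

// minIdleGap runs pollLoop until work has run n times and returns the
// shortest idle period seen between the end of one run and the start of
// the next.
func minIdleGap(interval, workTime time.Duration, n int) time.Duration {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var starts, ends []time.Time
	pollLoop(ctx, interval, func() {
		starts = append(starts, time.Now())
		time.Sleep(workTime) // simulate a slow provider call
		ends = append(ends, time.Now())
		if len(ends) == n {
			cancel()
		}
	})
	shortest := time.Duration(-1)
	for i := 1; i < len(starts); i++ {
		if gap := starts[i].Sub(ends[i-1]); shortest < 0 || gap < shortest {
			shortest = gap
		}
	}
	return shortest
}

// minIdleGapMs is a millisecond-based wrapper around minIdleGap.
func minIdleGapMs(intervalMs, workMs, n int) int64 {
	ms := func(v int) time.Duration { return time.Duration(v) * time.Millisecond }
	return minIdleGap(ms(intervalMs), ms(workMs), n).Milliseconds()
}

func main() {
	// Work takes as long as the interval; the loop still sleeps a full interval.
	fmt.Println("shortest idle gap (ms):", minIdleGapMs(50, 50, 3))
}
```

With a `time.Ticker` and the same numbers, the next tick is already pending when the work finishes, so the idle gap would be close to zero.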
Brian Goff
0d14914e85 Refactor http server stuff (#466)
* Don't start things in New

* Move http server handling up to daemon.

This removes the burden of dealing with listeners, http servers, etc in
the core framework.

Instead provide helpers to attach the appropriate routes to the
caller's serve mux.

With this change, the vkubelet package only helps callers set up HTTP
rather than forcing a specific HTTP config on them.
2018-12-21 11:45:07 -08:00
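The "attach routes to the caller's serve mux" pattern from 0d14914e85 means the framework only registers handlers, while the daemon decides how and where to listen. The sketch below illustrates it. `AttachPodRoutes`, the `ServeMux` interface and the demo handlers are hypothetical names, not the vkubelet API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ServeMux is the minimal routing interface the framework needs;
// *http.ServeMux satisfies it.
type ServeMux interface {
	Handle(pattern string, handler http.Handler)
}

// AttachPodRoutes registers kubelet-style pod endpoints on the caller's mux.
// It does not create listeners or servers; that stays with the caller.
func AttachPodRoutes(mux ServeMux, runningPods, logs http.HandlerFunc) {
	mux.Handle("/runningpods/", runningPods)
	mux.Handle("/containerLogs/", logs)
}

// newMux builds a mux with demo handlers attached.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	AttachPodRoutes(mux,
		func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "[]") },
		func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "logs for "+r.URL.Path) },
	)
	return mux
}

// get performs an in-memory request against h and returns status and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux()
	// A real daemon would own the server, e.g. http.ListenAndServeTLS(addr, cert, key, mux).
	fmt.Println(get(mux, "/runningpods/"))
	fmt.Println(get(mux, "/containerLogs/default/web/app"))
}
```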
Paulo Pires
5a0093ce31 vkubelet: set kubelet version to build version (#446)
* deps: bump to Kubernetes 1.13.1

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* version: new VK version

Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-18 17:08:23 -08:00
Paulo Pires
4c80760079 tests: add "test/util" subpackage
Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-15 11:01:42 +00:00
Paulo Pires
8bcbbf58cd env: rename methods and improve readability
Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-15 11:01:41 +00:00
Paulo Pires
f839db4692 tests: envvars processing
Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-15 11:01:40 +00:00
Paulo Pires
103a19fe9d env: observe envFrom
Also observe initContainers env and envFrom.

Fixes #460
Fixes #461

Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-15 11:01:40 +00:00
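Kubernetes resolves `envFrom` by importing each source's keys in order, applying the optional prefix. Later sources win on duplicate keys, and any explicit `env` entry overrides a key imported through `envFrom`. The sketch below shows that precedence for a single container. The same function would apply to init containers. `EnvFromSource` (with its data already fetched from the ConfigMap or Secret) and `EnvVar` are hypothetical simplified types.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvFromSource is a hypothetical, simplified envFrom entry: the referenced
// ConfigMap/Secret data (already fetched) plus an optional prefix.
type EnvFromSource struct {
	Prefix string
	Data   map[string]string
}

// EnvVar is a simplified env entry with a literal value.
type EnvVar struct {
	Name, Value string
}

// makeEnv applies envFrom sources in order (later sources win), then the
// explicit env entries, which override anything imported via envFrom.
func makeEnv(envFrom []EnvFromSource, env []EnvVar) map[string]string {
	out := map[string]string{}
	for _, src := range envFrom {
		for k, v := range src.Data {
			out[src.Prefix+k] = v
		}
	}
	for _, e := range env {
		out[e.Name] = e.Value
	}
	return out
}

func main() {
	env := makeEnv(
		[]EnvFromSource{
			{Prefix: "DB_", Data: map[string]string{"HOST": "db", "PORT": "5432"}},
			{Data: map[string]string{"MODE": "a"}},
			{Data: map[string]string{"MODE": "b"}},
		},
		[]EnvVar{{Name: "DB_PORT", Value: "6543"}},
	)
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```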
Paulo Pires
62b46d971c env: emit events for missing envvars
Fixes #465

Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-12-15 11:01:36 +00:00
Tarun Pothulapati
fbae26fc11 env: fix pod envFrom processing 2018-12-12 13:18:39 +00:00
Paulo Pires
d73e563b97 Merge branch 'master' into stop_ticker 2018-12-12 12:36:20 +00:00
Brian Goff
616d12ed76 Remove old pod notification stuff
These are no longer used since we started using the k8s client's queue.
2018-12-10 13:40:21 -08:00
Brian Goff
e6ca19d059 Ensure reconcile ticker stops on shutdown
Otherwise this ticker could run forever (or until the process exits).
2018-12-10 10:33:36 -08:00
Brian Goff
ab7c55cb5f Make pod status updates concurrent. (#433)
This uses the same number of workers as the pod sync workers.

We may want to start a worker queue here instead, but I think for now
this is ok, particularly because we are limiting the number of
goroutines being spun up at once.
2018-12-04 14:03:45 -08:00
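One common way to bound the number of goroutines, as ab7c55cb5f describes, is a buffered channel used as a counting semaphore. The sketch below assumes that technique; the commit does not say which mechanism it uses. `updateAll` and `maxInFlight` are hypothetical helpers, and the 5 ms sleep stands in for a provider call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// updateAll runs update for every pod with at most workers goroutines in
// flight, using a buffered channel as a counting semaphore.
func updateAll(pods []string, workers int, update func(string)) {
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for _, pod := range pods {
		sem <- struct{}{} // blocks while `workers` updates are already running
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot once update returns
			update(pod)
		}(pod)
	}
	wg.Wait()
}

// maxInFlight runs updateAll over nPods fake pods and returns the peak
// number of concurrent updates observed and the number completed.
func maxInFlight(nPods, workers int) (int64, int64) {
	var cur, peak, done int64
	pods := make([]string, nPods)
	for i := range pods {
		pods[i] = fmt.Sprintf("pod-%d", i)
	}
	updateAll(pods, workers, func(string) {
		n := atomic.AddInt64(&cur, 1)
		for {
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		time.Sleep(5 * time.Millisecond) // simulate a provider call
		atomic.AddInt64(&cur, -1)
		atomic.AddInt64(&done, 1)
	})
	return atomic.LoadInt64(&peak), atomic.LoadInt64(&done)
}

func main() {
	peak, done := maxInFlight(20, 4)
	fmt.Printf("updated %d pods, at most %d at a time\n", done, peak)
}
```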
Paulo Pires
28a757f4da use shared informers and workqueue (#425)
* vendor: add vendored code

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* controller: use shared informers and a work queue

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* errors: use cpuguy83/strongerrors

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* aci: fix test that uses resource manager

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* readme: clarify skaffold run before e2e

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* cmd: use root context everywhere

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: refactor pod lifecycle management

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* e2e: fix race in test when observing deletions

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* e2e: test pod forced deletion

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* cmd: fix root context potential leak

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: rename metaKey

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: remove calls to HandleError

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* Revert "errors: use cpuguy83/strongerrors"

This reverts commit f031fc6d.

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* manager: remove redundant lister constraint

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: rename the pod event recorder

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: amend misleading comment

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* mock: add tracing

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: add tracing

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* test: observe timeouts

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* trace: remove unnecessary comments

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: limit concurrency in deleteDanglingPods

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: never store context, always pass in calls

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: remove HandleCrash and just panic

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: don't sync succeeded pods

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: ensure pod deletion from kubernetes

Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-11-30 15:53:58 -08:00
Paulo Pires
0f8ef994a3 sync: don't swallow delete errors
Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-11-28 20:31:55 +00:00
Brian Goff
aee1fde504 Fix a case where provider pod status is not found
Updates the pod status in Kubernetes to "Failed" when the pod status is
not found from the provider.

Note that currently most providers return `nil, nil` when a pod is
not found. This works but should possibly return a typed error so we can
determine if the error means not found or something else... but this
works as is so I haven't changed it.
2018-11-06 16:11:42 -08:00
Brian Goff
bec818bf3c Do not close pod sync, use context cancel instead. (#402)
Closing the channel is racy and can lead to a panic on exit.
Instead rely on context cancellation to know if workers should exit.
2018-11-05 11:37:00 -08:00
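The pattern from bec818bf3c can be sketched as follows. The work channel is never closed. Workers and senders both select on `ctx.Done()`, so a late send after shutdown returns instead of panicking with "send on closed channel". `startWorkers`, `enqueue` and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// startWorkers launches n workers that read from work until ctx is
// cancelled. The channel is never closed, so shutdown is signalled only
// through the context.
func startWorkers(ctx context.Context, n int, work <-chan string, handle func(string)) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case item := <-work:
					handle(item)
				}
			}
		}()
	}
	return &wg
}

// enqueue sends item unless ctx is cancelled and reports whether it was sent.
func enqueue(ctx context.Context, work chan<- string, item string) bool {
	select {
	case <-ctx.Done():
		return false
	case work <- item:
		return true
	}
}

// demo processes three items, shuts down, then attempts a late send.
func demo() (handled int, sentAfterCancel bool) {
	ctx, cancel := context.WithCancel(context.Background())
	work := make(chan string)
	var mu sync.Mutex
	wg := startWorkers(ctx, 2, work, func(string) {
		mu.Lock()
		handled++
		mu.Unlock()
	})
	for _, p := range []string{"a", "b", "c"} {
		enqueue(ctx, work, p)
	}
	cancel()
	wg.Wait()
	// No panic here: the channel was never closed, the send is simply abandoned.
	sentAfterCancel = enqueue(ctx, work, "late")
	return handled, sentAfterCancel
}

func main() {
	fmt.Println(demo())
}
```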
robbiezhang
966c76368f use %T instead of reflect.TypeOf 2018-10-18 20:06:03 +00:00
robbiezhang
a6bab6e3bb Fix the potential runtime type casting error 2018-10-18 19:15:05 +00:00
Robbie Zhang
4a7b74ed42 [VK] Use Cache controller and Make create/delete pod Concurrently (#373)
* Add k8s.io/client-go/tools/cache package

* Add cache controller

* Add pod creator and terminator

* Pod Synchronizer

* Clean up

* Add back reconcile

* Remove unnecessary space in log

* Incorporate feedback

* dep ensure

* Fix the syntax error

* Fix the merge errors

* Minor Refactor

* Set status

* Pass context together with the pod to the pod channel

* Change to use flag to specify the number of pod sync workers

* Remove the unused const

* Use Stable PROD Region WestUS in Test

EastUS2EUAP is not reliable
2018-10-16 17:20:02 -07:00
Brian Goff
c1fe923131 Minor refactorings (#368)
* Split vkubelet functions into separate files.

* Minor re-org for cmd/census*

* refactor run loop
2018-10-12 17:36:37 -07:00
Brian Goff
682b2bccf8 Add support for tracing via OpenCensus
This adds a few flags for configuring the tracer.
Includes support for Jaeger tracing (built into OpenCensus).
2018-09-26 13:48:40 -07:00
Brian Goff
083f6dee05 Refactor provider init (#360)
* Refactor provider init

This moves provider init out of vkubelet setup, instead preferring to
initialize vkubelet with a provider.

* Split API server configuration from setup.

This makes sure that configuration (which is done primarily through env
vars) is separate from actually standing up the servers.

This also makes sure to abort daemon initialization if the API servers
are not able to start.
2018-09-26 13:18:02 -07:00
Robbie Zhang
6b97713af3 Set the pod phase based on pod restart policy when provider failed (#361)
Updates the resource manager to include pods that are being deleted in the GetPods function.
2018-09-26 10:29:55 -07:00
Robbie Zhang
87acc00457 Merge branch 'master' into alicloud-eci 2018-09-24 12:33:19 -07:00
shidao-ytt
e9d17c23d3 Add Alibaba Cloud ECI Provider
Alibaba Cloud ECI (Elastic Container Instance) is a service that allows you to
run containers without having to manage servers or clusters.

This commit adds an ECI provider for virtual kubelet, connecting ECI with
a Kubernetes cluster.

Signed-off-by: xianwei.zw <xianwei.zw@alibaba-inc.com>
Signed-off-by: shidao.ytt <shidao.ytt@alibaba-inc.com>
2018-09-23 23:29:06 +08:00
Brian Goff
da5e24ef4d Move API handlers to separate package
This makes the package split a little cleaner and makes it easier for
other consumers to import the HTTP handlers.
2018-09-18 11:08:24 -07:00
Brian Goff
74f76c75d5 Instrument handlers for logging/error handling
This refactors a bit of the HTTP handler code.
Moves error handling for handler functions to a generic handler.
This also has the side effect of propagating errors from the
provider with the correct status code, provided the error type
implements a pre-defined interface.
2018-09-17 16:54:24 -07:00
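Handlers return an error, a generic wrapper writes the response, and the status code is chosen by checking which behaviour interface the error implements. The sketch below shows that scheme. The `notFound`/`invalidInput` interfaces and the `handleError`, `podNotFound`, `wrap` and `codeFor` helpers are hypothetical. The commit does not name the actual interfaces. `errors.As` (Go 1.13+) is used so that wrapped provider errors are still recognised.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// notFound and invalidInput are hypothetical behaviour interfaces a provider
// error can implement to select an HTTP status code.
type notFound interface{ NotFound() bool }
type invalidInput interface{ InvalidInput() bool }

// statusCode maps an error to an HTTP status based on the interfaces it
// (or any error it wraps) implements.
func statusCode(err error) int {
	var nf notFound
	if errors.As(err, &nf) && nf.NotFound() {
		return http.StatusNotFound
	}
	var ii invalidInput
	if errors.As(err, &ii) && ii.InvalidInput() {
		return http.StatusBadRequest
	}
	return http.StatusInternalServerError
}

// handleError adapts an error-returning handler into an http.HandlerFunc so
// individual handlers do not write their own error responses.
func handleError(f func(http.ResponseWriter, *http.Request) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := f(w, r); err != nil {
			http.Error(w, err.Error(), statusCode(err))
		}
	}
}

// podNotFound is a demo provider error that implements notFound.
type podNotFound struct{ name string }

func (e podNotFound) Error() string  { return fmt.Sprintf("pod %q not found", e.name) }
func (e podNotFound) NotFound() bool { return true }

// errBoom is a plain error with no behaviour interfaces.
var errBoom = errors.New("boom")

// wrap simulates a caller adding context to a provider error.
func wrap(err error) error { return fmt.Errorf("provider: %w", err) }

// codeFor returns the status code produced when a handler returns err.
func codeFor(err error) int {
	h := handleError(func(w http.ResponseWriter, r *http.Request) error { return err })
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(codeFor(nil), codeFor(podNotFound{name: "web"}), codeFor(wrap(podNotFound{name: "web"})), codeFor(errBoom))
}
```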
Brian Goff
8eb6ab4bcd Remove intermediate API server objects
Instead just generate HTTP handler functions directly.
2018-09-17 14:47:26 -07:00
Brian Goff
8091b089a2 Plumb context to providers 2018-09-13 13:49:26 -07:00
robbiezhang
4e20fc40ca Override the host in kubeconfig if MASTER_URI EnvVar is set 2018-09-10 12:56:50 -07:00