Commit Graph

56 Commits

Author SHA1 Message Date
Brian Goff
7dd49516d8 Decouple vkubelet/* packages from providers (#626)
This makes the concept of a `Provider` wholely implemented in the cli
implementation in cmd/virtual-kubelet.

It allows us to slim down the interfaces used in vkubelet (and
vkubelet/api) to what is actually used there rather than a huge
interface that is only there to serve the CLI's needs.
2019-05-17 17:01:05 -07:00
Jeremy Rickard
87e72bf4df Light up UpdatePod (#613)
* Light up UpdatePod

This PR updates the vkublet/pod.go createOrUpdate(..) method to actually handle
updates. It gets the pod from the provider as before, but now if it exists the method
checks the hash of the spec against the spec of the new pod. If they've changed, it
calls UpdatePod(..).

Also makes a small change to the Server struct to swap from kuberentes.Clientset to kubernetes.Interface
to better facilitate testing with fake ClientSet.

Co-Authored-By: Brian Goff <cpuguy83@gmail.com>
2019-05-17 11:14:29 -07:00
Sargun Dhillon
f1cb6a7bf6 Add the concept of startup timeout (#597)
This adds two concepts, where one encompasses the other.

Startup timeout
Startup timeout is how long to wait for the entire kubelet
to get into a functional state. Right now, this only waits
for the pod informer cache for the pod controllerto become
in-sync with API server, but this could be extended to other
informers (like secrets informer).

Wait For Startup
This changes the behaviour of the virtual kubelet to wait
for the pod controller to start before registering the node.

It is to avoid the race condition where the node is registered,
but we cannot actually do any pod operations.
2019-05-06 09:25:00 -07:00
Brian Goff
1942522cf6 Add async provider pod status updates (#493)
This adds a new interface that a provider can implement which enables
async notifications of pod status changes rather than the existing loop
which goes through every pod in k8s and checks the status in the
provider.
In practice this should be significantly more efficient since we are not
constantly listing all pods and then looking up the status in the
provider.

For providers that do not support this interface, the old method is
still used to sync state from the provider.

This commit does not update any of the providers to support this
interface.
2019-04-01 09:07:26 -07:00
Brian Goff
10430f0b7f Add node provider interfaace (#526)
This starts the work of having a `NodeProvider` which is responsible for
providing node details.
It splits the responsibilities of node management off to a new
controller.

The primary change here is to add the framework pieces for node
management and move the VK CLI to use this new controller.

It also adds support for node leases where available. This can be
enabled via the command line (disabled by default), but may fall back if
we find that leaess aren't supported on the cluster.
2019-03-25 15:02:40 -07:00
Brian Goff
3ab101da00 Use timer instead of ticker (#477)
Tickers always tick, so if we tick every 5 seconds and the work that we
perform at each tick takes 5 seconds, we end up just looping with no
sleep period.

Instead this is using a timer to ensure we actually get a full 5 second
sleep between loops.

We should consider an async API instead of polling the provider like
this.
2018-12-21 15:48:47 -08:00
Brian Goff
0d14914e85 Refactor http server stuff (#466)
* Don't start things in New

* Move http server handling up to daemon.

This removes the burdern of dealing with listeners, http servers, etc in
the core framework.

Instead provide helpers to attach the appropriate routes to the
caller's serve mux.

With this change, the vkubelet package only helps callers setup HTTP
rather than forcing a specific HTTP config on them.
2018-12-21 11:45:07 -08:00
Paulo Pires
d73e563b97 Merge branch 'master' into stop_ticker 2018-12-12 12:36:20 +00:00
Brian Goff
616d12ed76 Remove old pod notification stuff
These are no longer used since we started using the k8s client's queue.
2018-12-10 13:40:21 -08:00
Brian Goff
e6ca19d059 Ensure reconcile ticker stops on shutdown
Otherwise this ticker could run forever (or until the process exits).
2018-12-10 10:33:36 -08:00
Paulo Pires
28a757f4da use shared informers and workqueue (#425)
* vendor: add vendored code

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* controller: use shared informers and a work queue

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* errors: use cpuguy83/strongerrors

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* aci: fix test that uses resource manager

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* readme: clarify skaffold run before e2e

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* cmd: use root context everywhere

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: refactor pod lifecycle management

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* e2e: fix race in test when observing deletions

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* e2e: test pod forced deletion

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* cmd: fix root context potential leak

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: rename metaKey

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: remove calls to HandleError

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* Revert "errors: use cpuguy83/strongerrors"

This reverts commit f031fc6d.

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* manager: remove redundant lister constraint

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: rename the pod event recorder

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: amend misleading comment

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* mock: add tracing

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: add tracing

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* test: observe timeouts

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* trace: remove unnecessary comments

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: limit concurrency in deleteDanglingPods

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: never store context, always pass in calls

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: remove HandleCrash and just panic

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: don't sync succeeded pods

Signed-off-by: Paulo Pires <pjpires@gmail.com>

* sync: ensure pod deletion from kubernetes

Signed-off-by: Paulo Pires <pjpires@gmail.com>
2018-11-30 15:53:58 -08:00
Brian Goff
aee1fde504 Fix a case where provider pod status is not found
Updates the pod status in Kubernetes to "Failed" when the pod status is
not found from the provider.

Note that currently thet most providers return `nil, nil` when a pod is
not found. This works but should possibly return a typed error so we can
determine if the error means not found or something else... but this
works as is so I haven't changed it.
2018-11-06 16:11:42 -08:00
Brian Goff
bec818bf3c Do not close pod sync, use context cancel instead. (#402)
Closing the channel is racey and can lead to a panic on exit.
Instead rely on context cancellation to know if workers should exit.
2018-11-05 11:37:00 -08:00
Robbie Zhang
4a7b74ed42 [VK] Use Cache controller and Make create/delete pod Concurrently (#373)
* Add k8s.io/client-go/tools/cache package

* Add cache controller

* Add pod creator and terminator

* Pod Synchronizer

* Clean up

* Add back reconcile

* Remove unnecessary space in log

* Incorprate feedbacks

* dep ensure

* Fix the syntax error

* Fix the merge errors

* Minor Refactor

* Set status

* Pass context together with the pod to the pod channel

* Change to use flag to specify the number of pod sync workers

* Remove the unused const

* Use Stable PROD Region WestUS in Test

EastUS2EUAP is not reliable
2018-10-16 17:20:02 -07:00
Brian Goff
c1fe923131 Minor refactorings (#368)
* Split vkubelet funcitons into separate files.

* Minor re-org for cmd/census*

* refactor run loop
2018-10-12 17:36:37 -07:00
Brian Goff
682b2bccf8 Add support for tracing via OpenCencus
This adds a few flags for configuring the tracer.
Includes support for jaeger tracing (built into OC).
2018-09-26 13:48:40 -07:00
Brian Goff
083f6dee05 Refactor provider init (#360)
* Refactor provider init

This moves provider init out of vkubelet setup, instead preferring to
initialize vkubelet with a provider.

* Split API server configuration from setup.

This makes sure that configuration (which is done primarily through env
vars) is separate from actually standing up the servers.

This also makes sure to abort daemon initialization if the API servers
are not able to start.
2018-09-26 13:18:02 -07:00
Robbie Zhang
6b97713af3 Set the pod phase based on pod restart policy when provider failed (#361)
Update the resource manager to include the deleting pods in the GetPods function
2018-09-26 10:29:55 -07:00
Brian Goff
8091b089a2 Plumb context to providers 2018-09-13 13:49:26 -07:00
robbiezhang
4e20fc40ca Override the host in kubeconfig if MASTER_URI EnvVar is set 2018-09-10 12:56:50 -07:00
robbiezhang
0f54e1ed9c Bug fixes 2018-09-07 18:46:49 -07:00
Robbie Zhang
b019ec5549 Bug Fixes (#329) 2018-08-27 11:53:59 -07:00
Brian Goff
8de6693460 Don't use globals for API server
Refactors how HTTP servers are started and binds them to objects that
can store the provider rather than relying on a global.
2018-08-20 11:52:54 -07:00
Brian Goff
e8abca0ac9 Add supports for stats in ACI provider
This adds a new, optional, interface for providers that want to provide
stats.
2018-08-17 17:03:25 -07:00
Brian Goff
1e774a32b3 Use standard logging package (#323) 2018-08-17 16:50:24 -07:00
Robbie Zhang
d7f97b9bfc If --taint is specified, set the taint value to empty (#322)
Add the old tolerations the examples to make it backward compatible during the switch
2018-08-15 17:44:51 -07:00
Jacob LeGrone
5115c1e5cd Add back deprecated taint flag
TODO: Revert this commit

Related to #316
2018-08-14 17:09:44 -07:00
Jacob LeGrone
d47a0b2fc0 Add default provider taint and taint configuration options
This allows for more specificity when setting taint tolerations for
workloads. Three new env variables are introduced:

VKUBELET_TAINT_KEY (defaults to `virtual-kubelet.io/provider`)
VKUBELET_TAINT_VALUE (defaults to provider name)
VKUBELET_TAINT_EFFECT (defaults to `NoSchedule`)

BREAKING CHANGES:
- The default taint key of `azure.com/aci` is now
  `virtual-kubelet.io/provider`.
- Specifying a custom taint key is now done via an environment variable
  rather than the `--taint` command line flag.
2018-08-14 17:09:44 -07:00
Nick Maliwacki
bf02f887f0 Fix to build virtual-kubelet in windows 2018-08-09 18:31:35 -07:00
yaron2
36db5d9583 added Service Fabric Mesh provider 2018-07-31 16:00:56 -07:00
Robbie Zhang
3f83588e59 Reduce ACI API calls (#282)
* Reduce ACI API calls

Reduce reconcile calls and API calls in reconcile

* Fix the pod status update issue

* Revert a few unnecessary change
2018-07-31 13:31:00 -07:00
Robbie Zhang
6723b0d719 Register the Node when GetNode Returns NotFound (#254) 2018-07-11 14:57:09 -07:00
Liang Mingqiang
3e8a1b9bb5 config files for mock provider (#247) 2018-07-06 14:07:33 -07:00
Fei Xu
a30303035f Huawei Cloud Provider implementation (#241)
* add huawei CCI provider

* add readme

* add vender

* add huawei provider mock test
2018-06-29 10:21:15 -07:00
Robbie Zhang
e1fa5b03af [Azure] Increase the Default Node Quota for ACI (#240)
* Increase the default quota for ACI provider

* VK update the node capcity

* VK update node IP addresses
2018-06-25 11:08:49 -07:00
Lawrence Gripper
d6e8b3daf7 Create a provider to use Azure Batch (#133)
* Started work on provider

* WIP Adding batch provider

* Working basic call into pool client. Need to parameterize the baseurl

* Fixed job creation by manipulating the content-type

* WIP Kicking off containers. Dirty

* [wip] More meat around scheduling simple containers.

* Working on basic task wrapper to co-schedule pods

* WIP on task wrapper

* WIP

* Working pod minimal wrapper for batch

* Integrate pod template code into provider

* Cleaning up

* Move to docker without gpu

* WIP batch integration

* partially working

* Working logs

* Tidy code

* WIP: Testing and readme

* Added readme and terraform deployment for GPU Azure Batch pool.

* Update to enable low priority nodes for gpu

* Fix log formatting bug. Return node logs when container not yet started

* Moved to golang v1.10

* Fix cri test

* Fix up minor docs Issue. Add provider to readme. Add var for vk image.
2018-06-22 16:33:49 -07:00
Robbie Zhang
027b76651d Do not Sync ProviderFailed Pod Status (#227) 2018-06-18 10:21:09 -07:00
Robbie Zhang
22a4114890 Add an extra loop when the pod watcher is closed unexpectedly. (#226) 2018-06-11 14:09:16 -07:00
Loc Nguyen
513cebe7b7 VMware vSphere Integrated Containers provider (#206)
* Add Virtual Kubelet provider for VIC

Initial virtual kubelet provider for VMware VIC.  This provider currently
handles creating and starting of a pod VM via the VIC portlayer and persona
server.  Image store handling via the VIC persona server.  This provider
currently requires the feature/wolfpack branch of VIC.

* Added pod stop and delete.  Also added node capacity.

Added the ability to stop and delete pod VMs via VIC.  Also retrieve
node capacity information from the VCH.

* Cleanup and readme file

Some file clean up and added a Readme.md markdown file for the VIC
provider.

* Cleaned up errors, added function comments, moved operation code

1. Cleaned up error handling.  Set standard for creating errors.
2. Added method prototype comments for all interface functions.
3. Moved PodCreator, PodStarter, PodStopper, and PodDeleter to a new folder.

* Add mocking code and unit tests for podcache, podcreator, and podstarter

Used the unit test framework used in VIC to handle assertions in the provider's
unit test.  Mocking code generated using OSS project mockery, which is compatible
with the testify assertion framework.

* Vendored packages for the VIC provider

Requires feature/wolfpack branch of VIC and a few specific commit sha of
projects used within VIC.

* Implementation of POD Stopper and Deleter unit tests (#4)

* Updated files for initial PR
2018-06-04 15:41:32 -07:00
Fei Xu
9c38c1bdfb Add namespace for storeKey in ResourceManager (#211)
* add namespace for storeKey in RM

* fix UT for add namespace
2018-05-29 10:45:03 -07:00
Fei Xu
8068f3cac8 gofmt the project files (#205) 2018-05-18 16:13:34 -07:00
Ben Corrie
c6c89f062f CRI Provider implementation (#195)
* First commit of CRI provider. Vendor deps not included

* First commit of CRI provider. Vendor deps not included

* Tidy up comments and format code

* vendor grpc, CRI APIs, update protobuf and tidy logging

* First commit of CRI provider. Vendor deps not included

* Tidy up comments and format code

* vendor grpc, CRI APIs, update protobuf and tidy logging

* Add README

* Fix errors in make test
2018-05-15 08:50:52 -07:00
Robbie Zhang
f3ebde2533 Add the kubernete.io/hostname label to the VK node (#188) 2018-05-08 12:13:05 -07:00
Onur Filiz
feb2366e60 Add skeleton Fargate provider 2018-04-26 11:06:42 -07:00
Nick Schuch
cf30854bb1 Provider: Mock (#72)
* Provider: Mock

* Add pod conditions

* Mock: Convert pod list to a map and allow 'internalIP' to be configurable
2018-02-01 09:48:42 -08:00
Bhargav Nookala
a48f9866f2 Adding node exclusion label 2018-01-26 00:16:32 -08:00
chshou
8c0345edcf [Azure] Filters service account secret volume mount for Windows (#60)
* filters the SA secret volume for windows

* make it a map

* bettern go convention
2018-01-22 11:19:53 -08:00
Rajasekharan Vengalil
a4e99c2133 Implement web broker provider and a sample provider in Rust 2018-01-04 16:48:58 -08:00
Rita Zhang
04db926faa Set Status.DaemonEndpoints.KubeletEndpoint.Port to KUBELET_PORT 2017-12-20 19:13:56 -08:00
Rita Zhang
123863c1ae Integrate apiserver with provider GetContainerLogs 2017-12-15 17:25:19 -08:00