This adds a new interface that a provider can implement which enables
async notifications of pod status changes rather than the existing loop
which goes through every pod in k8s and checks the status in the
provider.
In practice this should be significantly more efficient since we are not
constantly listing all pods and then looking up the status in the
provider.
For providers that do not support this interface, the old method is
still used to sync state from the provider.
This commit does not update any of the providers to support this
interface.
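
A minimal sketch of how such an optional interface might be detected, assuming simplified stand-in types. `StatusNotifier`, `PodStatus`, `syncMode`, and `drain` are hypothetical names for illustration, not the identifiers added by this commit, and the demo provider delivers updates synchronously to keep the example short:

```go
package main

import (
	"context"
	"fmt"
)

// PodStatus is a simplified stand-in for the Kubernetes pod status type.
type PodStatus struct {
	Namespace, Name, Phase string
}

// StatusNotifier is a hypothetical optional interface. A provider that
// implements it pushes status changes instead of being polled.
type StatusNotifier interface {
	NotifyStatus(ctx context.Context, cb func(PodStatus))
}

// pollOnly stands in for a provider that does not implement the interface.
type pollOnly struct{}

// pushProvider stands in for a provider that reports changes itself.
type pushProvider struct{ updates []PodStatus }

// NotifyStatus delivers each pending update to cb. A real provider would
// likely do this from its own goroutine as changes happen.
func (p *pushProvider) NotifyStatus(ctx context.Context, cb func(PodStatus)) {
	for _, u := range p.updates {
		if ctx.Err() != nil {
			return
		}
		cb(u)
	}
}

// syncMode reports which path a provider would take: "async" when it
// implements StatusNotifier, "poll" (the old loop) otherwise.
func syncMode(provider interface{}) string {
	if _, ok := provider.(StatusNotifier); ok {
		return "async"
	}
	return "poll"
}

// drain collects every update the notifier delivers.
func drain(n StatusNotifier) []PodStatus {
	var got []PodStatus
	n.NotifyStatus(context.Background(), func(s PodStatus) { got = append(got, s) })
	return got
}

func main() {
	p := &pushProvider{updates: []PodStatus{{Namespace: "default", Name: "web", Phase: "Running"}}}
	fmt.Println(syncMode(p), syncMode(pollOnly{}))
	fmt.Println(drain(p))
}
```

The type assertion is what lets providers opt in without any change to those that only support polling.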
This starts the work of having a `NodeProvider` which is responsible for
providing node details.
It splits the responsibilities of node management off to a new
controller.
The primary change here is to add the framework pieces for node
management and move the VK CLI to use this new controller.
It also adds support for node leases where available. This can be
enabled via the command line (disabled by default), but may fall back if
we find that leases aren't supported on the cluster.
Tickers always tick, so if we tick every 5 seconds and the work that we
perform at each tick takes 5 seconds, we end up just looping with no
sleep period.
Instead, this uses a timer to ensure we actually get a full 5-second
sleep between loops.
We should consider an async API instead of polling the provider like
this.
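
A minimal sketch of the timer-based loop, assuming the work runs synchronously inside the loop. `runLoop`, `observeGaps`, and `demoInterval` are hypothetical names, and the interval is shortened from 5 seconds so the demo finishes quickly:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// demoInterval is a short stand-in for the 5 second interval.
const demoInterval = 20 * time.Millisecond

// runLoop waits a full interval, runs work, then waits a full interval
// again. Unlike a time.Ticker, the wait only starts after work returns.
func runLoop(ctx context.Context, interval time.Duration, work func()) {
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
		}
		work()
		// Re-arm only after the work has finished, so the sleep is not
		// shortened by however long the work took.
		timer.Reset(interval)
	}
}

// observeGaps runs the loop for a fixed number of iterations and returns
// the idle time between the end of one run and the start of the next.
func observeGaps(interval time.Duration, runs int) []time.Duration {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var gaps []time.Duration
	var lastEnd time.Time
	count := 0
	runLoop(ctx, interval, func() {
		if count > 0 {
			gaps = append(gaps, time.Since(lastEnd))
		}
		time.Sleep(interval) // work that takes as long as the interval
		lastEnd = time.Now()
		count++
		if count == runs {
			cancel()
		}
	})
	return gaps
}

func main() {
	for i, gap := range observeGaps(demoInterval, 3) {
		fmt.Printf("gap %d at least one interval: %v\n", i+1, gap >= demoInterval)
	}
}
```

With a ticker, work that takes a full interval would leave no idle time at all. Here every gap is at least one interval long.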
* Don't start things in New
* Move http server handling up to daemon.
This removes the burden of dealing with listeners, HTTP servers, etc. in
the core framework.
Instead provide helpers to attach the appropriate routes to the
caller's serve mux.
With this change, the vkubelet package only helps callers set up HTTP
rather than forcing a specific HTTP config on them.
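
A minimal sketch of the "attach routes to the caller's mux" pattern, using only the standard library. `AttachRoutes`, the `/pods` path, and `demoRequest` are hypothetical and do not reflect the helper names or routes in the vkubelet package:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// AttachRoutes registers handlers on a mux owned by the caller. It does
// not create a listener or an http.Server; that stays with the daemon.
func AttachRoutes(mux *http.ServeMux, listPods func() []string) {
	mux.HandleFunc("/pods", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, strings.Join(listPods(), "\n"))
	})
}

// demoRequest plays the role of the caller: it builds its own mux,
// attaches the routes, and serves one request against it in memory.
func demoRequest(path string) (int, string) {
	mux := http.NewServeMux()
	AttachRoutes(mux, func() []string { return []string{"default/web", "default/db"} })
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := demoRequest("/pods")
	fmt.Println(code)
	fmt.Println(body)
}
```

Because the caller owns the mux, it also owns TLS settings, timeouts, and the listen address.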
* vendor: add vendored code
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* controller: use shared informers and a work queue
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* errors: use cpuguy83/strongerrors
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* aci: fix test that uses resource manager
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* readme: clarify skaffold run before e2e
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* cmd: use root context everywhere
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: refactor pod lifecycle management
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* e2e: fix race in test when observing deletions
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* e2e: test pod forced deletion
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* cmd: fix root context potential leak
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: rename metaKey
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: remove calls to HandleError
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* Revert "errors: use cpuguy83/strongerrors"
This reverts commit f031fc6d.
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* manager: remove redundant lister constraint
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: rename the pod event recorder
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: amend misleading comment
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* mock: add tracing
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: add tracing
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* test: observe timeouts
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* trace: remove unnecessary comments
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: limit concurrency in deleteDanglingPods
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: never store context, always pass in calls
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: remove HandleCrash and just panic
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: don't sync succeeded pods
Signed-off-by: Paulo Pires <pjpires@gmail.com>
* sync: ensure pod deletion from kubernetes
Signed-off-by: Paulo Pires <pjpires@gmail.com>
Updates the pod status in Kubernetes to "Failed" when the pod status is
not found from the provider.
Note that most providers currently return `nil, nil` when a pod is
not found. This works, but providers should arguably return a typed
error so we can tell whether an error means not found or something else.
It works as is, so I haven't changed it.
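
A minimal sketch of what a typed not-found error could look like. This commit does not implement it. `ErrPodNotFound`, `provider`, and `reconcilePhase` are hypothetical names, and pod status is reduced to a phase string:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPodNotFound is a hypothetical sentinel a provider could return
// instead of `nil, nil`.
var ErrPodNotFound = errors.New("pod not found")

// provider is a toy provider mapping pod names to phases.
type provider map[string]string

// GetPodStatus wraps the sentinel so callers can still match it.
func (p provider) GetPodStatus(name string) (string, error) {
	phase, ok := p[name]
	if !ok {
		return "", fmt.Errorf("get status for %q: %w", name, ErrPodNotFound)
	}
	return phase, nil
}

// reconcilePhase returns the phase to record in Kubernetes. A missing pod
// is marked "Failed"; any other error is passed up to the caller.
func reconcilePhase(p provider, name string) (string, error) {
	phase, err := p.GetPodStatus(name)
	switch {
	case errors.Is(err, ErrPodNotFound):
		return "Failed", nil
	case err != nil:
		return "", err
	}
	return phase, nil
}

func main() {
	p := provider{"web": "Running"}
	fmt.Println(reconcilePhase(p, "web"))
	fmt.Println(reconcilePhase(p, "gone"))
}
```

With `errors.Is`, a provider can add context to the error without breaking the not-found check.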
* Add k8s.io/client-go/tools/cache package
* Add cache controller
* Add pod creator and terminator
* Pod Synchronizer
* Clean up
* Add back reconcile
* Remove unnecessary space in log
* Incorporate feedback
* dep ensure
* Fix the syntax error
* Fix the merge errors
* Minor Refactor
* Set status
* Pass context together with the pod to the pod channel
* Change to use flag to specify the number of pod sync workers
* Remove the unused const
* Use Stable PROD Region WestUS in Test
EastUS2EUAP is not reliable
* Refactor provider init
This moves provider init out of vkubelet setup, instead preferring to
initialize vkubelet with a provider.
* Split API server configuration from setup.
This makes sure that configuration (which is done primarily through env
vars) is separate from actually standing up the servers.
This also makes sure to abort daemon initialization if the API servers
are not able to start.
This allows for more specificity when setting taint tolerations for
workloads. Three new env variables are introduced:
VKUBELET_TAINT_KEY (defaults to `virtual-kubelet.io/provider`)
VKUBELET_TAINT_VALUE (defaults to provider name)
VKUBELET_TAINT_EFFECT (defaults to `NoSchedule`)
BREAKING CHANGES:
- The default taint key of `azure.com/aci` is now
`virtual-kubelet.io/provider`.
- Specifying a custom taint key is now done via an environment variable
rather than the `--taint` command line flag.
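
A minimal sketch of how the three variables and their defaults could be resolved. `Taint` and `taintFrom` are hypothetical names, and treating an empty value as unset is an assumption rather than something this commit states:

```go
package main

import (
	"fmt"
	"os"
)

// Taint is a simplified stand-in for the Kubernetes taint type.
type Taint struct {
	Key, Value, Effect string
}

// taintFrom builds the node taint from the environment. lookup has the
// same shape as os.LookupEnv so it can be swapped out in tests.
func taintFrom(lookup func(string) (string, bool), providerName string) Taint {
	get := func(key, def string) string {
		// Assumption: an empty value falls back to the default.
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	return Taint{
		Key:    get("VKUBELET_TAINT_KEY", "virtual-kubelet.io/provider"),
		Value:  get("VKUBELET_TAINT_VALUE", providerName),
		Effect: get("VKUBELET_TAINT_EFFECT", "NoSchedule"),
	}
}

func main() {
	fmt.Printf("%+v\n", taintFrom(os.LookupEnv, "azure"))
}
```

A workload would then need a toleration matching the resolved key, value, and effect to be scheduled on the virtual node.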
* Started work on provider
* WIP Adding batch provider
* Working basic call into pool client. Need to parameterize the baseurl
* Fixed job creation by manipulating the content-type
* WIP Kicking off containers. Dirty
* [wip] More meat around scheduling simple containers.
* Working on basic task wrapper to co-schedule pods
* WIP on task wrapper
* WIP
* Working pod minimal wrapper for batch
* Integrate pod template code into provider
* Cleaning up
* Move to docker without gpu
* WIP batch integration
* partially working
* Working logs
* Tidy code
* WIP: Testing and readme
* Added readme and terraform deployment for GPU Azure Batch pool.
* Update to enable low priority nodes for gpu
* Fix log formatting bug. Return node logs when container not yet started
* Moved to golang v1.10
* Fix cri test
* Fix up minor docs issue. Add provider to readme. Add var for vk image.
* Add Virtual Kubelet provider for VIC
Initial virtual kubelet provider for VMware VIC. This provider currently
handles creating and starting a pod VM via the VIC portlayer and persona
server. Image store handling goes through the VIC persona server. This
provider currently requires the feature/wolfpack branch of VIC.
* Added pod stop and delete. Also added node capacity.
Added the ability to stop and delete pod VMs via VIC. Also retrieve
node capacity information from the VCH.
* Cleanup and readme file
Some file clean up and added a Readme.md markdown file for the VIC
provider.
* Cleaned up errors, added function comments, moved operation code
1. Cleaned up error handling. Set standard for creating errors.
2. Added method prototype comments for all interface functions.
3. Moved PodCreator, PodStarter, PodStopper, and PodDeleter to a new folder.
* Add mocking code and unit tests for podcache, podcreator, and podstarter
Used the unit test framework used in VIC to handle assertions in the provider's
unit tests. Mocking code is generated using the OSS project mockery, which is
compatible with the testify assertion framework.
* Vendored packages for the VIC provider
Requires the feature/wolfpack branch of VIC and a few specific commit SHAs of
projects used within VIC.
* Implementation of POD Stopper and Deleter unit tests (#4)
* Updated files for initial PR
* First commit of CRI provider. Vendor deps not included
* Tidy up comments and format code
* vendor grpc, CRI APIs, update protobuf and tidy logging
* Add README
* Fix errors in make test