Create a provider to use Azure Batch (#133)

* Started work on provider

* WIP Adding batch provider

* Working basic call into pool client. Need to parameterize the baseurl

* Fixed job creation by manipulating the content-type

* WIP Kicking off containers. Dirty

* [wip] More meat around scheduling simple containers.

* Working on basic task wrapper to co-schedule pods

* WIP on task wrapper

* WIP

* Working pod minimal wrapper for batch

* Integrate pod template code into provider

* Cleaning up

* Move to docker without gpu

* WIP batch integration

* partially working

* Working logs

* Tidy code

* WIP: Testing and readme

* Added readme and terraform deployment for GPU Azure Batch pool.

* Update to enable low priority nodes for gpu

* Fix log formatting bug. Return node logs when container not yet started

* Moved to golang v1.10

* Fix cri test

* Fix up minor docs issue. Add provider to readme. Add var for vk image.
Lawrence Gripper
2018-06-23 00:33:49 +01:00
committed by Robbie Zhang
parent 1ad6fb434e
commit d6e8b3daf7
75 changed files with 20040 additions and 6 deletions


@@ -0,0 +1,49 @@
export DEBIAN_FRONTEND=noninteractive
export TEMP_DISK=/mnt
apt-get install -y -q --no-install-recommends \
    build-essential
# Add Docker CE repo
apt-get update -y -q --no-install-recommends
apt-get install -y -q -o Dpkg::Options::="--force-confnew" --no-install-recommends \
    apt-transport-https ca-certificates curl software-properties-common cgroup-lite
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
# Install latest CUDA driver
CUDA_REPO_PKG=cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
wget -O /tmp/${CUDA_REPO_PKG} http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG}
sudo dpkg -i /tmp/${CUDA_REPO_PKG}
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
rm -f /tmp/${CUDA_REPO_PKG}
sudo apt-get update -y -q --no-install-recommends
sudo apt-get install cuda-drivers -y -q --no-install-recommends
# install nvidia-docker
curl -fSsL https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -fSsL https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
    tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update -y -q --no-install-recommends
apt-get install -y -q --no-install-recommends -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" nvidia-docker2
systemctl restart docker.service
nvidia-docker version
# prep docker
systemctl stop docker.service
rm -rf /var/lib/docker
mkdir -p /etc/docker
mkdir -p "$TEMP_DISK/docker"
chmod 777 "$TEMP_DISK/docker"
echo "{ \"data-root\": \"$TEMP_DISK/docker\", \"hosts\": [ \"unix:///var/run/docker.sock\", \"tcp://127.0.0.1:2375\" ] }" > /etc/docker/daemon.json.merge
python -c "import json;a=json.load(open('/etc/docker/daemon.json.merge'));b=json.load(open('/etc/docker/daemon.json'));a.update(b);f=open('/etc/docker/daemon.json','w');json.dump(a,f);f.close();"
rm -f /etc/docker/daemon.json.merge
sed -i 's|^ExecStart=/usr/bin/dockerd.*|ExecStart=/usr/bin/dockerd|' /lib/systemd/system/docker.service
systemctl daemon-reload
systemctl start docker.service
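
The Docker preparation steps in this script build the data-root path from the TEMP_DISK variable exported at the top. A shell variable that is unset or misspelled expands to an empty string, which would silently turn that path into /docker. As a minimal sketch, assuming a POSIX shell, a guard like the one below could make the script fail fast instead. The helper names require_var and docker_data_root are hypothetical and are not part of this commit.

```shell
# Hypothetical helper (not in the original script): return non-zero when the
# named variable is unset or empty, rather than letting it expand to "".
require_var() {
    # $1 is a variable name supplied by the script itself (trusted input);
    # eval performs the indirect lookup in a way that works in plain sh.
    eval "_rv_value=\${$1:-}"
    if [ -z "$_rv_value" ]; then
        echo "error: required variable '$1' is unset or empty" >&2
        return 1
    fi
}

# Hypothetical helper: print the Docker data-root directory derived from
# TEMP_DISK, refusing to fall back to /docker when TEMP_DISK is missing.
docker_data_root() {
    require_var TEMP_DISK || return 1
    printf '%s/docker\n' "$TEMP_DISK"
}
```

One possible use would be `DATA_ROOT="$(docker_data_root)" || exit 1` before the mkdir and chmod calls, so that a missing variable stops the script before anything is created.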