Create a provider to use Azure Batch (#133)
* Started work on provider
* WIP Adding batch provider
* Working basic call into pool client. Need to parameterize the baseurl
* Fixed job creation by manipulating the content-type
* WIP Kicking off containers. Dirty
* [wip] More meat around scheduling simple containers.
* Working on basic task wrapper to co-schedule pods
* WIP on task wrapper
* WIP
* Working pod minimal wrapper for batch
* Integrate pod template code into provider
* Cleaning up
* Move to docker without gpu
* WIP batch integration
* partially working
* Working logs
* Tidy code
* WIP: Testing and readme
* Added readme and terraform deployment for GPU Azure Batch pool.
* Update to enable low priority nodes for gpu
* Fix log formatting bug. Return node logs when container not yet started
* Moved to golang v1.10
* Fix cri test
* Fix up minor docs Issue. Add provider to readme. Add var for vk image.
Committed by Robbie Zhang
Parent: 1ad6fb434e
Commit: d6e8b3daf7
49
providers/azurebatch/deployment/scripts/poolstartup.sh
Normal file
@@ -0,0 +1,49 @@
export DEBIAN_FRONTEND=noninteractive
export TEMP_DISK=/mnt

apt-get install -y -q --no-install-recommends \
    build-essential


# Add the Docker CE apt repository
apt-get update -y -q --no-install-recommends
apt-get install -y -q -o Dpkg::Options::="--force-confnew" --no-install-recommends \
    apt-transport-https ca-certificates curl software-properties-common cgroup-lite
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update


# Install the latest CUDA driver
CUDA_REPO_PKG=cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
wget -O "/tmp/${CUDA_REPO_PKG}" "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG}"
sudo dpkg -i "/tmp/${CUDA_REPO_PKG}"
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
rm -f "/tmp/${CUDA_REPO_PKG}"
sudo apt-get update -y -q --no-install-recommends
sudo apt-get install cuda-drivers -y -q --no-install-recommends

# Install nvidia-docker2
curl -fSsL https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -fSsL https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
    tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update -y -q --no-install-recommends
apt-get install -y -q --no-install-recommends -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew" nvidia-docker2
systemctl restart docker.service
nvidia-docker version

# Prepare Docker: store its data under $TEMP_DISK and listen on the Unix socket and tcp://127.0.0.1:2375
systemctl stop docker.service
rm -rf /var/lib/docker
mkdir -p /etc/docker
mkdir -p "$TEMP_DISK/docker"
chmod 777 "$TEMP_DISK/docker"
echo "{ \"data-root\": \"$TEMP_DISK/docker\", \"hosts\": [ \"unix:///var/run/docker.sock\", \"tcp://127.0.0.1:2375\" ] }" > /etc/docker/daemon.json.merge
python -c "import json;a=json.load(open('/etc/docker/daemon.json.merge'));b=json.load(open('/etc/docker/daemon.json'));a.update(b);f=open('/etc/docker/daemon.json','w');json.dump(a,f);f.close();"
rm -f /etc/docker/daemon.json.merge
sed -i 's|^ExecStart=/usr/bin/dockerd.*|ExecStart=/usr/bin/dockerd|' /lib/systemd/system/docker.service
systemctl daemon-reload
systemctl start docker.service
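
The Docker preparation step builds its data directory path from `TEMP_DISK`. In shell, an unset or misspelled variable expands to an empty string, so a command such as `mkdir -p $VAR/docker` would quietly create `/docker` instead of failing. The following is a minimal sketch of a guard against that, assuming a hypothetical helper named `docker_data_dir` that is not part of the committed script:

```shell
# Hypothetical helper (not in the committed script): print the Docker data
# directory for a given base path, or fail if the base path is empty.
docker_data_dir() {
    base="$1"
    if [ -z "$base" ]; then
        echo "error: temp disk path is empty" >&2
        return 1
    fi
    # Strip one trailing slash so "/mnt/" and "/mnt" give the same result.
    printf '%s/docker\n' "${base%/}"
}
```

If a guard like this were adopted, the path could be resolved once, for example `DATA_DIR=$(docker_data_dir "$TEMP_DISK") || exit 1`, and then reused for the `mkdir`, `chmod`, and `data-root` steps.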