v1.0.0 -> v1.1.0.

Since there have been a few minor but not insignificant changes across
several services, it's time for a tag update!

General:
- Add k8s-node-termination-handler to cordon nodes before preemption,
increasing uptime with cleaner failover
- Add podAntiAffinity to several services so that each service's pods
are spread more evenly across nodes.

Gitlab and CI:
- Update from 11.10.4 -> 12.2.0
- Add retry for GitLab's self-deploy, which frequently used to fail.

README.md:
- Update ubuntu-node-01 OS information
- Re-run VPN performance tests.
- Add note about pusher/oauth2_proxy
parent d04c3df1
@@ -7,21 +7,25 @@
- [Added](#added)
- [Changes](#changes)
- [Removed](#removed)
- [1.0.0 - 2019-07-01](#100-2019-07-01)
- [1.1.0 - 2019-08-23](#110-2019-08-23)
- [Added](#added-1)
- [Changes](#changes-1)
- [Removed](#removed-1)
- [0.2.0 - 2018-03-05](#020-2018-03-05)
- [1.0.0 - 2019-07-01](#100-2019-07-01)
- [Added](#added-2)
- [Changes](#changes-2)
- [Removed](#removed-2)
- [0.1.1 - 2017-12-05](#011-2017-12-05)
- [0.2.0 - 2018-03-05](#020-2018-03-05)
- [Added](#added-3)
- [Changes](#changes-3)
- [0.1.0 - 2017-11-25](#010-2017-11-25)
- [Removed](#removed-3)
- [0.1.1 - 2017-12-05](#011-2017-12-05)
- [Added](#added-4)
- [Changes](#changes-4)
- [Removed](#removed-3)
- [0.1.0 - 2017-11-25](#010-2017-11-25)
- [Added](#added-5)
- [Changes](#changes-5)
- [Removed](#removed-4)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -33,14 +37,20 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
## [Unreleased]
### Added
### Changes
### Removed
## [1.1.0] - 2019-08-23
### Added
- [k8s-node-termination-handler](https://github.com/GoogleCloudPlatform/k8s-node-termination-handler) and podAntiAffinity added to the GKE cluster to see if uptimes improve.
- Personal websites [justinpalpant-website](https://gitlab.palpant.us/justin/justinpalpant-website) and [timpalpant-website](https://gitlab.palpant.us/justin/timpalpant-website) added as submodules.
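As a sketch, the podAntiAffinity added here can be expressed in a Deployment spec roughly as follows (the label value `example-service` is illustrative, not taken from the actual manifests):

```yaml
# Hypothetical Deployment fragment: prefer scheduling a service's
# replicas on different nodes. The app label is illustrative.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: example-service
                topologyKey: kubernetes.io/hostname
```

Using `preferred` rather than `required` anti-affinity lets the scheduler still place multiple replicas on one node if no other node has capacity.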
### Changes
- [transmission-web](https://gitlab.palpant.us/justin/palpantlab-transmission) project now has a public, authenticated [web UI](https://transmission.palpant.us).
- Kubernetes dashboard deployment updated to `v2.0.0-beta2` and moved to the `kubernetes-dashboard` namespace accordingly.
- Gitlab updated to v12.1.2 and Gitlab Runner updated to v12.0.1.
- Gitlab updated to v12.2.0 and Gitlab Runner updated to v12.2.0.
- Personal websites updated to provide Expires headers and better support client-side caching.
- Support retry on script_failure in the deployment of GitLab, which frequently fails because GitLab restarting itself mid-deploy causes a script failure.
- ubuntu-node-01 upgraded to Ubuntu 19.04.
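The retry on script_failure can be expressed in a project's CI file roughly like this (the job name and deploy command are illustrative, not the actual configuration):

```yaml
# Hypothetical .gitlab-ci.yml fragment: retry the deploy job when the
# script fails, e.g. because GitLab restarts itself mid-deploy.
deploy:
  script:
    - helm upgrade --install gitlab gitlab/gitlab  # illustrative command
  retry:
    max: 2
    when: script_failure
```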
### Removed
## [1.0.0] - 2019-07-01
@@ -66,7 +76,7 @@ Lastly, I have split up the single mono-repo into individual repos to support si
- Redeployed GitLab + CI as a [cloud-native helm chart](https://gitlab.palpant.us/justin/palpantlab-gitlab)
- Update Gitlab to 11.10.4 and allowed clone-over-SSH
- On-prem reverse-proxying and certificate management reverted to be handled by nas.sfo.palpant.us instead of nginx-ingress.
- Update ubuntu-node-01 to Ubuntu 18.04.2, NVIDIA drivers to 430.26, Linux kernel to 5.1.15, Kubernetes to 1.15.0, Docker to 18.09.3.
- Update ubuntu-node-01 to Ubuntu 18.04.2, NVIDIA drivers to 430.26, Linux kernel to 5.1.15, Kubernetes to 1.15.0.
- ubuntu-node-01 backups with LVM replaced with simple userspace backup with duplicacy managed via [dotfiles](https://gitlab.palpant.us/justin/dotfiles). No backup of all data, though the root partition including /boot is in RAID1.
### Removed
@@ -152,7 +162,8 @@ Lastly, I have split up the single mono-repo into individual repos to support si
- HAProxy, all instances
- Most of the 9s in my previous uptime. But they will be back, and better than ever!
[Unreleased]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v1.0.0...HEAD
[Unreleased]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v1.1.0...HEAD
[1.1.0]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v1.0.0...v1.1.0
[1.0.0]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v0.2.0...v1.0.0
[0.2.0]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v0.1.1...v0.2.0
[0.1.1]: https://gitlab.palpant.us/justin/palpantlab-infra/compare/v0.1.0...v0.1.1
@@ -90,9 +90,9 @@ All of this actually started with hosting a private GitLab instance, https://git
Prior to the move to GKE there were periods of significant instability and high risk of data loss. However, data is now stored on GCE Persistent Disks, with LFS artifacts and daily backups (via Kubernetes CronJob) stored to GCS with object versioning enabled, greatly reducing the risk of data loss.
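The daily-backup CronJob pattern described here can be sketched generically; the schedule, image, backup command, and bucket name below are all assumptions, not the actual configuration:

```yaml
# Hypothetical CronJob: run a nightly backup and copy it to a
# versioned GCS bucket. All names, the image, and the backup
# command are illustrative.
apiVersion: batch/v1beta1   # batch/v1beta1 was current for Kubernetes 1.15
kind: CronJob
metadata:
  name: gitlab-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: google/cloud-sdk:alpine
              command: ["/bin/sh", "-c"]
              args:
                - example-backup-tool --output /tmp/backup.tar &&
                  gsutil cp /tmp/backup.tar gs://example-backups/
```

With object versioning enabled on the bucket, each day's upload is retained as a new object generation rather than overwriting the previous backup.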
GitLab cloud-native helm chart: `gitlab-2.1.2`
GitLab cloud-native helm chart: `gitlab-2.2.0`
GitLab version: `12.1.2`
GitLab version: `12.2.0`
#### GitLab CI Runner
In addition to GitLab itself, shared GitLab runners have been created for use with any project hosted on https://gitlab.palpant.us. The shared runners use the Kubernetes executor to launch a build pod for each triggered build of a project that has a `.gitlab-ci.yml` file in the root of its repository. By default these Kubernetes executors now use Kaniko as their base image, avoiding the need for Docker-in-Docker (DinD), which is not compatible with GKE on Container-Optimized OS (COS). The GitLab runner continuously polls the GitLab server over the public internet (using a preshared token for authorization) for jobs, and executes those jobs by creating Kubernetes pods and `exec`ing into them to run the commands of the job.
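The Kaniko-based build described above can be sketched in a project's CI file roughly as follows; the job name and image destination are illustrative, not taken from an actual project:

```yaml
# Hypothetical .gitlab-ci.yml job: build and push an image with Kaniko,
# avoiding Docker-in-Docker. The destination tag is illustrative.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```

Because Kaniko builds the image entirely in userspace, the build pod needs no privileged mode or host Docker socket, which is what makes it workable on COS nodes.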
@@ -101,7 +101,7 @@ The GKE runner is labeled `gke`, `k8s`, `west1-b`.
For more information on GitLab CI, see [here](https://about.gitlab.com/features/gitlab-ci-cd/).
GitLab runner version: `gitlab/gitlab-runner:v12.0.1`
GitLab runner version: `gitlab/gitlab-runner:v12.2.0`
#### Personal (static) Websites
I deploy a handful of static websites on the cluster, providing reliability via replication and load balancing (via NGINX and Kubernetes) and low latency (courtesy mostly of Google Cloud's networking). These include my personal website, [justin.palpant.us](https://justin.palpant.us), and my brother's, [tim.palpant.us](https://tim.palpant.us).
@@ -126,14 +126,33 @@ A single-node Kubernetes master running "on-prem". This single-node cluster has
### Kubernetes Nodes
#### 192.168.0.31/ubuntu-node-01.sfo.palpant.us
Ubuntu 18.04.2 running on Intel i9-9900k, custom build.
Ubuntu 19.04 running on Intel i9-9900k, custom build.
##### Hardware
CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
Memory: 2x16GiB DDR4 2400MHz
Disk: Samsung SSD 970 EVO 250GB (nvme0n1); Samsung SSD 850 EVO 1TB (sda); Samsung SSD 840 PRO Series (sdb); LVM RAID1 between nvme0n1 and sdb for `/`
Disk: Samsung SSD 970 EVO 250GB (nvme0n1); Samsung SSD 850 EVO 1TB (sda); Samsung SSD 840 PRO Series (sdb).
```
sda 8:0 0 931.5G 0 disk
└─ubuntu--vg-homes 253:5 0 931.5G 0 lvm /home
sdb 8:16 0 238.5G 0 disk
├─sdb1 8:17 0 400M 0 part
└─sdb2 8:18 0 238.1G 0 part
├─ubuntu--vg-root_rmeta_0 253:0 0 4M 0 lvm
│ └─ubuntu--vg-root 253:4 0 232.5G 0 lvm /
└─ubuntu--vg-root_rimage_0 253:1 0 232.5G 0 lvm
└─ubuntu--vg-root 253:4 0 232.5G 0 lvm /
nvme0n1 259:0 0 232.9G 0 disk
├─nvme0n1p1 259:1 0 232.5G 0 part
│ ├─ubuntu--vg-root_rmeta_1 253:2 0 4M 0 lvm
│ │ └─ubuntu--vg-root 253:4 0 232.5G 0 lvm /
│ └─ubuntu--vg-root_rimage_1 253:3 0 232.5G 0 lvm
│ └─ubuntu--vg-root 253:4 0 232.5G 0 lvm /
└─nvme0n1p2 259:2 0 400M 0 part /boot/efi
```
GFX: GeForce GTX 1050 Ti 4GiB
@@ -158,34 +177,22 @@ GFX: GeForce GTX 1050 Ti 4GiB
Remote bandwidth test:
```bash
jpalpant@VSN00230:/Volumes/local $ iperf -c 192.168.0.32
$ iperf -c 192.168.0.31
------------------------------------------------------------
Client connecting to 192.168.0.32, TCP port 5001
TCP window size: 128 KByte (default)
Client connecting to 192.168.0.31, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 4] local 10.0.1.6 port 63311 connected with 192.168.0.32 port 5001
[ 3] local 10.0.1.10 port 52684 connected with 192.168.0.31 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 62.6 MBytes 52.5 Mbits/sec
jpalpant@VSN00230:/Volumes/local $ iperf -c 192.168.0.33
------------------------------------------------------------
Client connecting to 192.168.0.33, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 10.0.1.6 port 63328 connected with 192.168.0.33 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 62.8 MBytes 52.6 Mbits/sec
[ 3] 0.0-10.1 sec 25.1 MBytes 20.9 Mbits/sec
```
Remote latency test:
```bash
--- 192.168.0.33 ping statistics ---
81 packets transmitted, 81 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.884/6.915/28.148/2.844 ms
--- 192.168.0.32 ping statistics ---
53 packets transmitted, 53 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.939/6.530/10.834/1.008 ms
--- 192.168.0.31 ping statistics ---
57 packets transmitted, 57 received, 0% packet loss, time 56087ms
rtt min/avg/max/mdev = 7.830/15.914/63.813/14.019 ms
```
###### L2TP VPN Server
@@ -266,12 +273,12 @@ CloudSQL proxy version: `gcr.io/cloudsql-docker/gce-proxy:1.11`
#### GitLab CI Runner
The GitLab runner which interacts with `palpantlab-sfo` is tagged with tags `k8s`, `sfo`, and `gpu`. It uses nvidia-device-drivers to provide access to the NVIDIA GPU directly to automated builds, should it be necessary (e.g. to run tests that use tensorflow-gpu). That runner is maintained with the rest of the GitLab deployment, in [palpantlab-gitlab](https://gitlab.palpant.us/justin/palpantlab-gitlab).
GitLab runner version: `gitlab/gitlab-runner:v12.0.1`
GitLab runner version: `gitlab/gitlab-runner:v12.2.0`
#### transmission-web
I deployed the open-source project [haugene/docker-transmission-openvpn](https://github.com/haugene/docker-transmission-openvpn) as a Kubernetes Deployment and Service on the on-prem cluster. It has access to the NAS as well as to the filesystem on the local node, and it has sufficient privilege to open an OpenVPN tunnel, so all torrent traffic goes through NordVPN. The deployment of this webapp is managed at [palpantlab-tranmission](https://gitlab.palpant.us/justin/palpantlab-transmission). In the most recent change I added a public portal to the web UI. Previously the service was only accessible to people with 1) Kubernetes cluster credentials and 2) VPN credentials (which was only me). Now the service is public, but protected with [pusher/oauth2_proxy](https://github.com/pusher/oauth2_proxy) and can only be accessed with a Google email that belongs to the appropriate group. That web UI is accessible at [https://transmission.palpant.us](https://transmission.palpant.us) (but of course the only thing the page shows is a "Sign in with Google" button.
I deployed the open-source project [haugene/docker-transmission-openvpn](https://github.com/haugene/docker-transmission-openvpn) as a Kubernetes Deployment and Service on the on-prem cluster. It has access to the NAS as well as to the filesystem on the local node, and it has sufficient privilege to open an OpenVPN tunnel, so all torrent traffic goes through NordVPN. The deployment of this webapp is managed at [palpantlab-tranmission](https://gitlab.palpant.us/justin/palpantlab-transmission). In the most recent change I added a public portal to the web UI. Previously the service was only accessible to people with 1) Kubernetes cluster credentials and 2) VPN credentials (which was only me). Now the service is public, but protected with [pusher/oauth2_proxy](https://github.com/pusher/oauth2_proxy) and can only be accessed with a Google email that belongs to the appropriate group. That web UI is accessible at [https://transmission.palpant.us](https://transmission.palpant.us) (but of course the only thing the page shows is a "Sign in with Google" button).
After finding a bug with `pusher/oauth2_proxy` that doesn't correctly handle group membership in Google groups, I'm currently using my own fork of that repo, [justin/oauth2_proxy](https://gitlab.palpant.us/justin/oauth2_proxy) and attempting to PR the fix.
After finding a bug with `pusher/oauth2_proxy` handling of group membership in Google groups, I forked the project [here](https://gitlab.palpant.us/justin/oauth2_proxy), set up CI and builds, and implemented a change to support better group detection. That change has been PRed against the upstream and merged in [PR#224](https://github.com/pusher/oauth2_proxy/pull/224).
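The group-restricted setup described above can be sketched as oauth2_proxy container arguments; the group name, admin email, secret path, and upstream address below are illustrative, not the actual values:

```yaml
# Hypothetical container fragment for pusher/oauth2_proxy restricting
# access to members of a Google group. All values are illustrative.
containers:
  - name: oauth2-proxy
    image: quay.io/pusher/oauth2_proxy
    args:
      - --provider=google
      - --google-group=transmission-users@example.com
      - --google-admin-email=admin@example.com
      - --google-service-account-json=/secrets/service-account.json
      - --upstream=http://transmission-web:9091
      - --http-address=0.0.0.0:4180
```

Group lookups use the Google Admin SDK via the service account, with the admin email used for domain-wide delegation, which is why both flags are needed alongside the group itself.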
# Copyright and License
Copyright 2019 Justin Palpant
Subproject commit d89ff5dd3541448aee3c8af64d14bd484319f906
Subproject commit 960331364882e308e1980a2ee5fc8a63d6a81ed0