User Guide

Getting started with AWS

In this getting started guide, we walk through how to initialise Tarmak with a new Provider (AWS) and a new Environment, and then provision a Kubernetes cluster. The cluster comprises Kubernetes master and worker nodes, etcd clusters, Vault and a bastion node with a public IP address (see Architecture overview for details of cluster components).

Prerequisites

Initialise configuration

Run tarmak init to initialise configuration for the first time. You will be prompted for the configuration needed to set up a new Provider (AWS) and Environment. The list below describes the questions you will be asked.

Note

If you are not using Vault’s AWS secret backend, you can authenticate with AWS in the same way as the AWS CLI. More details can be found at Configuring the AWS CLI.

  • Configuring a new Provider
    • Provider name: must be unique
    • Cloud: Amazon (AWS) is the default and only option for now (more clouds to come)
    • Credentials: Amazon CLI auth (i.e. env variables/profile) or Vault (optional)
    • Name prefix: for state buckets and DynamoDB tables
    • Public DNS zone: will be created if it does not already exist; must be delegated from the root
  • Configuring a new Environment
    • Environment name: must be unique
    • Project name: used for AWS resource labels
    • Project administrator mail address
    • Cloud region: pick a region fetched from AWS (using Provider credentials)
  • Configuring new Cluster(s)
    • Single or multi-cluster environment
    • Cloud availability zone(s): pick zone(s) fetched from AWS

Once initialised, the configuration will be created at $HOME/.tarmak/tarmak.yaml (default).

Create an AMI

Next we create an AMI for this environment by running tarmak clusters images build (this is the step that requires Docker to be installed locally).

% tarmak clusters images build
<output omitted>

Create the cluster

To create the cluster, run tarmak clusters apply.

% tarmak clusters apply
<output omitted>

Warning

The first time this command is run, Tarmak will create a hosted zone and then fail with the following error.

* failed verifying delegation of public zone 5 times, make sure the zone k8s.jetstack.io is delegated to nameservers [ns-100.awsdns-12.com ns-1283.awsdns-32.org ns-1638.awsdns-12.co.uk ns-842.awsdns-41.net]

When creating a multi-cluster environment, the hub cluster must be applied first. To change the current cluster, use the --current-cluster flag. See tarmak cluster help for more information.

You should now change the nameservers of your domain to the four listed in the error. If you only wish to delegate a subdomain containing your zone to AWS without delegating the parent domain, see Creating a Subdomain That Uses Amazon Route 53 as the DNS Service without Migrating the Parent Domain.
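
Before re-running the apply, it can help to confirm that the delegation has propagated. The following is a minimal Python sketch and is not part of Tarmak: the function names are hypothetical, and the lookup assumes the dig utility is installed. It compares the NS records currently served for the zone against the nameservers listed in the error.

```python
import subprocess


def normalise(nameservers):
    """Lower-case names and drop trailing dots so the sets compare cleanly."""
    return {ns.strip().rstrip(".").lower() for ns in nameservers if ns.strip()}


def is_delegated(expected, actual):
    """True when the served NS records are exactly the expected set."""
    return normalise(expected) == normalise(actual)


def lookup_nameservers(zone):
    """Query the zone's NS records with dig (assumes dig is on the PATH)."""
    result = subprocess.run(
        ["dig", "+short", "NS", zone],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For example, is_delegated(expected, lookup_nameservers("k8s.jetstack.io")) should return True once the four nameservers from the error message are being served for the zone.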

To complete the cluster provisioning, run tarmak clusters apply once again.

Note

This process may take 30-60 minutes to complete. You can stop it by sending the signal SIGTERM or SIGINT (Ctrl-C) to the process. Tarmak will not exit immediately. It will wait for the currently running step to finish and then exit. You can complete the process by re-running the command.

Interacting with Kubernetes

Once a Kubernetes cluster has been provisioned and its state has converged, you can interact with it using kubectl as usual.

% tarmak cluster kubectl get all --all-namespaces

A kubeconfig file can also be generated; the command outputs its file path.

% tarmak cluster kubeconfig

This command's output can also be evaluated by the shell to set KUBECONFIG, which allows conveniences like the following.

% eval $(tarmak cluster kubeconfig)
% kubectl get nodes
% helm install

Note

Both kubectl and kubeconfig support targeting public and private Kubernetes API endpoints according to how the cluster has been configured in the tarmak yaml configuration. This can be overridden through use of a global flag.

% tarmak cluster --public-api-endpoint=false kubectl

Destroy the cluster

To destroy the cluster, run tarmak clusters destroy.

% tarmak clusters destroy
<output omitted>

Note

This process may take 30-60 minutes to complete. You can stop it by sending the signal SIGTERM or SIGINT (Ctrl-C) to the process. Tarmak will not exit immediately. It will wait for the currently running step to finish and then exit. You can complete the process by re-running the command.

Configuration Options

After generating your tarmak.yaml configuration file there are a number of options you can set that are not exposed via tarmak init.

Note

Kubernetes resources created by Tarmak have their lifecycle managed by Addon-manager and as such will be kept synchronised according to the source manifests found in /etc/kubernetes/apply on all master instances. The Addon-manager will watch for these resources with the addonmanager.kubernetes.io/mode label. Information on how each mode on the label behaves can be found in the Kubernetes documentation.

Pod Security Policy

Note: For cluster versions greater than 1.8.0 this is applied by default. For cluster versions before 1.6.0 it is not applied.

To enable Pod Security Policy for an environment, include the following in the configuration file under the Kubernetes field of that environment:

kubernetes:
    podSecurityPolicy:
        enabled: true

(By default, the Tarmak configuration file is stored at $HOME/.tarmak/tarmak.yaml.)

The PodSecurityPolicy manifests - also listed below - can be found in the puppet/modules/kubernetes/templates/ directory.

Namespaces

By default, the restricted Pod Security Policy is applied to all namespaces except the kube-system and monitoring namespaces.

It is possible to allow other namespaces to use the privileged Pod Security Policy. This can be done by deploying an extra RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default:privileged
  namespace: example
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:privileged
subjects:
- kind: Group
  name: system:serviceaccounts:example
  apiGroup: rbac.authorization.k8s.io

Cluster Autoscaler

Tarmak supports deploying Cluster Autoscaler when spinning up a Kubernetes cluster to autoscale worker instance pools. The following tarmak.yaml snippet shows how you would enable Cluster Autoscaler.

kubernetes:
  clusterAutoscaler:
    enabled: true
...

The above configuration would deploy Cluster Autoscaler with an image of gcr.io/google_containers/cluster-autoscaler using the recommended version based on the version of your Kubernetes cluster. The configuration block accepts three optional fields, image, version and scaleDownUtilizationThreshold, allowing you to change these defaults. Note that the final image tag used when deploying Cluster Autoscaler will be the configured version prepended with the letter v.

The current implementation will configure the first instance pool of type worker in your cluster configuration to scale between minCount and maxCount. We plan to add support for an arbitrary number of worker instance pools.

Overprovisioning

Tarmak supports overprovisioning to give a fixed or proportional amount of headroom in the cluster. The technique used to implement overprovisioning is the same as described in the cluster autoscaler documentation. The following tarmak.yaml snippet shows how to configure fixed overprovisioning. Note that cluster autoscaling must also be enabled.

kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      replicaCount: 10
...

This will deploy 10 pause Pods with a negative PriorityClass so that they will be preempted by any other pending Pods. Each Pod will request the specified number of millicores and megabytes. The following tarmak.yaml snippet shows how to configure proportional overprovisioning.

kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      nodesPerReplica: 1
      coresPerReplica: 4
...

The nodesPerReplica and coresPerReplica configuration parameters are described in the cluster-proportional-autoscaler documentation.
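
To make the headroom arithmetic concrete, here is a small Python sketch. It assumes the proportional case follows the linear mode of cluster-proportional-autoscaler, where the replica count is the larger of ceil(cores / coresPerReplica) and ceil(nodes / nodesPerReplica), and it ignores any minimum or maximum bounds. The function names and the example cluster size are illustrative only.

```python
import math


def proportional_replicas(cores, nodes, cores_per_replica, nodes_per_replica):
    """Replica count per the autoscaler's linear mode (bounds omitted)."""
    return max(
        math.ceil(cores / cores_per_replica),
        math.ceil(nodes / nodes_per_replica),
    )


def headroom(replicas, millicores_per_replica, megabytes_per_replica):
    """Total resources requested by the placeholder pause Pods."""
    return {
        "millicores": replicas * millicores_per_replica,
        "megabytes": replicas * megabytes_per_replica,
    }


# Fixed overprovisioning from the first snippet: 10 replicas.
fixed = headroom(10, 100, 100)

# Proportional overprovisioning for a hypothetical cluster of
# 5 nodes with 4 cores each (20 cores in total).
replicas = proportional_replicas(cores=20, nodes=5,
                                 cores_per_replica=4, nodes_per_replica=1)
proportional = headroom(replicas, 100, 100)
```

With these inputs, the fixed configuration reserves 1000 millicores and 1000 megabytes, while the proportional configuration yields 5 replicas, that is 500 millicores and 500 megabytes.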

The image and version used by the cluster-proportional-autoscaler can also be specified using the image and version fields of the overprovisioning block. These values default to k8s.gcr.io/cluster-proportional-autoscaler-amd64 and 1.1.2 respectively.

Logging

Each Kubernetes cluster can be configured with a number of logging sinks. The only sink currently supported is Elasticsearch. An example configuration is shown below:

apiVersion: api.tarmak.io/v1alpha1
kind: Config
clusters:
- loggingSinks:
  - types:
    - application
    - platform
    elasticSearch:
      host: example.amazonaws.com
      port: 443
      logstashPrefix: test
      tls: true
      tlsVerify: false
      httpBasicAuth:
        username: administrator
        password: mypassword
  - types:
    - all
    elasticSearch:
      host: example2.amazonaws.com
      port: 443
      tls: true
      amazonESProxy:
        port: 9200
...

A full list of the configuration parameters is shown below:

  • General configuration parameters

    • types - the types of logs to ship. The accepted values are:

      • platform (kernel, systemd and platform namespace logs)
      • application (all other namespaces)
      • audit (apiserver audit logs)
      • all
  • Elasticsearch configuration parameters
    • host - IP address or hostname of the target Elasticsearch instance

    • port - TCP port of the target Elasticsearch instance

    • logstashPrefix - Shipped logs are in a Logstash compatible format. This field specifies the Logstash index prefix

    • tls - enable or disable TLS support

    • tlsVerify - force certificate validation (only valid when not using the AWS ES Proxy)

    • tlsCA - Custom CA certificate for Elasticsearch instance (only valid when not using the AWS ES Proxy)

    • httpBasicAuth - configure basic auth (only valid when not using the AWS ES Proxy)

      • username
      • password
    • amazonESProxy - configure AWS ES Proxy

      • port - Port to listen on (a free port will be chosen for you if omitted)

Setting up an AWS hosted Elasticsearch Cluster

AWS provides a hosted Elasticsearch cluster that can be used for log aggregation. This snippet will set up an Elasticsearch domain in your account and create a policy along with it that allows shipping of logs into the cluster:

variable "name" {
  default = "tarmak-logs"
}

variable "region" {
  default = "eu-west-1"
}

provider "aws" {
  region = "${var.region}"
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "es" {
  statement {
    actions = [
      "es:*",
    ]

    principals {
      type = "AWS"

      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
      ]
    }
  }
}

resource "aws_elasticsearch_domain" "es" {
  domain_name           = "${var.name}"
  elasticsearch_version = "6.2"

  cluster_config {
    instance_type = "t2.medium.elasticsearch"
  }

  ebs_options {
    ebs_enabled = true
    volume_type = "gp2"
    volume_size = 30
  }

  access_policies = "${data.aws_iam_policy_document.es.json}"
}

data "aws_iam_policy_document" "es_shipping" {
  statement {
    actions = [
      "es:ESHttpHead",
      "es:ESHttpPost",
      "es:ESHttpGet",
    ]

    resources = [
      "arn:aws:es:${var.region}:${data.aws_caller_identity.current.account_id}:domain/${var.name}/*",
    ]
  }
}

resource "aws_iam_policy" "es_shipping" {
  name        = "${var.name}-shipping"
  description = "Allows shipping of logs to elasticsearch"

  policy = "${data.aws_iam_policy_document.es_shipping.json}"
}

output "elasticsearch_endpoint" {
  value = "${aws_elasticsearch_domain.es.endpoint}"
}

output "elasticsearch_shipping_policy_arn" {
  value = "${aws_iam_policy.es_shipping.arn}"
}

Once Terraform has run successfully, it will output the resulting AWS Elasticsearch endpoint and the ARN of the policy that allows shipping to it:

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

elasticsearch_endpoint = search-tarmak-logs-xyz.eu-west-1.es.amazonaws.com
elasticsearch_shipping_policy_arn = arn:aws:iam::1234:policy/tarmak-logs-shipping

Both of those outputs can then be used in the tarmak configuration:

apiVersion: api.tarmak.io/v1alpha1
clusters:
- name: cluster
  loggingSinks:
  - types: ["all"]
    elasticSearch:
      host: ${elasticsearch_endpoint}
      tls: true
      amazonESProxy: {}
  amazon:
    additionalIAMPolicies:
    - ${elasticsearch_shipping_policy_arn}
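
The ${...} placeholders above stand for the Terraform output values. As a hedged sketch of one way to fill them in, the Python function below renders the fragment from the text printed by terraform output -json, assuming the usual shape where each output is an object with a value key. The function name is hypothetical and this is not part of Tarmak.

```python
import json


def logging_fragment(terraform_output_json):
    """Render the cluster fragment from `terraform output -json` text."""
    outputs = json.loads(terraform_output_json)
    endpoint = outputs["elasticsearch_endpoint"]["value"]
    policy_arn = outputs["elasticsearch_shipping_policy_arn"]["value"]
    return "\n".join([
        "clusters:",
        "- name: cluster",
        "  loggingSinks:",
        '  - types: ["all"]',
        "    elasticSearch:",
        f"      host: {endpoint}",
        "      tls: true",
        "      amazonESProxy: {}",
        "  amazon:",
        "    additionalIAMPolicies:",
        f"    - {policy_arn}",
    ])
```

The result is meant to be merged by hand into the matching cluster entry of tarmak.yaml rather than written over the file.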

Configuring Index Templates

Fluentbit will publish into a new index every day. To optimise these indices for logging, it is beneficial to adapt some settings through index templates. We suggest at least considering raising the field limit from the default of 1000. Also, depending on the size of the Elasticsearch installation, the number of shards and replicas should be adapted. The following example contains suggested settings for a single-node setup:

{
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 0,
    "mapping": {
      "total_fields": {
        "limit": "10000"
      }
    }
  }
}

Save the template as settings.json, then apply it to Elasticsearch:

curl -v -XPOST 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d @settings.json

EBS Encryption

AWS offers encrypted EBS (Elastic Block Store) volumes; however, Tarmak does not enable EBS encryption by default. When enabled, building the image and applying a Tarmak cluster will take considerably longer.

The following tarmak.yaml snippet shows how to enable encrypted EBS.

clusters:
- amazon:
    ebsEncrypted: true
...

OIDC Authentication

Tarmak supports authentication using OIDC. The following snippet demonstrates how you would configure OIDC authentication in tarmak.yaml. For details on the configuration options, visit the Kubernetes documentation here. Note that if the version of your cluster is less than 1.10.0, the signingAlgs parameter is ignored.

kubernetes:
    apiServer:
        oidc:
            clientID: 1a2b3c4d5e6f7g8h
            groupsClaim: groups
            groupsPrefix: "oidc:"
            issuerURL: https://domain/application-server
            signingAlgs:
            - RS256
            usernameClaim: preferred_username
            usernamePrefix: "oidc:"
...

For the above setup, ID tokens presented to the apiserver will need to contain claims called preferred_username and groups representing the username and groups associated with the client. These values will then be prepended with oidc: before authorisation rules are applied, so it is important that this is taken into account when configuring cluster authorisation.
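
The effect of the two prefixes can be sketched in a few lines of Python. This mirrors the behaviour described above rather than any Tarmak code, and the claim values are made up for illustration.

```python
def kubernetes_identity(claims, username_claim="preferred_username",
                        groups_claim="groups", username_prefix="oidc:",
                        groups_prefix="oidc:"):
    """Return the (username, groups) that authorisation rules will see."""
    username = username_prefix + claims[username_claim]
    groups = [groups_prefix + group for group in claims.get(groups_claim, [])]
    return username, groups


# Hypothetical ID token claims, for illustration only.
claims = {"preferred_username": "jane", "groups": ["developers", "ops"]}
username, groups = kubernetes_identity(claims)
```

Here username is oidc:jane and groups is oidc:developers and oidc:ops, so RBAC subjects in RoleBindings and ClusterRoleBindings need to use the prefixed names.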

Jenkins

You can install Jenkins as part of your hub by adding an extra instance pool to it. This instance pool can be extended with the annotation tarmak.io/jenkins-certificate-arn, whose value is the ARN of an Amazon certificate. When you set this annotation, Jenkins is secured with HTTPS. Make sure the SSL certificate is valid for jenkins.<environment>.<zone>.

- image: centos-puppet-agent
  maxCount: 1
  metadata:
    annotations:
      tarmak.io/jenkins-certificate-arn: "arn:aws:acm:eu-west-1:228615251467:certificate/81e0c595-f5ad-40b2-8062-683b215bedcf"
    creationTimestamp: null
    name: jenkins
  minCount: 1
  size: large
  type: jenkins
  volumes:
  - metadata:
      creationTimestamp: null
      name: root
    size: 16Gi
    type: ssd
  - metadata:
      creationTimestamp: null
      name: data
    size: 16Gi
    type: ssd
...

Dashboard

Tarmak supports deploying Kubernetes Dashboard when spinning up a Kubernetes cluster. The following tarmak.yaml snippet shows how you would enable Kubernetes Dashboard.

kubernetes:
  dashboard:
    enabled: true
...

The above configuration would deploy Kubernetes Dashboard with an image of gcr.io/google_containers/kubernetes-dashboard-amd64 using the recommended version based on the version of your Kubernetes cluster. The configuration block accepts two optional fields, image and version, allowing you to change these defaults. Note that the final image tag used when deploying Kubernetes Dashboard will be the configured version prepended with the letter v.

Warning

Before Dashboard version 1.7, when RBAC is enabled (from Kubernetes version 1.6) cluster-wide cluster-admin privileges are granted to Dashboard. From Dashboard version 1.7, only minimal privileges are granted that allow Dashboard to work. See Dashboard’s access control documentation for more details.

Tiller

Tarmak supports deploying Tiller, the server-side component of Helm, when spinning up a Kubernetes cluster. Tiller is configured to listen on localhost only which prevents arbitrary Pods in the cluster connecting to its unauthenticated endpoint. Helm clients can still talk to Tiller by port forwarding through the Kubernetes API Server. The following tarmak.yaml snippet shows how you would enable Tiller.

kubernetes:
  tiller:
    enabled: true
    image: gcr.io/kubernetes-helm/tiller
    version: 2.9.1
...

The above configuration would deploy version 2.9.1 of Tiller with an image of gcr.io/kubernetes-helm/tiller. The configuration block accepts two optional fields of image and version allowing you to change these defaults. Note that the final image tag used when deploying Tiller will be the configured version prepended with the letter v. The version is particularly important when deploying Tiller since its minor version must match the minor version of any Helm clients.
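
The two rules in this paragraph, the v-prefixed image tag and the matching minor version, can be sketched as follows. The helper names are hypothetical and the comparison is a simple illustration, not the check Helm itself performs.

```python
def image_reference(image, version):
    """Image reference whose tag is the version prepended with 'v'."""
    return f"{image}:v{version}"


def minor_versions_match(tiller_version, helm_version):
    """Compare major.minor of two versions, ignoring any leading 'v'."""
    def major_minor(version):
        parts = version.lstrip("v").split(".")
        return parts[0], parts[1]

    return major_minor(tiller_version) == major_minor(helm_version)
```

For the configuration above, image_reference("gcr.io/kubernetes-helm/tiller", "2.9.1") gives gcr.io/kubernetes-helm/tiller:v2.9.1, and a Helm client at v2.9.0 would match while one at 2.10.0 would not.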

Warning

Tiller is deployed with the cluster-admin ClusterRole bound to its service account and therefore has far-reaching privileges. Helm’s security best practices should also be considered.

Prometheus

By default Tarmak will deploy a Prometheus installation and some exporters into the monitoring namespace.

This can be completely disabled with the following cluster configuration:

kubernetes:
  prometheus:
    enabled: false

Another possibility would be to use the Tarmak provisioned Prometheus only for scraping exporters on instances that are not part of the Kubernetes cluster. Using federation, those metrics could then be integrated into an existing Prometheus deployment.

To have Prometheus only monitor nodes external to the cluster, use the following configuration instead:

kubernetes:
  prometheus:
    enabled: true
    mode: ExternalScrapeTargetsOnly

Finally, you may wish to have Tarmak only install the exporters on the external nodes. If this is your desired configuration, then set the following mode in the yaml:

kubernetes:
  prometheus:
    enabled: true
    mode: ExternalExportersOnly

API Server

It is possible to let Tarmak create a public endpoint for your API server. This can be used together with Secure public endpoints.

kubernetes:
  apiServer:
    public: true

Secure public endpoints

Public endpoints (Jenkins, bastion host and, if enabled, the API server) can be secured by limiting access to a list of CIDR blocks. This can be configured at the environment level for all public endpoints and, if desired, overridden for a specific public endpoint.

Environment level

This can be done by adding an adminCIDRs list to an environments block. If nothing has been set, the default is 0.0.0.0/0:

environments:
- contact: hello@example.com
  location: eu-west-1
  metadata:
    name: example
  privateZone: example.local
  project: example-project
  provider: aws
  adminCIDRs:
  - x.x.x.x/32
  - y.y.y.y/24

Jenkins and bastion host

The environment-level setting can be overridden for Jenkins and the bastion host by adding allowCIDRs in the instance pool block:

instancePools:
- image: centos-puppet-agent
  allowCIDRs:
  - x.x.x.x/32
  maxCount: 1
  metadata:
    name: jenkins
  minCount: 1
  size: large
  type: jenkins

API Server

For the API server, you can override the environment-level setting by adding allowCIDRs to the kubernetes block.

Warning

For this to work, you must first make your API server public.

kubernetes:
  apiServer:
      public: true
      allowCIDRs:
      - y.y.y.y/24
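
The precedence between the two levels can be illustrated with a short Python sketch. It assumes an endpoint-level allowCIDRs list replaces the environment-level adminCIDRs list rather than extending it, and the function names and addresses are illustrative only.

```python
import ipaddress


def effective_cidrs(admin_cidrs=None, allow_cidrs=None):
    """Endpoint-level allowCIDRs win; otherwise fall back to adminCIDRs,
    and to 0.0.0.0/0 when neither is set."""
    return allow_cidrs or admin_cidrs or ["0.0.0.0/0"]


def is_allowed(client_ip, cidrs):
    """True if client_ip falls inside any of the CIDR blocks."""
    address = ipaddress.ip_address(client_ip)
    # strict=False tolerates blocks written with host bits set.
    return any(
        address in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs
    )


# Documentation-range addresses stand in for x.x.x.x/32 and y.y.y.y/24.
environment = ["203.0.113.10/32", "198.51.100.0/24"]
api_server = effective_cidrs(environment, ["198.51.100.0/24"])
```

With these values, a client at 198.51.100.42 can reach the API server, while 203.0.113.10 can reach only the endpoints that still use the environment-level list.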

API Server Admission Plugins

API admission control plugins can be enabled and disabled through the Tarmak configuration file. Information on admission plugins can be found here. Enabled and disabled plugins can be configured as follows:

kubernetes:
  apiServer:
    enableAdmissionControllers:
       - "DefaultStorageClass"
       - "DefaultTolerationSeconds"
    disableAdmissionControllers:
       - "MutatingAdmissionWebhook"

Note

Disabling admission control plugins is only available with Kubernetes version 1.11+.

Unless one or more enabled admission controllers have been defined, the following defaults will be applied in this order:

Admission Controller Plugin   Minimum Requirement
Initializers                  v1.8.x
NamespaceLifecycle            none
LimitRanger                   none
ServiceAccount                none
DefaultStorageClass           v1.4.x
DefaultTolerationSeconds      v1.6.x
MutatingAdmissionWebhook      v1.9.x
ValidatingAdmissionWebhook    v1.9.x
ResourceQuota                 none
PodSecurityPolicy             Pod Security Policy Enabled
NodeRestriction               v1.8.x
Priority                      Pod Priority Enabled
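
As a rough illustration of how the table reads, the Python sketch below selects the default plugins for a given cluster version. It assumes a minimum requirement of v1.8.x means Kubernetes 1.8 or later, and the feature names pod_security_policy and pod_priority are hypothetical labels for the two non-version requirements. Tarmak's actual selection logic may differ.

```python
# (plugin, minimum (major, minor) version or None, required feature or None)
DEFAULTS = [
    ("Initializers", (1, 8), None),
    ("NamespaceLifecycle", None, None),
    ("LimitRanger", None, None),
    ("ServiceAccount", None, None),
    ("DefaultStorageClass", (1, 4), None),
    ("DefaultTolerationSeconds", (1, 6), None),
    ("MutatingAdmissionWebhook", (1, 9), None),
    ("ValidatingAdmissionWebhook", (1, 9), None),
    ("ResourceQuota", None, None),
    ("PodSecurityPolicy", None, "pod_security_policy"),
    ("NodeRestriction", (1, 8), None),
    ("Priority", None, "pod_priority"),
]


def default_admission_plugins(version, enabled_features=()):
    """Ordered defaults for a (major, minor) version and enabled features."""
    return [
        name for name, minimum, feature in DEFAULTS
        if (minimum is None or version >= minimum)
        and (feature is None or feature in enabled_features)
    ]
```

For example, default_admission_plugins((1, 7)) keeps NamespaceLifecycle, LimitRanger, ServiceAccount, DefaultStorageClass, DefaultTolerationSeconds and ResourceQuota, in that order.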

Additional IAM policies

Additional IAM policies can be attached by adding their ARNs to the tarmak.yaml config. You can add them to the cluster and instance pool blocks. When additional IAM policies are defined at both levels, they are merged when applied to a specific instance pool.

Cluster

You can add additional IAM policies that will be added to all the instance pools of the whole cluster.

apiVersion: api.tarmak.io/v1alpha1
clusters:
- amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"

Instance pool

It is possible to add extra policies to only a specific instance pool.

- image: centos-puppet-agent
  amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: medium
  subnets:
  - metadata:
    zone: eu-west-1a
  - metadata:
    zone: eu-west-1b
  - metadata:
    zone: eu-west-1c
  type: worker

Node Taints & Labels

You might have added additional instance pools for a specific workload. In these cases it might be useful to label and/or taint the nodes in this instance pool.

You add labels and taints in the tarmak yaml like this:

- image: centos-puppet-agent
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: medium
  type: worker
  labels:
  - key: "ssd"
    value: "true"
  taints:
  - key: "gpu"
    value: "gtx1170"
    effect: "NoSchedule"

Note, these are only applied when the node is first registered. Changes to these values will not remove taints and labels from nodes that are already registered.
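
Pods can only be scheduled onto a tainted node if they carry a matching toleration. The Python sketch below is a simplified model of the Kubernetes matching rule for a single toleration and taint. It is for illustration only and the function name is hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key with Exists matches every taint key.
        return toleration.get("key") in (None, "", taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


# The taint from the instance pool above.
gpu_taint = {"key": "gpu", "value": "gtx1170", "effect": "NoSchedule"}
```

A Pod with the toleration key gpu, operator Equal, value gtx1170 and effect NoSchedule would be allowed onto these nodes, as would one using operator Exists on the gpu key.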

API Server ELB Access Logs

Tarmak can store access logs of the internal and, if enabled, public API server ELBs. This is done by enabling the relevant configuration options in tarmak.yaml. You must specify at least the S3 bucket name; you can optionally also specify the bucket prefix and an interval of 5 or 60 minutes. The interval defaults to 5 minutes.

kubernetes:
  apiServer:
    public: true
    amazon:
      internalELBAccessLogs:
        bucket: cluster-internal-accesslogs
      publicELBAccessLogs:
        bucket: cluster-public-accesslogs

Note that the S3 bucket needs to exist in the same region, with the correct S3 policy permissions. Information on how to correctly set these permissions can be found here.

Instance store

Certain AWS instance types support instance store. It is possible to configure instance store for containers with Tarmak by specifying an instance type that has instance store capabilities (e.g. c5d.xlarge) in the size parameter of your instance pool. Also make sure you don’t define any volumes with the name docker.

instancePools:
- image: centos-puppet-agent
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: c5d.xlarge
  subnets:
  - metadata:
      creationTimestamp: null
    zone: eu-west-1a
  - metadata:
      creationTimestamp: null
    zone: eu-west-1b
  - metadata:
      creationTimestamp: null
    zone: eu-west-1c
  type: worker
  volumes:
  - metadata:
      name: root
    size: 16Gi
    type: ssd

Configuration of vault-helper

It is possible to configure the download URL of vault-helper to something other than the default GitHub location. To make this cluster-wide change, add the custom URL to the url attribute under the vaultHelper key of the cluster:

clusters:
- environment: env
  vaultHelper:
    url: https://example.com/custom_vault-helper_location

Feature Gates

Feature gates can be enabled or disabled on Kubernetes components through the Tarmak configuration file. They can be set on the API server, Kubelet, Scheduler, Controller Manager and Kube-Proxy, and take effect cluster-wide. Feature gates can also be set globally for all components; note, however, that a feature gate set on a component overrides the global value of the same gate on that component. Each component takes a set of feature gates that is applied to its corresponding command line flags or configuration file. Set these under the kubernetes block as follows:

kubernetes:
  globalFeatureGates:
    AllAlpha: true
  apiServer:
    featureGates:
      APIResponseCompression: false
  kubelet:
    featureGates:
      CustomPodDNS: true
      CPUManager: false
  kubeProxy:
    featureGates:
      CSIPersistentVolume: false
      DebugContainers: true
  controllerManager:
    featureGates:
      AttachVolumeLimit: true
  scheduler:
    featureGates:
      CPUManager: false
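
The override rule can be sketched in Python as a simple merge in which component values win. The rendering uses the --feature-gates=Key=bool,... format of the Kubernetes command line flag; how Tarmak itself renders the values (flag or configuration file) is not shown here, and the function names are hypothetical.

```python
def effective_feature_gates(global_gates, component_gates):
    """Component-level gates override global gates with the same name."""
    return {**global_gates, **component_gates}


def feature_gates_flag(gates):
    """Render gates in the --feature-gates=Key=bool,... flag format."""
    pairs = (
        f"{name}={str(value).lower()}" for name, value in sorted(gates.items())
    )
    return "--feature-gates=" + ",".join(pairs)


# Kubelet gates from the configuration above, merged with the global gates.
kubelet = effective_feature_gates(
    {"AllAlpha": True},
    {"CustomPodDNS": True, "CPUManager": False},
)
flag = feature_gates_flag(kubelet)
```

Here flag is --feature-gates=AllAlpha=true,CPUManager=false,CustomPodDNS=true.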

Calico Backend

By default Tarmak will deploy Calico into your Kubernetes cluster, configured to use etcd as the backend. Calico also supports using the Kubernetes API server instead, which can be configured by changing the Calico option in the Tarmak config like the following:

kubernetes:
 calico:
   backend: kubernetes
   enableTypha: true
   typhaReplicas: 1

This change will take effect cluster wide.

Calico also supports using Typha, a purpose-built fan-out daemon that reduces load on the targeted data store. More information can be found on its project page.

Note

Typha should typically only be enabled when your Kubernetes node count exceeds 50.

Enabling Typha and setting the number of replicas are both shown above.

Cluster Services

Deploying Grafana, InfluxDB and Heapster can be toggled in the Tarmak configuration as follows:

kubernetes:
  grafana:
    enabled: true
  heapster:
    enabled: false
  influxDB:
    enabled: true

Grafana

Grafana is deployed as part of Tarmak. You can access Grafana through a Kubernetes cluster service. Follow these steps to access Grafana:

  1. Create a proxy
tarmak kubectl proxy
  2. In the browser go to
http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/