User Guide¶
Getting started with AWS¶
In this getting started guide, we walk through how to initialise Tarmak with a new Provider (AWS) and a new Environment, and then provision a Kubernetes cluster. This will comprise Kubernetes master and worker nodes, etcd clusters, Vault and a bastion node with a public IP address (see Architecture overview for details of cluster components).
Prerequisites¶
- Docker
- An AWS account that has accepted the CentOS licence terms
- A public DNS zone that can be delegated to AWS Route 53
- Optional: Vault with the AWS secret backend configured
Overview of steps to follow¶
Initialise configuration¶
Simply run tarmak init to initialise configuration for the first time. You will be prompted for the necessary configuration to set up a new Provider (AWS) and Environment. The list below describes the questions you will be asked.
Note
If you are not using Vault’s AWS secret backend, you can authenticate with AWS in the same way as the AWS CLI. More details can be found at Configuring the AWS CLI.
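For example, assuming you are using the standard AWS CLI environment variables (the values below are placeholders), credentials could be exported before running Tarmak:
% export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
% export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% export AWS_DEFAULT_REGION=eu-west-1
Alternatively, a named profile from ~/.aws/credentials can be selected by setting AWS_PROFILE.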
- Configuring a new Provider
- Provider name: must be unique
- Cloud: Amazon (AWS) is the default and only option for now (more clouds to come)
- Credentials: Amazon CLI auth (i.e. env variables/profile) or Vault (optional)
- Name prefix: for state buckets and DynamoDB tables
- Public DNS zone: will be created if not already existing, must be delegated from the root
- Configuring a new Environment
- Environment name: must be unique
- Project name: used for AWS resource labels
- Project administrator email address
- Cloud region: pick a region fetched from AWS (using Provider credentials)
- Configuring new Cluster(s)
- Single or multi-cluster environment
- Cloud availability zone(s): pick zone(s) fetched from AWS
Once initialised, the configuration will be created at $HOME/.tarmak/tarmak.yaml (default).
Create an AMI¶
Next we create an AMI for this environment by running tarmak clusters images build (this is the step that requires Docker to be installed locally).
% tarmak clusters images build
<output omitted>
Create the cluster¶
To create the cluster, run tarmak clusters apply.
% tarmak clusters apply
<output omitted>
Warning
The first time this command is run, Tarmak will create a hosted zone and then fail with the following error.
* failed verifying delegation of public zone 5 times, make sure the zone k8s.jetstack.io is delegated to nameservers [ns-100.awsdns-12.com ns-1283.awsdns-32.org ns-1638.awsdns-12.co.uk ns-842.awsdns-41.net]
When creating a multi-cluster environment, the hub cluster must be applied first. To change the current cluster, use the flag --current-cluster. See tarmak cluster help for more information.
You should now change the nameservers of your domain to the four listed in the error. If you only wish to delegate a subdomain containing your zone to AWS without delegating the parent domain see Creating a Subdomain That Uses Amazon Route 53 as the DNS Service without Migrating the Parent Domain.
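Once the nameserver change has propagated, the delegation can be checked with dig; the zone and nameservers below are the ones from the example error above:
% dig +short NS k8s.jetstack.io
ns-100.awsdns-12.com.
ns-1283.awsdns-32.org.
ns-1638.awsdns-12.co.uk.
ns-842.awsdns-41.net.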
To complete the cluster provisioning, run tarmak clusters apply once again.
Note
This process may take 30-60 minutes to complete. You can stop it by sending the signal SIGTERM or SIGINT (Ctrl-C) to the process. Tarmak will not exit immediately. It will wait for the currently running step to finish and then exit. You can complete the process by re-running the command.
Interacting with Kubernetes¶
Once a Kubernetes cluster has been provisioned and its state has converged, it can be interacted with using kubectl as usual.
% tarmak cluster kubectl get all --all-namespaces
A Kubeconfig file can also be generated, with its file path output.
% tarmak cluster kubeconfig
This command also supports environment evaluation to write the Kubeconfig path to KUBECONFIG, enabling conveniences like the following.
% eval $(tarmak cluster kubeconfig)
% kubectl get nodes
% helm install
Note
Both kubectl and kubeconfig support targeting public and private Kubernetes API endpoints according to how the cluster has been configured in the tarmak yaml configuration. This can be overridden through use of a global flag.
% tarmak cluster --public-api-endpoint=false kubectl
Destroy the cluster¶
To destroy the cluster, run tarmak clusters destroy.
% tarmak clusters destroy
<output omitted>
Note
This process may take 30-60 minutes to complete.
You can stop it by sending the signal SIGTERM
or SIGINT
(Ctrl-C) to the process.
Tarmak will not exit immediately.
It will wait for the currently running step to finish and then exit.
You can complete the process by re-running the command.
Configuration Options¶
After generating your tarmak.yaml configuration file there are a number of options you can set that are not exposed via tarmak init.
Note
Kubernetes resources created by Tarmak have their lifecycle managed by
Addon-manager
and as such will be kept synchronised according to the
source manifests found in /etc/kubernetes/apply
on all master instances.
The Addon-manager will watch for these resources with the
addonmanager.kubernetes.io/mode
label. Information on how each mode on
the label behaves can be found in the Kubernetes documentation.
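As an illustration only, a manifest managed this way carries that label in its metadata; the ConfigMap name here is hypothetical:
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-addon
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  example.key: example-value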
Pod Security Policy¶
Note: For cluster versions greater than 1.8.0 this is applied by default. For cluster versions before 1.6.0 it is not applied.
To enable Pod Security Policy for an environment, include the following in the configuration file under the Kubernetes field of that environment:
kubernetes:
  podSecurityPolicy:
    enabled: true
(By default, the Tarmak configuration file is stored at $HOME/.tarmak/tarmak.yaml).
The PodSecurityPolicy manifests - also listed below - can be found in the puppet/modules/kubernetes/templates/ directory.
Namespaces¶
By default the restricted Pod Security Policy is applied to all namespaces except the kube-system and monitoring namespaces.
It is possible to allow other namespaces to use the privileged Pod Security Policy. This can be done by deploying an extra RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default:privileged
  namespace: example
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:privileged
subjects:
- kind: Group
  name: system:serviceaccounts:example
  apiGroup: rbac.authorization.k8s.io
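To check the effect of such a binding, you could ask RBAC whether a service account in the example namespace may use the privileged policy; this is a sketch assuming the default service account:
% kubectl auth can-i use podsecuritypolicy/privileged \
    --as=system:serviceaccount:example:default -n example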
Cluster Autoscaler¶
Tarmak supports deploying Cluster Autoscaler when spinning up a Kubernetes cluster to autoscale worker instance pools. The following tarmak.yaml snippet shows how you would enable Cluster Autoscaler.
kubernetes:
  clusterAutoscaler:
    enabled: true
...
The above configuration would deploy Cluster Autoscaler with an image of gcr.io/google_containers/cluster-autoscaler using the recommended version based on the version of your Kubernetes cluster. The configuration block accepts three optional fields of image, version and scaleDownUtilizationThreshold allowing you to change these defaults. Note that the final image tag used when deploying Cluster Autoscaler will be the configured version prepended with the letter v.
The current implementation will configure the first instance pool of type worker in your cluster configuration to scale between minCount and maxCount. We plan to add support for an arbitrary number of worker instance pools.
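For example, a worker instance pool configured like the following (illustrative values) would be scaled between 3 and 10 instances:
instancePools:
- image: centos-puppet-agent
  maxCount: 10
  metadata:
    name: worker
  minCount: 3
  size: medium
  type: worker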
Overprovisioning¶
Tarmak supports overprovisioning to give a fixed or proportional amount of headroom in the cluster. The technique used to implement overprovisioning is the same as described in the cluster autoscaler documentation. The following tarmak.yaml snippet shows how to configure fixed overprovisioning. Note that cluster autoscaling must also be enabled.
kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      replicaCount: 10
...
This will deploy 10 pause Pods with a negative PriorityClass so that they will be preempted by any other pending Pods. Each Pod will request the specified number of millicores and megabytes. The following tarmak.yaml snippet shows how to configure proportional overprovisioning.
kubernetes:
  clusterAutoscaler:
    enabled: true
    overprovisioning:
      enabled: true
      reservedMillicoresPerReplica: 100
      reservedMegabytesPerReplica: 100
      nodesPerReplica: 1
      coresPerReplica: 4
...
The nodesPerReplica and coresPerReplica configuration parameters are described in the cluster-proportional-autoscaler documentation.
The image and version used by the cluster-proportional-autoscaler can also be specified using the image and version fields of the overprovisioning block. These values default to k8s.gcr.io/cluster-proportional-autoscaler-amd64 and 1.1.2 respectively.
Logging¶
Each Kubernetes cluster can be configured with a number of logging sinks. The only sink currently supported is Elasticsearch. An example configuration is shown below:
apiVersion: api.tarmak.io/v1alpha1
kind: Config
clusters:
- loggingSinks:
  - types:
    - application
    - platform
    elasticSearch:
      host: example.amazonaws.com
      port: 443
      logstashPrefix: test
      tls: true
      tlsVerify: false
      httpBasicAuth:
        username: administrator
        password: mypassword
  - types:
    - all
    elasticSearch:
      host: example2.amazonaws.com
      port: 443
      tls: true
      amazonESProxy:
        port: 9200
...
A full list of the configuration parameters is shown below:
General configuration parameters:
- types - the types of logs to ship. The accepted values are:
  - platform (kernel, systemd and platform namespace logs)
  - application (all other namespaces)
  - audit (apiserver audit logs)
  - all
Elasticsearch configuration parameters:
- host - IP address or hostname of the target Elasticsearch instance
- port - TCP port of the target Elasticsearch instance
- logstashPrefix - shipped logs are in a Logstash compatible format; this field specifies the Logstash index prefix
- tls - enable or disable TLS support
- tlsVerify - force certificate validation (only valid when not using the AWS ES Proxy)
- tlsCA - custom CA certificate for the Elasticsearch instance (only valid when not using the AWS ES Proxy)
- httpBasicAuth - configure basic auth (only valid when not using the AWS ES Proxy)
  - username
  - password
- amazonESProxy - configure the AWS ES Proxy
  - port - port to listen on (a free port will be chosen for you if omitted)
Setting up an AWS hosted Elasticsearch Cluster¶
AWS provides a hosted Elasticsearch cluster that can be used for log aggregation. This snippet will set up an Elasticsearch domain in your account and create a policy along with it that will allow shipping of logs into the cluster:
variable "name" {
default = "tarmak-logs"
}
variable "region" {
default = "eu-west-1"
}
provider "aws" {
region = "${var.region}"
}
data "aws_caller_identity" "current" {}
data "aws_iam_policy_document" "es" {
statement {
actions = [
"es:*",
]
principals {
type = "AWS"
identifiers = [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
]
}
}
}
resource "aws_elasticsearch_domain" "es" {
domain_name = "${var.name}"
elasticsearch_version = "6.2"
cluster_config {
instance_type = "t2.medium.elasticsearch"
}
ebs_options {
ebs_enabled = true
volume_type = "gp2"
volume_size = 30
}
access_policies = "${data.aws_iam_policy_document.es.json}"
}
data "aws_iam_policy_document" "es_shipping" {
statement {
actions = [
"es:ESHttpHead",
"es:ESHttpPost",
"es:ESHttpGet",
]
resources = [
"arn:aws:es:${var.region}:${data.aws_caller_identity.current.account_id}:domain/${var.name}/*",
]
}
}
resource "aws_iam_policy" "es_shipping" {
name = "${var.name}-shipping"
description = "Allows shipping of logs to elasticsearch"
policy = "${data.aws_iam_policy_document.es_shipping.json}"
}
output "elasticsearch_endpoint" {
value = "${aws_elasticsearch_domain.es.endpoint}"
}
output "elasticsearch_shipping_policy_arn" {
value = "${aws_iam_policy.es_shipping.arn}"
}
Once Terraform has run successfully, it will output the resulting AWS Elasticsearch endpoint and the policy that allows shipping to it:
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
Outputs:
elasticsearch_endpoint = search-tarmak-logs-xyz.eu-west-1.es.amazonaws.com
elasticsearch_shipping_policy_arn = arn:aws:iam::1234:policy/tarmak-logs-shipping
Both of those outputs can then be used in the tarmak configuration:
apiVersion: api.tarmak.io/v1alpha1
clusters:
- name: cluster
  loggingSinks:
  - types: ["all"]
    elasticSearch:
      host: ${elasticsearch_endpoint}
      tls: true
      amazonESProxy: {}
  amazon:
    additionalIAMPolicies:
    - ${elasticsearch_shipping_policy_arn}
Configuring Index Templates¶
Fluentbit will publish into a new index every day. To optimise all of those indices for our logging purposes, it is beneficial to adapt some settings through index templates. We suggest at least considering raising the field limit from the default of 1000. Depending on the size of the Elasticsearch installation, the number of shards and replicas should also be adapted. The following example contains suggested settings for a single node setup:
{
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 0,
    "mapping": {
      "total_fields": {
        "limit": "10000"
      }
    }
  }
}
curl -v -XPOST 'localhost:9200/_template/logstash' -H 'Content-Type: application/json' -d @settings.json
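To confirm the template has been stored, it can be fetched back from the same endpoint:
curl 'localhost:9200/_template/logstash?pretty'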
EBS Encryption¶
AWS offers encrypted EBS (Elastic Block Storage); however, encryption of EBS volumes is not enabled by Tarmak by default. When enabled, building the image and applying a Tarmak cluster will take considerably longer.
The following tarmak.yaml snippet shows how to enable encrypted EBS.
clusters:
- amazon:
    ebsEncrypted: true
...
OIDC Authentication¶
Tarmak supports authentication using OIDC. The following snippet demonstrates how you would configure OIDC authentication in tarmak.yaml. For details on the configuration options, visit the Kubernetes documentation here. Note that if the version of your cluster is less than 1.10.0, the signingAlgs parameter is ignored.
kubernetes:
  apiServer:
    oidc:
      clientID: 1a2b3c4d5e6f7g8h
      groupsClaim: groups
      groupsPrefix: "oidc:"
      issuerURL: https://domain/application-server
      signingAlgs:
      - RS256
      usernameClaim: preferred_username
      usernamePrefix: "oidc:"
...
For the above setup, ID tokens presented to the apiserver will need to contain claims called preferred_username and groups representing the username and groups associated with the client. These values will then be prepended with oidc: before authorisation rules are applied, so it is important that this is taken into account when configuring cluster authorisation.
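For example, to grant cluster-admin to members of a hypothetical OIDC group named admins, the prefixed group oidc:admins would be referenced in a ClusterRoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: Group
  name: "oidc:admins"
  apiGroup: rbac.authorization.k8s.io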
Jenkins¶
You can install Jenkins as part of your hub. This can be achieved by adding an extra instance pool to your hub. This instance pool can be extended with the annotation tarmak.io/jenkins-certificate-arn. The value of this annotation is an ARN pointing to an Amazon certificate. When you set this annotation, your Jenkins will be secured with HTTPS. You need to make sure your SSL certificate is valid for jenkins.<environment>.<zone>.
- image: centos-puppet-agent
  maxCount: 1
  metadata:
    annotations:
      tarmak.io/jenkins-certificate-arn: "arn:aws:acm:eu-west-1:228615251467:certificate/81e0c595-f5ad-40b2-8062-683b215bedcf"
    creationTimestamp: null
    name: jenkins
  minCount: 1
  size: large
  type: jenkins
  volumes:
  - metadata:
      creationTimestamp: null
      name: root
    size: 16Gi
    type: ssd
  - metadata:
      creationTimestamp: null
      name: data
    size: 16Gi
    type: ssd
...
Dashboard¶
Tarmak supports deploying Kubernetes Dashboard when spinning up a Kubernetes cluster. The following tarmak.yaml snippet shows how you would enable Kubernetes Dashboard.
kubernetes:
  dashboard:
    enabled: true
...
The above configuration would deploy Kubernetes Dashboard with an image of gcr.io/google_containers/kubernetes-dashboard-amd64 using the recommended version based on the version of your Kubernetes cluster. The configuration block accepts two optional fields of image and version allowing you to change these defaults. Note that the final image tag used when deploying Dashboard will be the configured version prepended with the letter v.
Warning
Before Dashboard version 1.7, when RBAC is enabled (from Kubernetes version 1.6), cluster-wide cluster-admin privileges are granted to Dashboard. From Dashboard version 1.7, only minimal privileges are granted that allow Dashboard to work. See Dashboard's access control documentation for more details.
Tiller¶
Tarmak supports deploying Tiller, the server-side component of Helm, when spinning up a Kubernetes cluster. Tiller is configured to listen on localhost only which prevents arbitrary Pods in the cluster connecting to its unauthenticated endpoint. Helm clients can still talk to Tiller by port forwarding through the Kubernetes API Server. The following tarmak.yaml snippet shows how you would enable Tiller.
kubernetes:
  tiller:
    enabled: true
    image: gcr.io/kubernetes-helm/tiller
    version: 2.9.1
...
The above configuration would deploy version 2.9.1 of Tiller with an image of gcr.io/kubernetes-helm/tiller. The configuration block accepts two optional fields of image and version allowing you to change these defaults. Note that the final image tag used when deploying Tiller will be the configured version prepended with the letter v. The version is particularly important when deploying Tiller since its minor version must match the minor version of any Helm clients.
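It can therefore be worth confirming that your local Helm client's minor version matches before use; helm version reports both client and server versions (output will vary):
% helm version --short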
Warning
Tiller is deployed with the cluster-admin ClusterRole bound to its service account and therefore has far reaching privileges. Helm's security best practices should also be considered.
Prometheus¶
By default Tarmak will deploy a Prometheus installation and some exporters into the monitoring namespace.
This can be completely disabled with the following cluster configuration:
kubernetes:
  prometheus:
    enabled: false
Another possibility would be to use the Tarmak provisioned Prometheus only for scraping exporters on instances that are not part of the Kubernetes cluster. Using federation, those metrics could then be integrated into an existing Prometheus deployment.
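A minimal sketch of such a federation scrape job on the existing Prometheus deployment might look like the following; the target hostname and match expression are placeholders:
scrape_configs:
- job_name: 'tarmak-federate'
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
    - '{job!=""}'
  static_configs:
  - targets:
    - 'tarmak-prometheus.example.com:9090'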
To have Prometheus only monitor nodes external to the cluster, use the following configuration instead:
kubernetes:
  prometheus:
    enabled: true
    mode: ExternalScrapeTargetsOnly
Finally, you may wish to have Tarmak only install the exporters on the external nodes. If this is your desired configuration, then set the following mode in the yaml:
kubernetes:
  prometheus:
    enabled: true
    mode: ExternalExportersOnly
API Server¶
It is possible to let Tarmak create a public endpoint for your API server. This can be used together with Secure public endpoints.
kubernetes:
  apiServer:
    public: true
Secure public endpoints¶
Public endpoints (Jenkins, the bastion host and, if enabled, the API server) can be secured by limiting access to a list of CIDR blocks. This can be configured at the environment level for all public endpoints and, if desired, can be overridden for a specific public endpoint.
Environment level¶
This can be done by adding an adminCIDRs list to an environment block. If nothing has been set, the default is 0.0.0.0/0:
environments:
- contact: hello@example.com
  location: eu-west-1
  metadata:
    name: example
  privateZone: example.local
  project: example-project
  provider: aws
  adminCIDRs:
  - x.x.x.x/32
  - y.y.y.y/24
Jenkins and bastion host¶
The environment level can be overridden for Jenkins and the bastion host by adding allowCIDRs to the instance pool block:
instancePools:
- image: centos-puppet-agent
  allowCIDRs:
  - x.x.x.x/32
  maxCount: 1
  metadata:
    name: jenkins
  minCount: 1
  size: large
  type: jenkins
API Server¶
For the API server, you can override the environment level by adding allowCIDRs to the kubernetes block.
Warning
For this to work, you need to set your API Server public first.
kubernetes:
  apiServer:
    public: true
    allowCIDRs:
    - y.y.y.y/24
API Server Admission Plugins¶
API admission control plugins can be enabled and disabled through the Tarmak configuration file. Information on admission plugins can be found here. Enabled and disabled plugins can be configured as follows:
kubernetes:
  apiServer:
    enableAdmissionControllers:
    - "DefaultStorageClass"
    - "DefaultTolerationSeconds"
    disableAdmissionControllers:
    - "MutatingAdmissionWebhook"
Note
Disabling admission control plugins is only available with Kubernetes version 1.11+
Unless one or more enabled admission controllers have been defined, the following defaults will be applied in this order:
Admission Controller Plugin | Minimum Requirement |
Initializers | v1.8.x |
NamespaceLifecycle | none |
LimitRanger | none |
ServiceAccount | none |
DefaultStorageClass | v1.4.x |
DefaultTolerationSeconds | v1.6.x |
MutatingAdmissionWebhook | v1.9.x |
ValidatingAdmissionWebhook | v1.9.x |
ResourceQuota | none |
PodSecurityPolicy | Pod Security Policy Enabled |
NodeRestriction | v1.8.x |
Priority | Pod Priority Enabled |
Additional IAM policies¶
Additional IAM policies can be added by adding those ARNs to the tarmak.yaml config. You can add additional IAM policies to the cluster and instance pool blocks. When you define additional IAM policies on both levels, they will be merged when applied to a specific instance pool.
Cluster¶
You can add additional IAM policies that will be added to all the instance pools of the whole cluster.
apiVersion: api.tarmak.io/v1alpha1
clusters:
- amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"
Instance pool¶
It is possible to add extra policies to only a specific instance pool.
- image: centos-puppet-agent
  amazon:
    additionalIAMPolicies:
    - "arn:aws:iam::xxxxxxx:policy/policy_name"
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: medium
  subnets:
  - metadata:
    zone: eu-west-1a
  - metadata:
    zone: eu-west-1b
  - metadata:
    zone: eu-west-1c
  type: worker
Node Taints & Labels¶
You might have added additional instance pools for a specific workload. In these cases it might be useful to label and/or taint the nodes in this instance pool. You add labels and taints in the tarmak yaml like this:
- image: centos-puppet-agent
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: medium
  type: worker
  labels:
  - key: "ssd"
    value: "true"
  taints:
  - key: "gpu"
    value: "gtx1170"
    effect: "NoSchedule"
Note, these are only applied when the node is first registered. Changes to these values will not remove taints and labels from nodes that are already registered.
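If you change these values for an existing pool, already-registered nodes can be updated by hand with kubectl; the node name below is a placeholder:
% kubectl label nodes <node-name> ssd=true
% kubectl taint nodes <node-name> gpu=gtx1170:NoSchedule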
API Server ELB Access Logs¶
Tarmak supports storing access logs from the internal and, if enabled, public API server ELBs. This is achieved by enabling configuration options in the tarmak.yaml. You must specify at least the S3 bucket name, with options to also specify the bucket prefix and an interval of 5 or 60 minutes. The interval defaults to 5 minutes.
kubernetes:
  apiServer:
    public: true
    amazon:
      internalELBAccessLogs:
        bucket: cluster-internal-accesslogs
      publicELBAccessLogs:
        bucket: cluster-public-accesslogs
Note that the S3 bucket needs to exist in the same region, with the correct S3 policy permissions. Information on how to correctly set these permissions can be found here.
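As a rough sketch, the bucket policy grants the regional ELB account write access to the bucket; the account ID and bucket name below are placeholders and must be replaced according to the linked documentation:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<regional-elb-account-id>:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::cluster-internal-accesslogs/*"
    }
  ]
}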
Instance store¶
Certain AWS instance types have support for instance store. It is possible to configure instance store for containers with Tarmak by specifying an instance type that has instance store capabilities (e.g. c5d.xlarge) in the size parameter of your instance pool. Also make sure you don't define any volumes with the name docker.
instancePools:
- image: centos-puppet-agent
  maxCount: 3
  metadata:
    name: worker
  minCount: 3
  size: c5d.xlarge
  subnets:
  - metadata:
      creationTimestamp: null
    zone: eu-west-1a
  - metadata:
      creationTimestamp: null
    zone: eu-west-1b
  - metadata:
      creationTimestamp: null
    zone: eu-west-1c
  type: worker
  volumes:
  - metadata:
      name: root
    size: 16Gi
    type: ssd
Configuration of vault-helper¶
It is possible to configure the download URL of vault-helper to something other than the default GitHub location. To make this cluster-wide change, add the custom URL to the url attribute, under the vaultHelper header in the cluster:
clusters:
- environment: env
  vaultHelper:
    url: https://example.com/custom_vault-helper_location
Feature Gates¶
Feature gates can be enabled or disabled on Kubernetes components through the Tarmak configuration file. Feature gates can be set on the API server, Kubelet, Scheduler, Controller Manager and Kube-Proxy, and will take effect cluster wide. Feature gates can also be set globally for all components; note, however, that any matching feature gate set on a component will override the global value for that component. To change feature gates, each component takes a list that will be applied to its corresponding command line flags or configuration file, under the kubernetes block as follows:
kubernetes:
  globalFeatureGates:
    AllAlpha: true
  apiServer:
    featureGates:
      APIResponseCompression: false
  kubelet:
    featureGates:
      CustomPodDNS: true
      CPUManager: false
  kubeProxy:
    featureGates:
      CSIPersistentVolume: false
      DebugContainers: true
  controllerManager:
    featureGates:
      AttachVolumeLimit: true
  scheduler:
    featureGates:
      CPUManager: false
Calico Backend¶
By default Tarmak will deploy Calico into your Kubernetes cluster, configured to use etcd as the backend. Calico also supports using the Kubernetes API server instead, which can be configured by changing the Calico option in the Tarmak config like the following:
kubernetes:
  calico:
    backend: kubernetes
    enableTypha: true
    typhaReplicas: 1
This change will take effect cluster wide.
Calico also supports using Typha, a purpose-built fan-out daemon to reduce load on the targeted datastore. More information can be found on its project page.
Note
Typha should typically only be enabled when your Kubernetes node count exceeds 50. Enabling Typha, along with setting the number of replicas, is shown above.
Cluster Services¶
Deploying Grafana, InfluxDB and Heapster can be toggled in the Tarmak configuration as follows:
kubernetes:
  grafana:
    enabled: true
  heapster:
    enabled: false
  influxDB:
    enabled: true
Grafana¶
Grafana is deployed as part of Tarmak. You can access Grafana through a Kubernetes cluster service. Follow these steps to access Grafana:
- Create a proxy: tarmak kubectl proxy
- In the browser go to http://127.0.0.1:8001/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/