Kubernetes 1.16.2 Cluster Deployment Manual
- 1 Environment Overview
  - 1.1 Operating System
  - 1.2 Component Versions
  - 1.3 Add-ons
  - 1.4 Image Registry
  - 1.5 Key Configuration Strategies
    - 1.5.1 kube-apiserver high availability
    - 1.5.2 kube-controller-manager high availability
    - 1.5.3 kube-scheduler high availability
    - 1.5.4 kubelet
    - 1.5.5 kube-proxy
    - 1.5.6 Cluster add-ons
- 2 Installation and Deployment
  - 2.1 Node Planning
  - 2.2 OS Initialization
    - 2.2.1 Upgrade the kernel
    - 2.2.2 Install dependency packages and repo files
    - 2.2.3 Load the IPVS kernel modules
    - 2.2.4 Disable NUMA
    - 2.2.5 Disable the firewall
    - 2.2.6 Disable SELinux
    - 2.2.7 Disable swap
    - 2.2.8 Tune kernel parameters
    - 2.2.9 Disable unneeded services
    - 2.2.10 Configure rsyslogd and systemd journald
  - 2.3 Common Kubernetes Configuration
    - 2.3.1 Install CFSSL
    - 2.3.2 Create the CA certificate and key
    - 2.3.3 Deploy the kubectl command-line tool
  - 2.4 Deploy apiserver High Availability
    - 2.4.1 Basic configuration
    - 2.4.2 Install haproxy + keepalived
  - 2.5 Deploy the etcd Cluster
    - 2.5.1 Basic configuration
    - 2.5.2 Install etcd
  - 2.6 Deploy the Master Cluster
    - 2.6.1 Basic configuration
    - 2.6.2 Deploy kube-apiserver
    - 2.6.3 Deploy kube-controller-manager
    - 2.6.4 Deploy kube-scheduler
  - 2.7 Deploy the Node Machines
    - 2.7.1 Basic configuration
    - 2.7.2 Deploy the flannel network
    - 2.7.3 Deploy docker
    - 2.7.4 Deploy kubelet
    - 2.7.5 Deploy kube-proxy
  - 2.8 Verify Cluster Functionality
  - 2.9 Deploy Cluster Add-ons
    - 2.9.1 Deploy the coredns add-on
- A. Accessing the kube-apiserver secure port from a browser
Note: installing Docker requires a kernel version of 4.4.x or later.
For CentOS, versions 7.4-7.7 are the best choice: on versions earlier than CentOS 7.4, Docker cannot use overlay2 as its default storage driver.
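Before going further, it is worth checking that the running kernel already meets the requirement. A minimal sketch (the `version_ge` helper is our own illustration, not part of the manual's tooling):

```shell
# Print the running kernel version
uname -r

# Compare two dotted versions using sort -V; returns 0 when $1 >= $2
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

if version_ge "$(uname -r | cut -d- -f1)" "4.4"; then
    echo "kernel OK for Docker overlay2"
else
    echo "kernel too old; upgrade before installing Docker"
fi
```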
| OS Version | Kernel |
|---|---|
| CentOS 7.7.1908 | 4.4.198 |
| Component | Version |
|---|---|
| Kubernetes | 1.16.2 |
| Docker | 19.03.5 |
| Etcd | 3.4.3 |
| Flanneld | 0.11.0 |
There are three ways to set up a Kubernetes cluster:
**kubeadm** — not suitable for production. It is still worth studying, because it embodies many of the best-practice designs the Kubernetes project recommends. kubeadm's goal is to provide a minimal viable cluster that passes the Kubernetes conformance tests, so it does not install any non-essential add-ons; in particular, it does not install a network solution, so after a kubeadm install you must add a network plugin yourself.
**Binary deployment** — suitable for production. A production setup uses at least two masters for high availability; the etcd cluster needs at least three members, grown in odd numbers; and at least two worker nodes. The apiserver can be load-balanced with a cloud SLB, or with nginx + keepalived.
| Hostname | IP Address | Role | Components |
|---|---|---|---|
| k8s-master-01 | 192.168.209.101 | master | kube-apiserver、kube-controller-manager、kube-scheduler |
| k8s-master-02 | 192.168.209.102 | master | kube-apiserver、kube-controller-manager、kube-scheduler |
| k8s-master-03 | 192.168.209.103 | master | kube-apiserver、kube-controller-manager、kube-scheduler |
| k8s-node-01 | 192.168.209.121 | node | kubelet、kube-proxy、docker |
| k8s-node-02 | 192.168.209.122 | node | kubelet、kube-proxy、docker |
| k8s-node-03 | 192.168.209.123 | node | docker、kubelet、kube-proxy |
| k8s-etcd-01 | 192.168.209.111 | db | etcd |
| k8s-etcd-02 | 192.168.209.112 | db | etcd |
| k8s-etcd-03 | 192.168.209.113 | db | etcd |
| k8s-lb-01 | 192.168.209.98 | api-ha | nginx、keepalived |
| k8s-lb-02 | 192.168.209.99 | api-ha | nginx、keepalived |
| k8s-lb-vip | 192.168.209.100 | api-ha | VIP |

The stock 3.10.x kernel shipped with CentOS 7.x has bugs that make Docker and Kubernetes unstable, for example:
1. Newer Docker (1.13 and later) enables the kernel memory accounting feature that the 3.10 kernel supports only experimentally (and it cannot be turned off); under load, e.g. frequently starting and stopping containers, this causes cgroup memory leaks;
2. A network-device reference-count leak produces errors like: "kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1";
Upgrade the kernel:

```shell
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
yum -y update
yum --enablerepo=elrepo-kernel install -y kernel-lt
grub2-set-default 0
reboot
```

Install dependency packages:

```shell
yum install -y lrzsz vim wget curl curl-devel zip unzip telnet ftp tree screen lsof tcpdump \
  expect expect-devel p7zip p7zip-plugins convmv ntp ntpdate net-tools man.x86_64 man-pages.noarch \
  bash-completion lvm2 conntrack ipvsadm ipset jq iptables sysstat libseccomp yum-utils \
  device-mapper-persistent-data
```

Configure the yum repos:

```shell
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# or:
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

yum clean all && yum -y makecache && yum -y update
yum list | grep kubernetes | sort -r
yum list | grep docker-ce | sort -r
```

Miscellaneous OS setup (aliases, time sync, sshd, ops user):

```shell
cat >> /etc/profile <<EOF
alias vi='vim'
alias grep='grep --colour=auto'
EOF
source /etc/profile
ntpdate 0.centos.pool.ntp.org && hwclock -w && hwclock --systohc
sed -i "s/^#UseDNS yes/UseDNS no/" /etc/ssh/sshd_config
systemctl restart sshd
useradd ops
passwd ops
mkdir -p /app
chown -R ops.ops /app
sed -i '$a ops ALL=(ALL) ALL\nops ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
```
Load the IPVS kernel modules:

```shell
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

lsmod | grep ip_vs
```

Disable NUMA (and IPv6) on the kernel command line:

```shell
cp /etc/default/grub{,.bak}
sed -i "s/quiet/quiet ipv6.disable=1 numa=off/" /etc/default/grub
diff /etc/default/grub.bak /etc/default/grub
cp /boot/grub2/grub.cfg{,.bak}
grub2-mkconfig -o /boot/grub2/grub.cfg
```

Disable the firewall:

```shell
systemctl stop firewalld.service
systemctl disable firewalld.service
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
iptables -P FORWARD ACCEPT
```

Disable SELinux:

```shell
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```

Disable swap:

```shell
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
```

Tune resource limits and kernel parameters:

```shell
cp /etc/security/limits.conf{,.org}
cat >> /etc/security/limits.conf <<EOF
* soft nofile 655350
* hard nofile 655350
* soft nproc unlimited
* hard nproc unlimited
EOF

cat >> /etc/rc.local <<EOF
ulimit -SHn 655350
ulimit -SHu unlimited
ulimit -SHs unlimited
EOF

cp /etc/sysctl.conf{,.org}
cat >> /etc/sysctl.conf <<EOF
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
EOF
/sbin/sysctl -p

cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
#net.netfilter.nf_conntrack_max=2310720
EOF
sysctl -p /etc/sysctl.d/k8s.conf
```

Disable unneeded services:

```shell
systemctl stop postfix && systemctl disable postfix
```

By default journald forwards logs to rsyslog, so every log entry is written twice: /var/log/messages fills with irrelevant entries, which makes later inspection inconvenient and also hurts system performance.
Configure systemd journald:

```shell
mkdir /var/log/journal
mkdir /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# Persist logs to disk
Storage=persistent
# Compress historical logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Cap total disk usage at 2G
SystemMaxUse=2G
# Cap a single log file at 100M
SystemMaxFileSize=100M
# Retain logs for 2 weeks
MaxRetentionSec=2week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF
systemctl restart systemd-journald

mkdir -p /app/{opt,bin,etc,cert,data,logs,wal}
```

Download and install CFSSL:

```shell
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
cp cfssl_linux-amd64 /app/bin/cfssl
cp cfssljson_linux-amd64 /app/bin/cfssljson
cp cfssl-certinfo_linux-amd64 /app/bin/cfssl-certinfo
chmod +x /app/bin/*

cat >> /etc/profile <<EOF
export PATH=\$PATH:/app/bin
EOF
source /etc/profile
```
To secure communication, the Kubernetes components use x509 certificates to encrypt and authenticate traffic. The CA (Certificate Authority) is a self-signed root certificate used to sign all certificates created later. This document uses CloudFlare's PKI toolkit, cfssl, to create all certificates.

The CA certificate is shared by every node in the cluster: only one CA certificate is created, and every subsequent certificate is signed by it.

The CA configuration file defines the usage profiles of the root certificate and their parameters (usages, expiry, server auth, client auth, encryption, etc.); when signing other certificates later, a specific profile is selected.
```shell
cat > /app/cert/ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "876000h"
      }
    }
  }
}
EOF
```

- `signing`: the certificate can be used to sign other certificates; the generated ca.pem carries CA=TRUE;
- `server auth`: a client can use the certificate to verify certificates presented by a server;
- `client auth`: a server can use the certificate to verify certificates presented by a client;
```shell
cat > /app/cert/ca-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
```

- CN (Common Name): kube-apiserver extracts this field from a certificate as the request's User Name; browsers use it to verify whether a site is legitimate;
- O (Organization): kube-apiserver extracts this field as the Group the requesting user belongs to;
- kube-apiserver uses the extracted User and Group as the identity for RBAC authorization.
```shell
cd /app/cert
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
ll ca*
-rw-r--r-- 1 root root  294 Nov  5 17:23 ca-config.json
-rw-r--r-- 1 root root 1001 Nov  5 17:23 ca.csr
-rw-r--r-- 1 root root  246 Nov  5 17:23 ca-csr.json
-rw------- 1 root root 1679 Nov  5 17:23 ca-key.pem
-rw-r--r-- 1 root root 1363 Nov  5 17:23 ca.pem
rm -rf ca.csr ca-csr.json
```

By default kubectl reads the kube-apiserver address and credentials from ~/.kube/config; if that file is missing, kubectl commands may fail:
```shell
$ kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```

This step only needs to be done once: the generated kubeconfig file is generic and can be copied to any machine that needs to run kubectl and renamed to ~/.kube/config.
Download and unpack:
```shell
cd /app/opt
wget https://dl.k8s.io/v1.16.2/kubernetes-client-linux-amd64.tar.gz
tar -xzf kubernetes-client-linux-amd64.tar.gz
cp /app/opt/kubernetes/client/bin/kubectl /app/bin
chmod +x /app/bin/kubectl
kubectl version
```

Copy the binary to all nodes:
```shell
for node_ip in k8s-base-02 k8s-base-03
do
  echo ">>> ${node_ip}"
  scp /app/opt/kubernetes/client/bin/kubectl root@${node_ip}:/app/bin
  ssh root@${node_ip} "chmod +x /app/bin/kubectl && source /etc/profile && kubectl version"
done
```

kubectl talks to the apiserver over the https secure port; the apiserver authenticates and authorizes the certificate the client presents.
As the cluster's management tool, kubectl needs to be granted the highest privileges, so here we create an admin certificate with full permissions.

Later, only the apiserver and kubelet startup parameters reference the admin CA certificate; note that the kubectl tool and the kubelet service are not the same thing.

Create the certificate signing request:
```shell
cd /app/cert
cat > /app/cert/admin-csr.json <<EOF
{
  "CN": "admin",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "system:masters",
      "OU": "System"
    }
  ]
}
EOF
```
- O is system:masters: kube-apiserver sets the request's Group to system:masters for this certificate; the predefined ClusterRoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to all APIs;
- the certificate is only used by kubectl as a client certificate, so the hosts field is empty;
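If you want to confirm the CN and O that kube-apiserver will extract, inspect the certificate subject with openssl (`cfssl-certinfo -cert admin.pem` works too). The sketch below uses a throwaway self-signed certificate with the same subject, so it runs even before admin.pem exists; the demo-*.pem names are hypothetical:

```shell
# Illustrative only: create a throwaway self-signed cert with the same
# subject fields as admin-csr.json, then read back CN and O.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout demo-key.pem -out demo-admin.pem \
    -subj "/C=CN/ST=BeiJing/L=BeiJing/O=system:masters/OU=System/CN=admin"
# Expect the subject line to contain O = system:masters and CN = admin
openssl x509 -in demo-admin.pem -noout -subject
rm -f demo-key.pem demo-admin.pem
```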
Generate the certificate and private key:
```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes admin-csr.json | cfssljson -bare admin
ls -al admin*
-rw-r--r-- 1 root root 1009 Nov 11 12:12 admin.csr
-rw-r--r-- 1 root root  229 Nov 11 12:12 admin-csr.json
-rw------- 1 root root 1679 Nov 11 12:12 admin-key.pem
-rw-r--r-- 1 root root 1403 Nov 11 12:12 admin.pem
rm -rf admin.csr admin-csr.json
```

kubeconfig is kubectl's configuration file; it contains everything needed to reach the apiserver: the apiserver address, the CA certificate, and the client's own certificate.
```shell
cd /app/etc
# Set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/app/cert/ca.pem \
  --embed-certs=true \
  --server=https://192.168.209.100:8443 \
  --kubeconfig=kubectl.kubeconfig
# Set client credentials
kubectl config set-credentials admin \
  --client-certificate=/app/cert/admin.pem \
  --client-key=/app/cert/admin-key.pem \
  --embed-certs=true \
  --kubeconfig=kubectl.kubeconfig
# Set the context
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin \
  --kubeconfig=kubectl.kubeconfig
# Set the default context
kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig
cp kubectl.kubeconfig ~/.kube/config
```

- --certificate-authority: the root certificate used to verify the kube-apiserver certificate;
- --server: the kube-apiserver address; if a load balancer sits in front of the apiservers, use the VIP address;
- --client-certificate, --client-key: the admin certificate and private key just generated, used when connecting to kube-apiserver;
- --embed-certs=true: embeds the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig (without it, only the certificate file paths are written, and those files would have to be copied separately whenever the kubeconfig is moved to another machine);
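Under the hood, --embed-certs simply base64-encodes the PEM bytes into the certificate-authority-data / client-certificate-data / client-key-data fields of the kubeconfig, which is why the resulting file is self-contained. A quick illustration of that encoding (demo.pem is a stand-in file, not one of the cluster certificates):

```shell
# The kubeconfig *-data fields are just base64 of the PEM bytes
printf 'PEM CONTENT' > demo.pem
data=$(base64 < demo.pem | tr -d '\n')   # what --embed-certs stores
echo "$data" | base64 -d                  # recovers the original bytes
rm -f demo.pem
```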
Distribute to every node that will run kubectl:
```shell
for node_ip in k8s-base-02 k8s-base-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p ~/.kube"
  scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
done
```

The file must be saved as ~/.kube/config.
```shell
hostnamectl --static set-hostname k8s-lb-01
cat >> /etc/hosts <<EOF
192.168.209.98 k8s-lb-01
192.168.209.99 k8s-lb-02
EOF
mkdir -p /app/{opt,bin,etc,logs}

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
for node_ip in k8s-lb-01 k8s-lb-02
do
  expect -c "
    spawn ssh-copy-id -i /root/.ssh/id_rsa.pub root@${node_ip}
    expect {
      \"*yes/no*\" {send \"yes\r\"; exp_continue}
      \"*password*\" {send \"hello\r\"; exp_continue}
      \"*Password*\" {send \"hello\r\";}
    }
  "
done

ssh k8s-lb-02 "hostnamectl --static set-hostname k8s-lb-02 && mkdir -p /app/{opt,bin,etc,logs}"
ssh k8s-lb-02 "cat >> /etc/hosts <<EOF
192.168.209.98 k8s-lb-01
192.168.209.99 k8s-lb-02
EOF"
```
```shell
for node_ip in k8s-lb-01 k8s-lb-02
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "yum -y install haproxy keepalived && rpm -qa | grep haproxy && rpm -qa | grep keepalived"
done
```

Edit the haproxy configuration file:
```shell
mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.org
cat > /etc/haproxy/haproxy.cfg <<EOF
global
    maxconn 2000
    ulimit-n 16384
    log 127.0.0.1 local0 err
    stats timeout 30s

defaults
    log global
    mode http
    option httplog
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    timeout http-request 15s
    timeout http-keep-alive 15s

frontend monitor-in
    bind *:33305
    mode http
    option httplog
    monitor-uri /monitor

listen stats
    bind *:8006
    mode http
    stats enable
    stats hide-version
    stats uri /stats
    stats refresh 30s
    stats realm Haproxy\ Statistics
    stats auth admin:admin

frontend k8s-api
    bind 0.0.0.0:8443
    bind 127.0.0.1:8443
    mode tcp
    option tcplog
    tcp-request inspect-delay 5s
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server k8s-api-1 192.168.209.101:6443 check
    server k8s-api-2 192.168.209.102:6443 check
    server k8s-api-3 192.168.209.103:6443 check
EOF
```

Edit the keepalived configuration file:
```shell
mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.org
cat > /etc/keepalived/keepalived.conf <<EOF
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id s1.keepalived
    #vrrp_mcast_group4 224.0.100.19
    vrrp_skip_check_adv_addr
    #vrrp_strict   # strict VRRP compliance must stay disabled; with unicast instead of multicast, keepalived fails to start when it is on
    vrrp_iptables
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_script haproxy-check {
    script "/bin/bash /app/bin/check_haproxy.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance haproxy-vip {
    state MASTER
    priority 100
    interface ens32
    virtual_router_id 47
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    unicast_src_ip 192.168.209.98
    unicast_peer {
        192.168.209.99
    }
    virtual_ipaddress {
        192.168.209.100
    }
    track_script {
        haproxy-check
    }
}
EOF
```

Create the health-check script (note the quoted heredoc delimiter, so `$*` is written literally instead of being expanded now):

```shell
cat > /app/bin/check_haproxy.sh <<'EOF'
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}
if ip addr | grep -q 192.168.209.100 ; then
    curl -s --max-time 2 --insecure https://192.168.209.100:8443/ -o /dev/null || errorExit "Error GET https://192.168.209.100:8443/"
fi
EOF
chmod +x /app/bin/check_haproxy.sh
```

Distribute the keepalived and haproxy files to all lb nodes:
```shell
for node_ip in k8s-lb-02
do
  echo ">>> ${node_ip}"
  scp /etc/haproxy/haproxy.cfg root@${node_ip}:/etc/haproxy
  scp /etc/keepalived/keepalived.conf root@${node_ip}:/etc/keepalived
  scp /app/bin/check_haproxy.sh root@${node_ip}:/app/bin
  ssh root@${node_ip} "sed -i 's#s1.keepalived#s2.keepalived#g' /etc/keepalived/keepalived.conf"
  ssh root@${node_ip} "sed -i 's#MASTER#BACKUP#g' /etc/keepalived/keepalived.conf"
  ssh root@${node_ip} "sed -i 's#priority 100#priority 80#g' /etc/keepalived/keepalived.conf"
  ssh root@${node_ip} "sed -i 's#virtual_router_id 47#virtual_router_id 27#g' /etc/keepalived/keepalived.conf"
  ssh root@${node_ip} "sed -i 's#192.168.209.99#192.168.209.98#g' /etc/keepalived/keepalived.conf"
  ssh root@${node_ip} "sed -i 's#unicast_src_ip 192.168.209.98#unicast_src_ip 192.168.209.99#g' /etc/keepalived/keepalived.conf"
done
```

Start the services on the lb nodes:
```shell
for node_ip in k8s-lb-01 k8s-lb-02
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl enable --now haproxy keepalived && systemctl restart haproxy keepalived && systemctl status haproxy keepalived && ps -ef | grep haproxy && ps -ef | grep keepalived && netstat -tnulp"
done
```

Verify:
http://192.168.209.98:33305/monitor
http://192.168.209.99:33305/monitor
http://192.168.209.98:8006/stats
http://192.168.209.99:8006/stats
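The same checks can be scripted with curl instead of a browser; a sketch against the monitor and stats endpoints configured above (the `check_url` helper is our own, and the lb nodes must be reachable from where you run it):

```shell
# Probe each haproxy endpoint; -f makes curl exit non-zero on HTTP errors
check_url() {
    curl -sf --max-time 2 -o /dev/null "$1" && echo "OK   $1" || echo "FAIL $1"
}

for url in http://192.168.209.98:33305/monitor http://192.168.209.99:33305/monitor \
           http://192.168.209.98:8006/stats   http://192.168.209.99:8006/stats
do
    check_url "$url"
done
```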
```shell
hostnamectl --static set-hostname k8s-etcd-01
cat >> /etc/hosts <<EOF
192.168.209.111 k8s-etcd-01
192.168.209.112 k8s-etcd-02
192.168.209.113 k8s-etcd-03
EOF

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
for node_ip in k8s-etcd-01 k8s-etcd-02 k8s-etcd-03
do
  expect -c "
    spawn ssh-copy-id -i /root/.ssh/id_rsa.pub root@${node_ip}
    expect {
      \"*yes/no*\" {send \"yes\r\"; exp_continue}
      \"*password*\" {send \"hello\r\"; exp_continue}
      \"*Password*\" {send \"hello\r\";}
    }
  "
done

ssh k8s-etcd-02 "hostnamectl --static set-hostname k8s-etcd-02" &&
ssh k8s-etcd-03 "hostnamectl --static set-hostname k8s-etcd-03"

ssh k8s-etcd-02 "cat >> /etc/hosts <<EOF
192.168.209.111 k8s-etcd-01
192.168.209.112 k8s-etcd-02
192.168.209.113 k8s-etcd-03
EOF" &&
ssh k8s-etcd-03 "cat >> /etc/hosts <<EOF
192.168.209.111 k8s-etcd-01
192.168.209.112 k8s-etcd-02
192.168.209.113 k8s-etcd-03
EOF"
```
etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration, and coordination (leader election, distributed locks, and so on). Kubernetes uses etcd to store all of its runtime data.
Download the latest release tarball from the etcd releases page:
```shell
cd /app/opt
wget https://github.com/coreos/etcd/releases/download/v3.4.3/etcd-v3.4.3-linux-amd64.tar.gz
tar -xzf etcd-v3.4.3-linux-amd64.tar.gz
```

Distribute the binaries to all cluster nodes:
```shell
for node_ip in k8s-etcd-01 k8s-etcd-02 k8s-etcd-03
do
  echo ">>> ${node_ip}"
  scp etcd-v3.4.3-linux-amd64/etcd* root@${node_ip}:/app/bin
  ssh root@${node_ip} "chmod +x /app/bin/etcd* && source /etc/profile && etcd --version && etcdctl version"
done
```

Create the certificate signing request:
```shell
cat > /app/cert/etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.209.111",
    "192.168.209.112",
    "192.168.209.113"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```

The hosts field lists the etcd node IPs or domain names authorized to use this certificate; all three etcd cluster node IPs must be included.
Generate the certificate and private key:
```shell
cd /app/cert
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
ll etcd*
-rw-r--r-- 1 root root 1062 Nov  6 10:10 etcd.csr
-rw-r--r-- 1 root root  305 Nov  6 10:10 etcd-csr.json
-rw------- 1 root root 1679 Nov  6 10:10 etcd-key.pem
-rw-r--r-- 1 root root 1436 Nov  6 10:10 etcd.pem
rm -rf etcd.csr etcd-csr.json
```

Distribute the generated certificate and private key to each etcd node:
```shell
for node_ip in k8s-etcd-02 k8s-etcd-03
do
  echo ">>> ${node_ip}"
  scp etcd*.pem root@${node_ip}:/app/cert
done
```

Create the etcd configuration file.
The configuration file is written to /app/etc/etcd.conf.yml; see the official etcd.conf.yml sample for reference.
```shell
mkdir /app/wal
cat > /app/etc/etcd.conf.yml <<EOF
name: k8s-etcd-01
data-dir: /app/data
wal-dir: /app/wal
snapshot-count: 10000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: https://192.168.209.111:2380
listen-client-urls: https://192.168.209.111:2379,http://127.0.0.1:2379
max-snapshots: 5
max-wals: 5
cors:
initial-advertise-peer-urls: https://192.168.209.111:2380
advertise-client-urls: https://192.168.209.111:2379
discovery:
discovery-fallback: proxy
discovery-proxy:
discovery-srv:
initial-cluster: k8s-etcd-01=https://192.168.209.111:2380,k8s-etcd-02=https://192.168.209.112:2380,k8s-etcd-03=https://192.168.209.113:2380
initial-cluster-token: k8s-etcd-cluster
initial-cluster-state: new
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
client-transport-security:
  ca-file: /app/cert/ca.pem
  cert-file: /app/cert/etcd.pem
  key-file: /app/cert/etcd-key.pem
  client-cert-auth: true
  trusted-ca-file: /app/cert/ca.pem
  auto-tls: true
peer-transport-security:
  ca-file: /app/cert/ca.pem
  cert-file: /app/cert/etcd.pem
  key-file: /app/cert/etcd-key.pem
  peer-client-cert-auth: true
  trusted-ca-file: /app/cert/ca.pem
  auto-tls: true
logger: zap
log-output:
log-level: info
force-new-cluster: false
EOF
```

- data-dir: the node's data directory, holding the node ID, cluster ID, initial cluster configuration, and snapshot files; if wal-dir is not set, WAL files are stored here as well;
- wal-dir: the directory for the node's WAL files; when set, WAL files are stored separately from the other data;
- name: the node name; when initial-cluster-state is new, name must appear in the initial-cluster list;
- cert-file, key-file: the certificate and private key the etcd server uses for client communication;
- trusted-ca-file: the CA certificate that signed the client certificates, used to verify them;
- peer-cert-file, peer-key-file: the certificate and private key etcd uses for peer communication;
- peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them;
Create the systemd unit:

```shell
cat > /usr/lib/systemd/system/etcd.service <<EOF
[Unit]
Description=Etcd Service
After=network.target

[Service]
Type=notify
ExecStart=/app/bin/etcd --config-file=/app/etc/etcd.conf.yml
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
```

Distribute the configuration file and unit, adjusting the per-node values (after the global IP substitution, the k8s-etcd-01 entry in initial-cluster is restored to its original address):

```shell
for node_ip in k8s-etcd-02 k8s-etcd-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/etcd.conf.yml root@${node_ip}:/app/etc/etcd.conf.yml
  scp /usr/lib/systemd/system/etcd.service root@${node_ip}:/usr/lib/systemd/system/etcd.service
done
ssh root@k8s-etcd-02 "sed -i 's#name: k8s-etcd-01#name: k8s-etcd-02#g' /app/etc/etcd.conf.yml"
ssh root@k8s-etcd-02 "sed -i 's/192.168.209.111/192.168.209.112/g' /app/etc/etcd.conf.yml"
ssh root@k8s-etcd-02 "sed -i 's#k8s-etcd-01=https://192.168.209.112#k8s-etcd-01=https://192.168.209.111#g' /app/etc/etcd.conf.yml"
ssh root@k8s-etcd-03 "sed -i 's#name: k8s-etcd-01#name: k8s-etcd-03#g' /app/etc/etcd.conf.yml"
ssh root@k8s-etcd-03 "sed -i 's/192.168.209.111/192.168.209.113/g' /app/etc/etcd.conf.yml"
ssh root@k8s-etcd-03 "sed -i 's#k8s-etcd-01=https://192.168.209.113#k8s-etcd-01=https://192.168.209.111#g' /app/etc/etcd.conf.yml"
```

Start and check the service on all etcd nodes:

```shell
for node_ip in k8s-etcd-01 k8s-etcd-02 k8s-etcd-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd && systemctl status etcd"
done

for node_ip in k8s-etcd-01 k8s-etcd-02 k8s-etcd-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "ps -ef | grep etcd && netstat -tnulp | grep etcd && lsof -i:2379"
done
```

Check the health of every endpoint:

```shell
for node_ip in 192.168.209.111 192.168.209.112 192.168.209.113
do
  echo ">>> ${node_ip}"
  ETCDCTL_API=3 /app/bin/etcdctl \
    --endpoints=https://${node_ip}:2379 \
    --cacert=/app/cert/ca.pem \
    --cert=/app/cert/etcd.pem \
    --key=/app/cert/etcd-key.pem endpoint health
done
```

Expected output:

```shell
>>> 192.168.209.111
https://192.168.209.111:2379 is healthy: successfully committed proposal: took = 8.18993ms
>>> 192.168.209.112
https://192.168.209.112:2379 is healthy: successfully committed proposal: took = 94.727521ms
>>> 192.168.209.113
https://192.168.209.113:2379 is healthy: successfully committed proposal: took = 55.996259ms
```
```shell
export etcd_endpoints=https://192.168.209.111:2379,https://192.168.209.112:2379,https://192.168.209.113:2379
ETCDCTL_API=3 /app/bin/etcdctl \
  -w table --cacert=/app/cert/ca.pem \
  --cert=/app/cert/etcd.pem \
  --key=/app/cert/etcd-key.pem \
  --endpoints=${etcd_endpoints} endpoint status
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.209.111:2379 | 1f7b20c64341f02d |   3.4.3 |   20 kB |      true |      false |       301 |         12 |                 12 |        |
| https://192.168.209.112:2379 | 67399fad7b454529 |   3.4.3 |   20 kB |     false |      false |       301 |         12 |                 12 |        |
| https://192.168.209.113:2379 | 5345e9a1fbc2ee1c |   3.4.3 |   20 kB |     false |      false |       301 |         12 |                 12 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
The current leader is 192.168.209.111.
```shell
hostnamectl --static set-hostname k8s-master-01
cat >> /etc/hosts <<EOF
192.168.209.101 k8s-master-01
192.168.209.102 k8s-master-02
192.168.209.103 k8s-master-03
EOF

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  expect -c "
    spawn ssh-copy-id -i /root/.ssh/id_rsa.pub root@${node_ip}
    expect {
      \"*yes/no*\" {send \"yes\r\"; exp_continue}
      \"*password*\" {send \"hello\r\"; exp_continue}
      \"*Password*\" {send \"hello\r\";}
    }
  "
done

ssh k8s-master-02 "hostnamectl --static set-hostname k8s-master-02" &&
ssh k8s-master-03 "hostnamectl --static set-hostname k8s-master-03"

ssh k8s-master-02 "cat >> /etc/hosts <<EOF
192.168.209.101 k8s-master-01
192.168.209.102 k8s-master-02
192.168.209.103 k8s-master-03
EOF" &&
ssh k8s-master-03 "cat >> /etc/hosts <<EOF
192.168.209.101 k8s-master-01
192.168.209.102 k8s-master-02
192.168.209.103 k8s-master-03
EOF"
```
Download the server binary tarball from the CHANGELOG page and unpack it:
```shell
cd /app/opt
wget https://dl.k8s.io/v1.16.2/kubernetes-server-linux-amd64.tar.gz
tar -xzf kubernetes-server-linux-amd64.tar.gz
```

Copy the binaries to all master nodes:
```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/opt/kubernetes/server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler} root@${node_ip}:/app/bin
  ssh root@${node_ip} "chmod +x /app/bin/kube* && source /etc/profile && kube-apiserver --version && kube-controller-manager --version && kube-scheduler --version"
done
```

Create the certificate signing request:
```shell
cat > /app/cert/k8s-csr.json <<EOF
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "192.168.209.101",
    "192.168.209.102",
    "192.168.209.103",
    "192.168.209.100",
    "192.168.209.98",
    "192.168.209.99",
    "10.254.0.1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local."
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```

The hosts field lists the IPs and domain names authorized to use this certificate: here the master node IPs plus the kubernetes service IP and domain names. If the masters sit behind a load balancer, the lb IPs and the VIP must also be included.
The kubernetes service IP is created automatically by the apiserver; it is normally the first IP of the range given by the --service-cluster-ip-range parameter, and can later be retrieved with:
```shell
$ kubectl get svc kubernetes
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   1d
```

Generate the certificate and private key:
```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes k8s-csr.json | cfssljson -bare k8s
rm -rf k8s.csr k8s-csr.json
```

Copy the generated certificate and private key to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/k8s*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/k8s*"
done
```

Create the encryption configuration file:

```shell
cat > /app/etc/encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: O92ZTxpbTdkMbdZNt6H7m6befP4EMvkDbNWVQCnIq1c=
      - identity: {}
EOF
```

The secret is generated with: `head -c 32 /dev/urandom | base64`
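The aescbc secret must decode to exactly 32 random bytes. A quick sanity check on a freshly generated key (sketch):

```shell
# Generate a key the same way as above and confirm it decodes to 32 bytes
key=$(head -c 32 /dev/urandom | base64)
decoded_len=$(printf '%s' "$key" | base64 -d | wc -c)
echo "decoded length: ${decoded_len} bytes"
```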
Copy the encryption configuration file to the master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/encryption-config.yaml root@${node_ip}:/app/etc
  ssh root@${node_ip} "ls -alh /app/etc/encryption-config.yaml"
done
```

Create the audit policy file:

```shell
cat > /app/etc/audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
  # The following requests were manually identified as high-volume and low-risk, so drop them.
  - level: None
    resources:
      - group: ""
        resources:
          - endpoints
          - services
          - services/status
    users:
      - 'system:kube-proxy'
    verbs:
      - watch
  - level: None
    resources:
      - group: ""
        resources:
          - nodes
          - nodes/status
    userGroups:
      - 'system:nodes'
    verbs:
      - get
  - level: None
    namespaces:
      - kube-system
    resources:
      - group: ""
        resources:
          - endpoints
    users:
      - 'system:kube-controller-manager'
      - 'system:kube-scheduler'
      - 'system:serviceaccount:kube-system:endpoint-controller'
    verbs:
      - get
      - update
  - level: None
    resources:
      - group: ""
        resources:
          - namespaces
          - namespaces/status
          - namespaces/finalize
    users:
      - 'system:apiserver'
    verbs:
      - get
  # Don't log HPA fetching metrics.
  - level: None
    resources:
      - group: metrics.k8s.io
    users:
      - 'system:kube-controller-manager'
    verbs:
      - get
      - list
  # Don't log these read-only URLs.
  - level: None
    nonResourceURLs:
      - '/healthz*'
      - /version
      - '/swagger*'
  # Don't log events requests.
  - level: None
    resources:
      - group: ""
        resources:
          - events
  # node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
  - level: Request
    omitStages:
      - RequestReceived
    resources:
      - group: ""
        resources:
          - nodes/status
          - pods/status
    users:
      - kubelet
      - 'system:node-problem-detector'
      - 'system:serviceaccount:kube-system:node-problem-detector'
    verbs:
      - update
      - patch
  - level: Request
    omitStages:
      - RequestReceived
    resources:
      - group: ""
        resources:
          - nodes/status
          - pods/status
    userGroups:
      - 'system:nodes'
    verbs:
      - update
      - patch
  # deletecollection calls can be large, don't log responses for expected namespace deletions
  - level: Request
    omitStages:
      - RequestReceived
    users:
      - 'system:serviceaccount:kube-system:namespace-controller'
    verbs:
      - deletecollection
  # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
  # so only log at the Metadata level.
  - level: Metadata
    omitStages:
      - RequestReceived
    resources:
      - group: ""
        resources:
          - secrets
          - configmaps
      - group: authentication.k8s.io
        resources:
          - tokenreviews
  # Get responses can be large; skip them.
  - level: Request
    omitStages:
      - RequestReceived
    resources:
      - group: ""
      - group: admissionregistration.k8s.io
      - group: apiextensions.k8s.io
      - group: apiregistration.k8s.io
      - group: apps
      - group: authentication.k8s.io
      - group: authorization.k8s.io
      - group: autoscaling
      - group: batch
      - group: certificates.k8s.io
      - group: extensions
      - group: metrics.k8s.io
      - group: networking.k8s.io
      - group: policy
      - group: rbac.authorization.k8s.io
      - group: scheduling.k8s.io
      - group: settings.k8s.io
      - group: storage.k8s.io
    verbs:
      - get
      - list
      - watch
  # Default level for known APIs
  - level: RequestResponse
    omitStages:
      - RequestReceived
    resources:
      - group: ""
      - group: admissionregistration.k8s.io
      - group: apiextensions.k8s.io
      - group: apiregistration.k8s.io
      - group: apps
      - group: authentication.k8s.io
      - group: authorization.k8s.io
      - group: autoscaling
      - group: batch
      - group: certificates.k8s.io
      - group: extensions
      - group: metrics.k8s.io
      - group: networking.k8s.io
      - group: policy
      - group: rbac.authorization.k8s.io
      - group: scheduling.k8s.io
      - group: settings.k8s.io
      - group: storage.k8s.io
  # Default level for all other requests.
  - level: Metadata
    omitStages:
      - RequestReceived
EOF
```

Distribute the audit policy file:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/audit-policy.yaml root@${node_ip}:/app/etc
  ssh root@${node_ip} "ls -alh /app/etc/audit-policy.yaml"
done
```

Create the certificate signing request:
```shell
cat > proxy-client-csr.json <<EOF
{
  "CN": "aggregator",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```

The CN must appear in kube-apiserver's --requestheader-allowed-names parameter, otherwise later access to metrics fails with a permissions error.
Generate the certificate and private key:
```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client
```

Keep the pem files; delete or move the others:
```shell
rm -rf proxy-client.csr proxy-client-csr.json
ls -alh proxy-client*
```

Copy the generated certificate and private key to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/proxy-client*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/proxy-client*.pem"
done
```

Upload ca.pem, etcd.pem, and etcd-key.pem to /app/cert on the master-01 node, then distribute them:

```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/etcd*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/etcd*.pem"
done
```
```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /app/logs/kube-apiserver"
done

cat > /app/etc/kube-apiserver <<EOF
KUBE_APISERVER_OPTS="--advertise-address=192.168.209.101 \\
  --default-not-ready-toleration-seconds=360 \\
  --default-unreachable-toleration-seconds=360 \\
  --feature-gates=DynamicAuditing=true \\
  --max-mutating-requests-inflight=2000 \\
  --max-requests-inflight=4000 \\
  --default-watch-cache-size=200 \\
  --delete-collection-workers=2 \\
  --encryption-provider-config=/app/etc/encryption-config.yaml \\
  --etcd-cafile=/app/cert/ca.pem \\
  --etcd-certfile=/app/cert/etcd.pem \\
  --etcd-keyfile=/app/cert/etcd-key.pem \\
  --etcd-servers=https://192.168.209.111:2379,https://192.168.209.112:2379,https://192.168.209.113:2379 \\
  --bind-address=192.168.209.101 \\
  --secure-port=6443 \\
  --tls-cert-file=/app/cert/k8s.pem \\
  --tls-private-key-file=/app/cert/k8s-key.pem \\
  --insecure-port=0 \\
  --audit-dynamic-configuration \\
  --audit-log-maxage=15 \\
  --audit-log-maxbackup=3 \\
  --audit-log-maxsize=100 \\
  --audit-log-truncate-enabled \\
  --audit-log-path=/app/logs/kube-apiserver/audit.log \\
  --audit-policy-file=/app/etc/audit-policy.yaml \\
  --client-ca-file=/app/cert/ca.pem \\
  --enable-bootstrap-token-auth \\
  --requestheader-allowed-names=aggregator \\
  --requestheader-client-ca-file=/app/cert/ca.pem \\
  --requestheader-extra-headers-prefix=X-Remote-Extra- \\
  --requestheader-group-headers=X-Remote-Group \\
  --requestheader-username-headers=X-Remote-User \\
  --service-account-key-file=/app/cert/ca.pem \\
  --authorization-mode=Node,RBAC \\
  --anonymous-auth=false \\
  --runtime-config=api/all=true \\
  --enable-admission-plugins=NodeRestriction \\
  --allow-privileged=true \\
  --apiserver-count=3 \\
  --event-ttl=168h \\
  --kubelet-certificate-authority=/app/cert/ca.pem \\
  --kubelet-client-certificate=/app/cert/k8s.pem \\
  --kubelet-client-key=/app/cert/k8s-key.pem \\
  --kubelet-https=true \\
  --kubelet-timeout=20s \\
  --proxy-client-cert-file=/app/cert/proxy-client.pem \\
  --proxy-client-key-file=/app/cert/proxy-client-key.pem \\
  --service-cluster-ip-range=10.254.0.0/16 \\
  --enable-aggregator-routing=true \\
  --log-dir=/app/logs/kube-apiserver \\
  --logtostderr=false \\
  --v=2"
EOF
```

- --advertise-address: the IP the apiserver advertises to the cluster (the backend IP for the kubernetes service);
--default-*-toleration-seconds: thresholds for tolerating node problems;
--max-*-requests-inflight: maximum numbers of in-flight requests;
--etcd-*: the certificates for accessing etcd and the etcd server addresses;
--encryption-provider-config: the configuration used to encrypt secrets stored in etcd;
--bind-address: the IP https listens on; it must not be 127.0.0.1, otherwise the secure port 6443 cannot be reached from outside;
--secure-port: the https listening port;
--insecure-port=0: disables the insecure http port (8080);
--tls-*-file: the certificate, private key and CA file used by the apiserver;
--audit-*: audit policy and audit log parameters;
--client-ca-file: verifies the certificates presented by clients (kube-controller-manager, kube-scheduler, kubelet, kube-proxy, etc.);
--enable-bootstrap-token-auth: enables token authentication for kubelet bootstrap;
--requestheader-*: parameters for the kube-apiserver aggregation layer, required by proxy-client and HPA;
--requestheader-client-ca-file: the CA used to sign the certificate specified by --proxy-client-cert-file and --proxy-client-key-file; used when the metrics aggregator is enabled;
--requestheader-allowed-names: must not be empty; a comma-separated list of CN names of the --proxy-client-cert-file certificate, set here to "aggregator";
--service-account-key-file: the public key file used to verify ServiceAccount tokens; it pairs with the private key file given to kube-controller-manager via --service-account-private-key-file;
--runtime-config=api/all=true: enables all API versions, e.g. autoscaling/v2alpha1;
--authorization-mode=Node,RBAC and --anonymous-auth=false: enable the Node and RBAC authorization modes and reject unauthenticated requests;
--enable-admission-plugins: enables admission plugins that are off by default;
--allow-privileged: allows running privileged containers;
--apiserver-count=3: the number of apiserver instances;
--event-ttl: how long events are retained;
--kubelet-*: if set, the apiserver accesses the kubelet APIs over https; an RBAC rule must be defined for the user of the corresponding certificate (the kubernetes*.pem certificate above uses the user kubernetes), otherwise kubelet API access is denied as unauthorized;
--proxy-client-*: the certificate the apiserver uses to access metrics-server;
--service-cluster-ip-range: the Service Cluster IP range;
--service-node-port-range: the NodePort port range;
If the kube-apiserver machine does not run kube-proxy, the --enable-aggregator-routing=true parameter is also required;
For the --requestheader-* parameters, refer to:
Note: if --requestheader-allowed-names is non-empty and the CN of the --proxy-client-cert-file certificate is not in allowed-names, later queries of node or pod metrics fail with:

```
[root@zhangjun-k8s01 1.8+]# kubectl top nodes
Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "aggregator" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope
```

Create the systemd unit file:

```shell
cat >/usr/lib/systemd/system/kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
EnvironmentFile=/app/etc/kube-apiserver
ExecStart=/app/bin/kube-apiserver \$KUBE_APISERVER_OPTS
Restart=on-failure
RestartSec=10
Type=notify

[Install]
WantedBy=multi-user.target
EOF
```

Distribute the configuration and unit file, rewriting the bind IP per node:

```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/kube-apiserver root@${node_ip}:/app/etc
  scp /usr/lib/systemd/system/kube-apiserver.service root@${node_ip}:/usr/lib/systemd/system
  ssh root@k8s-master-02 "sed -i 's#192.168.209.101#192.168.209.102#g' /app/etc/kube-apiserver"
  ssh root@k8s-master-03 "sed -i 's#192.168.209.101#192.168.209.103#g' /app/etc/kube-apiserver"
done
```

Start kube-apiserver on all master nodes:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
done
```

Check the service status and listening port:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl status kube-apiserver | grep Active && ps -ef | grep kube-apiserver && netstat -tnulp | grep kube-api"
done
tcp  0  0 192.168.209.101:6443  0.0.0.0:*  LISTEN  12644/kube-apiserve

more /app/logs/kube-apiserver/kube-apiserver.INFO
```

Port 6443 is the secure https port; all requests on it are authenticated and authorized.
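The pairing between the apiserver's --service-account-key-file and kube-controller-manager's --service-account-private-key-file can be illustrated with a throwaway RSA keypair. This is only a sketch of the relationship (the manual itself reuses ca.pem / ca-key.pem); the file names sa.key and sa.pub are invented for the demonstration:

```shell
# Generate a throwaway private key and derive its matching public key.
# (Illustrative only; not part of the deployment steps.)
workdir=$(mktemp -d)
openssl genrsa -out "${workdir}/sa.key" 2048 2>/dev/null
openssl rsa -in "${workdir}/sa.key" -pubout -out "${workdir}/sa.pub" 2>/dev/null

# The pair is valid when the moduli match: tokens signed with sa.key
# (controller-manager side) verify against sa.pub (apiserver side).
priv_mod=$(openssl rsa -in "${workdir}/sa.key" -noout -modulus)
pub_mod=$(openssl rsa -pubin -in "${workdir}/sa.pub" -noout -modulus)
[ "${priv_mod}" = "${pub_mod}" ] && echo "service-account keypair matches"
```

If the two files do not pair up, ServiceAccount tokens issued by kube-controller-manager fail verification at the apiserver.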
Check the data written to etcd by kube-apiserver:

```shell
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.209.111:2379,https://192.168.209.112:2379,https://192.168.209.113:2379 \
  --cacert=/app/cert/ca.pem \
  --cert=/app/cert/etcd.pem \
  --key=/app/cert/etcd-key.pem \
  get /registry/ --prefix --keys-only
```

Check cluster information:

```
$ kubectl cluster-info
Kubernetes master is running at https://192.168.209.100:8443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

$ kubectl get all --all-namespaces
NAMESPACE   NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   30h

$ kubectl get componentstatuses
NAME                 AGE
controller-manager   <unknown>
scheduler            <unknown>
etcd-1               <unknown>
etcd-2               <unknown>
etcd-0               <unknown>
```

If kubectl prints the following error, the ~/.kube/config file in use is wrong; switch to the correct account and run the command again:

```
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
When running kubectl get componentstatuses, the apiserver sends the health checks to 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode they may not be on the same machine as kube-apiserver; in that case their status shows Unhealthy even though they are actually working normally.
When running kubectl exec, run, logs and similar commands, the apiserver forwards the request to the kubelet's https port. Define an RBAC rule here granting the user of the apiserver client certificate (kubernetes.pem, CN: kubernetes) access to the kubelet API:

```
$ kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
clusterrolebinding.rbac.authorization.k8s.io/kube-apiserver:kubelet-apis created
```
This cluster contains 3 nodes. After startup, a leader node is elected through competition; the other nodes block. When the leader becomes unavailable, the blocked nodes hold a new election and produce a new leader, which keeps the service available.

For secure communication, this document first generates an x509 certificate and private key. kube-controller-manager uses this certificate in the following two situations:

Create the certificate signing request:
```shell
cd /app/cert
cat > /app/cert/kube-controller-manager-csr.json <<EOF
{
  "CN": "system:kube-controller-manager",
  "key": { "algo": "rsa", "size": 2048 },
  "hosts": [
    "127.0.0.1",
    "192.168.209.101",
    "192.168.209.102",
    "192.168.209.103"
  ],
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "system:kube-controller-manager",
      "OU": "System"
    }
  ]
}
EOF
```

The hosts list contains the IPs of all kube-controller-manager nodes.

Both CN and O are system:kube-controller-manager; the kubernetes built-in ClusterRoleBinding system:kube-controller-manager grants kube-controller-manager the permissions it needs.
Generate the certificate and private key:

```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager
ls -alh kube-controller-manager*.pem
rm -rf kube-controller-manager.csr kube-controller-manager-csr.json
```

Distribute the generated certificate and private key to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/kube-controller-manager*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/kube-controller-manager*.pem"
done
```

kube-controller-manager accesses the apiserver with a kubeconfig file, which carries the apiserver address, the embedded CA certificate and the kube-controller-manager certificate:
```shell
cd /app/etc
kubectl config set-cluster kubernetes \
  --certificate-authority=/app/cert/ca.pem \
  --embed-certs=true \
  --server=https://192.168.209.100:8443 \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager \
  --client-certificate=/app/cert/kube-controller-manager.pem \
  --client-key=/app/cert/kube-controller-manager-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context system:kube-controller-manager \
  --cluster=kubernetes \
  --user=system:kube-controller-manager \
  --kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
ls -alh kube-controller-manager.kubeconfig
```

Distribute the kubeconfig to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/kube-controller-manager.kubeconfig root@${node_ip}:/app/etc
  ssh root@${node_ip} "ls -alh /app/etc/kube-controller-manager.kubeconfig"
done
```
Create the log directory on all master nodes and write the options file:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /app/logs/kube-controller-manager"
done

cat >/app/etc/kube-controller-manager <<EOF
KUBE_CONTROLLER_MANAGER_OPTS="--cluster-name=kubernetes \\
  --controllers=*,bootstrapsigner,tokencleaner \\
  --kube-api-qps=1000 \\
  --kube-api-burst=2000 \\
  --leader-elect \\
  --use-service-account-credentials=true \\
  --concurrent-service-syncs=2 \\
  --bind-address=192.168.209.101 \\
  --secure-port=10257 \\
  --port=0 \\
  --tls-cert-file=/app/cert/kube-controller-manager.pem \\
  --tls-private-key-file=/app/cert/kube-controller-manager-key.pem \\
  --client-ca-file=/app/cert/ca.pem \\
  --requestheader-allowed-names= \\
  --requestheader-client-ca-file=/app/cert/ca.pem \\
  --requestheader-extra-headers-prefix=X-Remote-Extra- \\
  --requestheader-group-headers=X-Remote-Group \\
  --requestheader-username-headers=X-Remote-User \\
  --authorization-kubeconfig=/app/etc/kube-controller-manager.kubeconfig \\
  --cluster-signing-cert-file=/app/cert/ca.pem \\
  --cluster-signing-key-file=/app/cert/ca-key.pem \\
  --experimental-cluster-signing-duration=876000h \\
  --horizontal-pod-autoscaler-sync-period=10s \\
  --concurrent-deployment-syncs=10 \\
  --concurrent-gc-syncs=30 \\
  --node-cidr-mask-size=24 \\
  --service-cluster-ip-range=10.254.0.0/16 \\
  --pod-eviction-timeout=6m \\
  --terminated-pod-gc-threshold=10000 \\
  --root-ca-file=/app/cert/ca.pem \\
  --service-account-private-key-file=/app/cert/ca-key.pem \\
  --kubeconfig=/app/etc/kube-controller-manager.kubeconfig \\
  --log-dir=/app/logs/kube-controller-manager \\
  --logtostderr=false \\
  --v=2 "
EOF
```

--port=0: disables the insecure http port; --address then has no effect and --bind-address takes effect;
--secure-port=10257, --bind-address: the port and address on which https /metrics requests are served;
--kubeconfig: path to the kubeconfig file kube-controller-manager uses to connect to and authenticate against kube-apiserver;
--authentication-kubeconfig and --authorization-kubeconfig: used by kube-controller-manager to connect to the apiserver and to authenticate and authorize client requests. kube-controller-manager no longer uses --tls-ca-file to verify the client certificates of https metrics requests. If these two kubeconfig parameters are not configured, client connections to the kube-controller-manager https port are rejected (with an insufficient-permissions error).
--cluster-signing-*-file: sign the certificates created by TLS Bootstrap;
--experimental-cluster-signing-duration: the validity period of TLS Bootstrap certificates;
--root-ca-file: the CA certificate placed into container ServiceAccounts, used to verify the kube-apiserver certificate;
--service-account-private-key-file: the private key used to sign ServiceAccount tokens; it must pair with the public key file given to kube-apiserver via --service-account-key-file;
--service-cluster-ip-range: the Service Cluster IP range; must match the same parameter on kube-apiserver;
--leader-elect=true: cluster mode with leader election; the elected leader does the work while the other nodes block;
--controllers=*,bootstrapsigner,tokencleaner: the list of enabled controllers; tokencleaner automatically cleans up expired Bootstrap tokens;
--horizontal-pod-autoscaler-*: custom-metrics parameters, supporting autoscaling/v2alpha1;
--tls-cert-file, --tls-private-key-file: the server certificate and key used when serving metrics over https;
--use-service-account-credentials=true: each controller inside kube-controller-manager accesses kube-apiserver with its own serviceaccount;
Create the systemd unit file:

```shell
cat >/usr/lib/systemd/system/kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
EnvironmentFile=/app/etc/kube-controller-manager
ExecStart=/app/bin/kube-controller-manager \$KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure
RestartSec=5
Type=notify

[Install]
WantedBy=multi-user.target
EOF
```

Distribute the configuration and unit file, rewriting the bind IP per node:

```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/kube-controller-manager root@${node_ip}:/app/etc
  scp /usr/lib/systemd/system/kube-controller-manager.service root@${node_ip}:/usr/lib/systemd/system
  ssh root@k8s-master-02 "sed -i 's#192.168.209.101#192.168.209.102#g' /app/etc/kube-controller-manager"
  ssh root@k8s-master-03 "sed -i 's#192.168.209.101#192.168.209.103#g' /app/etc/kube-controller-manager"
done
```

Start kube-controller-manager on all master nodes:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
done
```

Check the service status and listening port:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl status kube-controller-manager | grep Active && ps -ef | grep kube-controller-manager && netstat -tnulp | grep kube-con"
done
tcp  0  0 192.168.209.101:10257  0.0.0.0:*  LISTEN  13058/kube-controll

more /app/logs/kube-controller-manager/kube-controller-manager.INFO
```

kube-controller-manager listens on port 10257 for https requests.

Check the current leader:

```
kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"k8s-master-01_cf286549-aea4-4b6c-bf49-77508454458e","leaseDurationSeconds":15,"acquireTime":"2019-11-13T03:56:48Z","renewTime":"2019-11-13T03:57:04Z","leaderTransitions":1019}'
  creationTimestamp: "2019-11-12T09:18:14Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "78079"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
  uid: 816d3350-3a9b-4432-84e3-4034df954564
```

As shown, the current leader is the k8s-master-01 node.
Test kube-controller-manager cluster high availability.

Stop the service on the k8s-master-01 node:

```shell
systemctl stop kube-controller-manager
```

Check the current leader:
```
kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"k8s-master-02_080e70f8-a6cc-4c4b-9290-abecb65fdf30","leaseDurationSeconds":15,"acquireTime":"2019-11-13T03:58:28Z","renewTime":"2019-11-13T03:58:31Z","leaderTransitions":1021}'
  creationTimestamp: "2019-11-12T09:18:14Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "78136"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
  uid: 816d3350-3a9b-4432-84e3-4034df954564
```

As shown, the current leader is now the k8s-master-02 node.
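The leader can also be extracted from the annotation programmatically. A small sketch using sed on the annotation JSON format shown above (the holderIdentity UUID is shortened here for readability):

```shell
# Annotation value as stored on the kube-controller-manager endpoint;
# holderIdentity has the form "<hostname>_<uuid>".
annotation='{"holderIdentity":"k8s-master-02_080e70f8","leaseDurationSeconds":15,"leaderTransitions":1021}'

# Keep only the hostname part of holderIdentity (everything before "_").
leader=$(printf '%s' "$annotation" | sed -n 's/.*"holderIdentity":"\([^_"]*\)_.*/\1/p')
echo "current leader: ${leader}"   # current leader: k8s-master-02
```

This is handy in monitoring scripts that need to know which master currently holds the lease.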
The ClusterRole system:kube-controller-manager carries very few permissions: it can only create secrets, serviceaccounts and similar resource objects. The permissions of the individual controllers are spread across the ClusterRoles system:controller:XXX:
```
kubectl describe clusterrole system:kube-controller-manager
Name:         system:kube-controller-manager
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                  Non-Resource URLs  Resource Names  Verbs
  ---------                                  -----------------  --------------  -----
  secrets                                    []                 []              [create delete get update]
  endpoints                                  []                 []              [create get update]
  serviceaccounts                            []                 []              [create get update]
  events                                     []                 []              [create patch update]
  events.events.k8s.io                       []                 []              [create patch update]
  serviceaccounts/token                      []                 []              [create]
  tokenreviews.authentication.k8s.io         []                 []              [create]
  subjectaccessreviews.authorization.k8s.io  []                 []              [create]
  configmaps                                 []                 []              [get]
  namespaces                                 []                 []              [get]
  *.*                                        []                 []              [list watch]
```

The --use-service-account-credentials=true parameter must be added to the kube-controller-manager startup options so that the main controller creates a ServiceAccount XXX-controller for each controller.

The built-in ClusterRoleBinding system:controller:XXX then grants each XXX-controller ServiceAccount the permissions of the corresponding ClusterRole system:controller:XXX.
```
kubectl get clusterrole | grep controller
system:controller:attachdetach-controller                              30h
system:controller:certificate-controller                               30h
system:controller:clusterrole-aggregation-controller                   30h
system:controller:cronjob-controller                                  30h
system:controller:daemon-set-controller                                30h
system:controller:deployment-controller                                30h
system:controller:disruption-controller                                30h
system:controller:endpoint-controller                                  30h
system:controller:expand-controller                                    30h
system:controller:generic-garbage-collector                            30h
system:controller:horizontal-pod-autoscaler                            30h
system:controller:job-controller                                       30h
system:controller:namespace-controller                                 30h
system:controller:node-controller                                      30h
system:controller:persistent-volume-binder                             30h
system:controller:pod-garbage-collector                                30h
system:controller:pv-protection-controller                             30h
system:controller:pvc-protection-controller                            30h
system:controller:replicaset-controller                                30h
system:controller:replication-controller                               30h
system:controller:resourcequota-controller                             30h
system:controller:route-controller                                     30h
system:controller:service-account-controller                           30h
system:controller:service-controller                                   30h
system:controller:statefulset-controller                               30h
system:controller:ttl-controller                                       30h
system:kube-controller-manager                                         30h
```

Take the deployment controller as an example:
```
kubectl describe clusterrole system:controller:deployment-controller
Name:         system:controller:deployment-controller
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                          Non-Resource URLs  Resource Names  Verbs
  ---------                          -----------------  --------------  -----
  replicasets.apps                   []                 []              [create delete get list patch update watch]
  replicasets.extensions             []                 []              [create delete get list patch update watch]
  events                             []                 []              [create patch update]
  events.events.k8s.io               []                 []              [create patch update]
  pods                               []                 []              [get list update watch]
  deployments.apps                   []                 []              [get list update watch]
  deployments.extensions             []                 []              [get list update watch]
  deployments.apps/finalizers        []                 []              [update]
  deployments.apps/status            []                 []              [update]
  deployments.extensions/finalizers  []                 []              [update]
  deployments.extensions/status      []                 []              [update]
```

Run the following command on a kube-controller-manager node.
```
curl -s --cacert /app/cert/ca.pem --cert /app/cert/admin.pem --key /app/cert/admin-key.pem https://192.168.209.101:10257/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```
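To pull a single value out of the Prometheus text format returned by /metrics, a small awk filter is enough. The sketch below reuses two lines of the sample output above instead of querying a live endpoint:

```shell
# Sample of the Prometheus text exposition format served on /metrics.
metrics='# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0'

# Print the value of one metric; comment lines start with "#" and are skipped
# because their first field never equals the metric name.
value=$(printf '%s\n' "$metrics" | awk '$1=="apiserver_audit_event_total"{print $2}')
echo "audit events: ${value}"   # audit events: 0
```

In practice the `metrics` variable would be replaced by the `curl ... /metrics` pipeline shown above.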
This cluster contains 3 nodes. After startup, a leader node is elected through competition; the other nodes block. When the leader becomes unavailable, the remaining nodes hold a new election and produce a new leader, which keeps the service available.

For secure communication, this document first generates an x509 certificate and private key. kube-scheduler uses this certificate in the following two situations:

Create the certificate signing request:
```shell
cd /app/cert
cat > /app/cert/kube-scheduler-csr.json <<EOF
{
  "CN": "system:kube-scheduler",
  "hosts": [
    "127.0.0.1",
    "192.168.209.101",
    "192.168.209.102",
    "192.168.209.103"
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "system:kube-scheduler",
      "OU": "System"
    }
  ]
}
EOF
```

The hosts list contains the IPs of all kube-scheduler nodes.

Both CN and O are system:kube-scheduler; the kubernetes built-in ClusterRoleBinding system:kube-scheduler grants kube-scheduler the permissions it needs.

Generate the certificate and private key:
```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler
rm -rf kube-scheduler.csr kube-scheduler-csr.json
ls -alh kube-scheduler*pem
```

Distribute the generated certificate and private key to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/kube-scheduler*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/kube-scheduler*.pem"
done
```

kube-scheduler accesses the apiserver with a kubeconfig file, which carries the apiserver address, the embedded CA certificate and the kube-scheduler certificate:
```shell
cd /app/etc
kubectl config set-cluster kubernetes \
  --certificate-authority=/app/cert/ca.pem \
  --embed-certs=true \
  --server=https://192.168.209.100:8443 \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler \
  --client-certificate=/app/cert/kube-scheduler.pem \
  --client-key=/app/cert/kube-scheduler-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context system:kube-scheduler \
  --cluster=kubernetes \
  --user=system:kube-scheduler \
  --kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
ls -alh kube-scheduler.kubeconfig
```

Distribute the kubeconfig to all master nodes:
```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/kube-scheduler.kubeconfig root@${node_ip}:/app/etc
  ssh root@${node_ip} "ls -alh /app/etc/kube-scheduler.kubeconfig"
done
```

Create the log directory on all master nodes and write the configuration files:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /app/logs/kube-scheduler"
done

cat >/app/etc/kube-scheduler.yaml <<EOF
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
bindTimeoutSeconds: 600
clientConnection:
  burst: 2000
  kubeconfig: "/app/etc/kube-scheduler.kubeconfig"
  qps: 1000
enableContentionProfiling: false
enableProfiling: true
hardPodAffinitySymmetricWeight: 1
healthzBindAddress: 127.0.0.1:10251
leaderElection:
  leaderElect: true
metricsBindAddress: 127.0.0.1:10251
EOF

cat >/app/etc/kube-scheduler <<EOF
KUBE_SCHEDULER_OPTS="--config=/app/etc/kube-scheduler.yaml \\
  --bind-address=192.168.209.101 \\
  --secure-port=10259 \\
  --tls-cert-file=/app/cert/kube-scheduler.pem \\
  --tls-private-key-file=/app/cert/kube-scheduler-key.pem \\
  --authentication-kubeconfig=/app/etc/kube-scheduler.kubeconfig \\
  --client-ca-file=/app/cert/ca.pem \\
  --requestheader-allowed-names= \\
  --requestheader-client-ca-file=/app/cert/ca.pem \\
  --requestheader-extra-headers-prefix=X-Remote-Extra- \\
  --requestheader-group-headers=X-Remote-Group \\
  --requestheader-username-headers=X-Remote-User \\
  --authorization-kubeconfig=/app/etc/kube-scheduler.kubeconfig \\
  --log-dir=/app/logs/kube-scheduler \\
  --logtostderr=false \\
  --v=2 "
EOF
```

--kubeconfig: path to the kubeconfig file kube-scheduler uses to connect to and authenticate against kube-apiserver;
--leader-elect=true: cluster mode with leader election; the elected leader does the work while the other nodes block;
Note: run the following commands on a kube-scheduler node. kube-scheduler listens on ports 10251 and 10259; both serve /metrics and /healthz.

```
netstat -ntulp | grep kube-sc
tcp  0  0 127.0.0.1:10251        0.0.0.0:*  LISTEN  12179/kube-schedule
tcp  0  0 192.168.209.101:10259  0.0.0.0:*  LISTEN  12179/kube-schedule
```

Note: many installation guides close the insecure port and move the secure port to the insecure port's default number. kubectl get cs then reports the error shown below, because the apiserver sends these health checks to 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode, they may not be on the same machine as kube-apiserver and are reached over https, so their status shows Unhealthy even though they are working normally and the cluster is in fact healthy:

```
$ kubectl get componentstatuses
NAME                 AGE
controller-manager   <unknown>
scheduler            <unknown>
etcd-1               <unknown>
etcd-0               <unknown>
etcd-2               <unknown>
```

The normal output should be:

```
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-2               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
```
Create the systemd unit file:

```shell
cat >/usr/lib/systemd/system/kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target

[Service]
EnvironmentFile=/app/etc/kube-scheduler
ExecStart=/app/bin/kube-scheduler \$KUBE_SCHEDULER_OPTS
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
EOF
```

Distribute the configuration and unit file, rewriting the bind IP per node:

```shell
for node_ip in k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  scp /app/etc/kube-scheduler.yaml root@${node_ip}:/app/etc
  scp /app/etc/kube-scheduler root@${node_ip}:/app/etc
  scp /usr/lib/systemd/system/kube-scheduler.service root@${node_ip}:/usr/lib/systemd/system
  ssh root@k8s-master-02 "sed -i 's#192.168.209.101#192.168.209.102#g' /app/etc/kube-scheduler"
  ssh root@k8s-master-03 "sed -i 's#192.168.209.101#192.168.209.103#g' /app/etc/kube-scheduler"
done
```

Start kube-scheduler on all master nodes:

```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
done
```

kube-scheduler listens on port 10259 for https requests.
```shell
for node_ip in k8s-master-01 k8s-master-02 k8s-master-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "systemctl status kube-scheduler | grep Active && ps -ef | grep kube-scheduler && netstat -tnulp | grep kube-sc && lsof -i:10251 && lsof -i:10259"
done
tcp  0  0 127.0.0.1:10251        0.0.0.0:*  LISTEN  12179/kube-schedule
tcp  0  0 192.168.209.101:10259  0.0.0.0:*  LISTEN  12179/kube-schedule

more /app/logs/kube-scheduler/kube-scheduler.INFO
```

Check the current leader:

```
kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"k8s-master-03_759c7506-e35d-4f0d-9fa2-985828251acf","leaseDurationSeconds":15,"acquireTime":"2019-11-13T08:25:24Z","renewTime":"2019-11-13T08:32:49Z","leaderTransitions":4}'
  creationTimestamp: "2019-11-13T07:11:07Z"
  name: kube-scheduler
  namespace: kube-system
  resourceVersion: "91650"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
  uid: e133b1fd-1d25-4507-af16-e7435989b7af
```

As shown, the current leader is the k8s-master-03 node.
Test kube-scheduler cluster high availability.

Stop the service on the k8s-master-03 node:

```shell
systemctl stop kube-scheduler
```

Check the current leader:
```
kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"k8s-master-01_4049ef97-ae3c-47df-8c62-97135cc5fea6","leaseDurationSeconds":15,"acquireTime":"2019-11-13T08:35:06Z","renewTime":"2019-11-13T08:35:08Z","leaderTransitions":5}'
  creationTimestamp: "2019-11-13T07:11:07Z"
  name: kube-scheduler
  namespace: kube-system
  resourceVersion: "91809"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
  uid: e133b1fd-1d25-4507-af16-e7435989b7af
```

As shown, the current leader is now the k8s-master-01 node.
Run the following commands on a kube-scheduler node:
Query the insecure port:

```
curl -s http://127.0.0.1:10251/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```

Query the secure port:

```
curl -s --cacert /app/cert/ca.pem --cert /app/cert/admin.pem --key /app/cert/admin-key.pem https://192.168.209.101:10259/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```
The kubernetes worker nodes run the following components:
Set the hostname and hosts entries on k8s-node-01:

```shell
hostnamectl --static set-hostname k8s-node-01

cat >> /etc/hosts <<EOF
192.168.209.121 k8s-node-01
192.168.209.122 k8s-node-02
192.168.209.123 k8s-node-03
EOF
```

Generate an SSH key and distribute it to all node machines:

```shell
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
do
  expect -c "
    spawn ssh-copy-id -i /root/.ssh/id_rsa.pub root@${node_ip}
    expect {
      \"*yes/no*\" {send \"yes\r\"; exp_continue}
      \"*password*\" {send \"hello\r\"; exp_continue}
      \"*Password*\" {send \"hello\r\";}
    }
  "
done
```

Set the hostname and hosts entries on the other nodes:

```shell
ssh k8s-node-02 "hostnamectl --static set-hostname k8s-node-02" &&
ssh k8s-node-03 "hostnamectl --static set-hostname k8s-node-03"

ssh k8s-node-02 "cat >> /etc/hosts <<EOF
192.168.209.121 k8s-node-01
192.168.209.122 k8s-node-02
192.168.209.123 k8s-node-03
EOF" &&
ssh k8s-node-03 "cat >> /etc/hosts <<EOF
192.168.209.121 k8s-node-01
192.168.209.122 k8s-node-02
192.168.209.123 k8s-node-03
EOF"
```
kubernetes requires that all cluster nodes (including the masters) can reach each other over the Pod network. flannel uses vxlan to build an interconnected Pod network across nodes, using UDP port 8472 (this port must be open, e.g. on public clouds such as AWS).

On first start, flanneld fetches the configured Pod network from etcd, allocates an unused subnet for the local node, and creates the flannel.1 network interface (the name may differ, e.g. flannel1).

flannel writes the subnet allocated to the node into the /run/flannel/docker file. docker later uses the environment variables in this file to configure the docker0 bridge, so all Pod containers on the node receive IPs from that subnet.

Download the latest release from the flannel release page:
```shell
cd /app/opt
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
tar -xzf flannel-v0.11.0-linux-amd64.tar.gz
cp flanneld mk-docker-opts.sh /app/bin
```

Distribute the binaries to all cluster nodes:
```shell
for node_ip in k8s-node-02 k8s-node-03
do
  echo ">>> ${node_ip}"
  scp /app/opt/{flanneld,mk-docker-opts.sh} root@${node_ip}:/app/bin
  ssh root@${node_ip} "ls -alh /app/bin/{flanneld,mk-docker-opts.sh} && flanneld -version"
done
```

flanneld reads and writes subnet allocation data in the etcd cluster, which enforces mutual x509 authentication, so a certificate and private key must be generated for flanneld.
Create the certificate signing request:

```shell
cd /app/cert
cat > /app/cert/flanneld-csr.json <<EOF
{
  "CN": "flanneld",
  "hosts": [],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```

The certificate is only used by flanneld as a client certificate, so the hosts field is empty.
Generate the certificate and private key:

```shell
cfssl gencert -ca=/app/cert/ca.pem \
  -ca-key=/app/cert/ca-key.pem \
  -config=/app/cert/ca-config.json \
  -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
rm -rf flanneld.csr flanneld-csr.json
ls -alh flanneld*pem
```

Distribute the generated certificate and private key to all nodes (master and worker):
```shell
for node_ip in k8s-node-02 k8s-node-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/flanneld*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/flanneld*.pem"
done
```

flannel accesses etcd over HTTPS, so these certificates are required.
Upload etcd-ca.pem, etcd-key.pem and etcd.pem to the /app/cert directory on the node-01 node, then distribute them:

```shell
for node_ip in k8s-node-02 k8s-node-03
do
  echo ">>> ${node_ip}"
  scp /app/cert/etcd*.pem root@${node_ip}:/app/cert
  ssh root@${node_ip} "ls -alh /app/cert/etcd*.pem"
done
```

Note: the following step only needs to be executed once, on the etcd-01 node.
```shell
export etcd_endpoints=https://192.168.209.111:2379,https://192.168.209.112:2379,https://192.168.209.113:2379
export flannel_etcd_prefix=/kubernetes/network
export cluster_cidr=172.30.0.0/16

ETCDCTL_API=2 etcdctl \
  --endpoints=${etcd_endpoints} \
  --ca-file=/app/cert/ca.pem \
  --cert-file=/app/cert/etcd.pem \
  --key-file=/app/cert/etcd-key.pem \
  mk ${flannel_etcd_prefix}/config '{"Network":"'${cluster_cidr}'", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'
```
- The current flanneld version (v0.11.0) does not support etcd v3, so the configuration key and subnet data are written with the etcd v2 API;
- The prefix length of the Pod network ${CLUSTER_CIDR} written here (e.g. /16) must be smaller than SubnetLen, and the value must match kube-controller-manager's --cluster-cidr parameter;
- Network: the CIDR-format network address (172.30.0.0/16) flannel uses to provide Pod networking;
- SubnetLen: the size of the subnet allocated to each host; it can be set at initialization, otherwise the default of 24 (a 24-bit netmask) is used;
- SubnetMin: the lowest allocatable subnet in the cluster network; it can be set manually, otherwise it defaults to the first allocatable subnet. For "10.1.0.0/16" with SubnetLen 24, the first allocatable subnet is "10.1.1.0/24";
- SubnetMax: the highest allocatable subnet; for "10.1.0.0/16" with SubnetLen 24, SubnetMax is "10.1.255.0/24";
- Backend.Type: the backend flannel uses; one of vxlan, host-gw and udp, defaulting to vxlan;
- Note: with the vxlan backend, the MAC addresses of the vtep devices are stored in etcd.
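The SubnetMin/SubnetMax defaults for the "10.1.0.0/16" example above can be checked with a little shell arithmetic. This is only a sketch: the hard-coded third octets (.1 and .255) hold for this /16-network-with-/24-subnets case, not in general:

```shell
network="10.1.0.0/16"; subnet_len=24
prefix="${network#*/}"                    # 16
base="${network%/*}"                      # 10.1.0.0
count=$(( 1 << (subnet_len - prefix) ))   # 256 possible /24 subnets

# Split the base address into octets; the first allocatable
# subnet skips x.x.0.0/24, the last is x.x.255.0/24.
IFS=. read -r o1 o2 _ _ <<EOF
${base}
EOF
subnet_min="${o1}.${o2}.1.0/${subnet_len}"
subnet_max="${o1}.${o2}.255.0/${subnet_len}"
echo "${count} subnets, min=${subnet_min}, max=${subnet_max}"
```

This mirrors the values flannel derives by default: 10.1.1.0/24 as SubnetMin and 10.1.255.0/24 as SubnetMax.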
Create the run directory on all nodes and write the flanneld options file:

```shell
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /app/run/flannel"
done

export etcd_endpoints=https://192.168.209.111:2379,https://192.168.209.112:2379,https://192.168.209.113:2379
export flannel_etcd_prefix=/kubernetes/network
export iface="ens32"

cat > /app/etc/flanneld <<EOF
FLANNEL_OPTS="-etcd-cafile=/app/cert/etcd-ca.pem \\
  -etcd-certfile=/app/cert/etcd.pem \\
  -etcd-keyfile=/app/cert/etcd-key.pem \\
  -etcd-endpoints=${etcd_endpoints} \\
  -etcd-prefix=${flannel_etcd_prefix} \\
  -iface=${iface} \\
  -ip-masq"
EOF
```
- flanneld communicates with other nodes through the interface of the system default route; on nodes with multiple interfaces (e.g. internal and public), the -iface parameter selects the interface to use;
- flanneld requires root privileges at runtime;
- -ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network and sets the --ip-masq variable passed to Docker (in the /run/flannel/docker file) to false, so Docker stops creating its own SNAT rule. Docker's --ip-masq=true rule is rather blunt: it SNATs every request from local Pods to any non-docker0 interface, so requests to Pods on other nodes appear to come from the flannel.1 interface IP and the destination Pod cannot see the real source Pod IP. flanneld's SNAT rule is gentler: it only SNATs requests that leave the Pod network.
```shell
cat > /usr/lib/systemd/system/flanneld.service <<EOF
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target network-online.target
Wants=network-online.target
Before=docker.service

[Service]
EnvironmentFile=/app/etc/flanneld
ExecStart=/app/bin/flanneld \$FLANNEL_OPTS
ExecStartPost=/app/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /app/run/flannel/docker
Restart=on-failure
RestartSec=5
Type=notify
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
EOF
```
mk-docker-opts.sh脚本将分配给 flanneld 的 Pod 子网段信息写入/app/run/flannel/docker文件,后续 docker 启动时使用这个文件中的环境变量配置 docker0 网桥。
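可以在本地模拟这一机制,理解 dockerd 是如何通过 systemd 的 EnvironmentFile= 读到这些变量的。下面文件里的 bip/mtu 取值是假设的示例值,实际内容由 mk-docker-opts.sh 根据 flanneld 分配的子网生成:

```shell
# 模拟 mk-docker-opts.sh 生成的 /app/run/flannel/docker 文件
# (bip/mtu 为假设的示例值,实际由 flanneld 分配)
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
DOCKER_NETWORK_OPTIONS=" --bip=172.30.94.1/24 --ip-masq=false --mtu=1450"
EOF
# systemd 的 EnvironmentFile= 大致等价于在启动命令前 source 该文件
. "$envfile"
echo "dockerd 将收到的网络参数:${DOCKER_NETWORK_OPTIONS}"
rm -f "$envfile"
```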
```bash
for node_ip in k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    scp /app/etc/flanneld root@${node_ip}:/app/etc
    scp /usr/lib/systemd/system/flanneld.service root@${node_ip}:/usr/lib/systemd/system
  done
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
  done
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status flanneld | grep Active && ps -ef | grep flanneld"
  done
```

检查分配给各 flanneld 的 Pod 网段信息(etcd 节点执行)。
查看集群 Pod 网段(/16):
```bash
ETCDCTL_API=2 etcdctl \
  --endpoints=${etcd_endpoints} \
  --ca-file=/app/cert/ca.pem \
  --cert-file=/app/cert/etcd.pem \
  --key-file=/app/cert/etcd-key.pem \
  get ${flannel_etcd_prefix}/config

{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
```

查看已分配的 Pod 子网段列表(/24):
```bash
ETCDCTL_API=2 etcdctl \
  --endpoints=${etcd_endpoints} \
  --ca-file=/app/cert/ca.pem \
  --cert-file=/app/cert/etcd.pem \
  --key-file=/app/cert/etcd-key.pem \
  ls ${flannel_etcd_prefix}/subnets

/kubernetes/network/subnets/172.30.94.0-24
/kubernetes/network/subnets/172.30.65.0-24
/kubernetes/network/subnets/172.30.31.0-24
```

查看某一 Pod 网段对应的节点 IP 和 flannel 接口地址:
```bash
ETCDCTL_API=2 etcdctl \
  --endpoints=${etcd_endpoints} \
  --ca-file=/app/cert/ca.pem \
  --cert-file=/app/cert/etcd.pem \
  --key-file=/app/cert/etcd-key.pem \
  get ${flannel_etcd_prefix}/subnets/172.30.94.0-24

{"PublicIP":"192.168.209.121","BackendType":"vxlan","BackendData":{"VtepMAC":"02:ed:89:f7:83:ad"}}
```

- 172.30.94.0/24 被分配给节点 k8s-node-01(192.168.209.121);
- VtepMAC 为 k8s-node-01 节点的 flannel.1 网卡 MAC 地址;
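etcd 中的子网 key(如 172.30.94.0-24)用 "-" 代替了 "/"。写脚本处理这些 key 时,可以用如下小函数把 key 还原成 CIDR 写法(key_to_cidr 为本文演示用函数,并非 flannel 提供):

```shell
# 将 flannel 在 etcd 中的子网 key 还原为 CIDR 写法
# (key_to_cidr 为本文演示用函数)
key_to_cidr() {
  local key=${1##*/}           # 取最后一段,如 172.30.94.0-24
  echo "${key%-*}/${key##*-}"  # 把 "-" 还原为 "/"
}
key_to_cidr /kubernetes/network/subnets/172.30.94.0-24   # 172.30.94.0/24
```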
在各节点上部署 flannel 后,检查是否创建了 flannel 接口(名称可能为 flannel0、flannel.0、flannel.1 等):
```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 | grep -w inet"
  done

>>> k8s-node-01
    inet 172.30.94.0/32 scope global flannel.1
>>> k8s-node-02
    inet 172.30.65.0/32 scope global flannel.1
>>> k8s-node-03
    inet 172.30.31.0/32 scope global flannel.1
```

在各节点上 ping 所有 flannel 接口 IP,确保能通:
```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "ping -c 1 172.30.94.0"
    ssh root@${node_ip} "ping -c 1 172.30.65.0"
    ssh root@${node_ip} "ping -c 1 172.30.31.0"
  done
```
docker 运行和管理容器,kubelet 通过 Container Runtime Interface (CRI) 与它进行交互。
到 docker 下载页面 下载最新发布包:
```bash
cd /app/opt
wget https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar -xzf docker-19.03.5.tgz
```

分发二进制文件到所有 worker 节点:
```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    scp /app/opt/docker/* root@${node_ip}:/app/bin
    ssh root@${node_ip} "chmod +x /app/bin/* && source /etc/profile && docker -v && mkdir -p /app/data/docker && mkdir -p /app/run/docker"
  done
```

使用国内的仓库镜像服务器以加快 pull image 的速度,同时增加下载的并发数(需要重启 dockerd 生效):
```bash
cat > /app/etc/docker-daemon.json <<EOF
{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
  "insecure-registries": ["docker02:35000"],
  "max-concurrent-downloads": 20,
  "live-restore": true,
  "max-concurrent-uploads": 10,
  "debug": true,
  "data-root": "/app/data/docker",
  "exec-root": "/app/run/docker",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  }
}
EOF
```

```bash
cat > /usr/lib/systemd/system/docker.service << "EOF"
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/app/bin:/bin:/sbin:/usr/bin:/usr/sbin"
EnvironmentFile=-/app/run/flannel/docker
ExecStart=/app/bin/dockerd --config-file=/app/etc/docker-daemon.json $DOCKER_NETWORK_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF
```
- EOF 前后有双引号,这样 bash 不会替换文档中的变量,如 $DOCKER_NETWORK_OPTIONS(这些环境变量是 systemd 负责替换的);
- dockerd 运行时会调用其它 docker 命令,如 docker-proxy,所以需要将 docker 命令所在的目录加到 PATH 环境变量中;
- flanneld 启动时将网络配置写入 /app/run/flannel/docker 文件中,dockerd 启动前读取该文件中的环境变量 DOCKER_NETWORK_OPTIONS,然后设置 docker0 网桥网段;
- 如果指定了多个 EnvironmentFile 选项,则必须将 /app/run/flannel/docker 放在最后(确保 docker0 使用 flanneld 生成的 bip 参数);
- docker 需要以 root 用户运行;
docker 从 1.13 版本开始,可能将 iptables FORWARD chain 的默认策略设置为 DROP,从而导致 ping 其它 Node 上的 Pod IP 失败。遇到这种情况时,需要手动将策略设置为 ACCEPT,并把相应命令写入 /etc/rc.local 文件,防止节点重启后策略被还原为 DROP:
```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/sbin/iptables -P FORWARD ACCEPT && echo '/sbin/iptables -P FORWARD ACCEPT' >> /etc/rc.local"
  done
```

分发 systemd unit 文件到所有 worker 机器:
```bash
for node_ip in k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    scp /app/etc/docker-daemon.json root@${node_ip}:/app/etc
    scp /usr/lib/systemd/system/docker.service root@${node_ip}:/usr/lib/systemd/system
  done
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
  done
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status docker | grep Active"
  done
```

确保状态为 active (running),否则查看日志,确认原因:
```bash
journalctl -u docker -f
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
  done
```

确认各 worker 节点的 docker0 网桥和 flannel.1 接口的 IP 处于同一个网段中。
```
>>> k8s-node-01
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 02:ed:89:f7:83:ad brd ff:ff:ff:ff:ff:ff
    inet 172.30.94.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:0c:51:15:fb brd ff:ff:ff:ff:ff:ff
    inet 172.30.94.1/24 brd 172.30.94.255 scope global docker0
       valid_lft forever preferred_lft forever
```
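上面的"同一网段"检查也可以脚本化。在 SubnetLen=24 的默认配置下,只需比较两个 IP 的前三个八位组即可(same_subnet24 为本文演示用的假设函数,示例 IP 取自上面的输出):

```shell
# 判断两个 IP 是否处于同一个 /24 网段(仅适用于 SubnetLen=24 的默认场景;
# same_subnet24 为本文演示用函数)
same_subnet24() {
  [ "${1%.*}" = "${2%.*}" ]   # 比较去掉最后一个八位组后的前缀
}
if same_subnet24 172.30.94.1 172.30.94.0; then
  echo "docker0 与 flannel.1 同网段"
else
  echo "网段不一致,需先停止 docker 并重建 docker0"
fi
```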
注意:如果服务安装顺序不对或者机器环境比较复杂,docker 服务早于 flanneld 服务安装,此时 worker 节点的 docker0 网桥和 flannel.1 接口的 IP 可能不在同一个网段。这时请先停止 docker 服务,手工删除 docker0 网卡,再重新启动 docker 服务即可修复:
```bash
systemctl stop docker
ip link delete docker0
systemctl start docker
```

```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "ps -elfH | grep docker && source /etc/profile && docker info"
  done
```

kubelet 运行在每个 worker 节点上,接收 kube-apiserver 发送的请求,管理 Pod 容器,执行交互式命令,如 exec、run、logs 等。kubelet 启动时自动向 kube-apiserver 注册节点信息,内置的 cadvisor 统计和监控节点的资源使用情况。为确保安全,部署时关闭了 kubelet 的非安全 http 端口,对请求进行认证和授权,拒绝未授权的访问(如 apiserver、heapster 的请求)。
从 CHANGELOG 页面 下载二进制 tar 文件并解压:
```bash
cd /app/opt
wget https://dl.k8s.io/v1.16.2/kubernetes-node-linux-amd64.tar.gz
tar -xzf kubernetes-node-linux-amd64.tar.gz
```

将二进制文件拷贝到所有 worker 节点:
```bash
for node_ip in k8s-node-01 k8s-node-02 k8s-node-03
  do
    echo ">>> ${node_ip}"
    scp /app/opt/kubernetes/node/bin/* root@${node_ip}:/app/bin
    ssh root@${node_ip} "chmod +x /app/bin/* && source /etc/profile && kubelet --version && kube-proxy --version && kubeadm version"
  done
```

```bash
cd /app/etc
export node_names=(k8s-node-01 k8s-node-02 k8s-node-03)
export kube_apiserver="https://192.168.209.100:8443"
for node_name in ${node_names[@]}
  do
    echo ">>> ${node_name}"
    # 创建 token
    export BOOTSTRAP_TOKEN=$(kubeadm token create \
      --description kubelet-bootstrap-token \
      --groups system:bootstrappers:${node_name} \
      --kubeconfig ~/.kube/config)
    # 设置集群参数
    kubectl config set-cluster kubernetes \
      --certificate-authority=/app/cert/ca.pem \
      --embed-certs=true \
      --server=${kube_apiserver} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    # 设置客户端认证参数
    kubectl config set-credentials kubelet-bootstrap \
      --token=${BOOTSTRAP_TOKEN} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    # 设置上下文参数
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=kubelet-bootstrap \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    # 设置默认上下文
    kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  done

ll /app/etc/kubelet*
/app/etc/kubelet-bootstrap-k8s-node-01.kubeconfig  /app/etc/kubelet-bootstrap-k8s-node-03.kubeconfig
/app/etc/kubelet-bootstrap-k8s-node-02.kubeconfig
```

向 kubeconfig 写入的是 token,bootstrap 结束后 kube-controller-manager 为 kubelet 创建 client 和 server 证书;
查看 kubeadm 为各节点创建的 token:
```bash
kubeadm token list --kubeconfig ~/.kube/config

TOKEN                     TTL   EXPIRES                     USAGES                   DESCRIPTION               EXTRA GROUPS
0j5q1u.hzadpo5povwhp1gl   23h   2019-12-18T11:23:52+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-02
cxjvja.vtcq3478fktajitm   23h   2019-12-18T11:23:05+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-01
iz40wg.blrolof16rx238gw   23h   2019-12-18T11:23:52+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-01
on3o5o.y1spfp584jw3t5ak   23h   2019-12-18T11:23:53+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-03
r46d3x.hs2mrni4kr3bubkm   23h   2019-12-18T11:23:07+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-03
wa0hj7.qc40t5b6zf54achs   23h   2019-12-18T11:23:06+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:k8s-node-02
```

- token 有效期为 1 天,超期后将不能再被用来 bootstrap kubelet,且会被 kube-controller-manager 的 tokencleaner 清理;
- kube-apiserver 接收 kubelet 的 bootstrap token 后,将请求的 user 设置为 system:bootstrap:&lt;token id&gt;,group 设置为 system:bootstrappers,后续将为这个 group 设置 ClusterRoleBinding;
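bootstrap token 的格式为「6 位 token id」+「.」+「16 位 token secret」,字符集为 [a-z0-9]。下面的 gen_token 函数(本文演示用;实际 token 应由 kubeadm token create 生成)在本地生成一个符合该格式的随机串,便于理解上面输出中 TOKEN 一列的结构:

```shell
# 生成形如 abcdef.0123456789abcdef 的随机串,仅用于说明 bootstrap token 的格式
# (gen_token 为本文演示用函数,实际 token 由 kubeadm token create 生成)
gen_token() {
  chars=abcdefghijklmnopqrstuvwxyz0123456789
  id=""; secret=""
  i=0; while [ $i -lt 6 ];  do id="${id}${chars:$((RANDOM % 36)):1}";         i=$((i+1)); done
  i=0; while [ $i -lt 16 ]; do secret="${secret}${chars:$((RANDOM % 36)):1}"; i=$((i+1)); done
  echo "${id}.${secret}"
}
gen_token
```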
查看各 token 关联的 Secret:
```bash
kubectl get secrets -n kube-system | grep bootstrap-token
bootstrap-token-0j5q1u   bootstrap.kubernetes.io/token   7   8m7s
bootstrap-token-cxjvja   bootstrap.kubernetes.io/token   7   8m53s
bootstrap-token-iz40wg   bootstrap.kubernetes.io/token   7   8m7s
bootstrap-token-on3o5o   bootstrap.kubernetes.io/token   7   8m6s
bootstrap-token-r46d3x   bootstrap.kubernetes.io/token   7   8m52s
bootstrap-token-wa0hj7   bootstrap.kubernetes.io/token   7   8m52s
```

```bash
for node_name in ${node_names[@]}
  do
    echo ">>> ${node_name}"
    scp /app/etc/kubelet-bootstrap-${node_name}.kubeconfig root@${node_name}:/app/etc/kubelet-bootstrap.kubeconfig
  done
```

从 v1.10 开始,部分 kubelet 参数需在配置文件中配置,kubelet --help 会提示:
```
DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag.
```
创建 kubelet 参数配置文件模板(可配置项参考代码中注释 ):
```bash
export cluster_dns_domain="cluster.local"
export cluster_dns_svc_ip="10.254.0.254"
export cluster_cidr="172.30.0.0/16"

cat > /app/etc/kubelet-config.yaml.template <<EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: "##NODE_IP##"
staticPodPath: ""
syncFrequency: 1m
fileCheckFrequency: 20s
httpCheckFrequency: 20s
staticPodURL: ""
port: 10250
readOnlyPort: 0
rotateCertificates: true
serverTLSBootstrap: true
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/app/cert/ca.pem"
authorization:
  mode: Webhook
registryPullQPS: 0
registryBurst: 20
eventRecordQPS: 0
eventBurst: 20
enableDebuggingHandlers: true
enableContentionProfiling: true
healthzPort: 10248
healthzBindAddress: "##NODE_IP##"
clusterDomain: "${cluster_dns_domain}"
clusterDNS:
  - "${cluster_dns_svc_ip}"
nodeStatusUpdateFrequency: 10s
nodeStatusReportFrequency: 1m
imageMinimumGCAge: 2m
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
volumeStatsAggPeriod: 1m
kubeletCgroups: ""
systemCgroups: ""
cgroupRoot: ""
cgroupsPerQOS: true
cgroupDriver: cgroupfs
runtimeRequestTimeout: 10m
hairpinMode: promiscuous-bridge
maxPods: 220
podCIDR: "${cluster_cidr}"
podPidsLimit: -1
resolvConf: /etc/resolv.conf
maxOpenFiles: 1000000
kubeAPIQPS: 1000
kubeAPIBurst: 2000
serializeImagePulls: false
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft: {}
enableControllerAttachDetach: true
failSwapOn: true
containerLogMaxSize: 20Mi
containerLogMaxFiles: 10
systemReserved: {}
kubeReserved: {}
systemReservedCgroup: ""
kubeReservedCgroup: ""
enforceNodeAllocatable: ["pods"]
EOF

cat > /app/etc/kubelet << EOF
KUBELET_OPTS="--bootstrap-kubeconfig=/app/etc/kubelet-bootstrap.kubeconfig \\
  --cert-dir=/app/cert \\
  --cni-conf-dir=/app/etc/cni/net.d \\
  --container-runtime=docker \\
  --container-runtime-endpoint=unix:///app/run/dockershim.sock \\
  --root-dir=${k8s_dir}/kubelet \\
  --kubeconfig=/app/etc/kubelet.kubeconfig \\
  --config=/app/etc/kubelet-config.yaml \\
  --pod-infra-container-image=registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64:3.1 \\
  --image-pull-progress-deadline=15m \\
  --volume-plugin-dir=${k8s_dir}/kubelet/kubelet-plugins/volume/exec/ \\
  --logtostderr=true \\
  --v=2 "
EOF
```
- address:kubelet 安全端口(https,10250)监听的地址,不能为 127.0.0.1,否则 kube-apiserver、heapster 等不能调用 kubelet 的 API;
- readOnlyPort=0:关闭只读端口(默认 10255),等效为未指定;
- authentication.anonymous.enabled:设置为 false,不允许匿名访问 10250 端口;
- authentication.x509.clientCAFile:指定签名客户端证书的 CA 证书,开启 HTTP 证书认证;
- authentication.webhook.enabled=true:开启 HTTPs bearer token 认证;
- 对于未通过 x509 证书和 webhook 认证的请求(kube-apiserver 或其他客户端),将被拒绝,提示 Unauthorized;
- authorization.mode=Webhook:kubelet 使用 SubjectAccessReview API 查询 kube-apiserver 某 user、group 是否具有操作资源的权限(RBAC);
- featureGates.RotateKubeletClientCertificate、featureGates.RotateKubeletServerCertificate:自动 rotate 证书,证书的有效期取决于 kube-controller-manager 的 --experimental-cluster-signing-duration 参数;
- 需要 root 账户运行;
- 如果设置了 --hostname-override 选项,则 kube-proxy 也需要设置该选项,否则会出现找不到 Node 的情况;
- --bootstrap-kubeconfig:指向 bootstrap kubeconfig 文件,kubelet 使用该文件中的用户名和 token 向 kube-apiserver 发送 TLS Bootstrapping 请求;
- K8S approve kubelet 的 csr 请求后,在 --cert-dir 目录创建证书和私钥文件,然后写入 --kubeconfig 文件;
- --pod-infra-container-image 不使用 redhat 的 pod-infrastructure:latest 镜像,它不能回收容器的僵尸进程;
为各节点创建和分发 kubelet 配置文件:
```bash
export node_ips=(192.168.209.121 192.168.209.122 192.168.209.123)
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    sed -e "s/##NODE_IP##/${node_ip}/g" /app/etc/kubelet-config.yaml.template > /app/etc/kubelet-config-${node_ip}.yaml.template
    scp /app/etc/kubelet-config-${node_ip}.yaml.template root@${node_ip}:/app/etc/kubelet-config.yaml
    scp /app/etc/kubelet root@${node_ip}:/app/etc
  done
```

创建 kubelet systemd unit 文件模板:
```bash
export k8s_dir="/app/data"
cat > /app/etc/kubelet.service <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/app/etc/kubelet
ExecStart=/app/bin/kubelet \$KUBELET_OPTS
Restart=on-failure
RestartSec=5
KillMode=process
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
EOF
```

为各节点创建和分发 kubelet systemd unit 文件:
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    scp /app/etc/kubelet.service root@${node_ip}:/usr/lib/systemd/system/kubelet.service
  done
```
kubelet 启动时查找 --kubeconfig 参数对应的文件是否存在,如果不存在则使用 --bootstrap-kubeconfig 指定的 kubeconfig 文件向 kube-apiserver 发送证书签名请求(CSR)。

kube-apiserver 收到 CSR 请求后,对其中的 Token 进行认证,认证通过后将请求的 user 设置为 system:bootstrap:&lt;token id&gt;,group 设置为 system:bootstrappers。如果尚未为该 group 授予创建 CSR 的权限,kubelet 日志中会出现如下报错:
```bash
journalctl -u kubelet -a | grep -A 2 'certificatesigningrequests'
Dec 17 15:04:06 k8s-node-01 kubelet[96901]: E1217 15:04:06.093223   96901 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:8h60d8" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
Dec 17 15:04:08 k8s-node-01 kubelet[96901]: I1217 15:04:08.278774   96901 certificate_manager.go:381] Rotating certificates
Dec 17 15:04:08 k8s-node-01 kubelet[96901]: E1217 15:04:08.287306   96901 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:8h60d8" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
Dec 17 15:04:12 k8s-node-01 kubelet[96901]: I1217 15:04:12.484910   96901 certificate_manager.go:381] Rotating certificates
Dec 17 15:04:12 k8s-node-01 kubelet[96901]: E1217 15:04:12.576963   96901 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:8h60d8" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
Dec 17 15:04:14 k8s-node-01 kubelet[96901]: I1217 15:04:14.412047   96901 fs.go:127] Filesystem UUIDs: map[3bca7d87-e615-4bd3-a062-325ae4cf6c50:/dev/sda1 d605d0dd-5d9b-404d-8a00-be280013a095:/dev/dm-0]
Dec 17 15:04:14 k8s-node-01 kubelet[96901]: I1217 15:04:14.421814   96901 fs.go:128] Filesystem partitions: map[/dev/mapper/centos_centos7-root:{mountpoint:/ major:253 minor:0 fsType:xfs blockSize:0} /dev/sda1:{mountpoint:/boot major:8 minor:1 fsType:xfs blockSize:0} tmpfs:{mountpoint:/dev/shm major:0 minor:19 fsType:tmpfs blockSize:0}]
--
Dec 17 15:04:21 k8s-node-01 kubelet[96901]: E1217 15:04:21.016204   96901 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:8h60d8" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
Dec 17 15:04:21 k8s-node-01 kubelet[96901]: E1217 15:04:21.016364   96901 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSIDriver: Unauthorized
Dec 17 15:04:21 k8s-node-01 kubelet[96901]: E1217 15:04:21.016445   96901 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.RuntimeClass: Unauthorized
--
Dec 17 15:04:38 k8s-node-01 kubelet[96901]: E1217 15:04:38.106782   96901 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:8h60d8" cannot create resource "certificatesigningrequests" in API group "certificates.k8s.io" at the cluster scope
Dec 17 15:04:38 k8s-node-01 kubelet[96901]: E1217 15:04:38.106818   96901 certificate_manager.go:290] Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
Dec 17 15:04:38 k8s-node-01 kubelet[96901]: E1217 15:04:38.178019   96901 kubelet.go:2267] node "k8s-node-01" not found
```

解决办法是:创建一个 clusterrolebinding,将 group system:bootstrappers 和 clusterrole system:node-bootstrapper 绑定:
```bash
kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
```
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p ${k8s_dir}/kubelet/kubelet-plugins/volume/exec"
    ssh root@${node_ip} "/usr/sbin/swapoff -a"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
  done
```

- 启动服务前必须先创建工作目录;
- 关闭 swap 分区,否则 kubelet 会启动失败;
kubelet 启动后使用 --bootstrap-kubeconfig 向 kube-apiserver 发送 CSR 请求,当这个 CSR 被 approve 后,kube-controller-manager 为 kubelet 创建 TLS 客户端证书、私钥和 --kubeconfig 文件。
注意:kube-controller-manager 需要配置 --cluster-signing-cert-file 和 --cluster-signing-key-file 参数,才会为 TLS Bootstrap 创建证书和私钥。
```bash
kubectl get csr
NAME        AGE     REQUESTOR                 CONDITION
csr-4zj8q   22m     system:bootstrap:8h60d8   Pending
csr-6z6qb   138m    system:bootstrap:8h60d8   Pending
csr-7bjfm   76m     system:bootstrap:8h60d8   Pending
csr-8fs2t   123m    system:bootstrap:8h60d8   Pending
csr-dtqn8   108m    system:bootstrap:8h60d8   Pending
csr-dzm25   17m     system:bootstrap:8h60d8   Pending
csr-hgsz5   93m     system:bootstrap:8h60d8   Pending
csr-jdt2k   61m     system:bootstrap:8h60d8   Pending
csr-jh642   37m     system:bootstrap:8h60d8   Pending
csr-jwwrd   138m    system:bootstrap:8h60d8   Pending
csr-mltzh   52m     system:bootstrap:8h60d8   Pending
csr-nmcbr   77m     system:bootstrap:8h60d8   Pending
csr-nxgsx   3m21s   system:bootstrap:8h60d8   Pending
csr-pbkds   20m     system:bootstrap:8h60d8   Pending
csr-qklgc   22m     system:bootstrap:8h60d8   Pending
csr-vknpm   53m     system:bootstrap:8h60d8   Pending
csr-wppmc   15m     system:bootstrap:8h60d8   Pending

kubectl get nodes
No resources found in default namespace.
```

三个 worker 节点的 csr 均处于 pending 状态;
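处于 Pending 状态的 CSR 名称可以用 awk 从 kubectl get csr 的输出中筛出来,再交给 kubectl certificate approve 批量处理。这里无法连接集群,下面用一段假设的混合状态示例输出演示解析逻辑:

```shell
# 从 kubectl get csr 的输出中筛选 Pending 状态的 CSR 名称
# (sample_output 为假设的示例数据,仅用于演示解析逻辑)
sample_output='NAME        AGE     REQUESTOR                 CONDITION
csr-4zj8q   22m     system:bootstrap:8h60d8   Pending
csr-6z6qb   138m    system:bootstrap:8h60d8   Pending
csr-7bjfm   76m     system:bootstrap:8h60d8   Approved,Issued'
pending=$(printf '%s\n' "$sample_output" | awk '$NF=="Pending" {print $1}')
printf '%s\n' "$pending"
# 实际环境中的批量 approve(需要可用的 kubectl):
# kubectl get csr | awk '$NF=="Pending"{print $1}' | xargs -r kubectl certificate approve
```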
创建三个 ClusterRoleBinding,分别用于自动 approve client、renew client、renew server 证书:
```bash
cat > /app/etc/csr-crb.yaml <<EOF
# Approve all CSRs for the group "system:bootstrappers"
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: auto-approve-csrs-for-group
subjects:
- kind: Group
  name: system:bootstrappers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
  apiGroup: rbac.authorization.k8s.io
---
# To let a node of the group "system:nodes" renew its own credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-client-cert-renewal
subjects:
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
  apiGroup: rbac.authorization.k8s.io
---
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: approve-node-server-renewal-csr
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/selfnodeserver"]
  verbs: ["create"]
---
# To let a node of the group "system:nodes" renew its own server credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: node-server-cert-renewal
subjects:
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: approve-node-server-renewal-csr
  apiGroup: rbac.authorization.k8s.io
EOF

kubectl apply -f csr-crb.yaml
clusterrolebinding.rbac.authorization.k8s.io/auto-approve-csrs-for-group created
clusterrolebinding.rbac.authorization.k8s.io/node-client-cert-renewal created
clusterrole.rbac.authorization.k8s.io/approve-node-server-renewal-csr created
clusterrolebinding.rbac.authorization.k8s.io/node-server-cert-renewal created
```
- auto-approve-csrs-for-group:自动 approve node 的第一次 CSR; 注意第一次 CSR 时,请求的 Group 为 system:bootstrappers;
- node-client-cert-renewal:自动 approve node 后续过期的 client 证书,自动生成的证书 Group 为 system:nodes;
- node-server-cert-renewal:自动 approve node 后续过期的 server 证书,自动生成的证书 Group 为 system:nodes;
等待一段时间(1-10 分钟),三个节点的 CSR 都被自动 approved:
```bash
$ kubectl get csr
NAME        AGE     REQUESTOR                 CONDITION
csr-2vgp7   4m50s   system:bootstrap:fq9jp8   Approved,Issued
csr-4zj8q   86m     system:bootstrap:8h60d8   Approved,Issued
csr-5jhxc   42m     system:node:k8s-node-01   Pending
csr-6z6qb   3h22m   system:bootstrap:8h60d8   Approved,Issued
csr-7bjfm   140m    system:bootstrap:8h60d8   Approved,Issued
csr-8fs2t   3h7m    system:bootstrap:8h60d8   Approved,Issued
csr-b9ktj   43m     system:bootstrap:8h60d8   Approved,Issued
csr-dtqn8   172m    system:bootstrap:8h60d8   Approved,Issued
csr-dzm25   81m     system:bootstrap:8h60d8   Approved,Issued
csr-fqfhx   5m17s   system:bootstrap:3xqvk4   Approved,Issued
csr-g2fzj   3m11s   system:node:k8s-node-02   Pending
csr-gkbj9   3m5s    system:node:k8s-node-03   Pending
```

Pending 的 CSR 用于创建 kubelet server 证书,需要手动 approve,参考后文。所有节点均 ready:
```bash
$ kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
k8s-node-01   Ready    <none>   44m     v1.16.2
k8s-node-02   Ready    <none>   4m53s   v1.16.2
k8s-node-03   Ready    <none>   4m46s   v1.16.2
```

kube-controller-manager 为各 node 生成了 kubeconfig 文件和公私钥:
```bash
$ ll /app/etc/kubelet.kubeconfig
-rw------- 1 root root 2286 Dec 17 18:36 /app/etc/kubelet.kubeconfig
$ ll /app/cert/kubelet*
-rw------- 1 root root 1273 Dec 17 18:37 /app/cert/kubelet-client-2019-12-17-18-37-25.pem
lrwxrwxrwx 1 root root   48 Dec 17 18:37 /app/cert/kubelet-client-current.pem -> /app/cert/kubelet-client-2019-12-17-18-37-25.pem
```

此时没有自动生成 kubelet server 证书;
基于安全性考虑,CSR approving controllers 不会自动 approve kubelet server 证书签名请求,需要手动 approve:
```bash
$ kubectl get csr
NAME        AGE     REQUESTOR                 CONDITION
csr-2vgp7   9m49s   system:bootstrap:fq9jp8   Approved,Issued
csr-4l55c   2m40s   system:node:k8s-node-01   Pending
csr-4zj8q   91m     system:bootstrap:8h60d8   Approved,Issued
csr-5jhxc   47m     system:node:k8s-node-01   Pending
csr-6z6qb   3h27m   system:bootstrap:8h60d8   Approved,Issued
csr-7bjfm   145m    system:bootstrap:8h60d8   Approved,Issued
csr-8fs2t   3h12m   system:bootstrap:8h60d8   Approved,Issued
csr-b9ktj   48m     system:bootstrap:8h60d8   Approved,Issued
csr-dtqn8   177m    system:bootstrap:8h60d8   Approved,Issued
csr-dzm25   86m     system:bootstrap:8h60d8   Approved,Issued
csr-fqfhx   10m     system:bootstrap:3xqvk4   Approved,Issued
csr-g2fzj   8m10s   system:node:k8s-node-02   Pending
csr-gkbj9   8m4s    system:node:k8s-node-03   Pending

$ kubectl certificate approve csr-4l55c
certificatesigningrequest.certificates.k8s.io/csr-4l55c approved
$ kubectl certificate approve csr-g2fzj
certificatesigningrequest.certificates.k8s.io/csr-g2fzj approved
$ kubectl certificate approve csr-gkbj9
certificatesigningrequest.certificates.k8s.io/csr-gkbj9 approved

$ ll /app/cert/kubelet*
-rw------- 1 root root 1273 Dec 17 17:58 /app/cert/kubelet-client-2019-12-17-17-58-59.pem
lrwxrwxrwx 1 root root   48 Dec 17 17:58 /app/cert/kubelet-client-current.pem -> /app/cert/kubelet-client-2019-12-17-17-58-59.pem
-rw------- 1 root root 1317 Dec 17 18:52 /app/cert/kubelet-server-2019-12-17-18-52-46.pem
lrwxrwxrwx 1 root root   48 Dec 17 18:52 /app/cert/kubelet-server-current.pem -> /app/cert/kubelet-server-2019-12-17-18-52-46.pem
```

kubelet 启动后监听多个端口,用于接收 kube-apiserver 或其它客户端发送的请求:
```bash
$ netstat -ntulp | grep kubelet
tcp   0   0 192.168.209.121:10248   0.0.0.0:*   LISTEN   30110/kubelet
tcp   0   0 127.0.0.1:34056         0.0.0.0:*   LISTEN   30110/kubelet
tcp   0   0 192.168.209.121:10250   0.0.0.0:*   LISTEN   30110/kubelet
```
- 10248: healthz http 服务;
- 10250: https 服务,访问该端口时需要认证和授权(即使访问 /healthz 也需要);
- 未开启只读端口 10255;
- 从 K8S v1.10 开始,去除了 --cadvisor-port 参数(默认 4194 端口),不支持访问 cAdvisor UI & API。
kubelet 接收 10250 端口的 https 请求,可以访问如下资源:
详情参考:https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3
由于关闭了匿名认证,同时开启了 webhook 授权,所有访问 10250 端口 https API 的请求都需要被认证和授权。
预定义的 ClusterRole system:kubelet-api-admin 授予访问 kubelet 所有 API 的权限(kube-apiserver 使用的 kubernetes 证书 User 授予了该权限):
```bash
$ kubectl describe clusterrole system:kubelet-api-admin
Name:         system:kubelet-api-admin
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources      Non-Resource URLs  Resource Names  Verbs
  ---------      -----------------  --------------  -----
  nodes/log      []                 []              [*]
  nodes/metrics  []                 []              [*]
  nodes/proxy    []                 []              [*]
  nodes/spec     []                 []              [*]
  nodes/stats    []                 []              [*]
  nodes          []                 []              [get list watch proxy]
```

kubelet 配置了如下认证参数:
- authentication.anonymous.enabled:设置为 false,不允许匿名访问 10250 端口;
- authentication.x509.clientCAFile:指定签名客户端证书的 CA 证书,开启 HTTPs 证书认证;
- authentication.webhook.enabled=true:开启 HTTPs bearer token 认证;
同时配置了如下授权参数:
- authorization.mode=Webhook:开启 RBAC 授权;
kubelet 收到请求后,使用 clientCAFile 对证书签名进行认证,或者查询 bearer token 是否有效。如果两者都没通过,则拒绝请求,提示 Unauthorized:
```bash
$ curl -s --cacert /app/cert/ca.pem https://192.168.209.121:10250/metrics
Unauthorized
$ curl -s --cacert /app/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.209.121:10250/metrics
Unauthorized
```

通过认证后,kubelet 使用 SubjectAccessReview API 向 kube-apiserver 发送请求,查询证书或 token 对应的 user、group 是否有操作资源的权限(RBAC);
```bash
$ # 权限不足的证书
$ curl -s --cacert /app/cert/ca.pem --cert /app/cert/kube-controller-manager.pem --key /app/cert/kube-controller-manager-key.pem https://192.168.209.121:10250/metrics
Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)
$ # 使用部署 kubectl 命令行工具时创建的、具有最高权限的 admin 证书
$ curl -s --cacert /app/cert/ca.pem --cert /app/cert/admin.pem --key /app/cert/admin-key.pem https://192.168.209.121:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```
- --cacert、--cert、--key 的参数值必须是文件路径,如上面的 ./admin.pem 不能省略 ./,否则返回 401 Unauthorized;
```bash
$ kubectl create sa kubelet-api-test
serviceaccount/kubelet-api-test created
$ kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test
clusterrolebinding.rbac.authorization.k8s.io/kubelet-api-test created
$ SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')
$ TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')
$ echo ${TOKEN}
```

```bash
$ curl -s --cacert /app/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.209.121:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
$ curl -s --cacert /app/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.209.122:10250/metrics | head
$ curl -s --cacert /app/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.209.123:10250/metrics | head
```

cadvisor 是内嵌在 kubelet 二进制中的,统计所在节点各容器的资源(CPU、内存、磁盘、网卡)使用情况的服务。
浏览器访问 https://192.168.209.121:10250/metrics 和 https://192.168.209.121:10250/metrics/cadvisor 分别返回 kubelet 和 cadvisor 的 metrics。


- kubelet-config.yaml 设置 authentication.anonymous.enabled 为 false,不允许匿名证书访问 10250 的 https 服务;
- 参考 A.浏览器访问 kube-apiserver 安全端口,创建和导入相关证书,然后访问上面的 10250 端口;
从 kube-apiserver 获取各节点 kubelet 的配置:
```bash
$ # master 节点增加 node 节点 hosts
cat >> /etc/hosts <<EOF
192.168.209.121 k8s-node-01
192.168.209.122 k8s-node-02
192.168.209.123 k8s-node-03
EOF
```
```bash
$ export kube_apiserver="https://192.168.209.100:8443"
$ curl -sSL --cacert /app/cert/ca.pem --cert /app/cert/admin.pem --key /app/cert/admin-key.pem ${kube_apiserver}/api/v1/nodes/k8s-node-01/proxy/configz | jq \
  '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"'
{
  "syncFrequency": "1m0s",
  "fileCheckFrequency": "20s",
  "httpCheckFrequency": "20s",
  "address": "192.168.209.121",
  "port": 10250,
  "rotateCertificates": true,
  "serverTLSBootstrap": true,
  "authentication": {
    "x509": { "clientCAFile": "/app/cert/ca.pem" },
    "webhook": { "enabled": true, "cacheTTL": "2m0s" },
    "anonymous": { "enabled": false }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": { "cacheAuthorizedTTL": "5m0s", "cacheUnauthorizedTTL": "30s" }
  },
  "registryPullQPS": 0,
  "registryBurst": 20,
  "eventRecordQPS": 0,
  "eventBurst": 20,
  "enableDebuggingHandlers": true,
  "enableContentionProfiling": true,
  "healthzPort": 10248,
  "healthzBindAddress": "192.168.209.121",
  "oomScoreAdj": -999,
  "clusterDomain": "cluster.local",
  "clusterDNS": [ "10.254.0.254" ],
  "streamingConnectionIdleTimeout": "4h0m0s",
  "nodeStatusUpdateFrequency": "10s",
  "nodeStatusReportFrequency": "1m0s",
  "nodeLeaseDurationSeconds": 40,
  "imageMinimumGCAge": "2m0s",
  "imageGCHighThresholdPercent": 85,
  "imageGCLowThresholdPercent": 80,
  "volumeStatsAggPeriod": "1m0s",
  "cgroupsPerQOS": true,
  "cgroupDriver": "cgroupfs",
  "cpuManagerPolicy": "none",
  "cpuManagerReconcilePeriod": "10s",
  "topologyManagerPolicy": "none",
  "runtimeRequestTimeout": "10m0s",
  "hairpinMode": "promiscuous-bridge",
  "maxPods": 220,
  "podCIDR": "172.30.0.0/16",
  "podPidsLimit": -1,
  "resolvConf": "/etc/resolv.conf",
  "cpuCFSQuota": true,
  "cpuCFSQuotaPeriod": "100ms",
  "maxOpenFiles": 1000000,
  "contentType": "application/vnd.kubernetes.protobuf",
  "kubeAPIQPS": 1000,
  "kubeAPIBurst": 2000,
  "serializeImagePulls": false,
  "evictionHard": {
    "imagefs.available": "15%",
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
  },
  "evictionPressureTransitionPeriod": "5m0s",
  "enableControllerAttachDetach": true,
  "makeIPTablesUtilChains": true,
  "iptablesMasqueradeBit": 14,
  "iptablesDropBit": 15,
  "failSwapOn": true,
  "containerLogMaxSize": "20Mi",
  "containerLogMaxFiles": 10,
  "configMapAndSecretChangeDetectionStrategy": "Watch",
  "enforceNodeAllocatable": [ "pods" ],
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1"
}
```

kube-proxy 运行在所有 worker 节点上,它监听 apiserver 中 service 和 endpoint 的变化情况,创建路由规则以提供服务 IP 和负载均衡功能。
This document covers deploying kube-proxy in ipvs mode.
```bash
cat > /app/cert/kube-proxy-csr.json <<EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
```
- CN: specifies the certificate's User as `system:kube-proxy`;
- the predefined RoleBinding `system:node-proxier` binds User `system:kube-proxy` to Role `system:node-proxier`, which grants permission to call the kube-apiserver proxy-related APIs;
- this certificate is used by kube-proxy only as a client certificate, so the hosts field is empty.
Generate the certificate and private key:
```bash
$ cfssl gencert -ca=/app/cert/ca.pem \
    -ca-key=/app/cert/ca-key.pem \
    -config=/app/cert/ca-config.json \
    -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy

$ ll kube-proxy*
-rw-r--r-- 1 root root 1009 Dec 19 16:57 kube-proxy.csr
-rw-r--r-- 1 root root  215 Dec 19 16:57 kube-proxy-csr.json
-rw------- 1 root root 1675 Dec 19 16:57 kube-proxy-key.pem
-rw-r--r-- 1 root root 1403 Dec 19 16:57 kube-proxy.pem
```

Create the kubeconfig file:

```bash
$ export kube_apiserver="https://192.168.209.100:8443"
cd /app/etc
kubectl config set-cluster kubernetes \
  --certificate-authority=/app/cert/ca.pem \
  --embed-certs=true \
  --server=${kube_apiserver} \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
  --client-certificate=/app/cert/kube-proxy.pem \
  --client-key=/app/cert/kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
```
--embed-certs=true: embeds the contents of ca.pem and kube-proxy.pem into the generated kube-proxy.kubeconfig file (without it, only the certificate file paths are written).
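It is easy to confirm whether the certificates were really embedded: an embedded kubeconfig carries the material inline under `*-data` keys, while a path-based one stores filenames. A minimal offline sketch follows; the heredoc is a stand-in for the real /app/etc/kube-proxy.kubeconfig:

```bash
# Stand-in for the generated kubeconfig; with --embed-certs=true the three
# *-data keys below hold base64-encoded PEM instead of file paths.
cat > /tmp/kube-proxy.kubeconfig.demo <<'EOF'
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTg==
users:
- name: kube-proxy
  user:
    client-certificate-data: LS0tLS1CRUdJTg==
    client-key-data: LS0tLS1CRUdJTg==
EOF
# Count the embedded-material keys; 3 means CA, client cert and key are all inline.
grep -cE '(certificate-authority|client-certificate|client-key)-data' \
    /tmp/kube-proxy.kubeconfig.demo
```

Run the same grep against the real file after generating it; a count of 3 means all three pieces of material are embedded.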
Distribute the kubeconfig file:
```bash
export node_ips=(192.168.209.122 192.168.209.123)
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    scp /app/etc/kube-proxy.kubeconfig root@${node_ip}:/app/etc
  done
```

Since v1.10, some kube-proxy parameters can be set in a configuration file. The file can be generated with the --write-config-to option, or written by referring to the comments in the source code.
Create the kube-proxy config file template:
```bash
export cluster_cidr="172.30.0.0/16"
cd /app/etc
cat > /app/etc/kube-proxy-config.yaml.template <<EOF
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  burst: 200
  kubeconfig: "/app/etc/kube-proxy.kubeconfig"
  qps: 100
bindAddress: ##node_ip##
healthzBindAddress: ##node_ip##:10256
metricsBindAddress: ##node_ip##:10249
enableProfiling: true
clusterCIDR: ${cluster_cidr}
mode: "ipvs"
portRange: ""
kubeProxyIPTablesConfiguration:
  masqueradeAll: false
kubeProxyIPVSConfiguration:
  scheduler: rr
  excludeCIDRs: []
EOF
```
- `bindAddress`: listen address;
- `clientConnection.kubeconfig`: kubeconfig file used to connect to the apiserver;
- `clusterCIDR`: kube-proxy uses `--cluster-cidr` to distinguish in-cluster traffic from external traffic; only when `--cluster-cidr` or `--masquerade-all` is set will kube-proxy SNAT requests to Service IPs;
- `hostnameOverride`: must match the value used by kubelet, otherwise kube-proxy will not find its Node after starting and will not create any ipvs rules;
- `mode`: use ipvs mode.
Create and distribute the kube-proxy configuration file for each node:
```bash
export node_ips=(192.168.209.121 192.168.209.122 192.168.209.123)
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    sed "s/##node_ip##/${node_ip}/" /app/etc/kube-proxy-config.yaml.template > /app/etc/kube-proxy-config-${node_ip}.yaml.template
    scp kube-proxy-config-${node_ip}.yaml.template root@${node_ip}:/app/etc/kube-proxy-config.yaml
  done
```

Create the kube-proxy systemd unit file:

```bash
export k8s_dir="/app/data"
cat > /app/etc/kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=${k8s_dir}/kube-proxy
ExecStart=/app/bin/kube-proxy \\
  --config=/app/etc/kube-proxy-config.yaml \\
  --logtostderr=true \\
  --v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
```

Distribute the kube-proxy systemd unit file:
```bash
export node_ips=(192.168.209.121 192.168.209.122 192.168.209.123)
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    scp /app/etc/kube-proxy.service root@${node_ip}:/usr/lib/systemd/system
  done
```

Start the kube-proxy service:

```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /app/data/kube-proxy"
    ssh root@${node_ip} "modprobe ip_vs_rr"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
  done
```

Check the startup result:
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status kube-proxy|grep Active && netstat -ntulp | grep kube-proxy"
  done
tcp   0   0 192.168.209.121:10256   0.0.0.0:*   LISTEN   110375/kube-proxy
tcp   0   0 192.168.209.121:10249   0.0.0.0:*   LISTEN   110375/kube-proxy
```

- 10249: http prometheus metrics port;
- 10256: http healthz port.
Check the ipvs routing rules:
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/usr/sbin/ipvsadm -ln"
  done
>>> 192.168.209.121
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.209.101:6443         Masq    1      0          0
  -> 192.168.209.102:6443         Masq    1      0          0
  -> 192.168.209.103:6443         Masq    1      0          0
>>> 192.168.209.122
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.209.101:6443         Masq    1      0          0
  -> 192.168.209.102:6443         Masq    1      0          0
  -> 192.168.209.103:6443         Masq    1      0          0
>>> 192.168.209.123
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.209.101:6443         Masq    1      0          0
  -> 192.168.209.102:6443         Masq    1      0          0
  -> 192.168.209.103:6443         Masq    1      0          0
```

As shown, all HTTPS requests to the K8S `kubernetes` Service are forwarded to port 6443 on the kube-apiserver nodes.
Use a DaemonSet to verify that the master and worker nodes are working properly.
```bash
$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k8s-node-01   Ready    <none>   2d    v1.16.2
k8s-node-02   Ready    <none>   47h   v1.16.2
k8s-node-03   Ready    <none>   47h   v1.16.2
```

All nodes showing Ready is the expected state.
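The same readiness check can be scripted. A small offline sketch on sample output (the awk filter is an illustration, not part of the original procedure; in a live cluster pipe the real `kubectl get nodes --no-headers` into it):

```bash
# Sample `kubectl get nodes --no-headers` output, with one unhealthy node.
cat > /tmp/nodes.txt <<'EOF'
k8s-node-01   Ready      <none>   2d    v1.16.2
k8s-node-02   Ready      <none>   47h   v1.16.2
k8s-node-03   NotReady   <none>   47h   v1.16.2
EOF
# Print the name of any node whose STATUS column is not exactly "Ready".
awk '$2 != "Ready" {print $1}' /tmp/nodes.txt
```

An empty result means every node is Ready; here the sketch flags the sample `k8s-node-03`.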
```bash
mkdir /app/yml
cd /app/yml
cat > /app/yml/nginx-ds.yml << EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-ds
  labels:
    app: nginx-ds
spec:
  type: NodePort
  selector:
    app: nginx-ds
  ports:
  - name: http
    port: 80
    targetPort: 80
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ds
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      app: nginx-ds
  template:
    metadata:
      labels:
        app: nginx-ds
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.16.1
        ports:
        - containerPort: 80
EOF
```

```bash
kubectl create -f nginx-ds.yml
```

```bash
$ kubectl get pods -o wide | grep nginx-ds
nginx-ds-nd4pg   1/1   Running   0   12m   172.30.65.2   k8s-node-02   <none>   <none>
nginx-ds-xsc2g   1/1   Running   0   12m   172.30.94.2   k8s-node-01   <none>   <none>
nginx-ds-zvlws   1/1   Running   0   12m   172.30.31.2   k8s-node-03   <none>   <none>
```

On every Node, ping these three IPs to check connectivity:
```bash
export node_ips=(192.168.209.121 192.168.209.122 192.168.209.123)
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "ping -c 1 172.30.65.2"
    ssh root@${node_ip} "ping -c 1 172.30.94.2"
    ssh root@${node_ip} "ping -c 1 172.30.31.2"
  done
```

```bash
$ kubectl get svc -o wide | grep nginx-ds
nginx-ds   NodePort   10.254.0.192   <none>   80:32000/TCP   40m   app=nginx-ds
```
- Service Cluster IP: 10.254.0.192
- Service port: 80
- NodePort: 32000
Curl the Service IP from every Node:
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "curl -s 10.254.0.192"
  done
```

The expected output is the nginx welcome page.
Access the NodePort from every Node:
```bash
for node_ip in ${node_ips[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "curl -s ${node_ip}:32000"
  done
```

The expected output is the nginx welcome page.
The manifest YAML files for the Kubernetes built-in addons use the gcr.io docker registry, which is blocked in China, so the registry address must be replaced manually with one that is reachable.
The blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China.
After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz file inside it.
```bash
cd /app/opt/kubernetes
tar -xzf kubernetes-src.tar.gz
# the coredns directory is cluster/addons/dns/coredns
cd /app/opt/kubernetes/cluster/addons/dns/coredns
cp coredns.yaml.base coredns.yaml
```
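The registry replacement itself can be done with sed. A hedged offline sketch: the mirror host `gcr.azk8s.cn/google_containers` is the Microsoft China proxy address at the time of writing and may change, and the heredoc is a stand-in for the image line in coredns.yaml (run the sed against the real file in practice):

```bash
# Stand-in fragment with a blocked gcr.io image reference.
cat > /tmp/coredns-image.demo <<'EOF'
        image: k8s.gcr.io/coredns:1.6.2
EOF
# Swap the blocked registry for a reachable mirror (assumed mirror address).
sed -i 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#' /tmp/coredns-image.demo
# Show the rewritten image reference.
grep -o 'gcr.azk8s.cn/google_containers/coredns:1.6.2' /tmp/coredns-image.demo
```

After the rewrite, the manifest pulls from the mirror instead of gcr.io; apply the same substitution to any other addon manifest that references gcr.io.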
When accessing kube-apiserver's secure port 6443 from a browser, the certificate is reported as untrusted:

This is because the kube-apiserver server certificate is signed by the root certificate ca.pem that we created; the root certificate ca.pem must be imported into the operating system and marked as permanently trusted.
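The trust relationship can be reproduced with openssl alone: a server certificate signed by our CA verifies only against that CA. A throwaway sketch with demo filenames (not the real cluster certificates):

```bash
# Throwaway CA, playing the role of ca.pem.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
    -keyout /tmp/demo-ca-key.pem -out /tmp/demo-ca.pem -days 1 2>/dev/null
# Server key + CSR, playing the role of the kube-apiserver server certificate.
openssl req -newkey rsa:2048 -nodes -subj "/CN=demo-apiserver" \
    -keyout /tmp/demo-srv-key.pem -out /tmp/demo-srv.csr 2>/dev/null
# Sign the CSR with the CA.
openssl x509 -req -in /tmp/demo-srv.csr -CA /tmp/demo-ca.pem \
    -CAkey /tmp/demo-ca-key.pem -CAcreateserial -out /tmp/demo-srv.pem \
    -days 1 2>/dev/null
# Succeeds only because the verifier was told to trust our CA — which is
# exactly what importing ca.pem into the OS trust store does for the browser.
openssl verify -CAfile /tmp/demo-ca.pem /tmp/demo-srv.pem
```

`openssl verify` reports `OK` here; without `-CAfile` it fails with "unable to get local issuer certificate", which is the command-line equivalent of the browser warning.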
On Windows, import ca.pem with the following command:
```bash
keytool -import -v -trustcacerts -alias appmanagement -file "F:\TestEnv\k8s-cert\ca.pem" -storepass password -keystore cacerts
```

Reference: https://blog.csdn.net/caoshiying/article/details/78668076

Accessing the apiserver address again, the certificate is now trusted, but the server responds with 401 Unauthorized:

The browser needs a client certificate of its own to present when accessing the apiserver's https port.
Here we use the admin certificate and private key created when deploying the kubectl command line tool, together with the ca certificate above, to create a PKCS#12/PFX format certificate that the browser can use (run on a master node):
```bash
$ openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem
```

Import the created admin.pfx into the system certificate store.
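The PFX packaging can be tried end to end with a throwaway self-signed pair before touching the real admin certificate. All filenames below and the `changeit` export password are placeholders:

```bash
# Self-signed stand-in for admin.pem / admin-key.pem.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-admin" \
    -keyout /tmp/demo-admin-key.pem -out /tmp/demo-admin.pem -days 1 2>/dev/null
# Bundle key + certificate into a PKCS#12 file, as done above for admin.pfx.
openssl pkcs12 -export -out /tmp/demo-admin.pfx \
    -inkey /tmp/demo-admin-key.pem -in /tmp/demo-admin.pem \
    -passout pass:changeit
# The bundle should be non-empty and parse back with the same password.
openssl pkcs12 -in /tmp/demo-admin.pfx -passin pass:changeit -noout && echo "pfx OK"
```

Reading the bundle back with the wrong password fails the MAC check, so this round trip also confirms the password you will later type into the browser import dialog.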

Restart the browser and access the apiserver address again; when prompted to select a browser certificate, choose the admin.pfx imported above:

Access to kube-apiserver's secure port is now authorized:
