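1. Overview

1.1 Introduction

In distributed architectures, microservices, and the k8s ecosystem, tracing an application's request path (often referred to as APM, Application Performance Management) is essential. Put simply, distributed tracing records the complete behavior of every layer a request passes through, from the moment traffic reaches the front end all the way to the database at the very back. It then presents that data visually: trace queries, dependency relationships, performance analysis, topology maps, and so on. A tracing system makes it much easier to locate problems, which is hard to achieve with conventional monitoring.

Tracing systems come in commercial and open-source flavors. The better-known ones (those I have looked into) are: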

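  • Commercial
    • Tingyun (听云)
    • Bonree (博睿宏远)
  • Open source
    • SkyWalking: China; started as a personal open-source project and now belongs to the Apache Software Foundation; its author was recently elected as the first Chinese director of Apache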
    • Pinpoint: South Korea, open-sourced by Naver
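    • Zipkin: USA, open-sourced by Twitter
    • CAT: China, open-sourced by Meituan

Details on each tracing system are easy to find online. I won't comment on the commercial products here.

Of the open-source options, the last two are intrusive to business code. For a comparison of the first two, see the figure below

Image URL: https://skywalking.apache.org/zh/2019-02-24-skywalking-pk-pinpoint/0081Kckwly1gkl4kjo1okj30in0q3gnb.jpg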

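1.2 Components

This article uses SkyWalking. Put simply, it consists of the following parts (as divided by the deployment used in this article):

  • skywalking-oap-server: the backend service
  • skywalking-ui: the web UI front end
  • skywalking-es-init: a one-off job that initializes the SkyWalking data in the ES cluster
  • elasticsearch: stores SkyWalking's metric data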

2. Preparation

2.1 Prepare the Helm environment

Helm 3 only needs a single binary. The version I use is:

# helm version
version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}

2.2 Create a dedicated namespace

SkyWalking is deployed in its own namespace

# kubectl create ns monitoring
namespace/monitoring created

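2.3 Create the secrets

This write-up covers deploying SkyWalking in an intranet environment: my local machine, which acts as the Helm client, can reach the internet, but the k8s cluster cannot. Every image SkyWalking uses therefore has to be served from the private registry on the intranet

2.3.1 Secret for pulling images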

# kubectl create secret docker-registry registry-pull-secret --docker-username=admin --docker-password=123456 --docker-email=admin@admin.com --docker-server=hub.ssgeek.com -n monitoring
secret/registry-pull-secret created

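2.3.2 Secret for HTTPS access

This step is optional. My cluster runs cert-manager, which issues certificates automatically; the certificate here is for the SkyWalking UI ingress. The matching parts of the chart need to be adjusted later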

# cat certificate.yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: skywalking
  namespace: monitoring
spec:
  secretName: skywalking
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  duration: 2160h
  renewBefore: 360h
  keyEncoding: pkcs1
  dnsNames:
  - skywalking.ssgeek.com
# kubectl apply -f certificate.yaml
certificate.cert-manager.io/skywalking created
# kubectl get certificate,secret -n monitoring|grep skywalking
certificate.cert-manager.io/skywalking   True    skywalking   2m50s
secret/skywalking            kubernetes.io/tls                     3      2m49s

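2.3.3 Secret for access control on the SkyWalking UI

The SkyWalking UI has no access control by default. You can add it with the Nginx Ingress basic auth approach shown below, or with the external authentication setup I described in an earlier article, which uses k8s Ingress Nginx + OAuth2 + GitLab to protect custom services without touching their code

Important: basic auth has a small pitfall here. Following the official documentation and testing it, I found that the file generated by htpasswd (the one holding the user name and password) must be named auth before the secret is created from it. Otherwise, after all the remaining steps, the UI still answers with 503. This differs from configuring basic auth on a traditional nginx. The parameter may be hard-coded in the source; I did not dig into the exact cause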

# htpasswd -c auth skywalking
New password: 
Re-type new password: 
Adding password for user skywalking
# kubectl -n monitoring create secret generic ui-auth --from-file=auth
secret/ui-auth created
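
A likely explanation for the file name requirement, taken from the ingress-nginx documentation rather than verified in this article: with the default auth-secret-type of auth-file, the controller reads the htpasswd content from a key named auth inside the secret, and --from-file=auth names the key after the file. If that holds, the file itself can have any name as long as the key is set explicitly with the key=path form of --from-file. A minimal sketch; the function name create_ui_auth_secret and the file name users.htpasswd are made up for illustration:

```shell
#!/bin/sh
# Create the basic auth secret from an htpasswd file with an arbitrary name.
# The key inside the secret is forced to "auth" through the key=path form
# of --from-file, so the file on disk does not have to be called "auth".
#
# Usage (hypothetical file name):
#   htpasswd -c users.htpasswd skywalking
#   create_ui_auth_secret ./users.htpasswd
create_ui_auth_secret() {
    htpasswd_file="$1"              # path to the htpasswd file
    namespace="${2:-monitoring}"    # namespace used throughout this article
    kubectl -n "$namespace" create secret generic ui-auth \
        --from-file=auth="$htpasswd_file"
}
```

Whichever way the secret is created, the output of kubectl -n monitoring describe secret ui-auth should list a data key called auth.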

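2.4 Store the images in the private registry

Push the images used by the deployment to the internal registry. This deploys what is currently the latest SkyWalking release. Each pair below lists the upstream image first, followed by its name in the private registry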

apache/skywalking-ui:8.4.0
hub.ssgeek.com/skywalking/skywalking-ui:8.4.0

apache/skywalking-oap-server:8.4.0-es7
hub.ssgeek.com/skywalking/skywalking-oap-server:8.4.0-es7

busybox:1.30
hub.ssgeek.com/skywalking/busybox:1.30

docker.elastic.co/elasticsearch/elasticsearch:7.5.2
hub.ssgeek.com/skywalking/elasticsearch:7.5.2
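
The pairs above follow one rule: keep the last path component of the upstream reference (name:tag) and put it under hub.ssgeek.com/skywalking. The copy can be scripted along these lines. This is a sketch, not the procedure actually used for this article; it assumes a machine that can reach both the internet and the private registry and has already run docker login hub.ssgeek.com. The function names are invented for the example:

```shell
#!/bin/sh
PRIVATE_REPO="hub.ssgeek.com/skywalking"

# Map an upstream image reference to its name in the private registry:
# strip everything up to the last "/" and prefix the private repository.
private_name() {
    printf '%s/%s\n' "$PRIVATE_REPO" "${1##*/}"
}

# Pull one upstream image, retag it and push it to the private registry.
mirror_image() {
    src="$1"
    dst="$(private_name "$src")"
    docker pull "$src" &&
    docker tag "$src" "$dst" &&
    docker push "$dst"
}

# Mirror the four images listed in this section; stop at the first failure.
mirror_all() {
    for image in \
        apache/skywalking-ui:8.4.0 \
        apache/skywalking-oap-server:8.4.0-es7 \
        busybox:1.30 \
        docker.elastic.co/elasticsearch/elasticsearch:7.5.2
    do
        mirror_image "$image" || return 1
    done
}
```

Calling mirror_all after sourcing the file copies all four images in one go.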

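3. Fetch the chart, update its dependencies, and adjust the values

Fetch the latest official chart and update its dependencies. Updating the dependencies automatically downloads one subchart, the official elasticsearch chart. The downloaded package does not need to be unpacked or modified; all parameters are set in one place through the parent chart's values.yaml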

# git clone https://github.com/apache/skywalking-kubernetes.git
# cd skywalking-kubernetes/chart
# helm dep up skywalking
Hang tight while we grab the latest from your chart repositories...
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading elasticsearch from repo https://helm.elastic.co/
Deleting outdated charts

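Edit values.yaml. Only the parts I changed are listed below. elasticsearch offers many more parameters and tuning options; a minimal configuration is used here. See the official documentation for more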

...
imagePullSecrets:
  - name: registry-pull-secret

initContainer:
  image: hub.ssgeek.com/skywalking/busybox
  tag: '1.30'

oap:
  name: oap
  # When 'dynamicConfigEnabled' set to true, enable oap dynamic configuration through k8s configmap,
  # Note: The default configmap data is empty, please refer to the detailed documentation (https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/dynamic-config.md)
  # Sync period in seconds. Defaults to 60 seconds. env: SW_CONFIG_CONFIGMAP_PERIOD
  dynamicConfigEnabled: false
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-oap-server
    tag: 8.4.0-es7  # Must be set explicitly
    pullPolicy: IfNotPresent
  storageType: elasticsearch7 # use ES 7 as the storage backend
...
  tolerations: []
  resources:
     limits:
       cpu: 2
       memory: 4Gi
     requests:
       cpu: 1
       memory: 1Gi
...
  env:
    # more env, please refer to https://hub.docker.com/r/apache/skywalking-oap-server
    # or https://github.com/apache/skywalking-docker/blob/master/6/6.4/oap/README.md#sw_telemetry
    SW_NAMESPACE: "skywalking" # sets the ES index prefix to skywalking_; the trailing underscore is added automatically
...
ui:
  name: ui
  replicas: 1
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-ui
    tag: 8.4.0  # Must be set explicitly
    pullPolicy: IfNotPresent
  # podAnnotations:
  #   example: oap-foo
  nodeAffinity: {}
  nodeSelector: {}
  tolerations: []
  ingress:
    enabled: true
    annotations:
       kubernetes.io/ingress.class: nginx
       # annotations that enable basic auth
       nginx.ingress.kubernetes.io/auth-type: basic
       nginx.ingress.kubernetes.io/auth-secret: ui-auth
       nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
    path: /
    hosts:
     - skywalking.ssgeek.com
    tls:
      - secretName: skywalking
        hosts:
          - skywalking.ssgeek.com
...
elasticsearch:
  enabled: true
  config:               # For users of an existing elasticsearch cluster,takes effect when `elasticsearch.enabled` is false
    port:
      http: 9200
#    host: elasticsearch # es service on kubernetes or host
    host: elasticsearch-logging.logging.svc
    user: "elastic"         # [optional]
    password: "elastic"     # [optional]
  clusterName: "elasticsearch"
  nodeGroup: "logging"

  # The service that non master groups will try to connect to when joining the cluster
  # This should be set to clusterName + "-" + nodeGroup for your master group
  masterService: "elasticsearch-logging"
...
  image: "hub.ssgeek.com/skywalking/elasticsearch"
  imageTag: "7.5.2"
  imagePullPolicy: "IfNotPresent"
...
  resources:
    requests:
      cpu: "100m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
...
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "ceph-rbd"
    resources:
      requests:
        storage: 30Gi
...
  persistence:
    enabled: true
    annotations: {}
...
  imagePullSecrets:
    - name: registry-pull-secret
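
Because the cluster cannot reach the internet, a single image that still points at an upstream registry will leave a pod stuck pulling. One way to catch that before installing is to render the chart locally and list every image reference that does not start with the private registry. A rough sketch; the function name images_outside_registry is made up, and it only looks at lines of the form image: ... in the rendered manifests:

```shell
#!/bin/sh
# Read rendered manifests on stdin and print each distinct image reference
# that does NOT start with the given registry prefix.
#
# Usage (run from skywalking-kubernetes/chart):
#   helm template skywalking skywalking -n monitoring \
#       --values ./skywalking/values.yaml |
#       images_outside_registry hub.ssgeek.com/
images_outside_registry() {
    prefix="$1"
    grep -E '^[[:space:]-]*image:' |
    sed -E "s/^[[:space:]-]*image:[[:space:]]*//; s/[\"']//g" |
    sort -u |
    while IFS= read -r image; do
        case "$image" in
            "$prefix"*) ;;                    # served from the private registry
            *) printf '%s\n' "$image" ;;      # would need internet access
        esac
    done
}
```

An empty result means every rendered image comes from the private registry.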

4. Install SkyWalking with Helm

With the preparation done, SkyWalking can be deployed with a single Helm command

# helm install skywalking skywalking -n monitoring --values ./skywalking/values.yaml
NAME: skywalking
LAST DEPLOYED: Thu Mar 18 18:45:03 2021
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
************************************************************************
*                                                                      *
*                 SkyWalking Helm Chart by SkyWalking Team             *
*                                                                      *
************************************************************************

Thank you for installing skywalking.

Your release is named skywalking.

Learn more, please visit https://skywalking.apache.org/

Get the UI URL by running these commands:
  https://skywalking.ssgeek.com/

5. Verification

Watch the pod logs until create instance_jvm_thread_peak_count index template finished appears

2021-03-18 10:48:32,242 - org.apache.skywalking.oap.server.core.storage.model.ModelInstaller -139765 [main] INFO  [] - table: instance_jvm_thread_peak_count does not exist
2021-03-18 10:48:32,243 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -139766 [main] INFO  [] - index skywalking_instance_jvm_thread_peak_count's columnTypeEsMapping builder str: {properties={service_id={type=keyword}, count={index=false, type=long}, time_bucket={type=long}, entity_id={type=keyword}, value={type=long}, summation={index=false, type=long}}}
2021-03-18 10:48:32,614 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140137 [main] INFO  [] - create instance_jvm_thread_peak_count index template finished, isAcknowledged: true
2021-03-18 10:48:33,319 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140842 [main] INFO  [] - create instance_jvm_thread_peak_count-20210318 index finished, isAcknowledged: true
......
2021-03-18 10:48:33,583 - org.eclipse.jetty.server.handler.ContextHandler -141106 [main] INFO  [] - Started o.e.j.s.ServletContextHandler@12e4822b{/,null,AVAILABLE}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.AbstractConnector -141120 [main] INFO  [] - Started ServerConnector@5cc9d3d0{HTTP/1.1, (http/1.1)}{0.0.0.0:12800}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.Server -141120 [main] INFO  [] - Started @141185ms
2021-03-18 10:48:33,599 - org.apache.skywalking.oap.server.core.storage.PersistenceTimer -141122 [main] INFO  [] - persistence timer start
2021-03-18 10:48:33,603 - org.apache.skywalking.oap.server.core.cache.CacheUpdateTimer -141126 [main] INFO  [] - Cache updateServiceInventory timer start
2021-03-18 10:48:41,499 - org.apache.skywalking.oap.server.starter.OAPServerBootstrap -149022 [main] INFO  [] - OAP starts up in init mode successfully, exit now...
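
Instead of watching the log by hand, the last line above can be polled for. The sketch below assumes the init job is named skywalking-es-init, which is inferred from the pod name skywalking-es-init-t7ndj seen in this deployment; the function names and the 10-second interval are arbitrary choices for the example:

```shell
#!/bin/sh
# Succeed if the log read from stdin contains the message the init job
# prints right before it exits.
init_finished() {
    grep -q 'OAP starts up in init mode successfully'
}

# Poll the init job's log until the message shows up.
# $1: maximum number of attempts (default 30), 10 seconds apart.
wait_for_init() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if kubectl -n monitoring logs job/skywalking-es-init 2>/dev/null | init_finished; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}
```

Running wait_for_init returns 0 once initialization has finished and 1 if the attempts run out first.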

Check the pod status

# kubectl -n monitoring get pods                         
NAME                              READY   STATUS      RESTARTS   AGE
elasticsearch-logging-0           1/1     Running     0          5m54s
elasticsearch-logging-1           1/1     Running     0          5m53s
elasticsearch-logging-2           1/1     Running     0          5m53s
skywalking-es-init-t7ndj          0/1     Completed   0          5m54s
skywalking-oap-57d7f454f5-8gbh5   1/1     Running     2          5m54s
skywalking-oap-57d7f454f5-vqh2d   1/1     Running     2          5m54s
skywalking-ui-698cdb4dbc-xxktt    1/1     Running     0          5m54s
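
In the listing above every pod is either Running with all containers ready or Completed, which is the expected state. The same check can be expressed as a small filter over the kubectl output. A sketch only; the function name unhealthy_pods is invented, and it expects the default table output including the header line:

```shell
#!/bin/sh
# Read "kubectl get pods" output (with header) on stdin and print the name
# of every pod that is neither Completed nor Running with all containers ready.
#
# Usage:
#   kubectl -n monitoring get pods | unhealthy_pods
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")              # READY column, e.g. 1/1
        if ($3 == "Completed") next        # finished jobs such as es-init
        if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

No output means nothing needs attention.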

Open the web UI. After entering the user name and password configured for basic auth, the SkyWalking main page loads successfully
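
The basic auth setup can also be checked from the command line. With a working configuration the ingress should answer 401 when no credentials are sent and 200 when valid ones are; a 503 would point back at the secret pitfall from section 2.3.3. A minimal sketch, with http_status being a made-up helper name:

```shell
#!/bin/sh
# Print the HTTP status code returned for a URL.
# When a user name and password are given as extra arguments,
# they are sent as basic auth credentials.
#
# Usage:
#   http_status https://skywalking.ssgeek.com/
#   http_status https://skywalking.ssgeek.com/ skywalking 'your-password'
http_status() {
    url="$1"
    if [ "$#" -eq 3 ]; then
        curl -s -o /dev/null -w '%{http_code}' -u "$2:$3" "$url"
    else
        curl -s -o /dev/null -w '%{http_code}' "$url"
    fi
}
```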

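That completes the k8s + Helm deployment of the SkyWalking server side in an intranet environment. If the environment has no internet access at all, you can package the modified chart and upload it to a private Helm repository such as Harbor. With both the chart and the images on the intranet, the deployment no longer needs the internet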

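I will keep experimenting and follow up with how to connect the collection side and how to use SkyWalking in practice. Feel free to nudge me for updates ☺

Further reading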

  • https://github.com/apache/skywalking
  • https://github.com/apache/skywalking-kubernetes