One Article to Understand Filebeat, the Log Collection Workhorse
This article is based on Filebeat 7.7.0 and covers the following aspects:
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz
export #export configuration, index template, etc.
run #run Filebeat (the default command)
test #test the configuration
keystore #manage the secrets keystore
modules #manage configured modules
setup #set up the initial environment
output.elasticsearch.password: "${ES_PWD}"
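The `${ES_PWD}` reference above is resolved from the Filebeat keystore at startup. A minimal setup sketch, run from the Filebeat installation directory:

```shell
./filebeat keystore create        # create the keystore (once per installation)
./filebeat keystore add ES_PWD    # prompts interactively for the secret value
./filebeat keystore list          # verify that the key is stored
```

Any setting in filebeat.yml can then reference the secret as `${ES_PWD}` without putting the password in plain text.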
type: log #input type is log
enabled: true #whether this log input configuration takes effect
paths: #the log files to monitor; globs are resolved with Go's filepath.Glob. Directories are not searched recursively; for example, if you configure:
- /var/log/*/*.log #Filebeat looks for files ending in ".log" in all subdirectories of /var/log, but not for ".log" files directly under /var/log.
recursive_glob.enabled: #enable expanding ** into recursive glob patterns; for example, /foo/** expands to /foo, /foo/*, /foo/*/*, and so on
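As a sketch of the two glob behaviors described above (the paths are illustrative, not from this article's setup):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/*/*.log   # matches .log files one directory below /var/log, not in /var/log itself
    - /var/log/app/**    # with recursive_glob.enabled, expands to /var/log/app, /var/log/app/*, /var/log/app/*/*, ...
```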
encoding: #encoding of the monitored files; both plain and utf-8 can handle Chinese logs
exclude_lines: ['^DBG'] #drop lines that match any of these regular expressions
include_lines: ['^ERR', '^WARN'] #export only lines that match any of these regular expressions
harvester_buffer_size: 16384 #buffer size in bytes that each harvester uses when reading a file
max_bytes: 10485760 #maximum number of bytes a single log message may have; all bytes after max_bytes are discarded and not sent. Default 10MB (10485760)
exclude_files: ['\.gz$'] #a list of regular expressions matching files you want Filebeat to ignore
ignore_older: 0 #default 0 (disabled); accepts values such as 2h or 2m. Note that ignore_older must be greater than close_inactive. Files that have not been updated within this window, or have never been harvested, are ignored
close_* #the close_* options close the harvester after certain criteria or timeouts. Closing a harvester means closing the file handle. If the file is updated after the harvester closes, it is picked up again once scan_frequency has elapsed. However, if the file is moved or deleted while the harvester is closed, Filebeat cannot pick it up again, and any data the harvester has not read is lost.
close_inactive #when enabled, closes the file handle if the file has not been read within the specified time
the countdown starts from the last log line read, not from the file's modification time
if a closed file changes, a new harvester is started after the next scan_frequency run
recommendation: set it to a value larger than the update frequency of your logs, and configure multiple inputs for log files with different update rates
it uses an internal timestamp and restarts the countdown each time the last line is read; specify values such as 2h or 5m
close_renamed #when enabled, Filebeat closes the file handle when the file is renamed or moved
close_removed #when enabled, Filebeat closes the file handle when the file is deleted; if this is enabled, clean_removed must be enabled as well
close_eof #suitable for files that are written only once; Filebeat closes the file handle as soon as it reaches the end of the file
close_timeout #when enabled, Filebeat gives each harvester a predefined lifetime; once the timeout is reached the handle is closed, whether or not the file is still being read
close_timeout must not equal ignore_older, otherwise updated files will never be read; if the output is blocked and no event has been sent,
the timeout does not fire, because at least one event must be sent before the harvester can be closed
a value of 0 disables it
clean_inactive #removes the state of previously harvested files from the registry file
it must be greater than ignore_older + scan_frequency, to ensure no state is removed while the file is still being collected
this option helps keep the registry file small, especially when a large number of new files are generated every day
it can also be used to work around Filebeat's inode-reuse problem on Linux
clean_removed #when enabled, Filebeat cleans a file's state from the registry if the file can no longer be found on disk
if you disable close_removed, you must also disable clean_removed
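Putting the timing constraints above together, a hedged sketch with illustrative values (not tuning recommendations; the path is hypothetical):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/app/*.log   # hypothetical path
  close_inactive: 5m       # close the handle after 5 minutes with no new lines
  ignore_older: 2h         # must be greater than close_inactive
  clean_inactive: 3h       # must be greater than ignore_older + scan_frequency
  clean_removed: true      # keep enabled together with close_removed (both default to true)
  scan_frequency: 10s
```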
scan_frequency #how often the input checks the configured paths for new files to harvest; default 10s
tail_files: #if set to true, Filebeat starts reading new files at the end and sends each newly appended line as an event,
instead of re-sending the whole file from the beginning.
symlinks: #allows Filebeat to harvest symlinks in addition to regular files; when harvesting a symlink,
Filebeat opens and reads the original file, even though the symlink's path is reported.
backoff: #defines how aggressively Filebeat checks an open file for updates; default 1s. It is the time Filebeat waits
between checks of a file after EOF is reached.
max_backoff: #the maximum time Filebeat waits before checking a file again after reaching EOF
backoff_factor: #the factor by which the backoff wait time is multiplied on each retry, until max_backoff is reached; default 2
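For example, with the sketch below the wait after EOF grows 1s → 2s → 4s → 8s → 10s and then stays at max_backoff (the values shown happen to be the defaults):

```yaml
backoff: 1s          # first wait after reaching EOF
max_backoff: 10s     # upper bound for the wait
backoff_factor: 2    # each wait is multiplied by this factor until max_backoff is hit
```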
harvester_limit: #limits the number of harvesters started in parallel for one input, which directly affects the number of open file handles
tags #a list of tags added to each event, useful for filtering, e.g. tags: ["json"]
fields #optional fields that add extra information to the output; values may be scalars, tuples, dictionaries, or other nested types
by default they are placed under a fields sub-dictionary:
filebeat.inputs:
fields:
app_id: query_engine_12
fields_under_root #if true, the custom fields are stored at the top level of the output document
multiline.pattern #the regexp pattern that has to be matched
multiline.negate #defines whether the pattern match is negated; default is false
with pattern '^b', negate: false and match: after, consecutive lines that start with b are appended to the preceding line that does not
with negate: true, consecutive lines that do NOT start with b are appended to the preceding line that does
multiline.match # specifies how Filebeat combines matching lines into an event: before or after, depending on negate above
multiline.max_lines #the maximum number of lines that can be combined into one event; lines beyond this are discarded. Default 500
multiline.timeout #if no new matching line is found within this timeout, the pending event is sent anyway. Default 5s
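A common sketch for merging Java stack traces into one event (the timestamp pattern is an assumption about the log format):

```yaml
multiline.pattern: '^\d{4}-\d{2}-\d{2}'   # a new event starts with a date such as 2020-06-01
multiline.negate: true                    # lines NOT matching the pattern...
multiline.match: after                    # ...are appended after the preceding matching line
multiline.max_lines: 500
multiline.timeout: 5s
```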
max_procs #the maximum number of CPUs that can execute simultaneously. Default: the number of logical CPUs available on the system.
name #a name for this Filebeat instance; defaults to the host's hostname
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
# Change to true to enable this input configuration.
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths: # multiple log paths can be configured
- /var/logs/es_aaa_index_search_slowlog.log
- /var/logs/es_bbb_index_search_slowlog.log
- /var/logs/es_ccc_index_search_slowlog.log
- /var/logs/es_ddd_index_search_slowlog.log
#- c:\programdata\elasticsearch\logs\*
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ['^DBG']
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ['^ERR', '^WARN']
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: ['.gz$']
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after, or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
# The Logstash hosts # multiple Logstash hosts, using the load-balancing mechanism
hosts: ["192.168.110.130:5044","192.168.110.131:5044","192.168.110.132:5044","192.168.110.133:5044"]
loadbalance: true # enable load balancing across the hosts
# Optional SSL. By default is off.
# List of root certificates for HTTPS server verifications
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
# Certificate for SSL client authentication
#ssl.certificate: "/etc/pki/client/cert.pem"
# Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"
input {
beats {
port => 5044
}
}
output {
elasticsearch {
hosts => ["http://192.168.110.130:9200"] # multiple hosts can be configured here
index => "query-%{+yyyyMMdd}"
}
}
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
# Change to true to enable this input configuration.
enabled: true
# Paths that should be crawled and fetched. Glob based paths.
paths:
- /var/logs/es_aaa_index_search_slowlog.log
- /var/logs/es_bbb_index_search_slowlog.log
- /var/logs/es_ccc_index_search_slowlog.log
- /var/logs/es_dddd_index_search_slowlog.log
#- c:\programdata\elasticsearch\logs\*
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ['^DBG']
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ['^ERR', '^WARN']
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: ['.gz$']
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after, or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
# Glob pattern for configuration loading
path: ${path.config}/modules.d/*.yml
# Set to true to enable config reloading
reload.enabled: false
# Period on which files under path should be checked for changes
#reload.period: 10s
#==================== Elasticsearch template setting ==========================
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#cloud.auth:
#================================ Outputs =====================================
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
# Protocol - either `http` (default) or `https`.
#protocol: "https"
# Authentication credentials - either API key or username/password.
#api_key: "id:api_key"
username: "elastic"
password: "${ES_PWD}" # password resolved from the keystore
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify an additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
host: "192.168.110.130:5601" # the Kibana endpoint
username: "elastic" # username
password: "${ES_PWD}" # password, pulled from the keystore to avoid a plain-text password
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
# Array of hosts to connect to.
hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
# Protocol - either `http` (default) or `https`.
#protocol: "https"
# Authentication credentials - either API key or username/password.
#api_key: "id:api_key"
username: "elastic" # Elasticsearch username
password: "${ES_PWD}" # Elasticsearch password
# Do not set an index here: since no template is configured, Filebeat automatically creates an index named filebeat-%{[beat.version]}-%{+yyyy.MM.dd}
cd filebeat-7.7.0-linux-x86_64
./filebeat modules enable elasticsearch
./filebeat modules list
./filebeat setup -e
./filebeat -e
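After enabling the module, its filesets can be pointed at your log locations in modules.d/elasticsearch.yml. A hedged sketch, where the paths are assumptions about a typical install rather than values from this article:

```yaml
# modules.d/elasticsearch.yml (sketch; paths are hypothetical)
- module: elasticsearch
  server:
    enabled: true
    var.paths:
      - /var/log/elasticsearch/*.log
  slowlog:
    enabled: true
    var.paths:
      - /var/log/elasticsearch/*_index_search_slowlog.log
```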