[GCP] Sending Logstash Output to Google Pub/Sub

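By default, an ELK stack has Logstash send the data it collects to Elasticsearch, where it can be analyzed or queried.

This time the goal is to load the data collected by Logstash into BigQuery, Google's data warehouse, for analysis and reporting, and eventually for machine learning.

The test environment collects the logs of a service currently in production (OpenStack service logs) with Logstash.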

Filebeat

Filebeat is installed on each service host. The only settings to define are the Logstash server's IP address and port, and which logs to ship.

See the link below for the installation guide. (I wrote an Ansible playbook to deploy it.)

https://www.elastic.co/kr/downloads/beats/filebeat

[ /etc/filebeat/filebeat.yml ]

.................................

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  # To fetch all ".log" files from a specific level of subdirectories
  # /var/log/*/*.log can be used.
  # For each file found under this path, a harvester is started.
  # Make sure no file is defined twice as this can lead to unexpected behaviour.
  paths:
    # - /var/log/syslog
    - /var/log/nova/*.log
    - /var/log/cinder/*.log
    - /var/log/glance/*.log
    - /var/log/keystone/*.log
    - /var/log/neutron/*.log
    - /var/log/ceilometer/*.log
    - /var/log/ceph/*.log
    - /var/log/apache/*.log
    - /var/log/rabbitmq/*.log
    - /var/log/mysql/*.log
    - /var/log/heat/*.log
    - /var/log/gnocchi/*.log

output.logstash:
  # Boolean flag to enable or disable the output module.
  #enabled: true

  # The Logstash hosts
  hosts: ["192.168.76.106:5044"]
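Each entry under paths is a glob pattern, so it helps to check what a pattern actually selects before deploying. The sketch below is only an illustration: it uses Python's glob module as a stand-in for Filebeat's own matching (which should behave the same for simple patterns like these), and the file names and the matching_logs and demo helpers are made up for the example.

```python
import glob
import os
import tempfile


def matching_logs(root, patterns):
    """Return files under `root` matched by Filebeat-style glob patterns."""
    found = set()
    for pattern in patterns:
        # Re-root the absolute pattern under `root` so the demo never touches /var/log.
        found.update(glob.glob(os.path.join(root, pattern.lstrip("/"))))
    return sorted(os.path.relpath(path, root).replace(os.sep, "/") for path in found)


def demo():
    """Build a throwaway log tree and show which files two patterns select."""
    files = [
        "var/log/nova/nova-api.log",
        "var/log/nova/nova-api.log.1",  # rotated file: name does not end in .log
        "var/log/cinder/volume.log",
        "var/log/syslog",               # not covered by the patterns below
    ]
    with tempfile.TemporaryDirectory() as root:
        for rel in files:
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return matching_logs(root, ["/var/log/nova/*.log", "/var/log/cinder/*.log"])


if __name__ == "__main__":
    for name in demo():
        print(name)
```

In this sketch the rotated file nova-api.log.1 is not selected, because *.log only matches names that end in .log.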

Logstash

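Logstash receives logs from Filebeat and can send its output to Elasticsearch, files, and other destinations.

See the link below for installation. (Again, I wrote an Ansible playbook for it.)

https://www.elastic.co/kr/downloads/logstash

Logstash needs three kinds of configuration: input, filter, and output.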

[ /etc/logstash/conf.d ]

/etc/logstash/conf.d# ls

01-beats-input.conf 10-syslog-filter.conf 20-openstack-filter.conf 30-elastic-output.conf.bak patterns

The input section sets the port on which events from Filebeat are received, the character set, and so on.

/etc/logstash/conf.d# cat 01-beats-input.conf

input {
    beats {
        codec => plain {
            charset => "UTF-8"
        }
        port => 5044
        client_inactivity_timeout => 60
    }
}

The filter section extracts only the needed parts of each log line by pattern. A basic syslog filter looks like this:

/etc/logstash/conf.d# cat 10-syslog-filter.conf

filter {
    if [type] == "syslog" {
        grok {
            match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
            add_field => [ "received_at", "%{@timestamp}" ]
            add_field => [ "received_from", "%{host}" ]
        }
        syslog_pri { }
        date {
            match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
        }
    }
}
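To see what the grok pattern pulls out of a line, here is a rough Python equivalent. It is a simplified approximation, not the real grok definitions: the regular expression, the parse_syslog helper, and the sample log line are assumptions made for illustration, and unlike grok this version only matches from the start of the line.

```python
import re

# Simplified stand-ins for the grok patterns used in the filter above.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "  # ~ SYSLOGTIMESTAMP
    r"(?P<syslog_hostname>\S+) "                                        # ~ SYSLOGHOST
    r"(?P<syslog_program>.*?)"                                          # DATA (non-greedy)
    r"(?:\[(?P<syslog_pid>[1-9][0-9]*)\])?: "                           # optional [POSINT]
    r"(?P<syslog_message>.*)"                                           # GREEDYDATA
)


def parse_syslog(line):
    """Return the named fields of a syslog line, or None if it does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real log.
    print(parse_syslog("Apr 11 07:31:47 controller01 nova-api[2143]: Started server"))
```

When the program name carries no PID in brackets, syslog_pid comes back as None, which mirrors the optional group in the grok expression.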

The output section configures where the collected data is sent. The Elasticsearch settings are shown below for reference.

/etc/logstash/conf.d# cat 30-elastic-output.conf

output {
    elasticsearch {
        hosts => ["192.168.76.106"]
        sniffing => true
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
    }
}
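The index setting combines the beat name from the event metadata with the event date, so each day gets its own index. A minimal Python sketch of that expansion follows; the index_name helper and the sample events are hypothetical, and it assumes the date part is taken from the event's @timestamp in UTC.

```python
from datetime import datetime, timedelta, timezone


def index_name(event):
    """Mimic "%{[@metadata][beat]}-%{+YYYY.MM.dd}" for a dict-shaped event."""
    beat = event["@metadata"]["beat"]
    # Assumption: the date is derived from @timestamp converted to UTC.
    timestamp = event["@timestamp"].astimezone(timezone.utc)
    return f"{beat}-{timestamp:%Y.%m.%d}"


if __name__ == "__main__":
    event = {
        "@metadata": {"beat": "filebeat"},
        "@timestamp": datetime(2018, 4, 11, 7, 31, 47, tzinfo=timezone.utc),
    }
    print(index_name(event))
```

Under the UTC assumption, an event stamped 08:00 Korean time lands in the previous day's index, which is worth remembering when looking for logs around midnight.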

Google Pub/Sub

In this setup the logs go not to Elasticsearch but to Google Pub/Sub, a message queue, so that they can be processed in Google Cloud.

The reason is to accumulate the data in BigQuery, which Google is proud of, and try various things with the logs (analysis, ML, and so on).

See the link below for installation.

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-google_pubsub.html

# /usr/share/logstash/bin/logstash-plugin install logstash-input-google_pubsub

Validating logstash-input-google_pubsub

Installing logstash-input-google_pubsub

Installation successful

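Next, create a service account under IAM in your GCP Console. An earlier post covers how to create one.

Place a configuration like the one below under /etc/logstash/conf.d and restart the logstash service. Note that google_pubsub here is an input block: it reads messages from a Pub/Sub subscription rather than publishing Logstash events to a topic.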

input {

    google_pubsub {
        # Your GCP project id (name)
        project_id => "my-project-1234"

        # The topic name below is currently hard-coded in the plugin. You
        # must first create this topic by hand and ensure you are exporting
        # logging to this pubsub topic.
        topic => "logstash-input-dev"

        # The subscription name is customizeable. The plugin will attempt to
        # create the subscription (but use the hard-coded topic name above).
        subscription => "logstash-sub"

        # If you are running logstash within GCE, it will use
        # Application Default Credentials and use GCE's metadata
        # service to fetch tokens.  However, if you are running logstash
        # outside of GCE, you will need to specify the service account's
        # JSON key file below.
        #json_key_file => "/home/erjohnso/pkey.json"
    }
}
output { stdout { codec => rubydebug } }

Then create the Pub/Sub topic and its subscription in the Google Cloud Console, and the setup is complete.

The json_key_file setting above points to the JSON key file created for the service account.
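Before pointing json_key_file at a key, a quick sanity check of the file can rule out the simplest mistakes. The sketch below is an assumption-laden helper, not part of Logstash or any Google tool: check_key and check_key_file are hypothetical names, and the check only looks at a few fields that service-account key files normally contain.

```python
import json

# Fields a service-account key file is expected to carry (a partial list).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(key, expected_project=None):
    """Return a list of problems found in a parsed service-account key."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    if expected_project and key.get("project_id") and key["project_id"] != expected_project:
        problems.append(f"key belongs to project {key['project_id']}, not {expected_project}")
    return problems


def check_key_file(path, expected_project=None):
    """Load the JSON key file at `path` and run the same checks."""
    with open(path, encoding="utf-8") as handle:
        return check_key(json.load(handle), expected_project)
```

For example, check_key_file("/home/erjohnso/pkey.json", "my-project-1234") would return an empty list when the key looks usable for the project_id in the configuration. It does not verify that the key is valid or that the account has Pub/Sub permissions.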

However, perhaps because of an authentication problem with Google, the following error shows up.

[ /var/log/logstash/logstash-plain.log ]

[2018-04-11T07:31:47,884][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}

[2018-04-11T07:32:14,764][ERROR][logstash.inputs.googlepubsub] Error 400: You have not specified an ack ID in the request.

[2018-04-11T07:32:35,196][ERROR][logstash.inputs.googlepubsub] Error 400: You have not specified an ack ID in the request.

[2018-04-11T07:32:55,747][ERROR][logstash.inputs.googlepubsub] Error 400: You have not specified an ack ID in the request.

[2018-04-11T07:33:15,824][ERROR][logstash.inputs.googlepubsub] Error 400: You have not specified an ack ID in the request.

This error needs a closer look, so I will write it up in the next post.
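One possible reading of the message, not confirmed here: in the Pub/Sub REST API a pull response lists receivedMessages, each with an ackId, and the acknowledge call takes an ackIds list. If a pull comes back empty and an acknowledge is sent anyway, the request would carry no ack IDs. Whether the plugin actually does this is an assumption. The sketch below only shows the guard a client could apply, with hypothetical helper names.

```python
def ack_ids(pull_response):
    """Collect ack IDs from a Pub/Sub pull response parsed as a dict."""
    return [item["ackId"] for item in pull_response.get("receivedMessages", [])]


def acknowledge_body(pull_response):
    """Build the acknowledge request body, or None when there is nothing to ack."""
    ids = ack_ids(pull_response)
    if not ids:
        # An empty pull: skip the acknowledge call instead of sending no ack IDs.
        return None
    return {"ackIds": ids}


if __name__ == "__main__":
    print(acknowledge_body({}))
    print(acknowledge_body({"receivedMessages": [{"ackId": "a1", "message": {"data": "aGk="}}]}))
```

If this reading is right, the errors would repeat at each poll while the subscription is empty, which would fit the roughly 20-second spacing in the log above.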
