Merge "Delete the unnecessary space"

Jenkins 2016-10-17 08:05:16 +00:00 committed by Gerrit Code Review
commit 2fde030fd4
5 changed files with 118 additions and 118 deletions


@ -9,13 +9,13 @@ Multi-node Ansible
==================
This blueprint specifies an approach to automate the deployment of OpenStack
using Ansible and Docker best practices. The overriding principles used in
this specification are simplicity, flexibility and optimized deployment speed.
Problem description
===================
Kolla can be deployed multi-node currently. To do so, the environment
variables must be hand edited to define the hosts to connect to for various
services.
@ -42,10 +42,10 @@ Proposed change
===============
The docker-compose tool is single node and does nearly the same job as Ansible
would in this specification. As a result, we recommend deprecating
docker-compose as the default deployment system for Kolla.
To replace it, we recommend Ansible as a technology choice. Ansible is easy
to learn, easy to use, and offers a base set of functionality to solve
deployment as outlined in our four use cases.
@ -53,36 +53,36 @@ We recommend three models of configuration.
The first model is based upon internally configuring the container and having
the container take responsibility for all container configuration including
database setup, database synchronization, and keystone registration. This
model uses docker-compose and docker as dependencies. Existing containers will
be maintained but new container content will use either of the two remaining
models. James Slagle (TripleO PTL on behalf of our downstream TripleO
community) was very clear that he would prefer to see this model stay available
and maintained. As TripleO enters the world of Big Tent, they don't intend to
deploy all of the services, and as such it doesn't make sense to maintain this
legacy operational mode for new container content except on demand of our
downstreams, hopefully with their assistance. This model is called
CONFIG_INSIDE.
The second model and third model configure the containers outside of the
container. These models depend on Ansible and Docker. In the future, the
OpenStack Puppet, OpenStack Chef and TripleO communities may decide to switch
to one of these two models in which case these communities may maintain tooling
to integrate with Kolla. The major difference between these two models is that
one offers immutability and single source of truth (CONFIG_OUTSIDE_COPY_ONCE),
while the third model trades these two properties to allow an Operator to
directly modify configuration files on a system and have the configuration be
live in the container (CONFIG_OUTSIDE_COPY_ALWAYS). Because
CONFIG_OUTSIDE_COPY_ALWAYS requires direct Operator intervention on a node, and
we prefer as a community Operators interact with the tools provided by Kolla,
CONFIG_OUTSIDE_COPY_ONCE will be the default.
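The three models and the chosen default can be summarized in a small sketch. This is only an illustration of the behaviour described above, not Kolla code: the ``should_copy_config`` helper and its ``already_copied`` argument are hypothetical names introduced here.

```python
from enum import Enum


class ConfigModel(Enum):
    """The three configuration models named in this specification."""
    CONFIG_INSIDE = "CONFIG_INSIDE"
    CONFIG_OUTSIDE_COPY_ONCE = "CONFIG_OUTSIDE_COPY_ONCE"
    CONFIG_OUTSIDE_COPY_ALWAYS = "CONFIG_OUTSIDE_COPY_ALWAYS"


# The specification makes CONFIG_OUTSIDE_COPY_ONCE the default.
DEFAULT_MODEL = ConfigModel.CONFIG_OUTSIDE_COPY_ONCE


def should_copy_config(model, already_copied):
    """Hypothetical helper: should external config be copied into the container now?"""
    if model is ConfigModel.CONFIG_INSIDE:
        # The container configures itself; nothing is copied from outside.
        return False
    if model is ConfigModel.CONFIG_OUTSIDE_COPY_ONCE:
        # Copy exactly once, which keeps the running container immutable.
        return not already_copied
    # CONFIG_OUTSIDE_COPY_ALWAYS: Operator edits on the node always take effect.
    return True
```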
We do not have to further enhance two sets of container configuration, but
instead can focus our development effort on the default Ansible configuration
methods. If a defect is found in one of the containers based upon the
CONFIG_INSIDE model, the community will repair it.
Finally we will implement a complete Ansible deployment system. The details
of the implementation are covered in a later section in this specification.
We estimate this will be approximately 1000 LOC defining about 100 Ansible tasks.
We further estimate the total code base when complete will be under 6 KLOC.
@ -97,7 +97,7 @@ best practices while introducing completely customizable configuration.
The CONFIG_OUTSIDE_COPY_ALWAYS model of configuration offers the Operator
greater flexibility in managing their deployment, at greater risk of damaging
their deployment. It trades one set of best practices for another,
specifically the Kolla container best practices for flexibility.
Security impact


@ -9,8 +9,8 @@ Containerize OpenStack
======================
When upgrading or downgrading OpenStack, it is possible to use package based
management or image-based management. Containerizing OpenStack is meant to
optimize image-based management of OpenStack. Containerizing OpenStack
solves a manageability and availability problem with the current state of the
art deployment systems in OpenStack.
@ -20,34 +20,34 @@ Problem description
Current state of the art deployment systems use either image based or package
based upgrade.
Image based upgrades are utilized by TripleO. When TripleO updates a system,
it creates an image of the entire disk and deploys that rather than just the
parts that compose the OpenStack deployment. This results in significant
loss of availability. Further running VMs are shut down in the imaging
process. However, image based systems offer atomicity, because all related
software for a service is updated in one atomic action by reimaging the system.
Other systems use package based upgrade. Package based upgrades suffer from
a non-atomic nature. An update may update 1 or more RPM packages. The update
process could fail for any number of reasons, and there is no way to back
out the existing changes. Typically in an OpenStack deployment it is
desirable to update a service that does one thing, including its dependencies,
as an atomic unit. Package based upgrades do not offer atomicity.
To solve this problem, containers can be used to provide an image-based update
approach which offers atomic upgrade of a running system with minimal
interruption in service. A rough prototype of compute upgrade [1] shows
approximately a 10 second window of unavailability during a software update.
The prototype keeps virtual machines running without interruption.
Use cases
---------
1. Upgrade or rollback OpenStack deployments atomically. End-user wants to
change the running software versions in her system to deploy a new upstream
release without interrupting service for significant periods.
2. Upgrade OpenStack by component. End-user wants to upgrade her system
in fine-grained chunks to limit damage from a failed upgrade.
3. Roll back OpenStack by component. End-user experienced a failed
upgrade and wishes to rollback to the last known good working version.
@ -180,16 +180,16 @@ The various container sets are composed in more detail as follows:
* swift-proxy-server
In order to achieve the desired results, we plan to permit super-privileged
containers. A super-privileged container is defined as any container launched
with the --privileged=true flag to docker that:
* bind-mounts specific security-crucial host operating system directories
with -v. This includes nearly all directories in the filesystem except for
leaf directories with no other host operating system use.
* shares any namespace with the --ipc=host, --pid=host, or --net=host flags
We will not use the Docker EXPOSE operation since all containers will use
--net=host. One motive for using --net=host is it is inherently simpler.
A different motive for not using EXPOSE is the 20 microsecond penalty
applied to every packet forwarded and returned by docker-proxy.
If EXPOSE functionality is desired, it can be added back by
@ -207,12 +207,12 @@ If the container does not pass its healthcheck operation, it should be
restarted.
Integration of metadata with fig or a similar single node Docker orchestration
tool will be implemented. Even though fig executes on a single node, the
containers will be designed to run multi-node and the deploy tool should take
some form of information to allow it to operate multi-node. The deploy tool
should take a set of key/value pairs as inputs and convert them into inputs
into the environment passed to Docker. These key/value pairs could be a file
or environment variables. We will not offer integration with multi-node
scheduling or orchestration tools, but instead expect our consumers to manage
each bare metal machine using our integration with fig or a similar tool.
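As a rough sketch of the key/value handling described above, the following Python shows one way a deploy tool might read pairs from a file, let environment variables override them, and turn the result into ``-e`` arguments for ``docker run``. The function names, the ``KOLLA_`` prefix and the rule that the environment wins over the file are assumptions made for this illustration only.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def docker_env_args(file_pairs, environ=None, prefix="KOLLA_"):
    """Merge file pairs with prefixed environment variables (environment wins)
    and return them as a list of `-e KEY=VALUE` arguments for `docker run`."""
    environ = os.environ if environ is None else environ
    merged = dict(file_pairs)
    # Assumption: only variables carrying the prefix are forwarded.
    merged.update({k: v for k, v in environ.items() if k.startswith(prefix)})
    args = []
    for key in sorted(merged):
        args += ["-e", f"{key}={merged[key]}"]
    return args
```

The returned list can be appended to a ``docker run`` command line on each node, which keeps the containers themselves unaware of where the values came from.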
@ -220,7 +220,7 @@ Any contributions from the community of the required metadata to run these
containers using a multi-node orchestration tool will be warmly received but
generally won't be maintained by the core team.
The technique for launching the deploy script is not handled by Kolla. This
is a problem for a higher level deployment tool such as TripleO or Fuel to
tackle.
@ -229,7 +229,7 @@ Logs from the individual containers will be retrievable in some consistent way.
Security impact
---------------
Container usage with super-privileged mode may possibly impact security. For
example, when using --net=host mode and bind-mounting /run which is necessary
for a compute node, it is possible that a compute breakout could corrupt the
host operating system.


@ -6,7 +6,7 @@ https://blueprints.launchpad.net/kolla/+spec/kolla-kubernetes
Kubernetes was evaluated by the Kolla team in the first two months of the
project and it was found to be problematic because it did not support net=host,
pid=host, and --privileged features in docker. Since then, it has developed
these features [1].
The objective is to manage the lifecycle of containerized OpenStack services by
@ -51,7 +51,7 @@ Orchestration
-------------
OpenStack on Kubernetes will be orchestrated by outside tools in order to create
a production ready OpenStack environment. The kolla-kubernetes repo is where
any deployment tool can join the community and be a part of orchestrating a
kolla-kubernetes deployment.
@ -60,10 +60,10 @@ Service Config Management
Config generation will be completely decoupled from the deployment. The
containers only expect a config file to land in a specific directory in
the container in order to run. With this decoupled model, any tool could be
used to generate config files. The kolla-kubernetes community will evaluate
any config generation tool, but will likely use Ansible for config generation
in order to reuse existing work from the community. This solution uses
customized Ansible and jinja2 templates to generate the config. Also, there will
be a maintained set of defaults and a global yaml file that can override the
defaults.
@ -82,7 +82,7 @@ will be a Kubernetes Job, which will run the task until completion then
terminate the pods [7].
Each service will have a bootstrap task so that when the operator upgrades,
the bootstrap tasks are reused to upgrade the database. This will allow
deployment and upgrades to follow the same pipeline.
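A bootstrap task expressed as a Kubernetes Job might look roughly like the manifest built below. This is a sketch under assumptions: the ``bootstrap_job`` helper, the ``<service>-bootstrap`` naming and the example image are hypothetical, and only the generic ``batch/v1`` Job fields are used.

```python
import json


def bootstrap_job(service, image, command):
    """Build a Kubernetes Job manifest (batch/v1) as a plain dict."""
    name = f"{service}-bootstrap"  # hypothetical naming convention
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                    # Run the task to completion; do not restart the container.
                    "restartPolicy": "Never",
                }
            }
        },
    }


if __name__ == "__main__":
    # The same Job definition could be reused on deployment and on upgrade.
    job = bootstrap_job("glance", "example/glance-api:latest",
                        ["glance-manage", "db_sync"])
    print(json.dumps(job, indent=2))
```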
The Kolla containers will communicate with the Kubernetes API server to in order
@ -96,14 +96,14 @@ require some orchestration and the bootstrap pod will need to be setup to
never restart or be replicated.
2) Use a sidecar container in the pod to handle the database sync with proper
health checking to make sure the services are coming up healthy. The big
difference between kolla's old docker-compose solution and Kubernetes, is that
docker-compose would only restart the containers. Kubernetes will completely
reschedule them, which means removing the pod and restarting it. The reason
this would fix that race condition failure kolla saw from docker-compose is
because glance would be rescheduled on failure allowing keystone to get a
chance to sync with the database and become active instead of constantly being
piled with glance requests. There can also be health checks around this to help
determine order.
If kolla-kubernetes used this sidecar approach, it would regain the use of
@ -116,12 +116,12 @@ Dependencies
- Docker >= 1.10.0
- Jinja2 >= 2.8.0
Kubernetes does not support dependencies between pods. The operator will launch
all the services and use kubernetes health checks to bring the deployment to an
operational state.
With orchestration around Kubernetes, the operator can determine what tasks are
run and when the tasks are run. This way, dependencies are handled at the
orchestration level, but they are not required because proper health checking
will bring up the cluster in a healthy state.
@ -133,7 +133,7 @@ desired state for the pods and the deployment will move the cluster to the
desired state when a change is detected.
Kolla-kubernetes will provide Jobs that will provide the operator with the
flexibility needed to undergo a stepwise upgrade. In future releases,
kolla-kubernetes will look to Kubernetes to provide a means for operators to
plug these jobs into a Deployment.
@ -141,22 +141,22 @@ Reconfigure
-----------
The operator generates a new config and loads it into the Kubernetes configmap
by changing the configmap version in the service yaml file. Then, the operator
will trigger a rolling upgrade, which will scale down old pods and bring up new
ones that will run with the updated configuration files.
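The version change that triggers the rolling upgrade can be pictured with the sketch below. It assumes a hypothetical ``<name>-v<N>`` naming scheme for configmaps and operates on a pod spec already loaded into a Python dict; neither detail comes from this specification.

```python
import copy
import re


def bump_configmap_version(pod_spec):
    """Return a copy of pod_spec in which every configMap volume named
    '<name>-v<N>' points at '<name>-v<N+1>' instead."""
    updated = copy.deepcopy(pod_spec)
    for volume in updated.get("volumes", []):
        config_map = volume.get("configMap")
        if not config_map:
            continue  # not a configmap-backed volume
        match = re.fullmatch(r"(.+)-v(\d+)", config_map["name"])
        if match:
            next_version = int(match.group(2)) + 1
            config_map["name"] = f"{match.group(1)}-v{next_version}"
    return updated
```

Because the pod template changes, a rolling upgrade would replace the old pods with new ones that mount the new configmap.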
There's an open issue upstream in Kubernetes where the plan is to add support
around detecting whether the configmap used by a pod has changed [6]. Depending on what
the solution is, kolla-kubernetes may or may not use it. The rolling
upgrade feature will provide kolla-kubernetes with an elegant way to handle
restarting the services.
HA Architecture
---------------
Kubernetes uses health checks to bring up the services. Therefore,
kolla-kubernetes will use the same checks when monitoring if a service is
healthy. When a service fails, the replication controller will be responsible
for bringing up a new container in its place [8][9].
However, Kubernetes does not cover all the HA corner cases, for instance,
@ -178,14 +178,14 @@ guarantee a pod will always be scheduled to a host, it makes node based
persistent storage unlikely, unless the community uses labels for every pod.
Persistent storage in kolla-kubernetes will come from volumes backed by
different storage offerings to provide persistent storage. Kolla-kubernetes
will provide a default solution using Ceph RBD, which the community will use for
multinode deployments. From there, kolla-kubernetes can add any additional
persistent storage options as well as support options for the operator to
reference an existing storage solution.
To deploy Ceph, the community will use the Ansible playbooks from Kolla to
deploy a containerized Ceph at least for the 1.0 release. After Kubernetes
deployment matures, the community can evaluate building its own Ceph deployment
solution.
@ -198,9 +198,9 @@ Service Roles
At the broadest level, OpenStack can split up into two main roles, Controller
and Compute. With Kubernetes, the role definition layer changes.
Kolla-kubernetes will still need to define Compute nodes, but not Controller
nodes. Compute nodes hold the libvirt container and the running vms. That
service cannot migrate because the vms associated with it exist on the node.
However, the Controller role is more flexible. The Kubernetes layer provides IP
persistence so that APIs will remain active and abstracted from the operator's
view [15]. kolla-kubernetes can direct Controller services away from the Compute
node using labels, while managing Compute services more strictly.
@ -244,7 +244,7 @@ To reuse Kolla's containers, kolla-kubernetes will use elastic search, heka, and
kibana as the default logging mechanism.
The community will implement centralized logging by using a 'side car' container
in the Kubernetes pod [17]. The logging service will trace the logs from the
shared volume of the running service and send the data to elastic search. This
solution is ideal because volumes are shared among the containers in a pod.


@ -4,15 +4,15 @@ Logging with Heka
https://blueprints.launchpad.net/kolla/+spec/heka
Kolla currently uses Rsyslog for logging. And Change Request ``252968`` [1]
suggests to use ELK (Elasticsearch, Logstash, Kibana) as a way to index all the
logs, and visualize them.
This spec suggests using Heka [2] instead of Logstash, while still using
Elasticsearch for indexing and Kibana for visualization. It also discusses
the removal of Rsyslog along the way.
What is Heka? Heka is open-source stream processing software created and
maintained by Mozilla.
Using Heka will provide a lightweight and scalable log processing solution
@ -22,7 +22,7 @@ Problem description
===================
Change Request ``252968`` [1] adds an Ansible role named "elk" that enables
deploying ELK (Elasticsearch, Logstash, Kibana) on nodes with that role. This
spec builds on that work, proposing a scalable log processing architecture
based on the Heka [2] stream processing software.
@ -34,7 +34,7 @@ OpenStack nodes rather than using a centralized log processing engine that
represents a bottleneck and a single-point-of-failure.
We also know from experience that Heka provides all the necessary flexibility
for processing other types of data streams than log messages. For example, we
already use Heka together with Elasticsearch for logs, but also with collectd
and InfluxDB for statistics and metrics.
@ -53,16 +53,16 @@ in a dedicated container, referred to as the Heka container in the rest of this
document.
Each Heka instance reads and processes the logs local to the node it runs on,
and sends these logs to Elasticsearch for indexing. Elasticsearch may be
distributed on multiple nodes for resiliency and scalability, but that part is
outside the scope of this specification.
Heka, written in Go, is fast and has a small footprint, making it possible to
run it on every node of the cluster. In contrast, Logstash runs in a JVM and
is known [3] to be too heavy to run on every node.
Another important aspect is flow control and avoiding the loss of log messages
in case of overload. Heka's filter and output plugins, and the Elasticsearch
output plugin in particular, support the use of a disk based message queue.
This message queue allows plugins to reprocess messages from the queue when
downstream servers (Elasticsearch) are down or cannot keep up with the data
@ -74,20 +74,20 @@ which introduces some complexity and other points-of-failures.
Remove Rsyslog
--------------
Kolla currently uses Rsyslog. The Kolla services are configured to write their
logs to Syslog. Rsyslog gets the logs from the ``/var/lib/kolla/dev/log`` Unix
socket and dispatches them to log files on the local file system. Because
Rsyslog runs in a Docker container, the log files are stored in a Docker volume
(named ``rsyslog``).
With Rsyslog already running on each cluster node, the question of using two
log processing daemons, namely ``rsyslogd`` and ``hekad``, has been raised on
the mailing list. The spec evaluates the possibility of using ``hekad`` only,
based on some prototyping work we have conducted [4].
Note: Kolla doesn't currently collect logs from RabbitMQ, HAProxy and
Keepalived. For RabbitMQ the problem is related to RabbitMQ not having the
capability to write its logs to Syslog. HAProxy and Keepalived do have that
capability, but the ``/var/lib/kolla/dev/log`` Unix socket file is currently
not mounted into the HAProxy and Keepalived containers.
@ -96,21 +96,21 @@ Use Heka's ``DockerLogInput`` plugin
To remove Rsyslog and only use Heka one option would be to make the Kolla
services write their logs to ``stdout`` (or ``stderr``) and rely on Heka's
``DockerLogInput`` plugin [5] for reading the logs. Our experiments have
revealed a number of problems with this option:
* The ``DockerLogInput`` plugin doesn't currently work for containers that have
a ``tty`` allocated. And Kolla currently allocates a tty for all containers
(for good reasons).
* When ``DockerLogInput`` is used there is no way to differentiate log messages
for containers producing multiple log streams. ``neutron-agents`` is an
example of such a container. (Sam Yaple has raised that issue multiple
times.)
* If Heka is stopped and restarted later then log messages will be lost, as the
``DockerLogInput`` plugin doesn't currently have a mechanism for tracking its
positions in the log streams. This is in contrast to the ``LogstreamerInput``
plugin [6] which does include that mechanism.
For these reasons we think that relying on the ``DockerLogInput`` plugin may
@ -119,7 +119,7 @@ not be a practical option.
For the record, our experiments have also shown that logs written by the
OpenStack containers to ``stdout`` are visible to neither Heka nor ``docker logs``.
This problem is not reproducible when ``stderr`` is used rather than
``stdout``. The cause of this problem is currently unknown. And it looks like
other people have come across that issue [7].
Use local log files
@ -129,7 +129,7 @@ Another option consists of configuring all the Kolla services to log into local
files, and using Heka's ``LogstreamerInput`` plugin [5].
This option involves using a Docker named volume, mounted both into the service
containers (in ``rw`` mode) and into the Heka container (in ``ro`` mode). The
services write logs into files placed in that volume, and Heka reads logs from
the files found in that volume.
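The read-write and read-only mounts of the shared volume could be expressed as follows. This is a minimal sketch: the helper name, the volume name ``kolla_logs``, the mount point and the image names are placeholders chosen for illustration, and only the standard ``docker run -v <volume>:<path>:<mode>`` syntax is relied upon.

```python
def docker_run_with_logs(image, volume, mount_point, read_only):
    """Build a `docker run` argument list that mounts a named log volume."""
    mode = "ro" if read_only else "rw"
    return ["docker", "run", "-d",
            "-v", f"{volume}:{mount_point}:{mode}",
            image]


# A service container writes into the volume; the Heka container only reads it.
service_cmd = docker_run_with_logs("example/nova-api", "kolla_logs",
                                   "/var/log/kolla", read_only=False)
heka_cmd = docker_run_with_logs("example/heka", "kolla_logs",
                                "/var/log/kolla", read_only=True)
```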
@ -138,28 +138,28 @@ And it relies on Heka's ``LogstreamerInput`` plugin, which, based on our
experience, is efficient and robust.
Keeping file logs locally on the nodes has been established as a requirement by
the Kolla developers. With this option, and the Docker volume used, meeting
that requirement necessitates no additional mechanism.
For this option to be applicable the services must have the capability of
logging into files. Most of the Kolla services have this capability. The
exceptions are HAProxy and Keepalived, for which a different mechanism should
be used (described further down in the document). Note that this will make it
possible to collect logs from RabbitMQ, which does not support logging to
Syslog but does support logging to a file.
Also, this option requires that the services have the permission to create
files into the Docker volume, and that Heka has the permission to read these
files. This means that the Docker named volume will have to have appropriate
owner, group and permission bits. With the Heka container running under
a specific user (see below) this will mean using an ``extend_start.sh`` script
including ``sudo chown`` and possibly ``sudo chmod`` commands. Our prototype
[4] already includes this.
As mentioned already the ``LogstreamerInput`` plugin includes a mechanism for
tracking positions in log streams. This works with journal files stored on the
file system (in ``/var/cache/hekad``). A specific volume, private to Heka,
will be used for these journal files. In this way no logs will be lost if the
Heka container is removed and a new one is created.
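To illustrate why persisting positions prevents log loss, here is a simplified Python model of the idea. It is not Heka's implementation: the JSON journal format and the ``read_new_lines`` function are assumptions made for this sketch, which only shows that a reader restarted with the same journal resumes where it stopped.

```python
import json
import os


def read_new_lines(log_path, journal_path):
    """Return the complete lines appended to log_path since the byte offset
    saved in journal_path, then save the new offset."""
    offset = 0
    if os.path.exists(journal_path):
        with open(journal_path) as journal:
            offset = json.load(journal).get("offset", 0)
    # Start over if the file shrank, for example after truncation.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    with open(journal_path, "w") as journal:
        json.dump({"offset": offset + end}, journal)
    return lines
```

Keeping the journal file on a volume that outlives the container is what lets a newly created container continue from the saved offset.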
Handling HAProxy and Keepalived
@ -174,7 +174,7 @@ This works by using Heka's ``UdpInput`` plugin with its ``net`` option set
to ``unixgram``.
This also requires that a Unix socket is created by Heka, and that socket is
mounted into the HAProxy and Keepalived containers. For that we will use the
same technique as the one currently used in Kolla with Rsyslog, that is
mounting ``/var/lib/kolla/dev`` into the Heka container and mounting
``/var/lib/kolla/dev/log`` into the service containers.
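For readers unfamiliar with ``unixgram`` sockets, the sketch below shows the underlying mechanism using Python's standard ``socket`` module: a process binds a Unix datagram socket at a path, and any client able to see that path can send Syslog-style datagrams to it. The ``open_log_socket`` helper is hypothetical and stands in for what Heka does internally.

```python
import os
import socket


def open_log_socket(path):
    """Bind a Unix datagram (unixgram) socket at path, replacing a stale file."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def send_log(path, message):
    """Send one datagram to the socket at path, as a Syslog client would."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.sendto(message, path)
    finally:
        client.close()
```

Mounting the directory that holds the socket file into another container is what makes the path visible to the sending side.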
@ -182,7 +182,7 @@ mounting ``/var/lib/kolla/dev`` into the Heka container and mounting
Our prototype already includes some code demonstrating this. See [4].
Also, to be able to store a copy of the HAProxy and Keepalived logs locally on
the node, we will use Heka's ``FileOutput`` plugin. We will possibly create
two instances of that plugin, one for HAProxy and one for Keepalived, with
specific filters (``message_matcher``).
@ -190,29 +190,29 @@ Read Python Tracebacks
----------------------
In case of exceptions the OpenStack services log Python Tracebacks as multiple
log messages. If no special care is taken then the Python Tracebacks will be
indexed as separate documents in Elasticsearch, and displayed as distinct log
entries in Kibana, making them hard to read. To address that issue we will use
a custom Heka decoder, which will be responsible for coalescing the log lines
making up a Python Traceback into one message. Our prototype includes that
decoder [4].
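The coalescing logic can be sketched in a few lines of Python. This is an illustration of the idea only, not the decoder from the prototype: it assumes the per-line log prefix (timestamp, level and so on) has already been stripped, and the ``coalesce_tracebacks`` name is introduced here.

```python
def coalesce_tracebacks(lines):
    """Merge the lines of each Python traceback into a single message."""
    messages = []
    current = None  # lines of the traceback being assembled, if any
    for line in lines:
        if line.startswith("Traceback (most recent call last):"):
            if current is not None:
                messages.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
            # Frame and source lines are indented; the first unindented line
            # is the exception itself and closes the traceback.
            if not line.startswith(" "):
                messages.append("\n".join(current))
                current = None
        else:
            messages.append(line)
    if current is not None:
        messages.append("\n".join(current))
    return messages
```

With this grouping, each traceback would be indexed as one document in Elasticsearch instead of one document per line.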
Collect system logs
-------------------
In addition to container logs we think it is important to collect system logs
as well. For that we propose to mount the host's ``/var/log`` directory into
the Heka container, and configure Heka to get logs from standard log files
located in that directory (e.g. ``kern.log``, ``auth.log``, ``messages``). The
list of system log files will be determined at development time.
Log rotation
------------
Log rotation is an important aspect of the logging system. Currently Kolla
doesn't rotate logs. Logs just accumulate in the ``rsyslog`` Docker volume.
The work on Heka proposed in this spec isn't directly related to log rotation,
but we are suggesting to address this issue for Mitaka. This will mean
but we are suggesting to address this issue for Mitaka. This will mean
creating a new container that uses ``logrotate`` to manage the log files
created by the Kolla containers.
@ -220,33 +220,33 @@ Create an ``heka`` user
-----------------------
For security reasons an ``heka`` user will be created in the Heka container and
the ``hekad`` daemon will run under that user. The ``heka`` user will be added
to the ``kolla`` group, to make sure that Heka can read the log files created
by the services.
Security impact
---------------
Heka is a mature product maintained and used in production by Mozilla, so we
trust it to be secure. We also trust the Heka developers to respond seriously
should security vulnerabilities be found in the Heka code.
As described above, we are proposing to use a Docker volume between the service
containers and the Heka container. The group of the volume directory and the
log files will be ``kolla``, and the owner of the log files will be the user
that executes the service producing logs. But the ``gid`` of the ``kolla``
group and the ``uid`` values of the users executing the services may correspond
to a different group and different users on the host system. This means
that the permissions may not be right on the host system. This problem is
not specific to this specification, and it already exists in Kolla (for
the mariadb data volume for example).
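
The mismatch can be observed by resolving the numeric owner of a log file
against the host's own user and group databases. The following is a minimal
sketch, assuming a Unix host and only Python's standard ``os``, ``pwd`` and
``grp`` modules; the function name ``describe_owner`` and the example path are
hypothetical and not part of Kolla.

```python
import grp
import os
import pwd
from typing import Dict, Optional, Union


def _user_name(uid: int) -> Optional[str]:
    """Return the host's name for a uid, or None if the host has no such user."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def _group_name(gid: int) -> Optional[str]:
    """Return the host's name for a gid, or None if the host has no such group."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return None


def describe_owner(path: str) -> Dict[str, Union[int, str, None]]:
    """Report the numeric owner of a file and what those ids mean on this host."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "user": _user_name(st.st_uid),
        "gid": st.st_gid,
        "group": _group_name(st.st_gid),
    }


if __name__ == "__main__":
    # Hypothetical path: replace with a log file from the shared volume.
    print(describe_owner("/var/log"))
```

If the reported ``group`` is not ``kolla`` (or is ``None``) when run on the
host, the ``gid`` used inside the containers maps to a different group, or to
no group at all, on the host system.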
Performance Impact
------------------
The ``hekad`` daemon will run in a container on each cluster node, but
``rsyslogd`` will be removed, and we have assessed that Heka is lightweight
enough to run on every node. Also, a possible option would be to constrain the
Heka container to only use a defined amount of resources.
Alternatives
@ -256,12 +256,12 @@ An alternative to this proposal involves using Logstash in a centralized
way as done in [1].
Another alternative would be to execute Logstash on each cluster node, as this
spec proposes with Heka. But this would mean running a JVM on each cluster
node, and using Redis as a centralized queue.
Also, as described above, we initially considered relying on services writing
their logs to ``stdout`` and using Heka's ``DockerLogInput`` plugin. But our
prototyping work has demonstrated the limits of that approach. See the
``DockerLogInput`` section above for more information.
Implementation


@ -8,8 +8,8 @@
This template should be in reStructuredText. The filename in the git
repository should match the Launchpad URL, for example a URL of
https://blueprints.launchpad.net/kolla/+spec/awesome-thing should be named
awesome-thing.rst . Please do not delete any of the sections in this
template. If you have nothing to say for a whole section, just write: None
For help with syntax, see http://sphinx-doc.org/rest.html
To test out your formatting, see http://www.tele3.cz/jbar/rest/rest.html
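
The filename rule stated above can be expressed as a small helper. This is a
sketch only, assuming the spec name is the last path segment of the blueprint
URL; the function name ``spec_filename`` is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def spec_filename(blueprint_url: str) -> str:
    """Derive the expected .rst filename from a blueprint URL.

    Assumption: the spec name is the last segment of the URL path.
    """
    path = urlparse(blueprint_url).path.rstrip("/")
    name = posixpath.basename(path)
    if not name:
        raise ValueError("URL has no blueprint name: %r" % blueprint_url)
    return name + ".rst"


if __name__ == "__main__":
    print(spec_filename("https://blueprints.launchpad.net/kolla/+spec/awesome-thing"))
```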