Initial Sahara documentation

Includes:
* Sahara Overview
* Installation instructions
* Basic verification steps

Change-Id: Ic68c37527f99c72084c156487d9f9829af49def6
Andrew Lazarev 2014-10-02 15:02:26 -07:00
parent a570021ccf
commit d3a6b6b68b
6 changed files with 191 additions and 0 deletions


@ -204,6 +204,7 @@
<xi:include href="section_getstart_telemetry.xml"/>
<xi:include href="section_getstart_orchestration.xml"/>
<xi:include href="section_getstart_trove.xml"/>
<xi:include href="section_getstart_sahara.xml"/>
</section>
<section xml:id="feedback">
<title>Feedback</title>


@ -0,0 +1,48 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="sahara-service">
<title>Data processing service</title>
<para>The Data processing service for OpenStack (sahara) aims to provide
users with a simple means to provision data processing (Hadoop, Spark)
clusters by specifying several parameters, such as the Hadoop version,
cluster topology, and node hardware details. After a user fills in
all the parameters, the Data processing service deploys the cluster in a
few minutes. The service also provides a means to scale already
provisioned clusters by adding or removing worker nodes on demand.
</para>
<para>The solution addresses the following use cases:
<itemizedlist>
<listitem><para>Fast provisioning of Hadoop clusters on OpenStack for
development and QA.</para></listitem>
<listitem><para>Utilization of unused compute power from a
general-purpose OpenStack IaaS cloud.</para></listitem>
<listitem><para>Analytics-as-a-Service for ad-hoc or bursty analytic
workloads.</para></listitem>
</itemizedlist>
</para>
<para>Key features are:
<itemizedlist>
<listitem><para>Designed as an OpenStack component.</para></listitem>
<listitem><para>Managed through a REST API, with a UI available as part
of the OpenStack dashboard.</para></listitem>
<listitem><para>Support for different Hadoop distributions:
<itemizedlist>
<listitem><para>Pluggable system of Hadoop installation
engines.</para></listitem>
<listitem><para>Integration with vendor-specific management tools,
such as Apache Ambari or Cloudera Management Console.</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>Predefined templates of Hadoop configurations with
the ability to modify parameters.</para></listitem>
<listitem><para>User-friendly UI for ad-hoc analytics queries based on
Hive or Pig.</para></listitem>
</itemizedlist>
</para>
</section>


@ -219,6 +219,7 @@
<xi:include href="ch_heat.xml"/>
<xi:include href="ch_ceilometer.xml"/>
<xi:include href="ch_trove.xml"/>
<xi:include href="ch_sahara.xml"/>
<xi:include href="ch_launch-instance.xml"/>
<xi:include href="app_reserved_uids.xml"/>
<xi:include href="../common/app_support.xml"/>


@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="ch_sahara">
<title>Add the Data processing service</title>
<para>The Data processing service (sahara) enables users to provide a
scalable data processing stack and associated management interfaces.
This includes provisioning and operation of data processing clusters as
well as scheduling and operation of data processing jobs.
</para>
<warning><para>This chapter is a work in progress. It may contain
incorrect information, and will be updated frequently.</para></warning>
<xi:include href="../common/section_getstart_sahara.xml"/>
<xi:include href="section_sahara-install.xml" />
<xi:include href="section_sahara-verify.xml" />
</chapter>


@ -0,0 +1,96 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="sahara-install">
<title>Install the Data processing service</title>
<para>This procedure installs the Data processing service (sahara) on the
controller node.</para>
<para>To install the Data processing service on the controller:</para>
<procedure>
<step os="rhel;centos;fedora;opensuse;sles">
<para>Install required packages:</para>
<screen os="rhel;centos;fedora"><prompt>#</prompt> <userinput>yum install openstack-sahara python-saharaclient</userinput></screen>
<screen os="opensuse;sles"><prompt>#</prompt> <userinput>zypper install openstack-sahara python-saharaclient</userinput></screen>
</step>
<step os="ubuntu;debian">
<warning><para>You must install the required packages, but sahara
does not yet provide packages for Ubuntu and Debian.
This documentation will be updated once packages are available. The rest
of this document assumes that the sahara service packages are
installed on the system.</para></warning>
</step>
<step>
<para>Edit the <filename>/etc/sahara/sahara.conf</filename> configuration file:</para>
<substeps>
<step><para>First, edit the <option>connection</option> parameter in
the <literal>[database]</literal> section. The URL provided here
should point to an empty database. For instance, the connection
string for a MySQL database is:
<programlisting language="ini">connection = mysql://sahara:<replaceable>SAHARA_DBPASS</replaceable>@<replaceable>controller</replaceable>/sahara</programlisting>
</para></step>
<step><para>Switch to the <literal>[keystone_authtoken]</literal>
section. The <option>auth_uri</option> parameter should point to
the public Identity API endpoint. The <option>identity_uri</option>
parameter should point to the admin Identity API endpoint. For example:
<programlisting language="ini">auth_uri = http://<replaceable>controller</replaceable>:5000/v2.0
identity_uri = http://<replaceable>controller</replaceable>:35357</programlisting>
</para></step>
<step><para>Next, specify <literal>admin_user</literal>,
<literal>admin_password</literal>, and
<literal>admin_tenant_name</literal>. These parameters must identify
an Identity service user that has the <literal>admin</literal> role in
the given tenant. These credentials allow sahara to authenticate and
authorize its users.
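For example, assuming a hypothetical <literal>sahara</literal> service
user in a <literal>service</literal> tenant (both names and the password
placeholder are illustrative assumptions, not values defined in this
procedure), the settings might look like this:
<programlisting language="ini">[keystone_authtoken]
# Illustrative values only; substitute the user, password, and tenant
# that actually exist in your Identity service.
admin_user = sahara
admin_password = <replaceable>SAHARA_PASS</replaceable>
admin_tenant_name = service</programlisting>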
</para></step>
<step><para>Switch to the <literal>[DEFAULT]</literal> section and
configure the networking parameters. If you are using Neutron
for networking, set <literal>use_neutron=true</literal>.
Otherwise, if you are using <systemitem>nova-network</systemitem>, set
the given parameter to <literal>false</literal>.
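For instance, a deployment that uses Neutron might contain the
following (a minimal sketch of the setting described above):
<programlisting language="ini">[DEFAULT]
# Set this to false if the deployment uses nova-network instead.
use_neutron = true</programlisting>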
</para></step>
<step><para>That should be enough for the first run. To increase the
logging level for troubleshooting, use the
<literal>verbose</literal> and <literal>debug</literal> parameters.
If <literal>verbose</literal> is set to
<literal>true</literal>, sahara writes logs of
<literal>INFO</literal> level and above. If
<literal>debug</literal> is set to
<literal>true</literal>, sahara writes all the logs, including
the <literal>DEBUG</literal> ones.
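As a sketch, a troubleshooting configuration that enables both
parameters might look like this:
<programlisting language="ini">[DEFAULT]
# Write messages of INFO level and above.
verbose = true
# Also write DEBUG-level messages.
debug = true</programlisting>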
</para></step>
</substeps>
</step>
<step><para>If you use the Data processing service with a MySQL database,
you must increase the maximum allowed packet size so that large job
binaries can be stored in the sahara internal database. Edit the
<filename>my.cnf</filename> file and change the following parameter:
<programlisting language="ini">[mysqld]
max_allowed_packet = 256M</programlisting>
Then restart the MySQL server.
</para></step>
<step><para>Create the database schema:
<screen><prompt>#</prompt> <userinput>sahara-db-manage --config-file /etc/sahara/sahara.conf upgrade head</userinput></screen>
</para></step>
<step><para>You must register the Data processing service with the Identity
service so that other OpenStack services can locate it. Register the
service and specify the endpoint:
<screen><prompt>$</prompt> <userinput>keystone service-create --name sahara --type data_processing \
--description "Data processing service"</userinput>
<prompt>$</prompt> <userinput>keystone endpoint-create \
--service-id $(keystone service-list | awk '/ sahara / {print $2}') \
--publicurl http://<replaceable>controller</replaceable>:8386/v1.1/%\(tenant_id\)s \
--internalurl http://<replaceable>controller</replaceable>:8386/v1.1/%\(tenant_id\)s \
--adminurl http://<replaceable>controller</replaceable>:8386/v1.1/%\(tenant_id\)s</userinput></screen>
</para></step>
<step><para>Start the sahara service:
<screen os="rhel;centos;fedora;opensuse;ubuntu;debian"><prompt>#</prompt> <userinput>systemctl start openstack-sahara-all</userinput></screen>
<screen os="sles"><prompt>#</prompt> <userinput>service openstack-sahara-all start</userinput></screen>
</para></step>
<step><para>(Optional) Enable the Data processing service to start on boot:
<screen os="rhel;centos;fedora;opensuse;ubuntu;debian"><prompt>#</prompt> <userinput>systemctl enable openstack-sahara-all</userinput></screen>
<screen os="sles"><prompt>#</prompt> <userinput>chkconfig openstack-sahara-all on</userinput></screen>
</para></step>
</procedure>
</section>


@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="sahara-verify">
<title>Verify the Data processing service installation</title>
<para>To verify that the Data processing service (sahara) is installed and
configured correctly, request the list of clusters by using the sahara
client.</para>
<procedure>
<step>
<para>Source the <literal>demo</literal> tenant credentials:</para>
<screen><prompt>$</prompt> <userinput>source demo-openrc.sh</userinput></screen>
</step>
<step>
<para>Retrieve the list of sahara clusters:</para>
<screen><prompt>$</prompt> <userinput>sahara cluster-list</userinput></screen>
<para>You should see output similar to this:</para>
<screen><computeroutput>+------+----+--------+------------+
| name | id | status | node_count |
+------+----+--------+------------+
+------+----+--------+------------+</computeroutput></screen>
</step>
</procedure>
</section>