Multi-node support for factory-install and enroll-init-reconfigure

This update extends factory install and the subsequent enrollment
process to StarlingX multi-node deployments, so that both work
across deployment scenarios other than a single standalone node.

Until now, factory install (introduced in
https://review.opendev.org/c/starlingx/utilities/+/917149) was tested and worked
exclusively on simplex (SX) systems. However, given the variety of StarlingX
deployments, especially multi-node deployments, factory install must
support configurations beyond SX. Consequently, the
enroll-init functionality, a post-factory-install operation to enroll
standalone factory-installed nodes, has been adapted to accommodate
these multi-node configurations.

It is important to note that, during the factory install process,
only the active controller is factory installed. The
remaining nodes are installed and configured later,
post-enrollment. As a result, while the remaining nodes may be added
to the inventory by the deployment manager, they may not be available
during the enrollment process.

The following changes are required to support multi-node
factory-install and enroll-init:

- oam_c0_ip and oam_c1_ip must be specified to modify the
  OAM network (system oam-modify).
- A Puppet manifest must be applied to trigger OAM configuration
  (e.g., update endpoints). Typically, this happens on
  lock/unlock/swact on DX systems. However, the inactive controller
  is not available during enroll-init, so the manifest is applied
  directly by the reconfigure script.
- SYSTEM_RECONCILED check during DX factory-install must be skipped.
  This check does not apply as the system can't be reconciled without
  controller-1 and other nodes.
- The prestaged ISO installation/boot menu must be extended with a
  standard install option (non-AIO, controller install only).

Test plan:
- PASS: Verify successful factory-installation of DX system.
        controller-0 must be unlocked-enabled-available. Other hosts
        should be locked-disabled-offline.
- PASS: Verify factory-install with standard node install: Generate and
        install prestage cloud-init ISO with Standard (controller
        only) boot option.
- PASS: Verify DX reconfiguration with enroll-init-reconfigure
        script.
        - Ensure oam_c0_ip and oam_c1_ip are correctly consumed and
          an error is printed when they are not specified for DX.
          For SX, these must not be specified.
        - Validate endpoint reconfiguration.
- PASS: Verify enroll-init of DX system using dcmanager "subcloud
        add --enroll".

Story: 2011100
Task: 50580

Change-Id: Ia9970378f94e3df0366ce80e7674245d93729c91
Signed-off-by: Salman Rana <salman.rana@windriver.com>
Salman Rana 2024-07-17 16:53:01 -04:00
parent 3b5d1182d4
commit 4959808c73
3 changed files with 128 additions and 5 deletions


@@ -15,11 +15,21 @@ until [ -f /var/run/goenabled ]; do
done
echo "Ready - host goenabled"
system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)
echo "Wait - system deployment reconciled"
while true; do
SYSTEM_RECONCILED=$(kubectl --kubeconfig=/etc/kubernetes/admin.conf -n deployment get system -o jsonpath='{.items[0].status.reconciled}')
if [ "$system_mode" = "duplex" ]; then
SYSTEM_RECONCILED=true
else
SYSTEM_RECONCILED=$(kubectl --kubeconfig=/etc/kubernetes/admin.conf -n deployment get system -o jsonpath='{.items[0].status.reconciled}')
fi
HOST_RECONCILED=$(kubectl --kubeconfig=/etc/kubernetes/admin.conf -n deployment get host controller-0 -o jsonpath='{.status.reconciled}')
if [ $SYSTEM_RECONCILED = true -a $HOST_RECONCILED = true ]; then break; fi
if [ "$SYSTEM_RECONCILED" = true ] && [ "$HOST_RECONCILED" = true ]; then
break
fi
sleep 10
done
echo "Ready - system deployment reconciled"
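
The duplex short-circuit in the hunk above can be sketched in isolation
as follows. The helper names get_system_mode and
skip_system_reconciled_check are hypothetical, and the demo reads a
temporary file rather than /etc/platform/platform.conf. Note that this
sketch matches the key exactly, which is slightly stricter than the
/system_mode/ pattern used in the change.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the duplex short-circuit.

# Read system_mode from a platform.conf-style file (key=value lines).
get_system_mode() {
    awk -F= '$1 == "system_mode" {print $2}' "$1"
}

# Succeeds when the system-level reconciled check can be skipped.
# On duplex, the system cannot reconcile until controller-1 exists.
skip_system_reconciled_check() {
    [ "$1" = "duplex" ]
}

# Demo against a throwaway file instead of the real platform.conf.
conf=$(mktemp)
printf 'nodetype=controller\nsystem_mode=duplex\n' > "$conf"
mode=$(get_system_mode "$conf")
if skip_system_reconciled_check "$mode"; then
    echo "mode=$mode: skipping system reconciled check"
else
    echo "mode=$mode: waiting for system reconciled"
fi
rm -f "$conf"
```

The host-level check for controller-0 still applies in both modes; only
the system-level status is bypassed.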


@@ -42,7 +42,9 @@ Usage:
--oam_subnet <subnet>: Specify OAM subnet
--oam_gateway_ip <ip>: Specify OAM gateway IP
--oam_ip <ip>: Specify OAM IP
--oam_ip <ip>: Specify OAM IP (the floating IP for duplex system)
--oam_c0_ip <ip>: Specify Controller-0 OAM IP (required for duplex systems)
--oam_c1_ip <ip>: Specify Controller-1 OAM IP (required for duplex systems)
--new_password <password>: Specify new password for sysadmin user
ENDUSAGE
}
@@ -107,9 +109,75 @@ function load_credentials {
}
function reconfigure_OAM {
log_info "Reconfiguring OAM with subnet: $OAM_SUBNET, gateway IP: $OAM_GATEWAY_IP, OAM IP: $OAM_IP..."
system oam-modify oam_subnet="$OAM_SUBNET" oam_gateway_ip="$OAM_GATEWAY_IP" oam_ip="$OAM_IP"
system_mode=$(awk -F= '/system_mode/ {print $2}' /etc/platform/platform.conf)
args="oam_subnet=$OAM_SUBNET oam_gateway_ip=$OAM_GATEWAY_IP"
if [ "$system_mode" = "duplex" ]; then
if [ -z "$OAM_C0_IP" ] || [ -z "$OAM_C1_IP" ]; then
log_fatal "Missing required arguments. Please specify both --oam_c0_ip and --oam_c1_ip"
fi
args="$args oam_floating_ip=$OAM_IP oam_c0_ip=$OAM_C0_IP oam_c1_ip=$OAM_C1_IP"
else
args="$args oam_ip=$OAM_IP"
fi
log_info "Reconfiguring OAM: $args ..."
max_retries=10
retries=0
# May fail if the system command is issued too early,
# before the endpoint is reachable to modify oam.
# TODO(srana): Consider checking a flag/service
while [ $retries -lt $max_retries ]; do
if system oam-modify $args; then
break
fi
log_warn "Failed to modify oam. Retrying in 30s ..."
retries=$((retries + 1))
sleep 30
done
check_rc_die $? "system oam-modify failed"
# Apply manifest for duplex systems
if [ "$system_mode" = "duplex" ]; then
log_info "Applying manifest ..."
source /etc/build.info
OAM_CONFIG_DIR="/tmp/oam_config"
rm -rf $OAM_CONFIG_DIR
mkdir -p $OAM_CONFIG_DIR
# The applied manifest must align with sysinv update_oam_config()
cat > $OAM_CONFIG_DIR/oam_runtime.yml <<EOF
classes: ['platform::network::runtime',
'platform::kubernetes::certsans::runtime',
'platform::firewall::runtime',
'platform::smapi',
'platform::sm::update_oam_config::runtime',
'platform::nfv::webserver::runtime',
'platform::haproxy::runtime',
'openstack::keystone::endpoint::runtime::post',
'platform::dockerdistribution::config',
'platform::dockerdistribution::runtime']
EOF
log_info "Wait 2m for system to settle and run puppet-manifest-apply ..."
sleep 120
/usr/local/bin/puppet-manifest-apply.sh \
/opt/platform/puppet/$SW_VERSION/hieradata/ \
controller-0 \
controller \
runtime \
$OAM_CONFIG_DIR/oam_runtime.yml
check_rc_die $? "puppet-manifest-apply failed"
rm -rf $OAM_CONFIG_DIR
fi
}
function reconfigure_password {
@@ -122,6 +190,8 @@ function reconfigure_password {
OAM_SUBNET=""
OAM_GATEWAY_IP=""
OAM_IP=""
OAM_C0_IP=""
OAM_C1_IP=""
NEW_PASSWORD=""
log_info "Starting enroll-init reconfiguration..."
@@ -145,6 +215,14 @@ while [[ "$#" -gt 0 ]]; do
OAM_IP="$2"
shift 2
;;
--oam_c0_ip)
OAM_C0_IP="$2"
shift 2
;;
--oam_c1_ip)
OAM_C1_IP="$2"
shift 2
;;
--new_password)
NEW_PASSWORD="$2"
shift 2
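
One point that may be worth double-checking in the oam-modify retry loop
in this file: in bash, the exit status of a while loop is that of the
last command executed in its body. Because the body ends with sleep,
a $? check placed right after the loop appears to see 0 even when every
attempt failed. A minimal sketch of a retry helper that propagates
failure explicitly is shown below; the names retry and flaky are
hypothetical and not part of this change.

```shell
#!/bin/bash
# Hypothetical retry helper: runs a command up to <max> times, sleeping
# <delay> seconds between attempts, and returns non-zero if all fail.
# Usage: retry <max> <delay> <command> [args...]
retry() {
    local max="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        echo "Attempt $attempt/$max failed. Retrying in ${delay}s ..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Demo with a stand-in command that succeeds on its third call.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]
}

retry 5 0 flaky && echo "succeeded after $calls calls"
```

With a helper like this, the caller can test the return value of the
helper itself instead of relying on $? after the loop.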


@@ -87,6 +87,8 @@ Usage:
3 - Prestage cloud-init All-in-one Graphical Console
4 - Prestage cloud-init All-in-one (lowlatency) Serial Console
5 - Prestage cloud-init All-in-one (lowlatency) Graphical Console
6 - Prestage cloud-init Controller Serial Console
7 - Prestage cloud-init Controller Graphical Console
--timeout <menu timeout>:
Specify boot menu timeout, in seconds. (default 30)
A value of -1 will wait forever.
@@ -324,6 +326,20 @@ menu begin
append ${COMMON_ARGS_LOW_LATENCY} traits=controller,worker,lowlatency ${CLOUDINIT_BOOT_ARG} console=tty0
menu end
menu begin
menu title Prestage cloud-init Controller Install
label 6
menu label Serial Console
kernel /bzImage-std
ipappend 2
append ${COMMON_ARGS_DEFAULT} traits=controller ${CLOUDINIT_BOOT_ARG} console=ttyS0,115200 console=tty0
label 7
menu label Graphical Console
kernel /bzImage-std
ipappend 2
append ${COMMON_ARGS_DEFAULT} traits=controller ${CLOUDINIT_BOOT_ARG} console=tty0
menu end
EOF
done
for f in ${isodir}/EFI/BOOT/grub.cfg ${EFI_MOUNT}/EFI/BOOT/grub.cfg; do
@@ -371,6 +387,17 @@ submenu 'Prestage cloud-init (lowlatency) All-in-one Install' --id=cloud-init-ai
}
}
submenu 'Prestage cloud-init Controller Install' --id=cloud-init-controller {
menuentry 'Serial Console' --id=serial {
linux /bzImage-std ${COMMON_ARGS_DEFAULT} traits=controller ${CLOUDINIT_BOOT_ARG} console=ttyS0,115200 serial
initrd /initrd
}
menuentry 'Graphical Console' --id=graphical {
linux /bzImage-std ${COMMON_ARGS_DEFAULT} traits=controller ${CLOUDINIT_BOOT_ARG} console=tty0
initrd /initrd
}
}
EOF
done
@@ -535,6 +562,14 @@ while :; do
DEFAULT_SYSLINUX_ENTRY=5
DEFAULT_GRUB_ENTRY="cloud-init-aio-lowlat>graphical"
;;
6)
DEFAULT_SYSLINUX_ENTRY=6
DEFAULT_GRUB_ENTRY="cloud-init-controller>serial"
;;
7)
DEFAULT_SYSLINUX_ENTRY=7
DEFAULT_GRUB_ENTRY="cloud-init-controller>graphical"
;;
*)
usage
log_fatal "Invalid default boot menu option: ${DEFAULT_LABEL}"