config/sysinv/ipsec-auth/files
Andy Ning b6ed785641 Fix IPsec certificate renewal fails intermittently
This update contains three changes that fix intermittent failures of
the IPsec certificate renewal cronjob on some nodes in the system.

- Currently ipsec-server caches the Kubernetes apiserver client
certificate and key in the default /tmp directory, but files in this
directory are cleaned periodically by the systemd tmpfiles service.
Once the cached files are removed, ipsec-server fails to query the
apiserver and generates the following SSL error:

SSLError(FileNotFoundError(2, 'No such file or directory')))

This change sets ipsec-server's TMPDIR environment variable in its
systemd service unit file so that it caches the cert/key in
/var/run/ipsec-server. This keeps the cached files from being
cleaned.

- Increase the ots token expiry time in ipsec-server. On some systems
(especially those with a large number of nodes) the certificate
renewal procedure has been observed to take more than 12s. With the
current setting of 7s the procedure fails because the token expires
partway through.

- Enhance the IPsec certificate renewal script so that it retries the
renewal up to 3 times to cover corner-case failures.
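
The first change above can be illustrated with a short sketch. It
assumes that ipsec-server is a Python process whose cached cert/key
files are created through the standard tempfile module without an
explicit directory (the tmp* file names in the test plan are consistent
with that, but this is an assumption). The helper write_cached_file and
the stand-in directory are hypothetical and are not taken from the
actual change.

```python
import os
import tempfile


def write_cached_file(content: bytes) -> str:
    """Write content to a new temp file and return its path.

    No directory is passed, so tempfile picks one via gettempdir(),
    which consults the TMPDIR environment variable first.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


# Stand-in for /var/run/ipsec-server so the sketch runs without root.
cache_dir = tempfile.mkdtemp(prefix="ipsec-server-")

# systemd would set TMPDIR before the process starts. Here it is set at
# runtime, so the directory already cached by tempfile is cleared too.
os.environ["TMPDIR"] = cache_dir
tempfile.tempdir = None

cert_path = write_cached_file(b"placeholder certificate data")

# The file lands in the private directory instead of /tmp.
print(os.path.dirname(cert_path) == cache_dir)
```

Because the directory is chosen from the environment, pointing TMPDIR
at a private location in the unit file moves the cache without any
change to the code that creates the files.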

Test Plan:
PASS: In a DX system, manually apply the change, then reload systemd
      and restart ipsec-server with:
      systemctl daemon-reload
      pmon-restart ipsec-server

      Verify /var/run/ipsec-server directory is created.

      Run "ipsec-client -o 2 pxecontroller" to renew IPsec certificate,
      verify the renewal is successful, and the following 3 tmp files
      are created (file names are random):

      [(keystone_admin)]# ls -lart /var/run/ipsec-server
      total 12
      -rw-------  1 root root 1814 Apr 24 18:40 tmp5843ie2q
      -rw-------  1 root root 1675 Apr 24 18:40 tmpyxwl1bzv
      -rw-------  1 root root 1501 Apr 24 18:40 tmpvep2c6hn

PASS: Remove the cached files in /var/run/ipsec-server, and re-run
      ipsec-client to renew certificate, verify that the SSL error is
      generated in ipsec-auth.log.

PASS: Deploy a DX system, verify the deployment is successful. The tmp
      directory is created, cache files are present, IPsec is working
      properly.

      Run ipsec-client to renew the IPsec certificate, verify the
      certificate renewal is successful.

PASS: Simulate a situation where the ipsec-client would fail the cert
      renewal, run the cronjob script (/usr/bin/ipsec-cert-renew.sh),
      and verify from cron.log that the script retries 3 times.

PASS: Simulate a situation where the ipsec-client would successfully
      renew the certificate, run the cronjob script
      (/usr/bin/ipsec-cert-renew.sh), and verify that the cronjob
      successfully renewed IPsec cert, and there is no fm alarm.
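
The retry behavior exercised by the last two test cases can be sketched
as follows. The real logic lives in the shell script
/usr/bin/ipsec-cert-renew.sh, whose contents are not shown here, so
this Python version is only an illustration. The function
renew_with_retries, its parameters, and the reading of "up to 3
retries" as one initial attempt plus at most three further attempts are
all assumptions.

```python
import time
from typing import Callable


def renew_with_retries(renew: Callable[[], bool],
                       max_retries: int = 3,
                       delay_seconds: float = 0.0) -> bool:
    """Run renew() once, then retry up to max_retries times.

    Returns True as soon as one attempt succeeds and False if the
    initial attempt and every retry fail.
    """
    for attempt in range(1 + max_retries):
        if renew():
            return True
        # Pause between attempts, but not after the final one.
        if attempt < max_retries:
            time.sleep(delay_seconds)
    return False


# Hypothetical renewal that fails twice before succeeding.
outcomes = iter([False, False, True])
renewed = renew_with_retries(lambda: next(outcomes))
print(renewed)
```

A caller would raise an alarm only when the function returns False,
which matches the test case where a successful renewal leaves no fm
alarm.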

Closes-Bug: 2109198

Change-Id: Iefd75a2e8fb217fe12b96240902548a529615a35
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2025-05-02 11:12:36 -04:00