cds-ai/atvm/docs/setup/guide.md
anthony.wen 86b1a0e4a9 Scrub tracked secrets and switch ATVM docs to local credential references
- remove hardcoded credentials, tokens, registration codes, and similar secret values from tracked ATVM and CDS MCP docs
- replace those values with references to /home/aw/code/cds/.env.credentials.local and the corresponding environment variable names
- update current operator guides to instruct sourcing .env.credentials.local before credential-dependent setup and automation workflows
- update the ATVM setup scripts to consume ATVM_TARGET_PASSWORD from the environment instead of hardcoding the Ubuntu root SSH password
- scrub the remaining tracked artifact log entry that still included the old CMC registration code
- keep the local-only credential inventory in .env.credentials.local while leaving that file untracked
2026-03-24 17:32:44 -04:00


# ATVM Setup Script Guide
This file is guide-only documentation for running and maintaining the ATVM setup workflow.
Do not put dated run examples here.
## Scope
- Client setup script: `/home/aw/code/cds/atvm/scripts/atvm-setup-script.sh`
- Controller wrapper: `/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh`
- Run-learnings log: `/home/aw/code/cds/atvm/docs/setup/run-learnings.md`
## Purpose
The setup flow performs a controlled bootstrap across supported Linux distributions:
1. Validate target host identity using expected IP + expected hostname before any configuration.
2. Fix repositories (especially CD/DVD media repo entries).
3. On Ubuntu, enable root SSH password login, with `ATVM_TARGET_PASSWORD` as the root password, for follow-on root operations.
4. On Oracle Linux, set default boot kernel to non-UEK when available.
5. Disable unattended auto-upgrades on Ubuntu.
6. Remove specific storage-related packages and install base tooling.
7. Disable SELinux on Red Hat-family systems.
8. Configure static IP as the final step.
9. Print final summary and write logs to `atvm_setup_script.log`.
10. On SELinux-capable distros, reboot and verify runtime SELinux status post-reboot.
11. Keep client powered on after successful setup so controller-side log collection + SHA256 verification can complete.
12. Power off from controller only after successful verification and no setup errors.
## Execution Model
- Shell safety flags: `set -euo pipefail`
- Logging: colorized console + plain text log file
- Entry point: `main "$@"`
- Default operator assumption for setup access: source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_TARGET_USER` plus `ATVM_TARGET_PASSWORD` unless explicitly overridden.
- When the operator refers to `192.168.3.191`, treat it as the default ATVM target host.
- For SSH to `192.168.3.191`, ignore host key mismatch by default with `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null`.
- For SSH to `192.168.3.191`, source `/home/aw/code/cds/.env.credentials.local` and use `ATVM_TARGET_USER` plus `ATVM_TARGET_PASSWORD` unless the operator explicitly provides different credentials.
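
A minimal sketch of these SSH defaults, assuming `sshpass` is installed on the controller. The helper names `atvm_ssh_opts` and `require_atvm_credentials` are hypothetical and are not taken from the actual wrapper.

```bash
# Hypothetical helper: print the SSH options this guide prescribes, one per line.
atvm_ssh_opts() {
  printf '%s\n' -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
}

# Hypothetical helper: fail early if the credentials file has not been sourced.
require_atvm_credentials() {
  : "${ATVM_TARGET_USER:?source .env.credentials.local first}"
  : "${ATVM_TARGET_PASSWORD:?source .env.credentials.local first}"
}

# Example invocation (not executed here). sshpass -e reads the password from SSHPASS.
#   source /home/aw/code/cds/.env.credentials.local
#   require_atvm_credentials
#   mapfile -t opts < <(atvm_ssh_opts)
#   SSHPASS="$ATVM_TARGET_PASSWORD" sshpass -e ssh "${opts[@]}" \
#     "$ATVM_TARGET_USER@192.168.3.191" hostname
```

These options skip host key verification, so the identity gate described in this guide is what confirms the session reached the intended machine.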
## Mandatory Identity Gate
Setup must not start unless operator explicitly provides both values:
- `--expected-ip <ip>`
- `--expected-hostname <hostname>`
Rules:
- Connect to the operator-provided target IP directly.
- Do not pre-scan alternate candidate IPs.
- Do not infer hostname from target.
- If hostname is missing from request, stop and ask for it.
- If detected hostname does not exactly match expected hostname, stop immediately.
- If expected IP is not assigned on target, stop immediately.
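
The gate rules above can be sketched as a single check. This is an illustration under assumptions, not the body of `validate_target_host_identity`: the function name `check_identity` and its argument order are hypothetical, and the detected hostname and assigned IPs are passed in as arguments so the logic can be exercised without a live target.

```bash
# Usage: check_identity <expected-ip> <expected-hostname> <detected-hostname> <assigned-ip>...
check_identity() {
  local expected_ip=$1 expected_hostname=$2 detected_hostname=$3 ip
  shift 3
  # Both expected values must be supplied by the operator; never infer them.
  if [ -z "$expected_ip" ] || [ -z "$expected_hostname" ]; then
    echo "[ERROR] both --expected-ip and --expected-hostname are required" >&2
    return 1
  fi
  # Exact, case-sensitive hostname match.
  if [ "$detected_hostname" != "$expected_hostname" ]; then
    echo "[ERROR] hostname mismatch: detected '$detected_hostname'" >&2
    return 1
  fi
  # The expected IP must be one of the addresses assigned on the target.
  for ip in "$@"; do
    [ "$ip" = "$expected_ip" ] && return 0
  done
  echo "[ERROR] expected IP $expected_ip is not assigned on this host" >&2
  return 1
}

# On a live target the inputs could be gathered like this (not executed here):
#   mapfile -t ips < <(ip -4 -o addr show | awk '{split($4, a, "/"); print a[1]}')
#   check_identity "$EXPECTED_IP" "$EXPECTED_HOSTNAME" "$(hostname)" "${ips[@]}"
```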
## Canonical Run Order
1. `parse_args`
2. `validate_target_host_identity`
3. `check_sudo`
4. `fix_repositories`
5. `configure_ubuntu_root_ssh_access` (Ubuntu only)
6. `install_sudo_if_needed`
7. `configure_oracle_non_uek_kernel` (Oracle Linux only)
8. `disable_ubuntu_auto_upgrades` (Ubuntu only)
9. `run_package_installation`
10. `disable_selinux` (RHEL-family only)
11. `configure_static_ip` (final configuration step)
12. `print_final_summary`
13. `reboot_and_verify_selinux_if_needed`
14. `poweroff_client_if_successful` (controller-driven after verification)
## Core Behavior By Step
### Repository Fix
- Debian/Ubuntu: comment `cdrom` entries in apt lists and run `apt-get update`.
- RHEL-family/Oracle: disable media/cdrom/dvd repo entries and run `yum clean all && yum makecache`.
- Fedora: same model via `dnf clean all && dnf makecache`.
- openSUSE/SLES: disable CD/DVD repos with `zypper mr -d` and refresh.
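
For the Debian/Ubuntu case, commenting out `cdrom` entries might look like the following sketch. The helper name `comment_cdrom_entries` is hypothetical, the backup suffix is an assumption, and the pattern only handles the classic one-line `deb ... cdrom:` format (not deb822 `.sources` files). It assumes GNU `sed`.

```bash
# Hypothetical helper: comment out active cdrom entries in one apt source list.
comment_cdrom_entries() {
  local list_file=$1
  # Keep a backup next to the original before editing in place.
  cp -p -- "$list_file" "$list_file.bak"
  # Prefix uncommented "deb"/"deb-src" cdrom lines (with optional [options]) with "# ".
  sed -i -E \
    's/^([[:space:]]*deb(-src)?([[:space:]]+\[[^]]*\])?[[:space:]]+cdrom:)/# \1/' \
    "$list_file"
}

# Example (not executed here):
#   comment_cdrom_entries /etc/apt/sources.list && apt-get update
```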
### Oracle Linux Kernel Handling
- Oracle Linux only.
- Select first non-UEK kernel via `grubby --info=ALL` and set GRUB default.
- Track whether default changed and whether reboot is required.
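
A sketch of the kernel selection, assuming UEK kernels can be recognized by `uek` in the kernel path reported by `grubby --info=ALL`. The helper name `first_non_uek_kernel` is hypothetical; the real `configure_oracle_non_uek_kernel` may select differently.

```bash
# Read `grubby --info=ALL` output on stdin and print the first kernel path
# that does not contain "uek". Exit non-zero if none is found.
first_non_uek_kernel() {
  awk -F'"' '
    /^kernel=/ && $2 !~ /uek/ { print $2; found = 1; exit }
    END { exit !found }
  '
}

# Example (not executed here):
#   if kernel=$(grubby --info=ALL | first_non_uek_kernel); then
#     grubby --set-default="$kernel"
#   fi
```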
### Ubuntu Root SSH Workflow
- Ubuntu only.
- Require `ATVM_TARGET_PASSWORD` in the environment, then set the root password to that value and unlock the root account.
- Write `/etc/ssh/sshd_config.d/99-atvm-root-login.conf` enabling root + password auth.
- Validate config and restart SSH service.
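
The drop-in content and surrounding commands could look roughly like this. It is a sketch under assumptions: the helper name `render_root_login_conf` is hypothetical, and the exact directives the real script writes are not shown in this guide.

```bash
# Hypothetical helper: print an sshd drop-in that enables root + password auth.
render_root_login_conf() {
  cat <<'EOF'
# Managed by ATVM setup: allow root login with password authentication.
PermitRootLogin yes
PasswordAuthentication yes
EOF
}

# Example sequence on the Ubuntu target (not executed here):
#   : "${ATVM_TARGET_PASSWORD:?must be set in the environment}"
#   printf 'root:%s\n' "$ATVM_TARGET_PASSWORD" | chpasswd   # set the root password
#   passwd -u root                                           # unlock the root account
#   render_root_login_conf > /etc/ssh/sshd_config.d/99-atvm-root-login.conf
#   sshd -t && systemctl restart ssh                         # validate, then restart
```

Note that `sshd` uses the first value it obtains for each keyword, so a drop-in that sorts earlier in `sshd_config.d` can take precedence over a `99-` file; `sshd -T` prints the effective settings.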
### Ubuntu Auto-Upgrade Disable
- Ubuntu only.
- Update `/etc/apt/apt.conf.d/20auto-upgrades` to disable periodic update/upgrade actions.
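
As a sketch, disabling the periodic actions amounts to setting both `APT::Periodic` values to `"0"`. The helper name `write_auto_upgrades_disabled` is hypothetical, and the real script may edit the file in place rather than overwrite it.

```bash
# Hypothetical helper: write an auto-upgrades file with periodic actions disabled.
# The target path is a parameter so the function can be tried on a scratch file.
write_auto_upgrades_disabled() {
  local target=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
  cat > "$target" <<'EOF'
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF
}
```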
### Package Installation
- Package manager detection order: `apt-get`, `dnf`, `yum`, `zypper`, `pacman`, `apk`.
- Pre-cleanup removes multipath/iSCSI packages where applicable.
- Installs kernel headers per distro.
- Base package set includes:
`curl wget git vim perl gdb scsitools net-tools parted fio ca-certificates python3 elfutils-libelf-devel`
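
The detection order above can be sketched as a first-match loop. The function name `detect_package_manager` is hypothetical; only the order of candidates comes from this guide.

```bash
# Print the first package manager found on PATH, in the documented order.
detect_package_manager() {
  local pm
  for pm in apt-get dnf yum zypper pacman apk; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  echo "[ERROR] no supported package manager found" >&2
  return 1
}
```

Because `dnf` is checked before `yum`, a host that ships both resolves to `dnf`.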
### SELinux Disable
- RHEL-family only.
- If enforcing/permissive, back up `/etc/selinux/config` and rewrite it to set `SELINUX=disabled`.
- Marks reboot recommendation/requirement in summary.
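
A minimal sketch of the backup-and-rewrite step, assuming GNU `sed`. The helper name `disable_selinux_in_config` and the backup naming are hypothetical.

```bash
# Hypothetical helper: back up the SELinux config, then set SELINUX=disabled.
# The config path is a parameter so the function can be tried on a scratch copy.
disable_selinux_in_config() {
  local config=${1:-/etc/selinux/config}
  # Timestamped backup; repeated runs therefore create multiple backups.
  cp -p -- "$config" "$config.bak.$(date +%Y%m%d_%H%M%S)"
  # Only the SELINUX= line is rewritten; SELINUXTYPE= is left untouched.
  sed -i -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/' "$config"
}

# The change applies at the next boot; `getenforce` reports the runtime state.
```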
### Static IP Configuration (Final Step)
Hardcoded target values:
- IP: `192.168.3.191`
- Prefix: `22`
- Gateway: `192.168.0.1`
- DNS: `8.8.8.8`, `8.8.4.4`
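
With a `/22` prefix, `192.168.3.191` belongs to the network `192.168.0.0/22` (addresses `192.168.0.0` through `192.168.3.255`), which is why the gateway `192.168.0.1` is on-link. The arithmetic can be checked with the sketch below; the helper names `ip_to_int` and `same_subnet` are hypothetical and are not part of the setup script.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if <ip>/<prefix> and <gateway> share the same network address.
same_subnet() {
  local ip=$1 prefix=$2 gateway=$3 mask
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$gateway") & mask )) ]
}

# Example:
#   same_subnet 192.168.3.191 22 192.168.0.1 && echo "gateway is on-link"
```

With a `/24` prefix the same check fails, so the prefix and gateway need to be changed together when adapting these values.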
Interface detection priority:
1. default-route interface
2. first non-loopback interface with IPv4
3. first non-loopback interface from link list
Network-stack handling includes `netplan`, `NetworkManager`/`nmcli`, `wicked`, and legacy `ifcfg` fallback patterns.
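
The interface detection priority above can be sketched as three parsers tried in order. All function names here are hypothetical, and the parsing assumes the usual `iproute2` one-line output formats; the real script may detect interfaces differently.

```bash
# 1. Parse `ip route show default` on stdin; print the device after "dev".
default_route_iface() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); found = 1; exit } }
       END { exit !found }'
}

# 2. Parse `ip -o -4 addr show`; print the first non-loopback interface with IPv4.
first_ipv4_iface() {
  awk '$2 != "lo" { print $2; found = 1; exit } END { exit !found }'
}

# 3. Parse `ip -o link show`; print the first non-loopback link (strip "@parent").
first_link_iface() {
  awk -F': ' '{ name = $2; sub(/@.*/, "", name) }
              name != "lo" { print name; found = 1; exit }
              END { exit !found }'
}

# Try each source in priority order; the first one that yields a name wins.
detect_iface() {
  ip route show default | default_route_iface \
    || ip -o -4 addr show | first_ipv4_iface \
    || ip -o link show | first_link_iface
}
```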
### SELinux Reboot Verification
- Applies to `rhel`, `centos`, `rocky`, `almalinux`, `fedora`, `ol` when SELinux changed.
- Creates one-time systemd verifier service before reboot.
- Post-reboot service records runtime `getenforce` and self-removes.
- If setup succeeds with no real error lines, keeps the client on so the controller can copy the log and verify its hash before powering it off.
- On errors, leaves client on for manual inspection.
## Power-State Rules
- After successful setup, keep client powered on until controller log collection + SHA256 verification completes.
- If verification succeeds and the log contains no real error lines (lines matching `^\[ERROR\]`), the controller powers off the client.
- If any real error lines exist, keep client powered on.
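
These rules reduce to a small decision function. The sketch below is illustrative only: the name `should_power_off` and the `yes`/`no` verification flag are hypothetical, while the `^\[ERROR\]` pattern comes from this guide.

```bash
# Usage: should_power_off <collected-log> <hash-verified: yes|no>
# Succeeds only when the hash was verified and no line starts with "[ERROR]".
should_power_off() {
  local log=$1 hash_verified=$2
  [ "$hash_verified" = "yes" ] || return 1
  # A missing or unreadable log is treated as "keep the client on".
  [ -r "$log" ] || return 1
  ! grep -q '^\[ERROR\]' "$log"
}
```

Anchoring the pattern at the start of the line means text that merely mentions `[ERROR]` mid-line does not block power-off.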
## Logging and Verification
- Client log filename: `atvm_setup_script.log`
- Common client log path when run as root: `/root/atvm_setup_script.log`
- Controller collected log naming: `atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log`
Required post-run validation:
1. Copy client log to controller `atvm/log/` path.
2. Compare SHA256 between client and copied controller log.
3. Require exact match.
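
The validation steps above can be sketched as follows, assuming `sha256sum` is available on both sides. The helper names `collected_log_name` and `verify_log_copy` are hypothetical; the wrapper's actual implementation is not shown in this guide.

```bash
# Build the controller-side log name: atvm_configuration_<hostname>_<yyyymmdd_hhmmss>.log
collected_log_name() {
  printf 'atvm_configuration_%s_%s.log\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}

# Usage: verify_log_copy <client-sha256> <local-copy>
# Succeeds only when the local copy hashes to exactly the client-reported digest.
verify_log_copy() {
  local client_sum=$1 local_copy=$2 local_sum
  local_sum=$(sha256sum -- "$local_copy" | awk '{print $1}')
  [ "$client_sum" = "$local_sum" ]
}

# Example flow on the controller (not executed here; SSH options omitted):
#   dest="atvm/log/$(collected_log_name "$EXPECTED_HOSTNAME_ARG")"
#   scp "root@$EXPECTED_IP_ARG:/root/atvm_setup_script.log" "$dest"
#   client_sum=$(ssh "root@$EXPECTED_IP_ARG" sha256sum /root/atvm_setup_script.log | awk '{print $1}')
#   verify_log_copy "$client_sum" "$dest"
```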
## Preferred Execution Commands
Direct client execution:
```bash
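# Export the sourced variables, and pass ATVM_TARGET_PASSWORD through sudo,
# which resets the environment by default.
set -a; source /home/aw/code/cds/.env.credentials.local; set +a
sudo --preserve-env=ATVM_TARGET_PASSWORD bash /home/cirrususer/atvm-setup-script.sh \
--expected-ip <current-client-ip> \
--expected-hostname <exact-hostname>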
```
Controller run + collect:
```bash
source /home/aw/code/cds/.env.credentials.local
EXPECTED_IP_ARG=<current-client-ip> EXPECTED_HOSTNAME_ARG=<exact-hostname> \
/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh
```
Controller collect-only after client run:
```bash
source /home/aw/code/cds/.env.credentials.local
/home/aw/code/cds/atvm/scripts/run-atvm-setup-and-collect-log.sh --collect-after-complete
```
## Troubleshooting
- If local collected log is missing, do not rerun full setup just for log recovery.
- Use collect-only mode and verify SHA256 after copy.
- If wrapper appears stuck after IP/reboot transition, stop older wrapper sessions and run one fresh collect-only session.
- If `sshpass` is missing on controller, wrapper can still run but may require repeated interactive password prompts.
## Operational Caveats
- Not fully idempotent for all paths; repeated runs may rewrite network configs and create multiple backups.
- Static IP values are hardcoded; adjust before use in other environments.
- Run in maintenance windows because network changes can interrupt active sessions.
- Preserve host identity gating; do not weaken expected IP/hostname checks.
## Update Rule
- After each run, update this file only for guide/rule/checklist/default behavior changes.
- Put run-specific outcomes in `run-learnings.md` only when the run produced a new learning.