Add bulk data generation controls for folder count, files per folder, file size range, and bulk dataset size limits. Also update the cdssync docs to describe the new options and how update mode applies to generated bulk files.
# AI Agent Instructions for `cdssync`

These instructions apply anywhere under `/home/aw/code/cds/cdssync`.

## Migration Test Dataset Workflow

For migration test datasets in this workspace, follow this process by default:

1. Generate the dataset locally from this workspace.
2. Preserve the locally generated dataset as the canonical original copy.
3. Copy the dataset to the test machine using metadata-preserving tooling.
4. Verify the copied dataset on the test machine before using it for migration testing.

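The four workflow steps can be sketched as a dry run that prints each command for review before anything is executed. This is only an illustration: the `run_workflow` function, the `RUN` variable, the host name `test-host`, the destination path, and the use of `ssh` for remote verification are all assumptions, as is the idea that the generator writes to the preferred dataset location when called without arguments.

```shell
# Dry-run sketch of the workflow. By default every command is only printed;
# set RUN to an empty string (RUN= ...) to actually execute the commands.
CDSSYNC_DIR=/home/aw/code/cds/cdssync
SRC_DIR=$CDSSYNC_DIR/migration-test-dataset

run_workflow() {
  # $1 = test host (hypothetical), $2 = destination directory on that host (hypothetical)
  run=${RUN-echo}
  # 1. Generate the dataset locally (assumes the script's default output is SRC_DIR).
  $run "$CDSSYNC_DIR/generate_migration_test_dataset.sh"
  # 2. The local copy stays in place as the canonical original; nothing to run.
  # 3. Copy with metadata-preserving flags.
  $run rsync -aHAX "$SRC_DIR/" "$1:$2/"
  # 4. Verify on the test machine, starting with the sorted file listing.
  $run ssh "$1" "find '$2' | sort"
}

run_workflow test-host /srv/migration-test-dataset
```
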
## Generation Rules

- Use `/home/aw/code/cds/cdssync/generate_migration_test_dataset.sh` to create the dataset unless the user explicitly asks for a different method.
- Prefer `/home/aw/code/cds/cdssync/migration-test-dataset` as the local canonical dataset location unless the user specifies another target.
- The generator script accepts an optional `UPDATE_INTERVAL_SECONDS` argument:
  - omit it to create the dataset once and exit
  - use `0` for continuous random content updates
  - use any integer greater than `0` to rewrite mutable files every `N` seconds
- The generator script also accepts `--update-only`:
  - use it to update an existing dataset in place without recreating files, links, or directories
  - combine it with `UPDATE_INTERVAL_SECONDS` to keep mutating an existing dataset on a fixed interval
- The generator script can also create additional bulk test data under `bulk/`:
  - `--folder-count N` controls how many bulk folders are created
  - `--files-per-folder N` controls how many bulk files are created in each folder
  - `--min-file-size-mib N` and `--max-file-size-mib N` control the random size range for bulk files
  - `--max-dataset-size-mib N` caps the total size of generated bulk files only
  - once bulk files exist, update mode rewrites them too as part of the mutable-content set
- If ACL/xattr coverage matters, ensure the generation host has:
  - `acl` installed for `setfacl` and `getfacl`
  - `attr` installed for `setfattr` and `getfattr`

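The bulk options interact: without a cap, the bulk data can grow to at most folder count × files per folder × maximum file size. The sketch below estimates that upper bound before running the generator. `bulk_worst_case_mib` is a hypothetical helper, not part of the generator script, and the numeric values are examples only. How the script behaves once the cap is reached is not described here, so the sketch only reports the comparison.

```shell
# Hypothetical helper: upper bound on bulk data size in MiB, before any
# --max-dataset-size-mib cap is applied.
bulk_worst_case_mib() {
  # $1 = folder count, $2 = files per folder, $3 = max file size in MiB
  echo $(( $1 * $2 * $3 ))
}

# Example values: 10 folders x 20 files x up to 8 MiB each, with a 1024 MiB cap.
worst=$(bulk_worst_case_mib 10 20 8)
cap=1024
if [ "$worst" -gt "$cap" ]; then
  echo "worst case ${worst} MiB exceeds the ${cap} MiB cap"
fi

# A matching invocation, using the flag names listed above:
# /home/aw/code/cds/cdssync/generate_migration_test_dataset.sh \
#   --folder-count 10 --files-per-folder 20 \
#   --min-file-size-mib 1 --max-file-size-mib 8 \
#   --max-dataset-size-mib 1024
```

For update mode, an invocation such as `generate_migration_test_dataset.sh --update-only 30` would combine `--update-only` with a 30-second `UPDATE_INTERVAL_SECONDS`; the relative order of the flag and the positional argument is an assumption, so check the script's usage output first.
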
## Copy Rules

- Use `rsync -aHAX` by default when copying the dataset to another machine.
- Preserve permissions, timestamps, symlinks, hard links, ACLs, and xattrs.
- Do not use GUI copy/paste or non-preserving copy methods for this dataset unless the user explicitly asks for that.

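A minimal sketch of the default copy command follows. In `rsync -aHAX`, `-a` is archive mode (permissions, timestamps, symlinks, and recursion), `-H` preserves hard links, `-A` preserves ACLs, and `-X` preserves xattrs. The `copy_dataset` wrapper, the `RSYNC` override variable, and the example destination are hypothetical and exist only for illustration.

```shell
# Hypothetical wrapper around the default copy command.
# RSYNC can be overridden (for example RSYNC=echo) to preview the command.
copy_dataset() {
  # $1 = source directory, $2 = destination (local path or host:path)
  # The trailing slashes make rsync copy the directory contents rather than
  # nesting the source directory inside the destination.
  "${RSYNC:-rsync}" -aHAX "${1%/}/" "${2%/}/"
}

# Preview the command without copying anything:
RSYNC=echo copy_dataset /home/aw/code/cds/cdssync/migration-test-dataset test-host:/srv/migration-test-dataset
```
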
## Verification Rules

After copying to a test machine, verify at least:

- file and directory structure
- permissions
- symlinks
- hard links
- timestamps
- ACLs
- xattrs

Preferred verification commands include:

- `find DEST_DIR | sort`
- `stat DEST_DIR/regular/script_3mb_700.sh`
- `stat DEST_DIR/readonly-dir/locked_text_1mb_444.txt`
- `readlink DEST_DIR/links/symlink_to_text_1mb_644.txt`
- `stat DEST_DIR/regular/random_3mb_644.bin DEST_DIR/links/hardlink_to_random_3mb_644.bin`
- `getfacl -p DEST_DIR/metadata/acl_text_1mb_644.txt`
- `getfattr -d DEST_DIR/metadata/xattr_text_1mb_644.txt`

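The paired `stat` command above is a manual hard link check: both paths should report the same inode and a link count above 1. A scripted variant might look like the sketch below. The `is_hardlinked` helper and the `DEST_DIR` default are hypothetical. The shell `-ef` test is true when both paths resolve to the same device and inode; it follows symlinks, so it does not distinguish a hard link from a symlink to the same file.

```shell
# Hypothetical helper: succeed when two paths refer to the same underlying file.
is_hardlinked() {
  [ "$1" -ef "$2" ]
}

DEST_DIR=${DEST_DIR:-/path/to/copied/dataset}   # placeholder, adjust per host

if is_hardlinked "$DEST_DIR/regular/random_3mb_644.bin" \
                 "$DEST_DIR/links/hardlink_to_random_3mb_644.bin"; then
  echo "hard link preserved"
else
  echo "hard link NOT preserved (or files missing)"
fi
```
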
## Destination Host Requirements

- If the destination host lacks `acl` or `attr`, ACL/xattr verification will be incomplete.
- If the destination filesystem does not support ACLs or xattrs, those attributes may not survive transfer even when the copy method is correct.
- The generator now logs and continues when ACL/xattr assignment is unsupported on the target filesystem instead of exiting.

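Before relying on ACL/xattr results, it can help to confirm that the required tools are present on the host in question. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and it only checks that the commands exist, not that the filesystem supports ACLs or xattrs.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# setfacl/getfacl come from the acl package, setfattr/getfattr from attr.
missing=$(missing_tools setfacl getfacl setfattr getfattr)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "missing tools, ACL/xattr checks will be incomplete:" $missing
fi
```
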