Add cdssync migration test dataset tooling

Add the cdssync migration test dataset manifest, generator script,
workspace instructions, and gitignore.

This sets the default workflow to:
- generate the dataset locally
- copy it to the test machine with metadata preserved
- verify the copied data before migration testing
2026-04-20 11:49:41 -04:00
parent 4f56ff9c4d
commit bb1cb37dc2
4 changed files with 458 additions and 0 deletions

cdssync/.gitignore

@@ -0,0 +1,3 @@
tmp-dataset-test/
tmp-metadata-check/
migration-test-dataset/

cdssync/AGENTS.md

@@ -0,0 +1,53 @@
# AI Agent Instructions for `cdssync`
These instructions apply anywhere under `/home/aw/code/cds/cdssync`.
## Migration Test Dataset Workflow
For migration test datasets in this workspace, follow this process by default:
1. Generate the dataset locally from this workspace.
2. Preserve the local generated dataset as the canonical original copy.
3. Copy the dataset to the test machine using metadata-preserving tooling.
4. Verify the copied dataset on the test machine before using it for migration testing.
## Generation Rules
- Use `/home/aw/code/cds/cdssync/generate_migration_test_dataset.sh` to create the dataset unless the user explicitly asks for a different method.
- Prefer `/home/aw/code/cds/cdssync/migration-test-dataset` as the local canonical dataset location unless the user specifies another target.
- If ACL/xattr coverage matters, ensure the generation host has:
  - `acl` installed for `setfacl` and `getfacl`
  - `attr` installed for `setfattr` and `getfattr`
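
The tool requirement above can be checked before generation. The following is a minimal sketch, not part of the committed tooling; the function name `missing_tools` is hypothetical.

```shell
# Print the name of every listed command that is not on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
# (missing_tools is a hypothetical helper, not part of the repository.)
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_tools setfacl getfacl setfattr getfattr` prints nothing on a host where both the `acl` and `attr` packages are installed.
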
## Copy Rules
- Use `rsync -aHAX` by default when copying the dataset to another machine.
- Preserve permissions, timestamps, symlinks, hard links, ACLs, and xattrs.
- Do not use GUI copy/paste or non-preserving copy methods for this dataset unless the user explicitly asks for that.
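
As an illustration of the copy rules above, here is a small wrapper around `rsync -aHAX`. The function name `copy_dataset` and the example destination are assumptions made for this sketch; only the `rsync` flags come from these instructions.

```shell
# Copy the dataset directory SRC into DEST with metadata preserved:
#   -a  permissions, timestamps, symlinks (archive mode)
#   -H  hard links
#   -A  ACLs
#   -X  xattrs
# DEST may be a local path or a remote spec such as user@host:/path.
# (copy_dataset is a hypothetical helper, not part of the repository.)
copy_dataset() {
  local src="$1" dest="$2"
  # The trailing slash on the source copies the directory's contents
  # into DEST instead of nesting a second directory inside it.
  rsync -aHAX "${src%/}/" "${dest%/}/"
}
```

A call might look like `copy_dataset ./migration-test-dataset tester@test-host:/srv/migration-test-dataset`, where the user, host, and remote path are placeholders. Adding `-n` to the `rsync` options performs a dry run first.
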
## Verification Rules
After copying to a test machine, verify at least:
- file and directory structure
- permissions
- symlinks
- hard links
- timestamps
- ACLs
- xattrs
Preferred verification commands include:
- `find DEST_DIR | sort`
- `stat DEST_DIR/regular/script_3mb_700.sh`
- `stat DEST_DIR/readonly-dir/locked_text_1mb_444.txt`
- `readlink DEST_DIR/links/symlink_to_text_1mb_644.txt`
- `stat DEST_DIR/regular/random_3mb_644.bin DEST_DIR/links/hardlink_to_random_3mb_644.bin`
- `getfacl -p DEST_DIR/metadata/acl_text_1mb_644.txt`
- `getfattr -d DEST_DIR/metadata/xattr_text_1mb_644.txt`
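
The individual commands above can be complemented by a whole-tree comparison. This is a sketch under the assumption that GNU `find` and GNU `stat` are available on both hosts; the function names are hypothetical.

```shell
# Print one line per entry under DIR: relative path, type, octal mode,
# symlink target (empty for non-links), and mtime in epoch seconds.
# Requires GNU find for -printf.
# (metadata_listing is a hypothetical helper, not part of the repository.)
metadata_listing() {
  (cd "$1" && find . -mindepth 1 -printf '%P\t%y\t%m\t%l\t%T@\n' | LC_ALL=C sort)
}

# Succeed when both paths refer to the same inode on the same device,
# which is what a preserved hard link looks like. Requires GNU stat.
# (same_inode is a hypothetical helper as well.)
same_inode() {
  [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}
```

Running `metadata_listing` on the original and on the copy, saving each output to a file, and comparing the two with `diff` shows any entry whose mode, link target, or timestamp changed. Timestamps can differ in their fractional part if the destination filesystem stores them at a coarser resolution. `same_inode DEST_DIR/regular/random_3mb_644.bin DEST_DIR/links/hardlink_to_random_3mb_644.bin` confirms that a hard link survived the copy.
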
## Destination Host Requirements
- If the destination host lacks `acl` or `attr`, ACL/xattr verification will be incomplete.
- If the destination filesystem does not support ACLs or xattrs, those attributes may not survive transfer even when the copy method is correct.
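
Whether a destination filesystem accepts xattrs can be probed before the copy. The sketch below assumes `setfattr` and `getfattr` from the `attr` package and GNU `mktemp`; the function name and the `user.probe` attribute name are made up for illustration.

```shell
# Return 0 if a user.* xattr can be written and read back inside DIR,
# 1 otherwise. A missing setfattr/getfattr also counts as "not supported".
# (probe_xattr_support is a hypothetical helper, not part of the repository.)
probe_xattr_support() {
  local dir="$1" probe rc=1
  probe=$(mktemp "$dir/.xattr-probe.XXXXXX") || return 1
  if setfattr -n user.probe -v 1 "$probe" 2>/dev/null &&
     getfattr -n user.probe "$probe" >/dev/null 2>&1; then
    rc=0
  fi
  # Remove the probe file so the destination stays clean.
  rm -f "$probe"
  return "$rc"
}
```

A similar probe for ACLs could use `setfacl` and `getfacl` in the same way.
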


@@ -0,0 +1,275 @@
#!/usr/bin/env bash
set -euo pipefail
usage() {
  cat <<'EOF'
Usage:
generate_migration_test_dataset.sh TARGET_DIR
Creates a compact filesystem migration test dataset under TARGET_DIR.
The dataset matches the manifest in migration-test-manifest.md.
Notes:
- Existing TARGET_DIR contents are left in place unless they collide with dataset paths.
- Re-running over a generated dataset stops with an error at the read-only files or existing links; use a fresh TARGET_DIR.
- ACL and xattr cases are created only if the local tools are available.
- Sparse files are created with logical size but low physical allocation.
EOF
}

if [[ $# -ne 1 ]]; then
  # Send the usage text to stderr because this is an error path.
  usage >&2
  exit 1
fi

TARGET_DIR=$1
ROOT=$(realpath -m "$TARGET_DIR")
mkdir -p "$ROOT"

# Record whether the optional ACL and xattr tools are installed.
have_setfacl=0
have_setfattr=0
if command -v setfacl >/dev/null 2>&1; then
  have_setfacl=1
fi
if command -v setfattr >/dev/null 2>&1; then
  have_setfattr=1
fi

create_dir() {
  mkdir -p "$ROOT/$1"
}

# Set both atime and mtime; stamp uses touch's [[CC]YY]MMDDhhmm format.
set_times() {
  local rel=$1
  local stamp=$2
  touch -a -m -t "$stamp" "$ROOT/$rel"
}

# Write exactly <mib> MiB of repeated text that names the file's own path.
write_text() {
  local path=$1
  local mib=$2
  local bytes=$((mib * 1024 * 1024))
  perl -e '
    my ($target, $label) = @ARGV;
    my $chunk = "Migration text payload for $label\n";
    while (length($chunk) < 8192) { $chunk .= $chunk; }
    while ($target > 0) {
      my $part = substr($chunk, 0, $target > length($chunk) ? length($chunk) : $target);
      print $part;
      $target -= length($part);
    }
  ' "$bytes" "$path" >"$ROOT/$path"
}

# Write exactly <mib> MiB of a single repeated byte (highly compressible).
write_compressible() {
  local path=$1
  local mib=$2
  local bytes=$((mib * 1024 * 1024))
  perl -e '
    my ($target) = @ARGV;
    my $chunk = "A" x 8192;
    while ($target > 0) {
      my $part = substr($chunk, 0, $target > length($chunk) ? length($chunk) : $target);
      print $part;
      $target -= length($part);
    }
  ' "$bytes" >"$ROOT/$path"
}

write_random() {
  local path=$1
  local mib=$2
  # iflag=fullblock (GNU dd) retries short reads so each block is a full 1 MiB.
  dd if=/dev/urandom of="$ROOT/$path" bs=1M count="$mib" iflag=fullblock status=none
}

write_script() {
  local path=$1
  local mib=$2
  cat >"$ROOT/$path" <<'EOF'
#!/usr/bin/env bash
echo "migration test script"
EOF
  local current_size
  current_size=$(wc -c <"$ROOT/$path")
  local target_size=$((mib * 1024 * 1024))
  if (( current_size < target_size )); then
    # Pad with '#' so the file stays a valid script. head -c streams the
    # padding in large reads instead of dd's one-byte blocks, which is
    # much faster for multi-megabyte files.
    head -c "$((target_size - current_size))" /dev/zero | tr '\0' '#' >>"$ROOT/$path"
  fi
}

write_empty() {
  : >"$ROOT/$1"
}

# truncate sets the logical size without allocating data blocks.
write_sparse() {
  local path=$1
  local mib=$2
  truncate -s "${mib}M" "$ROOT/$path"
}

apply_mode() {
  chmod "$2" "$ROOT/$1"
}

# make_file REL_PATH TYPE SIZE_MIB MODE
make_file() {
  local path=$1
  local type=$2
  local mib=$3
  local mode=$4
  create_dir "$(dirname "$path")"
  case "$type" in
    text) write_text "$path" "$mib" ;;
    random) write_random "$path" "$mib" ;;
    compressible) write_compressible "$path" "$mib" ;;
    script) write_script "$path" "$mib" ;;
    empty) write_empty "$path" ;;
    sparse) write_sparse "$path" "$mib" ;;
    *)
      echo "Unknown type: $type" >&2
      exit 1
      ;;
  esac
  apply_mode "$path" "$mode"
}

create_base_dirs() {
  create_dir "regular"
  create_dir "hidden"
  create_dir "spaces in name"
  create_dir "deep/tree/level1/level2/level3"
  create_dir "readonly-dir"
  create_dir "links"
  create_dir "metadata"
  create_dir "empty-dirs/empty_a"
  create_dir "empty-dirs/empty_b"
  create_dir "empty-dirs/.hidden_empty_dir"
  create_dir "readonly-dir/no_write_subdir"
}

create_regular_files() {
  make_file "regular/text_1mb_644.txt" text 1 0644
  make_file "regular/text_3mb_600.txt" text 3 0600
  make_file "regular/text_5mb_755.txt" text 5 0755
  make_file "regular/random_1mb_600.bin" random 1 0600
  make_file "regular/random_3mb_644.bin" random 3 0644
  make_file "regular/random_5mb_755.bin" random 5 0755
  make_file "regular/compressible_1mb_644.log" compressible 1 0644
  make_file "regular/compressible_3mb_600.log" compressible 3 0600
  make_file "regular/compressible_5mb_755.log" compressible 5 0755
  make_file "regular/script_1mb_755.sh" script 1 0755
  make_file "regular/script_3mb_700.sh" script 3 0700
  make_file "regular/script_5mb_755.sh" script 5 0755
  make_file "regular/sparse_1mb_600.img" sparse 1 0600
  make_file "regular/sparse_3mb_600.img" sparse 3 0600
  make_file "regular/sparse_5mb_600.img" sparse 5 0600
  make_file "regular/empty_000_644.txt" empty 0 0644
  make_file "regular/empty_001_600.txt" empty 0 0600
  make_file "regular/empty_002_755.txt" empty 0 0755
}

create_named_variants() {
  make_file "hidden/.hidden_text_1mb_644.txt" text 1 0644
  make_file "hidden/.hidden_random_3mb_600.bin" random 3 0600
  make_file "hidden/.hidden_script_1mb_755.sh" script 1 0755
  make_file "hidden/.hidden_empty_644" empty 0 0644
  make_file "hidden/.hidden_sparse_5mb_600.img" sparse 5 0600
  make_file "spaces in name/file with spaces text 1mb 644.txt" text 1 0644
  make_file "spaces in name/file with spaces random 3mb 600.bin" random 3 0600
  make_file "spaces in name/file with spaces script 1mb 755.sh" script 1 0755
  make_file "spaces in name/file with spaces empty 644" empty 0 0644
  make_file "spaces in name/file with spaces sparse 5mb 600.img" sparse 5 0600
  make_file "regular/longname_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa_text_1mb_644.txt" text 1 0644
  make_file "regular/longname_bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb_random_3mb_600.bin" random 3 0600
  make_file "regular/longname_cccccccccccccccccccccccccccccccc_compressible_5mb_755.log" compressible 5 0755
}

create_deep_and_duplicate_cases() {
  make_file "deep/tree/level1/level2/level3/deep_text_1mb_644.txt" text 1 0644
  make_file "deep/tree/level1/level2/level3/deep_random_3mb_600.bin" random 3 0600
  make_file "deep/tree/level1/level2/level3/deep_script_1mb_755.sh" script 1 0755
  make_file "deep/tree/level1/level2/level3/deep_sparse_5mb_600.img" sparse 5 0600
  # Identical content under different names, locations, and modes.
  make_file "regular/dup_source_text_3mb_644.txt" text 3 0644
  cp "$ROOT/regular/dup_source_text_3mb_644.txt" "$ROOT/regular/dup_copy_a_text_3mb_600.txt"
  cp "$ROOT/regular/dup_source_text_3mb_644.txt" "$ROOT/deep/tree/level1/level2/dup_copy_b_text_3mb_755.txt"
  chmod 0600 "$ROOT/regular/dup_copy_a_text_3mb_600.txt"
  chmod 0755 "$ROOT/deep/tree/level1/level2/dup_copy_b_text_3mb_755.txt"
}

create_time_and_readonly_cases() {
  make_file "regular/old_text_1mb_644.txt" text 1 0644
  make_file "regular/recent_text_1mb_644.txt" text 1 0644
  make_file "regular/futureish_text_1mb_644.txt" text 1 0644
  set_times "regular/old_text_1mb_644.txt" 201801020304
  set_times "regular/recent_text_1mb_644.txt" 202604191530
  set_times "regular/futureish_text_1mb_644.txt" 203001020304
  make_file "readonly-dir/locked_text_1mb_444.txt" text 1 0444
  make_file "readonly-dir/locked_random_3mb_400.bin" random 3 0400
  make_file "readonly-dir/locked_script_1mb_500.sh" script 1 0500
  chmod 0555 "$ROOT/readonly-dir/no_write_subdir"
}

create_links() {
  # Symlink targets are relative to the links/ directory.
  ln -s ../regular/text_1mb_644.txt "$ROOT/links/symlink_to_text_1mb_644.txt"
  ln -s ../deep/tree/level1/level2/level3/deep_random_3mb_600.bin "$ROOT/links/symlink_to_deep_random_3mb_600.bin"
  ln -s ../hidden/.hidden_text_1mb_644.txt "$ROOT/links/symlink_to_hidden_file"
  ln "$ROOT/regular/random_3mb_644.bin" "$ROOT/links/hardlink_to_random_3mb_644.bin"
  ln "$ROOT/regular/compressible_5mb_755.log" "$ROOT/links/hardlink_to_compressible_5mb_755.log"
}

create_metadata_cases() {
  make_file "metadata/xattr_text_1mb_644.txt" text 1 0644
  make_file "metadata/xattr_random_3mb_600.bin" random 3 0600
  make_file "metadata/acl_text_1mb_644.txt" text 1 0644
  make_file "metadata/acl_script_1mb_755.sh" script 1 0755
  if (( have_setfattr )); then
    setfattr -n user.migration_case -v "xattr-text" "$ROOT/metadata/xattr_text_1mb_644.txt"
    setfattr -n user.migration_case -v "xattr-random" "$ROOT/metadata/xattr_random_3mb_600.bin"
  else
    echo "Skipping xattr assignment: setfattr not available"
  fi
  if (( have_setfacl )); then
    setfacl -m u:nobody:r-- "$ROOT/metadata/acl_text_1mb_644.txt"
    setfacl -m u:nobody:r-x "$ROOT/metadata/acl_script_1mb_755.sh"
  else
    echo "Skipping ACL assignment: setfacl not available"
  fi
}

write_summary() {
  # Unquoted EOF: the variables below are expanded into the summary file.
  cat >"$ROOT/GENERATION_SUMMARY.txt" <<EOF
Dataset root: $ROOT
Manifest: migration-test-manifest.md
Optional metadata support:
- setfacl available: $have_setfacl
- setfattr available: $have_setfattr
Notes:
- Sparse files have logical size with low physical allocation.
- Hard links share inode data with their source file.
- Read-only files and directories may require elevated privileges to modify later.
EOF
}

create_base_dirs
create_regular_files
create_named_variants
create_deep_and_duplicate_cases
create_time_and_readonly_cases
create_links
create_metadata_cases
write_summary
echo "Created migration test dataset at: $ROOT"


@@ -0,0 +1,127 @@
# Migration Test Dataset Manifest
This manifest defines a compact, high-value filesystem test set for validating file migration behavior. It is intended to cover common file-content, naming, metadata, and directory edge cases without generating an unnecessarily large corpus.
## Recommended Root Layout
- `regular/`
- `hidden/`
- `spaces in name/`
- `deep/tree/level1/level2/level3/`
- `readonly-dir/`
- `links/`
- `metadata/`
- `empty-dirs/`
## Test Objects
### Regular Files
- `regular/text_1mb_644.txt`
- `regular/text_3mb_600.txt`
- `regular/text_5mb_755.txt`
- `regular/random_1mb_600.bin`
- `regular/random_3mb_644.bin`
- `regular/random_5mb_755.bin`
- `regular/compressible_1mb_644.log`
- `regular/compressible_3mb_600.log`
- `regular/compressible_5mb_755.log`
- `regular/script_1mb_755.sh`
- `regular/script_3mb_700.sh`
- `regular/script_5mb_755.sh`
- `regular/sparse_1mb_600.img`
- `regular/sparse_3mb_600.img`
- `regular/sparse_5mb_600.img`
- `regular/empty_000_644.txt`
- `regular/empty_001_600.txt`
- `regular/empty_002_755.txt`
### Hidden Files
- `hidden/.hidden_text_1mb_644.txt`
- `hidden/.hidden_random_3mb_600.bin`
- `hidden/.hidden_script_1mb_755.sh`
- `hidden/.hidden_empty_644`
- `hidden/.hidden_sparse_5mb_600.img`
### Files With Spaces
- `spaces in name/file with spaces text 1mb 644.txt`
- `spaces in name/file with spaces random 3mb 600.bin`
- `spaces in name/file with spaces script 1mb 755.sh`
- `spaces in name/file with spaces empty 644`
- `spaces in name/file with spaces sparse 5mb 600.img`
### Long-Name Files
- `regular/longname_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa_text_1mb_644.txt`
- `regular/longname_bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb_random_3mb_600.bin`
- `regular/longname_cccccccccccccccccccccccccccccccc_compressible_5mb_755.log`
### Deep Path Files
- `deep/tree/level1/level2/level3/deep_text_1mb_644.txt`
- `deep/tree/level1/level2/level3/deep_random_3mb_600.bin`
- `deep/tree/level1/level2/level3/deep_script_1mb_755.sh`
- `deep/tree/level1/level2/level3/deep_sparse_5mb_600.img`
### Duplicate-Content Cases
- `regular/dup_source_text_3mb_644.txt`
- `regular/dup_copy_a_text_3mb_600.txt`
- `deep/tree/level1/level2/dup_copy_b_text_3mb_755.txt`
### Timestamp Variants
- `regular/old_text_1mb_644.txt`
- `regular/recent_text_1mb_644.txt`
- `regular/futureish_text_1mb_644.txt`
### Read-Only Or Awkward Placement Cases
- `readonly-dir/locked_text_1mb_444.txt`
- `readonly-dir/locked_random_3mb_400.bin`
- `readonly-dir/locked_script_1mb_500.sh`
### Links
- `links/symlink_to_text_1mb_644.txt`
- `links/symlink_to_deep_random_3mb_600.bin`
- `links/symlink_to_hidden_file`
- `links/hardlink_to_random_3mb_644.bin`
- `links/hardlink_to_compressible_5mb_755.log`
### Directories
- `empty-dirs/empty_a/`
- `empty-dirs/empty_b/`
- `empty-dirs/.hidden_empty_dir/`
- `readonly-dir/no_write_subdir/`
- `deep/tree/level1/level2/level3/`
### Metadata Cases
These should only be created if the source filesystem supports them and the test environment allows them.
- `metadata/xattr_text_1mb_644.txt`
- `metadata/xattr_random_3mb_600.bin`
- `metadata/acl_text_1mb_644.txt`
- `metadata/acl_script_1mb_755.sh`
## Approximate Storage
Estimated real disk usage for this manifest:
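- core allocated files: about `83 MiB` (the sum of the fully allocated file sizes listed above; sparse files, empty files, and hard links add almost nothing)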
- with filesystem overhead and modest headroom: plan for about `150 MiB`
- comfortable reserve for later additions: `250 MiB`
Important notes:
- sparse files may report a logical size of `1 MiB` to `5 MiB` while using much less physical disk space
- symlinks, hard links, directories, ACLs, xattrs, and empty files add little compared with regular allocated files
- if you later expand this set with more size permutations or more metadata variants, storage will grow mostly with the fully allocated non-sparse files
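
The gap between logical and physical size can be checked per file. This is a minimal sketch assuming GNU `stat`; the function name `allocation_of` is hypothetical.

```shell
# Print "<logical bytes> <allocated bytes>" for FILE.
# GNU stat: %s is the logical size, %b the number of allocated blocks,
# and %B the size in bytes of each block counted by %b.
# (allocation_of is a hypothetical helper, not part of the repository.)
allocation_of() {
  local size allocated
  size=$(stat -c '%s' "$1")
  allocated=$(( $(stat -c '%b' "$1") * $(stat -c '%B' "$1") ))
  printf '%s %s\n' "$size" "$allocated"
}
```

For a sparse file such as `regular/sparse_5mb_600.img`, the first number is the full logical size while the second is expected to be much smaller, depending on the filesystem. If the second number matches the first after a copy, the copy method probably expanded the holes.
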
## Usage Recommendation
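Use this manifest as the canonical definition of the source dataset. Generate the files once, preserve the original unchanged, and transfer a copy to the source test machine using metadata-preserving tooling such as `rsync -aHAX`. Plain `rsync -aH` is not enough for this dataset because it drops ACLs and xattrs. `cp -a` (for a local copy) or a tar archive workflow with ACL and xattr support enabled (for GNU tar, `--acls --xattrs`) are alternatives.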