Add cdssync migration test dataset tooling
Add the cdssync migration test dataset manifest, generator script, workspace instructions, and `.gitignore`. This sets the default workflow to:

- generate the dataset locally
- copy it to the test machine with metadata preserved
- verify the copied data before migration testing
`cdssync/migration-test-manifest.md` (new file, 127 lines)
# Migration Test Dataset Manifest

This manifest defines a compact, high-value filesystem test set for validating file migration behavior. It is intended to cover common file-content, naming, metadata, and directory edge cases without generating an unnecessarily large corpus.

## Recommended Root Layout

- `regular/`
- `hidden/`
- `spaces in name/`
- `deep/tree/level1/level2/level3/`
- `readonly-dir/`
- `links/`
- `metadata/`
- `empty-dirs/`

## Test Objects

### Regular Files

- `regular/text_1mb_644.txt`
- `regular/text_3mb_600.txt`
- `regular/text_5mb_755.txt`
- `regular/random_1mb_600.bin`
- `regular/random_3mb_644.bin`
- `regular/random_5mb_755.bin`
- `regular/compressible_1mb_644.log`
- `regular/compressible_3mb_600.log`
- `regular/compressible_5mb_755.log`
- `regular/script_1mb_755.sh`
- `regular/script_3mb_700.sh`
- `regular/script_5mb_755.sh`
- `regular/sparse_1mb_600.img`
- `regular/sparse_3mb_600.img`
- `regular/sparse_5mb_600.img`
- `regular/empty_000_644.txt`
- `regular/empty_001_600.txt`
- `regular/empty_002_755.txt`
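
The file names above appear to encode the content kind, the logical size, and the octal mode (for example `text_1mb_644.txt`). A minimal Python sketch of how a generator could turn such a name into a file follows. It assumes `mb` means MiB and that the three trailing digits are the mode; `parse_name` and `write_fixture` are hypothetical helpers, not the committed generator script, and the text, log, and script contents are illustrative placeholders.

```python
import os
import re
from pathlib import Path

MIB = 1024 * 1024

# Assumed convention (inferred from the names, not specified by the manifest):
# "<kind>_<N>mb_<mode>" with "_" or " " as separators and "mb" read as MiB.
SIZED = re.compile(r"(text|random|compressible|script|sparse)[_ ](\d+)mb[_ ]([0-7]{3})")
# Empty files carry an optional three-digit index instead of a size.
EMPTY = re.compile(r"empty(?:[_ ]\d{3})?[_ ]([0-7]{3})")


def parse_name(name: str) -> tuple[str, int, int]:
    """Return (kind, logical size in bytes, mode) encoded in a file name."""
    if m := SIZED.search(name):
        return m.group(1), int(m.group(2)) * MIB, int(m.group(3), 8)
    if m := EMPTY.search(name):
        return "empty", 0, int(m.group(1), 8)
    raise ValueError(f"no kind/size/mode found in {name!r}")


def _repeat_to(line: bytes, size: int, head: bytes = b"") -> bytes:
    """Repeat line after head and cut the result to exactly size bytes."""
    return (head + line * (size // len(line) + 1))[:size]


def _numbered_text(size: int) -> bytes:
    """Readable text whose lines differ, so it compresses less than the log kind."""
    out = bytearray()
    i = 0
    while len(out) < size:
        out += f"line {i:07d}: the quick brown fox jumps over the lazy dog\n".encode()
        i += 1
    return bytes(out[:size])


def write_fixture(path: Path, kind: str, size: int, mode: int) -> None:
    """Create one regular test file with the given kind, logical size, and mode."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        if kind == "sparse":
            f.truncate(size)  # extends the logical size without writing data
        elif kind == "random":
            f.write(os.urandom(size))  # incompressible content
        elif kind == "compressible":
            f.write(_repeat_to(b"2024-01-01T00:00:00Z INFO identical log line\n", size))
        elif kind == "script":
            f.write(_repeat_to(b"# padding comment\n", size, head=b"#!/bin/sh\n"))
        elif kind == "text":
            f.write(_numbered_text(size))
        elif kind != "empty":
            raise ValueError(f"unknown kind {kind!r}")
    os.chmod(path, mode)  # explicit chmod, so the umask cannot change the result
```

For instance, `write_fixture(root / rel, *parse_name(Path(rel).name))` would create one entry from a manifest path. Because some files end up without write permission, regenerating into a fresh directory is simpler than overwriting in place.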

### Hidden Files

- `hidden/.hidden_text_1mb_644.txt`
- `hidden/.hidden_random_3mb_600.bin`
- `hidden/.hidden_script_1mb_755.sh`
- `hidden/.hidden_empty_644`
- `hidden/.hidden_sparse_5mb_600.img`
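
Hidden entries matter because a POSIX shell `*` glob does not match names that begin with a dot (bash does so only with `dotglob` set), so a copy step written as `cp hidden/* dest/` would not copy any of the files in `hidden/`. Python's `glob.glob` follows the same rule, which the small, hypothetical `dotfile_gap` helper below uses to list the entries a naive glob would miss.

```python
import glob
import os
from pathlib import Path


def dotfile_gap(directory: Path) -> set[str]:
    """Names that os.scandir() reports but a plain "*" glob.glob() pattern misses."""
    every_entry = {entry.name for entry in os.scandir(directory)}
    # Escape the directory part so only the trailing "*" acts as a wildcard.
    pattern = os.path.join(glob.escape(os.fspath(directory)), "*")
    globbed = {os.path.basename(p) for p in glob.glob(pattern)}
    return every_entry - globbed
```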

### Files With Spaces

- `spaces in name/file with spaces text 1mb 644.txt`
- `spaces in name/file with spaces random 3mb 600.bin`
- `spaces in name/file with spaces script 1mb 755.sh`
- `spaces in name/file with spaces empty 644`
- `spaces in name/file with spaces sparse 5mb 600.img`
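
A common failure with these names is an unquoted path in a shell command line, which the shell splits into several arguments. The sketch below uses Python's `shlex` module and a hypothetical `ls -l` command purely for illustration; passing an argument list to `subprocess.run` avoids the shell and the problem entirely.

```python
import shlex

NAME = "spaces in name/file with spaces text 1mb 644.txt"


def unquoted_command(path: str) -> str:
    # The shell splits this at every space, so the path becomes several words.
    return f"ls -l {path}"


def quoted_command(path: str) -> str:
    # shlex.quote wraps the path so the shell passes it through as one word.
    return f"ls -l {shlex.quote(path)}"
```

Here `shlex.split(unquoted_command(NAME))` yields ten words, while `shlex.split(quoted_command(NAME))` yields three: `ls`, `-l`, and the intact path.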

### Long-Name Files

- `regular/longname_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa_text_1mb_644.txt`
- `regular/longname_bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb_random_3mb_600.bin`
- `regular/longname_cccccccccccccccccccccccccccccccc_compressible_5mb_755.log`

### Deep Path Files

- `deep/tree/level1/level2/level3/deep_text_1mb_644.txt`
- `deep/tree/level1/level2/level3/deep_random_3mb_600.bin`
- `deep/tree/level1/level2/level3/deep_script_1mb_755.sh`
- `deep/tree/level1/level2/level3/deep_sparse_5mb_600.img`
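
The long-name and deep-path entries sit well within common Linux limits (255 bytes per name component on ext4 and XFS), but a destination filesystem or network share may allow less. A hedged sketch of a pre-flight check: measure the longest component and relative path under the dataset root, then compare them with what `os.pathconf` reports for the destination. The helper names are hypothetical, and `PC_PATH_MAX` is only advisory on many systems.

```python
import os
from pathlib import Path


def longest_lengths(root: Path) -> tuple[int, int]:
    """Longest name component and longest root-relative path, in bytes."""
    max_name = max_path = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            max_name = max(max_name, len(os.fsencode(name)))
            max_path = max(max_path, len(os.fsencode(rel)))
    return max_name, max_path


def fits_destination(root: Path, dest_dir: Path) -> bool:
    """True if every source name and path should fit once placed under dest_dir."""
    max_name, max_path = longest_lengths(root)
    name_max = os.pathconf(dest_dir, "PC_NAME_MAX")
    path_max = os.pathconf(dest_dir, "PC_PATH_MAX")
    prefix = len(os.fsencode(os.path.abspath(dest_dir))) + 1  # +1 for the separator
    return max_name <= name_max and prefix + max_path < path_max  # PATH_MAX counts the NUL
```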

### Duplicate-Content Cases

- `regular/dup_source_text_3mb_644.txt`
- `regular/dup_copy_a_text_3mb_600.txt`
- `deep/tree/level1/level2/dup_copy_b_text_3mb_755.txt`
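
The three duplicate-content files share bytes but differ in mode and location, so a migration that deduplicates them (for example by hard-linking identical files) or rewrites one copy should be detectable afterward. A minimal sketch, assuming the paths are checked relative to the dataset root: compare SHA-256 digests and confirm the files still occupy separate inodes. `check_duplicates` is a hypothetical helper; block-level sharing such as reflinks keeps separate inodes and is not detected this way.

```python
import hashlib
import os
from pathlib import Path

DUPLICATES = (
    "regular/dup_source_text_3mb_644.txt",
    "regular/dup_copy_a_text_3mb_600.txt",
    "deep/tree/level1/level2/dup_copy_b_text_3mb_755.txt",
)


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_duplicates(root: Path, rel_paths: tuple[str, ...] = DUPLICATES) -> list[str]:
    """Return problems found; an empty list means the duplicates look intact."""
    paths = [root / rel for rel in rel_paths]
    problems = []
    if len({sha256_file(p) for p in paths}) != 1:
        problems.append("duplicate files no longer have identical content")
    inodes = {(os.stat(p).st_dev, os.stat(p).st_ino) for p in paths}
    if len(inodes) != len(paths):
        problems.append("some duplicates now share an inode (hard-linked)")
    return problems
```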

### Timestamp Variants

- `regular/old_text_1mb_644.txt`
- `regular/recent_text_1mb_644.txt`
- `regular/futureish_text_1mb_644.txt`
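
The manifest names three timestamp variants but not their values. The sketch below picks hypothetical ones (2000-01-01 UTC, one day ago, and 30 days ahead), applies them with `os.utime` in nanoseconds, and checks a tree against them with an optional tolerance, since some destination filesystems store coarser timestamps (FAT, for example, keeps modification times at 2-second resolution). The helper names are assumptions.

```python
import os
from pathlib import Path

DAY_NS = 86_400 * 10**9


def timestamp_plan(now_ns: int) -> dict[str, int]:
    """Hypothetical target mtimes in nanoseconds; the manifest does not fix them."""
    return {
        "regular/old_text_1mb_644.txt": 946_684_800 * 10**9,         # 2000-01-01T00:00:00Z
        "regular/recent_text_1mb_644.txt": now_ns - DAY_NS,           # one day ago
        "regular/futureish_text_1mb_644.txt": now_ns + 30 * DAY_NS,   # 30 days ahead
    }


def apply_timestamps(root: Path, plan: dict[str, int]) -> None:
    for rel, mtime_ns in plan.items():
        os.utime(root / rel, ns=(mtime_ns, mtime_ns))  # (atime, mtime)


def mismatched_timestamps(root: Path, plan: dict[str, int], tolerance_ns: int = 0) -> list[str]:
    """Paths whose modification time differs from the plan by more than tolerance_ns."""
    return [rel for rel, expected in plan.items()
            if abs(os.stat(root / rel).st_mtime_ns - expected) > tolerance_ns]
```

Because the relative values move with the clock, the plan would be computed once (for example from `time.time_ns()`) and saved with the dataset rather than recomputed at verification time.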

### Read-Only Or Awkward Placement Cases

- `readonly-dir/locked_text_1mb_444.txt`
- `readonly-dir/locked_random_3mb_400.bin`
- `readonly-dir/locked_script_1mb_500.sh`
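
Read-only placement only works if content is written before permissions are dropped and the directory itself is locked last; otherwise the generator cannot create its own files. A minimal sketch, assuming mode `0o555` for both `readonly-dir/` and `readonly-dir/no_write_subdir/` (the manifest gives no mode for either) and placeholder content sized from the file names. `build_readonly_dir` and `unlock` are hypothetical. Permission bits are not enforced for root, so any test expecting writes here to fail should run as an unprivileged user.

```python
import os
import stat
from pathlib import Path

MIB = 1024 * 1024
# (size, mode) taken from the manifest names; DIR_MODE is an assumption.
LOCKED_FILES = {
    "locked_text_1mb_444.txt": (1 * MIB, 0o444),
    "locked_random_3mb_400.bin": (3 * MIB, 0o400),
    "locked_script_1mb_500.sh": (1 * MIB, 0o500),
}
DIR_MODE = 0o555


def build_readonly_dir(root: Path) -> Path:
    d = root / "readonly-dir"
    (d / "no_write_subdir").mkdir(parents=True)
    for name, (size, mode) in LOCKED_FILES.items():
        path = d / name
        path.write_bytes(b"x" * size)  # placeholder content, written while writable
        os.chmod(path, mode)           # then drop write permission
    os.chmod(d / "no_write_subdir", DIR_MODE)
    os.chmod(d, DIR_MODE)              # lock the parent directory last
    return d


def unlock(d: Path) -> None:
    """Give the owner write access to every directory again so the tree can be removed."""
    for dirpath, _dirnames, _filenames in os.walk(d):
        current = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, current | stat.S_IWUSR)
```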

### Links

- `links/symlink_to_text_1mb_644.txt`
- `links/symlink_to_deep_random_3mb_600.bin`
- `links/symlink_to_hidden_file`
- `links/hardlink_to_random_3mb_644.bin`
- `links/hardlink_to_compressible_5mb_755.log`
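
The link entries only test something if they resolve to files elsewhere in the dataset. The manifest names the links but not their exact targets, so the mapping below is an assumption: relative symlink targets (so the links survive the dataset root being moved), `symlink_to_hidden_file` pointing at `.hidden_text_1mb_644.txt`, and hard links paired with the same-named files in `regular/`. The hypothetical `check_links` then reports links that were dereferenced, retargeted, left dangling, or split into independent copies.

```python
import os
from pathlib import Path

SYMLINKS = {  # link path -> target, stored relative to links/
    "links/symlink_to_text_1mb_644.txt": "../regular/text_1mb_644.txt",
    "links/symlink_to_deep_random_3mb_600.bin": "../deep/tree/level1/level2/level3/deep_random_3mb_600.bin",
    "links/symlink_to_hidden_file": "../hidden/.hidden_text_1mb_644.txt",
}
HARDLINKS = {  # link path -> existing file that should share its inode
    "links/hardlink_to_random_3mb_644.bin": "regular/random_3mb_644.bin",
    "links/hardlink_to_compressible_5mb_755.log": "regular/compressible_5mb_755.log",
}


def create_links(root: Path) -> None:
    (root / "links").mkdir(parents=True, exist_ok=True)
    for link, target in SYMLINKS.items():
        os.symlink(target, root / link)      # target string is stored verbatim
    for link, target in HARDLINKS.items():
        os.link(root / target, root / link)  # second name for the same inode


def check_links(root: Path) -> list[str]:
    problems = []
    for link, target in SYMLINKS.items():
        p = root / link
        if not p.is_symlink():
            problems.append(f"{link}: not a symlink")
        elif os.readlink(p) != target:
            problems.append(f"{link}: target changed to {os.readlink(p)!r}")
        elif not p.exists():
            problems.append(f"{link}: dangling")
    for link, target in HARDLINKS.items():
        a, b = root / link, root / target
        if not (a.exists() and b.exists()):
            problems.append(f"{link}: missing")
        elif not os.path.samefile(a, b):
            problems.append(f"{link}: no longer shares an inode with {target}")
    return problems
```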

### Directories

- `empty-dirs/empty_a/`
- `empty-dirs/empty_b/`
- `empty-dirs/.hidden_empty_dir/`
- `readonly-dir/no_write_subdir/`
- `deep/tree/level1/level2/level3/`
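
Empty directories carry no file data, so steps that work file by file can drop them without reporting an error (Git, for instance, does not track empty directories at all). A small hypothetical check: collect every directory under the source root that has no entries, then confirm each one exists as a directory in the copy. In the full dataset `deep/tree/level1/level2/level3/` holds files, so this check would not treat it as empty.

```python
import os
from pathlib import Path


def empty_dirs(root: Path) -> set[str]:
    """Root-relative paths of directories that contain no entries at all."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            found.add(Path(dirpath).relative_to(root).as_posix())
    return found


def missing_empty_dirs(src_root: Path, dst_root: Path) -> set[str]:
    """Empty source directories that are absent (or not directories) in the copy."""
    return {rel for rel in empty_dirs(src_root) if not (dst_root / rel).is_dir()}
```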

### Metadata Cases

Create these only if the source filesystem supports extended attributes and ACLs and the test environment allows setting them.

- `metadata/xattr_text_1mb_644.txt`
- `metadata/xattr_random_3mb_600.bin`
- `metadata/acl_text_1mb_644.txt`
- `metadata/acl_script_1mb_755.sh`
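
On Linux, extended attributes can be set and compared with Python's standard library (`os.setxattr`, `os.getxattr`, `os.listxattr`); ACLs have no standard-library interface, so the two `acl_*` files are easier to compare with the `getfacl` tool on each side. The sketch below assumes an illustrative attribute name such as `user.cdssync_test` and compares only the `user.` namespace by default, because other namespaces (for example SELinux labels under `security.`) can legitimately differ between machines. An unsupported platform or filesystem is treated as "no attributes" rather than an error; all helper names are hypothetical.

```python
import os
from pathlib import Path


def read_xattrs(path: Path) -> dict[str, bytes]:
    """All extended attributes on path; empty if the platform or filesystem lacks support."""
    if not hasattr(os, "listxattr"):  # these os functions exist on Linux only
        return {}
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}


def try_set_xattr(path: Path, name: str, value: bytes) -> bool:
    """Set one attribute and report whether the filesystem accepted it."""
    if not hasattr(os, "setxattr"):
        return False
    try:
        os.setxattr(path, name, value)
        return True
    except OSError:
        return False


def xattr_diff(src: Path, dst: Path, prefix: str = "user.") -> dict[str, tuple]:
    """Attributes under prefix whose values differ, as (src value, dst value); None if absent."""
    a = {k: v for k, v in read_xattrs(src).items() if k.startswith(prefix)}
    b = {k: v for k, v in read_xattrs(dst).items() if k.startswith(prefix)}
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```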

## Approximate Storage

Estimated real disk usage for this manifest:

- core allocated files: about `95 MiB` to `125 MiB`
- with filesystem overhead and modest headroom: plan for about `150 MiB`
- comfortable reserve for later additions: `250 MiB`

Important notes:

- sparse files may report a logical size of `1 MiB` to `5 MiB` while using much less physical disk space
- symlinks, hard links, directories, ACLs, xattrs, and empty files add little compared with regular allocated files
- if you later expand this set with more size permutations or more metadata variants, storage will grow mostly with the fully allocated non-sparse files
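
For a rough cross-check of these estimates, the sizes encoded in the file names can be tallied directly. Reading `mb` as MiB, the listed regular files add up to about 83 MiB of fully allocated content plus 24 MiB of sparse logical size, or roughly 107 MiB if a copy step fills the sparse files in; hard links, symlinks, and empty files contribute no data. The transcription below is hand-made from the manifest, and `physical_bytes` is a hypothetical helper that reads allocated space from `st_blocks`, which Linux and macOS report in 512-byte units.

```python
import os
from pathlib import Path

MIB = 1024 * 1024

# Allocated (non-sparse) file sizes in MiB, transcribed from the manifest names.
ALLOCATED_MIB = {
    "regular: text, random, compressible, script": [1, 3, 5] * 4,
    "hidden": [1, 3, 1],
    "spaces in name": [1, 3, 1],
    "long names": [1, 3, 5],
    "deep paths": [1, 3, 1],
    "duplicates": [3, 3, 3],
    "timestamps": [1, 1, 1],
    "readonly-dir": [1, 3, 1],
    "metadata": [1, 3, 1, 1],
}
# Sparse logical sizes: regular (three files), hidden, spaces in name, deep.
SPARSE_MIB = [1, 3, 5, 5, 5, 5]


def totals() -> tuple[int, int]:
    """(allocated MiB, sparse logical MiB) implied by the manifest names."""
    allocated = sum(sum(sizes) for sizes in ALLOCATED_MIB.values())
    return allocated, sum(SPARSE_MIB)


def physical_bytes(path: Path) -> int:
    """Space actually allocated to path, as opposed to its logical st_size."""
    return os.lstat(path).st_blocks * 512
```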

## Usage Recommendation

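Use this directory as the canonical definition of the source dataset. Generate the files once, keep the original unchanged, and transfer a copy to the source test machine using metadata-preserving tooling. Plain `rsync -a` does not preserve hard links, ACLs, or extended attributes, so use `rsync -aHAX` (adding `-S` keeps sparse files sparse), `cp -a`, or a GNU tar archive created with `--sparse --acls --xattrs` and extracted with `--acls --xattrs`.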
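
The "verify the copied data" step of the workflow could look roughly like the following Python sketch: snapshot both trees without following symlinks, recording type, permission bits, size, modification time, SHA-256, symlink target, and hard-link grouping, then list every difference. `snapshot` and `compare` are hypothetical helpers, not part of the committed tooling. Directory and symlink timestamps are deliberately ignored because they change as entries are created, and ownership, ACLs, and xattrs are left to separate checks.

```python
import hashlib
import os
import stat
from pathlib import Path


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, dict]:
    """Map each root-relative path to the attributes a faithful copy should keep."""
    entries: dict[str, dict] = {}
    by_inode: dict[tuple[int, int], list[str]] = {}
    for dirpath, dirnames, filenames in os.walk(root):  # does not descend into symlinks
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            st = path.lstat()
            if stat.S_ISLNK(st.st_mode):
                entries[rel] = {"type": "symlink", "target": os.readlink(path)}
            elif stat.S_ISDIR(st.st_mode):
                entries[rel] = {"type": "dir", "mode": stat.S_IMODE(st.st_mode)}
            elif stat.S_ISREG(st.st_mode):
                entries[rel] = {
                    "type": "file",
                    "mode": stat.S_IMODE(st.st_mode),
                    "size": st.st_size,
                    "mtime_ns": st.st_mtime_ns,
                    "sha256": _sha256(path),
                }
                by_inode.setdefault((st.st_dev, st.st_ino), []).append(rel)
            else:
                entries[rel] = {"type": "other"}  # FIFOs, sockets, devices: not hashed
    for group in by_inode.values():
        if len(group) > 1:  # several names for one inode form a hard-link set
            for rel in group:
                entries[rel]["hardlinks"] = sorted(group)
    return entries


def compare(src: dict[str, dict], dst: dict[str, dict]) -> list[str]:
    """Human-readable differences between two snapshots; empty means they match."""
    problems = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            problems.append(f"missing in copy: {rel}")
        elif rel not in src:
            problems.append(f"unexpected in copy: {rel}")
        else:
            for key in sorted(src[rel].keys() | dst[rel].keys()):
                if src[rel].get(key) != dst[rel].get(key):
                    problems.append(f"{rel}: {key} {src[rel].get(key)!r} -> {dst[rel].get(key)!r}")
    return problems
```

Running `compare(snapshot(original), snapshot(copy))` after a transfer makes metadata gaps visible; for example, a plain `shutil.copytree(..., symlinks=True)` copy keeps symlinks, modes, and file mtimes but shows up as lost `hardlinks` groups.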