# Pickle Chunked Reading Benchmark
This benchmark measures the performance impact of the chunked reading optimization in GH PR #119204 for the pickle module.
## What This Tests
The PR adds chunked reading (1MB chunks) to prevent memory exhaustion when unpickling large objects:
- **BINBYTES8** - Large bytes objects (protocol 4+)
- **BINUNICODE8** - Large strings (protocol 4+)
- **BYTEARRAY8** - Large bytearrays (protocol 5)
- **FRAME** - Large frames
- **LONG4** - Large integers
- An antagonistic mode that exercises malicious pickles crafted to induce memory denial of service
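The chunked-reading idea can be sketched in pure Python. This is a simplified illustration of the approach, not the CPython C implementation; the `read_chunked` helper and its behavior are assumptions modeled on the PR's description (1MB chunks, fail fast on truncation):

```python
import io

CHUNK_SIZE = 1 << 20  # 1 MiB, matching the chunk size described above


def read_chunked(f, size):
    """Read `size` bytes in 1 MiB chunks rather than one upfront
    allocation, failing fast if the stream is shorter than claimed."""
    if size <= CHUNK_SIZE:
        data = f.read(size)
        if len(data) < size:
            raise EOFError("pickle data was truncated")
        return data
    parts = []
    remaining = size
    while remaining:
        chunk = f.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            # The stream ended early: we have only allocated the
            # chunks actually received, not the full claimed size.
            raise EOFError("pickle data was truncated")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# A stream that claims far more than it holds: only 1 KiB available
stream = io.BytesIO(b"x" * 1024)
try:
    read_chunked(stream, 100 * 1024 * 1024)  # claims 100 MiB
except EOFError as e:
    print(e)  # fails after ~1 KiB, not after a 100 MiB allocation
```

The key property is that a truncated stream fails after at most one chunk of wasted allocation, which is what the antagonistic mode below measures.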
## Quick Start
```bash
# Run full benchmark suite (1MiB → 200MiB, takes several minutes)
build/python Tools/picklebench/memory_dos_impact.py
# Test just a few sizes (quick test: 1, 10, 50 MiB)
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 10 50
# Test smaller range for faster results
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 5 10
# Output as markdown for reports
build/python Tools/picklebench/memory_dos_impact.py --format markdown > results.md
# Test with protocol 4 instead of 5
build/python Tools/picklebench/memory_dos_impact.py --protocol 4
```
**Note:** Sizes are specified in MiB. Use `--sizes 1 2 5` for 1MiB, 2MiB, 5MiB objects.
## Antagonistic Mode (DoS Protection Test)
The `--antagonistic` flag tests **malicious pickles** that demonstrate the memory DoS protection:
```bash
# Quick DoS protection test (claims 10, 50, 100 MB but provides 1KB)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic --sizes 10 50 100
# Full DoS test (default: 10, 50, 100, 500, 1000, 5000 MB claimed)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic
```
### What This Tests
Unlike normal benchmarks that test **legitimate pickles**, antagonistic mode tests:
- **Truncated BINBYTES8**: Claims 100MB but provides only 1KB (will fail to unpickle)
- **Truncated BINUNICODE8**: Same for strings
- **Truncated BYTEARRAY8**: Same for bytearrays
- **Sparse memo attacks**: PUT at index 1 billion (would allocate huge array before PR)
**Key difference:**
- **Normal mode**: Tests real data, shows ~5% time overhead
- **Antagonistic mode**: Tests malicious data, shows ~99% memory savings
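A truncated BINBYTES8 pickle of the kind described above can be built by hand. The opcode bytes (PROTO, BINBYTES8) come from the pickle protocol; the 100MB/1KB figures mirror the example sizes above. This is an illustration of the attack shape, not necessarily the benchmark's exact generator:

```python
import pickle
import struct

claimed = 100 * 1024 * 1024   # claim 100 MB...
payload = b"A" * 1024         # ...but supply only 1 KB

# BINBYTES8 (protocol 4+): opcode b'\x8e' followed by an 8-byte
# little-endian length, then the raw bytes. The stream is deliberately
# truncated, so a STOP opcode is never reached.
evil = (
    b"\x80\x05"                          # PROTO 5
    + b"\x8e" + struct.pack("<Q", claimed)
    + payload
)

try:
    pickle.loads(evil)
except Exception as e:
    # Raises on truncation, before the full 100 MB is materialized
    print(type(e).__name__)
```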
### Expected Results
```
100MB Claimed (actual: 1KB)
binbytes8_100MB_claim
Peak memory: 1.00 MB (claimed: 100 MB, saved: 99.00 MB, 99.0%)
Error: UnpicklingError ← Expected!
Summary:
Average claimed: 126.2 MB
Average peak: 0.54 MB
Average saved: 125.7 MB (99.6% reduction)
Protection Status: ✓ Memory DoS attacks mitigated by chunked reading
```
- **Before PR**: Would allocate the full claimed size (100MB+), potentially crashing
- **After PR**: Allocates 1MB chunks, fails fast with minimal memory
This demonstrates the **security improvement**: protection against memory exhaustion attacks.
## Before/After Comparison
The benchmark includes an automatic comparison feature that runs the same tests on both a baseline and current Python build.
### Option 1: Automatic Comparison (Recommended)
Build both versions, then use `--baseline` to automatically compare:
```bash
# Build the baseline (main branch without PR)
git checkout main
mkdir -p build-main
cd build-main && ../configure && make -j $(nproc) && cd ..
# Build the current version (with PR)
git checkout unpickle-overallocate
mkdir -p build
cd build && ../configure && make -j $(nproc) && cd ..
# Run automatic comparison (quick test with a few sizes)
build/python Tools/picklebench/memory_dos_impact.py \
--baseline build-main/python \
--sizes 1 10 50
# Full comparison (all default sizes)
build/python Tools/picklebench/memory_dos_impact.py \
--baseline build-main/python
```
The comparison output shows:
- Side-by-side metrics (Current vs Baseline)
- Percentage change for time and memory
- Overall summary statistics
### Interpreting Comparison Results
- **Time change**: Small positive % is expected (chunking adds overhead, typically 5-10%)
- **Memory change**: Negative % is good (chunking saves memory, especially for large objects)
- **Trade-off**: Slightly slower but much safer against memory exhaustion attacks
### Option 2: Manual Comparison
Save results separately and compare manually:
```bash
# Baseline results
build-main/python Tools/picklebench/memory_dos_impact.py --format json > baseline.json
# Current results
build/python Tools/picklebench/memory_dos_impact.py --format json > current.json
# Manual comparison
diff -y <(jq '.' baseline.json) <(jq '.' current.json)
```
## Understanding the Results
### Critical Sizes
The default test suite includes:
- **< 1MiB (999,000 bytes)**: No chunking, allocates full size upfront
- **= 1MiB (1,048,576 bytes)**: Threshold, chunking just starts
- **> 1MiB (1,048,577 bytes)**: Chunked reading engaged
- **1, 2, 5, 10MiB**: Show scaling behavior with chunking
- **20, 50, 100, 200MiB**: Stress test large object handling
**Note:** The full suite may require more than 16GiB of RAM.
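For reference, a legitimate payload at one of these critical sizes can be produced with `pickle` itself. This just demonstrates the size relationship between object and pickle; it is not the benchmark's own generator:

```python
import pickle

# One byte past the 1 MiB threshold, where chunked reading engages
size = (1 << 20) + 1
data = pickle.dumps(b"x" * size, protocol=5)
obj = pickle.loads(data)
assert len(obj) == size
# The pickle is the payload plus a few bytes of opcodes and framing
print(len(data))
```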
### Key Metrics
- **Time (mean)**: Average unpickling time - should be similar before/after
- **Time (stdev)**: Consistency - lower is better
- **Peak Memory**: Maximum memory during unpickling - **expected to be LOWER after PR**
- **Pickle Size**: Size of the serialized data on disk
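Peak memory can also be observed directly with `tracemalloc` as a rough stand-in; the benchmark's own measurement mechanism may differ:

```python
import pickle
import tracemalloc

# Measure peak Python-level allocations while unpickling a 10 MiB object
payload = pickle.dumps(b"x" * (10 << 20), protocol=5)
tracemalloc.start()
obj = pickle.loads(payload)
peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()
print(f"peak: {peak / (1 << 20):.1f} MiB")
```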
### Test Types
| Test | What It Stresses |
|------|------------------|
| `bytes_*` | BINBYTES8 opcode, raw binary data |
| `string_ascii_*` | BINUNICODE8 with simple ASCII |
| `string_utf8_*` | BINUNICODE8 with multibyte UTF-8 (€ chars) |
| `bytearray_*` | BYTEARRAY8 opcode (protocol 5) |
| `list_large_items_*` | Multiple chunked reads in sequence |
| `dict_large_values_*` | Chunking in dict deserialization |
| `nested_*` | Realistic mixed data structures |
| `tuple_*` | Immutable structures |
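As an illustration of a `list_large_items_*`-style payload (the exact shapes used live in the benchmark script; this is an assumed approximation), several multi-MiB items in one container force multiple chunked reads in sequence:

```python
import pickle

# Four distinct 2 MiB bytes objects inside one list
items = [bytes([i]) * (2 << 20) for i in range(4)]
blob = pickle.dumps(items, protocol=5)

# Unpickling this triggers one chunked read per large item
restored = pickle.loads(blob)
assert [len(x) for x in restored] == [2 << 20] * 4
```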
## Expected Results
### Before PR (main branch)
- Single large allocation per object
- Risk of memory exhaustion with malicious pickles
### After PR (unpickle-overallocate branch)
- Chunked allocation (1MB at a time)
- **Slightly higher CPU time** (multiple allocations + resizing)
- **Significantly lower peak memory** (no large pre-allocation)
- Protection against DoS via memory exhaustion
## Advanced Usage
### Test Specific Sizes
```bash
# Test only 5MiB and 10MiB objects
build/python Tools/picklebench/memory_dos_impact.py --sizes 5 10
# Test large objects: 50, 100, 200 MiB
build/python Tools/picklebench/memory_dos_impact.py --sizes 50 100 200
```
### More Iterations for Stable Timing
```bash
# Run 10 iterations per test for better statistics
build/python Tools/picklebench/memory_dos_impact.py --iterations 10 --sizes 1 10
```
### JSON Output for Analysis
```bash
# Generate JSON for programmatic analysis
build/python Tools/picklebench/memory_dos_impact.py --format json | python -m json.tool
```
## Interpreting Memory Results
The **peak memory** metric shows the maximum memory allocated during unpickling:
- **Without chunking**: Allocates full size immediately
  - 10MB object → 10MB allocation upfront
- **With chunking**: Allocates in 1MB chunks, grows geometrically
- 10MB object starts with 1MB, grows: 2MB, 4MB, 8MB (final: ~10MB total)
- Peak is lower because allocation is incremental
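The geometric growth pattern can be sketched numerically. This is a simplified model of the doubling behavior described above, not the exact allocator logic:

```python
CHUNK = 1 << 20  # 1 MiB


def growth_steps(total_bytes):
    """Model buffer capacity growth for a chunked read: start at one
    chunk and double until the claimed size is covered, then trim."""
    steps = []
    cap = CHUNK
    while cap < total_bytes:
        steps.append(cap)
        cap *= 2
    steps.append(total_bytes)  # final resize to the exact size
    return steps


# A 10 MiB object passes through 1, 2, 4, 8, then 10 MiB capacities
print([s // CHUNK for s in growth_steps(10 * CHUNK)])  # [1, 2, 4, 8, 10]
```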
## Typical Results
On a system with the PR applied, you should see:
```
1.00MiB Test Results
bytes_1.00MiB: ~0.3ms, 1.00MiB peak (just at threshold)
2.00MiB Test Results
bytes_2.00MiB: ~0.8ms, 2.00MiB peak (chunked: 1MiB → 2MiB)
10.00MiB Test Results
bytes_10.00MiB: ~3-5ms, 10.00MiB peak (chunked: 1→2→4→8→10 MiB)
```
Time overhead is minimal (~10-20% for very large objects), but memory safety is significantly improved.