# Pickle Chunked Reading Benchmark
This benchmark measures the performance impact of the chunked reading optimization in GH PR #119204 for the pickle module.
## What This Tests
The PR adds chunked reading (1MB chunks) to prevent memory exhaustion when unpickling large objects:
- **BINBYTES8** - Large bytes objects (protocol 4+)
- **BINUNICODE8** - Large strings (protocol 4+)
- **BYTEARRAY8** - Large bytearrays (protocol 5)
- **FRAME** - Large frames
- **LONG4** - Large integers
- An antagonistic mode that exercises malicious pickles crafted to induce memory denial of service
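The chunked-reading idea can be sketched in pure Python. This is a simplified illustration of the approach, not the CPython C implementation; the `read_chunked` helper and its behavior are assumptions modeled on the PR's description (1MB chunks, fail fast on truncation):

```python
import io

CHUNK_SIZE = 1 << 20  # 1 MiB, matching the chunk size described above


def read_chunked(f, size):
    """Read `size` bytes in 1 MiB chunks rather than one upfront
    allocation, failing fast if the stream is shorter than claimed."""
    if size <= CHUNK_SIZE:
        data = f.read(size)
        if len(data) < size:
            raise EOFError("pickle data was truncated")
        return data
    parts = []
    remaining = size
    while remaining:
        chunk = f.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            # The stream ended early: we have only allocated the
            # chunks actually received, not the full claimed size.
            raise EOFError("pickle data was truncated")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# A stream that claims far more than it holds: only 1 KiB available
stream = io.BytesIO(b"x" * 1024)
try:
    read_chunked(stream, 100 * 1024 * 1024)  # claims 100 MiB
except EOFError as e:
    print(e)  # fails after ~1 KiB, not after a 100 MiB allocation
```

The key property is that a truncated stream fails after at most one chunk of wasted allocation, which is what the antagonistic mode below measures.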
## Quick Start
```bash
# Run full benchmark suite (1MiB → 200MiB, takes several minutes)
build/python Tools/picklebench/memory_dos_impact.py
# Test just a few sizes (quick test: 1, 10, 50 MiB)
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 10 50
# Test smaller range for faster results
build/python Tools/picklebench/memory_dos_impact.py --sizes 1 5 10
# Output as markdown for reports
build/python Tools/picklebench/memory_dos_impact.py --format markdown > results.md
# Test with protocol 4 instead of 5
build/python Tools/picklebench/memory_dos_impact.py --protocol 4
```
**Note:** Sizes are specified in MiB. Use `--sizes 1 2 5` for 1MiB, 2MiB, 5MiB objects.
## Antagonistic Mode (DoS Protection Test)
The `--antagonistic` flag tests **malicious pickles** that demonstrate the memory DoS protection:
```bash
# Quick DoS protection test (claims 10, 50, 100 MB but provides 1KB)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic --sizes 10 50 100
# Full DoS test (default: 10, 50, 100, 500, 1000, 5000 MB claimed)
build/python Tools/picklebench/memory_dos_impact.py --antagonistic
```
### What This Tests
Unlike normal benchmarks that test **legitimate pickles**, antagonistic mode tests:
- **Truncated BINBYTES8**: Claims 100MB but provides only 1KB (will fail to unpickle)
- **Truncated BINUNICODE8**: Same for strings
- **Truncated BYTEARRAY8**: Same for bytearrays
- **Sparse memo attacks**: PUT at index 1 billion (would allocate huge array before PR)
**Key difference:**
- **Normal mode**: Tests real data, shows ~5% time overhead
- **Antagonistic mode**: Tests malicious data, shows ~99% memory savings
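A truncated BINBYTES8 pickle of the kind described above can be built by hand. The opcode bytes (PROTO, BINBYTES8) come from the pickle protocol; the 100MB/1KB figures mirror the example sizes above. This is an illustration of the attack shape, not necessarily the benchmark's exact generator:

```python
import pickle
import struct

claimed = 100 * 1024 * 1024   # claim 100 MB...
payload = b"A" * 1024         # ...but supply only 1 KB

# BINBYTES8 (protocol 4+): opcode b'\x8e' followed by an 8-byte
# little-endian length, then the raw bytes. The stream is deliberately
# truncated, so a STOP opcode is never reached.
evil = (
    b"\x80\x05"                          # PROTO 5
    + b"\x8e" + struct.pack("<Q", claimed)
    + payload
)

try:
    pickle.loads(evil)
except Exception as e:
    # Raises on truncation, before the full 100 MB is materialized
    print(type(e).__name__)
```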
### Expected Results
```
100MB Claimed (actual: 1KB)
binbytes8_100MB_claim
Peak memory: 1.00 MB (claimed: 100 MB, saved: 99.00 MB, 99.0%)
Error: UnpicklingError ← Expected!
Summary:
Average claimed: 126.2 MB
Average peak: 0.54 MB
Average saved: 125.7 MB (99.6% reduction)
Protection Status: ✓ Memory DoS attacks mitigated by chunked reading
```
- **Before PR**: Would allocate the full claimed size (100MB+), potentially crashing
- **After PR**: Allocates 1MB chunks, fails fast with minimal memory
This demonstrates the **security improvement**: protection against memory exhaustion attacks.
## Before/After Comparison
The benchmark includes an automatic comparison feature that runs the same tests on both a baseline and current Python build.
### Option 1: Automatic Comparison (Recommended)
Build both versions, then use `--baseline` to automatically compare:
```bash
# Build the baseline (main branch without PR)
git checkout main
mkdir -p build-main
cd build-main && ../configure && make -j $(nproc) && cd ..
# Build the current version (with PR)
git checkout unpickle-overallocate
mkdir -p build
cd build && ../configure && make -j $(nproc) && cd ..
# Run automatic comparison (quick test with a few sizes)
build/python Tools/picklebench/memory_dos_impact.py \
--baseline build-main/python \
--sizes 1 10 50
# Full comparison (all default sizes)
build/python Tools/picklebench/memory_dos_impact.py \
--baseline build-main/python
```
The comparison output shows:
- Side-by-side metrics (Current vs Baseline)
- Percentage change for time and memory
- Overall summary statistics
### Interpreting Comparison Results
- **Time change**: Small positive % is expected (chunking adds overhead, typically 5-10%)
- **Memory change**: Negative % is good (chunking saves memory, especially for large objects)
- **Trade-off**: Slightly slower but much safer against memory exhaustion attacks
### Option 2: Manual Comparison
Save results separately and compare manually:
```bash
# Baseline results
build-main/python Tools/picklebench/memory_dos_impact.py --format json > baseline.json
# Current results
build/python Tools/picklebench/memory_dos_impact.py --format json > current.json
# Manual comparison
diff -y <(jq '.' baseline.json) <(jq '.' current.json)
```
## Understanding the Results
### Critical Sizes
The default test suite includes:
- **< 1MiB (999,000 bytes)**: No chunking, allocates full size upfront
- **= 1MiB (1,048,576 bytes)**: Threshold, chunking just starts
- **> 1MiB (1,048,577 bytes)**: Chunked reading engaged
- **1, 2, 5, 10MiB**: Show scaling behavior with chunking
- **20, 50, 100, 200MiB**: Stress test large object handling
**Note:** The full suite may require more than 16GiB of RAM.
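For reference, a legitimate payload at one of these critical sizes can be produced with `pickle` itself. This just demonstrates the size relationship between object and pickle; it is not the benchmark's own generator:

```python
import pickle

# One byte past the 1 MiB threshold, where chunked reading engages
size = (1 << 20) + 1
data = pickle.dumps(b"x" * size, protocol=5)
obj = pickle.loads(data)
assert len(obj) == size
# The pickle is the payload plus a few bytes of opcodes and framing
print(len(data))
```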
### Key Metrics
- **Time (mean)**: Average unpickling time - should be similar before/after
- **Time (stdev)**: Consistency - lower is better
- **Peak Memory**: Maximum memory during unpickling - **expected to be LOWER after PR**
- **Pickle Size**: Size of the serialized data on disk
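Peak memory can also be observed directly with `tracemalloc` as a rough stand-in; the benchmark's own measurement mechanism may differ:

```python
import pickle
import tracemalloc

# Measure peak Python-level allocations while unpickling a 10 MiB object
payload = pickle.dumps(b"x" * (10 << 20), protocol=5)
tracemalloc.start()
obj = pickle.loads(payload)
peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()
print(f"peak: {peak / (1 << 20):.1f} MiB")
```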
### Test Types
| Test | What It Stresses |
|------|------------------|
| `bytes_*` | BINBYTES8 opcode, raw binary data |
| `string_ascii_*` | BINUNICODE8 with simple ASCII |
| `string_utf8_*` | BINUNICODE8 with multibyte UTF-8 (€ chars) |
| `bytearray_*` | BYTEARRAY8 opcode (protocol 5) |
| `list_large_items_*` | Multiple chunked reads in sequence |
| `dict_large_values_*` | Chunking in dict deserialization |
| `nested_*` | Realistic mixed data structures |
| `tuple_*` | Immutable structures |
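As an illustration of a `list_large_items_*`-style payload (the exact shapes used live in the benchmark script; this is an assumed approximation), several multi-MiB items in one container force multiple chunked reads in sequence:

```python
import pickle

# Four distinct 2 MiB bytes objects inside one list
items = [bytes([i]) * (2 << 20) for i in range(4)]
blob = pickle.dumps(items, protocol=5)

# Unpickling this triggers one chunked read per large item
restored = pickle.loads(blob)
assert [len(x) for x in restored] == [2 << 20] * 4
```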
## Expected Results
### Before PR (main branch)
- Single large allocation per object
- Risk of memory exhaustion with malicious pickles
### After PR (unpickle-overallocate branch)
- Chunked allocation (1MB at a time)
- **Slightly higher CPU time** (multiple allocations + resizing)
- **Significantly lower peak memory** (no large pre-allocation)
- Protection against DoS via memory exhaustion
## Advanced Usage
### Test Specific Sizes
```bash
# Test only 5MiB and 10MiB objects
build/python Tools/picklebench/memory_dos_impact.py --sizes 5 10
# Test large objects: 50, 100, 200 MiB
build/python Tools/picklebench/memory_dos_impact.py --sizes 50 100 200
```
### More Iterations for Stable Timing
```bash
# Run 10 iterations per test for better statistics
build/python Tools/picklebench/memory_dos_impact.py --iterations 10 --sizes 1 10
```
### JSON Output for Analysis
```bash
# Generate JSON for programmatic analysis
build/python Tools/picklebench/memory_dos_impact.py --format json | python -m json.tool
```
## Interpreting Memory Results
The **peak memory** metric shows the maximum memory allocated during unpickling:
- **Without chunking**: Allocates full size immediately
  - 10MB object → 10MB allocation upfront
- **With chunking**: Allocates in 1MB chunks, grows geometrically
- 10MB object starts with 1MB, grows: 2MB, 4MB, 8MB (final: ~10MB total)
- Peak is lower because allocation is incremental
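The geometric growth pattern can be sketched numerically. This is a simplified model of the doubling behavior described above, not the exact allocator logic:

```python
CHUNK = 1 << 20  # 1 MiB


def growth_steps(total_bytes):
    """Model buffer capacity growth for a chunked read: start at one
    chunk and double until the claimed size is covered, then trim."""
    steps = []
    cap = CHUNK
    while cap < total_bytes:
        steps.append(cap)
        cap *= 2
    steps.append(total_bytes)  # final resize to the exact size
    return steps


# A 10 MiB object passes through 1, 2, 4, 8, then 10 MiB capacities
print([s // CHUNK for s in growth_steps(10 * CHUNK)])  # [1, 2, 4, 8, 10]
```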
## Typical Results
On a system with the PR applied, you should see:
```
1.00MiB Test Results
bytes_1.00MiB: ~0.3ms, 1.00MiB peak (just at threshold)
2.00MiB Test Results
bytes_2.00MiB: ~0.8ms, 2.00MiB peak (chunked: 1MiB → 2MiB)
10.00MiB Test Results
bytes_10.00MiB: ~3-5ms, 10.00MiB peak (chunked: 1→2→4→8→10 MiB)
```
Time overhead is minimal (~10-20% for very large objects), but memory safety is significantly improved.