PiC · π / constants compression · BD × AI Lab

PiC — constants as compression generators

The original seed was simple: if a file contains bits that can be described as digits or bits of π, why store those bits directly? PiC turns that seed into a strict minimum description length (MDL) experiment: use π, e, φ, √2 and other deterministic streams only when pointer, transform and residual together cost fewer bits than ordinary storage or compression.

Short name: PiC = Pi Compression / Constants Compression. It is the compression sibling of PiX.

Seed → bridge → test → result

PiC preserves the original intuition without turning it into a magical claim. The idea is allowed to live, but only behind a hard accounting gate.

Seed
π coordinates

Can data in a file be replaced by coordinates inside π?

Bridge
generator tokens

Store constant ID, offset, length, transform and residual instead of raw bits.

Test
MDL battle

Accept only if the whole description is shorter than RAW, zstd, PNG/FLAC, or other baselines.

Result
not claimed yet

PiC is a falsifiable experiment, not a statement that π compresses ordinary files.

Clean formulation: a file chunk may be encoded as constant_stream[offset : offset+length] + transform + residual, but only if this is cheaper and reconstructs exactly, or meets a declared lossy quality target.

Core idea: the file and the constant are both streams

A file is not naturally decimal text. It is a byte stream and therefore also a bit stream. π is usually written as a decimal digit stream, but it can be expanded in other bases or re-encoded into several binary or symbolic streams. PiC searches for useful alignments between the file stream and these constant streams.

File side

file bytes: 137 080 078 071 ...
file bits:  10001001 01010000 01001110 01000111 ...
chunk:      b[i : i+n]

Constant side

π digits:   314159265358979323846...
π as BCD:   0011 0001 0100 0001 0101 1001 ...
π as bytes: 24 3F 6A 88 ...  (hex expansion of the fractional part of π)

1. Chunk. Take a file region: bytes, bits, pixels, samples, or symbols.
2. Generate. Create π/e/φ/√2 streams under several encodings.
3. Align. Try offset, length, stride, mapping, transform.
4. Residual. Store only the difference if the match is not exact.
5. Gate. MDL accepts or rejects the candidate.
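
The Chunk, Generate and Align steps can be sketched in a few lines. This is a minimal illustration, not the PiC implementation: the helper names (`bytes_to_bits`, `digits_to_bcd`, `find_alignment`) are hypothetical, the π digits are hardcoded rather than generated, and only exact matches in the PI_BCD encoding are considered.

```python
PI_DIGITS = "314159265358979323846"  # leading decimal digits of π, hardcoded for the sketch


def bytes_to_bits(data: bytes) -> str:
    """File side: render bytes as '0'/'1' characters, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def digits_to_bcd(digits: str) -> str:
    """Constant side: encode each decimal digit as a 4-bit nibble (PI_BCD)."""
    return "".join(f"{int(d):04b}" for d in digits)


def find_alignment(file_bits: str, stream_bits: str, n: int):
    """Return (file_offset, stream_offset) for the first n-bit file window
    that occurs anywhere in the stream, or None if no window matches."""
    first_seen = {}
    for j in range(len(stream_bits) - n + 1):
        first_seen.setdefault(stream_bits[j:j + n], j)
    for i in range(len(file_bits) - n + 1):
        j = first_seen.get(file_bits[i:i + n])
        if j is not None:
            return i, j
    return None


# The four file bytes shown on the file side above.
file_bits = bytes_to_bits(bytes([137, 80, 78, 71]))
pi_bcd = digits_to_bcd(PI_DIGITS)
print(file_bits)
print(find_alignment(file_bits, pi_bcd, 8))
```

Finding an alignment is only the Align step. Whether the pointer is worth storing is a separate question for the gate.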

Tiny toy example

This is the simple version you were imagining: a file bit segment equals π digits converted to bits. The encoder stores the π pointer instead of the literal segment.

Literal file segment

file bits around segment:
11001010 0011000101000001010110010010 01101100
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         28 bits to replace

π pointer

π digits: 3 1 4 1 5 9 2
BCD:      0011 0001 0100 0001 0101 1001 0010

store:
mode      = PI_BCD
offset    = 0
len_digits= 7

The catch: the pointer has its own cost. If the token costs 60 bits, replacing a 28-bit segment loses 32 bits. PiC only becomes real compression when the saved segment plus residual beats the token cost and the ordinary codec baseline.
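
The toy example can be checked directly. The sketch below expands the pointer back into bits and compares the 28-bit literal with the 60-bit token cost used in the text. The dictionary layout and the `expand` helper are illustrative assumptions, not a defined PiC format.

```python
PI_DIGITS = "3141592653589793"  # leading digits of π, hardcoded for the sketch

segment = "0011000101000001010110010010"  # the 28 literal bits from the example


def bcd(digits: str) -> str:
    """Encode decimal digits as 4-bit nibbles."""
    return "".join(f"{int(d):04b}" for d in digits)


def expand(pointer: dict, digits: str = PI_DIGITS) -> str:
    """Regenerate the bit segment that a PI_BCD pointer describes."""
    start = pointer["offset"]
    return bcd(digits[start:start + pointer["len_digits"]])


pointer = {"mode": "PI_BCD", "offset": 0, "len_digits": 7}
regenerated = expand(pointer)

literal_bits = len(segment)
token_bits = 60                      # token cost assumed in the text above
saving = literal_bits - token_bits   # negative means the pointer loses

print(regenerated == segment, saving)
```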

Constant stream modes

PiC should not treat π as the only source. π is one deterministic tape in a family of tapes. The claim only becomes interesting if a constant beats random controls and other constants under the same budget.

| Mode | Meaning | Best use | Risk |
|---|---|---|---|
| PI_BCD | Decimal digits encoded as 4-bit nibbles. | Your direct “π digits to bits” idea. | Wasteful: only 10 of 16 nibble values used. |
| PI_ASCII | Digits stored as ASCII bytes: “3”, “1”, “4”… | Text-like or symbolic files. | Usually bad for binary files. |
| PI_DEC3 | File bytes written as decimal triples 000–255, then searched in π. | Human-readable bridge from bytes to decimal digits. | Triples above 255 are invalid unless remapped. |
| PI_HEX / PI_BIN | Use hexadecimal/binary expansion of constants as true bytes/bits. | Cleanest raw byte-level PiC mode. | Exact matches should be rare in normal data. |
| CONST_XOR | Encode chunk as constant stream XOR residual. | Near-matches and sparse differences. | Residual may cost as much as original. |
| CONST_CA | Seed a cellular automaton from π/e/φ/√2 and compare generated texture. | Images, masks, grids, synthetic textures. | Easy to overfit without controls. |
| CONST_LOSSY | Use constants as texture/noise/basis for controlled lossy coding. | Image/audio/sensor residual shaping. | Must beat normal lossy codecs at same quality. |

π · e · φ · √2 · log 2 · ζ(3) · Champernowne · RNG control
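
Two of the modes are easy to make concrete. The sketch below shows the PI_DEC3 byte-to-triple mapping, including the "triples above 255" risk, and a CONST_XOR residual against the first four bytes of π's hexadecimal expansion. Function names are hypothetical, and the remapping rule for invalid triples is left open, as in the table.

```python
def to_dec3(data: bytes) -> str:
    """PI_DEC3: write each byte as a zero-padded decimal triple 000-255."""
    return "".join(f"{b:03d}" for b in data)


def from_dec3(text: str) -> bytes:
    """Inverse mapping; rejects triples that are not valid byte values."""
    if len(text) % 3:
        raise ValueError("length must be a multiple of 3")
    values = [int(text[i:i + 3]) for i in range(0, len(text), 3)]
    if any(v > 255 for v in values):
        raise ValueError("triple above 255 needs a remapping rule")
    return bytes(values)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """CONST_XOR: the residual is chunk XOR stream, and XOR again restores the chunk."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


PI_HEX = bytes.fromhex("243F6A88")   # first fractional hex digits of π
chunk = bytes.fromhex("243F6A89")    # differs from the stream in one bit
residual = xor_bytes(chunk, PI_HEX)  # sparse: only the last byte is non-zero

print(to_dec3(bytes([137, 80, 78, 71])))
print(residual.hex())
```

Note that the triple "314" taken straight from π is not a valid byte, which is exactly the PI_DEC3 risk listed above.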

Lossless PiC: exact reconstruction or reject

The strict version must reconstruct the exact original bytes. It is the right first test because it cannot hide behind visual similarity.

Accepted chunk

A PiC token is kept only if token + transform + residual + hash is smaller than the best ordinary candidate.

Verification

Decoder regenerates the constant stream, applies transform/residual, then verifies SHA3 or another strong hash of the reconstructed chunk.

Rejected chunk

If the pointer cost or residual cost loses, the encoder stores RAW, zstd, lzma, PNG, FLAC, or the normal 8Z winner.

Possible token shape

PiCChunk {
  mode:        PI_BIN | PI_BCD | PI_DEC3 | CONST_XOR | CONST_CA
  constant_id: PI | E | PHI | SQRT2 | RNG_CONTROL
  offset:      unsigned integer
  length:      unsigned integer
  stride:      optional integer
  transform:   optional transform ID + params
  residual:    compressed residual bytes
  hash:        reconstructed chunk hash
}
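
One way to make the token shape testable is to give it a bit cost. The sketch below mirrors the fields above as a Python dataclass and charges each field explicitly. The cost model is an assumption made for illustration: an 8-bit header, Elias-gamma coded integers, and transform, residual and hash counted as whole bytes. A real container could pack these differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    PI_BIN = 0
    PI_BCD = 1
    PI_DEC3 = 2
    CONST_XOR = 3
    CONST_CA = 4


class ConstantId(Enum):
    PI = 0
    E = 1
    PHI = 2
    SQRT2 = 3
    RNG_CONTROL = 4


def elias_gamma_bits(n: int) -> int:
    """Bits needed to Elias-gamma code n + 1, so that n = 0 is representable."""
    return 2 * (n + 1).bit_length() - 1


@dataclass(frozen=True)
class PiCChunk:
    mode: Mode
    constant_id: ConstantId
    offset: int
    length: int
    stride: Optional[int] = None       # assumed non-negative when present
    transform: Optional[bytes] = None  # opaque transform ID + params
    residual: bytes = b""
    hash: bytes = b""

    def cost_bits(self) -> int:
        """Total description length of this token under the assumed cost model."""
        bits = 3 + 3 + 2  # mode, constant_id, presence flags for stride/transform
        bits += elias_gamma_bits(self.offset) + elias_gamma_bits(self.length)
        if self.stride is not None:
            bits += elias_gamma_bits(self.stride)
        if self.transform is not None:
            bits += 8 * len(self.transform)
        return bits + 8 * len(self.residual) + 8 * len(self.hash)


toy = PiCChunk(Mode.PI_BCD, ConstantId.PI, offset=0, length=7)
print(toy.cost_bits())
```

Under this model the offset field alone grows with the logarithm of the offset, so distant matches are charged for their distance.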

Lossy PiC: more hits, stricter controls

Yes: if lossy compression is allowed, π and other constants will be “found” much more often. That is powerful, but dangerous. Approximate matches are easy; meaningful wins are hard.

Image / texture direction

Constants can act as deterministic grain, masks, phase fields, tile orderings, or residual bases. Test at equal PSNR/SSIM or a fixed perceptual error budget.

Audio / sensor direction

Constants can act as deterministic dither/noise/residual scaffolds. Test at equal SNR, spectral error, or perceptual score against normal codecs.

Lossy rule: PiC must not ask “can π approximate this?” It must ask “does π/e/φ/√2 plus residual achieve better bitrate–distortion than random streams and standard codecs?”
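
The lossy rule can be phrased as a comparison at a declared quality target. The sketch below is one possible reading, with PSNR as the distortion measure and a hypothetical `wins_at_target` helper: the candidate must reach the target and must cost fewer bits than every rival (random-stream control or standard codec) that also reaches it. The numbers in the usage lines are made up for illustration.

```python
import math


def mse(a, b) -> float:
    """Mean squared error between two equal-length sample sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def wins_at_target(target_db: float, candidate, rivals) -> bool:
    """candidate and each rival are (bits, psnr_db) pairs.
    The candidate wins only if it meets the target and is strictly cheaper
    than every rival that meets the target too."""
    bits, quality = candidate
    if quality < target_db:
        return False
    qualified = [b for b, q in rivals if q >= target_db]
    # With no qualifying rival there is nothing to beat, so no win is claimed.
    return bool(qualified) and all(bits < b for b in qualified)


# Illustrative numbers only: (bits, PSNR in dB).
rivals = [(12_000, 38.2), (11_500, 38.0), (9_000, 31.0)]
print(wins_at_target(38.0, (11_000, 38.1), rivals))
print(wins_at_target(38.0, (11_800, 38.1), rivals))
```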

The MDL gate: the part that keeps PiC honest

The core formula is simple. PiC is not accepted because it is beautiful. It is accepted only when the total description is cheaper.

Lossless objective

L_total = L_token + L_transform + L_residual + L_hash
accept if:
L_total < min(L_RAW, L_zstd, L_lzma, L_png, L_8Z)

Lossy objective

L_total = L_token + L_transform + L_residual + λD
accept if:
rate is lower at equal declared distortion D
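
The lossless gate is short enough to write down. The sketch below uses only Python's standard library, so zlib and lzma stand in for the baseline set. zstd, PNG and the 8Z winner are not included, which makes this gate easier to pass than the real one. The function names are hypothetical.

```python
import lzma
import zlib


def baseline_bits(chunk: bytes) -> dict:
    """Description length in bits of the chunk under each available baseline."""
    return {
        "RAW": 8 * len(chunk),
        "zlib": 8 * len(zlib.compress(chunk, 9)),
        "lzma": 8 * len(lzma.compress(chunk)),
    }


def mdl_gate(chunk: bytes, l_token: int, l_transform: int, l_residual: int, l_hash: int):
    """Return (accepted, l_total, best_baseline), all lengths in bits.
    Accept only if the full PiC description is strictly shorter than every baseline."""
    l_total = l_token + l_transform + l_residual + l_hash
    best = min(baseline_bits(chunk).values())
    return l_total < best, l_total, best


chunk = bytes(range(256))
print(baseline_bits(chunk))
print(mdl_gate(chunk, l_token=40, l_transform=0, l_residual=0, l_hash=32))
```

The strict inequality is deliberate: a tie goes to the ordinary codec.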

Pointer economy toy calculator

Use this to feel the central obstacle. Small matches lose because the pointer is bigger than the data it replaces.
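
A minimal stand-in for the calculator, under stated assumptions: the offset is charged at its plain binary length, the token header is a fixed 16 bits, and the constant's bits behave like uniform random bits, so a specific n-bit pattern is first expected at an offset of roughly 2^n. That last point is the usual working assumption for π, not a proven fact. All names and defaults are illustrative.

```python
def pointer_economy(match_bits: int, offset: int,
                    header_bits: int = 16, residual_bits: int = 0) -> dict:
    """Compare the bits a match replaces with the bits the token costs."""
    offset_bits = max(1, offset.bit_length())
    token_bits = header_bits + offset_bits + residual_bits
    return {
        "token_bits": token_bits,
        "saved_bits": match_bits - token_bits,
        "wins": token_bits < match_bits,
    }


def typical_offset(match_bits: int) -> int:
    """Rough offset at which an n-bit pattern first appears in a random-looking stream."""
    return 2 ** match_bits


for n in (8, 28, 64, 256):
    print(n, pointer_economy(n, typical_offset(n)))

# A lucky match near the start of the stream is the only cheap case.
print(pointer_economy(28, 0))
```

On this model the offset field alone is about as long as the match, so a typical exact match cannot pay for itself. A win needs an unusually early offset, a cheap residual, or structure that the random model does not capture.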

...

Controls and kill tests

PiC becomes real only if it survives boring controls. The goal is not to protect π. The goal is to discover whether constants give any useful generator family at all.

Good control

Compare π against e, φ, √2, a cryptographic-quality deterministic random stream, shuffled π, and standard codecs.

Watch-out

If random wins equally often, the signal is not π. It may still be useful as generator search, but not as a special π claim.

Kill condition

If PiC cannot beat controls after fair tuning, keep it as a creative/scouting branch, not as a compression claim.
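
A control harness can start very small. The sketch below builds three equal-length digit streams (π, shuffled π, and a seeded pseudorandom stream) and measures how often fixed-size digit chunks of some data appear in each. The π digits come from Machin's formula in integer arithmetic, and Python's `random` module is used for brevity, so it is not the cryptographic-quality control the text asks for. Helper names are hypothetical.

```python
import random


def arctan_inv(x: int, unity: int) -> int:
    """Fixed-point arctan(1/x) scaled by unity, via the alternating Taylor series."""
    total = term = unity // x
    x2, n, sign = x * x, 3, -1
    while term:
        term //= x2
        total += sign * (term // n)
        sign, n = -sign, n + 2
    return total


def pi_digits(n: int) -> str:
    """First n decimal digits of π (leading 3 included), with 10 guard digits."""
    unity = 10 ** (n + 10)
    scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(scaled)[:n]


def control_streams(n: int, seed: int = 0) -> dict:
    """π next to two controls of the same length and alphabet."""
    rng = random.Random(seed)
    pi = pi_digits(n)
    shuffled = list(pi)
    rng.shuffle(shuffled)
    return {
        "pi": pi,
        "shuffled_pi": "".join(shuffled),
        "rng": "".join(rng.choice("0123456789") for _ in range(n)),
    }


def hit_rate(data_digits: str, stream: str, k: int) -> float:
    """Fraction of non-overlapping k-digit data chunks found anywhere in the stream."""
    windows = {stream[i:i + k] for i in range(len(stream) - k + 1)}
    chunks = [data_digits[i:i + k] for i in range(0, len(data_digits) - k + 1, k)]
    return sum(c in windows for c in chunks) / len(chunks) if chunks else 0.0


streams = control_streams(2000)
data = "137080078071013010026010"  # arbitrary example digits, not a real corpus
for name, stream in streams.items():
    print(name, hit_rate(data, stream, 3))
```

If the three rates are indistinguishable over a real corpus, the watch-out above applies: whatever signal exists is not specific to π.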

Pi Hunter vs PiC

Pi Hunter was an important precursor, but it was not yet this raw-file compressor. It searched symbolic image textures against π/CA streams. PiC is the sharper file-compression formulation.

| System | What it reads | What it searches | Status |
|---|---|---|---|
| Pi Hunter v2.3 | Image resized/quantized into a symbolic grid. | π digit streams, π-seeded CA rules, spatial mappings, residual scoring. | Scout / texture testbed, not a full codec. |
| PiX | Sequences, levels, rhythms, schedules, decision candidates. | π as universal MDL candidate generator across domains. | Umbrella page and sequence laboratory. |
| PiC | Raw bytes/bits/samples/pixels/chunks. | Constant pointer + transform + residual under MDL. | This page: clean spec seed for the next experiment. |

Next build: PiC Scout

The smallest useful program is not a full 8Z integration. It is a scout that directly tests your original idea on chunks and reports honest wins/losses.

Minimum modes

1. PI_BCD exact
2. PI_BIN exact
3. PI_DEC3 byte triples
4. CONST_XOR residual
5. CONST_CA symbolic texture
6. RNG/e/φ/√2 controls

Best first inputs

raw BMP / TIFF / PGM
raw WAV / PCM
FASTA / CSV / plain text
synthetic planted controls
already-compressed PNG/JPEG/MP3 as negative controls
Acceptance criterion: at least one real corpus class shows statistically repeatable savings against best ordinary baseline and random/control streams, with exact reconstruction for lossless mode or equal declared distortion for lossy mode.
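
"Statistically repeatable" needs a concrete test before the scout runs. One simple option, offered here as an assumption rather than the chosen method, is a one-sided exact sign test on per-file savings: if PiC and the best baseline were equally good, wins and losses would be equally likely.

```python
from math import comb


def sign_test_p(savings_bits) -> float:
    """One-sided exact sign test.
    savings_bits holds, per file, baseline bits minus PiC bits (positive = PiC won).
    Returns the probability of at least this many wins under a fair coin."""
    nonzero = [s for s in savings_bits if s != 0]  # ties carry no information
    n = len(nonzero)
    if n == 0:
        return 1.0
    wins = sum(s > 0 for s in nonzero)
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Illustrative inputs only, not measured results.
print(sign_test_p([12, 7, 3, 9, 1, 4, 8, 2, 6, 5]))
print(sign_test_p([12, -7, 3, -9, 1, -4]))
```

The same test should be run against the random/control streams as well as the ordinary codecs, since the criterion requires beating both.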

8Z integration idea

PiC should eventually become one optional generator family inside 8Z, not a replacement for normal compression. DCC chooses when to try it; MDL decides whether it survives.

Encoder role

Try PiC candidates only on chunks where cheap scouts suggest possible structure. Record timing, false positives, residual size, and final MDL decision.

Decoder role

No search. The decoder only regenerates the declared constant stream, applies declared transform/residual, and verifies the hash.

Important: decoder simplicity is sacred. All expensive hunting belongs to the encoder/scout side.
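
A decoder in this spirit fits in a dozen lines. The sketch below assumes a CONST_XOR-style token over a byte stream and uses SHA3-256 from Python's `hashlib` for verification. The stream is a hardcoded 16-byte prefix of π's hexadecimal expansion. A real decoder would regenerate the declared stream to the required length. The `decode_chunk` name and argument layout are hypothetical.

```python
import hashlib

# First 16 bytes of the fractional hexadecimal expansion of π.
PI_HEX_STREAM = bytes.fromhex("243F6A8885A308D313198A2E03707344")


def decode_chunk(stream: bytes, offset: int, length: int,
                 residual: bytes, expected_hash: bytes) -> bytes:
    """Regenerate, apply residual, verify. No search happens here."""
    window = stream[offset:offset + length]
    if len(window) != length or len(residual) != length:
        raise ValueError("token does not fit the declared stream")
    chunk = bytes(s ^ r for s, r in zip(window, residual))
    if hashlib.sha3_256(chunk).digest() != expected_hash:
        raise ValueError("hash mismatch: reject chunk")
    return chunk


# Encoder-side values that would travel inside the token.
original = bytes.fromhex("85A308D2")
residual = bytes.fromhex("00000001")
token_hash = hashlib.sha3_256(original).digest()

decoded = decode_chunk(PI_HEX_STREAM, offset=4, length=4,
                       residual=residual, expected_hash=token_hash)
print(decoded.hex())
```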