`tpwalk bruteforce`

Actively enumerate candidate URLs that the passive sources never reference, by crossing a date-path generator with a model-name generator and HEAD-checking each candidate against the S3 origin.

tpwalk bruteforce [OPTIONS]

This issues a lot of requests

The heavy tiers generate tens to hundreds of millions of candidate URLs. Always start with --dry-run to size the job, and bound it with --max-candidates.
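To get a feel for what those candidate counts mean in wall-clock terms, the arithmetic can be sketched as below. The tier sizes (~40M and ~203M) are the figures quoted on this page; the sustained request rate is purely an assumed number for illustration, not a measured property of tpwalk or of the S3 origin.

```python
def estimate_hours(candidates: int, requests_per_second: float) -> float:
    """Wall-clock hours needed to HEAD-check `candidates` URLs at a sustained rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return candidates / requests_per_second / 3600


# Assumption: a sustained 1,000 HEAD requests per second. Substitute a rate
# observed on a small capped run before trusting any estimate.
ASSUMED_RPS = 1_000.0

if __name__ == "__main__":
    for tier, size in [("--thorough", 40_000_000), ("--exhaustive", 203_000_000)]:
        hours = estimate_hours(size, ASSUMED_RPS)
        print(f"{tier}: ~{hours:.1f} h at {ASSUMED_RPS:.0f} req/s")
```

Since the estimate scales linearly with the candidate count, a `--max-candidates` cap translates directly into a cap on run time.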

Options

Option	Default	Description
`--strategy`	`all`	Which generator(s) to run: `dates`, `models`, or `all`.
`--thorough`	off	Model strategy over ~389 known GPL date directories (medium coverage, ~40M HEADs).
`--exhaustive`	off	Full model × date-path cross (~203M HEADs). Always pair with `--max-candidates`.
`--max-candidates`	none	Hard cap on candidates checked — the safety valve for the heavy tiers.
`--dry-run`	off	Count candidates without issuing any HEAD requests.
`-c`, `--concurrency`	`100`	Maximum concurrent S3 HEAD requests.
`--data-dir`	`data`	Root data directory; writes a timestamped run directory here.
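The interaction between `--max-candidates` and `--dry-run` can be pictured as a cap applied to a lazy candidate stream, followed by a count that issues no requests. The following is a minimal sketch under that assumption; `bounded` and `dry_run_count` are hypothetical helper names, not functions exposed by tpwalk.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def bounded(candidates: Iterable[str], max_candidates: Optional[int]) -> Iterator[str]:
    """Yield at most `max_candidates` items; None means no cap."""
    # islice accepts None as its stop value and then yields everything.
    return islice(candidates, max_candidates)


def dry_run_count(candidates: Iterable[str], max_candidates: Optional[int]) -> int:
    """Count the candidates that would be checked, without any network I/O."""
    return sum(1 for _ in bounded(candidates, max_candidates))
```

Because the stream is consumed lazily, counting never holds the full candidate set in memory, which matters at the tens-of-millions scale described above.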

Strategies

dates — walks the date-hierarchical /upload/gpl-code/YYYY/YYYYMM/YYYYMMDD/ prefix space.
models — crosses extracted firmware model names with the empirical GPL filename patterns mined from the known corpus.
all — runs both.
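As an illustration of how the two generators combine, here is a minimal sketch that walks the `/upload/gpl-code/YYYY/YYYYMM/YYYYMMDD/` layout and crosses each date path with a set of filenames. The function names are hypothetical, and the sample filenames in the usage example are placeholders: the real patterns are mined from the known corpus and are not reproduced here.

```python
from datetime import date, timedelta
from itertools import product
from typing import Iterable, Iterator

PREFIX = "/upload/gpl-code"


def date_paths(start: date, end: date) -> Iterator[str]:
    """Yield PREFIX/YYYY/YYYYMM/YYYYMMDD/ for each day in [start, end]."""
    day = start
    while day <= end:
        yield f"{PREFIX}/{day:%Y}/{day:%Y%m}/{day:%Y%m%d}/"
        day += timedelta(days=1)


def cross(paths: Iterable[str], filenames: Iterable[str]) -> Iterator[str]:
    """Yield every path + filename combination (the cross product)."""
    # itertools.product buffers both inputs, which is fine for a sketch.
    for path, name in product(paths, filenames):
        yield path + name


if __name__ == "__main__":
    # Placeholder filenames, for illustration only.
    names = ["MODEL-A_GPL.tar.gz", "MODEL-B_GPL.tar.gz"]
    for url in cross(date_paths(date(2020, 2, 28), date(2020, 3, 1)), names):
        print(url)
```

The candidate count is the number of date paths times the number of filenames, which is why the full cross grows so quickly.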

Reference data lives under data/

The model strategy extracts model tokens from data/firmware_s3_listing.json (an 8.7 MB S3 index) and mines filename patterns from the GPL corpus at data/scrapes/seed/gpl_urls_master.txt. Both files are committed to the repository, so a cloned checkout works out of the box. Neither is included in the installed wheel, so for full recall run bruteforce from a checkout, or place those files under data/. Without them, the model strategy produces no candidates and the date strategy falls back to bare date-path enumeration.
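Before launching a long run from an installed wheel, it may be worth confirming that both reference files are actually present. A minimal pre-flight sketch, assuming the default `data` directory; `missing_reference_files` is a hypothetical helper, not part of tpwalk.

```python
from pathlib import Path
from typing import List

# Paths relative to the data directory, as named in this section.
REFERENCE_FILES = (
    "firmware_s3_listing.json",
    "scrapes/seed/gpl_urls_master.txt",
)


def missing_reference_files(data_dir: str = "data") -> List[str]:
    """Return the reference files that are absent under `data_dir`."""
    root = Path(data_dir)
    return [rel for rel in REFERENCE_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_reference_files()
    if missing:
        print("Missing reference data:", ", ".join(missing))
    else:
        print("Reference data present.")
```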

Examples

# Size the full job without touching the network
tpwalk bruteforce --exhaustive --dry-run

# Date paths only, capped
tpwalk bruteforce --strategy dates --max-candidates 1000000

# Thorough model sweep, bounded
tpwalk bruteforce --strategy models --thorough --max-candidates 500000

Confirmed-live hits are appended to per-strategy .txt files in a timestamped run directory under --data-dir; run verify afterward to fold them into the manifests.
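The check-and-append flow can be sketched as follows: a semaphore keeps at most `concurrency` checks in flight, and each live URL is appended to a hits file. This is an illustration of the general pattern, not tpwalk's implementation. `check_all` and `fake_is_live` are hypothetical names, and the stand-in check performs no network request.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Iterable


async def check_all(
    candidates: Iterable[str],
    is_live: Callable[[str], Awaitable[bool]],
    hits_file: Path,
    concurrency: int = 100,
) -> int:
    """Check candidates with at most `concurrency` in flight; append live URLs."""
    gate = asyncio.Semaphore(concurrency)
    hits = 0

    async def check_one(url: str) -> None:
        nonlocal hits
        async with gate:
            live = await is_live(url)
        if live:
            # No await between open and write, so lines never interleave
            # on the single-threaded event loop.
            with hits_file.open("a", encoding="utf-8") as fh:
                fh.write(url + "\n")
            hits += 1

    # One task per candidate is fine for a sketch; a real run over millions
    # of URLs would feed workers from a queue instead.
    await asyncio.gather(*(check_one(u) for u in candidates))
    return hits


async def fake_is_live(url: str) -> bool:
    """Stand-in for a HEAD request: treats URLs ending in .tar.gz as live."""
    await asyncio.sleep(0)
    return url.endswith(".tar.gz")
```

Appending as hits arrive, rather than writing once at the end, means an interrupted run still leaves its confirmed hits on disk.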
