Methodology

How we build the data.

WashIndex is built by professionals who care about getting it right and showing the work. This page documents the pipeline, the signals, the validation, the aggregation logic, the scoring, and the limitations. Nothing is hidden — including what the data can’t do.

01
Pipeline

From public sources to per-location signal.

The pipeline takes three streams of raw input — customer reviews, operator websites, and direct outreach to locations — and produces structured operating-quality data at the location level. Five stages, run on rolling cadences, with every output traceable back to its source.

Stage 01
Collect

Public reviews scraped across 64 jurisdictions, operator websites scraped for format and pricing, and automated voice-agent calls placed to locations whose pricing isn't published online.

Stage 02
Resolve

Each input joined to a verified location, deduplicated across sources, and mapped to its operator (DBA, parent, sponsor where known).

Stage 03
Extract

Per-review structured extraction across 55 fields using a language-model pipeline with JSON-schema enforcement. Format and pricing parsed from website content and call transcripts in parallel.

Stage 04
Aggregate

Per-review fields rolled up to per-location metrics. Confidence intervals tied to review density and recency. Format and pricing tagged with source provenance.

Stage 05
Score

Per-location operating-quality scores plus six PE thesis scores, joined with format and pricing for every wash with sufficient signal.

13M+ reviews + operator websites + automated calls → 80,000+ scored locations · refreshed monthly

02
Signals

How the schema is designed.

Each review is processed against a JSON schema that defines 55 structured fields. The schema is enforced at extraction time: every output validates against the schema or the review is reprocessed. We don’t publish the full field list — the schema is the proprietary part of the pipeline — but the design principles behind it are public, and several of the signals it produces are named throughout our public research and customer-facing outputs.
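
As a shape reference only, here is a minimal sketch of the enforce-or-reprocess loop, assuming the open-source jsonschema validator and a hypothetical extract_fn standing in for the language-model call. The two field names shown are illustrative, not part of the real schema.

```python
import jsonschema

# Illustrative two-field fragment; the real 55-field schema is not public.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "damage_event": {"type": "boolean"},
        "sentiment_staff": {"type": ["number", "null"], "minimum": -1, "maximum": 1},
    },
    "required": ["damage_event", "sentiment_staff"],
    "additionalProperties": False,
}

def extract_with_enforcement(review_text: str, extract_fn, max_attempts: int = 3) -> dict:
    """Every output validates against the schema or the review is reprocessed."""
    for _ in range(max_attempts):
        candidate = extract_fn(review_text)        # language-model call -> dict
        try:
            jsonschema.validate(candidate, SCHEMA_FRAGMENT)
            return candidate                       # schema-valid: accept
        except jsonschema.ValidationError:
            continue                               # invalid: reprocess the review
    raise ValueError("review failed schema enforcement after reprocessing")
```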

Design 01
Behavioral over rating

Star ratings carry too little information. The schema prioritizes specific behavioral signals reviewers describe — what happened, how staff responded, whether the customer came back — over global sentiment scores.

Design 02
Multi-dimensional sentiment

Different aspects of operations matter to different buyers. Sentiment is decomposed across distinct dimensions — wash quality, staff, equipment, value, wait — rather than collapsed into a single score.

Design 03
Event detection

Specific events — damage, breakdowns, cancellation friction, resolution outcomes — are extracted as discrete fields with explicit presence/absence semantics. This is what enables rate calculations and inflection-point detection.

Design 04
Underwriting-relevant fields

Fields are chosen for their relevance to institutional decisions: PE diligence, insurance underwriting, and operator benchmarking. Signals that don't help one of those audiences don't earn a slot in the schema.

Design 05
Conditional sub-fields

Several fields are conditional — they only apply when a parent signal is present. A damage event, for instance, triggers downstream fields that are otherwise null. This keeps the per-review record honest about what's known versus inferred; a schema fragment illustrating the pattern follows the design list below.

Design 06
Metadata for downstream weighting

Review-level metadata (timestamps, length, owner-response presence) feeds the aggregation layer's recency and quality weights. The metadata fields aren't published as outputs but materially shape per-location confidence intervals.
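
As referenced under Design 05, a minimal sketch of how a conditional sub-field can be expressed in JSON Schema terms. The field names and the if/then encoding are illustrative assumptions, not the production schema.

```python
# Hypothetical fragment: damage_outcome applies only when damage_event is true.
CONDITIONAL_FRAGMENT = {
    "type": "object",
    "properties": {
        "damage_event": {"type": "boolean"},
        "damage_outcome": {"enum": ["resolved", "unresolved", "disputed", None]},
    },
    "required": ["damage_event", "damage_outcome"],
    # If the parent signal is absent, the sub-field must stay null; the
    # record never claims an outcome for an event that was not reported.
    "if": {"properties": {"damage_event": {"const": False}}},
    "then": {"properties": {"damage_outcome": {"const": None}}},
}
```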

Representative signals from the schema

  • Damage rate — reviewer-reported physical damage to customer vehicles
  • Customer satisfaction — multi-dimensional sentiment scoring
  • Format classification — express tunnel, full-service, in-bay, self-serve, hybrid, detail
  • Membership penetration — frequency of membership mentions in reviews
  • Cancellation friction — pain rate when customers describe canceling membership
  • Equipment-specific signals — references to dryers, vacuums, breakdowns
  • Operator quality consistency — variance in metrics across a chain's locations
  • Wait and throughput signals — line length and time complaints

These signals are named in our research, sales materials, and customer-facing outputs. They’re a subset of the full 55-field schema. The complete schema, including field-level definitions, conditional logic, and validation rules, is shared under engagement with customers who need it for their own modeling. Request the full schema.

03
Validation

How we know the data is right.

Extraction quality is measured against hand-validated samples — reviews coded directly by a human reviewer and used as ground truth. The samples are stratified to represent the full distribution of formats, regions, and sentiment polarities present in the corpus, rather than drawn at random. Stratification matters because rare-but-high-stakes signals (damage events, cancellation friction) are otherwise underrepresented in any random sample.

Every field in the schema has its own validation pass. Higher-stakes fields — damage events, cancellation friction, equipment breakdowns — are validated independently by two coders to control for individual coder judgment. Lower-stakes fields are validated against a single coder. The methodology distinguishes between exact-match agreement (the coder and the model produced identical outputs) and within-tolerance agreement (the coder and the model produced outputs that round to the same per-location metric).
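
A minimal sketch of the two agreement measures, under assumed input shapes (parallel lists of coder and model outputs per field); the rounding granularity is an illustrative stand-in for the per-location metric tolerance.

```python
def exact_match_agreement(coder: list, model: list) -> float:
    """Share of validation samples where coder and model outputs are identical."""
    return sum(c == m for c, m in zip(coder, model)) / len(coder)

def within_tolerance_agreement(coder: list[float], model: list[float],
                               granularity: float = 0.01) -> float:
    """Share of samples where both outputs round to the same per-location
    metric at the given granularity (0.01 = one percentage point)."""
    agree = sum(round(c / granularity) == round(m / granularity)
                for c, m in zip(coder, model))
    return agree / len(coder)
```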

Validation is not a one-time exercise. The sample is rebuilt at each model version, and new extractions must meet or exceed the prior version’s agreement before the model is rolled out across the full corpus. Per-version agreement reports are versioned in our methodology archive and shared with customers under engagement.

What we commit to

  • Stratified sampling — never random, always representative of the corpus distribution
  • Two-coder validation on high-stakes fields — damage, cancellation, breakdowns
  • Field-level agreement reporting — every field has its own scorecard, not a global accuracy claim
  • Per-version regression testing — new model versions cannot ship if they degrade validated fields
  • Confidence intervals — per-location metrics ship with explicit uncertainty, not point estimates
  • Sparse-data exclusion — locations without enough signal are flagged, not imputed

Per-field agreement scores, validation sample size, and full version history are documented in our methodology archive, shared with customers under engagement. Request the validation report.

04
Aggregation

From per-review fields to per-location signal.

Per-review extractions are aggregated to per-location metrics. The aggregation logic varies by signal type — rates (e.g., damage rate) are pooled means with a recency weight, while sentiments use weighted averages. Confidence intervals are tied to review density (more reviews → tighter intervals) and review recency (older reviews get downweighted at a documented decay rate).
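
A minimal sketch of the recency-weighted rollup, assuming an exponential decay and a normal-approximation interval on an effective sample size; the production decay rate and interval construction are documented separately and may differ.

```python
import math
from datetime import date

def recency_weight(review_date: date, as_of: date, half_life_days: float = 365.0) -> float:
    """Older reviews are downweighted; the half-life here is illustrative."""
    return 0.5 ** ((as_of - review_date).days / half_life_days)

def damage_rate_with_interval(events: list[bool], dates: list[date],
                              as_of: date) -> tuple[float, float]:
    """Recency-weighted pooled mean plus a 95% half-width tied to review
    density through the Kish effective sample size."""
    w = [recency_weight(d, as_of) for d in dates]
    rate = sum(wi * ev for wi, ev in zip(w, events)) / sum(w)
    n_eff = sum(w) ** 2 / sum(wi * wi for wi in w)
    half_width = 1.96 * math.sqrt(rate * (1.0 - rate) / n_eff)
    return rate, half_width
```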

Sparse-data locations are flagged for caution rather than scored at false precision. We don’t impute. If a location has too little signal to be useful, the location is shown to consumers with a sparse-data flag instead of being assigned an unreliable score.

Confidence intervals reported on per-location metrics reflect the actual uncertainty in the underlying data:

  • High density (200+ reviews): ±2pp
  • Medium density (50–199 reviews): ±5pp
  • Low density (15–49 reviews): ±9pp
  • Sparse (5–14 reviews): flagged, not scored
  • Insufficient (<5 reviews): excluded from scored outputs

Cross-state operator identity matching joins location-level records into chain-level rollups. This is what enables per-franchisee variance analysis (e.g., the Tommy’s variance findings) and chain-level rollups across DBA, parent, and sponsor structures. The matching layer is non-trivial — operators are notoriously messy on the entity side, with different LLCs per state, holding-company structures, and franchise vs. corporate-owned mixes — and represents a meaningful piece of the engineering moat.
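
A toy sketch of the name-normalization step only, with an assumed legal-suffix list; production matching draws on registries, addresses, and sponsor records well beyond string cleanup.

```python
import re

# Assumed suffix list for illustration; real coverage is much broader.
LEGAL_NOISE = re.compile(r"\b(llc|inc|corp|co|holdings|of\s+\w+)\b\.?", re.IGNORECASE)

def operator_key(name: str) -> str:
    """Collapse per-state legal entities toward a comparable candidate key."""
    cleaned = LEGAL_NOISE.sub("", name.lower().replace("'", ""))
    return re.sub(r"[^a-z0-9]+", " ", cleaned).strip()

# Two state-level entities produce the same candidate key, which then feeds
# stricter matching on address, DBA filings, and sponsor structure.
assert operator_key("Tommy's Express of Ohio, LLC") == operator_key("Tommys Express LLC")
```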

05
Format & pricing capture

Multi-source capture, with fallbacks.

Format classification (express tunnel, full-service, in-bay, self-serve, hybrid, detail) and posted pricing aren’t extracted from reviews — they’re captured from operator sources directly. The car wash industry is uneven about how it publishes this information, so the capture pipeline is built with explicit fallbacks: each source is tried in order of authority, and gaps are escalated to the next layer rather than guessed.

Source 01
Operator websites
Primary

Format is captured from operator websites by parsing how operators describe their own offering — language about tunnel length, conveyor systems, hand-detail packages, self-serve bays, and similar terms maps cleanly to format categories when the operator publishes it. Pricing is captured the same way: posted single-wash prices, unlimited membership tiers, and package structures are scraped directly from the pricing or location page when published.

This is the highest-confidence source — when an operator publishes their own format and pricing, that's the canonical answer. The majority of multi-location operators publish enough on their websites for both signals to be captured this way.

Source 02
Review-derived inference
Fallback for format

When an operator's website doesn't disclose format clearly — common for independent and smaller operators — format is inferred from review text patterns. Reviewers describe what they experienced: "drove through the tunnel," "the attendant hand-dried it," "I vacuumed at the bay myself." These descriptions map probabilistically to format categories, and the inference pipeline assigns format with an associated confidence level. Locations classified by review-inference are flagged so downstream consumers can choose to filter them out for high-stakes use cases.

Review inference is a fallback, not a primary source. When operator-website format and review-inferred format disagree, operator-website wins; the review-inferred classification is only used when the website signal is absent or ambiguous.
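
A toy sketch of the phrase-to-format mapping, with made-up cue lists and a naive share-of-hits confidence; the production inference is model-based, but the probabilistic mapping idea is the same.

```python
# Made-up cue phrases per format category; purely illustrative.
FORMAT_CUES = {
    "express_tunnel": ["drove through the tunnel", "conveyor", "stayed in the car"],
    "full_service":   ["hand-dried", "attendant", "did the interior"],
    "self_serve":     ["vacuumed at the bay myself", "coin-op", "spray wand"],
}

def infer_format(reviews: list[str]) -> tuple[str | None, float]:
    """Return (format, confidence) from cue hits across a location's reviews."""
    hits = {fmt: 0 for fmt in FORMAT_CUES}
    for text in reviews:
        lowered = text.lower()
        for fmt, cues in FORMAT_CUES.items():
            hits[fmt] += sum(cue in lowered for cue in cues)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0                    # no signal: leave unclassified
    best = max(hits, key=hits.get)
    return best, hits[best] / total         # naive confidence: share of hits
```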

Source 03
Automated voice agents
Escalation for pricing

Many car washes — particularly independents and smaller chains — don't publish pricing on their websites at all. For these locations, the pricing pipeline escalates to direct outreach: automated voice agents place outbound calls to the location and collect posted pricing in conversation, then transcribe the result back into the structured pricing record. The agent identifies itself, asks standard pricing questions, and captures the response.

This is what closes the long tail of pricing coverage. Without it, pricing data would be heavily biased toward larger operators with marketing-driven websites; with it, pricing coverage extends to the independent and small-chain operators that make up a meaningful share of the long tail.

Source priority is preserved end-to-end: every per-location format and pricing record is tagged with which source produced it. When buyers pull the data, they can filter by source confidence — a buyer who only wants operator-published values can exclude review-inferred and call-derived entries; a buyer who wants maximum coverage can include all three. The same honesty about source provenance applies here as much as anywhere else in the pipeline.
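
A minimal sketch of that provenance rule, with assumed source labels; the point is that authority order is explicit and every resolved value carries its source tag.

```python
# Authority order across the three sources above.
SOURCE_PRIORITY = ("operator_website", "review_inference", "voice_agent")

def resolve_field(candidates: dict[str, str | None]) -> dict | None:
    """Pick the highest-authority available value and tag its provenance."""
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value is not None:
            return {"value": value, "source": source}
    return None  # no source produced a value: the field stays unset

# Downstream filtering by source confidence, e.g. operator-published only:
#   records = [r for r in records if r["format"]["source"] == "operator_website"]
```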

06
Scope diagnosis

Localizing operational change to the right tier.

Reviews are timestamped at the location level. When operating quality changes at a location, the change is observable in the review stream within weeks. The diagnostic question for an investor or operator isn't "did it change" — it's "at what scope".

Scope diagnosis answers that question by detecting inflection points in the time-series of operating signals at every location, then asking: are nearby locations in the same chain showing the same inflection? Are distant locations in the same chain? The simultaneity pattern tells us whether the cause is local, regional, or chain-wide — and that’s the diagnostic that tells you where to look for root cause.

Single Location

One site degrades while every other site in the chain holds flat through the same window. Cause typically lives at the site level — manager event, equipment failure, staff turnover.

District / Region

A geographic cluster of sites degrades simultaneously while distant sites in the same chain hold flat. Cause typically lives at the regional layer — district leadership, area vendor, local labor market.

Chain-Wide

Most or all sites in the chain inflect within the same window — across districts, formats, and geographies. Cause typically lives at HQ — policy change, software rollout, leadership transition.

The scope output ships with the inflection date, the affected location set, and the diagnostic framing — never with a confident root-cause attribution. The data tells you where to look. The conversation with the operator is what tells you what actually happened.
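
A toy sketch of the simultaneity test, assuming per-location inflection flags and a nearby/distant split are already computed; the 70% threshold and the cluster rule are illustrative, not production values.

```python
def diagnose_scope(inflected: set[str], all_locations: set[str],
                   nearby: set[str]) -> str:
    """Localize an inflection to site, region, or chain from which of a
    chain's locations inflected within the same window."""
    if not inflected:
        return "no inflection detected"
    share = len(inflected) / len(all_locations)
    if share >= 0.7:
        return "chain-wide"          # most sites inflect across geographies
    if len(inflected) == 1:
        return "single location"     # one site moves, the rest hold flat
    if inflected <= nearby:
        return "district/region"     # a geographic cluster, distant sites flat
    return "inconclusive"            # pattern does not localize cleanly
```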

07
PE thesis scoring

Six pre-built theses, one location at a time.

Every location in the index is scored against six PE acquisition theses. Each thesis weights different combinations of the underlying signals — the formulas are deterministic, documented, and replicable from the per-location data. The thesis structure mirrors how institutional acquirers actually segment opportunities in the wash sector.

01
Premium Roll-up

Identifies operationally strong sites that fit a quality-led roll-up. Weighted on satisfaction, low damage rate, healthy membership economics, and operator-level consistency. Used for buyers extending high-quality platforms.

02
Fixer-Upper

Identifies operationally distressed but viable sites — high damage rate and low satisfaction sitting in markets with sufficient demand signal. Used for operator-PE turnaround playbooks.

03
Geographic Density

Scores every region on operator-coverage whitespace and tuck-in fit relative to a sponsor's existing portfolio. Used for buyers extending platforms into adjacent metros.

04
Distressed

Identifies chains showing chain-wide operational distress — most or all sites underperforming, often across multiple states. Used for distressed pricing and turnaround buyers.

05
Format Migration

Identifies legacy full-service or in-bay locations sitting in markets where express conversion economics work. Used for capex-led value creation theses.

06
Membership LTV

Scores subscription durability per location based on cancellation friction signals, member-only complaint patterns, and renewal language. Used to pressure-test seller LTV assumptions on subscription-heavy targets.

The exact formulas, weights, and thresholds for each thesis are openly documented in our methodology reference, available under engagement. We don’t treat the formulas as a trade secret — buyers should be able to audit them and challenge them. The proprietary work is in the underlying corpus and the entity-resolution layer, not the scoring math.
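
A minimal sketch of what "deterministic and replicable" means in practice, with illustrative weights for a Premium Roll-up style thesis; the documented production formulas, weights, and thresholds live in the methodology reference.

```python
# Illustrative weights only; the audited production values differ.
PREMIUM_ROLLUP_WEIGHTS = {
    "satisfaction": 0.40,          # multi-dimensional sentiment, scaled 0..1
    "damage_rate": -0.30,          # reviewer-reported damage penalizes the score
    "membership_health": 0.20,
    "operator_consistency": 0.10,
}

def thesis_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """A deterministic linear combination: the same per-location inputs
    always reproduce the same score, so buyers can audit and challenge it."""
    return sum(weight * metrics[name] for name, weight in weights.items())
```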

08
Refresh

Monthly refresh, incrementally.

The full corpus refreshes on a monthly cadence. Refreshes are incremental: only genuinely new reviews are extracted, so adding new jurisdictions and updating existing ones is fast and cheap at the marginal review level. The historical panel is preserved on each refresh — review timestamps already provide multi-year retrospective signal at the location level when used carefully.
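
A minimal sketch of the incremental selection step, assuming stable review IDs and a persisted set of already-extracted IDs.

```python
def select_new_reviews(pulled: list[dict], seen_ids: set[str]) -> list[dict]:
    """Only genuinely new reviews go to extraction, so refresh cost scales
    with the month's new reviews rather than the full corpus."""
    fresh = [r for r in pulled if r["review_id"] not in seen_ids]
    seen_ids.update(r["review_id"] for r in fresh)
    return fresh
```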

Custom-engagement clients can request out-of-cycle refreshes on specific operators, regions, or chains when a deal timeline requires it. Out-of-cycle refreshes are scoped per request.

Operator-website signals (format classification, posted pricing, package structure, location additions or closures) are re-pulled on the same monthly cadence and joined back to the per-location records. Voice-agent pricing capture runs on demand, focused on locations whose website pricing was missing or stale at the most recent refresh.

09
Limitations

What the data can’t tell you.

Sophisticated buyers should know what they’re getting and what they’re not. We’re explicit about the boundaries of the dataset.

Reviewer-reported, not filed-claim

Damage rate is what reviewers reported on Google Maps — it's a leading indicator and a valuable risk signal, but it's not the same as filed insurance claims. Claim severity (cost) isn't captured. Carriers calibrating their books should treat reviewer-damage-rate as a frequency proxy and combine it with their own loss data.

Selection bias in voluntary review behavior

Customers who write reviews are not a perfect sample of customers who visit. Reviewers skew toward extreme experiences — particularly good or particularly bad. Per-location and per-operator comparisons control for this reasonably well because the bias is structurally similar across locations, but absolute-level interpretation should account for it.

No revenue or financial data

WashIndex is an operating-quality dataset. We don't model or report site-level revenue, EBITDA, or member counts. Operating quality is a powerful complement to operator P&L data, but it doesn't substitute for it.

Sparse-data locations are excluded

Locations with fewer than five extracted reviews are flagged and excluded from scored outputs. This biases the index slightly toward higher-volume sites. Newly opened locations may take a quarter or two to accumulate enough signal to be scored.

Time-of-event imprecision

Review timestamps reflect when the review was written, not necessarily when the underlying experience occurred. The lag is typically short (days to weeks) but can occasionally be longer for damage events that take time to surface.

Coverage of non-English reviews

The extraction pipeline is calibrated for English. Reviews in other languages are detected and excluded from extraction in the current version. This biases coverage in heavily Spanish-speaking and French-speaking markets, particularly Quebec. Multi-language extraction is on the roadmap.

10
Sources & licensing

What goes in, and how it’s licensed.

WashIndex is built from publicly available source materials, accessed and processed under licensed infrastructure. The full source set is documented below.

Primary sources

  • Google Maps reviews (public)
  • Google Maps location metadata (public)
  • Operator websites for format and pricing (public)
  • Public M&A filings, news, and press releases
  • State and provincial business registries (public)
  • Direct outreach to locations via automated voice agents (when pricing isn't published)
  • DOT and provincial traffic-volume data — average annual daily traffic counts on U.S. and Canadian road segments
  • U.S. Census Bureau and Statistics Canada data — population density, median household income, and demographic catchments at the tract level
  • Median home value data — U.S. and Canadian residential value estimates joined to the census-tract demographic layer

Infrastructure & processing

  • Licensed scraping infrastructure for review and website collection
  • Language-model extraction with JSON-schema enforcement
  • Versioned model pipeline with reproducible outputs
  • Stratified hand-validation samples regenerated at every model version
  • Cross-state entity-resolution and operator matching
  • Automated voice-agent pipeline for long-tail pricing capture

Our use of public review data falls within the scope of fair-use research and analysis on publicly available materials. We do not republish raw review text in customer-facing outputs; the structured signals we publish are derived metrics, not reproductions of source content. Licensing details and full sourcing documentation are available under NDA on request.

11
Versioning & reproducibility

Every output traceable to its inputs and version.

Every per-location score and every diligence output ships with a model version pin. The version specifies the extraction pipeline, the schema version, the validation sample state, the aggregation logic version, and the scoring formula version. Two runs against the same model version on the same data must produce identical outputs, or the version is invalid and a patch is issued.
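
A minimal sketch of a version pin and a determinism check, with assumed component names; the production pin format is internal.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class ModelVersionPin:
    extraction_pipeline: str    # e.g. "extract-4.2" (illustrative)
    schema_version: str
    validation_sample: str
    aggregation_logic: str
    scoring_formula: str

def run_fingerprint(pin: ModelVersionPin, outputs: list[dict]) -> str:
    """Hash the pin plus outputs; two runs on the same data and pin must
    produce identical fingerprints, or the version is invalid."""
    payload = json.dumps({"pin": asdict(pin), "outputs": outputs},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()
```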

Diligence briefs include the model version inline, alongside the per-location confidence intervals. If a brief is challenged or revisited later, the lineage is reconstructable from the version pin alone — no ambiguity about which version of which signal produced which output.

Major model versions are infrequent (roughly annual) and are accompanied by a methodology release note documenting changes, validation deltas, and any backwards-incompatible scoring shifts. Minor versions roll out as schema additions or extraction-quality improvements without changing existing outputs.

Read the methodology, then see it in action.

The free platform demonstrates everything described above against any operator or location. Or scope a custom engagement if you have a specific question.