The data

Anonymized partner data, citable and open.

17 datasets, aggregated under NDA from operating-company partners, de-identified, and published as structured JSON with methodology, sample size, and license.

How the datasets are sourced

Each dataset in this catalog is aggregated from an operating-company partner that agreed to share raw data with me under a non-disclosure agreement, on the condition that what gets published is de-identified. “De-identified” here means: the company name is not disclosed, any row-level IDs are stripped or hashed, and the smallest aggregation that still answers the analytic question is the one you see.

The raw rows remain with the originating company. What is published on this page is the aggregation, the methodology, and the resulting statistics: enough to audit the argument in the essays that cite the dataset, but not enough to re-identify a source. This is the same arrangement academic economists use when publishing on IRS, Census, or proprietary administrative data.
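
A minimal Python sketch of the de-identification steps described above, assuming salted SHA-256 hashing for row-level IDs and a per-bucket count and mean as the published aggregation. The field names (`user_id`, `bucket`, `value`), the salt, and both function names are hypothetical; the partners' actual pipelines are not documented on this page.

```python
import hashlib
from collections import defaultdict


def hash_id(raw_id: str, salt: str) -> str:
    """Replace a row-level ID with a salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()


def aggregate(rows, salt):
    """Hash IDs, then keep only per-bucket counts, unique users, and means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    users = defaultdict(set)
    for row in rows:
        bucket = row["bucket"]
        sums[bucket] += row["value"]
        counts[bucket] += 1
        users[bucket].add(hash_id(row["user_id"], salt))
    return {
        bucket: {
            "n": counts[bucket],
            "unique_users": len(users[bucket]),
            "mean": sums[bucket] / counts[bucket],
        }
        for bucket in sums
    }


# Illustrative rows only; not drawn from any dataset in this catalog.
rows = [
    {"user_id": "u1", "bucket": "low", "value": 2.0},
    {"user_id": "u2", "bucket": "low", "value": 4.0},
    {"user_id": "u1", "bucket": "high", "value": 10.0},
]
published = aggregate(rows, salt="example-salt")
```

Only `published` would leave the partner's environment in this sketch; the hashed IDs are used to count unique users and are then discarded.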

If you want to cite a dataset and need methodology detail the page doesn't list, reach out on LinkedIn. The partner will often approve additional disclosure on request.

  • Loss Aversion Ratios by Stake Level

    JSON →

    Empirical loss aversion coefficient λ observed in marketplace pricing experiments, decomposed by transaction stake level. Shows that λ is not the constant 2.25 quoted in textbooks but varies systematically with stake magnitude and with the user's investment in the platform.

    Sample size
    ~14.1M total observations across buckets
    Collected
    2024-06/2025-05
    License
    CC-BY-4.0 for cited figures

    Used in · loss-aversion-asymmetry-digital-marketplaces
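
For readers unfamiliar with λ, here is a short sketch of the prospect-theory value function in which it appears. The default λ = 2.25 is the textbook figure mentioned above; the curvature α = 0.88 comes from Tversky and Kahneman's 1992 estimates and is not a figure from this dataset. The function name `value` is hypothetical.

```python
def value(x: float, lam: float = 2.25, alpha: float = 0.88) -> float:
    """Prospect-theory value of a gain (x >= 0) or a loss (x < 0)."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** alpha


# With the same curvature on both sides, the loss/gain ratio at equal
# stakes recovers lambda itself.
ratio = -value(-100.0) / value(100.0)
```

A stake-dependent λ, as the dataset reports, would mean passing a different `lam` for each stake bucket rather than one constant.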

  • Churn Windows by Discount-Type Subscriber Cohort

    JSON →

    Observed cancellation concentration around billing dates across subscription cohorts, decomposed by estimated beta (present-bias) parameter. Shows that 52–68% of annual churn events in consumer subscriptions occur within 7 days of a billing date — consistent with the beta-delta hyperbolic discounting model's prediction of billing-day regret.

    Sample size
    ~2.4M subscriber-months
    Collected
    2022-01/2024-12
    License
    CC-BY-4.0 for aggregate figures

    Used in · hyperbolic-discounting-subscription-churn
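
The beta-delta model referenced above weights a payoff t periods away by 1 when t = 0 and by β·δ^t otherwise. The sketch below shows how that produces billing-day regret. The values β = 0.8, δ = 0.99, a charge of 10, and a benefit of 11 are illustrative assumptions, not estimates from the dataset.

```python
def discount(t: int, beta: float, delta: float) -> float:
    """Quasi-hyperbolic (beta-delta) discount weight for a payoff t periods away."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 if t == 0 else beta * delta ** t


beta, delta = 0.8, 0.99      # illustrative parameters
charge, benefit = 10.0, 11.0  # illustrative payoffs

# One period before billing: the charge is 1 period away, the benefit 2.
planned = -charge * discount(1, beta, delta) + benefit * discount(2, beta, delta)

# On the billing date: the charge is immediate, the benefit 1 period away.
on_billing_day = -charge * discount(0, beta, delta) + benefit * discount(1, beta, delta)
```

Under these assumptions `planned` is positive (the subscriber intends to renew) while `on_billing_day` is negative (the same subscriber prefers to cancel once the charge is immediate).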

  • Ladder-Up vs Ladder-Down SaaS Pricing Conversion

    JSON →

    Direct A/B test of ladder-up (start on free or starter, prompt to upgrade) vs ladder-down (start on premium trial, prompt to downgrade) pricing paths across 9 SaaS products. Ladder-down converts 31–58% more paying users, driven by endowment-effect-induced resistance to losing premium features.

    Sample size
    ~182K signups across 9 products
    Collected
    2024-Q2/Q3
    License
    CC-BY-4.0 for aggregate figures

    Used in · endowment-effect-saas-pricing

  • Platform Entry Threshold by Complementor Category Share

    JSON →

    Observed relationship between a complementor category's share of platform transaction volume and the probability that the platform enters the category within 24 months. Entry becomes likely once a category exceeds ~8% of platform volume.

    Sample size
    412 categories across 4 platforms
    Collected
    2018-2024
    License
    CC-BY-4.0 for cited figures

    Used in · platform-cannibalization-dynamics

  • Vertical SaaS Market Concentration and Multi-Homing

    JSON →

    Market concentration (top-3 share) and supplier-side multi-homing rates across 40 vertical SaaS markets. Winner-take-most is the exception, not the rule: only 22% of vertical markets exhibit top-3 share above 70%, and those markets also show low multi-homing.

    Sample size
    40 vertical markets
    Collected
    2024
    License
    CC-BY-4.0 for synthesis

    Used in · winner-take-most-multi-homing-vertical-saas · two-sided-network-effects-dead

  • MTA Reported ROAS vs Experimental (Incrementality) ROAS

    JSON →

    Side-by-side comparison of ROAS reported by multi-touch attribution systems versus ROAS estimated via randomized geo-lift experiments for the same channels and periods. MTA systematically overstates ROAS by 2.4–6.5x, with the gap widest for retargeting and display.

    Sample size
    6 published studies, 22 channel-study combinations
    Collected
    2015-2024
    License
    CC-BY-4.0 for synthesis

    Used in · multi-touch-attribution-causal-inference-dag · unified-measurement-architecture-mmm-mta-experimentation

  • Bayesian MMM — Channel Saturation and Adstock Parameters

    JSON →

    Posterior estimates of adstock half-life and saturation parameters (Hill function) for eight paid-media channels from a privacy-first Bayesian marketing mix model. Reveals that TV has the longest decay (12-week half-life) while search has the shortest (under 1 week).

    Sample size
    156 weeks × 180 DMAs × 8 channels
    Collected
    2022-01/2024-12
    License
    CC-BY-4.0 for cited figures

    Used in · marketing-mix-modeling-privacy-first-era · unified-measurement-architecture-mmm-mta-experimentation
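
A hedged sketch of how an adstock half-life and a Hill saturation curve are typically applied to a spend series. It assumes geometric adstock, which is one common form; the entry above does not say which adstock form the model uses. All function and parameter names are hypothetical.

```python
def decay_from_half_life(half_life_weeks: float) -> float:
    """Weekly retention rate such that carry-over halves after half_life_weeks."""
    return 0.5 ** (1.0 / half_life_weeks)


def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of last week's adstocked spend into this week."""
    carried = 0.0
    out = []
    for s in spend:
        carried = s + decay * carried
        out.append(carried)
    return out


def hill(x: float, half_saturation: float, slope: float) -> float:
    """Hill saturation: 0 at x = 0, 0.5 at x = half_saturation, approaching 1."""
    if x <= 0:
        return 0.0
    return x ** slope / (x ** slope + half_saturation ** slope)


# A 12-week half-life (the TV figure above) implies this weekly decay rate.
tv_decay = decay_from_half_life(12)

# A single burst of 100 units decays to 50 units twelve weeks later.
burst = geometric_adstock([100.0] + [0.0] * 12, tv_decay)
```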

  • CausalImpact Lift from a B2B Content Program

    JSON →

    Bayesian structural time series (CausalImpact) estimate of the causal lift on organic traffic from launching a dedicated 36-article B2B content program. Non-branded organic captures only 38% of total SEO impact; the remaining 62% flows through branded search and direct traffic.

    Sample size
    104 weeks, 8 control variables, 6 outcomes
    Collected
    2023-Q1/2025-Q1
    License
    CC-BY-4.0 for cited figures

    Used in · causal-impact-seo-branded-search · compounding-advantage-content-moats-seo

  • Cohort LTV/CAC and Payback by Acquisition Channel

    JSON →

    Acquisition-cohort unit economics for a consumer SaaS business, decomposed by channel. Exposes the aggregation fallacy: the rolled-up 3.1x LTV/CAC hides a channel portfolio with individual ratios ranging from 0.8x (brand-misaligned display) to 8.4x (organic referral), with very different payback profiles.

    Sample size
    ~18,400 customers across 7 channels in Q1 2023 cohort
    Collected
    2023-01/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · cohort-based-unit-economics · clv-control-variable-bid-strategies
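
The aggregation fallacy is easy to reproduce with arithmetic. The two channels and their figures below are made up for illustration and are not taken from the dataset; the point is only that a blended ratio is a spend-weighted average that can sit far from every individual channel.

```python
# Hypothetical channels: acquisition spend and cohort lifetime value.
channels = {
    "channel_a": {"cac_spend": 100_000.0, "ltv": 80_000.0},
    "channel_b": {"cac_spend": 50_000.0, "ltv": 400_000.0},
}

# LTV/CAC computed separately for each channel.
per_channel = {
    name: c["ltv"] / c["cac_spend"] for name, c in channels.items()
}

# Rolled-up LTV/CAC: total lifetime value over total acquisition spend.
total_ltv = sum(c["ltv"] for c in channels.values())
total_spend = sum(c["cac_spend"] for c in channels.values())
blended = total_ltv / total_spend
```

Here the blended ratio is 3.2x even though one channel returns 0.8x and the other 8.0x, so the rolled-up figure describes neither.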

  • Test Duration Reduction from Bayesian vs Frequentist A/B Testing

    JSON →

    Head-to-head comparison of decision latency between Bayesian posterior-probability testing and classical frequentist fixed-sample testing across 48 production experiments. Median time-to-decision dropped 36% under Bayesian methodology with no increase in downstream product regret.

    Sample size
    48 experiments, ~29M visitor-sessions total
    Collected
    2024-Q1/2025-Q1
    License
    CC-BY-4.0 for aggregate figures

    Used in · bayesian-ab-testing-practice

  • Cox Proportional Hazards — SaaS Churn Covariates

    JSON →

    Fitted hazard ratios for ten covariates on 18-month SaaS subscriber survival. Feature usage depth and onboarding completion dominate (hazard ratios 0.34 and 0.41 respectively); price tier and annual billing have smaller but significant effects. Shows that churn is primarily a product-engagement phenomenon, not a pricing phenomenon.

    Sample size
    82,450 subscribers, 14,212 churn events
    Collected
    2023-07/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · survival-analysis-subscription-businesses
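
A small sketch for reading the hazard ratios quoted above. It uses only the standard Cox relationship (hazard ratio = exp(coefficient)); how each covariate is coded (binary flag or per-unit change) is not stated in the entry, so the interpretation is per unit of whatever coding the model used. The function names are hypothetical.

```python
import math


def cox_coefficient(hazard_ratio: float) -> float:
    """Log-hazard coefficient implied by a hazard ratio."""
    return math.log(hazard_ratio)


def hazard_reduction_pct(hazard_ratio: float) -> float:
    """Percent change in churn hazard per unit of the covariate (negative = lower)."""
    return (hazard_ratio - 1.0) * 100.0


usage_depth_hr = 0.34  # from the entry above
onboarding_hr = 0.41   # from the entry above

usage_change = hazard_reduction_pct(usage_depth_hr)
onboarding_change = hazard_reduction_pct(onboarding_hr)

# Under proportional hazards with no interaction term (an assumption),
# the effects multiply.
combined_hr = usage_depth_hr * onboarding_hr
```

Read this way, a hazard ratio of 0.34 corresponds to a 66% lower churn hazard and 0.41 to a 59% lower hazard, holding the other covariates fixed.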

E-commerce ML · 3 datasets

  • Learning-to-Rank Revenue Lift by Objective Function

    JSON →

    Incremental revenue per session from different ranking objective functions on an e-commerce search result page. Revenue-weighted composite (relevance × margin × projected LTV) outperforms pure relevance ranking by 23% in GMV per session, with neutral effect on relevance perception.

    Sample size
    ~14.2M search sessions, 4 variants
    Collected
    2024-08/2024-10
    License
    CC-BY-4.0 for cited figures

    Used in · search-ranking-revenue-optimization-l2r

  • Transformer Product Embeddings — CTR Lift vs Collaborative Filtering

    JSON →

    CTR and downstream conversion lift from replacing a matrix-factorization collaborative filter with transformer-based session embeddings (BERT4Rec-style). Transformer embeddings lift CTR by 18–32% across cold-start, returning-user, and category-diverse segments.

    Sample size
    ~6.2M users, 4 segments
    Collected
    2024-10/2024-12
    License
    CC-BY-4.0 for cited figures

    Used in · transformer-product-embeddings-ecommerce · cold-start-problem-few-shot-learning

  • Uplift Modeling — Persuadable Share by Customer Segment

    JSON →

    Share of customers falling into each of the four uplift quadrants (sure-thing, persuadable, lost-cause, do-not-disturb) for a promotional email campaign, decomposed by customer segment. Only 18% of the audience is genuinely persuadable; 64% of promotional budget is historically wasted on the other three groups.

    Sample size
    ~1.6M customers, 4-segment decomposition
    Collected
    2024-Q3/Q4
    License
    CC-BY-4.0 for cited figures

    Used in · personalized-promotion-uplift-modeling
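
One way to assign the four quadrants, sketched under assumptions: each customer has a predicted conversion probability with and without the promotion, and fixed thresholds separate the groups. The thresholds `high` and `min_uplift` and the function name are hypothetical; the entry does not describe the classification rule actually used.

```python
def quadrant(p_treated: float, p_control: float,
             high: float = 0.5, min_uplift: float = 0.05) -> str:
    """Label a customer by predicted response with vs. without the promotion."""
    uplift = p_treated - p_control
    if uplift >= min_uplift:
        return "persuadable"      # converts noticeably more when promoted
    if uplift <= -min_uplift:
        return "do-not-disturb"   # promotion lowers conversion
    # Promotion makes little difference either way.
    return "sure-thing" if p_control >= high else "lost-cause"


labels = [
    quadrant(0.60, 0.20),
    quadrant(0.10, 0.40),
    quadrant(0.90, 0.90),
    quadrant(0.05, 0.04),
]
```

Only customers labeled persuadable repay promotional spend in this framing; budget spent on the other three labels buys no incremental conversions.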

  • Cost per Attention Second by Media Format

    JSON →

    CPAS (cost per attention second) computed across 12 digital and traditional media formats from eye-tracking and dwell-inferred attention data. Display banners — the cheapest format on CPM — are the most expensive on attention. Connected-TV and audio invert the traditional CPM-based ROI ranking.

    Sample size
    ~120M measured impressions across 7 studies
    Collected
    2022-03/2024-11
    License
    CC-BY-4.0 for cited figures

    Used in · attention-economics-cognitive-load-advertising
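
A sketch of how a cost-per-attention-second figure can be derived from CPM, assuming CPAS is defined as cost per impression divided by average attention seconds per impression. That definition and the two example formats with their numbers are assumptions for illustration, not values from the dataset.

```python
def cpas(cpm: float, attention_seconds_per_impression: float) -> float:
    """Cost per attention second: (CPM / 1000) / seconds of attention per impression."""
    if attention_seconds_per_impression <= 0:
        raise ValueError("attention seconds must be positive")
    return (cpm / 1000.0) / attention_seconds_per_impression


# Hypothetical formats: a cheap CPM with little attention vs. a costly CPM
# with much more attention per impression.
cheap_cpm_format = cpas(cpm=2.0, attention_seconds_per_impression=0.1)
costly_cpm_format = cpas(cpm=30.0, attention_seconds_per_impression=10.0)
```

With these made-up inputs the format that is 15 times more expensive on CPM costs less per second of attention, which is the kind of ranking inversion the entry describes.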

  • Creative Fatigue Decay by Impression Band

    JSON →

    Relative response (click-through and post-click conversion) as the same creative is shown repeatedly to the same audience, segmented by audience-frequency decile. Fatigue onset is earlier than industry convention assumes — entropy-based detection flags decay 2–4 weeks before CTR collapse.

    Sample size
    ~3.8B impressions, 14 campaigns
    Collected
    2024-Q2/2025-Q1
    License
    CC-BY-4.0 for cited figures

    Used in · creative-fatigue-detection-entropy-metrics

  • Content Moat — Traffic per Article as Archive Grows

    JSON →

    Traffic per article as a niche content archive grows from 1 to 200+ articles. Per-article traffic compounds with archive size (a network effect from internal linking and topical authority) rather than staying flat: the 50th article gets 2.8× the traffic of the 1st article at identical quality.

    Sample size
    8 sites, 1,420 articles tracked
    Collected
    2022-01/2025-01
    License
    CC-BY-4.0 for cited figures

    Used in · compounding-advantage-content-moats-seo