Risk Scoring Frameworks for Technical Debt

Without a consistent scoring model, teams waste sprint cycles on cosmetic issues while indexability loss and LCP regressions compound undetected. This workflow — part of the broader Technical Audit Fundamentals & Scope Mapping practice — converts raw crawl data into a reproducible composite score that drives automated triage, not just reports.

Prerequisites & Environment Setup

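The scoring engine depends on Python 3.11+, pandas 2.2, numpy 1.26, and the BigQuery client library. Pin versions in a lockfile so CI runs match local results. Step 5 also imports google-cloud-storage and writes Parquet, which needs an engine such as pyarrow; pin those the same way if you run that step.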

# requirements.txt
pandas==2.2.2
numpy==1.26.4
google-cloud-bigquery==3.21.0
requests==2.32.2
pyyaml==6.0.1

Required environment variables — export these before running any pipeline step:

export SCORING_BASE_URL="https://example.com"
export SCORING_SITEMAP_URL="https://example.com/sitemap.xml"
export BQ_PROJECT="your-gcp-project"
export BQ_DATASET="audit_metrics"
export JIRA_BASE_URL="https://your-org.atlassian.net"
export JIRA_TOKEN="$(cat /run/secrets/jira_token)"
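export PD_ROUTING_KEY="$(cat /run/secrets/pd_routing_key)"   # same name the workflow and alert scripts read
export GCS_BUCKET="my-audit-artefacts"                       # read by pipeline/05_persist_artefacts.py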

Step 1 — Ingestion & Scope Validation

The first failure mode in any scoring pipeline is dirty input: staging URLs, session-parameterised duplicates, and disallowed paths that slip through inflate error counts and distort severity tiers. Before any metric is read, enforce a clean URL manifest. Review the canonical scope parameters to confirm which environments belong in scope, and establish your crawl depth boundaries before running the validator below.

#!/usr/bin/env python3
# pipeline/01_ingest_validate.py
import os, requests, urllib.parse, json
from xml.etree import ElementTree

BASE_URL    = os.environ["SCORING_BASE_URL"]
SITEMAP_URL = os.environ["SCORING_SITEMAP_URL"]
STRIP_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sid", "session_id", "ref"}

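def load_disallowed(base_url: str) -> list[str]:
    # Collects every Disallow path regardless of user-agent group (conservative)
    r = requests.get(f"{base_url}/robots.txt", timeout=10)
    r.raise_for_status()
    rules = []
    for line in r.text.splitlines():
        # partition() also handles "Disallow:/path" (no space) and indented lines
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules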

def clean_url(raw: str) -> str:
    p = urllib.parse.urlparse(raw.strip())
    qs = {k: v for k, v in urllib.parse.parse_qs(p.query).items() if k not in STRIP_PARAMS}
    return urllib.parse.urlunparse((p.scheme, p.netloc, p.path, p.params,
                                    urllib.parse.urlencode(qs, doseq=True), ""))

def build_manifest(base_url: str, sitemap_url: str) -> dict:
    disallowed = load_disallowed(base_url)
    r = requests.get(sitemap_url, timeout=15)
    r.raise_for_status()
    tree = ElementTree.fromstring(r.content)
    ns = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    targets, seen = [], set()
    for loc in tree.findall(".//s:loc", ns):
        if not loc.text:
            continue  # skip empty <loc> elements
        url = clean_url(loc.text)
        if url in seen:
            continue  # deduplicate on the cleaned URL, not the raw one
        seen.add(url)
        path = urllib.parse.urlparse(url).path
        if not any(path.startswith(d) for d in disallowed if d and d != "/"):
            # max() keeps depth at 0 for a URL with an empty path
            targets.append({"url": url, "status": "valid", "depth": max(path.count("/") - 1, 0)})
    return {"manifest_version": "1.0", "url_count": len(targets), "targets": targets}

if __name__ == "__main__":
    manifest = build_manifest(BASE_URL, SITEMAP_URL)
    with open("/tmp/url_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
    print(f"Manifest written: {manifest['url_count']} URLs")

Common mistakes:

Including canonicalized duplicates in the raw scoring pool — always deduplicate on the clean URL, not the raw one.
Ignoring the HTTP 4xx/5xx rate during ingestion; a high error rate at this stage usually points to a crawler configuration issue rather than a site health issue.
Failing to strip session IDs before writing the manifest — they re-introduce duplicates downstream.
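The error-rate mistake above can be caught with a small pre-flight check. This is a minimal sketch, assuming the crawler's HTTP status codes were collected during ingestion; `error_rate`, `check_ingestion`, and the 10% threshold are hypothetical and not part of the pipeline scripts.

```python
from collections import Counter

MAX_ERROR_RATE = 0.10  # assumed threshold; tune it to your crawler's normal noise level


def error_rate(status_codes: list[int]) -> float:
    """Share of responses with a 4xx or 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 600)
    return errors / len(status_codes)


def check_ingestion(status_codes: list[int]) -> dict:
    """Summarise ingestion health; 'ok' is False when the error rate exceeds the threshold."""
    rate = error_rate(status_codes)
    by_class = Counter(f"{code // 100}xx" for code in status_codes)
    return {"error_rate": rate, "by_class": dict(by_class), "ok": rate <= MAX_ERROR_RATE}


report = check_ingestion([200, 200, 301, 404, 200, 503, 200, 200, 200, 200])
if not report["ok"]:
    print(f"Ingestion error rate {report['error_rate']:.0%}: check crawler config before scoring")
```

Failing the run at this point keeps a misconfigured crawl from being scored as if it were a site regression.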

Step 2 — Metric Normalisation & Baseline Calibration

Raw Core Web Vitals and crawlability metrics are dimensionally incompatible: LCP is measured in milliseconds while crawl waste is measured in URL counts. Before assigning weights you must normalise everything onto a common [0, 100] scale. Use a rolling 30-day baseline so seasonal variation does not trigger false alerts; see establishing baseline health metrics for guidance on initialising windows for recently migrated properties.

#!/usr/bin/env python3
# pipeline/02_normalise.py
import pandas as pd
import numpy as np

WINDOW_DAYS = 30
CAP_PERCENTILE = 95

def normalise_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """
    Expects columns: url, raw_metric, metric_type, date (one row per URL, metric, and day).
    Returns: url, metric_type, normalised_score, percentile_rank.
    """
    df = df.sort_values("date")

    # Rolling baseline per URL and metric, so seasonal spikes do not inflate scores.
    # rolling() counts rows, so the window spans 30 days only with one row per day.
    grouped = df.groupby(["url", "metric_type"])["raw_metric"]
    df["rolling_mean"] = grouped.transform(
        lambda s: s.rolling(WINDOW_DAYS, min_periods=7).mean()
    )
    df["rolling_std"] = grouped.transform(
        # An undefined or zero std becomes 1 so the z-score never divides by zero
        lambda s: s.rolling(WINDOW_DAYS, min_periods=7).std().fillna(1).replace(0, 1)
    )

    # Z-score, then cap outliers at the per-metric 95th percentile before scaling
    df["z_score"] = (df["raw_metric"] - df["rolling_mean"]) / df["rolling_std"]
    cap = df.groupby("metric_type")["z_score"].transform(
        lambda s: np.percentile(s.dropna(), CAP_PERCENTILE) if len(s.dropna()) else 1
    )
    df["capped"] = df["z_score"].clip(upper=cap)

    # Min-max to [0, 100]; the epsilon guards against hi == lo
    def minmax(s: pd.Series) -> pd.Series:
        lo, hi = s.min(), s.max()
        return ((s - lo) / (hi - lo + 1e-9)) * 100

    df["normalised_score"] = df.groupby("metric_type")["capped"].transform(minmax)
    df["percentile_rank"]  = df.groupby("metric_type")["normalised_score"].transform(
        lambda s: s.rank(pct=True) * 100
    )
    return df[["url", "metric_type", "normalised_score", "percentile_rank"]]

Key parameters for the normalisation step:

Parameter	Type	Default	Purpose
`WINDOW_DAYS`	int	`30`	Rolling window for baseline mean/std computation
`min_periods`	int	`7`	Minimum observations before a rolling value is emitted
`CAP_PERCENTILE`	int	`95`	Percentile above which z-scores are capped to suppress outliers
`1e-9` epsilon	float	`1e-9`	Prevents divide-by-zero when min equals max in a metric group

Common mistakes:

Using static thresholds rather than dynamic percentiles — a 3-second LCP is fine for a B2B tool but catastrophic on a news article.
Applying uniform scaling across LCP, CLS, and INP together; normalise each metric type independently within its own distribution.
Ignoring the min_periods guard — early in a new domain's life, rolling windows with fewer than 7 observations produce extreme z-scores that inflate severity.
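To see what the min_periods guard does, the sketch below reimplements a trailing-window mean in plain Python. It is an illustration only: `rolling_mean` is a hypothetical helper that mimics the behaviour of a pandas rolling mean with `min_periods`, and the LCP values are made up.

```python
from statistics import mean
from typing import Optional


def rolling_mean(values: list[float], window: int, min_periods: int) -> list[Optional[float]]:
    """Trailing-window mean; emits None until min_periods observations exist."""
    out: list[Optional[float]] = []
    for i in range(len(values)):
        # Window covers the current observation plus up to window-1 earlier ones
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(mean(chunk) if len(chunk) >= min_periods else None)
    return out


# Eight daily LCP readings in milliseconds (illustrative data)
lcp_ms = [2400, 2500, 2450, 2600, 2550, 2500, 2480, 2520]
baseline = rolling_mean(lcp_ms, window=30, min_periods=7)

for day, value in enumerate(baseline, start=1):
    print(day, "no baseline yet" if value is None else round(value, 1))
```

The first six days produce no baseline at all, so no z-score is computed from a handful of points; a value only appears once seven observations exist.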

Step 3 — Risk Matrix & Weight Assignment

Weights must reflect revenue impact, not engineering preference. LCP and INP regressions on conversion-path pages carry far higher business cost than the same regression on a blog archive. Pair each metric weight with a page-level impact multiplier derived from your crawl depth and scope configuration, where shallow revenue-critical URLs receive the highest multipliers.

# config/risk_matrix.yaml
version: "2.1"
weights:
  lcp:               0.25   # Largest Contentful Paint — direct ranking signal
  cls:               0.20   # Cumulative Layout Shift — UX & conversion impact
  inp:               0.20   # Interaction to Next Paint — engagement signal
  wcag_violations:   0.15   # Accessibility issues — legal risk + indexability
  crawl_budget_waste: 0.20  # Orphaned / low-value pages consuming crawl quota

impact_multipliers:
  revenue_critical:  2.5    # Checkout, pricing, product detail pages
  conversion_path:   1.8    # Lead-gen, sign-up, key landing pages
  informational:     1.0    # Blog posts, FAQs, docs
  orphaned:          0.6    # Pages with no inbound internal links

severity_tiers:
  critical: 85   # Triggers immediate Jira + PagerDuty
  high:     65   # Tickets to current sprint backlog
  medium:   40   # Logged to weekly review queue
  low:       0   # Monitored; no immediate action
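Before the config above is used for scoring, it can help to sanity-check it. The following is a rough sketch: `validate_risk_matrix` is a hypothetical helper, and treating "weights sum to 1.0" as a requirement is an assumption (it holds for the values above and keeps the weighted sum on the same [0, 100] scale as the normalised metrics).

```python
import math


def validate_risk_matrix(config: dict) -> list[str]:
    """Return the problems found in a parsed risk matrix; an empty list means it looks sane."""
    problems = []

    weights = config.get("weights", {})
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        problems.append(f"weights sum to {total:.3f}, expected 1.0")
    if any(w < 0 for w in weights.values()):
        problems.append("weights must be non-negative")

    tiers = config.get("severity_tiers", {})
    try:
        cutoffs = [tiers[k] for k in ("low", "medium", "high", "critical")]
    except KeyError as missing:
        problems.append(f"missing severity tier: {missing}")
    else:
        # Each tier's cutoff must be strictly above the previous one
        if any(a >= b for a, b in zip(cutoffs, cutoffs[1:])):
            problems.append("severity tiers must increase from low to critical")
    return problems


# Values copied from config/risk_matrix.yaml
RISK_MATRIX = {
    "weights": {"lcp": 0.25, "cls": 0.20, "inp": 0.20,
                "wcag_violations": 0.15, "crawl_budget_waste": 0.20},
    "severity_tiers": {"critical": 85, "high": 65, "medium": 40, "low": 0},
}
problems = validate_risk_matrix(RISK_MATRIX)
print(problems or "risk matrix looks consistent")
```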

#!/usr/bin/env python3
# pipeline/03_score.py
"""
Composite Risk Score = Σ(weight_i × normalised_metric_i) × impact_multiplier
"""
import yaml, pandas as pd
from pathlib import Path

CONFIG = yaml.safe_load(Path("/abs/config/risk_matrix.yaml").read_text())
WEIGHTS = CONFIG["weights"]
MULTIPLIERS = CONFIG["impact_multipliers"]
TIERS = CONFIG["severity_tiers"]

def assign_tier(score: float) -> str:
    if score >= TIERS["critical"]: return "critical"
    if score >= TIERS["high"]:     return "high"
    if score >= TIERS["medium"]:   return "medium"
    return "low"

def compute_risk_scores(normalised: pd.DataFrame, page_meta: pd.DataFrame) -> pd.DataFrame:
    """
    normalised:  [url, metric_type, normalised_score]
    page_meta:   [url, page_class]  — page_class maps to impact_multipliers keys
    """
    pivot = normalised.pivot_table(index="url", columns="metric_type",
                                   values="normalised_score", aggfunc="mean").fillna(0)
    for col in WEIGHTS:
        if col not in pivot.columns:
            pivot[col] = 0.0

    pivot["weighted_sum"] = sum(pivot[m] * w for m, w in WEIGHTS.items())

    result = pivot[["weighted_sum"]].join(page_meta.set_index("url"), how="left")
    result["impact_multiplier"] = result["page_class"].map(MULTIPLIERS).fillna(1.0)
    result["risk_score"] = (result["weighted_sum"] * result["impact_multiplier"]).clip(0, 100)
    result["severity_tier"] = result["risk_score"].apply(assign_tier)
    return result.reset_index()[["url", "risk_score", "severity_tier", "impact_multiplier"]]
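A worked example makes the effect of the impact multiplier concrete. This is a simplified, dependency-free sketch of the same formula for a single page; the weights, multipliers, and tier cutoffs are copied from the config, while the metric values are invented for illustration.

```python
WEIGHTS = {"lcp": 0.25, "cls": 0.20, "inp": 0.20,
           "wcag_violations": 0.15, "crawl_budget_waste": 0.20}
MULTIPLIERS = {"revenue_critical": 2.5, "conversion_path": 1.8,
               "informational": 1.0, "orphaned": 0.6}
TIERS = [("critical", 85), ("high", 65), ("medium", 40), ("low", 0)]


def risk_score(normalised: dict, page_class: str) -> float:
    """Weighted sum of normalised metrics times the page-class multiplier, clipped to [0, 100]."""
    weighted = sum(WEIGHTS[m] * normalised.get(m, 0.0) for m in WEIGHTS)
    return min(100.0, max(0.0, weighted * MULTIPLIERS.get(page_class, 1.0)))


def tier(score: float) -> str:
    """First tier (highest first) whose cutoff the score reaches."""
    return next(name for name, cutoff in TIERS if score >= cutoff)


# Illustrative normalised scores for one URL; the weighted sum is 38
page = {"lcp": 60, "cls": 40, "inp": 50, "wcag_violations": 20, "crawl_budget_waste": 10}

for page_class in MULTIPLIERS:
    score = risk_score(page, page_class)
    print(f"{page_class:17s} {score:6.1f}  {tier(score)}")
```

The same metric profile has a weighted sum of 38. On a revenue-critical page that becomes 38 × 2.5 = 95 (critical), while on an informational page it stays at 38 (low), which is the behaviour the multipliers are meant to produce.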

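The BigQuery query below validates your weight choices. revenue_delta_pct is the relative drop in conversion rate for critical-tier pages against the baseline for the same metric_type; if it is below roughly 15–20%, critical-tier pages convert almost as well as everything else and your weights are likely miscalibrated.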

-- BigQuery: verify weight calibration via historical conversion correlation
SELECT
    metric_type,
    ROUND(AVG(conversion_rate) * 100, 2)                                              AS baseline_conv_pct,
    ROUND(AVG(CASE WHEN risk_score >= 85 THEN conversion_rate END) * 100, 2)          AS critical_conv_pct,
    ROUND(
        SAFE_DIVIDE(
            AVG(conversion_rate) - AVG(CASE WHEN risk_score >= 85 THEN conversion_rate END),
            AVG(conversion_rate)
        ) * 100, 2
    )                                                                                  AS revenue_delta_pct
FROM `${BQ_PROJECT}.${BQ_DATASET}.audit_metrics`
WHERE date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
GROUP BY metric_type
ORDER BY revenue_delta_pct DESC
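The arithmetic behind revenue_delta_pct can be checked by hand. The sketch below mirrors the query's SAFE_DIVIDE expression in Python; `revenue_delta_pct` and `looks_calibrated` are hypothetical helpers, the conversion rates are invented, and the 15% floor is taken from the lower end of the 15–20% guideline.

```python
from typing import Optional


def revenue_delta_pct(baseline_conv: float, critical_conv: float) -> Optional[float]:
    """Relative conversion drop of critical-tier pages versus baseline, in percent.

    Returns None when the baseline is zero, as SAFE_DIVIDE returns NULL.
    """
    if baseline_conv == 0:
        return None
    return round((baseline_conv - critical_conv) / baseline_conv * 100, 2)


def looks_calibrated(delta_pct: Optional[float], floor_pct: float = 15.0) -> bool:
    """True when critical-tier pages convert at least floor_pct percent worse than baseline."""
    return delta_pct is not None and delta_pct >= floor_pct


# Illustrative rates: 4.0% baseline conversion, 3.2% on critical-tier pages
delta = revenue_delta_pct(0.040, 0.032)
print(delta, looks_calibrated(delta))
```

Note that this is a relative change: a fall from 4.0% to 3.2% is a 20% drop, even though it is only 0.8 percentage points.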

Step 4 — Execution, Scheduling & Alert Routing

Scores are only useful if they are acted upon automatically. Run an incremental daily job plus a full weekly recalculation; route critical issues to Jira immediately and trigger PagerDuty when the score velocity (rate of change) exceeds a threshold — a sudden 20-point jump indicates a deployment regression rather than a gradual drift.

# .github/workflows/risk-scoring-pipeline.yml
name: Daily Risk Recalculation & Routing
on:
  schedule:
    - cron: '0 2 * * *'   # 02:00 UTC daily
  workflow_dispatch:

env:
  BQ_PROJECT:    ${{ secrets.BQ_PROJECT }}
  BQ_DATASET:    audit_metrics
  JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }}
  JIRA_TOKEN:    ${{ secrets.JIRA_TOKEN }}
  PD_ROUTING_KEY: ${{ secrets.PD_ROUTING_KEY }}

jobs:
  score-and-route:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install -r requirements.txt

      - name: Build URL manifest
        run: |
          set -euo pipefail
          python pipeline/01_ingest_validate.py

      - name: Normalise and score
        run: |
          set -euo pipefail
          python pipeline/02_normalise.py --input /tmp/url_manifest.json
          python pipeline/03_score.py --output /tmp/risk_manifest.json

      - name: Route critical issues to Jira
        run: |
          set -euo pipefail
          jq -r '.items[] | select(.severity_tier == "critical") | .url' /tmp/risk_manifest.json \
          | while IFS= read -r url; do
              curl -sf -X POST "${JIRA_BASE_URL}/rest/api/3/issue" \
                -H "Authorization: Bearer ${JIRA_TOKEN}" \
                -H "Content-Type: application/json" \
                -d "{\"fields\":{\"project\":{\"key\":\"TECH\"},\"summary\":\"Critical Risk: ${url}\",\"issuetype\":{\"name\":\"Bug\"},\"priority\":{\"name\":\"Highest\"}}}"
            done

      - name: Alert PagerDuty on score-velocity spike
        run: |
          set -euo pipefail
          # Fire alert if any URL's score increased by ≥20 points vs prior run
          VELOCITY=$(jq '[.items[] | select(.score_delta >= 20)] | length' /tmp/risk_manifest.json)
          if [ "${VELOCITY}" -gt 0 ]; then
            curl -sf -X POST "https://events.pagerduty.com/v2/enqueue" \
              -H "Content-Type: application/json" \
              -d "{\"routing_key\":\"${PD_ROUTING_KEY}\",\"event_action\":\"trigger\",\"dedup_key\":\"risk-velocity-spike\",\"payload\":{\"summary\":\"${VELOCITY} URLs show ≥20pt risk-score spike\",\"severity\":\"critical\",\"source\":\"risk-scoring-pipeline\"}}"
          fi

      - name: Upload risk manifest artefact
        uses: actions/upload-artifact@v4
        with:
          name: risk-manifest-${{ github.run_id }}
          path: /tmp/risk_manifest.json
          retention-days: 90

Concurrency guard — prevent parallel runs from overwriting the same manifest:

concurrency:
  group: risk-scoring-pipeline
  cancel-in-progress: false   # queue rather than cancel; never discard a run
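The routing steps read `.items[].score_delta` from the risk manifest, a field the scoring script in Step 3 does not emit. One way to derive it is sketched below, assuming the previous run's items are available (for example from the stored artefact). `add_score_delta` is a hypothetical helper, and giving first-seen URLs a delta of 0 is an assumption made so that newly added pages do not page anyone.

```python
def add_score_delta(current: list[dict], previous: list[dict]) -> list[dict]:
    """Attach score_delta (current minus previous risk_score) to each current item."""
    prev_scores = {item["url"]: item["risk_score"] for item in previous}
    out = []
    for item in current:
        prior = prev_scores.get(item["url"])
        # URLs with no prior score get a delta of 0 (assumption, see above)
        delta = 0.0 if prior is None else item["risk_score"] - prior
        out.append({**item, "score_delta": round(delta, 2)})
    return out


previous_items = [
    {"url": "https://example.com/pricing", "risk_score": 42.0},
    {"url": "https://example.com/blog/a", "risk_score": 30.0},
]
current_items = [
    {"url": "https://example.com/pricing", "risk_score": 66.5},
    {"url": "https://example.com/blog/a", "risk_score": 28.0},
    {"url": "https://example.com/new", "risk_score": 90.0},
]

manifest = {"items": add_score_delta(current_items, previous_items)}
spikes = [i["url"] for i in manifest["items"] if i["score_delta"] >= 20]
print(spikes)
```

Here only the pricing page crosses the 20-point velocity threshold; the new URL is critical by score and would still be routed to Jira, but it does not count as a velocity spike.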

Step 5 — Artefact Capture & Storage

The risk manifest and supporting Parquet files form an audit trail for tracking metric trends across release cycles. Without versioned artefacts you cannot correlate a score spike to a specific deployment SHA.

#!/usr/bin/env python3
# pipeline/05_persist_artefacts.py
"""
Write risk_manifest.json and risk_scores.parquet to a versioned GCS path.
Path convention: gs://<bucket>/risk-scores/<YYYY-MM-DD>/<git_sha>/
"""
import os, json, datetime, subprocess
from pathlib import Path
import pandas as pd
from google.cloud import storage

BUCKET      = os.environ["GCS_BUCKET"]         # e.g. "my-audit-artefacts"
MANIFEST_IN = Path("/tmp/risk_manifest.json")

def get_git_sha() -> str:
    return subprocess.check_output(["git", "rev-parse", "--short", "HEAD"],
                                   text=True).strip()

def upload(local_path: Path, gcs_key: str) -> None:
    client = storage.Client()
    bucket = client.bucket(BUCKET)
    blob   = bucket.blob(gcs_key)
    blob.upload_from_filename(str(local_path))
    print(f"Uploaded: gs://{BUCKET}/{gcs_key}")

if __name__ == "__main__":
    today  = datetime.date.today().isoformat()
    sha    = get_git_sha()
    prefix = f"risk-scores/{today}/{sha}"

    # JSON manifest
    upload(MANIFEST_IN, f"{prefix}/risk_manifest.json")

    # Parquet for BI / BigQuery external table
    manifest = json.loads(MANIFEST_IN.read_text())
    df = pd.DataFrame(manifest["items"])
    parquet_path = Path("/tmp/risk_scores.parquet")
    df.to_parquet(parquet_path, index=False, compression="snappy")
    upload(parquet_path, f"{prefix}/risk_scores.parquet")

Retention policy: keep daily manifests for 90 days; weekly full-recalculation exports indefinitely. Set this on the GCS bucket lifecycle rule, not in code.

Verification Checklist

Run these checks after each pipeline execution to confirm the workflow produced valid output:

Manifest URL count matches expectation — compare url_count in /tmp/url_manifest.json against the previous run's count. A drop greater than 5% indicates a sitemap regression.
Score distribution sanity — fewer than 2% of URLs should be critical tier on a stable site; if the percentage is higher, inspect the weight config for miscalibration.
Jira deduplication — query Jira for open TECH issues with summaries matching Critical Risk: and confirm no URL received two tickets in the same run.
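Artefact checksum — verify the GCS object's md5Hash matches the local /tmp/risk_manifest.json before closing the pipeline run. GCS reports md5Hash as the base64-encoded digest while md5sum prints hex, so convert one of them before comparing.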

BigQuery row count — confirm the row count in audit_metrics increased by the expected URL count:

SELECT COUNT(*) FROM `${BQ_PROJECT}.${BQ_DATASET}.audit_metrics`
WHERE date = CURRENT_DATE()

PagerDuty alert closed — if a velocity spike alert fired, confirm it resolves automatically when the next run produces no spikes (event_action: resolve).
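Several of the checks above reduce to a few lines of arithmetic. The helpers below are a sketch under stated assumptions: the function names are hypothetical, the 5% and 2% limits come from the checklist, and `gcs_style_md5` produces the base64 form of the MD5 digest, which is the encoding GCS uses for an object's md5Hash.

```python
import base64
import hashlib


def url_count_drop_pct(previous: int, current: int) -> float:
    """Percentage drop in manifest URL count versus the previous run (0 if it grew)."""
    if previous <= 0:
        return 0.0
    return max(0.0, (previous - current) / previous * 100)


def critical_share_pct(items: list[dict]) -> float:
    """Percentage of scored URLs in the critical tier."""
    if not items:
        return 0.0
    critical = sum(1 for item in items if item.get("severity_tier") == "critical")
    return critical / len(items) * 100


def gcs_style_md5(data: bytes) -> str:
    """Base64 of the raw MD5 digest, comparable with a GCS object's md5Hash."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


items = [{"severity_tier": "critical"}] + [{"severity_tier": "low"}] * 99
print("sitemap regression:", url_count_drop_pct(1000, 940) > 5)
print("too many criticals:", critical_share_pct(items) >= 2)
print("local md5Hash:", gcs_style_md5(b'{"items": []}'))
```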

Troubleshooting

Score distribution is heavily skewed toward critical

Root cause: weight config and/or impact multipliers applied to a URL set that includes staging or parameterised pages that escaped the manifest filter.

Fix:

jq '[.targets[] | select(.url | test("staging|dev|\\?.*sid="))] | length' /tmp/url_manifest.json
# If non-zero, re-run 01_ingest_validate.py with an extended STRIP_PARAMS list

Rolling baseline returns NaN for new domains

Root cause: fewer than min_periods observations in the rolling window on day 1–6 of monitoring.

Fix: seed the baseline using historical Lighthouse CI data or set min_periods=1 for the first two weeks, then raise it to 7.

# Temporary override in pipeline/02_normalise.py
import os
WINDOW_MIN_PERIODS = int(os.getenv("BASELINE_MIN_PERIODS", "7"))
# Pass min_periods=WINDOW_MIN_PERIODS to both rolling() calls in normalise_metrics

Jira tickets created for already-resolved URLs

Root cause: deduplication logic is missing. The pipeline checks the current risk_score but not whether an open Jira issue already exists for the URL.

Fix: query the Jira search API before creating each ticket:

curl -sf "${JIRA_BASE_URL}/rest/api/3/search?jql=project=TECH+AND+summary~\"${ENCODED_URL}\"+AND+statusCategory!=Done" \
  -H "Authorization: Bearer ${JIRA_TOKEN}" | jq '.total'
# Only POST if total == 0
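The `ENCODED_URL` value in the command above has to be URL-encoded, and the JQL needs the page URL quoted as a phrase. A minimal sketch of building that request URL with the standard library follows; `open_ticket_search_url` and `should_create_ticket` are hypothetical helpers, the TECH project key is taken from the workflow, and the sketch assumes page URLs never contain a double quote.

```python
from urllib.parse import urlencode


def open_ticket_search_url(jira_base_url: str, page_url: str, project: str = "TECH") -> str:
    """Build the search URL for open issues whose summary contains the page URL."""
    if '"' in page_url:
        raise ValueError("page URL must not contain double quotes")
    # \"...\" inside the JQL string asks for an exact phrase match
    jql = f'project = {project} AND summary ~ "\\"{page_url}\\"" AND statusCategory != Done'
    return f"{jira_base_url}/rest/api/3/search?" + urlencode({"jql": jql})


def should_create_ticket(total_open: int) -> bool:
    """Create a ticket only when the search reports no open issue for the URL."""
    return total_open == 0


search_url = open_ticket_search_url("https://your-org.atlassian.net",
                                    "https://example.com/pricing?plan=pro")
print(search_url)
```

Letting `urlencode` handle the escaping avoids hand-built query strings breaking on URLs that contain `?`, `&`, or `=`.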

PagerDuty alert never resolves after a fix

Root cause: the pipeline only fires trigger events, never resolve. Without a matching resolve event, PagerDuty holds the incident open indefinitely.

Fix: add a resolve step that fires when VELOCITY == 0 and the previous run had a spike. The resolve event only closes the incident if the trigger event was sent with the same dedup_key:

# Store run velocity in GCS as a state file; compare to prior run
PREV_VELOCITY=$(gsutil cat "gs://${GCS_BUCKET}/state/last_velocity.txt" 2>/dev/null || echo "0")
if [ "${PREV_VELOCITY}" -gt 0 ] && [ "${VELOCITY}" -eq 0 ]; then
  curl -sf -X POST "https://events.pagerduty.com/v2/enqueue" \
    -H "Content-Type: application/json" \
    -d "{\"routing_key\":\"${PD_ROUTING_KEY}\",\"event_action\":\"resolve\",\"dedup_key\":\"risk-velocity-spike\"}"
fi
echo "${VELOCITY}" | gsutil cp - "gs://${GCS_BUCKET}/state/last_velocity.txt"
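The trigger and resolve decision is easier to test as a pure function than as shell. The sketch below only builds the event body and sends nothing; `pagerduty_event` is a hypothetical helper, and it assumes both events share the dedup key `risk-velocity-spike` so that the resolve matches the open incident.

```python
from typing import Optional

DEDUP_KEY = "risk-velocity-spike"  # must be identical on trigger and resolve


def pagerduty_event(prev_velocity: int, velocity: int, routing_key: str) -> Optional[dict]:
    """Return the Events API v2 body to send for this run, or None if nothing should be sent."""
    if velocity > 0:
        return {
            "routing_key": routing_key,
            "event_action": "trigger",
            "dedup_key": DEDUP_KEY,
            "payload": {
                "summary": f"{velocity} URLs show >=20pt risk-score spike",
                "severity": "critical",
                "source": "risk-scoring-pipeline",
            },
        }
    if prev_velocity > 0:
        # Spike cleared since the last run: close the incident opened earlier
        return {"routing_key": routing_key, "event_action": "resolve", "dedup_key": DEDUP_KEY}
    return None


for prev, cur in [(0, 3), (3, 0), (0, 0)]:
    event = pagerduty_event(prev, cur, routing_key="example-routing-key")
    print(prev, cur, event["event_action"] if event else "no event")
```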

Parquet export fails with ArrowInvalid on mixed-type columns

Root cause: the normalised_score column contains a mix of float and None when some URLs have no metrics for a given type.

Fix:

df["normalised_score"] = pd.to_numeric(df["normalised_score"], errors="coerce").astype("float32")

BigQuery load job times out on large manifests

Root cause: writing row-by-row via the streaming API instead of bulk load from GCS Parquet.

Fix: always write the Parquet file first and load from GCS:

# Appends by default; --replace would overwrite the history the 90-day calibration query reads
bq load --source_format=PARQUET \
  "${BQ_PROJECT}:${BQ_DATASET}.audit_metrics" \
  "gs://${GCS_BUCKET}/risk-scores/$(date +%F)/*/risk_scores.parquet"

FAQ

How often should the risk score be recalculated?

Run an incremental delta pass daily (processing only URLs touched by recent crawl diffs) and a full recalculation weekly. Trigger on-demand runs after any significant deployment, migration, or sitemap structural change. The quarterly audit schedule describes how to align scoring cadence with wider audit cycles.
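One way to pick the URLs for the daily delta pass is to diff two crawl snapshots. This is a sketch only: it assumes each snapshot maps a URL to some content fingerprint (a hash or ETag), and `urls_to_rescore` is a hypothetical helper rather than part of the pipeline scripts.

```python
def urls_to_rescore(previous: dict[str, str], current: dict[str, str]) -> tuple[set[str], set[str]]:
    """Compare two crawl snapshots (url -> content fingerprint).

    Returns (changed_or_new, removed): the first set needs rescoring,
    the second can be dropped from the manifest.
    """
    changed_or_new = {url for url, fp in current.items() if previous.get(url) != fp}
    removed = set(previous) - set(current)
    return changed_or_new, removed


yesterday = {"/pricing": "a1", "/blog/a": "b2", "/old": "c3"}
today = {"/pricing": "a9", "/blog/a": "b2", "/new": "d4"}

rescore, removed = urls_to_rescore(yesterday, today)
print(sorted(rescore), sorted(removed))
```

Unchanged URLs keep their last score until the weekly full recalculation picks them up again.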

Can I apply the same weight config to a blog and an e-commerce catalogue?

No. Blogs prioritise crawl coverage and LCP; catalogues weight CLS, INP, and structured-data validity more heavily. Maintain separate risk_matrix.yaml files per site section and select the correct config at scoring time based on the URL's page_class field. See calibrating error thresholds for different site sections for a worked example.

What is the difference between a critical and a high severity tier?

critical (score ≥ 85) means immediate revenue or crawlability impact — open an incident and halt deployments to affected templates. high (65–84) means a degradation that will surface in search performance within days; fix within the next sprint without a full incident declaration. Triage decisions for these tiers are expanded in prioritizing critical vs non-critical site errors.

How do I prevent score inflation from seasonal traffic spikes?

Use a 30-day rolling baseline with min_periods=7 as the default. For sites with strong year-over-year seasonality — e-commerce during peak, news during elections — supplement the rolling window with a year-over-year delta comparison. Query the same date range from the prior year in BigQuery and blend the baselines with a 0.7 / 0.3 weighting in favour of the recent window.
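The 0.7 / 0.3 blend is a plain weighted average. A minimal sketch, assuming you already have the recent 30-day mean and the mean for the same date range one year earlier; `blended_baseline` is a hypothetical helper, and falling back to the recent window when no prior-year data exists is an assumption.

```python
from typing import Optional


def blended_baseline(recent_mean: float,
                     prior_year_mean: Optional[float],
                     recent_weight: float = 0.7) -> float:
    """Weighted blend of the recent rolling mean and the prior-year mean."""
    if not 0.0 <= recent_weight <= 1.0:
        raise ValueError("recent_weight must be between 0 and 1")
    if prior_year_mean is None:
        # No history from last year (assumed fallback): use the recent window alone
        return recent_mean
    return recent_weight * recent_mean + (1 - recent_weight) * prior_year_mean


# Illustrative LCP means in milliseconds: 2500 recently, 3000 in the same period last year
baseline = blended_baseline(2500.0, 3000.0)
print(round(baseline, 1))
```

With these inputs the baseline is 0.7 × 2500 + 0.3 × 3000 = 2650 ms, so a peak-season reading is compared against a figure that already expects some seasonal slowdown.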

Related

Technical Audit Fundamentals & Scope Mapping — parent section covering the full audit scope and methodology
Prioritizing Critical vs Non-Critical Site Errors — decision rules for acting on the severity tiers this workflow produces
Establishing Baseline Health Metrics for New Domains — seeding the rolling window before sufficient history exists
Defining Crawl Depth & Scope for Enterprise Sites — the depth-based penalty multipliers that feed into the risk matrix
Designing Custom Health Score Algorithms — extending this framework to multi-site dashboard aggregation
