April 8, 2026



Short answer: No — don’t test with GPT-4.1 for task=classification. The remaining issues are code bugs, not model capability gaps. Here’s the full breakdown.


Output Evaluation

Scenes 1, 2, 3, 4, 7 — Cinematic ✅ Era regression fully resolved

No decade tokens anywhere. Fix A (setting strip) and Fix B (vintage props) are confirmed working. All settings are contemporary: "dimly lit trading floor", "modern office", "hectic trading floor with large digital tickers". This is the biggest win: 5 scenes clean.

Scene 0 — Hybrid ✅ All three annotations present

8%, 7%, and 2.5% all appear. The scene splitter bug was either fixed separately or this run got the complete text. Best output this scene has produced across all test runs.

Scene 6 — Hybrid ✅ Gerund strip working correctly

"bar chart showing sector-specific economic pressures" → subject "bar chart", action "showing sector-specific economic pressures". Correct behavior.

Scene 3 — Cinematic ❌ Gerund over-fire on -ing adjective

stripped gerund from subject: 'middle-aged Caucasian man with graying hair in a dark suit, focused expression' → action='graying hair in a dark suit...'
thin subject warning: 'middle-aged Caucasian man' word_count=3

"graying" is an adjective, not a gerund action verb. The gerund strip regex matches any word ending in -ing in subject position, including adjectives (graying, balding, aging). This produces a thin subject (3 words) and orphans descriptive text into the action field.

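The over-fire is easy to reproduce in isolation. The sketch below assumes the strip splits the subject at the first word matching something like `\b\w+ing\b`; the real `_GERUND_RE` does not appear in the log, so `NAIVE_GERUND_RE` and `naive_strip` are hypothetical stand-ins rather than the pipeline's own code.

```python
import re

# Hypothetical stand-in for the real _GERUND_RE, which the log does not show.
NAIVE_GERUND_RE = re.compile(r"\b\w+ing\b", re.IGNORECASE)


def naive_strip(subject: str) -> tuple[str, str]:
    """Split a subject at the first -ing word and return (subject, action)."""
    match = NAIVE_GERUND_RE.search(subject)
    if match is None:
        return subject, ""
    head = subject[: match.start()].rstrip(" ,")
    # Drop a dangling connector such as "with" left at the end of the subject.
    head = re.sub(r"\s+(?:with|and)$", "", head)
    return head, subject[match.start():]


subject, action = naive_strip(
    "middle-aged Caucasian man with graying hair in a dark suit, focused expression"
)
print(repr(subject), len(subject.split()))
print(repr(action))
```

Under that assumption the split lands on "graying", which reproduces the 3-word subject and the orphaned description seen in the scene 3 log.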
Scene 5 — MG ❌ Two remaining issues

Issue 1 — Fix 3 scope gap: "point at 6%" not matching single-data-point signal

The foreground_prop reads: "line graph titled 'Percentage drop...' with y-axis 0–21% and point at 6%"

Fix 3 scans for "data point" as the single-data-point keyword. "point at" doesn’t match. The line graph survives for a single-value measurement. Fix 3 needs "point at" added to its single-point signal set.

Issue 2 — 94% complement annotation from data_context

data_context.figures contains {"label": "ExxonMobil global production volume Q1 2024", "value": "94%"}. The figure extractor computed 100% - 6% = 94% (remaining production) and stored it as a figure. The LLM then faithfully injected it as an annotation: "annotations with exact figures '94% ExxonMobil global production volume Q1 2024'". This is a semantically confusing annotation — the chart should show the 6% drop, not the 94% remainder. The complement figure should not be stored in data_context.figures at all.


On the Model Question

Don’t upgrade task=classification to GPT-4.1 (full). The log shows classification handles two things — search query generation (297 tokens) and figure extraction (780 tokens). Both are simple extraction tasks where GPT-4.1-mini scores well and the output is correct. None of the remaining bugs trace to classification output quality.

The remaining three issues are all code:

| Issue | Root cause | Model or code? |
| --- | --- | --- |
| Gerund over-fire on "graying" | `_GERUND_RE` too broad, matches -ing adjectives | Code |
| "point at 6%" not triggering Fix 3 | Single-point signal set missing "point at" | Code |
| 94% complement annotation | `_extract_inline_figures` storing complement values | Code |

CC Prompt — Three Targeted Fixes

# visual_pipeline.py: Three fixes — gerund adjective guard, Fix 3 scope, complement value filter

---

## Fix D — Gerund strip: exclude adjective-type -ing words

**Root cause confirmed (scene 3 log):**
`_GERUND_RE` fires on `"graying hair in a dark suit"` because `"graying"`
ends in `-ing`. The strip extracts `"graying hair in a dark suit, focused
expression"` as the action, leaving `"middle-aged Caucasian man"` as a
3-word thin subject.

The gerund strip is intended to catch action phrases like
`"man in suit inspecting documents"` → strip `"inspecting documents"`.
It must not fire on standalone `-ing` adjectives that describe physical
appearance (`graying`, `balding`, `aging`, `thinning`, `fading`, `receding`).

**Fix — add `_APPEARANCE_ADJ_RE` module-level constant**
near the existing gerund-related constants:

```python
# Adjective-type -ing words that describe physical appearance, not actions.
# These must not be stripped by the gerund strip logic.
_APPEARANCE_ADJ_RE = re.compile(
    r"\b(graying|greying|balding|aging|ageing|thinning|receding|"
    r"fading|grizzling|silvering|whitening)\b",
    re.IGNORECASE,
)
```

**In the gerund strip block**, add a pre-check before firing the strip:

```python
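# Guard: skip the gerund strip when the subject contains an appearance adjective.
# These describe physical attributes, not actions, and must stay in subjects.
# Note: this checks the whole subject, so a real action gerund later in the
# same subject is also left unstripped.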
if _APPEARANCE_ADJ_RE.search(subject_item):
    # Appearance adjective detected — skip gerund strip for this subject.
    repaired_subjects.append(subject_item)
    continue
```

This guard should execute **before** the gerund regex match attempt on
`subject_item`, so appearance-adjective subjects pass through unchanged.
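
One caveat: because the guard searches the whole subject, a subject such as "man with graying hair inspecting documents" would keep its real action phrase too. A possible refinement, sketched here on the assumption that the strip splits at the first `-ing` word, is to skip only the appearance adjectives themselves and keep scanning. `GERUND_WORD_RE` and `split_subject_action` are illustrative names, not identifiers from `visual_pipeline.py`.

```python
import re

APPEARANCE_ADJ_RE = re.compile(
    r"\b(?:graying|greying|balding|aging|ageing|thinning|receding|"
    r"fading|grizzling|silvering|whitening)\b",
    re.IGNORECASE,
)
# Assumed shape of the gerund matcher; the real _GERUND_RE may differ.
GERUND_WORD_RE = re.compile(r"\b\w+ing\b", re.IGNORECASE)


def split_subject_action(subject_item: str) -> tuple[str, str]:
    """Split at the first -ing word that is not an appearance adjective."""
    for match in GERUND_WORD_RE.finditer(subject_item):
        if APPEARANCE_ADJ_RE.fullmatch(match.group()):
            continue  # "graying", "balding", ... stay in the subject
        head = subject_item[: match.start()].rstrip(" ,")
        return head, subject_item[match.start():]
    return subject_item, ""  # no action gerund found: leave the subject intact


print(split_subject_action("man with graying hair inspecting documents"))
```

This keeps the scene 3 subject intact while still stripping "inspecting documents". Nouns ending in -ing (for example "building") would still trigger a split under this sketch, so the word list is only a partial safeguard.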

---

## Fix E — Fix 3 scope: add `"point at"` to single-data-point signal set

**Root cause confirmed (scene 5):**
Fix 3 detects single data points to convert line graphs → bar charts.
It scans foreground_props for `"data point"` as the keyword.

The LLM emitted: `"line graph titled '...' with y-axis 0–21% and point at 6%"`

`"point at"` is a valid single-data-point description but does not match
`r"\bdata\s*point\b"`. Fix 3 did not fire, and the line graph survived
for a single-value measurement.

**Fix — in the `_normalize_visual_spec()` Fix 3 block**, extend the
single-data-point detection pattern:

```python
# Original (only matches "data point"):
_SINGLE_POINT_RE = re.compile(r"\bdata\s*point\b", re.IGNORECASE)

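# Replace with (also matches "point at <value>" and "single value").
# Use \d+ rather than \d: with a single \d, the trailing \b fails on
# multi-digit values such as "point at 12%".
_SINGLE_POINT_RE = re.compile(
    r"\b(data\s*points?|point\s+at\s+\d+|single\s+value|"
    r"one\s+(bar|value|figure)|snapshot)\b",
    re.IGNORECASE,
)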
```

If `_SINGLE_POINT_RE` is defined inline inside the function rather than
at module level, move it to module level and rename as shown — it is now
used twice (once for single-point detection, once for the existing guard)
and should not be recompiled per call.
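
Before wiring the pattern in, it may help to run it against the scene 5 prop and a few neighbouring phrasings. In this sketch only the first sample is modelled on the scene 5 output; the other strings are invented for illustration, and the pattern is written with `\d+` so that multi-digit values still satisfy the trailing `\b`.

```python
import re

SINGLE_POINT_RE = re.compile(
    r"\b(?:data\s*points?|point\s+at\s+\d+|single\s+value|"
    r"one\s+(?:bar|value|figure)|snapshot)\b",
    re.IGNORECASE,
)

# Sample prop text mapped to whether Fix 3 should treat it as single-point.
samples = {
    "line graph with y-axis 0-21% and point at 6%": True,  # scene 5 shape
    "line graph with a point at 12%": True,                 # multi-digit value
    "bar chart with one data point": True,
    "line graph of monthly production, 2019 to 2024": False,
}

results = {text: bool(SINGLE_POINT_RE.search(text)) for text in samples}
for text, hit in results.items():
    print(f"{hit!s:5} {text}")
```

Note that `snapshot` on its own is a fairly broad trigger; whether it belongs in the signal set depends on how often the LLM uses the word for multi-point charts.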

---

## Fix F — `_extract_inline_figures`: suppress complement values

**Root cause (scene 5):**
`_extract_inline_figures` extracted `"6%"` from the text (correct), then
computed or accepted a `"94%"` complement figure stored as:
`{"label": "ExxonMobil global production volume Q1 2024", "value": "94%"}`.

This complement is mathematically derived (`100 - 6 = 94`), not a figure
that appears in the scene text. The LLM injected it as a confusing annotation
(`"94% ExxonMobil global production volume Q1 2024"`), which contradicts the
chart's purpose (showing the 6% drop).

**Fix — in `_extract_inline_figures()`** (or wherever `data_context.figures`
is assembled post-extraction), add a complement-suppression step:

```python
# Suppress complement figures: if two figures A and B appear where
# A + B ≈ 100 and only one of them appears verbatim in the scene text,
# remove the one that does NOT appear in the source text.
# This prevents "94% = 100% - 6%" from surviving as an annotation
# when only "6%" was stated in the text.

def _suppress_complement_figures(
    figures: list[dict],
    source_text: str,
) -> list[dict]:
    """Remove figures that are complements of text-stated values
    (i.e., 100 - stated_value) and do not appear verbatim in source_text."""
    # Extract all numeric values that appear verbatim in source_text.
    _verbatim_values: set[str] = set(
        re.findall(r"\b\d+(?:\.\d+)?%", source_text)
    )
    filtered: list[dict] = []
    for fig in figures:
        val_str: str = str(fig.get("value") or "")
        # If the value appears verbatim in the text → always keep.
        if val_str in _verbatim_values:
            filtered.append(fig)
            continue
        # Check whether this value is the complement of a verbatim percentage,
        # i.e. the two sum to 100 within a tolerance of 1 percentage point.
        # Only percentage values qualify; a bare count such as "94" is kept.
        _is_complement = False
        if val_str.endswith("%"):
            try:
                val_num = float(val_str.rstrip("%"))
            except ValueError:
                val_num = None
            if val_num is not None:
                _is_complement = any(
                    abs(val_num + float(vv.rstrip("%")) - 100.0) < 1.0
                    for vv in _verbatim_values
                )
        if _is_complement:
            if DEBUG_RULES:
                log.info(
                    "[visual_prompts][figures] suppressed complement figure: "
                    "%r (not verbatim in text)", val_str,
                )
            continue
        # Not verbatim, not a complement — keep (may be a derived/rounded value).
        filtered.append(fig)
    return filtered
```

Call `_suppress_complement_figures(figures, scene_text)` after the figure
list is assembled, before it is written to `data_context`.
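
As a rough check of the intended behaviour, the sketch below runs a condensed stand-in for the function (`suppress_complements`, without logging) against a figure list shaped like the scene 5 `data_context`. The scene text and the first figure's label are hypothetical; only the 6% and 94% values and the second label come from the log.

```python
import re


def suppress_complements(figures: list[dict], source_text: str) -> list[dict]:
    """Condensed stand-in for _suppress_complement_figures (no logging)."""
    verbatim = set(re.findall(r"\b\d+(?:\.\d+)?%", source_text))
    stated = [float(v.rstrip("%")) for v in verbatim]
    kept = []
    for fig in figures:
        value = str(fig.get("value") or "")
        if value in verbatim:
            kept.append(fig)  # stated in the text: always keep
            continue
        try:
            number = float(value.rstrip("%"))
        except ValueError:
            kept.append(fig)  # non-numeric value: nothing to compare
            continue
        if any(abs(number + s - 100.0) < 1.0 for s in stated):
            continue  # complement of a stated percentage: drop
        kept.append(fig)
    return kept


# Hypothetical scene text and figure list modelled on scene 5.
scene_text = "Global production volume fell 6% in Q1 2024."
figures = [
    {"label": "Production drop Q1 2024", "value": "6%"},
    {"label": "ExxonMobil global production volume Q1 2024", "value": "94%"},
]
filtered = suppress_complements(figures, scene_text)
print([fig["value"] for fig in filtered])
```

Only the figure stated in the text survives, so the 94% annotation never reaches the prompt.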

---

## Expected post-fix state

**Scene 3:**
- `"middle-aged Caucasian man with graying hair in a dark suit, focused expression"` → preserved intact
- No thin subject warning, no orphaned action text

**Scene 5:**
- `"line graph ... with y-axis 0–21% and point at 6%"` → converted to bar chart (Fix E)
- `"94%"` complement figure → suppressed, not written to `data_context.figures` (Fix F)
- Rendered annotations: `6%` and `17%` only, each attached to correct panel


About the author 

Hazel Seo

Hazel is the founder of VidClever, a YouTube content automation SaaS. With a background in social media marketing, paid media buying, and building automation tools, she writes about content growth and digital marketing strategy. By night, you'll find her binge-watching K-dramas or listening to BTS on repeat.


Disclosure: Some links in this article may be affiliate links. We only recommend tools we've tested and believe add genuine value.
