Tag Pipeline Operator Console

Pipeline Overview

Run summary across all sources. The numbers come from phase3_stats.json and canonical_stats.json; everything else in this console is computed live from the JSONL files.

Matching breakdown

Source	Records	In multi-source clusters	Singletons

Denpasoft Enrichment Queue

One row per denpasoft product with a cross-source peer. Green pills show the number of new tag additions proposed. Click any row to see the full delta with per-tag source attribution.

Filters: Only show products with new tags · Only peers with FAKKU

Title	ID	Peer sources	Current	New tags	New pubs	New studios

Denpasoft System-Requirement Enrichment

Per-denpasoft product: the current sysreq fields on the ACF record vs. what JAST's system_requirements attribute reports. Free-text is parsed into consistent structured fields (OS list · RAM MB · CPU GHz · resolution W×H · VRAM MB · storage MB · DirectX). Proposed values take the max/union when peers disagree. Accepted values are written to the new structured ACF fields by the denpasoft-sysreq WP-CLI plugin, which also renders a display shortcode for the product page.
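The parse-then-merge step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual parser: the function names, the regexes, and the choice to cover only two fields (OS list and RAM) are assumptions made for the example. It reads "max/union" as "max for numeric fields, union for list fields".

```python
import re

def parse_os(text):
    """Collect Windows versions mentioned in free text (hypothetical helper)."""
    found = re.findall(r"Windows\s*(\d+(?:\.\d+)?|XP|Vista)", text, re.IGNORECASE)
    # Normalize casing so "windows xp" and "Windows XP" compare equal.
    return {"Windows " + (v.upper() if v.lower() == "xp" else v.capitalize()) for v in found}

def parse_ram_mb(text):
    """Return RAM in MB from phrases like '4 GB RAM', or None if absent."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*(GB|MB)\s*(?:of\s+)?RAM", text, re.IGNORECASE)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2).upper()
    return int(value * 1024) if unit == "GB" else int(value)

def parse_sysreq(text):
    """Turn one free-text requirements string into structured fields."""
    return {"os": parse_os(text), "ram_mb": parse_ram_mb(text)}

def propose(parsed_peers):
    """Merge several parsed records: union for sets, max for numbers."""
    proposal = {}
    for peer in parsed_peers:
        for field, value in peer.items():
            if value is None:
                continue  # this peer says nothing about the field
            if field not in proposal:
                proposal[field] = value
            elif isinstance(value, set):
                proposal[field] = proposal[field] | value
            else:
                proposal[field] = max(proposal[field], value)
    return proposal
```

Taking the max for numeric requirements errs on the side of the stricter peer, which seems the safer default when two sources disagree about a minimum spec.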

Filters: Only products where ≥1 field would change · Decision

Title	ID	Fields changing	Decided	Current → Proposed

Denpasoft Publisher / Developer — pwb-brand → ACF Migration

Consistency audit of every denpasoft product's company data across its channels: pwb-brand (legacy), publisher / developers / circle / artist taxonomies, and ACF metadata_game_fields / metadata_book_fields (target). No peer adds are proposed — this is an intra-denpasoft audit for the brands→ACF migration. Peer data (FAKKU / JAST / VNDB) is surfaced only as a read-only reference when an operator is reviewing.

Filters: Status · Confidence

Title	ID	Type	Migration	Actions	Decided	Peer label ≠

Featured Tags (Customer-Facing Filter Recommendation)

Curated slim lists to surface in the storefront filter UI. Emitted as three separate files: featured_tags.jsonl (themes → seeds the theme-tag taxonomy / "Tags" rail), featured_genres.jsonl (thematic genres → the genre taxonomy / "Genre Tags" rail), and featured_mechanics.jsonl (mechanical genres → the game-genre taxonomy / "Game Genre" rail). Denpasoft already has dedicated sections for Language / Platform / Content-rating, so those metadata tags are filtered out.

Filters: Kind

Themes and genres are ranked independently. Themes are the default view; switch the dropdown to inspect the genre list. The denpasoft-theme-tag WP-CLI seeder should consume the themes file; a future genre seeder should consume the genres file.
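As a rough sketch of the "ranked independently, emitted as separate files" idea, the snippet below groups candidates by kind, ranks each group by score on its own, and returns the JSONL lines per output file. The record fields (kind, slug, score), the kind names, and the limit parameter are assumptions for illustration; only the three file names come from the description above.

```python
import json

# File names are from the console text; the kind keys are hypothetical.
KIND_TO_FILE = {
    "theme": "featured_tags.jsonl",
    "genre": "featured_genres.jsonl",
    "mechanic": "featured_mechanics.jsonl",
}

def emit_featured(candidates, limit=20):
    """Return {filename: [json_line, ...]} with each kind ranked on its own."""
    out = {name: [] for name in KIND_TO_FILE.values()}
    by_kind = {}
    for cand in candidates:
        # Kinds without a target file (language, platform, rating...) are dropped.
        if cand["kind"] in KIND_TO_FILE:
            by_kind.setdefault(cand["kind"], []).append(cand)
    for kind, rows in by_kind.items():
        rows.sort(key=lambda r: r["score"], reverse=True)
        for rank, row in enumerate(rows[:limit], start=1):
            line = json.dumps({"rank": rank, **row}, sort_keys=True)
            out[KIND_TO_FILE[kind]].append(line)
    return out
```

Because ranks restart at 1 inside each kind, a high-scoring genre never pushes a theme down its own list.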

#	Kind	Label	Slug	Titles	Sources	Score	Why

Doujin / Book / CG Enrichment

Per-product enrichment candidates from e-hentai (primary — richer tags, namespaced: parody / character / female / male / other / group) and doujinshi.org (secondary — parody / genre / type / circle). Covers books, games, and DLCs. E-hentai's tag vocabulary maps to the candidate "parody", "character", "theme-tag" taxonomies; accepted values land in the CSV for a future wp denpasoft-doujin apply pass. Decisions persist in localStorage.
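One way the namespace-to-taxonomy mapping could look is sketched below. The "namespace:value" string format and the specific routing of female / male / other into theme-tag are assumptions made for the example; the text above only names the namespaces and the three candidate taxonomies. The group namespace is deliberately left unmapped here because its target is not stated.

```python
# Hypothetical routing table; only the namespace and taxonomy names
# themselves come from the console description.
NAMESPACE_TO_TAXONOMY = {
    "parody": "parody",
    "character": "character",
    "female": "theme-tag",
    "male": "theme-tag",
    "other": "theme-tag",
}

def map_tags(raw_tags):
    """Group 'namespace:value' strings by candidate taxonomy, deduplicated."""
    out = {}
    for raw in raw_tags:
        namespace, sep, value = raw.partition(":")
        if not sep:
            continue  # not namespaced; skip rather than guess
        taxonomy = NAMESPACE_TO_TAXONOMY.get(namespace.strip())
        if taxonomy is None:
            continue  # namespace with no known target (e.g. group)
        value = value.strip()
        bucket = out.setdefault(taxonomy, [])
        if value not in bucket:
            bucket.append(value)
    return out
```

With this routing, a tag that appears under both female and male collapses to a single theme-tag candidate.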

Filters: Type · Match · Decision

Title	ID	Type	Artist(s)	e-hentai	doujinshi.org	Decided

VNDB Backfill — Sekai ↔ VNDB ID Mapping

Sekai titles without a vndb_id can't pull tag data from VNDB. The backfill script in scripts/backfill_sekai_vndb_ids.py auto-matches via normalized title, flags ambiguous/missing cases for review, and skips non-game products (books/soundtracks/demos) since VNDB catalogs visual novels only. Every decision — auto-applied, ambiguous, no-match, skipped — surfaces here for audit.
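The matching flow above can be approximated as follows. This is not the code in scripts/backfill_sekai_vndb_ids.py; the normalization rules, the record shapes, and the function names are assumptions chosen to illustrate the four outcomes (auto-applied, ambiguous, no-match, skipped).

```python
import re
import unicodedata

NON_GAME_TYPES = {"book", "soundtrack", "demo"}  # VNDB catalogs visual novels only

def normalize_title(title):
    """Fold width/case, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def build_index(vndb_records):
    """Map normalized title -> list of VNDB ids sharing that title."""
    index = {}
    for rec in vndb_records:
        index.setdefault(normalize_title(rec["title"]), []).append(rec["id"])
    return index

def match_vndb(product, index):
    """Classify one product into one of the four audit actions."""
    if product["type"] in NON_GAME_TYPES:
        return {"action": "skipped", "candidates": []}
    candidates = index.get(normalize_title(product["name"]), [])
    if len(candidates) == 1:
        return {"action": "auto-applied", "candidates": candidates}
    if candidates:
        return {"action": "ambiguous", "candidates": candidates}
    return {"action": "no-match", "candidates": []}
```

Only a single exact hit on the normalized title is applied automatically; anything else is left for an operator, which keeps the automatic path conservative.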

Filters: Action · Reason · Decision

Action	Sekai name	Denpasoft name	Product	Publisher / Dev	Candidates	Reason	Decision

Feature Blocks — Body-prose → ACF migration

Four-source feature review queue with canonical normalization. The sources are:

Body: 146 denpasoft products embed a <h3>Features</h3> bullet list in their body prose.
Steam: controller support / VR / accessibility only (Steam-only platform features are excluded).
VNDB: length class + voiced class from VN records.
acf_existing: bullets already present in the product's ACF features field (operator-authored or previously applied), pulled in so merge mode can preserve them.

Each item runs through scripts/feature_vocab.py (~40 feature_ids across 11 categories); matched items get a canonical label + norm badge, and unmatched items keep their raw text. Cross-source dedupe collapses "Multiple Endings" from body + "Multiple endings" from VNDB + "Two different endings" from existing ACF into a single canonical multiple_endings item with all sources in provenance.

Two accept modes per row: ✓ real keeps only new real_features (overwrites ACF); ✓ merge keeps real_features + existing ACF bullets (union, preserves operator-authored content).

wp denpasoft-features apply writes accepted items (canonical label preferred) into ACF group group_67e49cf373226.
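The dedupe and the two accept modes can be sketched as below. The vocabulary table is a two-entry stand-in for scripts/feature_vocab.py, whose real interface is not shown here, and the reading of "real_features" as "items with at least one source other than acf_existing" is an assumption made for the example.

```python
# Hypothetical stand-in for the real vocabulary: phrase -> feature_id.
VOCAB = {
    "multiple endings": "multiple_endings",
    "two different endings": "multiple_endings",
}
LABELS = {"multiple_endings": "Multiple Endings"}

def canonicalize(text):
    """Return the feature_id for a bullet, or None when it is unmatched."""
    return VOCAB.get(" ".join(text.casefold().split()))

def dedupe(items):
    """items: list of (source, raw_text). Collapse by feature_id, keep provenance."""
    merged = {}
    for source, raw in items:
        feature_id = canonicalize(raw)
        key = feature_id or "raw:" + raw.strip()  # unmatched items keep raw text
        entry = merged.setdefault(key, {
            "feature_id": feature_id,
            "label": LABELS.get(feature_id, raw.strip()),
            "provenance": [],
        })
        if source not in entry["provenance"]:
            entry["provenance"].append(source)
    return list(merged.values())

def accept(items, mode):
    """'real' drops items seen only in existing ACF; 'merge' keeps the union."""
    if mode == "real":
        kept = [i for i in items if set(i["provenance"]) != {"acf_existing"}]
    elif mode == "merge":
        kept = list(items)
    else:
        raise ValueError(f"unknown accept mode: {mode}")
    return [i["label"] for i in kept]
```

Under this reading, merge mode differs from real mode only by the bullets whose sole source is the existing ACF field, which is exactly the operator-authored content it is meant to preserve.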

Filters: Kinds present · Source · Normalization · Decision

Action	Product	Current	Items	Languages	Counts	Decision

Product Preview — Categorized Tag Display

Simulates how the WordPress plugin wp/denpasoft-tag-display.php would render a product's taxonomy assignments as categorized sections (Featured / Genres / Setting / Heroines / Story / Gameplay / Sexual Content / Content Warnings / Languages). Pure read-only — data comes from denpasoft_taxonomy_snapshot.json (regenerated on every deploy from the last denpasoft crawl) and the schema comes from tag_display_categories.json. No writes to production.

Apply proposed tag additions: overlays the cross-source tag proposals from denpasoft_tag_delta.jsonl on top of the product's current tags. New tags are marked with a green outline.
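A minimal sketch of the categorized rendering with the proposal overlay follows. The schema shape (category name mapped to a list of term slugs) is an assumption; the real layout of tag_display_categories.json and the PHP plugin's logic are not shown here. The "new" flag stands in for the green outline.

```python
def categorize(tags, schema, proposed=()):
    """Split a product's tags into schema sections, flagging proposed additions.

    tags:     slugs currently on the product
    schema:   {category: [slug, ...]} (assumed shape)
    proposed: slugs from the tag delta to overlay
    Returns (sections, uncategorized).
    """
    slug_to_category = {slug: cat for cat, slugs in schema.items() for slug in slugs}
    current = set(tags)
    # Overlay only proposals the product does not already have, without repeats.
    additions = [p for p in dict.fromkeys(proposed) if p not in current]
    sections = {cat: [] for cat in schema}
    uncategorized = []
    for slug in list(tags) + additions:
        entry = {"slug": slug, "new": slug not in current}
        category = slug_to_category.get(slug)
        if category is None:
            uncategorized.append(entry)
        else:
            sections[category].append(entry)
    # Drop empty sections so they are not rendered.
    return {cat: rows for cat, rows in sections.items() if rows}, uncategorized
```

Terms that fall through to the uncategorized list are the same ones a catalog-wide "uncategorized terms" audit would want to surface.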

Current flat display

Every tag/term on the product, flat. This is what the storefront shows today if it lists them all.

Proposed categorized display

What [denpasoft_categorized_tags] would render after deploying wp/denpasoft-tag-display.php.

Proposed additions for this product

Audits for this product

Catalog-wide audits

Zero-tag products by category (DLC vs series vs standalone)

Uncategorized terms (in use on products but not in tag_display_categories.json)

Canonical Entities

Phase 2 produced these identity clusters. Each row is a canonical entity; expand to see the per-source members and union metadata.

Filters: Sources · Has tags

Canonical title	Canonical ID	Sources	Members	Tags	Genres	Sekai

Canonical Tag Catalog

The unified tag vocabulary across all crawled sources. Slug-keyed. Usage per source and namespace preserved for audit.
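A slug-keyed catalog that keeps per-source and per-namespace usage might be built along these lines. The observation tuple layout, the slugify rule, and the "first label seen wins" choice for the display label are assumptions for illustration, not the pipeline's actual behavior.

```python
import re
from collections import defaultdict

def slugify(label):
    """Lowercase and hyphenate a label (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]+", "-", label.casefold()).strip("-")

def build_catalog(observations):
    """observations: iterable of (source, namespace, label, title_id)."""
    catalog = {}
    for source, namespace, label, title_id in observations:
        entry = catalog.setdefault(slugify(label), {
            "label": label,                 # first spelling seen is kept for display
            "titles": set(),                # distinct titles using the tag
            "breakdown": defaultdict(int),  # (source, namespace) -> usage count
        })
        entry["titles"].add(title_id)
        entry["breakdown"][(source, namespace)] += 1
    return catalog
```

Keeping the (source, namespace) pair as the breakdown key means two spellings of one tag share a slug without losing track of where each usage came from.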

Filters: Source

Slug	Display label	Sources	Titles	Breakdown
