Tag Pipeline — Operator Console

Pipeline Overview

Run summary across all sources. Numbers come from phase3_stats.json + canonical_stats.json; everything else in this console is computed live from the JSONL files.

Matching breakdown

Source Records In multi-source clusters Singletons

Denpasoft Enrichment Queue

One row per denpasoft product with a cross-source peer. Green pills show the number of new tags proposed for addition. Click any row to see the full delta with per-tag source attribution.

Title ID Peer sources Current New tags New pubs New studios

Denpasoft System-Requirement Enrichment

For each denpasoft product: the current sysreq fields on the ACF record vs. what JAST's system_requirements attribute reports. Free text is parsed into consistent structured fields (OS list · RAM MB · CPU GHz · resolution W×H · VRAM MB · storage MB · DirectX). When peers disagree, the proposed value is the max or the union of the peer values. Accepted values are written to the new structured ACF fields by the denpasoft-sysreq WP-CLI plugin, which also renders a display shortcode for the product page.
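
The parse-then-merge step described above can be sketched roughly as follows. This is only an illustration, not the pipeline's actual parser: the function names, the dictionary keys (ram_mb, cpu_ghz, resolution, os) and the regular expressions are all assumptions, and only four of the listed fields are handled.

```python
import re

def parse_sysreq(text):
    """Parse a free-text requirements blurb into structured fields.
    Hypothetical sketch: key names and patterns are assumptions."""
    fields = {}
    ram = re.search(r"(\d+(?:\.\d+)?)\s*(GB|MB)\s*(?:of\s+)?RAM", text, re.I)
    if ram:
        value, unit = float(ram.group(1)), ram.group(2).upper()
        fields["ram_mb"] = int(value * 1024) if unit == "GB" else int(value)
    cpu = re.search(r"(\d+(?:\.\d+)?)\s*GHz", text, re.I)
    if cpu:
        fields["cpu_ghz"] = float(cpu.group(1))
    res = re.search(r"(\d{3,5})\s*[x×]\s*(\d{3,5})", text, re.I)
    if res:
        fields["resolution"] = (int(res.group(1)), int(res.group(2)))
    versions = re.findall(r"Windows\s+(XP|Vista|\d+(?:\.\d+)?)", text, re.I)
    if versions:
        fields["os"] = sorted({"Windows " + v for v in versions})
    return fields

def merge_proposals(candidates):
    """Combine parsed field dicts: union for lists, max for everything else.
    Tuples (resolution) compare width first, then height."""
    merged = {}
    for fields in candidates:
        for key, value in fields.items():
            if key not in merged:
                merged[key] = value
            elif isinstance(value, list):
                merged[key] = sorted(set(merged[key]) | set(value))
            else:
                merged[key] = max(merged[key], value)
    return merged
```

For example, merge_proposals([{"ram_mb": 2048, "os": ["Windows 7"]}, {"ram_mb": 4096, "os": ["Windows 10"]}]) keeps the larger RAM figure and lists both operating systems.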

Title ID Fields changing Decided Current → Proposed

Denpasoft Publisher / Developer — pwb-brand → ACF Migration

Consistency audit of every denpasoft product's company data across its channels: pwb-brand (legacy), publisher / developers / circle / artist taxonomies, and ACF metadata_game_fields / metadata_book_fields (target). No peer adds are proposed — this is an intra-denpasoft audit for the brands→ACF migration. Peer data (FAKKU / JAST / VNDB) is surfaced only as a read-only reference when an operator is reviewing.

Title ID Type Migration Actions Decided Peer label ≠

Doujin / Book / CG Enrichment

Per-product enrichment candidates from e-hentai (primary — richer tags, namespaced: parody / character / female / male / other / group) and doujinshi.org (secondary — parody / genre / type / circle). Covers books, games, and DLCs. E-hentai's tag vocabulary maps to the candidate "parody", "character", "theme-tag" taxonomies; accepted values land in the CSV for a future wp denpasoft-doujin apply pass. Decisions persist in localStorage.

Title ID Type Artist(s) e-hentai doujinshi.org Decided

VNDB Backfill — Sekai ↔ VNDB ID Mapping

Sekai titles without a vndb_id can't pull tag data from VNDB. The backfill script in scripts/backfill_sekai_vndb_ids.py auto-matches via normalized title, flags ambiguous/missing cases for review, and skips non-game products (books/soundtracks/demos) since VNDB catalogs visual novels only. Every decision — auto-applied, ambiguous, no-match, skipped — surfaces here for audit.
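
As a rough illustration of the four outcomes listed above, the matching logic might look like the sketch below. It is not the code in scripts/backfill_sekai_vndb_ids.py: the record shapes, the NON_GAME_TYPES set, and the normalization rules (NFKC, casefold, punctuation stripped) are assumptions made for the example.

```python
import re
import unicodedata

# Assumed product types that are skipped because VNDB lists visual novels only.
NON_GAME_TYPES = {"book", "soundtrack", "demo"}

def normalize_title(title):
    """Casefold, strip punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def build_index(vndb_records):
    """Map normalized title -> list of VNDB ids sharing that title."""
    index = {}
    for rec in vndb_records:
        index.setdefault(normalize_title(rec["title"]), []).append(rec["id"])
    return index

def classify(product, index):
    """Return the backfill decision for one Sekai product."""
    if product["type"] in NON_GAME_TYPES:
        return {"action": "skipped", "candidates": []}
    candidates = index.get(normalize_title(product["name"]), [])
    if len(candidates) == 1:
        return {"action": "auto-applied", "candidates": candidates}
    if candidates:
        return {"action": "ambiguous", "candidates": candidates}
    return {"action": "no-match", "candidates": []}
```

Only a single exact match on the normalized title is applied automatically; anything else is left for an operator, which matches the audit intent of this table.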

Action Sekai name Denpasoft name Product Publisher / Dev Candidates Reason Decision

Feature Blocks — Body-prose → ACF migration

Four-source feature review queue with canonical normalization. The sources:

Body: 146 denpasoft products embed a <h3>Features</h3> bullet list in their body prose.
Steam: controller support / VR / accessibility only (Steam-only platform features are excluded).
VNDB: length class + voiced class from VN records.
acf_existing: bullets already present in the product's ACF features field (operator-authored or previously applied), pulled in so merge mode can preserve them.

Each item runs through scripts/feature_vocab.py (~40 feature_ids across 11 categories); matched items get a canonical label + norm badge. Unmatched items keep raw text. Cross-source dedupe collapses "Multiple Endings" from body + "Multiple endings" from VNDB + "Two different endings" from existing ACF into a single canonical multiple_endings item with all sources in provenance.

Two accept modes per row:

✓ real keeps only new real_features (overwrites ACF).
✓ merge keeps real_features + existing ACF bullets (union, preserves operator-authored content).

wp denpasoft-features apply writes accepted items (canonical label preferred) into ACF group group_67e49cf373226.
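
A minimal sketch of the dedupe and the two accept modes, under stated assumptions: the VOCAB table below is a tiny hypothetical stand-in for scripts/feature_vocab.py (whose real contents are not shown here), and the function names and item shapes are invented for illustration.

```python
# Hypothetical stand-in vocabulary: normalized text -> (feature_id, canonical label).
VOCAB = {
    "multiple endings": ("multiple_endings", "Multiple Endings"),
    "two different endings": ("multiple_endings", "Multiple Endings"),
}

def _norm(raw):
    """Casefold and collapse whitespace so casing variants compare equal."""
    return " ".join(raw.casefold().split())

def dedupe(items):
    """items: list of (source, raw_text). Collapse items that share a feature_id
    and record every contributing source; unmatched items keep their raw text."""
    merged = {}
    for source, raw in items:
        feature_id, label = VOCAB.get(_norm(raw), (None, raw))
        key = feature_id or "raw:" + _norm(raw)
        entry = merged.setdefault(
            key, {"feature_id": feature_id, "label": label, "sources": []}
        )
        if source not in entry["sources"]:
            entry["sources"].append(source)
    return list(merged.values())

def accept(real_features, existing_acf, mode):
    """'real' overwrites with real_features; 'merge' unions them with existing bullets."""
    if mode == "real":
        return list(real_features)
    if mode == "merge":
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        return list(dict.fromkeys(list(real_features) + list(existing_acf)))
    raise ValueError(f"unknown accept mode: {mode}")
```

With the three spellings from the example above as input, dedupe returns one multiple_endings entry whose sources are body, vndb and acf_existing.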

Action Product Current Items Languages Counts Decision

Product Preview — Categorized Tag Display

Simulates how the WordPress plugin wp/denpasoft-tag-display.php would render a product's taxonomy assignments as categorized sections (Featured / Genres / Setting / Heroines / Story / Gameplay / Sexual Content / Content Warnings / Languages). Pure read-only — data comes from denpasoft_taxonomy_snapshot.json (regenerated on every deploy from the last denpasoft crawl) and the schema comes from tag_display_categories.json. No writes to production.

Overlays the cross-source tag proposals from denpasoft_tag_delta.jsonl on top of the product's current tags. New tags are marked with a green outline.
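
The grouping and overlay can be pictured with the sketch below. It assumes a schema shaped as a mapping from category name to a list of term slugs; the actual layout of tag_display_categories.json is not shown here, and the function name and the "Uncategorized" bucket are hypothetical.

```python
def categorize(current, proposed, schema):
    """Group terms into schema categories and flag proposed additions.

    current / proposed: lists of term slugs.
    schema: {category: [term, ...]} (assumed shape).
    Returns {category: [(term, is_new), ...]}, omitting empty categories.
    """
    term_to_category = {
        term: category for category, terms in schema.items() for term in terms
    }
    sections = {category: [] for category in schema}
    sections["Uncategorized"] = []
    seen = set()
    tagged = [(t, False) for t in current] + [(t, True) for t in proposed]
    for term, is_new in tagged:
        if term in seen:
            continue  # a proposed tag already on the product is not new
        seen.add(term)
        sections[term_to_category.get(term, "Uncategorized")].append((term, is_new))
    return {category: terms for category, terms in sections.items() if terms}
```

Terms with is_new set to True correspond to the green-outlined tags; terms that fall into the extra bucket are the ones the uncategorized-terms audit would report.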

Current flat display

Every tag/term on the product, flat. This is what the storefront shows today if it lists them all.

Proposed categorized display

What [denpasoft_categorized_tags] would render after deploying wp/denpasoft-tag-display.php.

Proposed additions for this product

Audits for this product

Catalog-wide audits

Zero-tag products by category (DLC vs series vs standalone)
Uncategorized terms (in use on products but not in tag_display_categories.json)

Canonical Entities

Phase 2 produced these identity clusters. Each row is a canonical entity; expand to see the per-source members and union metadata.

Canonical title Canonical ID Sources Members Tags Genres Sekai

Canonical Tag Catalog

The unified tag vocabulary across all crawled sources. Slug-keyed. Usage per source and namespace preserved for audit.
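
One way to picture a slug-keyed catalog with per-source breakdown is the sketch below. The input tuple shape, the slugify rule, and the field names are assumptions for illustration, not the pipeline's actual schema.

```python
import re
from collections import defaultdict

def slugify(label):
    """Lowercase and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def build_catalog(assignments):
    """assignments: iterable of (source, namespace, label, title_id).
    Returns {slug: {"label", "titles", "breakdown"}} where breakdown counts
    assignments per (source, namespace) pair."""
    catalog = {}
    for source, namespace, label, title_id in assignments:
        entry = catalog.setdefault(
            slugify(label),
            {"label": label, "titles": set(), "breakdown": defaultdict(int)},
        )
        entry["titles"].add(title_id)
        entry["breakdown"][(source, namespace)] += 1
    return catalog
```

Keying by slug lets differently formatted labels from different sources land on the same row, while the breakdown keeps the original source and namespace visible for audit.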

Slug Display label Sources Titles Breakdown

Static single-file dashboard. Data is fetched from the same directory as this HTML file. To serve locally, run python -m http.server -d out/live, then open http://localhost:8000/.