Pipeline Overview
Run summary across all sources. Numbers come from phase3_stats.json + canonical_stats.json; everything else in this console is computed live from the JSONL files.
Matching breakdown
| Source | Records | In multi-source clusters | Singletons |
|---|---|---|---|
Denpasoft Enrichment Queue
One row per denpasoft product with a cross-source peer. Green pills show the number of new tags proposed. Click any row to see the full delta with per-tag source attribution.
| Title | ID | Peer sources | Current | New tags | New pubs | New studios |
|---|---|---|---|---|---|---|
Denpasoft System-Requirement Enrichment
Per-denpasoft product: the current sysreq fields on the ACF record vs. what JAST's system_requirements attribute reports. Free-text is parsed into consistent structured fields (OS list · RAM MB · CPU GHz · resolution W×H · VRAM MB · storage MB · DirectX). Proposed values take the max/union when peers disagree. Accepted values are written to the new structured ACF fields by the denpasoft-sysreq WP-CLI plugin, which also renders a display shortcode for the product page.
| Title | ID | Fields changing | Decided | Current → Proposed |
|---|---|---|---|---|
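The parse-then-merge step described above can be sketched roughly as follows. This is a minimal illustration, not the denpasoft-sysreq plugin's code: the regexes, the field names (`ram_mb`, `cpu_ghz`, `resolution`, `os`), and both function names are assumptions, and only four of the listed fields are covered.

```python
import re

def parse_sysreq(text):
    """Parse a free-text requirements blurb into a few structured fields.

    Field names and patterns are hypothetical; real input is messier.
    """
    fields = {}
    # RAM: accept "2 GB RAM" or "512MB RAM" and normalize to megabytes.
    m = re.search(r"(\d+(?:\.\d+)?)\s*(GB|MB)\s*(?:of\s+)?RAM", text, re.I)
    if m:
        value, unit = float(m.group(1)), m.group(2).upper()
        fields["ram_mb"] = int(value * 1024) if unit == "GB" else int(value)
    # CPU clock speed in GHz.
    m = re.search(r"(\d+(?:\.\d+)?)\s*GHz", text, re.I)
    if m:
        fields["cpu_ghz"] = float(m.group(1))
    # Resolution as a (width, height) tuple; accepts "x" or "×".
    m = re.search(r"(\d{3,5})\s*[x×]\s*(\d{3,5})", text, re.I)
    if m:
        fields["resolution"] = (int(m.group(1)), int(m.group(2)))
    # OS list: every Windows version mentioned, whitespace-normalized.
    os_hits = re.findall(r"Windows\s+(?:\d+(?:\.\d+)?|XP|Vista)", text, re.I)
    if os_hits:
        fields["os"] = {" ".join(hit.split()) for hit in os_hits}
    return fields

def merge_sysreq(records):
    """Combine parsed records: union for the OS list, max for everything else."""
    merged = {}
    for record in records:
        for key, value in record.items():
            if key == "os":
                merged[key] = merged.get(key, set()) | set(value)
            elif key in merged:
                # Tuples compare lexicographically, so width wins over height.
                merged[key] = max(merged[key], value)
            else:
                merged[key] = value
    return merged
```

Taking the max for numeric fields errs toward the stricter requirement when peers disagree, which matches the "max/union" rule stated in the description.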
Denpasoft Publisher / Developer — pwb-brand → ACF Migration
Consistency audit of every denpasoft product's company data across its channels: pwb-brand (legacy), publisher / developers / circle / artist taxonomies, and ACF metadata_game_fields / metadata_book_fields (target). No peer adds are proposed — this is an intra-denpasoft audit for the brands→ACF migration. Peer data (FAKKU / JAST / VNDB) is surfaced only as a read-only reference when an operator is reviewing.
| Title | ID | Type | Migration | Actions | Decided | Peer label ≠ |
|---|---|---|---|---|---|---|
Featured Tags (Customer-Facing Filter Recommendation)
Curated slim lists to surface in the storefront filter UI. Emitted as three separate files: featured_tags.jsonl (themes → seeds the theme-tag taxonomy / "Tags" rail), featured_genres.jsonl (thematic genres → the genre taxonomy / "Genre Tags" rail), and featured_mechanics.jsonl (mechanical genres → the game-genre taxonomy / "Game Genre" rail). Denpasoft already has dedicated sections for Language / Platform / Content-rating, so those metadata tags are filtered out.
Themes and genres are ranked independently. Themes are the default view; switch the dropdown to inspect the genre list. The denpasoft-theme-tag WP-CLI seeder should consume the themes file; a future genre-seeder should consume the genres file.
| # | Kind | Label | Slug | Titles | Sources | Score | Why |
|---|---|---|---|---|---|---|---|
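A rough sketch of how ranked rows could be routed into the three featured files while dropping metadata tags. The `kind` values, the row keys, and the function names are assumptions made for illustration (they loosely mirror the table columns); only the three file names come from the description above.

```python
import json

# Hypothetical mapping from a row's "kind" to its output file.
KIND_TO_FILE = {
    "theme": "featured_tags.jsonl",
    "genre": "featured_genres.jsonl",
    "mechanic": "featured_mechanics.jsonl",
}
# Metadata kinds that already have dedicated storefront sections (assumed names).
EXCLUDED_KINDS = {"language", "platform", "content-rating"}

def split_featured(rows):
    """Group rows per output file, each list ranked independently by score."""
    buckets = {filename: [] for filename in KIND_TO_FILE.values()}
    for row in rows:
        if row["kind"] in EXCLUDED_KINDS:
            continue  # filtered out: not a customer-facing filter tag
        filename = KIND_TO_FILE.get(row["kind"])
        if filename is not None:
            buckets[filename].append(row)
    for bucket in buckets.values():
        bucket.sort(key=lambda r: r["score"], reverse=True)
    return buckets

def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in rows)
```

Each bucket is sorted on its own, so a low-scoring genre never competes with a high-scoring theme for a slot.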
Doujin / Book / CG Enrichment
Per-product enrichment candidates from e-hentai (primary — richer tags, namespaced: parody / character / female / male / other / group) and doujinshi.org (secondary — parody / genre / type / circle). Covers books, games, and DLCs. E-hentai's tag vocabulary maps to the candidate "parody", "character", "theme-tag" taxonomies; accepted values land in the CSV for a future wp denpasoft-doujin apply pass. Decisions persist in localStorage.
| Title | ID | Type | Artist(s) | e-hentai | doujinshi.org | Decided |
|---|---|---|---|---|---|---|
VNDB Backfill — Sekai ↔ VNDB ID Mapping
Sekai titles without a vndb_id can't pull tag data from VNDB. The backfill script in scripts/backfill_sekai_vndb_ids.py auto-matches via normalized title, flags ambiguous/missing cases for review, and skips non-game products (books/soundtracks/demos) since VNDB catalogs visual novels only. Every decision — auto-applied, ambiguous, no-match, skipped — surfaces here for audit.
| Action | Sekai name | Denpasoft name | Product | Publisher / Dev | Candidates | Reason | Decision |
|---|---|---|---|---|---|---|---|
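The matching flow described above might look something like the sketch below. It is not the contents of scripts/backfill_sekai_vndb_ids.py: the normalization rules, the `type` and `name` keys, and the function names are all assumptions, and the four outcome labels are taken from the audit categories listed in the description.

```python
import re
import unicodedata

# Product types skipped because VNDB catalogs visual novels only (assumed values).
NON_GAME_TYPES = {"book", "soundtrack", "demo"}

def normalize_title(title):
    """Fold case and width, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", title).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def build_index(vndb_entries):
    """Map normalized title -> list of VNDB ids sharing that title."""
    index = {}
    for vndb_id, title in vndb_entries:
        index.setdefault(normalize_title(title), []).append(vndb_id)
    return index

def classify(product, index):
    """Return (action, candidates) for one Sekai product."""
    if product["type"] in NON_GAME_TYPES:
        return ("skipped", [])
    candidates = index.get(normalize_title(product["name"]), [])
    if len(candidates) == 1:
        return ("auto-applied", candidates)
    if candidates:
        return ("ambiguous", candidates)  # several hits: needs operator review
    return ("no-match", [])
```

Only an exact, unique hit on the normalized title is applied automatically; anything else is left for a human decision.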
Feature Blocks — Body-prose → ACF migration
Four-source feature review queue with canonical normalization.
Body: 146 denpasoft products embed a <h3>Features</h3> bullet list in their body prose.
Steam: controller support / VR / accessibility only (Steam-only platform features are excluded).
VNDB: length class + voiced class from VN records.
acf_existing: bullets already present in the product's ACF features field (operator-authored or previously applied), pulled in so merge mode can preserve them.
Each item runs through scripts/feature_vocab.py (~40 feature_ids across 11 categories); matched items get a canonical label + norm badge. Cross-source dedupe collapses "Multiple Endings" from body + "Multiple endings" from VNDB + "Two different endings" from existing ACF into a single canonical multiple_endings item with all sources in provenance.
Two accept modes per row: ✓ real keeps only new real_features (overwrites ACF), ✓ merge keeps real_features + existing ACF bullets (union, preserves operator-authored content). Unmatched items keep raw text.
wp denpasoft-features apply writes accepted items (canonical label preferred) into ACF group group_67e49cf373226.
| Action | Product | Current | Items | Languages | Counts | Decision |
|---|---|---|---|---|---|---|
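The cross-source dedupe can be illustrated with a small sketch. The `VOCAB` table here is a hypothetical three-entry stand-in for scripts/feature_vocab.py (which is described as holding about 40 feature_ids), and the exact-phrase lookup is an assumption; the real matcher may well be fuzzier.

```python
# Hypothetical stand-in vocabulary: normalized phrase -> (feature_id, canonical label).
VOCAB = {
    "multiple endings": ("multiple_endings", "Multiple Endings"),
    "two different endings": ("multiple_endings", "Multiple Endings"),
    "full voice acting": ("fully_voiced", "Fully Voiced"),
}

def canonicalize(items):
    """Collapse (source, raw_text) pairs into one entry per canonical feature.

    Matched items share a feature_id and accumulate provenance;
    unmatched items keep their raw text and are flagged as not normalized.
    """
    merged = {}
    for source, raw in items:
        text = raw.strip()
        match = VOCAB.get(text.casefold())
        if match is not None:
            key, label = match
        else:
            key, label = "raw:" + text.casefold(), text
        entry = merged.setdefault(
            key, {"label": label, "sources": [], "normalized": match is not None}
        )
        if source not in entry["sources"]:
            entry["sources"].append(source)
    return list(merged.values())
```

With the three "endings" variants from the description as input, this yields a single entry whose `sources` list records body, vndb, and acf_existing in the order they were seen.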
Product Preview — Categorized Tag Display
Simulates how the WordPress plugin wp/denpasoft-tag-display.php would render a product's taxonomy assignments as categorized sections (Featured / Genres / Setting / Heroines / Story / Gameplay / Sexual Content / Content Warnings / Languages). Pure read-only — data comes from denpasoft_taxonomy_snapshot.json (regenerated on every deploy from the last denpasoft crawl) and the schema comes from tag_display_categories.json. No writes to production.
Overlays the proposed additions from denpasoft_tag_delta.jsonl on top of the product's current tags. New tags are marked with a green outline.
Current flat display
Proposed categorized display
What the [denpasoft_categorized_tags] shortcode would render after deploying wp/denpasoft-tag-display.php.
Proposed additions for this product
Audits for this product
Catalog-wide audits
Zero-tag products by category (DLC vs series vs standalone)
Uncategorized terms (in use on products but not in tag_display_categories.json)
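The categorized preview and the uncategorized-terms audit amount to a lookup against the display schema. The sketch below assumes tag_display_categories.json maps each category name to a list of term slugs; that shape, like the function name and the `new` flag, is an assumption rather than the actual file format.

```python
def categorize(product_terms, schema, proposed=()):
    """Group a product's terms into display sections.

    schema: dict of category name -> list of term slugs (assumed shape).
    proposed: extra terms to overlay; those not already present are flagged new.
    Returns (sections, uncategorized).
    """
    term_to_category = {
        term: category for category, terms in schema.items() for term in terms
    }
    sections = {category: [] for category in schema}
    uncategorized = []
    current = set(product_terms)
    overlay = [term for term in proposed if term not in current]
    for term in list(product_terms) + overlay:
        entry = {"term": term, "new": term not in current}
        category = term_to_category.get(term)
        if category is None:
            uncategorized.append(entry)  # in use, but missing from the schema
        else:
            sections[category].append(entry)
    return sections, uncategorized
```

The function only reads its inputs and returns new structures, in keeping with the read-only nature of the preview.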
Canonical Entities
Phase 2 produced these identity clusters. Each row is a canonical entity; expand to see the per-source members and union metadata.
| Canonical title | Canonical ID | Sources | Members | Tags | Genres | Sekai |
|---|---|---|---|---|---|---|
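As a sketch of how union metadata and a per-source matching breakdown could be derived from these clusters: the cluster shape (`members`, each with `source` and `tags`) is assumed, and "singleton" is taken here to mean a record whose cluster contains only one source, which may differ from the pipeline's own definition.

```python
from collections import Counter

def matching_breakdown(clusters):
    """Per-source counts: total records, records in multi-source clusters, singletons."""
    stats = {}
    for cluster in clusters:
        sources = {member["source"] for member in cluster["members"]}
        column = "multi" if len(sources) > 1 else "singleton"
        for member in cluster["members"]:
            row = stats.setdefault(member["source"], Counter())
            row["records"] += 1
            row[column] += 1
    return stats

def union_tags(cluster):
    """Union of every member's tags, sorted for stable display."""
    return sorted({tag for m in cluster["members"] for tag in m.get("tags", [])})
```

Because `Counter` returns 0 for missing keys, a source with no singletons can be read as `stats[source]["singleton"]` without extra checks.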