From Manual Enrichment to Market-Ready Content

Derek Gregg
Product Management

A mission-critical process that couldn't scale

As customer expectations have shifted toward self-service, speed, and accuracy, our customer has invested in digital infrastructure and experiences. But they found themselves constrained by a problem that nearly every large technical distributor eventually hits: product data had become a bottleneck for growth.
They invested in improving product data through a variety of strategies: supplier engagement, data syndication, data brokers, software, and more. They paired these strategies with a content team that is experienced, thoughtful, and deeply familiar with their categories, but progress remained slow.
"A large portion of content comes from manual effort"
— Senior Product Analyst
The costs were concrete: catalog enrichment relied on spreadsheets, manual labor, and outsourced web scraping.
Poor product content didn't just affect e-commerce; it also limited their ability to support counter sales and showroom teams, inside sales, quoting, and more.
"The ecommerce system is downstream from our PIM, but the PIM also feeds other systems. All of that depends on enriched data. Product data has really become the lifeblood of the company."
— IT Consultant
Product content is a workflow problem, not a one-time data project
As they evaluated their options, one realization became unavoidable: product content was not a data acquisition problem. It was a human workflow problem.
Each product required interpretation. Teammates had to research the product, determine which attributes mattered, normalize values, and resolve ambiguity. Repeating that work across hundreds of thousands of SKUs, suppliers, and product categories made the process deeply time-consuming and difficult to scale through manual effort alone.
The core challenge was scale. They needed to dramatically increase the amount and quality of product content they produced, without simply hiring a much larger team or forcing suppliers to conform to yet another rigid data standard.
"A lot of this has been done by hand with best effort. We need something that can understand which attributes matter for which products, and do that at scale."
— IT Consultant
To make this workflow viable long-term, they identified a set of concrete requirements:
Increase output 2-3x without adding headcount: The existing enrichment team was already operating at capacity. Any approach that scaled linearly with people was not viable.
Normalize attributes across heterogeneous categories: The system needed to handle category-specific complexity without requiring custom configuration or manual setup for each new product type.
Operate effectively on poor, partial, and unstructured inputs: Many suppliers lacked clean feeds, consistent schemas, or modern digital catalogs. The system needed to work directly with PDFs, fragmented web content, and incomplete internal records, without waiting for supplier compliance.
Produce high-confidence structured data usable across the business: Enriched data needed to support not just e-commerce, but search and filtering, pricing logic, internal sales and showroom tools, and emerging automation initiatives such as email-to-order workflows.
Taken together, these requirements ruled out incremental fixes. Hiring more analysts, relying more heavily on suppliers, or purchasing another static data source would not change the underlying constraint.
What they needed was a fundamentally different approach to product content, one that could scale human judgment without scaling human effort.
Product truth lives across many sources
The customer brought deep domain knowledge about how their products were documented in the real world. Kaavio brought tooling to work with that reality at scale.
Together, they accepted a critical constraint: for many technical products, the most accurate source of truth is still a PDF spec sheet, not a clean data feed.
The workflow intentionally combined:
Supplier-provided structured data (when available)
Manufacturer websites
PDF catalogs and technical documentation
The customer's existing internal product records
Rather than declaring a single "golden source," the system evaluated all sources together, allowing facts to emerge through synthesis.
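To make the idea of synthesis concrete, here is a minimal sketch of one way several sources could be reconciled for a single product. It is an illustration only, not a description of Kaavio's system: the function name `synthesize`, the source labels, the simple majority vote, and the `agreement` score are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def synthesize(records):
    """Combine (source, attribute, value) observations for one product.

    Hypothetical approach: every source gets one vote per attribute, and
    the most common normalized value wins. No source is treated as golden.
    """
    votes = defaultdict(Counter)
    for source, attribute, value in records:
        if value is None:
            continue  # this source had nothing to say about the attribute
        normalized = str(value).strip().lower()
        if not normalized:
            continue  # skip blank values
        votes[attribute][normalized] += 1

    result = {}
    for attribute, counter in votes.items():
        value, count = counter.most_common(1)[0]
        total = sum(counter.values())
        # agreement = share of non-empty observations that match the winner
        result[attribute] = {"value": value, "agreement": count / total}
    return result


# Made-up observations for a single product from four kinds of sources.
observations = [
    ("supplier_feed", "voltage", "120 V"),
    ("pdf_spec", "voltage", "120 v"),
    ("website", "voltage", "110 V"),
    ("internal_record", "material", "Brass"),
    ("pdf_spec", "material", None),
]
facts = synthesize(observations)
```

In this sketch a low agreement score would be a natural signal to route the attribute to a human reviewer rather than publish it automatically.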
Use the customer's existing catalog as the learning backbone
A key decision was to start from the customer's catalog as it existed, rather than asking the team to define rules.
With access to the full catalog, Kaavio's system could:
Detect which attributes were already populated for different product categories
Learn patterns from the customer's prior enrichment work
Reinforce consistency based on what had historically worked inside the customer organization
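The first of these capabilities can be sketched in a few lines. This is a hypothetical illustration, not Kaavio's implementation: the catalog layout (a list of dicts with `category` and `attributes` keys), the function names, and the 50% threshold are all assumed for the example.

```python
from collections import defaultdict


def attribute_fill_rates(catalog):
    """Return, per category, the fraction of products with each attribute populated."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for product in catalog:
        category = product["category"]
        totals[category] += 1
        for name, value in product.get("attributes", {}).items():
            if value not in (None, ""):  # count only populated values
                counts[category][name] += 1
    return {
        category: {name: n / totals[category] for name, n in counts[category].items()}
        for category in totals
    }


def expected_attributes(catalog, threshold=0.5):
    """Attributes populated on at least `threshold` of a category's products."""
    rates = attribute_fill_rates(catalog)
    return {
        category: sorted(name for name, rate in attrs.items() if rate >= threshold)
        for category, attrs in rates.items()
    }


# Made-up catalog records for illustration.
sample_catalog = [
    {"category": "valves", "attributes": {"size": "1/2 in", "material": "brass"}},
    {"category": "valves", "attributes": {"size": "3/4 in", "material": ""}},
    {"category": "valves", "attributes": {"size": "1 in"}},
    {"category": "lighting", "attributes": {"wattage": "9 W"}},
]
expected = expected_attributes(sample_catalog)
```

Attributes that clear the threshold in a category are the ones prior enrichment work has consistently filled in, so they are reasonable candidates to target first for new products in that category.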
"With the full catalog, we can use the customer's own data, which massively improves consistency and data quality."
— Stephen Perkins, Lead Engineer at Kaavio
This turned the team's past enrichment work into an asset.
As they plan for future growth, product data is no longer viewed as the primary constraint. It has become a strategic asset, supported by a workflow that can evolve over time.


