Building Data Combiners: Merging Multiple APIs Into One
Applications that aggregate listings—products, jobs, events, social posts—from multiple providers are essentially data combiners. They call several APIs, merge the results, and present a single view. Doing that in a reliable, maintainable way is harder than it looks. This post covers what combiners are, why merging APIs is difficult, how schema conflicts show up, and how deterministic ranking and normalization APIs help you ship a consistent experience.
What is a data combiner?
A data combiner is a system that takes input from multiple data sources (APIs, feeds, uploads), maps them to a common schema, and produces one unified output—often with ordering, filtering, or ranking applied. Examples:
- Price comparison: same product from Amazon, Walmart, eBay → one list with “cheapest first.”
- Job aggregator: jobs from LinkedIn, Indeed, company career pages → one searchable feed.
- Event discovery: events from Eventbrite, Meetup, Ticketmaster → one calendar or list.
- Social dashboard: posts from Twitter, LinkedIn, Facebook → one timeline with a shared format.
The combiner doesn’t replace the underlying APIs; it sits on top of them and gives you a single interface and a single schema.
Why merging APIs is hard
Each provider has its own:
- Schema — Different field names, nesting, and types.
- Pagination and rate limits — Different page sizes and throttling.
- Identifiers — No shared ID across sources; you have to match or merge by title, URL, or other heuristics.
- Semantics — “Available,” “in stock,” “ships in 24h” may mean different things.
If you merge raw responses, your application code has to handle every variant. That leads to branching, bugs when a provider changes format, and slow onboarding of new sources. A better approach is to normalize first, then merge and rank on the canonical model.
Schema conflicts
When you merge two payloads that represent the same kind of entity (e.g. one product from two retailers), you run into schema conflicts:
- Same concept, different keys — e.g. `price` vs `unit_price` vs `amount`.
- Same key, different structure — e.g. price as a number vs `{ value, currency }`.
- Missing or optional fields — one API always has `brand`, another doesn’t.
Resolving these in ad‑hoc code is error‑prone. Normalization APIs are built to map each source’s schema into one canonical form, so your combiner only ever sees one shape. Conflicts are resolved in the normalizer (e.g. how to represent “no brand” or “price unknown”) instead of scattered across your app.
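A minimal sketch of that idea: one small normalizer per source, each mapping its provider's shape into a single canonical dict. The source names, field mappings, and the USD default are hypothetical; a real normalization API would own these rules.

```python
# Per-source normalizers resolving the schema conflicts above.
# Downstream merge/rank code only ever sees CANONICAL_KEYS.

CANONICAL_KEYS = ("id", "title", "price", "currency", "brand")

def normalize_source_a(item: dict) -> dict:
    # Source A (hypothetical): flat payload, price is a bare number,
    # currency is implied, brand is optional.
    return {
        "id": f"a:{item['sku']}",
        "title": item["name"],
        "price": float(item["price"]),
        "currency": "USD",            # assumption: source A is USD-only
        "brand": item.get("brand"),   # "no brand" resolved here, once
    }

def normalize_source_b(item: dict) -> dict:
    # Source B (hypothetical): nested price object { value, currency }.
    return {
        "id": f"b:{item['id']}",
        "title": item["title"],
        "price": float(item["unit_price"]["value"]),
        "currency": item["unit_price"]["currency"],
        "brand": item["brand"],
    }

a = normalize_source_a({"sku": "123", "name": "Widget", "price": 9.99})
b = normalize_source_b({"id": "777", "title": "Widget",
                        "unit_price": {"value": 8.5, "currency": "USD"},
                        "brand": "Acme"})
merged = [a, b]  # one shape, regardless of origin
```

Note how the "same key, different structure" conflict (bare number vs `{ value, currency }`) disappears before the combiner ever sees the data.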
Deterministic ranking
After normalization, you often want to rank or sort merged results: by price, by date, by rating, or by “best match.” To avoid flaky UX and hard-to-reproduce bugs, ranking should be deterministic: same inputs and rules always produce the same order. That means:
- Clear comparison rules (e.g. numeric price first, then tie-break on ID).
- Stable handling of nulls and missing values.
- No reliance on request order or unstable sort keys.
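The three rules above can be encoded in a single sort key. This is a sketch over canonical items with hypothetical IDs: known prices sort ascending, missing prices sort last, and the canonical ID breaks ties so the final order never depends on which provider responded first.

```python
def sort_key(item: dict):
    # Deterministic ordering: (missing-price flag, price, ID tie-break).
    # Same inputs always produce the same order.
    price = item.get("price")
    return (price is None, price if price is not None else 0.0, item["id"])

items = [
    {"id": "b:777", "price": 8.5},
    {"id": "a:123", "price": None},   # price unknown -> sorts last
    {"id": "c:001", "price": 8.5},    # ties with b:777 -> ID decides
]
ranked = sorted(items, key=sort_key)
# -> b:777, c:001, a:123
```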
Some of our normalization APIs (e.g. Retail Data Normalization) return comparison metadata (cheapest retailer, price range, rankings) so your combiner can sort and display results consistently.
APIs that power data combiners
These APIs in the catalog are built for combiner-style use cases: you send one or more source payloads and get back normalized (and sometimes compared) data.
Normalization APIs for combiners
- Retail Data Normalization — Multi-retailer product payloads into one merged product list with comparison (e.g. cheapest, best reviewed).
- Job Posting Normalization — Job listings from multiple HR and job-board sources into a single schema.
- Event Listing Normalization — Events from multiple ticketing and event platforms into one canonical event format.
- Calendar Event Normalization — Calendar events across providers for unified scheduling and display.
- Social Media Data Normalization — Social content from multiple platforms into a consistent structure for feeds and dashboards.
All are stateless and accept user-provided JSON; they don’t fetch from retailers or platforms themselves. You can run them in your pipeline after fetching from each API, then merge and rank the normalized output in your application.
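Put together, a combiner pipeline is: fetch from each provider, normalize each payload, then merge and rank in your application. A rough sketch, with stubbed fetchers standing in for real provider calls followed by a POST to a normalization API:

```python
# Hypothetical combiner pipeline. fetch_source_a/b stand in for
# "call provider, then normalize the payload"; here they return
# already-canonical items directly.

def fetch_source_a():
    return [{"id": "a:1", "title": "Widget", "price": 9.99}]

def fetch_source_b():
    return [{"id": "b:1", "title": "Widget", "price": 8.50}]

def combine(*fetchers):
    # Merge normalized output from every source, then rank
    # deterministically: cheapest first, unknown prices last, ID tie-break.
    merged = [item for fetch in fetchers for item in fetch()]
    return sorted(merged, key=lambda i: (i.get("price") is None,
                                         i.get("price") or 0.0,
                                         i["id"]))

feed = combine(fetch_source_a, fetch_source_b)
# cheapest first: b:1 (8.50), then a:1 (9.99)
```

Because normalization happens before `combine`, adding a new source means writing (or configuring) one more normalizer and passing one more fetcher; the merge and ranking logic stays untouched.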
For more on the idea of a single canonical model, see What Is Data Normalization in APIs?. For validating payloads before or after normalization, see API Payload Validation Best Practices.