Unomiq computes costs for your application by matching billing data from your cloud provider with traces and spans collected from your services. This gives you a detailed, per-request breakdown of what each operation actually costs.

The Overall Process

Cost computation happens in two stages:
  1. Trace collection and resource identification — As your application runs, Unomiq collects OpenTelemetry traces and identifies the cloud resources involved in each span (e.g., which database job ran, which API service handled a request, which LLM model was called).
  2. Billing matching — Unomiq takes the identified resources and matches them against your cloud billing data to assign a dollar cost to each span.

Supported Service Types

Costs are computed for three types of services, each with a different matching approach.

Database Jobs

For database jobs with unique identifiers (e.g., BigQuery jobs on GCP), Unomiq matches the job ID recorded in the trace directly to the corresponding line item in your billing data. This is a one-to-one match: the full cost of the billing line item is attributed to the span that triggered it.
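As a rough sketch, the one-to-one match can be thought of as a lookup from job ID to billing line item. The field names below (`span_id`, `job_id`, `cost_usd`) are illustrative, not Unomiq's actual schema:

```python
# Illustrative sketch (not Unomiq's actual implementation): attribute each
# billing line item's full cost to the span whose job ID it references.

def match_job_costs(spans, billing_items):
    # Index billing line items by the job ID they reference.
    items_by_job = {item["job_id"]: item for item in billing_items}
    costs = {}
    for span in spans:
        item = items_by_job.get(span["job_id"])
        if item is not None:
            # One-to-one match: the full line-item cost goes to this span.
            costs[span["span_id"]] = item["cost_usd"]
    return costs

spans = [{"span_id": "s1", "job_id": "bq_job_123"}]
billing = [{"job_id": "bq_job_123", "cost_usd": 0.42}]
print(match_job_costs(spans, billing))  # {'s1': 0.42}
```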

API Requests (e.g., Cloud Run)

API services like Cloud Run handle many requests concurrently on the same underlying resource. A single billing line item may cover dozens or hundreds of requests that ran during the same time window. To handle this, Unomiq uses proportional cost distribution based on how much time each request overlapped with the billing window:
  1. Identify overlapping requests — For each billing line item, Unomiq finds all API request spans that were active during that billing window (i.e., the request started before the billing window ended and finished after the billing window started).
  2. Measure overlap duration — For each request, the actual overlap time is calculated. If a request started before the billing window or ended after it, only the portion within the window counts.
  3. Distribute costs proportionally — Each request receives a share of the billing cost proportional to its overlap time relative to the total overlap of all requests in that window.
For example, suppose a billing window covers 1 hour with a total cost of $10. During that window:
  • Request A ran for 2 seconds
  • Request B ran for 3 seconds
Total overlap = 5 seconds. Costs are distributed as:

  Request     Overlap   Cost
  Request A   2 s       $10 × (2/5) = $4.00
  Request B   3 s       $10 × (3/5) = $6.00
Usage metrics (e.g., CPU seconds, memory usage) are distributed using the same proportional method.
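The three steps above can be sketched in a few lines. This is a simplified illustration, not Unomiq's actual implementation; timestamps are plain seconds and the request/field names are assumptions:

```python
# Illustrative sketch of proportional cost distribution across requests
# that overlap a billing window.

def distribute_cost(window_start, window_end, window_cost, requests):
    """Split window_cost across requests in proportion to each
    request's overlap with the billing window."""
    overlaps = {}
    for req in requests:
        # Clamp the request to the window, then measure the overlap;
        # only the portion inside the window counts.
        overlap = min(req["end"], window_end) - max(req["start"], window_start)
        if overlap > 0:
            overlaps[req["id"]] = overlap
    total = sum(overlaps.values())
    # Each request's share is its overlap relative to the total overlap.
    return {rid: window_cost * ov / total for rid, ov in overlaps.items()}

# The worked example above: a 1-hour, $10 window with 2 s and 3 s requests.
shares = distribute_cost(0, 3600, 10.0,
                         [{"id": "A", "start": 100, "end": 102},
                          {"id": "B", "start": 200, "end": 203}])
print(shares)  # {'A': 4.0, 'B': 6.0}
```

Requests that fall entirely outside the window contribute zero overlap and receive no share of the cost.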

LLM Calls (e.g., AI Model Inference)

For LLM calls, costs are computed based on token usage and the model’s published pricing. The cost formula accounts for three components:
  Component                     Rate
  Input tokens (non-cached)     Model’s prompt rate
  Cached input tokens           Model’s cache-read rate (lower than fresh input)
  Output and reasoning tokens   Model’s completion rate
The total cost for an LLM call is:
cost = (fresh_input_tokens × prompt_price)
     + (cached_input_tokens × cache_read_price)
     + (output_tokens + reasoning_tokens) × completion_price
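A direct translation of this formula into code might look like the following. The per-token prices used in the demo are made-up numbers, not any model's published rates:

```python
# Direct translation of the three-component cost formula above.

def llm_call_cost(fresh_input_tokens, cached_input_tokens,
                  output_tokens, reasoning_tokens,
                  prompt_price, cache_read_price, completion_price):
    return (fresh_input_tokens * prompt_price
            + cached_input_tokens * cache_read_price
            + (output_tokens + reasoning_tokens) * completion_price)

# Illustrative prices in USD per token (made-up numbers).
cost = llm_call_cost(1000, 4000, 500, 200,
                     prompt_price=3e-6, cache_read_price=0.75e-6,
                     completion_price=15e-6)
print(f"${cost:.4f}")  # $0.0165
```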
Model pricing is maintained in a pricing catalog that is kept up to date as providers publish new rates.

Currency and Units

  • All costs are normalized to US dollars (USD), regardless of the original billing currency.
  • Usage metrics retain their original units (e.g., tokens, byte-seconds, request count) so you can see both the cost and the underlying consumption.

Incremental Processing

Cost computation runs incrementally. Each run picks up only the spans that have arrived since the last run, so there is no reprocessing of data you’ve already seen. If processing is delayed for any reason, the system automatically backfills all unprocessed spans on the next run.
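One common way to implement this pattern is a high-water mark over span arrival times; the sketch below assumes that mechanism (Unomiq's internals may differ) and uses hypothetical helper names:

```python
# Minimal watermark-based sketch of incremental processing: each run
# handles only spans that arrived after the previous run's watermark.

def process_incrementally(fetch_spans_since, compute_costs, state):
    watermark = state.get("watermark", 0)
    spans = fetch_spans_since(watermark)
    if not spans:
        return []
    results = compute_costs(spans)
    # Advance the watermark only after successful processing, so a failed
    # or delayed run leaves its spans to be backfilled on the next run.
    state["watermark"] = max(s["arrived_at"] for s in spans)
    return results

# Demo with an in-memory span store (hypothetical shape).
SPANS = [{"span_id": "a", "arrived_at": 10},
         {"span_id": "b", "arrived_at": 20}]

def fetch_since(ts):
    return [s for s in SPANS if s["arrived_at"] > ts]

state = {}
first = process_incrementally(fetch_since, lambda spans: spans, state)
second = process_incrementally(fetch_since, lambda spans: spans, state)
print(len(first), len(second))  # 2 0
```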