Unomiq computes costs for your application by matching billing data from your cloud provider with traces and spans collected from your services. This gives you a detailed, per-request breakdown of what each operation actually costs.

The Overall Process

Cost computation happens in two stages:
  1. Trace collection and resource identification — As your application runs, Unomiq collects OpenTelemetry traces and identifies the cloud resources involved in each span (e.g., which database job ran, which API service handled a request, which LLM model was called).
  2. Billing matching — Unomiq takes the identified resources and matches them against your cloud billing data to assign a dollar cost to each span.

Supported Service Types

Costs are computed for three types of services, each with a different matching approach.

Database Jobs

For database jobs with unique identifiers (e.g., BigQuery jobs on GCP), Unomiq matches the job ID recorded in the trace directly to the corresponding line item in your billing data. This is a one-to-one match: the full cost of the billing line item is attributed to the span that triggered it.
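As a rough sketch, the one-to-one match can be thought of as a lookup from job ID to billing line item. The field names below (`span_id`, `job_id`, `cost_usd`) are illustrative, not Unomiq's actual schema:

```python
# Illustrative sketch (not Unomiq's actual implementation): attribute each
# billing line item's full cost to the span whose job ID it references.

def match_job_costs(spans, billing_items):
    # Index billing line items by the job ID they reference.
    items_by_job = {item["job_id"]: item for item in billing_items}
    costs = {}
    for span in spans:
        item = items_by_job.get(span["job_id"])
        if item is not None:
            # One-to-one match: the full line-item cost goes to this span.
            costs[span["span_id"]] = item["cost_usd"]
    return costs

spans = [{"span_id": "s1", "job_id": "bq_job_123"}]
billing = [{"job_id": "bq_job_123", "cost_usd": 0.42}]
print(match_job_costs(spans, billing))  # {'s1': 0.42}
```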

API Requests (e.g., Cloud Run)

API services like Cloud Run handle many requests concurrently on the same underlying resource. A single billing line item may cover dozens or hundreds of requests that ran during the same time window. To handle this, Unomiq uses proportional cost distribution based on how much time each request overlapped with the billing window:
  1. Identify overlapping requests — For each billing line item, Unomiq finds all API request spans that were active during that billing window (i.e., the request started before the billing window ended and finished after the billing window started).
  2. Measure overlap duration — For each request, the actual overlap time is calculated. If a request started before the billing window or ended after it, only the portion within the window counts.
  3. Distribute costs proportionally — Each request receives a share of the billing cost proportional to its overlap time relative to the total overlap of all requests in that window.
For example, suppose a billing window covers 1 hour with a total cost of $10. During that window:
  • Request A ran for 2 seconds
  • Request B ran for 3 seconds
Total overlap = 5 seconds. Costs are distributed as:

  Request     Overlap   Cost
  Request A   2 s       $10 × (2/5) = $4.00
  Request B   3 s       $10 × (3/5) = $6.00
Usage metrics (e.g., CPU seconds, memory usage) are distributed using the same proportional method.
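The three steps above can be sketched in a few lines. This is a simplified illustration, not Unomiq's actual implementation; timestamps are plain seconds and the request/field names are assumptions:

```python
# Illustrative sketch of proportional cost distribution across requests
# that overlap a billing window.

def distribute_cost(window_start, window_end, window_cost, requests):
    """Split window_cost across requests in proportion to each
    request's overlap with the billing window."""
    overlaps = {}
    for req in requests:
        # Clamp the request to the window, then measure the overlap;
        # only the portion inside the window counts.
        overlap = min(req["end"], window_end) - max(req["start"], window_start)
        if overlap > 0:
            overlaps[req["id"]] = overlap
    total = sum(overlaps.values())
    # Each request's share is its overlap relative to the total overlap.
    return {rid: window_cost * ov / total for rid, ov in overlaps.items()}

# The worked example above: a 1-hour, $10 window with 2 s and 3 s requests.
shares = distribute_cost(0, 3600, 10.0,
                         [{"id": "A", "start": 100, "end": 102},
                          {"id": "B", "start": 200, "end": 203}])
print(shares)  # {'A': 4.0, 'B': 6.0}
```

Requests that fall entirely outside the window contribute zero overlap and receive no share of the cost.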

LLM Calls (e.g., AI Model Inference)

For LLM calls, costs are computed based on token usage and the model’s published pricing. The cost formula accounts for three components:
  Component                     Rate
  Input tokens (non-cached)     Model’s prompt rate
  Cached input tokens           Model’s cache-read rate (lower than fresh input)
  Output and reasoning tokens   Model’s completion rate
The total cost for an LLM call is:
cost = (fresh_input_tokens × prompt_price)
     + (cached_input_tokens × cache_read_price)
     + (output_tokens + reasoning_tokens) × completion_price
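A direct translation of this formula into code might look like the following. The per-token prices used in the demo are made-up numbers, not any model's published rates:

```python
# Direct translation of the three-component cost formula above.

def llm_call_cost(fresh_input_tokens, cached_input_tokens,
                  output_tokens, reasoning_tokens,
                  prompt_price, cache_read_price, completion_price):
    return (fresh_input_tokens * prompt_price
            + cached_input_tokens * cache_read_price
            + (output_tokens + reasoning_tokens) * completion_price)

# Illustrative prices in USD per token (made-up numbers).
cost = llm_call_cost(1000, 4000, 500, 200,
                     prompt_price=3e-6, cache_read_price=0.75e-6,
                     completion_price=15e-6)
print(f"${cost:.4f}")  # $0.0165
```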
Model pricing is maintained in a pricing catalog that is kept up to date as providers publish new rates.

Currency and Units

  • All costs are normalized to US dollars (USD), regardless of the original billing currency.
  • Usage metrics retain their original units (e.g., tokens, byte-seconds, request count) so you can see both the cost and the underlying consumption.

Incremental Processing

Cost computation runs incrementally. Each run picks up only the spans that have arrived since the last run, so there is no reprocessing of data you’ve already seen. If processing is delayed for any reason, the system automatically backfills all unprocessed spans on the next run.
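One common way to implement this pattern is a high-water mark over span arrival times; the sketch below assumes that mechanism (Unomiq's internals may differ) and uses hypothetical helper names:

```python
# Minimal watermark-based sketch of incremental processing: each run
# handles only spans that arrived after the previous run's watermark.

def process_incrementally(fetch_spans_since, compute_costs, state):
    watermark = state.get("watermark", 0)
    spans = fetch_spans_since(watermark)
    if not spans:
        return []
    results = compute_costs(spans)
    # Advance the watermark only after successful processing, so a failed
    # or delayed run leaves its spans to be backfilled on the next run.
    state["watermark"] = max(s["arrived_at"] for s in spans)
    return results

# Demo with an in-memory span store (hypothetical shape).
SPANS = [{"span_id": "a", "arrived_at": 10},
         {"span_id": "b", "arrived_at": 20}]

def fetch_since(ts):
    return [s for s in SPANS if s["arrived_at"] > ts]

state = {}
first = process_incrementally(fetch_since, lambda spans: spans, state)
second = process_incrementally(fetch_since, lambda spans: spans, state)
print(len(first), len(second))  # 2 0
```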