The Overall Process
Cost computation happens in two stages:

- Trace collection and resource identification — As your application runs, Unomiq collects OpenTelemetry traces and identifies the cloud resources involved in each span (e.g., which database job ran, which API service handled a request, which LLM model was called).
- Billing matching — Unomiq matches the identified resources against your cloud billing data to assign a dollar cost to each span.
Supported Service Types
Costs are computed for three types of services, each with a different matching approach.

Database Jobs

For database jobs with unique identifiers (e.g., GCP > BigQuery), Unomiq matches the job ID from the trace directly to the corresponding line item in your billing data. This is a one-to-one match — the full cost of the billing line item is attributed to the span that triggered it.
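The one-to-one match can be sketched as a direct lookup from a span's job ID to a billing line item. The field names (`job_id`, `cost_usd`) and the billing records below are illustrative assumptions, not Unomiq's actual schema:

```python
# Sketch of one-to-one matching between a traced job ID and a billing
# line item. Field names are illustrative, not Unomiq's real schema.

def match_job_cost(span_job_id, billing_items):
    """Return the full cost of the billing line item whose job ID
    matches the span's job ID, or None if no line item matches."""
    for item in billing_items:
        if item["job_id"] == span_job_id:
            return item["cost_usd"]  # entire line-item cost goes to this span
    return None

# Hypothetical billing export rows.
billing = [
    {"job_id": "bq_job_123", "cost_usd": 0.42},
    {"job_id": "bq_job_456", "cost_usd": 1.10},
]
print(match_job_cost("bq_job_456", billing))  # → 1.1
```

Because the job ID is unique, no proportional splitting is needed: the span that triggered the job absorbs the line item's full cost.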
API Requests (e.g., Cloud Run)

API services like Cloud Run handle many requests concurrently on the same underlying resource. A single billing line item may cover dozens or hundreds of requests that ran during the same time window. To handle this, Unomiq uses proportional cost distribution based on how much time each request overlapped with the billing window:

- Identify overlapping requests — For each billing line item, Unomiq finds all API request spans that were active during that billing window (i.e., the request started before the billing window ended and finished after the billing window started).
- Measure overlap duration — For each request, the actual overlap time is calculated. If a request started before the billing window or ended after it, only the portion within the window counts.
- Distribute costs proportionally — Each request receives a share of the billing cost proportional to its overlap time relative to the total overlap of all requests in that window.
Example: proportional cost distribution

A billing window covers 1 hour with a total cost of $10. During that window:

- Request A ran for 2 seconds
- Request B ran for 3 seconds

| Request | Overlap | Cost |
|---|---|---|
| Request A | 2s | $4.00 |
| Request B | 3s | $6.00 |

Usage metrics (e.g., CPU seconds, memory usage) are distributed using the same proportional method.
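The three distribution steps can be sketched in Python. The span fields (`id`, `start`, `end`) and second-granularity timestamps are illustrative assumptions about the data model, not Unomiq's actual representation; the example values reproduce the $10 window above:

```python
# Sketch of proportional cost distribution over a billing window.
# Timestamps are plain seconds; field names are illustrative.

def overlap_seconds(start, end, win_start, win_end):
    """Length of the intersection of [start, end] with the billing window.
    Requests extending past either edge only count the portion inside."""
    return max(0.0, min(end, win_end) - max(start, win_start))

def distribute_cost(requests, win_start, win_end, total_cost):
    """Split total_cost across requests in proportion to each request's
    overlap with the billing window. Returns {request_id: cost_usd}."""
    overlaps = {
        r["id"]: overlap_seconds(r["start"], r["end"], win_start, win_end)
        for r in requests
    }
    total_overlap = sum(overlaps.values())
    if total_overlap == 0:
        return {rid: 0.0 for rid in overlaps}  # nothing ran in the window
    return {rid: total_cost * ov / total_overlap
            for rid, ov in overlaps.items()}

# The worked example above: a $10, 1-hour window with two requests.
requests = [
    {"id": "A", "start": 100, "end": 102},  # 2 s inside the window
    {"id": "B", "start": 200, "end": 203},  # 3 s inside the window
]
print(distribute_cost(requests, 0, 3600, 10.0))  # → {'A': 4.0, 'B': 6.0}
```

Note that the shares are relative to the total overlap of all requests in the window (5 s here), not to the window's full duration.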
LLM Calls (e.g., AI Model Inference)
For LLM calls, costs are computed based on token usage and the model’s published pricing. The cost formula accounts for three components:

| Component | Rate |
|---|---|
| Input tokens (non-cached) | Model’s prompt rate |
| Cached input tokens | Model’s cache-read rate (lower than fresh input) |
| Output and reasoning tokens | Model’s completion rate |
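The three-component formula can be sketched as follows. The per-token rates below are made-up placeholders, not any real model's pricing, and the function signature is an assumption for illustration:

```python
# Sketch of the per-call LLM cost formula: fresh input at the prompt
# rate, cached input at the cheaper cache-read rate, output and
# reasoning tokens at the completion rate. Rates are placeholders.

def llm_call_cost(input_tokens, cached_tokens, output_tokens, rates):
    """Return the cost in USD of a single LLM call."""
    fresh_input = input_tokens - cached_tokens  # non-cached prompt tokens
    return (
        fresh_input * rates["prompt"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["completion"]
    )

# Hypothetical per-token USD rates; cache reads cost 10% of fresh input.
rates = {"prompt": 3e-6, "cache_read": 0.3e-6, "completion": 15e-6}
cost = llm_call_cost(input_tokens=10_000, cached_tokens=8_000,
                     output_tokens=1_000, rates=rates)
print(round(cost, 6))  # → 0.0234
```

Most of this call's prompt was served from cache, so the 2,000 fresh input tokens and 1,000 output tokens dominate the cost despite the 8,000 cached tokens.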
Currency and Units
- All costs are normalized to US dollars (USD), regardless of the original billing currency.
- Usage metrics retain their original units (e.g., tokens, byte-seconds, request count) so you can see both the cost and the underlying consumption.