Performance Tuning & Cost Management

A runbook for optimizing latency and cost using NjiraAI's tiered model cascade.

NjiraAI evaluates policies using an intelligent model cascade. This guide explains how operators can tune performance by configuring the appropriate intelligence tier for different agent workloads.

Intelligence Tiers

We offer three tiers of model capability. The specific LLMs backing these tiers are updated regularly by Njira to the latest state-of-the-art versions to guarantee best-in-class safety.

FAST
  Characteristics: High speed, low latency, lowest cost.
  Recommended usage: Real-time chat filtering, simple PII detection, or extremely high-volume, low-risk tooling.
  Underlying models (example): gpt-5-mini, claude-haiku-4.5

STANDARD (default)
  Characteristics: Balanced performance and intelligence.
  Recommended usage: Most complex policy logic and standard context-aware safety evaluation.
  Underlying models (example): gpt-5.2, claude-sonnet-4.5

STRONG
  Characteristics: Maximum reasoning capability; highest latency and cost.
  Recommended usage: Exclusively for nuanced threat detection (e.g., sophisticated prompt injection) or secure financial auditing tools.
  Underlying models (example): gpt-5.2-pro, claude-opus-4.5
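In application code, one simple way to apply this guidance is a per-tool tier map with a STANDARD fallback. A minimal sketch; the tool names and the TIER_MAP structure are illustrative, not part of the Njira SDK:

```python
# Illustrative per-tool tier assignment; the tool names are hypothetical.
TIER_MAP = {
    "chat_filter": "fast",        # real-time, high-volume, low-risk
    "policy_eval": "standard",    # default context-aware evaluation
    "financial_audit": "strong",  # nuanced threat detection / auditing
}

def tier_for_tool(tool_name: str) -> str:
    # Fall back to the STANDARD default for unconfigured tools,
    # mirroring the platform's own default behavior.
    return TIER_MAP.get(tool_name, "standard")
```

Keeping the map in one place makes later down-tiering (or escalation) a one-line change per tool.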

Operational Setup: Selecting a Tier

You can specify the desired tier per-request or per-tool to optimize your cost/latency profile.

Via Headers (Proxy Gateway)

When routing agents through the Njira Gateway proxy, pass the X-Njira-Tier header. This is the simplest way to tune performance without changing SDK code:

curl https://gateway.njira.ai/v1/chat/completions \
  -H "Authorization: Bearer <your-key>" \
  -H "X-Njira-Tier: fast" \
  ...
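From application code, the same selection amounts to building the right header set. A minimal helper sketch; the function name and the valid-tier set are assumptions (mirroring the tiers documented above), not part of any Njira SDK:

```python
def njira_headers(api_key: str, tier: str = "standard") -> dict:
    """Build Njira Gateway headers with the X-Njira-Tier selector.

    Illustrative helper: validates the tier early so a typo fails
    before the request is ever sent.
    """
    if tier not in {"fast", "standard", "strong"}:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Njira-Tier": tier,
    }
```

Pass the resulting dict to whatever HTTP client your agent stack already uses.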

Via SDK Metadata

If you are using deep SDK integrations for pre/post boundaries, include the tier in the metadata payload:

# Assumes an initialized Njira SDK client bound to `njira`.
response = njira.audit(
    content="...",
    metadata={"tier": "strong"}
)
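To keep call sites uniform, the tier can be threaded through a thin wrapper. A sketch only; `audit_with_tier` is a hypothetical helper around the audit() call above, not an SDK function:

```python
def audit_with_tier(client, content: str, tier: str = "standard"):
    # Hypothetical convenience wrapper: forwards the tier via metadata,
    # defaulting to STANDARD like the platform itself.
    return client.audit(content=content, metadata={"tier": tier})
```

This centralizes the metadata shape, so changing a tool's tier never requires touching its call sites.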

Triage: When to Tune Tiers

If your agent workflows are experiencing unacceptable latency:

  1. Check the p95 evaluation latency in the Usage dashboard.
  2. If latency is dominated by the STANDARD policy evaluation, try down-tiering the specific tool to FAST.
  3. Verify Safety: Monitor the newly configured FAST tool in Shadow mode first to ensure the smaller model is not generating false negatives (missing actual threats).
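Step 3 can be made concrete by comparing shadow-mode verdicts from the down-tiered FAST model against STANDARD verdicts on the same traffic. A sketch, assuming you can export per-request threat verdicts as id-to-boolean maps (the data shape is an assumption, not a documented export format):

```python
def false_negative_rate(standard_verdicts: dict, fast_verdicts: dict) -> float:
    """Fraction of STANDARD-flagged threats that the FAST shadow missed.

    Each argument maps a request id to True if that tier flagged a threat.
    """
    threats = [rid for rid, flagged in standard_verdicts.items() if flagged]
    if not threats:
        return 0.0
    missed = sum(1 for rid in threats if not fast_verdicts.get(rid, False))
    return missed / len(threats)
```

If the rate is nonzero for a safety-critical tool, keep it on STANDARD (or escalate to STRONG) rather than accepting the latency win.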

If no tier is explicitly specified in the request context, the system defaults to STANDARD to ensure a safe, robust baseline for all unconfigured traffic.