Models are the trained detectors that identify artifacts in audio. Each model is created from a training job and can be used for inference.

What is a model?

A model is the output of a successful training job:
  • Trained on your annotated audio data
  • Detects the artifact types you defined
  • Produces timestamped detections with confidence scores
Training Job (completed) → Model Created → Inference Ready

Model structure

Each model record contains the following fields:
  • id: Unique identifier
  • name: Optional display name
  • description: Optional description
  • artifact_types: List of detectable artifact types
  • training_job_id: The job that created this model
  • eval_metrics: Validation metrics (precision, recall, F1)
  • is_active: Whether the model is available for inference
  • created_at: When the model was created
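Put together, a model object returned by the API looks something like this (field values here are illustrative, reusing the metrics and names from the examples on this page; identifiers are placeholders):

```json
{
  "id": "model-abc123",
  "name": "TTS Glitch Detector v2",
  "description": "Trained on 2 hours of production audio",
  "artifact_types": ["glitch", "long_pause"],
  "training_job_id": "job-789",
  "eval_metrics": {
    "precision": 0.89,
    "recall": 0.85,
    "f1_score": 0.87
  },
  "is_active": true,
  "created_at": "2024-01-15T10:30:00Z"
}
```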

Listing models

Python
import requests

# Assumes BASE_URL and API_KEY are configured for your deployment
response = requests.get(
    f"{BASE_URL}/api/v1/models",
    headers={"X-API-Key": API_KEY}
)
models = response.json()

for model in models["items"]:
    print(f"{model['name'] or model['id']}")
    print(f"  Types: {model['artifact_types']}")
    print(f"  F1: {model['eval_metrics']['f1_score']:.2f}")

Filter by status

Python
# Only active models
response = requests.get(
    f"{BASE_URL}/api/v1/models",
    headers={"X-API-Key": API_KEY},
    params={"is_active": True}
)

Model metrics

Models include evaluation metrics from training:
{
  "eval_metrics": {
    "precision": 0.89,
    "recall": 0.85,
    "f1_score": 0.87,
    "per_class": {
      "glitch": {
        "precision": 0.92,
        "recall": 0.88,
        "f1_score": 0.90,
        "support": 150
      },
      "long_pause": {
        "precision": 0.85,
        "recall": 0.82,
        "f1_score": 0.83,
        "support": 80
      }
    }
  }
}

Understanding metrics

  • Precision: Of detected artifacts, how many are real? Good values: > 0.8
  • Recall: Of real artifacts, how many were detected? Good values: > 0.8
  • F1 score: Balance of precision and recall. Good values: > 0.8
  • Support: Number of validation examples. Higher = more reliable
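These thresholds can be turned into a simple deployment gate. The sketch below is our own convenience (the helper name and the 0.8 default are not part of the API); it checks both the overall and every per-class F1 against a minimum:

```python
def meets_quality_bar(eval_metrics, min_f1=0.8):
    """Return True if overall and every per-class F1 meet the threshold."""
    if eval_metrics["f1_score"] < min_f1:
        return False
    per_class = eval_metrics.get("per_class", {})
    return all(m["f1_score"] >= min_f1 for m in per_class.values())

# Values taken from the sample eval_metrics above
eval_metrics = {
    "f1_score": 0.87,
    "per_class": {
        "glitch": {"f1_score": 0.90},
        "long_pause": {"f1_score": 0.83},
    },
}
print(meets_quality_bar(eval_metrics))        # True at the default 0.8 bar
print(meets_quality_bar(eval_metrics, 0.85))  # False: long_pause is below 0.85
```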

Per-class metrics

Check metrics for each artifact type:
Python
for artifact_type, metrics in model["eval_metrics"]["per_class"].items():
    print(f"{artifact_type}:")
    print(f"  Precision: {metrics['precision']:.2f}")
    print(f"  Recall: {metrics['recall']:.2f}")
    print(f"  Support: {metrics['support']}")
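Support matters too: per-class scores computed from only a handful of validation examples are noisy. A small sketch (the helper and the 100-example threshold are our own choices, not API behavior) that flags thinly validated classes:

```python
def low_support_classes(per_class, min_support=100):
    """Return artifact types validated on fewer than min_support examples."""
    return [t for t, m in per_class.items() if m["support"] < min_support]

# Support counts from the sample eval_metrics above
per_class = {
    "glitch": {"f1_score": 0.90, "support": 150},
    "long_pause": {"f1_score": 0.83, "support": 80},
}
print(low_support_classes(per_class))  # ['long_pause']
```

Treat F1 scores for flagged classes with caution until you annotate more examples of them.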

Managing models

Update name and description

Python
response = requests.patch(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY},
    json={
        "name": "TTS Glitch Detector v2",
        "description": "Trained on 2 hours of production audio, Jan 2024"
    }
)

Activate/deactivate

Deactivate models you no longer use:
Python
# Deactivate
requests.patch(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY},
    json={"is_active": False}
)

# Reactivate
requests.patch(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY},
    json={"is_active": True}
)

Delete (archive)

Deleting archives the model (soft delete):
Python
requests.delete(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY}
)
Archived models cannot be used for inference but are retained for audit purposes.

Selecting models for inference

By metrics

Choose the model with best performance:
Python
response = requests.get(
    f"{BASE_URL}/api/v1/models",
    headers={"X-API-Key": API_KEY},
    params={"is_active": True}
)
models = response.json()["items"]

# Find best F1 score
best_model = max(models, key=lambda m: m["eval_metrics"]["f1_score"])
print(f"Best model: {best_model['id']} (F1: {best_model['eval_metrics']['f1_score']:.2f})")

By artifact type

Choose based on per-class performance:
Python
# Find best model for detecting glitches
def glitch_f1(model):
    return model["eval_metrics"]["per_class"].get("glitch", {}).get("f1_score", 0)

best_glitch_model = max(models, key=glitch_f1)
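One caveat with the snippet above: because missing classes default to an F1 of 0, `max` will still return a model even if none of them detects glitches at all. A slightly more defensive sketch (our own helper, not part of the API) filters first and returns None when no candidate exists:

```python
def best_for_type(models, artifact_type):
    """Best model for one artifact type, or None if no model detects it."""
    candidates = [m for m in models if artifact_type in m["artifact_types"]]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda m: m["eval_metrics"]["per_class"][artifact_type]["f1_score"],
    )

# Minimal illustrative model records
models = [
    {"id": "m1", "artifact_types": ["glitch"],
     "eval_metrics": {"per_class": {"glitch": {"f1_score": 0.90}}}},
    {"id": "m2", "artifact_types": ["long_pause"],
     "eval_metrics": {"per_class": {"long_pause": {"f1_score": 0.83}}}},
]
print(best_for_type(models, "glitch")["id"])  # m1
print(best_for_type(models, "echo"))          # None
```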

By recency

Use the most recently trained model:
Python
from datetime import datetime

def parse_date(model):
    return datetime.fromisoformat(model["created_at"].replace("Z", "+00:00"))

latest_model = max(models, key=parse_date)

Model versioning

Track model versions through naming:
Python
# After training, update the name with version info
response = requests.patch(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY},
    json={
        "name": f"TTS Detector v{version}",
        "description": f"""
        Trained: {datetime.now().isoformat()}
        Dataset: {dataset_id}
        Annotation set: v{annotation_set_version}
        F1: {metrics['f1_score']:.2f}
        """
    }
)
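If you follow a trailing `vN` naming convention, a small helper (our own convenience, not an API feature) can compute the next version name from the current one:

```python
import re

def next_version_name(name):
    """Bump a trailing ' vN' suffix, or start at v1 if absent."""
    match = re.search(r" v(\d+)$", name)
    if match:
        return name[: match.start()] + f" v{int(match.group(1)) + 1}"
    return name + " v1"

print(next_version_name("TTS Detector v3"))  # TTS Detector v4
print(next_version_name("Echo Detector"))    # Echo Detector v1
```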

Model comparison

Compare models trained on different data or configurations:
Python
def compare_models(model_ids):
    """Compare metrics across models."""
    results = []

    for model_id in model_ids:
        response = requests.get(
            f"{BASE_URL}/api/v1/models/{model_id}",
            headers={"X-API-Key": API_KEY}
        )
        model = response.json()

        results.append({
            "id": model_id,
            "name": model["name"],
            "precision": model["eval_metrics"]["precision"],
            "recall": model["eval_metrics"]["recall"],
            "f1_score": model["eval_metrics"]["f1_score"]
        })

    return results

comparison = compare_models(["model-1", "model-2", "model-3"])
for m in comparison:
    print(f"{m['name']}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1_score']:.2f}")
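To make the winner obvious at a glance, the comparison rows can be sorted best-first before printing (a presentation convenience of our own, not part of the API):

```python
def rank_by_f1(results):
    """Return comparison rows sorted best-first by F1 score."""
    return sorted(results, key=lambda m: m["f1_score"], reverse=True)

comparison = [
    {"name": "model-1", "f1_score": 0.84},
    {"name": "model-2", "f1_score": 0.91},
    {"name": "model-3", "f1_score": 0.87},
]
for rank, m in enumerate(rank_by_f1(comparison), start=1):
    print(f"{rank}. {m['name']} (F1: {m['f1_score']:.2f})")
```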

Best practices

Name models clearly

Include version and key characteristics:
"TTS Glitch Detector v2 - High Recall"
"Voice Agent Echo Detection - Production"
"EN-US Pronunciation Model v1.3"

Document training details

Use the description field:
Python
requests.patch(
    f"{BASE_URL}/api/v1/models/{model_id}",
    headers={"X-API-Key": API_KEY},
    json={
        "description": """
        Training config:
        - Epochs: 30
        - Batch size: 64
        - Learning rate: 0.0005

        Training data:
        - 150 audio files (2.5 hours)
        - 520 annotations
        - Annotation set v3

        Notes:
        - Improved recall for short glitches by using smaller chunk size
        """
    }
)

Keep old models

Don’t delete models immediately when training new ones:
  • Compare performance before switching
  • Roll back if the new model underperforms
  • Track improvements over time
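Rolling back is just two `is_active` updates: deactivate the new model, reactivate the old one. A sketch using the PATCH endpoint shown above (the `session` parameter is our own indirection so the calls can be exercised with a stub; pass the `requests` module in real use):

```python
BASE_URL = "https://api.example.com"  # assumed placeholder; use your deployment's URL
API_KEY = "your-api-key"              # assumed placeholder

def set_active(session, model_id, active):
    """PATCH a model's is_active flag."""
    return session.patch(
        f"{BASE_URL}/api/v1/models/{model_id}",
        headers={"X-API-Key": API_KEY},
        json={"is_active": active},
    )

def rollback(session, old_model_id, new_model_id):
    """Deactivate the underperforming new model, then reactivate the old one."""
    set_active(session, new_model_id, False)
    set_active(session, old_model_id, True)
```

Deactivating the new model first avoids a window in which both models look current.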

Monitor in production

Track model performance over time:
  • Log detection rates
  • Compare against human review
  • Retrain when performance degrades
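Comparing detections against human review gives an observed precision you can track over time and use as a retraining trigger. A minimal sketch (helper names, counts, and the 0.8 threshold are illustrative, not part of the API):

```python
def observed_precision(confirmed, rejected):
    """Precision as judged by human review of a sample of detections."""
    total = confirmed + rejected
    return confirmed / total if total else 0.0

def needs_retraining(confirmed, rejected, min_precision=0.8):
    """Flag the model when reviewed precision drops below the bar."""
    return observed_precision(confirmed, rejected) < min_precision

print(observed_precision(45, 5))  # 0.9
print(needs_retraining(45, 5))    # False
print(needs_retraining(30, 20))   # True: 0.6 is below the 0.8 bar
```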