Prerequisites
Before training, ensure:
- Annotation set is published: Draft sets cannot be used for training
- All audio is processed: Every file must have processing_status: ready
- Sufficient data: At least 5 minutes of total audio
- Coverage per type: At least 1 annotation per artifact type you want to detect
Creating a training job
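The SDK call itself is not reproduced here, so the following is a minimal sketch of assembling a training request using the documented defaults. The helper, field names, and IDs are illustrative assumptions, not the real SDK surface.

```python
# Sketch only: the helper and field names below are illustrative assumptions,
# not the real SDK. Defaults mirror the training configuration table.

def build_training_request(annotation_set_id, artifact_types, **overrides):
    """Assemble a training request from the documented defaults plus overrides."""
    config = {
        "epochs": 20,
        "validation_split": 0.2,
        "learning_rate": 0.001,
        "batch_size": 32,
        "chunk_size_ms": 500,
        "stride_ms": 250,
    }
    config.update(overrides)
    return {
        "annotation_set_id": annotation_set_id,
        "artifact_types": list(artifact_types),  # required; no default
        **config,
    }

request = build_training_request("as_123", ["click", "hum"], epochs=30)
```

Overrides win over defaults, so only the parameters you want to change need to be passed.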
Training configuration
| Parameter | Default | Range | Description |
|---|---|---|---|
| artifact_types | (required) | - | List of artifact types to train on |
| epochs | 20 | 1-100 | Training iterations over the dataset |
| validation_split | 0.2 | 0.1-0.5 | Fraction of data held out for validation |
| learning_rate | 0.001 | 0.0001-0.1 | Step size for optimizer |
| batch_size | 32 | 8-256 | Samples per training step |
| chunk_size_ms | 500 | - | Audio chunk size for processing |
| stride_ms | 250 | - | Overlap between chunks |
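The ranges in the table can be checked client-side before submitting, as in this illustrative validator (the service performs its own validation regardless):

```python
# Illustrative client-side check against the documented parameter ranges.
RANGES = {
    "epochs": (1, 100),
    "validation_split": (0.1, 0.5),
    "learning_rate": (0.0001, 0.1),
    "batch_size": (8, 256),
}

def validate_config(config):
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    if not config.get("artifact_types"):
        errors.append("artifact_types is required")
    for key, (lo, hi) in RANGES.items():
        if key in config and not (lo <= config[key] <= hi):
            errors.append(f"{key} must be between {lo} and {hi}")
    return errors
```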
Choosing artifact types
You can train on all artifact types in your dataset, or a subset:
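A sketch of selecting a subset, assuming annotations expose an artifact_type field (the data shapes here are illustrative):

```python
# Illustrative: `annotations` stands in for whatever listing your client returns.
annotations = [
    {"artifact_type": "click"},
    {"artifact_type": "hum"},
    {"artifact_type": "click"},
    {"artifact_type": "pop"},
]

# All artifact types present in the annotation set:
available = sorted({a["artifact_type"] for a in annotations})

# Train on only the types you care about:
subset = [t for t in available if t in {"click", "pop"}]
```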
Epochs
More epochs can improve accuracy but risk overfitting:
- Small datasets (< 30 minutes): 10-20 epochs
- Medium datasets (30-120 minutes): 20-40 epochs
- Large datasets (> 120 minutes): 30-50 epochs
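The ranges above can be turned into a simple heuristic; picking the midpoint of each range is an illustrative choice, not a recommendation from the API:

```python
def recommend_epochs(total_minutes):
    """Midpoint of the epoch ranges suggested above (illustrative heuristic)."""
    if total_minutes < 30:
        return 15   # small: 10-20 epochs
    if total_minutes <= 120:
        return 30   # medium: 20-40 epochs
    return 40       # large: 30-50 epochs
```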
Validation split
A portion of your data is held out to measure model performance:
- 0.2 (20%): Good default for most datasets
- 0.1 (10%): Use for very large datasets
- 0.3 (30%): Use for small datasets to get reliable metrics
Learning rate
Controls how aggressively the model updates:
- 0.001: Good default
- 0.0001: For fine-tuning or unstable training
- 0.01: For faster training on large datasets
Batch size
Larger batches train faster but use more memory:
- 32: Good default
- 16: For limited memory
- 64-128: For large datasets
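Taken together, the validation-split, learning-rate, and batch-size guidance above can be sketched as one helper. The size thresholds are an illustrative reading of that guidance, not values from the API:

```python
def recommend_hyperparameters(total_minutes, limited_memory=False):
    """Map the tuning guidance above onto concrete values (illustrative)."""
    if total_minutes < 30:
        split = 0.3          # small datasets: larger holdout for reliable metrics
    elif total_minutes > 120:
        split = 0.1          # very large datasets
    else:
        split = 0.2          # good default
    lr = 0.01 if total_minutes > 120 else 0.001
    batch = 16 if limited_memory else (64 if total_minutes > 120 else 32)
    return {"validation_split": split, "learning_rate": lr, "batch_size": batch}
```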
Monitoring training
Job status
Poll the training job to check progress:
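A minimal polling loop, assuming a `fetch_status` callable that stands in for your client's "get training job" call (names are illustrative). It stops on the terminal statuses completed, failed, or cancelled:

```python
import time

# Terminal statuses from the training job status table.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def wait_for_job(fetch_status, poll_interval_s=5.0, max_polls=1000):
    """Poll until the job reaches a terminal status; raise if it never does."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval_s)
    raise TimeoutError("training job did not finish in time")

# Example with a canned sequence of statuses:
statuses = iter(["pending", "queued", "training", "training", "completed"])
final = wait_for_job(lambda: next(statuses), poll_interval_s=0.0)
# final == "completed"
```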
Training job statuses
| Status | Description |
|---|---|
| pending | Job created, waiting to start |
| queued | Waiting for compute resources |
| training | Actively training |
| completed | Training finished successfully |
| failed | Training failed (check error_message) |
| cancelled | Cancelled by user |
Understanding metrics
After training completes, the job includes evaluation metrics.
Key metrics
| Metric | Description | Ideal |
|---|---|---|
| Precision | Of detected artifacts, how many are real | Higher = fewer false positives |
| Recall | Of real artifacts, how many were detected | Higher = fewer missed artifacts |
| F1 Score | Harmonic mean of precision and recall | Balance of both |
| Support | Number of validation examples | More = more reliable metrics |
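Precision, recall, and F1 relate as follows; a quick worked example from raw true positive, false positive, and false negative counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = tp/(tp+fp), recall = tp/(tp+fn), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# 8 real artifacts found, 2 false alarms, 2 missed:
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
# p ≈ 0.8, r ≈ 0.8, f1 ≈ 0.8
```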
Interpreting results
- High precision, low recall: Model is conservative, missing some artifacts
- Low precision, high recall: Model is aggressive, flagging too much
- Both high: Good model performance
- Both low: Need more training data or different hyperparameters
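The four cases above can be expressed as a small classifier; the 0.8 cut-off for "high" is an assumed example value, not a threshold defined by the API:

```python
def interpret_metrics(precision, recall, threshold=0.8):
    """Classify per the rules above; the 0.8 cut-off is an assumed example."""
    hi_p, hi_r = precision >= threshold, recall >= threshold
    if hi_p and hi_r:
        return "good"             # both high: good model performance
    if hi_p:
        return "conservative"     # high precision, low recall: missing artifacts
    if hi_r:
        return "aggressive"       # low precision, high recall: flagging too much
    return "needs-more-data"      # both low: more data or new hyperparameters
```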
Working with models
List models
After training completes, a model is automatically created:
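An illustrative shape for a model listing response; the field names here are assumptions, not the documented schema:

```python
# Illustrative listing data (field names assumed):
models = [
    {"id": "model_1", "is_active": True, "artifact_types": ["click", "hum"]},
    {"id": "model_2", "is_active": False, "artifact_types": ["pop"]},
]

# Filter to the models currently in use:
active_ids = [m["id"] for m in models if m["is_active"]]
```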
Get model details
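As a sketch, fetching one model by ID from a listing looks like this; the real API would be a direct GET, and the field names are assumed:

```python
def get_model(models, model_id):
    """Find one model by id in a listing (illustrative; the real API is a direct GET)."""
    matches = [m for m in models if m["id"] == model_id]
    if not matches:
        raise KeyError(model_id)
    return matches[0]
```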
Update model
Add a name and description for easier identification:
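A sketch of the update shape, applied locally for illustration; the real API would persist this server-side, and the field names are assumed:

```python
def update_model(model, name=None, description=None):
    """Return a copy with updated metadata; only fields you pass are changed."""
    updates = {k: v for k, v in
               {"name": name, "description": description}.items()
               if v is not None}
    return {**model, **updates}
```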
Active vs inactive
Models can be marked active or inactive:
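Toggling the flag, sketched locally (the field name is an assumption):

```python
def set_model_active(model, active):
    """Return a copy marked active or inactive (field name assumed)."""
    return {**model, "is_active": bool(active)}
```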
Delete model
Deleting archives the model (soft delete):
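The soft-delete semantics can be sketched as marking the model archived rather than removing it; the field name is an assumption:

```python
def delete_model(model):
    """Soft delete: mark the model archived, not removed (field name assumed)."""
    return {**model, "archived": True}
```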
Cancelling training
Cancel a pending or running job:
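A sketch of the state transition, assuming that pending, queued, and training jobs are the cancellable ones (per the status table); the real API enforces this server-side:

```python
# Non-terminal statuses from the training job status table (assumed cancellable).
CANCELLABLE = {"pending", "queued", "training"}

def cancel_job(job):
    """Return a cancelled copy, or raise if the job already finished."""
    if job["status"] not in CANCELLABLE:
        raise ValueError(f"cannot cancel job in status {job['status']}")
    return {**job, "status": "cancelled"}
```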
When to retrain
Consider retraining when:
- Adding new artifact types: Train a new model with the additional types
- Model performance drops: Audio characteristics may have changed
- More data available: More examples generally improve accuracy
- Tuning thresholds isn’t enough: If you’re constantly adjusting inference thresholds, the model may need improvement
Troubleshooting
"Annotation set must be published"
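Publish the set before retrying. As a sketch, with `publish` standing in for your client's publish call and field names assumed:

```python
def ensure_published(annotation_set, publish):
    """Publish the set if it is still a draft, then return it (names illustrative)."""
    if annotation_set.get("status") == "draft":
        return publish(annotation_set)
    return annotation_set

# Example with a stand-in publish function:
published = ensure_published({"id": "as_1", "status": "draft"},
                             lambda s: {**s, "status": "published"})
```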
"Not enough audio data"
Ensure you have at least 5 minutes of processed audio:
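A sketch of the check, summing durations of files that have finished processing (field names assumed):

```python
def total_ready_minutes(files):
    """Total minutes of audio with processing_status 'ready' (fields assumed)."""
    ms = sum(f["duration_ms"] for f in files
             if f["processing_status"] == "ready")
    return ms / 60000

files = [
    {"duration_ms": 180000, "processing_status": "ready"},       # 3.0 min
    {"duration_ms": 150000, "processing_status": "ready"},       # 2.5 min
    {"duration_ms": 120000, "processing_status": "processing"},  # not counted
]
minutes = total_ready_minutes(files)
enough = minutes >= 5  # minimum required for training
```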
"Audio files not ready"
Wait for all audio to finish processing:
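A minimal wait loop, with `fetch_statuses` standing in for a call that returns the processing_status of every file (names illustrative):

```python
import time

def wait_until_ready(fetch_statuses, poll_interval_s=5.0, max_polls=1000):
    """Poll until every file reports processing_status == 'ready'."""
    for _ in range(max_polls):
        statuses = fetch_statuses()
        if all(s == "ready" for s in statuses):
            return True
        time.sleep(poll_interval_s)
    return False
```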
