- Delay time: The time spent waiting for a worker to become available. This includes the cold start time if a new worker needs to be spun up.
- Execution time: The time the GPU takes to process the request once the worker has received the job.
Send a test request
To gather initial metrics, usecurl to send a request to your endpoint. This will initiate the job and return a request ID that you can use to poll for status.
/status endpoint to get the delay time and execution time:
Automate benchmarking
To get a statistically significant view of your worker’s performance, you should automate the benchmarking process. The following Python script sends multiple requests and calculates the minimum, maximum, and average times for both delay and execution.benchmark.py