Latency and throughput in ML

What is the difference between latency and throughput in ML?

Latency is a common metric for comparing the performance of machine-learning models in a specific application. It refers to the time taken to process one unit of data, assuming only one unit is processed at a time.

We'll define latency as the time required to execute the pipeline once (e.g. 10 ms per item) and throughput as the number of items the pipeline processes per unit of time (e.g. 100 imgs/sec). In this particular example the two are reciprocals: a 10 ms latency yields a throughput of 100 imgs/sec when items are processed strictly one at a time. A machine-learning pipeline can be made faster along either axis: reducing latency speeds up each individual prediction, while techniques such as batching can raise throughput even when per-item latency stays the same or grows.
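The relationship between the two metrics can be sketched with a small timing experiment. The `predict` function below is a hypothetical stand-in for a real model's forward pass; the point is only to show how latency (seconds per item) and throughput (items per second) are measured, and why they are reciprocals under one-at-a-time processing.

```python
import time

def predict(x):
    # Hypothetical stand-in for a model's forward pass on one item.
    return x * x

def measure(items):
    start = time.perf_counter()
    results = [predict(x) for x in items]  # strictly one item at a time
    elapsed = time.perf_counter() - start
    latency = elapsed / len(items)         # seconds per item
    throughput = len(items) / elapsed      # items per second
    return latency, throughput

latency, throughput = measure(list(range(10_000)))
print(f"latency: {latency * 1e6:.2f} us/item, throughput: {throughput:,.0f} items/s")
```

Because each item is handled sequentially here, throughput is exactly 1/latency. A batched pipeline breaks that identity: processing 32 items together on a GPU might take only slightly longer than processing one, so throughput climbs while per-item latency barely moves.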
