Version: 0.20.x

Model Memory Calculator


Overview

The TitanML Model Memory Calculator is an open-source tool that helps users determine whether their machine learning model can run on their available hardware. The tool supports two modes: a Standard Calculator and a more advanced Prefill Chunking Calculator. This document outlines the functionalities of both calculators, explains the underlying formulas, and demonstrates how to use them effectively.

You can also access the Model Memory Calculator directly on Hugging Face or on GitHub Pages.

Purpose of the Tool

This tool serves several purposes:

  • Memory Management: Assists users in managing memory requirements efficiently to prevent out-of-memory errors.
  • Optimization: Helps in optimizing batch sizes, model configurations, and deployment strategies based on hardware capabilities.
  • Cost Management: Reduces cloud computing costs by optimizing hardware and model configurations.

The calculator caters to a wide range of users, including those who want to ensure that their current hardware supports a specific machine learning model and those looking to purchase hardware that meets their model requirements.


1. Standard Calculator

Introduction

The Standard Calculator allows users to estimate the memory footprint of their machine learning model based on key variables such as the number of model parameters, hidden size, number of layers, and hardware constraints. This basic estimation helps users gauge if their hardware can support the model.

Key Formulas

  1. Model Memory Needed:
    The memory required to store model parameters depends on the precision used.

    $$\text{Model Memory Needed} = \text{Model Parameter Size} \times \text{Precision}$$
  2. Available Memory:
    The memory remaining after accounting for the model's memory usage.

    $$\text{Available Memory} = \text{Device Memory} - \text{Model Memory Needed}$$
  3. Model Memory Per Input:
    Calculated based on the model’s hidden size and the number of layers.

    $$\text{Model Memory Per Input} = \frac{4 \times \text{Hidden Size} \times \text{Number of Layers}}{1 \times 10^9}$$
  4. Maximum Input Size:
    Determines the maximum input size the device can handle without exceeding memory limits.

    $$\text{Maximum Input Size} = \frac{\text{Device Memory} - \text{Model Memory Per Input}}{\text{Model Memory Per Input} \times \text{Input Size}}$$
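
As a rough illustration, the four formulas above can be transcribed into Python. The function names and the units are assumptions that the text does not state: parameter count in billions, precision in bytes per parameter, device memory in GB, and Input Size as a plain multiplier (for example, tokens per input). This is a sketch of the arithmetic as written, not the calculator's actual source.

```python
# Sketch of the Standard Calculator formulas.
# Assumed units (not stated in the text): parameters in billions,
# precision in bytes per parameter, memory in GB.


def model_memory_needed_gb(params_billions, bytes_per_param):
    """Model Memory Needed = Model Parameter Size x Precision."""
    return params_billions * bytes_per_param


def available_memory_gb(device_memory_gb, model_memory_gb):
    """Available Memory = Device Memory - Model Memory Needed."""
    return device_memory_gb - model_memory_gb


def model_memory_per_input_gb(hidden_size, num_layers):
    """Model Memory Per Input = 4 x Hidden Size x Number of Layers / 1e9."""
    return 4 * hidden_size * num_layers / 1e9


def maximum_input_size(device_memory_gb, per_input_gb, input_size):
    """Maximum Input Size, transcribed term by term from formula 4."""
    return (device_memory_gb - per_input_gb) / (per_input_gb * input_size)


if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model at 2 bytes per parameter
    # on a 24 GB device, with hidden size 4096 and 32 layers.
    model_gb = model_memory_needed_gb(7, 2)
    free_gb = available_memory_gb(24, model_gb)
    per_input = model_memory_per_input_gb(4096, 32)
    print("model memory (GB):", model_gb)
    print("available memory (GB):", free_gb)
    print("memory per input (GB):", per_input)
    print("maximum input size:", maximum_input_size(24, per_input, 2048))
```

Keeping each formula in its own function makes it easy to swap in different units or a different precision without touching the rest of the arithmetic.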

Usage Scenarios

  • Optimizing Batch Size: Using the standard calculator, users can understand the maximum input size and adjust their batch sizes accordingly.
  • Deployment Planning: For inference tasks, the standard calculator allows users to plan deployments on resource-constrained devices by estimating the memory footprint.

[Figure: Standard Calculator]


2. Prefill Chunking Calculator

Introduction

The Prefill Chunking Calculator is designed for advanced memory optimization. It estimates the memory consumed by activations in large models. This mode is particularly useful when a model uses chunked prefill during inference, which reduces peak memory usage by splitting a large input into smaller chunks that are processed one at a time.
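
The chunking idea itself can be sketched in a few lines of Python. The helper name `chunk_prompt` and the use of a plain list of token IDs are assumptions made for illustration; a real inference server would also carry the KV cache forward between chunks, which this sketch does not model.

```python
from typing import Iterator, Sequence


def chunk_prompt(tokens: Sequence[int], max_chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `tokens`, each at most `max_chunk_size` long."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    for start in range(0, len(tokens), max_chunk_size):
        yield tokens[start:start + max_chunk_size]


if __name__ == "__main__":
    # Hypothetical prompt of 10 token IDs, prefilled 4 tokens at a time.
    prompt = list(range(10))
    for chunk in chunk_prompt(prompt, 4):
        print(len(chunk), chunk)
```

Because only one chunk is in flight at a time, peak activation memory depends on the chunk size rather than on the full prompt length.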

Key Formulas

  1. Activations Memory:
    This formula calculates the memory needed for activations during chunked prefill.

    $$\text{Activations} = \text{Maximum Chunk Size} \times 2 \times \max(2 \times \text{Intermediate Size}, 4 \times \text{Hidden Size})$$
  2. Valid Memory:
    Determines if the available memory can handle the prefill chunking process.

    $$\text{Valid Memory} \geq \text{Model Memory Needed} + (\text{Model Parameter Size} \times \text{Precision}) + \text{KV}$$
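
A minimal Python sketch of the two formulas above follows. The function names are hypothetical, and the text does not state the unit of the activations result or of KV, so the check assumes the caller passes every term in one consistent unit. The inequality is transcribed exactly as written; note that Model Memory Needed is itself defined in section 1 as Model Parameter Size × Precision, so that quantity appears twice on the right-hand side.

```python
def activations(max_chunk_size, intermediate_size, hidden_size):
    """Activations = Maximum Chunk Size x 2 x max(2 x Intermediate, 4 x Hidden)."""
    return max_chunk_size * 2 * max(2 * intermediate_size, 4 * hidden_size)


def is_memory_valid(valid_memory, model_memory_needed,
                    model_parameter_size, precision, kv):
    """Check formula 2 as written; all terms must share one unit (assumed)."""
    required = model_memory_needed + model_parameter_size * precision + kv
    return valid_memory >= required


if __name__ == "__main__":
    # Hypothetical configuration: chunk of 512 tokens,
    # intermediate size 11008, hidden size 4096.
    print("activations:", activations(512, 11008, 4096))
    # Hypothetical memory figures in GB.
    print("valid:", is_memory_valid(40, 14, 7, 2, 4))
```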

Usage Scenarios

  • Chunking for Large Models: For models with large memory footprints, chunking is essential to break down the input into manageable sizes.
  • Memory Efficiency: By using the Prefill Chunking Calculator, users can optimize model inference on devices with limited memory by adjusting chunk sizes.
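
Because the activations formula is linear in the chunk size, it can be inverted to find the largest chunk that fits a given activation budget. The sketch below assumes the budget is expressed in the same (unstated) unit as the activations formula; `max_chunk_for_budget` is a hypothetical helper, not part of the calculator.

```python
def max_chunk_for_budget(activation_budget, intermediate_size, hidden_size):
    """Largest chunk size whose activations fit within `activation_budget`.

    Inverts: Activations = chunk x 2 x max(2 x Intermediate, 4 x Hidden).
    """
    per_token = 2 * max(2 * intermediate_size, 4 * hidden_size)
    return activation_budget // per_token


if __name__ == "__main__":
    # Hypothetical budget and model dimensions.
    print(max_chunk_for_budget(22544384, 11008, 4096))
```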

[Figure: Prefill Chunking Calculator]


Conclusion

The TitanML Model Memory Calculator is an indispensable tool for any machine learning practitioner working with memory-constrained hardware. By providing both standard and advanced memory calculations, it enables users to optimize model performance, plan deployments efficiently, and manage memory resources cost-effectively.

Whether you are an experienced practitioner or someone looking to explore machine learning deployment, the TitanML Model Memory Calculator offers an intuitive and reliable solution to your memory estimation needs.