0.11.0

  • Added support for reranking & classification models.
  • Added CUDA graph LRU caching to cap memory overheads when using CUDA graphs.
  • Reduced the size of the GPU image by more than half.
  • Fixed a bug where the Vertex integration couldn't find the CUDA driver.
  • Fixed a bug where synchronization issues could arise when using multiple GPUs.