Software Supreme created an easy-to-integrate library, coupled with a test application, that uses the CUDA driver API to make full use of a multi-GPU NVIDIA system. Using low-level optimizations that reduce thread divergence and register pressure, we were able to outperform hashcat on all required algorithms.
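As a rough illustration of what reducing divergence can mean in practice, the sketch below replaces a data-dependent branch with a mask-based select, so that every thread in a warp would follow the same instruction stream. It is written as portable C++ so it can be compiled without a GPU; the function names (`select_branchy`, `select_branchless`, `check_equivalence`) are hypothetical and are not taken from the library, and the actual optimizations used in the project may differ.

```cpp
#include <cassert>
#include <cstdint>

// Branching form: on a GPU, threads of one warp that disagree on `cond`
// would execute both paths one after the other (divergence).
inline std::uint32_t select_branchy(bool cond, std::uint32_t a, std::uint32_t b) {
    if (cond) {
        return a;
    }
    return b;
}

// Branch-free form (illustrative assumption, not the library's code):
// turn `cond` into an all-ones or all-zeros mask and blend the operands.
inline std::uint32_t select_branchless(bool cond, std::uint32_t a, std::uint32_t b) {
    // 0u - 1u wraps to 0xFFFFFFFF; 0u - 0u stays 0.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}

// Small self-check that both forms agree on a few inputs.
inline void check_equivalence() {
    const std::uint32_t samples[] = {0u, 1u, 0x67452301u, 0xFFFFFFFFu};
    for (std::uint32_t a : samples) {
        for (std::uint32_t b : samples) {
            assert(select_branchy(true, a, b) == select_branchless(true, a, b));
            assert(select_branchy(false, a, b) == select_branchless(false, a, b));
        }
    }
}
```

The same expression could be used inside a device function, where the compiler might already emit predicated instructions for simple branches; whether the manual form helps is something to confirm by profiling rather than assume.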