Profile CUDA kernels with one command, zero GPU setup

technoabsurdist 10 hours ago

We've been doing a lot of GPU kernel profiling and optimization on cloud infrastructure. Without local GPU hardware, that meant constant SSH juggling: upload code, compile remotely, profile kernels, download results, repeat. The alternative, working entirely in the cloud, is expensive, slow, and annoying. We were spending more time managing infrastructure than writing the kernels we wanted to optimize.
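
For anyone who hasn't lived it, one iteration of that manual loop might look something like the sketch below. It only builds the commands; it doesn't run them. The host, paths, and file names are hypothetical placeholders, and using nvcc plus Nsight Systems (nsys) is just one plausible toolchain assumed for illustration, not a description of how Chisel works internally.

```python
import os
import shlex


def manual_profile_commands(host, source, remote_dir="/tmp/kernel-prof"):
    """Build the shell commands for one upload/compile/profile/download cycle.

    Every name here (host, remote_dir, binary and report names) is a
    hypothetical placeholder chosen for illustration.
    """
    remote_src = f"{remote_dir}/{os.path.basename(source)}"
    binary = f"{remote_dir}/kernel"
    report = f"{remote_dir}/report"
    return [
        # 1. Make sure the working directory exists on the GPU machine.
        ["ssh", host, f"mkdir -p {shlex.quote(remote_dir)}"],
        # 2. Upload the kernel source.
        ["scp", source, f"{host}:{remote_src}"],
        # 3. Compile remotely with nvcc.
        ["ssh", host, f"nvcc -o {shlex.quote(binary)} {shlex.quote(remote_src)}"],
        # 4. Profile with Nsight Systems.
        ["ssh", host, f"nsys profile -o {shlex.quote(report)} {shlex.quote(binary)}"],
        # 5. Download the report (assumes the .nsys-rep extension that
        #    recent nsys versions write; older versions used .qdrep).
        ["scp", f"{host}:{report}.nsys-rep", "."],
    ]


if __name__ == "__main__":
    for cmd in manual_profile_commands("user@gpu-box", "saxpy.cu"):
        print(shlex.join(cmd))
```

That is five round trips per edit to the kernel, before you even open the report.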

So we built Chisel: one command to profile any kernel, with zero local GPU hardware required.

Next up, we're planning a web dashboard for visualizing results, simultaneous profiling across multiple GPU types, and automatic resource cleanup. Please let us know what else you'd like to see in this project.

Available via PyPI: pip install chisel-cli

GitHub: https://github.com/Herdora/chisel

We're actively developing Chisel and would love community feedback. Feature requests and contributions are always welcome!