Compilation-Guided Energy Optimization for ML
EECS 583 Course Project
We developed a framework that dynamically balances compute, memory, and network resources at both the graph and operator levels. By strategically overlapping computation and communication, tuning CUDA kernel launch parameters, and re-tuning these settings as the GPU frequency changes, we achieved up to 23% energy savings in key training stages compared to previous methods, without sacrificing performance. Our results show that energy-optimal configurations often differ from those that minimize execution time, highlighting the need for dynamic, frequency-aware optimization in energy-efficient AI training.
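To make the two core ideas concrete, the sketch below illustrates (not the project's actual implementation) what frequency-aware tuning and compute/communication overlap can look like in practice: it reads the current SM clock through NVML, picks a kernel tile size from a hypothetical frequency-to-configuration table, and overlaps a host-to-device prefetch with a matmul using a separate CUDA stream. The table values, thresholds, and function names are illustrative assumptions, not measured results from this project.

```python
# Minimal sketch, assuming PyTorch + pynvml are available and a CUDA GPU is present.
# Not the project's framework: the frequency-to-tile table below is a hypothetical placeholder.
import pynvml
import torch

pynvml.nvmlInit()
_handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Hypothetical mapping from SM-clock bands (MHz) to an energy-favorable tile size.
FREQ_TO_TILE = [(1800, 256), (1200, 128), (0, 64)]


def current_sm_clock_mhz() -> int:
    """Read the GPU's current SM clock (MHz) from NVML."""
    return pynvml.nvmlDeviceGetClockInfo(_handle, pynvml.NVML_CLOCK_SM)


def pick_tile_size() -> int:
    """Choose a tile size for the current frequency band (illustrative policy)."""
    clock = current_sm_clock_mhz()
    for threshold, tile in FREQ_TO_TILE:
        if clock >= threshold:
            return tile
    return FREQ_TO_TILE[-1][1]


def overlapped_step(weights: torch.Tensor, next_batch_cpu: torch.Tensor) -> torch.Tensor:
    """Overlap the next batch's host-to-device copy with the current compute."""
    copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):
        # Asynchronous prefetch on a side stream (requires pinned memory).
        next_batch = next_batch_cpu.pin_memory().to("cuda", non_blocking=True)
    out = weights @ weights.T                              # compute on the default stream
    torch.cuda.current_stream().wait_stream(copy_stream)   # sync before consuming the prefetch
    return out + next_batch.sum()


if __name__ == "__main__":
    tile = pick_tile_size()
    print(f"SM clock: {current_sm_clock_mhz()} MHz -> tile size {tile}")
    w = torch.randn(tile, tile, device="cuda")
    b = torch.randn(tile, tile)
    print(overlapped_step(w, b).shape)
```

In a real system the lookup table would be populated by profiling energy per configuration at each supported frequency, and the overlap would apply to gradient all-reduce rather than a toy prefetch; the sketch only shows the control structure.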