HPC Andragogy: Automating Batch Scheduler Feedback

Kyriakos Tsoukalas

Volume 16, Issue 1 (March 2025), pp. 57–61

https://doi.org/10.22369/issn.2153-4136/16/1/11


BibTeX
@article{jocse-16-1-11,
  author={Kyriakos Tsoukalas},
  title={HPC Andragogy: Automating Batch Scheduler Feedback},
  journal={The Journal of Computational Science Education},
  year=2025,
  month=mar,
  volume=16,
  number=1,
  pages={57--61},
  doi={10.22369/issn.2153-4136/16/1/11}
}

This paper proposes a monitoring system that emails users feedback about their submitted jobs and can stop and resubmit jobs to a batch scheduler. The system has been implemented for a small supercomputing environment that runs a mix of high-performance computing (HPC) and high-throughput computing (HTC) jobs. User feedback includes alerts for over- and under-utilization of CPU and physical memory. The paper also discusses how the predefined system thresholds were chosen and proposes three algorithms: one for the monitoring system itself and two for predicting CPU and physical memory utilization. The two prediction algorithms rely on the user supplying the identification string (job ID) of a similar job that must have finished execution without errors. Finally, the code is shared in a git repository so that it is accessible for review.
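The abstract combines two ideas: threshold-based alerts for CPU and physical-memory utilization, and resource predictions derived from a similar job that finished cleanly. The sketch below shows one way such logic could look in Python. It is not the author's implementation: the threshold values, the `JobUsage` record layout, the `feedback`, `predict_request`, and `compose_email` helpers, the 20% memory headroom, and the sender address are all hypothetical assumptions, and reading accounting data from a particular scheduler is left out.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from email.message import EmailMessage

# Hypothetical thresholds (fractions of the requested resource); the
# paper's actual values are not reproduced here.
CPU_LOW, CPU_HIGH = 0.50, 1.10
MEM_LOW, MEM_HIGH = 0.25, 0.90


@dataclass
class JobUsage:
    """Accounting data for one job (hypothetical record layout)."""
    job_id: str
    cores_allocated: int
    cpu_seconds: float   # total CPU time summed over all cores
    wall_seconds: float  # elapsed wall-clock time
    mem_requested_mb: int
    mem_peak_mb: int     # peak resident (physical) memory

    def cpu_ratio(self) -> float:
        # 1.0 means every allocated core was busy for the whole run.
        return self.cpu_seconds / (self.cores_allocated * self.wall_seconds)

    def mem_ratio(self) -> float:
        return self.mem_peak_mb / self.mem_requested_mb


def feedback(job: JobUsage) -> list[str]:
    """Return alert lines for any resource outside the thresholds."""
    alerts = []
    cpu = job.cpu_ratio()
    if cpu < CPU_LOW or cpu > CPU_HIGH:
        kind = "under" if cpu < CPU_LOW else "over"
        alerts.append(
            f"Job {job.job_id}: CPU {kind}-utilized "
            f"({cpu:.0%} of {job.cores_allocated} allocated cores)"
        )
    mem = job.mem_ratio()
    if mem < MEM_LOW or mem > MEM_HIGH:
        kind = "under" if mem < MEM_LOW else "over"
        alerts.append(
            f"Job {job.job_id}: physical memory {kind}-utilized "
            f"(peak {job.mem_peak_mb} MB of {job.mem_requested_mb} MB requested)"
        )
    return alerts


def predict_request(reference: JobUsage, headroom_pct: int = 20) -> tuple[int, int]:
    """Suggest (cores, memory MB) for a new job from a similar finished job."""
    # Average number of busy cores during the reference run, rounded up.
    cores = max(1, math.ceil(reference.cpu_seconds / reference.wall_seconds))
    # Peak memory plus a safety margin, rounded up to whole MB.
    mem_mb = math.ceil(reference.mem_peak_mb * (100 + headroom_pct) / 100)
    return cores, mem_mb


def compose_email(job: JobUsage, to_addr: str) -> EmailMessage | None:
    """Build (but do not send) a feedback email; None if nothing to report."""
    alerts = feedback(job)
    if not alerts:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Resource usage feedback for job {job.job_id}"
    msg["From"] = "hpc-monitor@example.org"  # hypothetical sender address
    msg["To"] = to_addr
    msg.set_content("\n".join(alerts))
    # Sending would use e.g. smtplib.SMTP(host).send_message(msg).
    return msg
```

For example, a job that requested 8 cores and 32000 MB but kept only two cores busy for an hour (7200 CPU-seconds) and peaked at 4000 MB yields two under-utilization alerts, and `predict_request` on that job suggests `(2, 4800)`. Basing the suggestion on the reference job's measured usage rather than its original request is what makes the job-ID input informative in this sketch.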