Hi Everybody,
I'm currently gathering information about the various HPC schedulers out in the wild. My users and primary investors have helped me compile a list of features they need from a scheduler. I've pored over the SGE docs on the main site and have only found preliminary information about the features I'm looking for. So I figured the best way to find out whether SGE supports some of this stuff would be to ask those who use it, and thus here I am. If anyone can provide some insight or details on anything in this list, it would be greatly appreciated.
Thanks
--
Jason
Scheduler Requirements:
- some sort of group based shared accounting based on quotas or priority adjustment
- period based accounting preferred
- accounting for nodes that sit idle because large jobs are queued and waiting to run
- groups charged for % of the machine used not % of the active machine
- scheduling of nodes that do not affect the above mentioned group based accounting
- user/group level access control to queues and machines based on attributes
- on demand power capabilities
- cancel and resubmit preemption
- reports and statistics kept on the following:
  - machine load (based on the full machine possible)
  - node level load, uptime, downtime, etc. trendable over time and monitorable for point-in-time
  - group level usage from the above mentioned accounting system
  - user level usage from the above mentioned accounting system.
- Interactive Jobs
- Job Arrays
SGE Info
The best place to start is the Beginner's Guide to SGE. To answer your questions specifically:
- some sort of group based shared accounting based on quotas or priority adjustment
= Yes
- period based accounting preferred
= Yes
- accounting for nodes that sit idle because large jobs are queued and waiting to run
= You might be able to pull that out of the accounting database, but it's not something that we report directly.
- groups charged for % of the machine used not % of the active machine
= We account both CPU time and wallclock time.
- scheduling of nodes that do not affect the above mentioned group based accounting
= Not sure I understand this point. When you talk about accounting, are you talking about writing logs of usage, or are you talking about tracking usage for use in fair-share scheduling?
- user/group level access control to queues and machines based on attributes
= See if Resource Quota Sets are what you want.
- on demand power capabilities
= As of 6.2u3, this is supported through the Service Domain Manager module.
- cancel and resubmit preemption
= Yes, but it's a different concept of preemption from what you find in LSF, et al. What you really want is probably coming in about 6 months.
- reports and statistics kept on the following:
  - machine load (based on the full machine possible)
  = Yes
  - node level load, uptime, downtime, etc. trendable over time and monitorable for point-in-time
  = Yes
  - group level usage from the above mentioned accounting system
  = Yes
  - user level usage from the above mentioned accounting system.
  = Yes
- Interactive Jobs
= Yes
- Job Arrays
= Yes
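
On the "% of the machine used, not % of the active machine" point, here is a minimal Python sketch of the arithmetic as I read the requirement. It is not SGE code and it does not read the SGE accounting file. The record layout (group, slots, wallclock seconds), the function name group_shares, and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict


def group_shares(records, total_slots, period_seconds):
    """Compare charging against the full machine with charging against busy time.

    records: iterable of (group, slots, wallclock_seconds) tuples.
    This layout is hypothetical; it is not the SGE accounting format.
    """
    # Slot-seconds available if every slot were busy for the whole period.
    capacity = total_slots * period_seconds

    used = defaultdict(float)
    for group, slots, wallclock in records:
        used[group] += slots * wallclock

    busy = sum(used.values())

    # Share of the full machine: idle time stays in the denominator.
    of_full = {g: u / capacity for g, u in used.items()}
    # Share of the active machine: only slot-seconds that actually ran count.
    of_active = {g: u / busy for g, u in used.items()} if busy else {}
    # Fraction of the period's capacity that nobody used.
    idle = 1.0 - busy / capacity
    return of_full, of_active, idle


# Made-up sample: a 128-slot machine over one day.
records = [
    ("physics", 64, 3600),
    ("biology", 16, 7200),
    ("physics", 32, 1800),
]
of_full, of_active, idle = group_shares(records, total_slots=128, period_seconds=86400)

for group in sorted(of_full):
    print(f"{group}: {of_full[group]:.1%} of full machine, "
          f"{of_active[group]:.1%} of active machine")
print(f"idle: {idle:.1%}")
```

With these sample numbers, physics comes out at roughly 2.6% of the full machine but roughly 71% of the active machine, which is the gap the requirement is asking about. Whether wallclock or CPU time is the right quantity to multiply by slots is a policy choice; the sketch uses wallclock.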