SGE Scheduler Information

Hi Everybody,
I'm currently gathering information about the various HPC schedulers out in the wild. My users and primary investors have helped me compile a list of features they need from a scheduler. I've pored over the SGE docs on the main site but have found only preliminary information about the features I'm looking for, so I figured the best way to find out whether SGE supports them would be to ask those who use it. If anyone can provide insight or details on anything in this list, it would be greatly appreciated.

Thanks
--
Jason

Scheduler Requirements:

- some sort of group based shared accounting based on quotas or priority adjustment
- period based accounting preferred
- accounting for idle nodes that idle as a result of large jobs queued and waiting to run
- groups charged for % of the machine used not % of the active machine
- scheduling of nodes that do not affect the above mentioned group based accounting
- user/group level access control to queues and machines based on attributes
- on demand power capabilities
- cancel and resubmit preemption
- reports and statistics kept on the following:
  - machine load (based on the full machine possible)
  - node level load, uptime, downtime, etc. trendable over time and monitorable for point-in-time
  - group level usage from the above mentioned accounting system
  - user level usage from the above mentioned accounting system.
- Interactive Jobs
- Job Arrays

SGE Info

The best place to start is the Beginner's Guide to SGE.  To answer your questions specifically:

- some sort of group based shared accounting based on quotas or priority adjustment
= Yes

- period based accounting preferred

= Yes

- accounting for idle nodes that idle as a result of large jobs queued and waiting to run

= You might be able to pull that out of the accounting database, but it's not something that we report directly.

- groups charged for % of the machine used not % of the active machine

= We account both CPU time and wallclock time.

- scheduling of nodes that do not affect the above mentioned group based accounting

= Not sure I understand this point.  When you talk about accounting, are you talking about writing logs of usage, or are you talking about tracking usage for use in fair-share scheduling?

- user/group level access control to queues and machines based on attributes

= See if Resource Quota Sets are what you want.

- on demand power capabilities

= As of 6.2u3, this is supported through the Service Domain Manager module.

- cancel and resubmit preemption

= Yes, but it's a different concept of preemption from what you find in LSF, et al.  What you really want is probably coming in about 6 months.

- reports and statistics kept on the following:

- machine load (based on the full machine possible)

= Yes

- node level load, uptime, downtime, etc. trendable over time and monitorable for point-in-time

= Yes

- group level usage from the above mentioned accounting system

= Yes

- user level usage from the above mentioned accounting system.

= Yes

- Interactive Jobs

= Yes

- Job Arrays

= Yes
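
On the "% of the machine used, not % of the active machine" point: since SGE records wallclock time per job, one way to get that number would be to post-process the accounting data yourself. Below is a rough Python sketch of the arithmetic only. It does not call any SGE tool or API; the GroupUsage class, the charge_shares function, the group names, and all the figures are made up for illustration, and it assumes you have already summed wallclock slot-hours per group for the period.

```python
from dataclasses import dataclass


@dataclass
class GroupUsage:
    """Hypothetical per-group total for one accounting period."""
    name: str
    slot_hours: float  # wallclock slot-hours consumed by the group


def charge_shares(usages, total_slots, period_hours):
    """Return {group: (share_of_full_machine, share_of_active_machine)}.

    share_of_full_machine divides by everything the cluster could have
    delivered in the period; share_of_active_machine divides only by
    what was actually consumed.
    """
    capacity = total_slots * period_hours        # full machine, idle time included
    active = sum(u.slot_hours for u in usages)   # only the time that was used
    shares = {}
    for u in usages:
        full = u.slot_hours / capacity if capacity else 0.0
        act = u.slot_hours / active if active else 0.0
        shares[u.name] = (full, act)
    return shares


# Made-up example: a 100-slot cluster over a 10-hour period.
usages = [GroupUsage("physics", 300.0), GroupUsage("biology", 100.0)]
shares = charge_shares(usages, total_slots=100, period_hours=10)
for name, (full, act) in shares.items():
    print(f"{name}: {full:.0%} of full machine, {act:.0%} of active machine")
```

With these made-up numbers the two groups together account for 40% of the full machine, and the remaining 60% is idle capacity that charging against the active machine would hide.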