Dear All,
We have been running SGE on a Linux Intel cluster for about two months. Starting last week, the share tree numbers began "misbehaving":
- [Halflife is set to 90 days.] Users who ran simulations in the last 2 weeks no longer appear in the share tree (they did 10 days ago), even though they submitted jobs via qsub and "qacct -o" confirms that they did.
- The combined usage of the leaves does not add up to that of their parent nodes. One default node has a combined usage of 8.3e10, while its leaves have 1.4e6, 7.5e6 and 1.6e6.
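
To put rough numbers on both observations, here is a minimal Python sketch. It assumes plain exponential half-life decay, which may not match how the scheduler applies decay internally; the function name `remaining_fraction` is made up for illustration and is not part of SGE.

```python
# Usage values as reported in the share tree (from the numbers above).
leaf_usage = [1.4e6, 7.5e6, 1.6e6]
node_usage = 8.3e10

# Sum of the leaves and how far the node total is from it.
leaf_total = sum(leaf_usage)
ratio = node_usage / leaf_total


def remaining_fraction(days, halflife_days=90.0):
    """Fraction of usage left after `days`, ASSUMING simple
    exponential half-life decay (hypothetical helper, not an SGE API)."""
    return 0.5 ** (days / halflife_days)


print(f"sum of leaves:        {leaf_total:.3g}")
print(f"node / sum of leaves: {ratio:.0f}")
print(f"fraction left after 14 days: {remaining_fraction(14):.2f}")
```

Under that assumption, about 90% of a user's usage should still be present after two weeks, so decay alone should not remove anyone from the tree. The leaves sum to 1.05e7, which means the node total is roughly 7900 times larger than its leaves combined.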
We have no idea where this comes from. Has something similar been observed before, and if so, what was the fix?
If you need more information, please let us know what would help.
Many thanks in advance for the help.
Steffen