SGE Job Status and Return Values

Hi all,

I have developed an application that uses a grid environment to speed up execution when possible. I am currently porting it from Apple's Xgrid to SGE 6.2 and have two questions that I have not yet been able to answer:

1) Is there a reliable way to monitor job execution so that I know when a job has finished? Currently, after submitting jobs to the grid, my application polls the submitted jobs one by one using qstat -j with each job id. While a job is pending or running I get some output, but once it has finished all I get is an error message saying that the job id does not exist. What I'm looking for is a command that tells me, for a given job id, what the status of the job is: pending, running, finished or anything else. In Xgrid, this is done with xgrid -job attributes -id followed by the job id.
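A minimal Python sketch of such a status check, assuming the default plain qstat table layout (job id in the first column, a state code such as qw or r in the fifth) and assuming qacct -j exits with 0 only when the accounting file already holds a record for the job. Because qstat forgets a job once it has left the system, the sketch falls back to qacct at that point. The state-code mapping is deliberately rough, and run and job_status are illustrative names, not part of SGE.

```python
import subprocess


def run(cmd):
    """Run a command; return (exit_code, stdout) without raising on failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def parse_qstat_states(qstat_output):
    """Map job id -> state code ('qw', 'r', 'Eqw', ...) from plain qstat output.

    Assumes the default table layout: first column is the job id, fifth
    column is the state. Header lines are skipped because their first
    field is not numeric.
    """
    states = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[0]] = fields[4]
    return states


def job_status(job_id, runner=run):
    """Return 'pending', 'running', 'error', 'finished', 'unknown' or 'other:<state>'."""
    job_id = str(job_id)
    code, out = runner(["qstat"])
    state = parse_qstat_states(out).get(job_id) if code == 0 else None
    if state is not None:
        if "E" in state:
            return "error"  # e.g. Eqw: the job is stuck in an error state
        if "q" in state:
            return "pending"  # e.g. qw, hqw
        if "r" in state or "t" in state:
            return "running"  # running, or being transferred to an execution host
        return "other:" + state  # e.g. suspended
    # No longer listed by qstat: ask the accounting database instead.
    code, _ = runner(["qacct", "-j", job_id])
    return "finished" if code == 0 else "unknown"
```

The accounting record may appear a moment after the job leaves qstat, so a job can briefly show up as "unknown"; polling again after a short delay is a reasonable way to handle that.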

2) Is there a way, after submitting a binary job to the grid, to get its return value (aka exit code)? My application currently relies on this to determine whether a job ran successfully and branches accordingly. If that's not possible, what would be the best strategy to determine job success or failure? I've tried monitoring the contents of the jobname.o and jobname.e files that are produced, but that leads to inconsistent results: some jobs finish successfully while still writing to stderr.
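SGE keeps a per-job accounting record that can be read with qacct -j. A minimal Python sketch, assuming accounting is enabled on the cell and that each record is printed as "key value" lines separated by rows of "=" characters: the failed field reports whether SGE itself could run the job, and exit_status holds the job's own exit code, so success is taken here to mean both are 0. The helper names are hypothetical. Unlike scanning jobname.e, this does not count harmless stderr output as a failure.

```python
import subprocess


def parse_qacct(output):
    """Split qacct -j output into one dict per accounting record.

    Records are assumed to be separated by lines of '=' characters; every
    other line is treated as 'key<whitespace>value'.
    """
    records, current = [], {}
    for line in output.splitlines():
        if line.startswith("="):
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records


def job_succeeded(record):
    """True if SGE ran the job (failed == 0) and the job itself exited with 0."""
    # 'failed' can carry a reason after the number, e.g. "100 : assumedly after job".
    failed = int(record.get("failed", "1").split()[0])
    exit_status = int(record.get("exit_status", "-1").split()[0])
    return failed == 0 and exit_status == 0


def exit_code(job_id):
    """Return (failed, exit_status) from the last accounting record for job_id."""
    out = subprocess.run(["qacct", "-j", str(job_id)],
                         capture_output=True, text=True, check=True).stdout
    last = parse_qacct(out)[-1]
    return int(last["failed"].split()[0]), int(last["exit_status"].split()[0])
```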

Thanks and have a good weekend!

Burt

SGE Job Status through DRMAA

Hi all,
Even with the DRMAA API, I am not able to get the status of a job once it has finished executing. When the job finishes, I get an error message saying that the job id does not exist.
Any suggestions would be greatly appreciated.
Thanks
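With DRMAA, the final state of a job is normally collected with wait() rather than by querying its status afterwards; as far as I understand, once a job has left the system (or been reaped by wait()) its id is no longer valid for a status query. A sketch assuming the drmaa-python bindings are installed (the C and Java DRMAA bindings have equivalent drmaa_wait and Session.wait calls); describe and run_and_wait are hypothetical names.

```python
def describe(info):
    """Turn a DRMAA JobInfo (as returned by Session.wait) into a short verdict."""
    if info.wasAborted:
        return "aborted before running"
    if info.hasExited:
        return "exited with status %d" % info.exitStatus
    if info.hasSignal:
        return "killed by signal %s" % info.terminatedSignal
    return "finished, exit status unknown"


def run_and_wait(command, args=()):
    """Submit a command through DRMAA and block until it finishes."""
    # Imported lazily so that describe() can be used without the bindings.
    import drmaa

    with drmaa.Session() as session:
        jt = session.createJobTemplate()
        try:
            jt.remoteCommand = command
            jt.args = list(args)
            job_id = session.runJob(jt)
            # wait() collects the job's final state; after this the session
            # no longer tracks the job id.
            info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        finally:
            session.deleteJobTemplate(jt)
    return describe(info)
```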

Re: Enter the DRMAA

Daniel,

Thanks, this would be perfect if my application were written in C or Java, but it's neither. The app is a collection of PHP scripts that call qsub and qstat through the OS to submit and monitor jobs. I'm afraid there's no way I can rewrite my app at this time.
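Without DRMAA, one option that works from plain OS calls is qsub -sync y: qsub then blocks until the job ends and, when the job completes, exits with the job's own exit code. Combined with -b y for binary jobs, the calling script can branch on qsub's return code directly (in PHP, that is the third argument of exec()). A minimal Python sketch, assuming the installed SGE 6.2 qsub supports -sync; run_binary_sync and run_many are hypothetical names, and because each call blocks, several jobs are run from a thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_binary_sync(command, args=(), qsub="qsub"):
    """Submit a binary job with qsub -sync y -b y and return qsub's exit code.

    With -sync y, qsub waits for the job to finish and, if the job
    completes, exits with the job's own exit code.
    """
    cmd = [qsub, "-sync", "y", "-b", "y", command, *args]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def run_many(commands, qsub="qsub", max_parallel=8):
    """Run several binary jobs concurrently; return their exit codes in order.

    Each entry of commands is a list: the binary followed by its arguments.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda c: run_binary_sync(c[0], c[1:], qsub=qsub),
                             commands))
```

Each waiting job keeps one qsub process alive on the submit host, so max_parallel (an arbitrary default here) bounds how many jobs are in flight at once. qsub can also exit nonzero for its own reasons, for example when the job could not be run at all, so a nonzero code does not always come from the job itself.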

Any other suggestions?

Enter the DRMAA

I think what you're looking for is DRMAA.  It's an API for job submission and management, and it answers both your questions.  See here:

http://gridengine.sunsource.net/howto/drmaa.html

or here:

http://gridengine.sunsource.net/howto/drmaa_java.html

Daniel