Sun Grid Engine Family

This group is a home away from home for Grid Engine and Sun Grid Engine users on grid.org.  The objective of this group is to discuss Sun Grid Engine issues, questions, best practices, and needs in a community setting.

Some problems while installing SGE

Hello everyone,

I'm currently trying to install an SGE (6.2u5) cluster for my research internship, but I'm having some problems. I'm hoping somebody can give me a little help here.

I'm trying to install a very small cluster (1 master, 3 execution hosts). To install the master I followed the procedure provided by Sun (http://wikis.sun.com/display/gridengine62u5/How+to+Install+the+Master+Ho...). So far, no problems.
I also managed to install an execution daemon on the same host (http://wikis.sun.com/display/gridengine62u5/How+to+Install+Execution+Hos...) and submit a job to it.

Installing open source grid engine 6.2 (both u4 and u5): qmaster does not start

Hi,
I've been trying all day to install SGE 6.2 (I've tried both u4 and u5) on my cluster (16 nodes); it does not even install the master node! I am using the gui_installer, but the problem can easily be reproduced outside the installer as well.

It basically consists of two failures:

1) The daemon (launched by hand, e.g. sgeroot/bin/arch/sge_qmaster, or by /etc/init.d/sgemaster.p6444) dies after half a minute (I can see it among the running processes, e.g. in "top", but after a while it disappears).

2) qping does not find the daemon, even in the first 30 seconds when it is alive:

2nd Call for Chapters - Computational & Data Grids: Principles, Designs, and Applications

---------------------------------------------------------------------------
Please accept our apologies if you receive multiple copies of this Call for Chapters
---------------------------------------------------------------------------

Dear Colleagues,

We would like to introduce you to a forthcoming edited book, "Computational and Data Grids: Principles, Designs, and Applications", and to invite you to submit a chapter proposal or a paper before December 30.

Problem with architecture

Hello,

I was installing the software on a Fedora system, and when I try to add any host to the list, all of them appear in this state:

Resolvable: Temporary state. After the host is resolved, the GUI installer immediately tries to retrieve the host's architecture, if there are available threads in the resolve pool.

Any suggestions?

Thanks in advance

Ur

Share Tree numbers don't match

Dear All,
We have been running SGE on a Linux Intel cluster for about two months. Since last week, the share tree numbers have been "misbehaving".

- [Halflife is set to 90 days.] Users that ran simulations in the last 2 weeks are not in the share tree anymore (they were 10 days ago), although they ran something via qsub and although "qacct -o" says they did.

- The combined usage of the leaves does not add up to that of their nodes. One default node has a combined usage of 8.3e10, while its leaves have only 1.4e6, 7.5e6 and 1.6e6.
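As a rough sanity check on the two symptoms above, the sketch below assumes that usage decays as a plain exponential with the quoted 90-day half-life. That is an assumption, since the post does not give the scheduler's actual decay formula. It then compares the sum of the three leaf values with the node value. The function names are made up for illustration.

```python
HALF_LIFE_DAYS = 90.0  # half-life quoted in the post


def remaining_fraction(age_days, half_life_days=HALF_LIFE_DAYS):
    """Fraction of usage left after age_days, ASSUMING plain exponential decay."""
    return 0.5 ** (age_days / half_life_days)


def leaf_to_node_ratio(leaf_usages, node_usage):
    """Ratio of the summed leaf usage to the usage reported for the parent node."""
    return sum(leaf_usages) / node_usage


# Usage values quoted in the post.
leaves = [1.4e6, 7.5e6, 1.6e6]
node = 8.3e10

print(f"usage left after 14 days: {remaining_fraction(14):.1%}")
print(f"sum of leaves: {sum(leaves):.3g}")
print(f"leaves / node: {leaf_to_node_ratio(leaves, node):.2e}")
```

Under that assumption, usage from two weeks ago would still carry roughly 90% of its weight, so decay alone would not explain users dropping out of the tree. The leaves sum to about 1.05e7, nearly four orders of magnitude below the node's 8.3e10.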

Xlib: connection refused

New to the group, sorry if this question has been asked before. I googled a lot and could not find a good answer. I will admit up front that a lot of my confusion probably stems from not having a very good understanding of X Windows and the security/accessibility setup of X11 forwarding. Anyway, here goes; if any of this does not make sense, please ask me and I will try to clarify...

Question

Hello,

I have had a lot of problems during the installation of SGE on Windows (especially with the OpenSSH parameters). I would like to know if it's possible to install the master host on Linux and the "clients" on Windows machines.
I need it for my job, and I'm desperate :)

Thanks in advance

Break the application silo!

Next week Univa will be at Oracle Open World demonstrating a policy-enabled, dynamically managed Oracle E-Business Suite private cloud. Why talk about what you can when you can see it?

http://www.univaud.com/hpc/products/oracle-demo.php

Special thanks to our partners Sun, Oracle and Zeus Technology

 

qmaster won't start

Hello All,

I've installed Grid Engine 6.2u2 from binaries on Linux x86_64 (CentOS 5.3), using the GUI installer. The installer showed "processing" for the first host for a few minutes, then failed (and all other hosts failed by dependency). The file in install_logs shows:


starting sge_qmaster

sge_qmaster start problem

Reached 5min timeout, while waiting for qmaster PID file.
sge_qmaster daemon didn't start. Please check your
autoinstall configuration file! Installation failed!

newbie: have a Linux + Mac cluster?

Hello all,

I was a user of SGE at another institute, and now I'm trying to install it at the institute where I work now. For testing purposes, I have a Linux machine and a Mac. I would like to set up the Linux box as the master host and the Mac as an execution host. Is that possible? I tried to install both hosts at once with the GUI installer, as suggested on wikis.sun.com, but the Mac is always "resolvable".

thanks for any advice,
paula

6.2u3 exclusive host scheduling feature

I'm looking for suggestions on how best to request multiple cores on a single system. I was going down the PE path and then noticed the new 6.2u3 feature 'exclusive host scheduling'. It sounds like the new feature would work, but how would I get around having some systems with 4 cores and others with 16 cores? In other words, if a job that needs 4 cores uses the exclusive host feature but ends up on a 16-core system, doesn't the feature then freeze out the other 12 cores on that system?

Thanks, P

SGE Scheduler Information

license management question

Hello,
I'm trying to get Sun Grid Engine to manage licenses, like the pam_crash example.
However, it seems to track the resource on a host-by-host basis instead of on a queue basis.
I have a small setup: 3 hosts, 3 licenses. I'm trying to limit the number to 2, but it takes 2 per host. I tried setting consumable to "Jobs" as well as "yes", but it made no difference.
I hope this question makes sense, and that there's a way to get this done.
Host group?
Cheers,
Ric

SGE execution host installation issue

Hi everyone, I hope you can help me with a problem I've been having.

I'm new to the Grid concept and have only just begun to work with it over the last 2 weeks. My task is to set up 3 VMs running Ubuntu 9.04, with one functioning as a qmaster and the other two as execution hosts. So far (and with great difficulty) I've managed to install the qmaster on one of the VMs.

I've completed all the install prerequisites, such as setting up password-less SSH between machines and setting up an NFS share. The problem is that when I try to run the install_execd script I get the following error:

SGE Job Status and Return Values

Hi all,

I have developed an application that makes use of a grid environment to expedite execution when possible. I am currently porting to SGE 6.2 from Apple's Xgrid and have two questions which I have not yet been able to answer:

Unable to install execution node

I have installed the sge62u2_1 qmaster successfully on the master node. I am now getting the error below while trying to install an execution node.

------------------------------- Error --------------------------------------------------------------------------

Checking hostname resolving
---------------------------

Cannot contact qmaster. The command failed:

./bin/lx24-amd64/qconf -sh

The error message was:

error: could not get environment variable SGE_QMASTER_PORT or service "sge_qmaster"

You can fix the problem now or abort the installation procedure.
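The error message names two places where the port can come from: the SGE_QMASTER_PORT environment variable and a "sge_qmaster" service entry. The Python sketch below performs the same kind of lookup, which can help check what a given shell on the execution node actually sees. The order of the two checks and the function name are assumptions for illustration, not taken from the SGE source.

```python
import os
import socket


def find_qmaster_port(env=None):
    """Return the qmaster port as an int, or None if neither source provides one.

    Assumed lookup order: SGE_QMASTER_PORT first, then the "sge_qmaster"
    TCP entry in the system services database.
    """
    env = os.environ if env is None else env
    value = env.get("SGE_QMASTER_PORT")
    if value:
        return int(value)
    try:
        return socket.getservbyname("sge_qmaster", "tcp")
    except OSError:
        # No matching service entry on this machine.
        return None


port = find_qmaster_port()
if port is None:
    print("neither SGE_QMASTER_PORT nor a sge_qmaster service entry was found")
else:
    print("qmaster port:", port)
```

If this prints that neither source was found when run in the shell that starts install_execd, that shell is missing the same information the error message complains about.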

install_execd

Hi Friends,

I have used sge5.3 previously. In that setup, /usr/sge on the master node was exported through NFS to all the execution nodes, so the grid binaries were available to the whole cluster.

Now I am planning to use sge62u2_1. In this setup, can I have a separate /usr/sge directory on each execution node instead of sharing it from the master node? If yes, what mechanism can be used, and is there a detailed setup document for it?

I am getting the error below when trying to run install_execd on one of the execution nodes that does not have /usr/sge shared from the master node.

External load information in SGE

Hi

Is there any option to assign external values to a complex entry in SGE 6.2? I have created a complex and I want to assign external license availability to it.

My requirement is that no job should be allowed to run if there is no license available. Please give me some ideas on how to solve this issue.

--
Thanks&Regards
********************
Nobin

qmaster and execd on same host

Hi all,

I'm installing SGE on an OS X 10.5 server and, as these things usually go, I'm encountering some unforeseen and undocumented prompts. I've been scratching my head a little too much; time to ask for help.

I've gone through the complete qmaster installation on the server. I set the location of SGE_ROOT=/Network/Groups/sge62, a directory that is available to all workstations in my network, and SGE_CLUSTER_NAME=MEDICSGRID.

Univa - Interview about Intel Cluster Ready

The second video from Intel. Univa again describes the value the program brings to us as a middleware vendor and our end-users and partners.

http://software.intel.com/en-us/videos/hpc-univa-ud-intel-cluster-ready/

 

How Univa leverages Intel Cluster Ready

This video was filmed in Intel Studios in Santa Clara. In this video we describe how a small company like Univa can take advantage of Intel Cluster Ready. With ICR Univa is able to spend engineering time solving numerous yet-to-be-solved software management issues in a cluster. 

http://www.clusterconnection.com/2009/03/univa-ud-intel%C2%AE-cluster-re...

 

 

Problem during the installation

I was installing this software for the first time, and I had a problem during the installation. Here are the steps that I followed:
First of all, I work with Windows 2000 SP4 on an Active Directory domain, but my first idea was to install on a workgroup.

1) Installed Windows Services for UNIX and prepared the local users and the local groups.
2) Installed OpenSSH and tested it successfully.
3) Installed Sun Grid Engine; here is my problem

Univa wins Honorable Mention at BioIT Best Practices Awards!

Univa wins Honorable Mention at BioIT Best Practices Awards for our work with UniCloud and our partner (customer) Pathwork Dx.

UniCloud: HPC in the cloud info: http://www.univaud.com/hpc/products/unicloud.php

Customer Case Study: http://www.univaud.com/hpc/cs-data-processing-amazon-ec2.php

 

Univa to Present at Sun HPC Consortium in Hamburg, Germany

Univa will present at Sun HPC Consortium in Hamburg, Germany.

http://events-at-sun.com/hpc-hamburg09/agenda.php

4:45pm Sunday, June 21, 2009

Sun Grid Engine Question

I have never used Grid Computing and I've been looking into SGE. I believe I have a grasp of the SGE concept and capabilities, but I was wondering how it works in terms of the executables. For instance, if I have C++ code that I want to be able to run on a grid, do I need to distribute that code to each server in the grid?

Thanks in advance for any assistance.

SGE setup

Hi,

I am an SGE newbie.
I am trying to set up SGE on ONE of our servers (16 CPUs) to enable users to share resources (mainly CPUs) on this (only ONE) server. This server is dedicated to development and biological computing.

I want to manage 3 queues: one for "short" computing jobs (less than 2 hours), one for "medium" computing jobs (less than one day) and one for long computing jobs.

I have created 3 queues (set with default parameters except for slots and the hard time limit).

More Proof It Can Be Done

The Open HPC Management Interoperability group has been quietly plugging away on project deliverables around enabling DRM migration. After SC there will be numerous postings and updates.

In the meantime, here is a sneak peek: http://gridengine.info/2008/11/12/lsf-to-sge-migration-workshop-at-sc08

 

'qsub -sync y' in Torque?

Dear All,

I'm using Torque and I'd like to find a qsub option there equivalent to '-sync y' in SGE.

Thanks,

David

Simple Processor/Core Affinity in SGE/UniCluster

Hi All,

Recently we have received some questions on how to support processor/core binding in SGE on UniCluster. Some other commercial batch schedulers out there are claiming that they are the 'only batch system that can schedule to processor/cores'. Well, I'm happy to say that what they are saying is 'complete bunk'.

SGE does not bind jobs to processors or cores out of the box, but if you do a little reading and digging (Google is your friend) you will find the Linux taskset(1) command.

From the man page .....
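As a rough illustration of the affinity mechanism that taskset(1) controls, Python exposes the same Linux facility through os.sched_setaffinity, so a job wrapper could pin itself as in the sketch below. This only illustrates the mechanism; it is not how SGE or UniCluster performs binding, and the helper name is made up.

```python
import os


def pin_to_cpus(cpus, pid=0):
    """Restrict process pid (0 = the calling process) to the given CPU ids.

    Returns the resulting affinity set, or None where the call is unavailable
    (os.sched_setaffinity exists only on some platforms, notably Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


# Pin the current process to the lowest-numbered CPU it is allowed to use.
allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else set()
if allowed:
    first = min(allowed)
    print("pinned to:", pin_to_cpus({first}))
else:
    print("CPU affinity calls are not available on this platform")
```

On the command line, the same effect for a new process would be something like `taskset -c 0 command`, where `command` stands for the job's executable.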

Sun Grid Engine 6.2 released

The 6.2 release of the Sun Grid Engine software is now available for
download from the Sun Download Center! (The 6.2 courtesy binaries for
the Grid Engine open source project aren't available yet. It will
probably be another week or two.) You can find more information on my
blog: http://blogs.sun.com/templedf/entry/sun_grid_engine_6_2
