# SGE6.2_Ubuntu_20.04_Installation_guide

Example SGE master node installation

* Hostname: NODENAME

## Pre-requisites

Remove the entry 127.0.0.1 (or 127.0.1.1) that points to the hostname from ```/etc/hosts```. Here it is commented out:

```bash
127.0.0.1 localhost
#127.0.1.1 NODENAME

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
```

## SGE master installation

Copy the contents of this repo to the host and, with ```sudo```, run:

```bash
mkdir /opt/sge6-2/
cp ge62u5.tar.gz install_sge.sh /opt/sge6-2/
cd /opt/sge6-2/
chmod +x install_sge.sh
./install_sge.sh
```

The folder now has the following structure:

```bash
.
├── 3rd_party
├── bin
├── catman
├── ckpt
├── doc
├── dtrace
├── examples
├── ge6.2u5
├── ge62u5.tar.gz
├── include
├── install_execd
├── install_qmaster
├── install_sge.sh
├── inst_sge
├── lib
├── man
├── mpi
├── pvm
├── qmon
├── start_gui_installer
├── util
└── utilbin
```

Now we proceed with the SGE master installation:

```bash
cd /opt/sge6-2/
./install_qmaster
```

The sequence of questions and answers is the following (a prompt with nothing after ```>>``` means the default was accepted):

```bash
Do you agree with that license? (y/n) [n] >> y
Hit <RETURN> to continue >>
Do you want to install Grid Engine under an user id other than >root< (y/n) [y] >> n
Hit <RETURN> to continue >>
If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/sge6-2] >> /opt/sge6-2
Hit <RETURN> to continue >>
(default: 2) >> 2
Hit <RETURN> to continue >>
(default: 2) >>
Hit <RETURN> to continue >>
Enter cell name [default] >> default
Enter new cluster name or hit <RETURN> to use default [p6444] >> cluster_name
Hit <RETURN> to continue >>
Enter a qmaster spool directory [/opt/sge6-2/default/spool/qmaster] >>
Hit <RETURN> to continue >>
Are you going to install Windows Execution Hosts? (y/n) [n] >> n
Did you install this version with >pkgadd< or did you already verify and set the file permissions of your distribution (enter: y) (y/n) [y] >> y
We do not verify file permissions.
Hit <RETURN> to continue >>
Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y
Hit <RETURN> to continue >>
Do you want to enable the JMX MBean server (y/n) [y] >> y
Enter JAVA_HOME (use "none" when none available) [] >> none
Please enter additional JVM arguments (optional, default is [-Xmx256m]) >> -Xmx256m
Please enter an unused port number for the JMX MBean server [6446] >> 6446
Enable JMX SSL server authentication (y/n) [y] >> y
Enable JMX SSL client authentication (y/n) [y] >> y
Enter JMX SSL server keystore path [/var/sgeCA/sge_qmaster/default/private/keystore] >> /var/sgeCA/sge_qmaster/default/private/keystore
Enter JMX SSL server keystore pw (at least 6 characters) >> ******
Using the following JMX MBean server settings.
libjvm_path >jvm_missing<
Additional JVM arguments >-Xmx256m<
JMX port >6446<
JMX ssl >true<
JMX client ssl >true<
JMX server keystore >/var/sgeCA/sge_qmaster/default/private/keystore<
JMX server keystore pw >******<
Do you want to use these data (y/n) [y] >> y
Hit <RETURN> to continue >>
Hit <RETURN> to continue >>
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> berkeleydb
Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> n
Hit <RETURN> to continue >>
Default: [/opt/sge6-2/default/spool/spooldb] >> /opt/sge6-2/default/spool/spooldb
Hit <RETURN> to continue >>
Please enter a range [20000-20100] >> 20000-20100
Using >20000-20100< as gid range.
Hit <RETURN> to continue >>
Default: [/opt/sge6-2/default/spool] >> /opt/sge6-2/default/spool
Default: [none] >> none
Do you want to change the configuration parameters (y/n) [n] >> n
Hit <RETURN> to continue >>
```

The following errors are expected during the installation and can be ignored:

```bash
util/sgeCA/sge_ca: 1: eval: lx24-amd64=/opt/sge6-2/lib/lx24-amd64:-amd64: not found
util/sgeCA/sge_ca: 1749: export: lx24-amd64: bad variable name
Error: Cannot create keystore /var/sgeCA/sge_qmaster/default/private/keystore
util/sgeCA/sge_ca: 1: eval: lx24-amd64=/opt/sge6-2/lib/lx24-amd64:-amd64: not found
util/sgeCA/sge_ca: 1749: export: lx24-amd64: bad variable name
./inst_sge: 1204: cannot create /var/sgeCA/sge_qmaster/default/private/keystore.password: Directory nonexistent
chown: invalid user: 'default'
```

The cluster commands (qsub, qstat, etc.) need some SGE variables in the environment. Add the following line to the ```.bashrc``` of every user that will use the cluster:

```bash
source /opt/sge6-2/default/common/settings.sh
```

To start the sgemaster process:

```bash
$ /opt/sge6-2/default/common/sgemaster
starting sge_qmaster
```

Then, ```qstat -f``` should not show any error:

```bash
$ qstat -f
$
```

### SGE master setup

We will add the host as an execution host:

```bash
qconf -ae
```

Set the hostname in the editor that opens:

```bash
hostname NODENAME
load_scaling NONE
complex_values NONE
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
```

Now we create the host group (```qconf -ahgrp``` takes the group name as its argument):

```bash
qconf -ahgrp @allhosts
```

Define the host group ```@allhosts``` with this host:

```bash
group_name @allhosts
hostlist NODENAME
```

Add a new queue:

```bash
qconf -aq
```

Set the values as needed; here the queue is ```all.q```, it runs on ```@allhosts```, and NODENAME gets 24 slots:

```bash
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 19
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make
rerun FALSE
slots 1,[NODENAME=24]
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
```

Add a user, in this case ```NEWUSER```:

```bash
qconf -auser
```

```
name NEWUSER
oticket 0
fshare 0
delete_time 0
default_project NONE
```

Add a new user list, ```NEWUSERS```, containing ```NEWUSER```:

```bash
qconf -au NEWUSER NEWUSERS
```

Modify the queue so that it includes this user list:

```bash
qconf -mq all.q
```

```bash
...
notify 00:00:60
owner_list NONE
user_lists NEWUSERS
xuser_lists NONE
subordinate_list NONE
...
```

Launch the execution daemon (```sgeexecd```):

```bash
$ cd /opt/sge6-2/default/common
$ ./sgeexecd
starting sge_execd
```

If the queue ends up in an error state, clear it with:

```bash
qmod -c all.q@NODENAME
```

We can restart the services, just in case:

```bash
$ cd /opt/sge6-2/default/common
$ ./sgeexecd stop
Shutting down Grid Engine execution daemon
$ ./sgemaster stop
shutting down Grid Engine qmaster
$ ./sgemaster start
starting sge_qmaster
$ ./sgeexecd start
starting sge_execd
```

Add the host to the list of submit hosts:

```bash
$ qconf -as NODENAME
NODENAME added to submit host list
```

## Useful commands

Cluster overview

```bash
qstat -f
```

Overview of the jobs of a given user

```bash
qstat -u NEWUSER
```

Get job information

```bash
qstat -j JOB_ID
```

Get information from a finished job

```bash
qacct -j JOB_ID
```

Cancel a job

```bash
qdel JOB_ID
```

Hold a queued job

```bash
qhold JOB_ID
```

Release a job in hold state

```bash
qrls JOB_ID
```

Modify the requirements of a job in the queue

```bash
# The job was submitted with cpu_slots=2; change it to 1
qalter -l 'cpu_slots=1,h_vmem=infinity,virtual_free=51200M' JOB_ID
```
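
Submit a test job

To check that the queue accepts jobs, we can submit a small test job. The following is only a minimal sketch, not part of the original installation: the job name ```sge_test``` and the script path under ```/tmp``` are hypothetical, and it assumes that ```settings.sh``` has been sourced and that the submitting user belongs to the ```NEWUSERS``` list allowed on ```all.q```. If ```qsub``` is not on the PATH, the script only writes the job file and prints a hint.

```shell
#!/usr/bin/env bash
# Sketch: write a tiny test job and submit it to all.q when qsub is available.
set -eu

# Hypothetical location for the test job script (not part of the guide).
JOB_SCRIPT="${TMPDIR:-/tmp}/sge_test_job.sh"

# Lines starting with "#$" are read by qsub as submit options:
#   -N  job name (hypothetical)   -q  target queue
#   -cwd  run in the submit dir   -j y  merge stderr into stdout
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
#$ -N sge_test
#$ -q all.q
#$ -cwd
#$ -j y
echo "Running on $(hostname)"
sleep 10
EOF
chmod +x "$JOB_SCRIPT"

if command -v qsub >/dev/null 2>&1; then
    # Submit the job, then list the jobs of the current user.
    qsub "$JOB_SCRIPT"
    qstat -u "$(whoami)"
else
    echo "qsub not found; source /opt/sge6-2/default/common/settings.sh first" >&2
fi
```

While the job is pending or running it should appear in ```qstat -u NEWUSER```; once it has finished, ```qacct -j JOB_ID``` shows its accounting record.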