How It Works

GHPC provides two VM appliances that you can deploy on your chosen platform (e.g. libvirt, public cloud). These appliances provide the tools needed to create, deploy and manage multiple HPC & AI stacks ("clusters"). The Director & Domain sit in a "Domain Network", and the Director creates Controllers.

  • Director: Configures, deploys & manages your clusters
  • Domain: Identity server for user & host management
  • Controller: The service provider for a cluster
  • Login: End-user access to a cluster
  • Compute: Resources for running workloads

Get Started with v1.4

GHPC has both an example and a regular deployment. Read on to find out more about each method and to determine which is best for you.

Example Deployment

GHPC offers an "Example Deployment", which provides a quick preview of the GHPC stack. For regular deployments, scroll down past these instructions.

The Example Deployment creates a Director, Domain, Cluster Controller, Cluster Login and 2 Cluster Compute nodes, all as VMs entirely within the VM host.

Prerequisites

Ensure your platform is ready for appliance deployment. We recommend having at least 50GB of available disk space in your libvirt root directory and at least 10 cores & 32GB RAM available/unused for VMs.
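
Before running the quickstart, it can help to confirm the host actually has this headroom. A minimal sketch using standard Linux and libvirt commands, assuming the default libvirt image path of /var/lib/libvirt/images (adjust if your libvirt root differs):

    # free disk space where libvirt stores images
    df -h /var/lib/libvirt/images

    # CPU cores and memory available on the host
    nproc
    free -h

    # libvirt's own summary of host resources
    virsh nodeinfo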

Method

To create an Example Deployment, run our quickstart script, which will:

  1. Download the Director & Domain images
  2. Launch the Director & Domain
  3. Preconfigure the Site & Cluster Information
  4. Create VMs for Cluster Controller, Login & Compute Nodes

Run the script below on your VM host as a user with permission to create resources:

    curl -L https://repo.openflighthpc.org/ghpc/v1.4/ghpc-example.sh | /bin/bash

Once complete, the stack can be accessed from the VM host (default password: p4ssw0rd1;):

  • As a site admin to the Director: ssh root@192.168.37.10 and begin modifying diskless images and managing IPA (on Domain)
  • As a cluster admin to the Demo Cluster Controller (via the Demo Cluster Login) to begin managing the SLURM scheduler:

        ssh demoadmin@192.168.37.20
        ssh controller

  • As a cluster user to the Demo Cluster Login: ssh demouser@192.168.37.20 and begin using the SLURM scheduler, for example with the commands sketched below:
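
A minimal sanity check of the SLURM scheduler once logged in as demouser on the Demo Cluster Login; these are standard SLURM commands rather than anything GHPC-specific:

    # show partitions and the state of the two compute nodes
    sinfo

    # run a trivial job on one compute node
    srun -N1 hostname

    # list queued and running jobs
    squeue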

The Example Deployment should not be used beyond review/testing

To delete/remove the Example Deployment, simply run the script with the --destroy argument, such as:

    curl -L https://repo.openflighthpc.org/ghpc/v1.4/ghpc-example.sh | /bin/bash -s -- --destroy

Regular Deployment

The regular deployment of GHPC is intended for creating a diskless cluster stack of bare metal machines. This deployment prepares your VM host with the appliances needed to manage and deploy these resources.

Prerequisites

Ensure your platform is ready for appliance deployment. You will need a libvirt virtualisation host with physical connections to your "External" network and your "Cluster" network. Make a note of the bridge names for each network on your libvirt host, as these will be required during setup. We recommend having at least 50GB of available disk space in your libvirt root directory and at least 4 cores & 8GB RAM available/unused for VMs.
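
A quick way to identify the bridge names on the host; these are standard Linux/libvirt commands, and the bridge name in the last example (clusterbr) is only a placeholder:

    # list all bridge interfaces defined on the host
    ip -brief link show type bridge

    # bridges managed by libvirt (if any) also appear here
    virsh net-list --all

    # show which physical NIC is enslaved to a given bridge
    ip link show master clusterbr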

The "External" network: This gives access to the rest of your site or even to the Internet depending on your organisation's setup. The Director VM and Login nodes are connected to it and this is how they are accessed.

The "GHPC" network: This is the network the director, domain and all cluster systems are connected to. It's separated by subnet (10.178.0.0/19 for admin connections (Director, Domain, Controllers), 10.10.0.0/19 for Clusters) but must be 1 physical network.

Method

  1. Download both the Director & Domain appliance images below to /opt/vm/ghpc/ on your VM host
  2. The Director image is especially large (~20GB), so it is recommended to download it using a combination of screen and wget so that the transfer is less likely to be disrupted. For example: screen -dm wget -O /opt/vm/ghpc/director.qcow2 https://repo.openflighthpc.org/ghpc/v1.4/director.qcow2
  3. Launch the Director VM:

        CLUSTER_NET=clusterbr # Host bridge name for your Cluster Network
        EXT_NET=extbr         # Host bridge name for your External Network
        virt-install --name=ghpc-director --boot uefi --ram=4096 --vcpus=2 --import --disk path=/opt/vm/ghpc/director.qcow2,format=qcow2 --os-variant=rocky9 --network bridge=${CLUSTER_NET},model=virtio --network bridge=${EXT_NET},model=virtio --noautoconsole

  4. Launch the Domain VM (requires at least 2 cores and 4GB RAM):

        CLUSTER_NET=clusterbr # Host bridge name for your Cluster Network
        virt-install --name=ghpc-domain --boot uefi --ram=4096 --vcpus=2 --import --disk path=/opt/vm/ghpc/domain.qcow2,format=qcow2 --os-variant=rocky9 --network bridge=${CLUSTER_NET},model=virtio --noautoconsole

  5. Log into the Director as root (default password: p4ssw0rd1;). The Director is configured to use DHCP on the external network, so you'll need to identify the address it has been given by your DHCP server. Either refer to your DHCP server or log in with the virsh console ghpc-director command to identify the correct IP address (see the sketch after these steps).
  6. Read the embedded docs:

        ghpc-docs
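
Once both appliances have been launched, a few generic virsh commands can confirm they are running and help track down the Director's address on the external network; nothing here is GHPC-specific:

    # confirm both appliances are defined and running
    virsh list --all

    # list the Director's interfaces; the MAC address on the external
    # bridge can be matched against your DHCP server's lease table
    virsh domiflist ghpc-director

    # or attach to the serial console and check the address from
    # inside the guest (detach with Ctrl+])
    virsh console ghpc-director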