CS 686 Big Data

Using the Bass Cluster

The department has a cluster of dual-CPU, dual-core machines (four cores per machine) called bass. To reach these machines, you must first ssh to stargate.cs.usfca.edu and then log into the nodes you want. The hostnames for these machines are bass01 through bass24 (bass01, bass02, bass03, and so on).

Passwordless ssh

To ease the development process, I highly recommend setting up passwordless ssh. Use ssh-keygen to generate a public and private ssh key pair:

[mmalensek@bass06]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home4/mmalensek/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home4/mmalensek/.ssh/id_rsa
Your public key has been saved in /home4/mmalensek/.ssh/id_rsa.pub
The key fingerprint is:
(etc)

[mmalensek@bass06]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[mmalensek@bass06]$ chmod 700 ~/.ssh
[mmalensek@bass06]$ chmod 600 ~/.ssh/authorized_keys

You can leave all of the prompts blank (including the passphrase). Most people are okay with this approach, but you should know that doing so allows anyone who has your id_rsa file to log in as you. In other words, don’t share your id_rsa file, which is your private key (the public key, id_rsa.pub, is fine to share though). If you don’t want to do things this way, the alternative is to set a passphrase and use ssh-agent. See Using ssh-agent with ssh by Mark A. Hershberger to get you going.

Since our lab machines have shared home directories, you can now log into any of the machines in the department without a password.
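A quick way to confirm the key setup worked is to have ssh fail instead of prompting. The following is a minimal sketch, assuming OpenSSH; the function name check_nodes and the overridable SSH variable are made up for this example and are not part of any course tooling.

```shell
#!/bin/sh
# Sketch: report whether key-based login works on each host given.
# SSH can be overridden (hypothetical convenience) to try the loop
# without touching the network.
SSH="${SSH:-ssh}"

check_nodes() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing or
        # rejected key fails immediately instead of waiting for input.
        if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example, run from any bass node:
#   check_nodes bass01 bass02 bass03
```

A FAILED line usually points to the permissions on ~/.ssh or authorized_keys, so recheck the chmod commands above.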

You can also run ssh-keygen on your own laptop and then copy the resulting public key over to stargate with:

ssh-copy-id mmalensek@stargate.cs.usfca.edu

That way you can log in from your laptop and bounce wherever you want to go without a password.
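If your laptop has a reasonably recent OpenSSH (ProxyJump was added in 7.3), you can also let ssh do the bounce through stargate for you. This is a sketch under that assumption, and it also assumes your laptop's key is accepted on the bass nodes thanks to the shared home directories. The function name print_bass_config is made up for this example, and the "bass??" pattern matches any host named bass plus two characters.

```shell
#!/bin/sh
# Sketch: print a client-side ssh config stanza that reaches the bass
# nodes through stargate. Pass your own username as the first argument.
print_bass_config() {
    user="$1"
    cat <<EOF
Host stargate
    HostName stargate.cs.usfca.edu
    User $user

Host bass??
    User $user
    ProxyJump stargate
EOF
}

# Review the output first, then append it on your laptop:
#   print_bass_config mmalensek >> ~/.ssh/config
```

With that stanza in place, "ssh bass06" from your laptop should connect via stargate in one step.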

Compiling and Running Your Software

Once you’re all set with logging in, you can clone your project repository from GitHub and compile it with Maven (if you used the project starter files):

[mmalensek@bass06]$ git clone https://github.com/cs686-bigdata/p1-malensek
[mmalensek@bass06]$ cd p1-malensek
[mmalensek@bass06]$ /usr/local/maven/bin/mvn clean package

This will produce a jar in the ‘target’ directory with all the project dependencies bundled. Finally, log into a few nodes and start up your components:

[mmalensek@bass01]$ java -cp ~/p1-malensek/target/dfs-1.0.jar edu.usfca.cs.dfs.Controller
[mmalensek@bass02]$ java -cp ~/p1-malensek/target/dfs-1.0.jar edu.usfca.cs.dfs.StorageNode bass01
[mmalensek@bass03]$ java -cp ~/p1-malensek/target/dfs-1.0.jar edu.usfca.cs.dfs.StorageNode bass01

In this example, my storage nodes take a command-line argument (here, bass01) that tells them where the controller is. For this assignment, you can run the client application from your laptop, stargate, etc.
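Once passwordless ssh works, starting every component by hand gets tedious, so you may want a small launcher. Here is a rough sketch that reuses the jar path, class names, and hosts from my example above; the function names, the DRY_RUN switch, and the log file naming are assumptions made for illustration, so adapt them to your own project. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: start one Controller and several StorageNodes over ssh.
# Values below come from the example above; change them for your project.
JAR='~/p1-malensek/target/dfs-1.0.jar'   # tilde is expanded on the remote side
PKG=edu.usfca.cs.dfs
CONTROLLER=bass01
STORAGE_NODES="bass02 bass03"

launch() {
    host="$1"; shift
    # Home directories are shared, so each host writes to its own log file.
    # Redirecting all three streams lets ssh return right away.
    cmd="nohup java -cp $JAR $* > ~/dfs-$host.log 2>&1 < /dev/null &"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "ssh $host \"$cmd\""      # dry run: show the command only
    else
        ssh "$host" "$cmd"
    fi
}

launch_all() {
    launch "$CONTROLLER" "$PKG.Controller"
    for node in $STORAGE_NODES; do
        # Each storage node is told where the controller lives.
        launch "$node" "$PKG.StorageNode" "$CONTROLLER"
    done
}

# Print the commands:      launch_all
# Actually start things:   DRY_RUN=0; launch_all
```

Remember that processes started this way keep running after you log out, so stop them yourself when you are done.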