NYPD Crime #4 – Spinning Up EMR Cluster (Continued)

This is a direct continuation of the last post. That one had already gotten way too long, so I’m starting another for organization’s sake. Wow, that sounded professional for a second.

Spinning It Up

Okay, so I clicked “Create Cluster” after finishing step 4 in the AWS EMR console.

It looks like the cluster is up. It took about 10 minutes to reach this stage, with the two Running statuses showing up. The status up top still says “Waiting”, which, as it turns out, just means the cluster is running and idle, waiting for work (steps) to be submitted. If I click Spark under Connections, sure enough, the Spark web interface comes up.
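The console isn’t the only place to watch these states. The EMR API reports the same ones (STARTING, BOOTSTRAPPING, RUNNING, WAITING, and the TERMINATING/TERMINATED family), so you can poll instead of refreshing the page. Here’s a minimal sketch, assuming boto3 is installed and AWS credentials are configured. wait_until_ready is a hypothetical helper of mine, not something from the console or from AWS, and the client is passed in so the loop can be tried without a real cluster:

```python
import time

# EMR cluster states in which the cluster is up and usable.
# WAITING = running but idle, waiting for steps to be submitted.
READY_STATES = {"RUNNING", "WAITING"}

# States from which the cluster will never become ready.
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(emr_client, cluster_id, poll_seconds=30, sleep=time.sleep):
    """Poll describe_cluster until the cluster is RUNNING or WAITING.

    emr_client is assumed to be a boto3 EMR client (or anything with a
    compatible describe_cluster method). sleep is injectable for testing.
    Returns the final state string.
    """
    while True:
        response = emr_client.describe_cluster(ClusterId=cluster_id)
        state = response["Cluster"]["Status"]["State"]
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"Cluster {cluster_id} ended up in state {state}")
        # STARTING or BOOTSTRAPPING: not ready yet, check again later.
        sleep(poll_seconds)
```

With a real cluster you’d call it as wait_until_ready(boto3.client("emr"), "j-XXXXXXXXXXXXX"), where the j- ID is a placeholder for your own cluster ID from the console. boto3 also ships a built-in EMR waiter, emr.get_waiter("cluster_running"), that does essentially the same thing.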

What does this page mean? No clue, man. That’s for everyone else to know and for me to find out. Everything looks up, so let’s try to SSH into the master node (generally, the master node houses the cluster’s applications so as not to eat into the processing power and RAM of the worker nodes).

SSH

The SSH link gives us a pre-constructed SSH command. Awesome. The one for my box looks something like:

ssh -i ~/ec2-user.pem hadoop@ec2-34-232-50-63.compute-1.amazonaws.com

I just have to make sure the -i path matches where ec2-user.pem lives on my local machine, and I should be in.
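One gotcha at this step: OpenSSH refuses a private key that group or other users can access, and bails out with a “WARNING: UNPROTECTED PRIVATE KEY FILE!” message. The usual fix is chmod 400 ~/ec2-user.pem. Here’s a small Python sketch of that check and fix, assuming a POSIX machine. key_is_private_enough and lock_down_key are hypothetical helpers, and the check is a simplification of what ssh actually does:

```python
import os
import stat


def key_is_private_enough(path):
    """Return True if no group/other permission bits are set on the key file.

    Simplified version of OpenSSH's check: a key accessible by group or
    others (e.g. mode 0644) gets rejected as an unprotected private key.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def lock_down_key(path):
    """Equivalent of `chmod 400 path`: read-only for the owner, nothing for anyone else."""
    os.chmod(path, stat.S_IRUSR)
```

Running lock_down_key on the .pem file once is enough; the permissions stick until something changes them.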

~
                     ,----..                     
           .---.    /   /   \              .---. 
          /. ./|   /   .     :            /. ./| 
      .--'.  ' ;  .   /   ;.  \       .--'.  ' ; 
     /__./ \ : | .   ;   /  ` ;      /__./ \ : | 
 .--'.  '   \' . ;   |  ; \ ; |  .--'.  '   \' . 
/___/ \ |    ' ' |   :  | ; | ' /___/ \ |    ' ' 
;   \  \;      : .   |  ' ' ' : ;   \  \;      : 
 \   ;  `      | '   ;  \; /  |  \   ;  `      | 
  .   \    .\  ;  \   \  ',  /    .   \    .\  ; 
   \   \   ' \ |   ;   :    /      \   \   ' \ | 
    :   '  |--"     \   \ .'        :   '  |--"  
     \   \ ;         `---`           \   \ ;     
      '---"                           '---"

Thank you so much for the ASCII art, Amazon! You’ve certainly sold me as a customer!

Jupyter

Anyways, the first thing I tried was to open Jupyter Notebook and, of course, it isn’t installed.

sudo pip install jupyter

Done.

Here, Jupyter starts on port 8889 because something else already holds 8888 (I assumed Zeppelin, though EMR’s docs list Zeppelin on 8890 and Hue on 8888). When I go to the localhost link that Jupyter spits out, I don’t get anything. I forgot to port forward 8889. Well, forget is the wrong word… I never even knew it would be on 8889. Let’s re-SSH in with the -L port forwarding flag:

ssh -L 8889:localhost:8889 -i ~/ec2-user.pem hadoop@ec2-34-232-50-63.compute-1.amazonaws.com
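For the record, -L reads as local_port:destination_host:destination_port, and the destination is resolved from the master node’s side, so localhost there means the master node itself. If you end up tunneling more than one port, building the command programmatically keeps it tidy. A minimal sketch; build_ssh_command is a hypothetical helper, and its defaults (user hadoop, same port number on both ends) just mirror the command above:

```python
import os


def build_ssh_command(key_path, host, user="hadoop", forward_ports=()):
    """Build an ssh argv that forwards each port to the same port on the remote host.

    "-L port:localhost:port" tunnels `port` on this machine to `port` on the
    master node; "localhost" is resolved on the remote side.
    """
    cmd = ["ssh"]
    for port in forward_ports:
        cmd += ["-L", f"{port}:localhost:{port}"]
    # Expand ~ ourselves, since no shell will do it when the list is run directly.
    cmd += ["-i", os.path.expanduser(key_path), f"{user}@{host}"]
    return cmd
```

Handing the result to subprocess.run(build_ssh_command("~/ec2-user.pem", "ec2-34-232-50-63.compute-1.amazonaws.com", forward_ports=[8889])) should open the same session as typing the command by hand; keeping it as a list sidesteps shell quoting.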

Perfect! We’re in!
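As for why Jupyter landed on 8889 in the first place: when its configured port is taken, Jupyter Notebook tries other ports instead (a few sequential ones first, then random ones, as far as I can tell from its port_retries setting). Here’s a simplified standard-library sketch of the sequential part; port_is_free and first_free_port are hypothetical helpers, not Jupyter’s actual code:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Something (e.g. another web UI) already holds this port.
            return False
        return True


def first_free_port(start=8888, tries=5, host="127.0.0.1"):
    """Try start, start+1, ... and return the first port we can bind."""
    for port in range(start, start + tries):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"No free port in {start}-{start + tries - 1}")
```

On the master node, first_free_port() would return 8889 whenever 8888 is already taken and 8889 isn’t, which is exactly the port that then needs forwarding.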

This is usually when I’d start writing some code… Before we do that, though, we should probably explore the other tool we’re about to dive into: Spark.

Let’s review Spark in the next post.

 
