Tag Archives: Apache Hadoop (Software)

Hands-On Big Data Part 3 – Hadoop on EMR (Amazon Web Services)

Ryan Womack, Data Librarian, Rutgers University. http://ryanwomack.com, Twitter: @ryandata, https://github.com/ryandata/bigdata. Screencast version of a workshop originally delivered at the IASSIST (iassistdata.org) Annual Conference, June 2015. Part 3 steps through accessing Amazon Web Services and launching a Hadoop cluster via Amazon Elastic MapReduce (EMR).

Amazon EMR Masterclass

Amazon EMR enables fast processing of large structured or unstructured datasets. In this recorded webinar we’ll show you how to set up an Amazon EMR job flow to analyse application logs and run Hive queries against them. We will also review best practices around data file organisation on Amazon Simple Storage Service (S3), how clusters can be started from the ...
