Hadoop in Practice, Second Edition

Author Name : Alex Holmes
Price INR :
ISBN 13 : 9789351197423
Release Date : Dec, 2014
With CD : No
Pages : 510
Format : Paper Back
Publisher : Dreamtech Press
Product ID: 2623


It’s always a good time to upgrade your Hadoop skills! Hadoop in Practice, Second Edition provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You’ll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.

Part 1 Background and Fundamentals


1 Hadoop in a Heartbeat

1.1 What is Hadoop?

1.2 Getting your hands dirty with MapReduce

1.3 Summary


2 Introduction to YARN

2.1 YARN overview

2.2 YARN and MapReduce

2.3 YARN applications

2.4 Summary


Part 2 Data Logistics


3 Data Serialization: Working with Text and Beyond

3.1 Understanding inputs and outputs in MapReduce

3.2 Processing common serialization formats

3.3 Big data serialization formats

3.4 Columnar storage

3.5 Custom file formats

3.6 Chapter summary


4 Organizing and Optimizing Data in HDFS

4.1 Data organization

4.2 Efficient storage with compression

4.3 Chapter summary


5 Moving Data into and out of Hadoop

5.1 Key elements of data movement

5.2 Moving data into Hadoop

5.3 Moving data into Hadoop

5.4 Moving data out of Hadoop

5.5 Chapter summary


Part 3 Big Data Patterns


6 Applying MapReduce Patterns to Big Data

6.1 Joining

6.2 Sorting

6.3 Sampling

6.4 Chapter summary


7 Utilizing Data Structures and Algorithms at Scale

7.1 Modeling data and solving problems with graphs

7.2 Modeling data and solving problems with graphs

7.3 Bloom filters

7.4 HyperLogLog

7.5 Chapter summary


8 Tuning, Debugging and Testing

8.1 Measure, measure, measure

8.2 Tuning MapReduce

8.3 Debugging

8.4 Testing MapReduce jobs

8.5 Chapter summary


Part 4 Beyond MapReduce


9 SQL on Hadoop

9.1 Hive

9.2 Impala

9.3 Spark SQL

9.4 Chapter summary


10 Writing a YARN Application

10.1 Fundamentals of building a YARN application

10.2 Building a YARN application to collect cluster statistics

10.3 Additional YARN application capabilities

10.4 YARN programming abstractions

10.5 Summary

Alex Holmes works on tough big-data problems. He is a software engineer, author, speaker, and blogger specializing in large-scale Hadoop projects.
