Just Enough Python

Overview

Cloudera University’s Python training course will teach you the key language concepts and programming techniques you need so that you can concentrate on the subjects covered in Cloudera’s developer courses without also having to learn a complex programming language and a new programming paradigm on the fly. 

Immersive Training

Through instructor-led discussion, as well as hands-on exercises, participants will learn:

  • How to define, assign, and access variables
  • Which collection types are commonly used, how they differ, and how to use them
  • How to control program flow using conditional statements, looping, iteration, and exception handling
  • How to define and use both named and anonymous (lambda) functions
  • How to organize code into separate modules
  • How to use important features of standard Python libraries, including mathematical and regular expression support 
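As a rough flavor of the topics listed above, the following minimal sketch touches on variables, collections, flow control, exception handling, named and lambda functions, and the standard math and regular expression modules. It is not taken from the course materials; every name in it (readings, hypotenuse, safe_ratio, and so on) is a hypothetical chosen for illustration.

```python
import math
import re

# Variables and a common collection type (list). Hypothetical sample data.
readings = [3.0, 4.0, 12.0]

# A named function that uses the standard math module.
def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a * a + b * b)

# An anonymous (lambda) function bound to a name for reuse.
square = lambda x: x * x

# Exception handling: return None instead of raising on division by zero.
def safe_ratio(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None

# Iteration with a condition: square only the readings greater than 3.0.
squares = [square(r) for r in readings if r > 3.0]

# Regular expression support: extract the number in front of "-day".
match = re.search(r"(\d+)-day", "a hands-on, 3-day course")
days = int(match.group(1)) if match else None
```

The import statements at the top also show the simplest form of working with modules: code in one file makes use of functions defined in another.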

Audience and prerequisites

Prior knowledge of Hadoop is not required. This course is intended for developers who do not yet know how to write code in Python, so basic programming experience in at least one commonly used programming language (ideally Java, but Ruby, Perl, Scala, C, C++, PHP, or JavaScript will suffice) is assumed.

Please note that this course does not teach big data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor for one of our developer-focused training courses that provide those skills. 

Just Enough Scala

Overview

Scala is a programming language that runs on the Java Virtual Machine and interoperates with Java, blending the object-oriented and functional programming paradigms. The language is complex and could take a semester or more to master. This class focuses only on the elements you need in order to program in Cloudera’s training courses.

Immersive Training

Through instructor-led discussion or OnDemand videos, as well as hands-on exercises, participants will learn:

  • What Scala is and how it differs from languages such as Java or Python
  • Why Scala is a good choice for Spark programming
  • How to use key language features such as data types, collections, and flow control
  • How to implement functional programming solutions in Scala
  • How to work with Scala classes, packages, and libraries

Audience and prerequisites

Basic knowledge of programming concepts such as objects, conditional statements, and looping is required. This course is best suited to students with Java programming experience. Those with experience in another language may prefer the Just Enough Python course. Basic knowledge of Linux is assumed.

Please note that this course does not teach big data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor for one of our developer-focused training courses that provide those skills. 

Apache HBase Training

Overview

Take your knowledge to the next level with Cloudera Training for Apache HBase. Cloudera Educational Services’ three-day training course enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.

Hands-on Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • The use cases for HBase, Hadoop, and RDBMS, and when to choose each
  • Using the HBase shell to directly manipulate HBase tables
  • Designing optimal HBase schemas for efficient data storage and retrieval
  • How to connect to HBase using the Java API to insert and retrieve data in real time
  • Best practices for identifying and resolving performance bottlenecks

Audience and prerequisites

This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Knowledge of Java is assumed. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Spark and Hadoop provides an excellent foundation for this course.

Course Contents

Introduction

Introduction to Hadoop and HBase

  • Introducing Hadoop
  • Core Hadoop Components
  • What Is HBase?
  • Why Use HBase?
  • Strengths of HBase
  • HBase in Production
  • Weaknesses of HBase

HBase Tables

  • HBase Concepts
  • HBase Table Fundamentals
  • Thinking About Table Design

HBase Shell

  • Creating Tables with the HBase Shell
  • Working with Tables
  • Working with Table Data

HBase Architecture Fundamentals

  • HBase Regions
  • HBase Cluster Architecture
  • HBase and HDFS Data Locality

HBase Schema Design

  • General Design Considerations
  • Application-Centric Design
  • Designing HBase Row Keys
  • Other HBase Table Features

Basic Data Access with the HBase API

  • Options to Access HBase Data
  • Creating and Deleting HBase Tables
  • Retrieving Data with Get
  • Retrieving Data with Scan
  • Inserting and Updating Data
  • Deleting Data

More Advanced HBase API Features

  • Filtering Scans
  • Best Practices
  • HBase Coprocessors

HBase Write Path

  • HBase Write Path
  • Compaction
  • Splits

HBase Read Path

  • How HBase Reads Data
  • Block Caches for Reading

HBase Performance Tuning

  • Column Family Considerations
  • Schema Design Considerations
  • Configuring for Caching
  • Memory Considerations
  • Dealing with Time Series and Sequential Data
  • Pre-Splitting Regions

HBase Administration and Cluster Management

  • HBase Daemons
  • ZooKeeper Considerations
  • HBase High Availability
  • Using the HBase Balancer
  • Fixing Tables with hbck
  • HBase Security

HBase Replication and Backup

  • HBase Replication
  • HBase Backup
  • MapReduce and HBase Clusters

Using Hive and Impala with HBase

  • How to Use Hive and Impala to Access HBase

Conclusion

Appendix A: Accessing Data with Python and Thrift

  • Thrift Usage
  • Working with Tables
  • Getting and Putting Data
  • Scanning Data
  • Deleting Data
  • Counters
  • Filters

Appendix B: OpenTSDB