Technologies: Cloudera Trainings
Just Enough Python
Overview
Cloudera University’s Python training course teaches the key language concepts and programming techniques you need so that you can concentrate on the subjects covered in Cloudera’s developer courses, rather than having to learn a complex programming language and a new programming paradigm on the fly.
Immersive Training
Through instructor-led discussion, as well as hands-on exercises, participants will learn:
- How to define, assign, and access variables
- Which collection types are commonly used, how they differ, and how to use them
- How to control program flow using conditional statements, looping, iteration, and exception handling
- How to define and use both named and anonymous (Lambda) functions
- How to organize code into separate modules
- How to use important features of standard Python libraries, including mathematical and regular expression support
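The topics above can be sketched in a few lines of Python. This is an illustrative sample only, not material from the course; all names (city, temps, hypotenuse, and so on) are invented for the example.

```python
# Illustrative sketch of the constructs listed above; names are invented.
import math
import re

# Variables: definition, assignment, and access
city = "Palo Alto"
population = 68_000

# Commonly used collection types and how they differ
temps = [21.0, 23.5, 19.8]      # list: ordered, mutable
point = (3, 4)                  # tuple: ordered, immutable
ages = {"ana": 34, "bo": 29}    # dict: key-value mapping
tags = {"python", "training"}   # set: unique members only

# Flow control: conditionals, looping, iteration, exception handling
for t in temps:
    if t > 20:
        print(f"warm reading: {t}")
try:
    ratio = population / len(temps)
except ZeroDivisionError:
    ratio = 0.0

# Named and anonymous (lambda) functions
def hypotenuse(x, y):
    return math.sqrt(x**2 + y**2)

double = lambda n: n * 2

# Standard-library features: math and regular expression support
match = re.search(r"\d+", "order 42 shipped")
print(hypotenuse(*point), double(5), match.group())
```

Organizing such code into separate modules (the remaining objective) is a matter of splitting these definitions across files and using `import`.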
Audience and prerequisites
Prior knowledge of Hadoop is not required. This course is intended for developers who cannot yet write code in Python, so basic programming experience in at least one commonly used programming language (ideally Java, but Ruby, Perl, Scala, C, C++, PHP, or JavaScript will suffice) is assumed.
Please note that this course does not teach big data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor for one of our developer-focused training courses that provide those skills.
Just Enough Scala
Overview
Scala is a programming language that runs on the Java Virtual Machine and interoperates with Java, blending the object-oriented and functional programming paradigms. The language is complex and could take a semester or more to master. This class focuses only on the elements necessary to program in Cloudera’s training courses.
Immersive Training
Through instructor-led discussion or OnDemand videos, as well as hands-on exercises, participants will learn:
- What Scala is and how it differs from languages such as Java or Python
- Why Scala is a good choice for Spark programming
- How to use key language features such as data types, collections, and flow control
- How to implement functional programming solutions in Scala
- How to work with Scala classes, packages, and libraries
Audience and prerequisites
Basic knowledge of programming concepts such as objects, conditional statements, and looping is required. This course is best suited to students with Java programming experience. Those with experience in another language may prefer the Just Enough Python course. Basic knowledge of Linux is assumed.
Please note that this course does not teach big data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor for one of our developer-focused training courses that provide those skills.
Apache HBase Training
Overview
Take your knowledge to the next level with Cloudera Training for Apache HBase. Cloudera Educational Services’ three-day training course enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.
Hands-on Hadoop
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
- The use cases and usage occasions for HBase, Hadoop, and RDBMS
- Using the HBase shell to directly manipulate HBase tables
- Designing optimal HBase schemas for efficient data storage and retrieval
- How to connect to HBase using the Java API to insert and retrieve data in real time
- Best practices for identifying and resolving performance bottlenecks
Audience and prerequisites
This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Knowledge of Java is assumed. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Spark and Hadoop provides an excellent foundation for this course.
Course Contents
Introduction
Introduction to Hadoop and HBase
- Introducing Hadoop
- Core Hadoop Components
- What Is HBase?
- Why Use HBase?
- Strengths of HBase
- HBase in Production
- Weaknesses of HBase
HBase Tables
- HBase Concepts
- HBase Table Fundamentals
- Thinking About Table Design
HBase Shell
- Creating Tables with the HBase Shell
- Working with Tables
- Working with Table Data
HBase Architecture Fundamentals
- HBase Regions
- HBase Cluster Architecture
- HBase and HDFS Data Locality
HBase Schema Design
- General Design Considerations
- Application-Centric Design
- Designing HBase Row Keys
- Other HBase Table Features
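Two patterns commonly discussed under row-key design are worth previewing: because HBase stores rows in ascending byte order, monotonically increasing keys (such as timestamps) funnel all writes into one "hot" region, and the usual remedies are a hashed salt prefix or a reversed timestamp. The sketch below illustrates both techniques in Python; it is not course material, and all identifiers are invented for the example.

```python
# Illustrative sketch of two classic HBase row-key patterns: salting and
# timestamp reversal. All names here are invented for the example.
import hashlib

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, used in the reversal trick

def reversed_ts_key(sensor_id: str, epoch_millis: int) -> str:
    """Newest events sort first: subtracting the timestamp from
    Long.MAX_VALUE inverts HBase's ascending byte-order sort."""
    return f"{sensor_id}:{LONG_MAX - epoch_millis:019d}"

def salted_key(user_id: str, buckets: int = 16) -> str:
    """A deterministic hash prefix spreads sequential IDs across
    regions instead of concentrating writes in one region."""
    salt = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % buckets
    return f"{salt:02d}:{user_id}"

# A later event yields a lexicographically smaller key, so scans see it first
k_old = reversed_ts_key("sensor-7", 1_000)
k_new = reversed_ts_key("sensor-7", 2_000)
print(k_new < k_old)  # True
```

The trade-off, covered in the schema design discussion, is that salting makes single-key reads cheap but range scans must fan out across all salt buckets.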
Basic Data Access with the HBase API
- Options to Access HBase Data
- Creating and Deleting HBase Tables
- Retrieving Data with Get
- Retrieving Data with Scan
- Inserting and Updating Data
- Deleting Data
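The Get, Scan, Put, and Delete operations above all act on HBase's logical data model: a sorted map from row key to "family:qualifier" columns, each holding timestamped versions with the newest returned first. The following is a conceptual sketch of that model in Python (not the actual HBase client API); the class and its methods are invented purely to illustrate the semantics.

```python
# Conceptual sketch (not the HBase client API): a sorted map of
# row key -> "family:qualifier" -> newest-first versioned values.
import itertools

class SketchTable:
    def __init__(self):
        self._rows = {}            # row -> {"fam:qual": [(ts, value), ...]}
        self._clock = itertools.count(1)

    def put(self, row, column, value):
        # Each put adds a new timestamped version; reads see the newest.
        cell = self._rows.setdefault(row, {}).setdefault(column, [])
        cell.insert(0, (next(self._clock), value))

    def get(self, row):
        # Like Get: the latest version of every column in one row.
        cols = self._rows.get(row, {})
        return {c: versions[0][1] for c, versions in cols.items()}

    def scan(self, start=None, stop=None):
        # Like Scan: rows streamed in sorted row-key order over a range.
        for row in sorted(self._rows):
            if (start is None or row >= start) and (stop is None or row < stop):
                yield row, self.get(row)

    def delete(self, row):
        self._rows.pop(row, None)

t = SketchTable()
t.put("row1", "info:name", "Ana")
t.put("row1", "info:name", "Ana B.")        # newer version shadows the old
t.put("row2", "info:name", "Bo")
print(t.get("row1"))                         # {'info:name': 'Ana B.'}
print([r for r, _ in t.scan(stop="row2")])   # ['row1']
```

In the real Java API these operations are performed with `Get`, `Scan`, `Put`, and `Delete` objects passed to a `Table` instance, and deletes are tombstone markers rather than immediate removals.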
More Advanced HBase API Features
- Filtering Scans
- Best Practices
- HBase Coprocessors
HBase Write Path
- HBase Write Path
- Compaction
- Splits
HBase Read Path
- How HBase Reads Data
- Block Caches for Reading
HBase Performance Tuning
- Column Family Considerations
- Schema Design Considerations
- Configuring for Caching
- Memory Considerations
- Dealing with Time Series and Sequential Data
- Pre-Splitting Regions
HBase Administration and Cluster Management
- HBase Daemons
- ZooKeeper Considerations
- HBase High Availability
- Using the HBase Balancer
- Fixing Tables with hbck
- HBase Security
HBase Replication and Backup
- HBase Replication
- HBase Backup
- MapReduce and HBase Clusters
Using Hive and Impala with HBase
- How to Use Hive and Impala to Access HBase
Conclusion
Appendix A: Accessing Data with Python and Thrift
- Thrift Usage
- Working with Tables
- Getting and Putting Data
- Scanning Data
- Deleting Data
- Counters
- Filters
Appendix B: OpenTSDB