Short courses

Big Data and Analytics

  • Date: - -
  • Venue: Gold Crest, Mwanza
  • Cost: TZS 3,250,000
  • Contact: 0715677873 | oscar.mashauri@udsm.ac.tz

Analysis of Big Data allows analysts, researchers and business users to make better and faster decisions using data that was previously inaccessible or unusable. Big data and analytics have brought an entirely new era of data-driven insights to companies in all industries. Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources, and in different sizes.

Businesses can use advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics and natural language processing to gain new insights from previously untapped data sources, independently or together with existing enterprise data. Those who are skilled in traditional business intelligence (BI) and data warehousing (DW) represent a valuable pool of resources to help businesses adopt this new generation of technologies.

Introduction

  • What is Big Data
  • Big Data Era
  • How is Big Data used?
  • Big Data ethics
  • Challenges and solutions

Apache Hadoop

  • Hadoop overview
  • Understand Hadoop Distributed File System (HDFS)
  • System Architecture & Components
  • Data structure
  • Setup development environment
  • Introducing MapReduce
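
As a rough illustration of the MapReduce idea introduced in this module, the sketch below counts words in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, and everything runs in a single process only to show the map, shuffle and reduce steps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, used only for illustration.
sample_lines = ["big data big insights", "data drives decisions"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the framework handling the shuffle between them.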

Data storage and Querying

  • HBase & Hive overview
  • Understand HBase
  • Understand Hive
  • Functions and Architecture
  • Processes and Operations

Data Extract and Transform

  • Understand Apache Pig
  • Load data with Apache Pig
  • Extract data with Spark
  • Pig & Spark data operations

Resource coordination and Workflows

  • Understand MapReduce 2.0/YARN
  • Working with Oozie
  • Working with Sqoop
  • Introducing ZooKeeper

Big Data processing Frameworks

  • Batch processing vs real-time processing
  • Understand Apache Spark
  • Introduction to Apache Flink
  • Apache Storm
  • Apache Kafka overview

Programming languages for Data Analysis

  • Understanding Python
  • Introduction to Scala
  • R in Data Science

Data Analysis

  • Overview
  • Understanding Data Analysis
  • Setting up the environment
  • HDFS Data formatting and preparations
  • Read and write data
  • Optimizing data operations
  • Ingest streaming services
  • Data querying and visualizing
  • Real-time data analysis
  • Machine learning using MLlib
  • Integrate Business Intelligence (BI) tools
  • From Data Warehousing to Big Data
  • Data Visualization in depth

Vendor solutions

  • Cloudera
  • Pivotal
  • IBM

Use Case Project

  • Problem definition
  • Prepare data (ETL)
  • Analysing total, average and highest values
  • Descriptive and Prescriptive analytics
  • Make decisions based on results
  • Conclusion
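
The "total, average and highest values" step of the project can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `sales` records, the region labels and the `summarize` helper are all hypothetical and are not taken from the course material or its data set.

```python
from statistics import mean
from typing import Dict, List, Tuple, Union

# Hypothetical sample records of (region label, sales amount).
sales: List[Tuple[str, float]] = [
    ("Region A", 120.0),
    ("Region B", 80.0),
    ("Region C", 100.0),
]


def summarize(records: List[Tuple[str, float]]) -> Dict[str, Union[str, float]]:
    """Return the total, average and highest amount, plus the label of the highest."""
    amounts = [amount for _, amount in records]
    top_label, top_amount = max(records, key=lambda record: record[1])
    return {
        "total": sum(amounts),
        "average": mean(amounts),
        "highest": top_amount,
        "highest_label": top_label,
    }


summary = summarize(sales)
print(summary)
```

With large data sets the same aggregations would normally be expressed in a distributed engine such as Hive or Spark rather than computed in memory.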
