Section 1: Big Data Analytics Introduction
- Big Data Overview
- Characteristics of Apache Spark
- Users and Use Cases of Apache Spark
- Job Execution Flow and Spark Execution
- Complete Picture of Apache Spark
- Why Spark with Python
- Apache Spark Architecture
- Big Data Analytics in Industry
Section 2: Using Hadoop’s Core: HDFS and MapReduce
- HDFS: What it is, and how it works
- MapReduce: What it is, and how it works
- How MapReduce distributes processing
- HDFS commands
Section 3: Spark Databox Cloud Lab
- How to access the Spark Databox cloud lab
- Step-by-step instructions to access the cloud Big Data Lab
Section 4: Data Analytics Lifecycle
- Data Discovery
- Data Preparation
- Data Model Planning
- Data Model Building
- Data Insights
Section 5: Python 3.0 (Crash Course)
- Environment Setup
- Decision Making
- Loops and Numbers
- Strings
- Lists
- Tuples
- Dictionary
- Date and Time
- Regex
- Functions
- Modules
- Files I/O
- Exceptions
- Multi-Threading
- Set
- Lambda Function
Section 6: PySpark
- Introduction to SparkContext
- Environment Setup
- Spark RDD
- Spark Caching
- Common Transformations and Actions
- Spark Functions
- Key-Value Pairs
- Aggregate Functions
- Working with Aggregate Functions
- Joins in Spark
- Spark DataFrame
Section 7: Advanced Spark Programming
- Spark Shared Variables
- Custom Accumulator
- Spark and Fault Tolerance
- Broadcast Variables
- Numeric RDD Operations
- Per-Partition Operations
Section 8: Running Spark Jobs on a Cluster
- Spark Runtime Architecture
- Spark Driver
- Executors
- Cluster Managers
- Connecting Spark to Different File Systems and Performing ETL (Extraction, Transformation and Loading)
- Connecting Spark to Databases and Performing ETL (Extraction, Transformation and Loading)
- Spark Storage Level
- Spark Serializers
- Spark-Submit and Cluster Explanation
- Performance Tuning
Section 9: PySpark Streaming at Scale
- Introduction to Spark Streaming
- PySpark Streaming with Apache Kafka
- Real-world Practical Use Cases
- Operations on Streaming DataFrames and Datasets
- Window Operations