DataStage

Duration
60 Hours

Course Description

DataStage is IBM's data integration tool for designing, developing, and running jobs that move and transform data, typically as part of ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes. It extracts data from various sources, cleanses and transforms it, and loads it into target systems such as data warehouses or applications. DataStage is available as part of IBM Cloud Pak for Data and supports both on-premises and cloud-based deployment.

Course Outline for DataStage

Introduction to DataStage

  • Unit objectives
  • What is DataStage?
  • What is IBM InfoSphere DataStage?
  • What is Information Server?
  • Information Server backbone
  • Information Server Web Console
  • DataStage architecture
  • DataStage Clients
  • DataStage Administrator 
  • DataStage Designer  
  • DataStage Director  
  • Developing in DataStage 
  • DataStage project repository 
  • Types of DataStage jobs
  • Design elements of parallel jobs 
  • Job Parallelism 
  • Pipeline parallelism
  • Partition parallelism 
  • Three-node partitioning  
  • Job design versus execution 
  • Configuration file  
  • Example Configuration File
  • Checkpoint

Deployment 

  • Unit objectives 
  • What gets deployed 
  • Deployment: Everything on one machine 
  • Deployment: DataStage on separate machine 
  • Information Server Startup  
  • Starting Information Server on Windows 
  • Verifying that Information Server is running  
  • Web Console Login Window 
  • Exercise 2 Log into the Information Server Web Console 

DataStage Administration

  • Unit objectives 
  • Managing DataStage Users 
  • Information Server Web Console - Administration 
  • Opening the Administration Web Console  
  • User and Group Management  
  • Assigning DataStage roles  
  • DataStage credentials
  • DataStage Administrator
  • Logging onto DataStage Administrator
  • DataStage Administrator Projects Tab 
  • DataStage Administrator General tab
  • Environment variables 
  • Environment reporting variables 
  • DataStage Administrator Permissions tab 
  • Adding users and groups 
  • Specify DataStage role  
  • DataStage Administrator Logs tab 
  • DataStage Administrator Parallel tab 
  • Checkpoint  
  • Exercise 3 Administering DataStage 

Working With Metadata 

  • Unit objectives 
  • Logging onto Designer 
  • Designer work area 
  • DataStage Import / Export
  • Repository window 
  • Import and export 
  • Export procedure
  • Export window
  • Import procedure  
  • Import options 
  • Importing Table Definitions 
  • Source and target metadata
  • Sequential file import procedure 
  • Importing sequential metadata  
  • Sequential import window 
  • Specify format
  • Edit column names and types
  • Extended properties window
  • Table definition in the repository
  • Checkpoint  
  • Exercise 4 Importing and exporting DataStage objects 
  • Exercise 4 Import a table definition 
  • Parameter Sets and Values Files 
  • Parameter sets
  • Creating a parameter set 
  • Defining the parameters 
  • Loading a parameter set into a job 
  • Using parameter set parameters 
  • Running jobs with parameter set parameters 
  • Exercise 5 Creating parallel jobs 

Accessing Sequential Data 

  • Unit objectives 
  • Sequential File Stage 
  • How sequential data is handled 
  • Features of the Sequential File stage 
  • Sequential file format example   
  • Job design with Sequential File stages  
  • Sequential File stage properties 
  • Format tab 
  • Columns tab 
  • Reading sequential files using a file pattern 
  • Multiple readers 
  • Writing to a sequential file
  • Reject Links   
  • Source and target reject links 
  • Setting the Reject Mode property 
  • Copy Stage 
  • Copy stage example
  • Copy stage Mappings
  • Reading and Writing Null Values to a Sequential File 
  • Data Set Stage 
  • Job with a target Data Set stage
  • Data Set Management utility
  • Data and schema displayed
  • Exercise 6 Reading and writing to sequential file, copy stages 

Partitioning and Collecting 

  • Unit objectives 
  • Partitioning
  • Stage partitioning 
  • Collecting
  • Round Robin and Random partitioning 
  • Entire partitioning 
  • Hash partitioning  
  • Modulus partitioning 
  • Auto partitioning   
  • Specifying Stage Partitioning 
  • Partitioning / Collecting link icons
  • Configuration file 
  • Parallel job compilation  
  • Exercise 7 Partitioning and collecting
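
The partitioning methods listed in this unit can be illustrated outside DataStage with a small Python sketch. This is not DataStage code: the function names, the sample rows, and the use of CRC32 as the hash are assumptions chosen for illustration only, and DataStage's internal hash function is different. The sketch shows how round robin spreads rows evenly, how hash and modulus partitioning keep rows with the same key together, and how a simple collector gathers partitions back into one stream.

```python
from zlib import crc32

def round_robin(rows, n):
    """Deal rows to n partitions in turn, giving near-equal partition sizes."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send rows with the same key value to the same partition.
    CRC32 is an illustrative stand-in, not the hash DataStage uses."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts

def modulus_partition(rows, n, key):
    """Partition on an integer key: partition number = key value mod n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

def collect(parts):
    """Gather partitions into one stream, reading each partition in order."""
    return [row for part in parts for row in part]

# Hypothetical sample data: six rows with an integer id and a region.
regions = ["east", "west", "east", "north", "west", "east"]
rows = [{"id": i, "region": r} for i, r in enumerate(regions)]

rr = round_robin(rows, 3)
by_region = hash_partition(rows, 3, "region")
by_id = modulus_partition(rows, 3, "id")
```

With three partitions, which matches the three-node case from the introduction unit, `rr` holds two rows per partition, `by_id` places ids 0 and 3 together, and every row for a given region lands in a single partition of `by_region`. Key-based partitioning matters for stages such as Join, Aggregator, and Remove Duplicates, which need all rows for a key on the same node.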

Combining Data 

  • Unit objectives 
  • Combining data 
  • Lookup, Join, Merge stages 
  • Lookup Stage   
  • Lookup Stage features 

Group Processing Stages  

  • Unit objectives  
  • Group processing stages 
  • Sort Stage 
  • Sorting data
  • Sorting alternatives
  • In-Stage sorting 
  • Stable sort illustration  
  • Sort stage Properties tab 
  • Specifying the sort keys  
  • Sort options  
  • Partition sorts 
  • Aggregator Stage  
  • Job with Aggregator stage
  • Aggregation types 
  • Output Mapping tab
  • Output Columns tab  
  • Calculation aggregation type 
  • Grouping methods
  • Method = Hash 
  • Method = Sort 
  • Remove Duplicates Stage 
  • Removing duplicates 
  • Remove Duplicates stage job  
  • Remove Duplicates stage properties 
  • Exercise 9 Sort, Aggregator, and Remove Duplicates stages
  • Unit summary 

Transformer Stage 

  • Unit objectives 
  • Introduction to the Transformer Stage 
  • Job with a Transformer stage 
  • Inside the Transformer stage
  • Transformer stage elements - 1
  • Transformer stage elements - 2
  • Constraints 
  • Constraints example 
  • Defining a constraint 
  • Using the expression editor
  • Otherwise links for data integrity
  • Otherwise link example
  • Specifying the link ordering 
  • Specify the otherwise link constraint
  • Derivations  
  • Derivation targets 
  • Stage variables 
  • Stage variable definitions
  • Building a derivation
  • Defining a derivation 
  • IF THEN ELSE derivation
  • String functions and operators 
  • Null Handling in the Transformer 
  • Null handling  
  • Transformer stage reject link  
  • Loop processing
  • Functions used in loop processing
  • Inside the Transformer stage  
  • Transformer Group Processing   
  • Job results  
  • Transformer logic 
  • Parallel Job Debugger 
  • Setting breakpoints 
  • Editing breakpoints  
  • Running a parallel job in the debugger
  • Exercise 10 Transformer stage, group processing, and PX Debugger

Repository Functions  

  • Unit objectives    
  • Searching the Repository  
  • Quick find    
  • Found results 
  • Advanced Find window
  • Advanced Find options 
  • Using the found results 
  • Impact Analysis 
  • Performing an impact analysis 
  • Table definition Locator tab
  • Exercise 11 Connector stages with multiple input links  
  • Unit summary 

Parallel Palette

  • Database Stages
      • Oracle Database
      • Dynamic RDBMS
      • ODBC
      • SQL Server
      • Teradata
  • File Stages
      • Sequential File
      • Data Set
      • Lookup File Set
  • Dev/Debug Stages
      • Peek
      • Head
      • Tail
      • Row Generator
      • Column Generator
  • Processing Stages
      • Aggregator
      • Copy
      • Compress
      • Expand
      • Filter
      • Modify
      • Sort
      • Switch
      • Lookup
      • Join
      • Merge
      • Change Capture
      • Change Apply
      • Compare
      • Difference
      • Funnel
      • Remove Duplicates
      • Surrogate Key Generator
      • Pivot
      • Transformer
  • Containers
      • Shared Containers
      • Local Containers

Job Control 

  • Unit objectives  
  • What is a job sequence?  
  • Basics for creating a job sequence  
  • Job sequence stages 
  • Job sequence example 
  • Job sequence properties 
  • Job Activity stage properties 
  • Job Activity trigger 
  • Notification Activity stage
  • Wait for File stage
  • Sequencer stage  
  • Nested Condition stage 
  • Loop stages
  • Exception Handler stage 
  • Restart 
  • Enable restart 
  • Disable checkpoint for a Stage 
  • Checkpoint 
  • Exercise 13 Build and run a job sequence  

DataStage CLI (Command Line Interface)

  • Running a job from the command line (procedure with example)
  • Commands for controlling InfoSphere DataStage jobs
  • Commands for administering projects
  • Commands for importing from DSX files
  • Commands for checking and repairing projects
  • Invoking a DataStage job
  • UNIX typical pipe usage
  • Wrapper script for a DataStage job
  • Simple sort DataStage job

Basic XML Processing

  • XML Input
  • XML Transformer
  • XML Output
  • Creating the job (XML stage)
  • Examples of transforming XML data (Hierarchical Data stage)
  • Creating XML files using the Hierarchical Data stage in IBM DataStage
  • How to read an XML file
  • Multiple data sets in one XML input