Friday, August 7, 2009
Informatica PowerCenter 8.x Architecture and Connectivity
PowerCenter 7.x Architecture:
--> The Repository is managed through the Repository Server, which is administered with the Repository Server Administration Console. All client tools and the Informatica Server communicate with the Repository through the Repository Server.
The Repository Server Administration Console is a client tool used to create and maintain repositories and to configure Repository Servers. Tasks such as starting a repository, backup/restore, and upgrade are performed with this tool.
One Repository Server can manage multiple repositories, with one Repository Agent per repository. The Repository Agent is a multi-threaded process that inserts, updates, and retrieves metadata in the Informatica Repository.
PowerCenter 8.x Architecture:
--> In PowerCenter 8.x, the client-server architecture is enhanced to a service-oriented architecture. The Repository Server becomes the Repository Service, and the Informatica Server becomes the Integration Service. These are Application Services. The other type of service is the Core Service, which manages the PowerCenter environment.
The Repository Server Administration Console is replaced by the web-based Administration Console, which manages the entire PowerCenter environment.
The Repository Service manages communication between the Repository and all other components (client tools, the Integration Service, and grids). One Repository Service manages exactly one repository, whereas in 7.x one Repository Server could manage multiple repositories.
Repository Service processes connect natively to the Repository. All client tools and other services access the Repository through the Repository Service over TCP/IP.
The Integration Service accesses sources and targets natively or through ODBC drivers (DataDirect).
Differences between 7.x and 8.x
1) 7.x
Two server components: the Repository Server (manages communication between the Informatica client and server tools and the metadata repository) and the Informatica Server (the ETL engine of the Informatica suite).
8.x
The architecture is enhanced to a service-oriented architecture. Application Services and Core Services perform ETL and manage the environment. The Repository Service manages connectivity with the metadata repository, and the Integration Service is the ETL engine.
2) 7.x
Versioning is built into the suite. In addition, built-in configuration management tools migrate components between environments such as Dev, Test, and Prod, supporting a lights-out migration.
8.x
Enhanced versioned objects: explicit checkout of objects is possible from this version onward.
3) 7.x
The metadata repository can hold data in English only.
8.x
Multilingual support for metadata.
4) 7.x
The Repository Server Administration Console is a new client tool for maintaining and managing the repository and the environment.
8.x
Administration Console: a browser-based utility that lets you view domain properties and perform basic domain administration tasks. Integration Services can be associated with a repository through the Administration Console. Domain: a collection of nodes and services, and the primary unit of administration.
5) 7.x
PMREPAGENT: command-line utility, now available for repository maintenance.
PMREP: used for user management.
PMCMD: command-line program for executing Informatica workflows.
8.x
PMCMD lets you specify the Integration Service name and the domain name.
INFACMD administers PowerCenter domains and services.
INFASETUP administers PowerCenter domain and node properties.
PMREP has several new commands ported from PMREPAGENT.
PowerCenter 8.x architecture terminology
SOA (Service Oriented Architecture)
An application architecture in which all functions, or services, are exposed as invokable software interfaces that perform business processes.
Service
A task performed by a service provider to achieve desired end results for a service consumer.
All PowerCenter services run as services on a node.
Types
Application services
Core services
Core Services
Configuration Service: Manages service and node configuration metadata.
Domain Service: Manages other services on the current node or in the domain.
Service Process Controller: Controls application services on behalf of the Domain Service.
Gateway Service: Directs service requests to the appropriate service and node.
Log Service: Accumulates log events from the domain, core and application services, workflows, and sessions.
Licensing Service: Manages licensing for the domain.
Authentication Service: Authenticates domain users who log in to the Administration Console.
Admin Service: Provides services to the Administration Console.
Application services
Repository services
Integration services
SAP BW services
Web Services Hub
Service Processes
The runtime instance of a service running on a node.
Each service process is a process on Windows and a daemon on UNIX. For example, each Integration Service process is represented as pmserver.exe on Windows, and each Repository Service process is represented as pmrepagent.exe on Windows.
Domain
Collection of nodes and services. Primary unit of administration.
Node
The logical representation of a machine in a domain. Each node runs a service manager.
One node in a domain is a gateway node.
Gateway node
Routes service requests from the PowerCenter Client to available nodes.
One node in the domain serves as the gateway for the domain.
All core services run on the gateway node.
If the gateway node is unavailable, the domain cannot accept service requests.
With the High Availability option, multiple nodes can serve as a gateway, but only one node is the gateway at a time.
PowerCenter 8.x New Features
--> High availability
--> Grid
--> Pushdown Optimization
--> Team based development changes
--> Data profiling changes
--> Partitioning changes
High availability
The term high availability refers to the elimination of single points of failure in PowerCenter domains. When you configure high availability for a domain, the domain can continue running despite temporary network or hardware failures.
Sessions can be retried, and fault tolerance supports automatic failover. Recovery is improved so that a session can be handed off to a new node and restarted automatically.
Also called "RAS" (reliability, availability, serviceability) or "fault resilient," it refers to a multiprocessing system that can quickly recover from a failure. There may be a minute or two of downtime while one system switches over to another, but processing will continue. This is not the same as fault tolerant, in which redundant components are designed for continuous processing without skipping a heartbeat. High availability also refers to being able to service a component in the system without shutting down the entire operation.
Failover
• The migration of a service process or task to another node when the node running the service process becomes unavailable.
Recovery
• The automatic or manual completion of tasks after an application service is interrupted.
Resilience
• The ability for services to tolerate transient failures, such as loss of connectivity to the database or network failures.
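Resilience, as defined above, amounts to retrying a transient failure for a bounded time before giving up and letting failover or recovery take over. The sketch below shows that pattern under stated assumptions: `TransientError`, `call_with_resilience`, and its timeout and interval parameters are hypothetical names, not PowerCenter settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable failure such as a dropped DB connection."""


def call_with_resilience(
    operation: Callable[[], T],
    timeout_s: float,
    retry_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on TransientError until it succeeds or `timeout_s` would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except TransientError:
            if clock() + retry_interval_s > deadline:
                raise  # resilience exhausted: leave it to failover or recovery
            sleep(retry_interval_s)
```

The clock and sleep functions are injectable so the retry window can be tested without real waiting.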
------------------------
Grid
Grid computing uses the resources of many separate computers connected by a network (usually the internet) to solve large-scale computation problems.
A grid is a group of nodes in a domain.
You can create heterogeneous grids (both UNIX and Windows machines in the same grid). Work is distributed to the available nodes.
Session on grid: Distributes session partitions to different nodes
Workflow on grid: Distributes workflow tasks to different nodes
A grid is configured in the domain much like a service and is assigned to an Integration Service, which then runs workflows and sessions on the grid.
The Load Balancer is the component of the Integration Service that dispatches the different tasks to the nodes or the different threads to the DTM processes running on the nodes in the grid. The Load Balancer distributes tasks or threads based on node and resource availability.
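The idea of dispatching by node and resource availability can be sketched as follows. This is an illustrative rule only (pick the eligible node with the most free slots), not PowerCenter's actual dispatch algorithm; `GridNode`, `Task`, `max_slots`, the resource names, and the `"WAITING"` marker are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GridNode:
    name: str
    max_slots: int                                    # hypothetical concurrency limit
    resources: set[str] = field(default_factory=set)  # e.g. a database client installed on the node
    running: int = 0

    @property
    def free_slots(self) -> int:
        return self.max_slots - self.running


@dataclass
class Task:
    name: str
    required: set[str] = field(default_factory=set)  # resources the task needs


def dispatch(tasks: list[Task], nodes: list[GridNode]) -> dict[str, str]:
    """Assign each task to the eligible node with the most free slots (ties: first listed)."""
    assignment: dict[str, str] = {}
    for task in tasks:
        eligible = [
            n for n in nodes if n.free_slots > 0 and task.required <= n.resources
        ]
        if not eligible:
            assignment[task.name] = "WAITING"  # queued until capacity frees up
            continue
        target = max(eligible, key=lambda n: n.free_slots)
        target.running += 1
        assignment[task.name] = target.name
    return assignment
```

A task that needs a resource only one node provides always lands on that node; the rest spread by free capacity.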
----------------
Pushdown Optimization
A session option that causes the Integration Service to push some transformation logic to the source and/or target database.
You can choose source-side or target-side optimization, or both
Benefits:
• Can increase session performance
• Maintain metadata and lineage in PowerCenter repository
• Reduces movement of data (when source and target are on the same database)
Define “Full Optimization”
$$PushdownConfig mapping parameter:
The $$PushdownConfig mapping parameter lets you run the same session using the different types of pushdown optimization.
For example, you might want to use full pushdown optimization during the day but no pushdown optimization from midnight until 2 a.m., when the database is scheduled for routine maintenance. Or you might want to use partial pushdown optimization during the peak hours of the day but full pushdown optimization from midnight until 2 a.m., when activity is low.
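One way to drive the first scenario above is to generate the parameter file from a script before the session runs. This is a sketch under assumptions: the values `None` and `Full` are assumed spellings for `$$PushdownConfig` (check the PowerCenter documentation for the accepted values), the maintenance window is taken from the example, and the folder, workflow, and session names are hypothetical. The session's pushdown setting must be `$$PushdownConfig` for the parameter to take effect.

```python
from datetime import time

# Maintenance window from the example above: no pushdown 00:00-02:00, full pushdown otherwise.
MAINTENANCE_START = time(0, 0)
MAINTENANCE_END = time(2, 0)


def pushdown_mode(now: time) -> str:
    """Return the $$PushdownConfig value for a time of day (value spellings are assumptions)."""
    if MAINTENANCE_START <= now < MAINTENANCE_END:
        return "None"
    return "Full"


def parameter_file(now: time, folder: str, workflow: str, session: str) -> str:
    """Render a session-scoped parameter file section that sets $$PushdownConfig."""
    return (
        f"[{folder}.WF:{workflow}.ST:{session}]\n"
        f"$$PushdownConfig={pushdown_mode(now)}\n"
    )
```

For instance, `parameter_file(time(1, 0), "DW", "wf_load", "s_stage_to_dw")` yields a section that disables pushdown, while any time from 02:00 onward yields `Full`.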
When sources and targets are in the same database, pushdown optimization avoids pulling the data into PowerCenter and then pushing it back out again. This is useful when you move data from a staging area to a data warehouse that reside in the same database.
Partial pushdown optimization
• One or more transformations can be processed in the source or target database
Full pushdown optimization
• Source and target are in the same RDBMS
• All transformations can be processed in the database
Configuring Pushdown Optimization
Configure in Performance settings on the Properties tab in the session properties
Pushdown optimization: None, To Source, To Target, Full, $$PushdownConfig
The Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping's transformation logic. It then executes that SQL against the database instead of processing the transformation logic itself.
When you use pushdown optimization, the Integration Service converts the expression in the transformation or in the workflow link by determining equivalent operators, variables, and functions in the database.
If there is no equivalent operator, variable, or function, the Integration Service processes the transformation logic.
For example, the Integration Service translates the aggregate function STDDEV() to STDDEV_SAMP() on Teradata and to STDEV() on Microsoft SQL Server. However, no database supports the aggregate function FIRST(), so the Integration Service processes any transformation that uses the FIRST() function.
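The translate-or-fall-back rule above can be sketched as a lookup table. The table holds only the equivalents named in the text; `SQL_EQUIVALENTS` and `plan_aggregate` are hypothetical names, and the real translation rules live inside the Integration Service.

```python
# Equivalents named in the text above; a missing entry means "no database equivalent".
# Illustrative only, not the Integration Service's actual rule table.
SQL_EQUIVALENTS: dict[tuple[str, str], str] = {
    ("STDDEV", "Teradata"): "STDDEV_SAMP",
    ("STDDEV", "Microsoft SQL Server"): "STDEV",
}


def plan_aggregate(function: str, database: str) -> str:
    """Describe where an aggregate function in a transformation would be evaluated."""
    equivalent = SQL_EQUIVALENTS.get((function.upper(), database))
    if equivalent is None:
        # No equivalent (e.g. FIRST): the Integration Service processes the logic itself.
        return f"{function}: processed by the Integration Service"
    return f"{function}: pushed to {database} as {equivalent}()"
```

Unknown functions fall back to the Integration Service, which is the conservative choice.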
You can preview which transformations are pushed to the database
You can preview the revised SQL statement for the source or target, but you cannot modify it.
Neither the revised SQL statements nor mapping changes are saved to the repository.
Pushdown Optimization Preview: available from the session properties, Mapping tab
Pushdown optimization supported transformations
To Source
Aggregator, Expression, Filter, Joiner, Lookup, Sorter, Union
To Target
Expression, Lookup, Target definition
Unconnected transformations do not get pushed down
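A simplified way to reason about source-side pushdown with the list above: walk the pipeline from the source and push transformations until the first one that cannot be pushed. The "stop at the first unsupported transformation" rule is an assumption for illustration, unconnected transformations are not modeled, and `split_for_source_pushdown` is a hypothetical name.

```python
# Transformation types listed above as pushable to the source database.
SOURCE_SIDE = {"Aggregator", "Expression", "Filter", "Joiner", "Lookup", "Sorter", "Union"}


def split_for_source_pushdown(pipeline: list[str]) -> tuple[list[str], list[str]]:
    """Split a source-to-target chain into (pushed to source DB, run by Integration Service).

    Assumption: pushdown covers the leading run of supported transformations and stops
    at the first unsupported one; everything after it stays in the Integration Service.
    """
    for i, kind in enumerate(pipeline):
        if kind not in SOURCE_SIDE:
            return pipeline[:i], pipeline[i:]
    return list(pipeline), []
```

A Normalizer, which is not in the list, would end the pushed segment, so a later Aggregator stays in the Integration Service even though it is pushable on its own.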
Team Based Development changes
Versioning:
• Can explicitly check out objects; opening an object no longer checks it out automatically
• Can view older object versions in the workspace
Deployment:
• Can assign owners and groups to folders and deployment groups
• Can generate deployment control file (XML) to deploy folders and deployment groups with pmrep
Partitioning changes
Dynamic partitioning
• Integration Service determines the number of partitions to create at run time
• Integration Service scales the number of session partitions based on factors such as source database partitions or the number of nodes in a grid
• Useful if volume of data increases over time, or you add more CPUs
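The run-time choice described above can be sketched as picking a partition count from one factor. The factor labels, the `cap` limit, and `dynamic_partition_count` are hypothetical; they illustrate the idea, not PowerCenter's actual setting names or algorithm.

```python
from enum import Enum


class PartitionBasis(Enum):
    # Hypothetical labels for the factors named above, not PowerCenter setting names.
    GRID_NODES = "number of nodes in the grid"
    SOURCE_PARTITIONS = "source database partitions"
    CPUS = "number of CPUs"


def dynamic_partition_count(
    basis: PartitionBasis,
    *,
    grid_nodes: int = 1,
    source_partitions: int = 1,
    cpus: int = 1,
    cap: int = 64,  # hypothetical upper bound
) -> int:
    """Pick the number of session partitions at run time from the chosen factor."""
    if basis is PartitionBasis.GRID_NODES:
        count = grid_nodes
    elif basis is PartitionBasis.SOURCE_PARTITIONS:
        count = source_partitions
    else:
        count = cpus
    return max(1, min(count, cap))  # always at least one partition, never above the cap
```

Because the count is computed at run time, adding a node to the grid or CPUs to the machine raises the partition count without editing the session.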
Developer New Features
User-Defined Functions
• Can create user-defined functions to use in transformations and workflow tasks
• Build complex expressions and reuse them
• Available to other repository users
• Include the functions in expressions or other user-defined functions
• Include any valid function except aggregate functions
• Two types:
• Public: Callable from any transformation expression
• Private: Only callable from another user-defined function
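The two rules above (no aggregate functions in a UDF body; private UDFs callable only from other UDFs) can be expressed as simple checks. `UserDefinedFunction`, `body_functions`, and the UDF names are hypothetical, and the aggregate set is a partial list used only for illustration.

```python
from dataclasses import dataclass

# Partial list of aggregate functions, used only to illustrate the "no aggregates" rule.
AGGREGATE_FUNCTIONS = {"AVG", "COUNT", "FIRST", "LAST", "MAX", "MEDIAN", "MIN", "SUM"}


@dataclass
class UserDefinedFunction:
    name: str
    public: bool
    body_functions: list[str]  # functions called in the UDF body (hypothetical model)

    def validate(self) -> None:
        """Reject a UDF whose body uses an aggregate function."""
        used = AGGREGATE_FUNCTIONS.intersection(f.upper() for f in self.body_functions)
        if used:
            raise ValueError(f"{self.name} uses aggregate function(s): {sorted(used)}")

    def check_caller(self, caller_is_udf: bool) -> None:
        """Private UDFs may only be called from another user-defined function."""
        if not self.public and not caller_is_udf:
            raise PermissionError(f"{self.name} is private")
```

A private helper such as `PAD_CODE` can be called from a public `CLEAN_NAME` UDF, but not directly from a transformation expression.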
Creating User-Defined Functions
Choose Tools > User-Defined Functions
User-defined function prefix: :UDF. (for example, :UDF.TRIM(NAME))
Custom Functions
• Function created outside of PowerCenter using the Custom Functions API (shipped with PowerCenter)
• API uses C programming language
• Share custom functions with others
• Add to a repository as a plug-in
• Use in mapping and workflow expressions like native functions
User-Defined v. Custom Functions
User-Defined
• Created in the Designer
• Repository object
• Use in mapping or workflow expressions
• Same name
• Available to all folders
Custom
• Created outside the Client
• Repository plug-in
• Use in mapping or workflow expressions
• Unique name
• Available to all folders
User-defined functions can have the same name as existing built-in functions. Custom function names must be unique; that is, they cannot match built-in function names. User-defined functions can reuse a name because the Client adds the :UDF. prefix before the function name to make it unique.
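The naming rule above can be illustrated with a small resolver: the `:UDF.` prefix places user-defined names in their own namespace, while custom function names share the namespace of built-ins and must be unique. The built-in set is a partial list for illustration, and `GEO_DISTANCE`, `register_custom_function`, and `resolve_calls` are hypothetical.

```python
import re

BUILT_IN_FUNCTIONS = {"LTRIM", "RTRIM", "UPPER", "LOWER"}  # partial list for illustration
CALL_PATTERN = re.compile(r"(:UDF\.)?([A-Z_][A-Z0-9_]*)\(")


def register_custom_function(name: str, custom: set[str]) -> None:
    """Custom function names must not clash with built-ins or other custom functions."""
    key = name.upper()
    if key in BUILT_IN_FUNCTIONS or key in custom:
        raise ValueError(f"custom function name {key} is not unique")
    custom.add(key)


def resolve_calls(expression: str, udfs: set[str], custom: set[str]) -> list[tuple[str, str]]:
    """Classify each function call in an expression as user-defined, built-in, or custom."""
    resolved: list[tuple[str, str]] = []
    for prefix, name in CALL_PATTERN.findall(expression):
        if prefix:  # the :UDF. prefix keeps user-defined names in their own namespace
            if name not in udfs:
                raise NameError(f"unknown user-defined function {name}")
            resolved.append((name, "user-defined"))
        elif name in BUILT_IN_FUNCTIONS:
            resolved.append((name, "built-in"))
        elif name in custom:
            resolved.append((name, "custom"))
        else:
            raise NameError(f"unknown function {name}")
    return resolved
```

With a UDF named `LTRIM`, the expression `:UDF.LTRIM(NAME) || LTRIM(CITY)` calls the user-defined version first and the built-in second, while registering a custom function named `LTRIM` is rejected.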
Usability Enhancements
Propagate port description
• In the Designer, you can edit a port description and propagate the description to other transformations in the mapping
Unicode Repository
• Store metadata from multiple languages in the same repository
• Choose UTF-8 as the repository code page
• Repository database code page must be UTF-8
Command Line Programs
infacmd
• New program to administer application services and service processes.
• Perform tasks such as enabling and disabling services and purging log events.
pmrepagent
• Discontinued. Use replacement commands in pmrep.
pmrep
• Includes former pmrepagent commands. Also includes new syntax to connect to a domain.
pmcmd
• Updated to support new Integration Service functionality.
Example infacmd Commands
• AddLicense
• EnableService
• GetLog
• GetServiceStatus
• RemoveNode
• UpdateNode