Thursday, August 13, 2009

SQL - VIEWS

VIEWS

VIEW is a logical representation of another table or combination of tables. A view derives its data from the tables on which it is based. These tables are called base tables. Base tables might in turn be actual tables or might be views themselves. All operations performed on a view actually affect the base table of the view.


A view can be thought of as a stored query or a virtual table. You can use views in most places where a table can be used.
For example, the employees table has several columns and numerous rows of information. If you want users to see only five of these columns or only specific rows, then you can create a view of that table for other users to access.
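For instance, a minimal sketch (assuming an employees table with these columns; the view name is illustrative) that exposes only five columns:

CREATE OR REPLACE VIEW emp_public AS
SELECT employee_id, first_name, last_name, email, department_id
FROM employees;

Users can then be granted SELECT on emp_public rather than on the employees table itself.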

You can query views, and with some restrictions you can update, insert into, and delete from views. All operations performed on a view actually affect data in some base table of the view and are subject to the integrity constraints and triggers of the base tables.

You cannot explicitly define triggers on views, but you can define them for the underlying base tables referenced by the view. Oracle does support definition of logical constraints on views.
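For example, a logical constraint can be declared on a view in DISABLE NOVALIDATE mode, since Oracle does not enforce view constraints. A sketch (the emp_v name is illustrative):

CREATE OR REPLACE VIEW emp_v (empno, ename,
CONSTRAINT emp_v_uk UNIQUE (empno) DISABLE NOVALIDATE)
AS SELECT empno, ename FROM emp;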

Unlike a table, a view is not allocated any storage space, nor does a view actually contain data. Rather, a view is defined by a query that extracts or derives data from the tables that the view references. These tables are called base tables. Base tables can in turn be actual tables or can be views themselves (including materialized views). Because a view is based on other objects, a view requires no storage other than storage for the definition of the view (the stored query) in the data dictionary.

USES OF VIEWS


1) Provide an additional level of table security by restricting access to a predetermined set of rows or columns of a table.

2) Hide data complexity
For example, a single view can be defined with a join, which is a collection of related columns or rows in multiple tables. However, the view hides the fact that this information actually originates from several tables.

3) Simplify statements for the user
For example, views allow users to select information from multiple tables without actually knowing how to perform a join.


4) Present the data in a different perspective from that of the base table
For example, the columns of a view can be renamed without affecting the tables on which the view is based.


5) Save complex queries
For example, a query can perform extensive calculations with table information. By saving this query as a view, you can perform the calculations each time the view is queried, without having to retype the query, as sketched below.
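A sketch, assuming the emp table used elsewhere in this post (the view name is illustrative):

CREATE OR REPLACE VIEW emp_annual_pay AS
SELECT empno, ename, (sal + NVL(comm, 0)) * 12 AS annual_pay
FROM emp;

SELECT * FROM emp_annual_pay WHERE annual_pay > 30000;

Each query of the view re-runs the calculation without the user having to restate it.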


Creating view


Syntax:


CREATE [OR REPLACE] [FORCE | NOFORCE] VIEW view_name
AS subquery
[WITH CHECK OPTION [CONSTRAINT constraint]]


FORCE : creates the view regardless of whether the view's base tables exist or the owner of the schema containing the view has privileges on them.


NOFORCE : creates the view only if the base tables exist and the owner of the schema containing the view has privileges on them. The default is NOFORCE.


PREREQUISITES:
To create a view in your own schema, you must have CREATE VIEW system privilege. To create a view in another user's schema, you must have CREATE ANY VIEW system privilege.

Example:


To create a view CSStudent (computer science students) from the Student table, enter:

SQL> CREATE OR REPLACE VIEW CSStudent AS SELECT * FROM Student WHERE Major = 'CS';

Eg2:
The following statement creates a view on a subset of data in the emp table:

CREATE VIEW sales_staff AS SELECT empno, ename, deptno FROM emp WHERE deptno = 10 WITH CHECK OPTION CONSTRAINT sales_staff_cnst;

Creating Views with check option


The query that defines the sales_staff view references only rows in department 10. Furthermore, the CHECK OPTION creates the view with the constraint (named sales_staff_cnst) that INSERT and UPDATE statements issued against the view cannot result in rows that the view cannot select. For example, the following INSERT statement successfully inserts a row into the emp table by means of the sales_staff view, which contains all rows with department number 10:

INSERT INTO sales_staff VALUES (7584, 'OSTER', 10);

However, the following INSERT statement returns an error because it attempts to insert a row for department number 30, which cannot be selected using the sales_staff view:

INSERT INTO sales_staff VALUES (7591, 'WILLIAMS', 30);

The view could optionally have been constructed specifying the WITH READ ONLY clause, which prevents any updates, inserts, or deletes from being done to the base table through the view. If no WITH clause is specified, the view, with some restrictions, is inherently updatable.
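For example, a read-only variant of the same view might look like this (a sketch; the _ro suffix is just an illustrative name):

CREATE OR REPLACE VIEW sales_staff_ro AS
SELECT empno, ename, deptno FROM emp WHERE deptno = 10
WITH READ ONLY;

Any INSERT, UPDATE, or DELETE issued against sales_staff_ro then fails with an error.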

Different types of VIEWS


1) Simple Views

2) Complex Views

Simple Views:

A simple view is created by selecting from only one table. A simple view does not contain functions. You can always perform DML operations through simple views.

Complex Views:

A complex view is created by selecting from more than one table. A complex view can contain functions. You cannot always perform DML operations through complex views.
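As a quick illustration (a sketch against the emp and dept tables used elsewhere in this post):

-- Simple view: one table, no functions
CREATE VIEW emp_simple AS
SELECT empno, ename, sal FROM emp;

-- Complex view: join, group function, GROUP BY
CREATE VIEW dept_sal AS
SELECT d.dname, COUNT(*) AS emp_count, AVG(e.sal) AS avg_sal
FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY d.dname;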

Rules for performing DML Operations on a view:


1) You can perform DML operations on Simple views.

2) You cannot always perform DML through Complex views.

3) You cannot delete a row through a view if the view contains any of the following:

Group functions such as SUM, MIN, MAX, or AVG
A GROUP BY clause
The DISTINCT keyword

4) You cannot update data through a view if it contains any of the following:

Group functions such as SUM, MIN, MAX, or AVG
A GROUP BY clause
The DISTINCT keyword
Columns defined by expressions, such as SALARY*1.2
The ROWNUM pseudocolumn

5) You cannot insert data through a view if it contains any of the following:

Group functions such as SUM, MIN, MAX, or AVG
A GROUP BY clause
The DISTINCT keyword
Columns defined by expressions, such as SALARY*1.2
The ROWNUM pseudocolumn
NOT NULL columns in the base tables that are not selected by the view
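As an illustration of the expression rule above, a sketch of an UPDATE that Oracle rejects (the view name is illustrative):

CREATE OR REPLACE VIEW emp_raise AS
SELECT empno, ename, sal * 1.2 AS proposed_sal FROM emp;

UPDATE emp_raise SET proposed_sal = 5000 WHERE empno = 7900;
-- fails, because proposed_sal is defined by an expression
-- (Oracle raises an error such as ORA-01733: virtual column not allowed here)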


Updating a Join View

An updatable join view (also referred to as a modifiable join view) is a view that contains more than one table in the top-level FROM clause of the SELECT statement, and is not restricted by the WITH READ ONLY clause.

The rules for updatable join views are as follows.

1) Any INSERT, UPDATE, or DELETE operation on a join view can modify only one underlying base table at a time.

2) All updatable columns of a join view must map to columns of a key-preserved table. If the view is defined with the WITH CHECK OPTION clause, then all join columns and all columns of repeated tables are not updatable.

3) Rows from a join view can be deleted as long as there is exactly one key-preserved table in the join.

4) An INSERT statement must not explicitly or implicitly refer to the columns of a non-key-preserved table.
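A sketch of these rules in action, assuming empno is the primary key of emp (which makes emp the key-preserved table in the join):

CREATE OR REPLACE VIEW emp_dept AS
SELECT e.empno, e.ename, e.deptno, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;

-- emp is key-preserved, so its columns are updatable through the view:
UPDATE emp_dept SET ename = 'SMITH' WHERE empno = 7369;

-- dept is not key-preserved, so this fails
-- (with an error such as ORA-01779):
UPDATE emp_dept SET dname = 'SALES' WHERE empno = 7369;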

PL/SQL - TRIGGERS



TRIGGERS

A database trigger is a stored procedure that is fired when an INSERT, UPDATE, or DELETE statement is issued against the associated table. A trigger defines an action the database should take when some database-related event occurs. A trigger can include SQL and PL/SQL statements to execute as a unit and can invoke other stored procedures.

USES OF TRIGGERS

1) Prevent invalid transactions

2) Enforce complex security authorizations

3) Enforce complex business rules

4) Provide auditing

5) Maintain synchronous table replicates

6) Modify table data when DML statements are issued against views

7) Publish information about database events, user events, and SQL statements to subscribing applications

PARTS OF A TRIGGER

A database trigger has three parts:

1) Triggering event or Statement.

2) Trigger constraint (Optional)

3) Trigger action

1) Triggering Event or Statement:

A triggering event or statement is the SQL statement that causes a trigger to be fired. A triggering event can be an INSERT, UPDATE, or DELETE statement for a specific table.

2) Trigger Constraint or Restriction

A trigger restriction specifies a Boolean (logical) expression that must be TRUE for the trigger to fire. The trigger action is not executed if the trigger restriction evaluates to FALSE. A trigger restriction is an option available for triggers that are fired for each row. Its function is to conditionally control the execution of a trigger. A trigger restriction is specified using a WHEN clause. It is an optional part of the trigger.
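For instance, a row trigger with a WHEN restriction might look like the following sketch (assuming the emp table; note that NEW and OLD are written without the colon prefix inside the WHEN clause):

CREATE OR REPLACE TRIGGER check_sal
BEFORE UPDATE OF sal ON emp
FOR EACH ROW
WHEN (NEW.sal < OLD.sal)
BEGIN
RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be decreased');
END;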

3) Trigger Action

A trigger action is the procedure (PL/SQL block) that contains the SQL statements and PL/SQL code to be executed when a triggering statement is issued and the trigger restriction evaluates to TRUE.

Types of Triggers

A trigger's type is defined by the type of triggering transaction and by the level at which the trigger is executed. Oracle has the following types of triggers depending on the different applications:

1) Row Level Triggers

2) Statement Level Triggers

3) Before Triggers

4) After Triggers

Row Level Triggers

Row level triggers execute once for each row in a transaction. The commands of row level triggers are executed on all rows that are affected by the command that enables the trigger. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If the triggering statement affects no rows, the trigger is not executed at all. Row level triggers are created using the FOR EACH ROW clause in the CREATE TRIGGER command.

Statement Level Triggers

Statement level triggers fire only once for each triggering statement, regardless of the number of rows affected. Statement level triggers are the default type of trigger created via the CREATE TRIGGER command. Consider a case where the requirement is to prevent DELETE operations on Sunday. Whenever a DELETE statement deletes records, a PL/SQL block fired only once by the DELETE statement can check that the day is not Sunday by referencing the system date, as in the sketch below.
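A sketch of such a statement level trigger (the trigger name and error number are illustrative, and 'DY' assumes an English date language setting):

CREATE OR REPLACE TRIGGER no_sunday_delete
BEFORE DELETE ON emp
BEGIN
IF TO_CHAR(SYSDATE, 'DY') = 'SUN' THEN
RAISE_APPLICATION_ERROR(-20002, 'Deletes are not allowed on Sunday');
END IF;
END;

Because there is no FOR EACH ROW clause, this fires once per DELETE statement, no matter how many rows the statement affects.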

Before and After Trigger

Since triggers are executed by events, they may be set to occur immediately before or after those events. When a trigger is defined, you can specify whether the trigger must occur before or after the triggering event, i.e. the INSERT, UPDATE, or DELETE command. BEFORE triggers are used when the trigger action should determine whether or not the triggering statement should be allowed to complete. By using a BEFORE trigger, you can eliminate unnecessary processing of the triggering statement. For example, to prevent deletion on Sunday, a statement level BEFORE trigger is used on the DELETE statement. BEFORE triggers are also used to derive specific column values before completing a triggering INSERT or UPDATE statement. An AFTER trigger executes the trigger action after the triggering statement is executed. AFTER triggers are used when the triggering statement should complete before the trigger action executes. For example, to perform a cascade delete operation: when the user deletes a record from one table, the corresponding records in other tables are deleted automatically by a trigger fired after the execution of the DELETE statement issued by the user.

Statement Level

BEFORE option

Oracle fires the trigger only once, before executing the triggering statement.

AFTER option

Oracle fires the trigger only once, after executing the triggering statement.

Row Level

BEFORE option

Oracle fires the trigger before modifying each row affected by the triggering statement.

AFTER option

Oracle fires the trigger after modifying each row affected by the triggering statement.

Instead of triggers

INSTEAD OF triggers were first featured in Oracle 8. This was something new in the world of triggers. These are triggers that are defined on a view rather than on a table. You can use an INSTEAD OF trigger to tell Oracle what to do instead of performing the actions that invoked the trigger.
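For instance, a sketch of an INSTEAD OF trigger that makes inserts into a join view work by redirecting them to the emp base table (the view name emp_dept_view is illustrative):

CREATE OR REPLACE TRIGGER emp_dept_insert
INSTEAD OF INSERT ON emp_dept_view
FOR EACH ROW
BEGIN
INSERT INTO emp (empno, ename, deptno)
VALUES (:NEW.empno, :NEW.ename, :NEW.deptno);
END;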

Syntax of creating trigger

CREATE OR REPLACE TRIGGER trigger_name
BEFORE | AFTER
DELETE | INSERT | UPDATE [OF column_name, ...]
ON table_name
[REFERENCING OLD AS old, NEW AS new]
[FOR EACH ROW [WHEN condition]]
DECLARE
variable declarations;
constant declarations;
BEGIN
PL/SQL subprogram body;
EXCEPTION
exception handling PL/SQL block;
END;



REFERENCING

Specifies correlation names. The user can use the correlation names in the PL/SQL block and WHEN clause of a row trigger to refer specifically to the old and new values of the current row. The default correlation names are OLD and NEW. If the row trigger is associated with a table named OLD or NEW, this clause can be used to specify different names to avoid confusion between the table name and the correlation name.

WHEN Clause

Specifies the trigger restriction. This condition has to be satisfied for the trigger to fire. This condition can be specified only for row triggers.

Example

To create a trigger on the emp table which stores the empno and the operation in a table named AUDITOR for each operation, i.e. Insert, Update, and Delete:

CREATE OR REPLACE TRIGGER EMP_AUDIT
AFTER INSERT OR UPDATE OR DELETE ON EMP FOR EACH ROW
BEGIN
IF INSERTING THEN
INSERT INTO AUDITOR VALUES(:NEW.EMPNO,'INSERT');

ELSIF UPDATING THEN
INSERT INTO AUDITOR VALUES(:NEW.EMPNO,'UPDATE');

ELSIF DELETING THEN
INSERT INTO AUDITOR VALUES(:OLD.EMPNO,'DELETE');

END IF;

END;

Note: The same functionality can be achieved with the BEFORE type trigger also.

Enabling and Disabling Triggers

When a trigger is created, it is automatically enabled and fires whenever the triggering statement is issued. An enabled trigger executes the trigger body if the triggering statement is issued. To disable execution, use the ALTER TRIGGER command with the DISABLE clause. A disabled trigger does not execute the trigger body even if the triggering statement is issued. We can disable or enable a trigger with the following

syntax: ALTER TRIGGER trigger_name DISABLE | ENABLE;
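For example, to disable and re-enable the EMP_AUDIT trigger created above:

ALTER TRIGGER emp_audit DISABLE;
ALTER TRIGGER emp_audit ENABLE;

All triggers on a table can also be toggled at once:

ALTER TABLE emp DISABLE ALL TRIGGERS;
ALTER TABLE emp ENABLE ALL TRIGGERS;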

Dropping Triggers


Triggers may be dropped via the DROP TRIGGER command. In order to drop a trigger, you must either own the trigger or have the DROP ANY TRIGGER system privilege.

Syntax: DROP TRIGGER trigger_name;


Example: DROP TRIGGER rec_check;

Database Trigger V/s Procedure

Triggers do not accept parameters, but procedures can accept parameters.
A trigger is executed automatically without any action required by the user. A stored procedure, on the other hand, needs to be explicitly invoked.

Saturday, August 8, 2009

SQL - JOINS



SQL - JOINS:

1) Inner join :

An inner join essentially combines the records from two tables based on a given join predicate. An inner join requires each record in the two joined tables to have matching column values. It returns all rows from both tables where there is a match.

SQL specifies two different syntactical ways to express joins.

The first, called explicit join notation, uses the keyword JOIN; the second is implicit join notation.

The implicit join notation lists the tables for joining in the FROM clause of a SELECT statement using commas to separate them.


Select * from emp, dept where emp.deptno = dept.deptno; -- Implicit Join

Select * from emp INNER JOIN dept ON emp.deptno = dept.deptno; -- Explicit Join



2) Types of Inner joins

Equi – join
Non equi join
Natural join
Cartesian join



3) Equi join

An equi join is a join statement that uses an equivalence operation to match rows from different tables.

Select * from emp, dept where emp.deptno = dept.deptno and job = 'CLERK';


4) Non equi join

A non equi join is a join statement that uses an inequality operation (i.e. <>, >, <=, BETWEEN, etc.) to match rows from different tables.

Select e.ename, e.sal, s.grade from emp e, salgrade s where e.sal between s.losal and s.hisal;


5) Natural join

A natural join is a join statement that compares the common columns of both tables with each other. One should check whether common columns exist in both tables before doing a natural join.

Select * from emp NATURAL JOIN dept;


6) Cartesian join

A Cartesian join is a join of every row of one table to every row of another table. This normally happens when no matching join columns are specified.


Select * from emp, dept;


7) Outer join

An outer join also returns the non-matched rows from the table opposite the outer join operator (+). Missing values are filled with NULLs.


The (+) sign is placed on the side of the join that may have no matching row; when no match exists, a row of NULLs is supplied for that table so the row from the other table is still included.


8) Left outer join

It preserves the unmatched rows from the first (left) table, joining them with a NULL row in the shape of the second (right) table.

Select e.ename, d.dname from emp e, dept d where e.deptno = d.deptno (+); (LOJ)

Select e.ename, d.dname from emp e LEFT OUTER JOIN dept d ON e.deptno = d.deptno;


9) Right outer join


It preserves the unmatched rows from the second (right) table, joining them with a NULL row in the shape of the first (left) table.

Select e.ename, d.dname from emp e, dept d where e.deptno (+) = d.deptno; (ROJ)
It displays, for example, OPERATIONS (deptno 40), a department with no employees.



Select e.ename, d.dname from emp e RIGHT OUTER JOIN dept d ON e.deptno = d.deptno;


10) Full outer join

A full outer join also returns non matching rows from both the tables left and right.

Select e.ename, d.dname from emp e FULL OUTER JOIN dept d ON e.deptno = d.deptno;


11) Self join

A self join is a join in which a table is joined with itself.

Select e1.ename, e2.ename from emp e1, emp e2 where e1.mgr = e2.empno;

Select e1.ename, e2.ename from emp e1 join emp e2 on e1.mgr = e2.empno;




PL/SQL - Stored Functions


PL/SQL FUNCTIONS :

A function is a named pl/sql block that returns a value.

A function can be stored in the database as a schema object for repeated execution.


Syntax :

Create or replace function function_name
(parameter mode datatype, ...)
Return datatype
Is | As
Pl/sql block;


Advantages:

--> Increase efficiency of queries by performing functions in the query rather than in the application.

--> Permits calculations that are too complex to express in a single SQL query.

Eg:

Create or replace function get_sal
(p_id in emp.empno%type)
Return number
Is
V_sal emp.sal%type := 0;
Begin
Select sal into v_sal from emp where empno = p_id;
Return v_sal;
End;

Variable g_sal number;

Execute :g_sal := get_sal(117);

Print g_sal;


A function may accept one or many parameters but must return a single value.



Executing functions:

--> Invoke a function as part of a pl/sql expression.
--> Create a variable to hold the return value.




Invoking functions in SQL expressions :

Create or replace function tax (p_value in number)
Return number
Is
Begin
Return (p_value * 0.08);
End;

Select empno, ename, sal, tax(sal) from emp where empno=7900;



Locations to call user defined functions

--> Select list of select command.
--> Condition of the where and having clause.
--> Connect by, start with, order by and group by clauses.
--> values clauses of insert command.
--> set clause of the update command.


PL/SQL user defined functions can be called from anywhere that built-in functions can be called.


Restrictions on calling functions from SQL expressions :

--> Must be a stored function.

--> Accept only IN parameters.

--> Accept only valid SQL data types, not pl/sql specific types, as parameters.

--> Return data types that are valid sql data types not pl/sql specific types.

--> Functions called from sql expressions cannot contain DML statements.

--> Functions called from update or delete statements on a table T cannot contain a DML on the same table T.

--> Functions called from a DML statement on a table T cannot query the same table.


-->We must own or have the execute privilege.


Notes:

--> Only stored functions are callable from sql statements. Stored procedures cannot be called.


Removing functions

--> Drop function function_name;




Procedures Vs Functions :



Procedure :

--> Execute as a PL/SQL statement.

--> No return clause in the header.

--> Can return none, one or many values.

--> Can contain a return statement.

Function :

--> Invokes as part of an expression.

--> Must contain a return clause in the header.

--> Must return a single value.

--> Must contain at least one return statement.



PL/SQL - Stored Procedures



PL/SQL SUBPROGRAMS


A subprogram is a named PL/SQL block that can accept parameters and be invoked from a calling environment.


There are two types:


A Procedure that performs an action


A Function that computes a value.


--> Provides modularity, reusability, extensibility, and maintainability.


--> It can be compiled and stored in the database.


--> Modularization is the process of breaking up large blocks of code into smaller groups of code called modules.



Structure of PL/SQL subprograms


Header

IS | AS

Declaration section


BEGIN

Executable section


EXCEPTION


Exception section

End;


The header determines


--> The PL/SQL subprogram type that is either a procedure or a function.


--> The name of the subprogram.


-->The parameter list, if one exists.


--> The RETURN clause, which applies only to functions.



Procedure



A procedure is a type of subprogram that performs an action.

A procedure can be stored in the database as a schema object, for repeated execution.



Syntax :

CREATE [ OR REPLACE ] PROCEDURE procedure_name

[ (parameter1 [mode1] datatype1,

parameter2 [mode2] datatype2,

…..) ]

IS | AS

PL/SQL Block;



Parameter is the name of a PL/SQL variable whose value is passed to or populated by the calling environment.


Mode: type of argument: IN(default), OUT, IN OUT


Formal parameters :


Variables declared in the parameter list of a subprogram specification.


Actual parameters:


Variables or expressions referenced in the parameter list of subprogram call.

Eg: raise_sal(empno, 2000);



Procedural parameter modes:



IN : Passes a constant value from the calling environment into the procedure.


OUT : Passes a value from the procedure to calling environment.


IN OUT : Passes a value from the calling environment into the procedure and a possibly different value from the procedure back to the calling environment using the same parameter.



Eg: (IN)



CREATE OR REPLACE PROCEDURE raise_sal

(p_id IN emp.empno%TYPE)

IS

BEGIN

UPDATE emp

SET sal = sal*1.10

WHERE empno = p_id;

END;



Executing a procedure :


EXECUTE raise_sal(7900);




Eg: (OUT)



CREATE OR REPLACE PROCEDURE query_emp

(p_id IN emp.empno%TYPE,

P_name OUT emp.ename%TYPE,

P_sal OUT emp.sal%TYPE,

P_comm OUT emp.comm%TYPE)

IS

BEGIN

SELECT ename, sal, comm

INTO p_name, p_sal, p_comm

FROM emp

WHERE empno = p_id;

END;



Viewing OUT parameters:



Declare host variables

Variable g_name varchar2(25)

Variable g_sal number

Variable g_comm number

EXECUTE query_emp (7900, :g_name, :g_sal, :g_comm);

Print g_name g_sal g_comm

Eg: (IN OUT)

CREATE OR REPLACE PROCEDURE format_phone

(p_phone_no IN OUT varchar2)

IS

BEGIN

P_phone_no := '(' || substr(p_phone_no, 1, 3) ||

')' || substr(p_phone_no, 4, 3) ||

'-' || substr(p_phone_no, 7);

END;

Variable g_phone_no varchar2(15)

BEGIN

:g_phone_no := '8006330575';

END;

/


Exec format_phone(:g_phone_no)

Print g_phone_no




Methods for passing parameters:


Positional : List actual parameters in the same order as formal parameters.


Named : List actual parameters in arbitrary order by associating each with its corresponding

formal parameter.


Combination : List some as positional and some as named.



Eg: Default option for parameters.



Create or replace procedure add_dept

(p_name In dept.deptname%type default 'unknown',

P_loc in dept.loc%type default 1700)

Is

Begin

Insert into dept(deptno, deptname, loc)

Values(dept_seq.nextval, p_name, p_loc);

End;


We can assign default values only to parameters of the IN mode.


OUT and IN OUT parameters are not permitted to have default values.



Examples of passing parameters



Begin

Add_dept;

Add_dept('TRAINING', 2500);

Add_dept(p_loc => 2400, p_name => 'EDUCATION');

Add_dept(p_loc => 1200);

End;

/

Select deptno, deptname, loc from dept;



All the positional parameters should precede the named parameters in a subprogram call.



Declaring subprograms :



Create or replace procedure leave_emp

(p_id in emp.empno%type)

Is

Procedure log_exect

Is

Begin

Insert into log_table( user_id, log_date)

Values(user, sysdate);

End log_exect;

Begin

Delete from emp where empno = p_id;

Log_exect;

End leave_emp;



Invoking a procedure from anonymous pl/sql block :



Declare

V_id number := 163;

Begin

Raise_sal(v_id);

Commit;

----

End;



Invoking a procedure from another procedure :



Create or replace procedure process_emps

Is

Cursor emp_cursor is

Select empno from emp;

Begin

For emp_rec in emp_cursor

Loop

Raise_sal(emp_rec.empno);

End loop;

Commit;

End;



Handling exceptions :



Create procedure ins_dept( p_locid number) is

V_did number(4);

Begin

Dbms_output.put_line('Procedure started');

Insert into dept values(5, 'dept 5', 145, p_locid);

Select deptno into v_did from emp where empno=7900;

End;

Create procedure ins_loc( p_lid number, p_city varchar2)

Is

V_city varchar2(30);

V_dname varchar2(30);

Begin

Dbms_output.put_line('Main procedure ins_loc');

Insert into locations (locid, city) values(p_lid, p_city);

Select city into v_city from locations where locid = p_lid;

Dbms_output.put_line('Inserted city ' || v_city);

Dbms_output.put_line('Invoking the procedure ins_dept');

Ins_dept(p_lid);

Exception

When no_data_found then

Dbms_output.put_line('No such dept/loc for any employee');

End;



Removing procedures:



Drop procedure procedure_name;



Drop procedure raise_sal;


Local subprograms are programs that are defined within the declaration section of another program.



Friday, August 7, 2009

Informatica - Joiner Transformation

Joiner

Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems.


The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.

Limitations on the pipelines that you connect to the Joiner transformation:

-->You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.

--> You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.

Join Types

The Joiner transformation supports the following types of joins:
--> Normal
--> Master Outer
--> Detail Outer
--> Full Outer

Note: A normal or master outer join performs faster than a full outer or detail outer join.

Normal Join

With a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the condition.

Master Outer Join

A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.

Detail Outer Join

A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full Outer Join

A full outer join keeps all rows of data from both the master and detail sources.

Caching Master Rows

When the Integration Service processes a Joiner transformation, it reads rows from both sources concurrently and builds the index and data cache based on the master rows. The Integration Service then performs the join based on the detail source data and the cache data. The number of rows the Integration Service stores in the cache depends on the partitioning scheme, the source data, and whether you configure the Joiner transformation for sorted input. To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source. To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.

Note:

The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values.
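In SQL terms, the workaround amounts to joining on substituted defaults rather than on raw nullable keys, as in this sketch (the -1 default is an arbitrary assumption and must not collide with real key values):

SELECT *
FROM t1, t2
WHERE NVL(t1.emp_id, -1) = NVL(t2.emp_id, -1);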

Informatica - Aggregator Transformation


Aggregator

The Aggregator transformation performs aggregate calculations, such as averages and sums. The Integration Service performs aggregate calculations as it reads, and stores group and row data in an aggregate cache.


Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It stores group values in an index cache and row data in the data cache.


Group by port. Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.


Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.


Sorted Input Conditions


Do not use sorted input if either of the following conditions is true:

-->The aggregate expression uses nested aggregate functions.

-->The session uses incremental aggregation.

If you use sorted input and do not sort data correctly, the session fails.

To use sorted input, you pass sorted data through the Aggregator. Data must be sorted in the following ways:

-->By the Aggregator group by ports, in the order they appear in the Aggregator transformation.

-->Using the same sort order configured for the session.


Note: The Integration Service uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.


--> The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions.


Conditional Clauses
Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota:

SUM( COMMISSION, COMMISSION > QUOTA )


Non-Aggregate Functions
You can also use non-aggregate functions in the aggregate expression.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )


Incremental Aggregation
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

Informatica - Stored Procedure Transformation



Stored Procedure Transformation

A stored procedure is a precompiled collection of Transact-SQL, PL/SQL, or other database procedural statements and optional flow control statements, similar to an executable script. Stored procedures are stored and run within the database.

You might use stored procedures to complete the following tasks:

--> Check the status of a target database before loading data into it.
--> Determine if enough space exists in a database.
--> Perform a specialized calculation.
--> Drop and recreate indexes.

Stored procedures also provide error handling and logging necessary for critical tasks.

You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping.

There are three types of data that pass between the Integration Service and the stored procedure:

Input/Output Parameters

For many stored procedures, you provide a value and receive a value in return. These values are known as input and output parameters.

Return Values

The Stored Procedure transformation captures return values in a similar manner as input/output parameters, depending on the method that the input/output parameters are captured.
If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure.

Status Codes

Status codes provide error handling for the Integration Service during a workflow. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. The Integration Service uses it to determine whether to continue running the session or stop.



Connected Stored Procedure

The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.



We can go for Connected when we want to

--> Run a stored procedure every time a row passes through the Stored Procedure transformation.
--> Pass parameters to the stored procedure and receive a single output parameter.
--> Pass parameters to the stored procedure and receive multiple output parameters.


Unconnected Stored Procedure

The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.

We can go for Unconnected when we want to

--> Run a stored procedure before or after a session.
--> Run a stored procedure once during a mapping, such as pre- or post-session.
--> Run nested stored procedures.
--> Call multiple times within a mapping.


Stored Procedure Types

Normal. The stored procedure runs where the transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.

Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.

Post-load of the Source: After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.

Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.

Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is useful for re-creating indexes on the database.

You can run more than one Stored Procedure transformation in different modes in the same mapping. For example, a pre-load source stored procedure can check table integrity, a normal stored procedure can populate the table, and a post-load stored procedure can rebuild indexes in the database.



--> Stored Procedure transformations are created as Normal type by default, which means that they run during the mapping, not before or after the session.

--> If stored procedures exist in the database that do not contain parameters or return values, they appear in a folder called PROCEDURES; otherwise they appear in the FUNCTIONS folder.



Configuring an Unconnected Transformation

When using an unconnected Stored Procedure transformation in an expression, you need a method of returning the value of output parameters to a port.

--> Assign the output value to a local variable.
--> Assign the output value to the system variable PROC_RESULT.

By using PROC_RESULT, you assign the value of the return parameter directly to an output port, which can apply directly to a target. You can also combine the two options by assigning one output parameter as PROC_RESULT, and the other parameter as a variable.

Use PROC_RESULT only within an expression. If you do not use PROC_RESULT or a variable, the port containing the expression captures a NULL. You cannot use PROC_RESULT in a connected Lookup transformation or call text.

If the stored procedure returns a single output parameter or a return value (but not both), you should use the reserved variable PROC_RESULT as the output variable. For a stored procedure such as GET_NAME_FROM_ID, the expression would appear as:
:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)

inID can be either an input port for the transformation or a variable in the transformation. The value of PROC_RESULT is applied to the output port for the expression.

If the stored procedure returns multiple output parameters, you must create variables for each output parameter. For example, if you create a port called varOUTPUT2 for the stored procedure expression, and a variable called varOUTPUT1, the expression appears as:

:SP.GET_NAME_FROM_ID(inID, varOUTPUT1, PROC_RESULT)




Calling a Pre- or Post-Session Stored Procedure

You may want to run a stored procedure once per session. For example, if you need to verify that tables exist in a target database before running a mapping, a pre-load target stored procedure can check the tables, and then either continue running the workflow or stop it. You can run a stored procedure on the source, target, or any other connected database.




Stored Procedure Properties --> Call Text

This is the name of the stored procedure, followed by all applicable input parameters in parentheses. If there are no input parameters, you must include an empty pair of parentheses, or the call to the stored procedure fails.
For example, to call a stored procedure called check_disk_space, enter the following text:
check_disk_space()

If the stored procedure check_disk_space required a machine name as an input parameter, enter the following text:
check_disk_space(oracle_db)

When passing a datetime value through a pre- or post-session stored procedure, the value must be in the Informatica default date format and enclosed in double quotes as follows:
SP("12/31/2000 11:45:59")


Error Handling

Sometimes a stored procedure returns a database error, such as “divide by zero” or “no more rows.” The final result of a database error during a stored procedure depends on when the stored procedure takes place and how the session is configured.
You can configure the session to either stop or continue running the session upon encountering a pre- or post-session stored procedure error. By default, the Integration Service stops a session when a pre- or post-session stored procedure database error occurs.

Session --> Config Object --> On Stored Procedure error --> Stop, Continue


If the database returns an error for a particular row, the Integration Service skips the row and continues to the next row. As with other row transformation errors, the skipped row appears in the session log.

Notes

--> A single output parameter is returned using the variable PROC_RESULT.

--> When you call a stored procedure in an expression, use the :SP reference qualifier.

--> Use PROC_RESULT to apply the output parameter of a stored procedure expression directly to a target. You cannot use a variable for the output parameter to pass the results directly to a target. Use a local variable to pass the results to an output port within the same transformation.

--> Nested stored procedures allow passing the return value of one stored procedure as the input parameter of another stored procedure. For example, if you have the following two stored procedures:




get_employee_id (employee_name)
get_employee_salary (employee_id)
And the return value for get_employee_id is an employee ID number, the syntax for a nested stored procedure is:
:sp.get_employee_salary (:sp.get_employee_id (employee_name))

--> Do not use single quotes around string parameters. If the input parameter does not contain spaces, do not use any quotes. If the input parameter contains spaces, use double quotes.


Informatica PowerCenter 8.x Architecture and Connectivity


Informatica PowerCenter Architecture & Connectivity


PowerCenter 7.x Architecture:

--> The management of the Repository is done through the Repository Server. The Repository server can be managed and administered through Repository Server Administrative Console interface. All the client tools and Informatica Server communicate with the Repository through Repository Server.
Repository Server Administration Console is a client tool used to create/maintain repositories and configure Repository Servers. All the tasks like starting a repository, backup/restore and upgrade are performed using this tool.
One Repository Server can manage multiple repositories. For each repository there is one Repository Agent. Repository Agent is a multi-threaded process which inserts, updates and retrieves metadata from Informatica Repository


PowerCenter 8.x Architecture:

--> In PowerCenter 8.x, the client-server architecture is enhanced to Service Oriented architecture. So Repository Server becomes Repository Service and Informatica Server becomes Integration Service. These are the Application Services. Other type of service is Core service that manages the PowerCenter environment.
Repository Server Admin Console is transformed to web-based Administration Console. Administration Console manages the entire PowerCenter environment.

Repository Service manages communication of all components (client tools and Integration Service, Grid) with the Repository. One Repository Service can manage only one Repository but one Repository Server managing multiple repositories in 7.x.
Repository Service processes connect natively to the Repository. All the client tools and other services access Repository via Repository services over TCP/IP.
The integration service accesses Sources and targets natively or via ODBC drivers (Data Direct).


Differences between 7.x and 8.x

1) 7. X
Two server components - Repository Server (for managing communication of different Informatica client and server tools with Metadata repository) and Informatica Server (ETL engine of Informatica suite)

8. X
Architecture enhanced to a service-oriented architecture. Application Services and Core Services perform ETL and manage the environment. The Repository Service manages connectivity with the metadata repository, and the Integration Service is the ETL engine.

2) 7.x
Versioning is in-built in the suite. In addition, there are built-in configuration management tools in order to migrate components between environments such as Dev, Test and Prod supporting a lights out migration

8. X
Enhanced versioned objects; explicit checkout of objects is possible from this version onwards.



3) 7.x
Metadata repository can have data in English only

8. X
Multilingual support for Metadata

4) 7.x
Repository Server Administrator Console is a new client tool for maintaining and managing repository and the environment.

8. X
Administration Console - a browser-based utility that enables you to view domain properties and perform basic domain administration tasks. Integration Services can be tied to a Repository via the Administration Console. A Domain - a collection of nodes and services - is the primary unit of administration.


5) 7. X


PMREPAGENT - command-line utility for Repository maintenance
PMREP - used for user management
PMCMD - command-line execution program for Informatica workflows

8. X
PMCMD allows you to specify the Integration Service name and domain name.
INFACMD to administer PowerCenter domains and services
INFASETUP administer PowerCenter domain and node properties
PMREP has several new commands ported from PMREPAGENT




PowerCenter 8.x architecture terminology

SOA (Service Oriented Architecture)

An application architecture in which all functions, or services, are invoked through software interfaces that perform business processes.

Service

A task performed by a service provider to achieve desired end results for a service consumer.
All PowerCenter services run as services on a node.

Types
Application services
Core services






Core Services

Configuration Service: Manages service and node configuration metadata.
Domain Service: Manages other services on the current node or in the domain.
Service Process Controller: Controls application services on behalf of the Domain Service.
Gateway Service: Directs service requests to the appropriate service and node.
Log Service: Accumulates log events from the domain, core and application services, workflows, and sessions.
Licensing Service: Manages licensing for the domain.
Authentication Service: Authenticates domain users who log in to the Administration Console.
Admin Service: Provide services to the Administration Console

Application services

Repository services
Integration services
SAP BW services
Web Services Hub


Service Processes

The runtime instance of a service running on a node.
Each service process is a process on Windows and a daemon on UNIX. For example, each Integration Service process is represented as pmserver.exe on Windows, and each Repository Service process is represented as pmrepagent.exe on Windows.


Domain

Collection of nodes and services. Primary unit of administration.

Node
The logical representation of a machine in a domain. Each node runs a service manager.
One node in a domain is a gateway node.


Gateway node

Routes service requests from PowerCenter Client to available nodes.
One node in domain serves as a gateway for domain.
All core services run on gateway node.
If gateway node is unavailable, domain cannot accept service requests.
With the High Availability option, multiple nodes can serve as a gateway, but only one node is the gateway at a time.







PowerCenter 8.x New Features

--> High availability
--> Grid
--> Pushdown Optimization
--> Team based development changes
--> Data profiling changes
--> Partitioning changes



High availability

The term high availability refers to the elimination of single points of failure in PowerCenter domains. When you configure high availability for a domain, the domain can continue running despite temporary network or hardware failures.
Retry of sessions and fault tolerance for automatic failover. Improved recovery, so that a session can be handed off to a new node and automatically restarted.

Also called "RAS" (reliability, availability, serviceability) or "fault resilient," it refers to a multiprocessing system that can quickly recover from a failure. There may be a minute or two of downtime while one system switches over to another, but processing will continue. This is not the same as fault tolerant, in which redundant components are designed for continuous processing without skipping a heartbeat. High availability also refers to being able to service a component in the system without shutting down the entire operation.

Failover
• The migration of a service process or task to another node when the node running the service process becomes unavailable.

Recovery

• The automatic or manual completion of tasks after an application service is interrupted.

Resilience
• The ability for services to tolerate transient failures, such as loss of connectivity to the database or network failures.

------------------------

Grid

Grid computing uses the resources of many separate computers connected by a network (usually the internet) to solve large-scale computation problems.
A grid is a group of nodes in a domain.
We can create heterogeneous grids (both UNIX and Windows machines in the same grid). The Integration Service distributes work to available nodes.

Session on grid: Distributes session partitions to different nodes

Workflow on grid: Distributes workflow tasks to different nodes

Grid is a service just like the Integration Service.
The Load Balancer is the component of the Integration Service that dispatches the different tasks to the nodes or the different threads to the DTM processes running on the nodes in the grid. The Load Balancer distributes tasks or threads based on node and resource availability.


----------------

Pushdown Optimization

A session option that causes the Integration Service to push some transformation logic to the source and/or target database.
You can choose source-side or target-side optimization, or both

Benefits:
• Can increase session performance
• Maintain metadata and lineage in PowerCenter repository
• Reduces movement of data (when source and target are on the same database)


$$PushdownConfig mapping parameter:
The $$PushdownConfig mapping parameter lets you run the same session using the different types of pushdown optimization.
For example, you might want to use full pushdown optimization during the day, but use no pushdown optimization from midnight until 2 a.m. when the database is scheduled for routine maintenance. OR, you might want to use partial pushdown optimization during the peak hours of the day, but use full pushdown optimization from midnight until 2 a.m. when activity is low.

When sources and targets are on the same database, pushdown optimization avoids having to pull the data into PowerCenter and then pushing it back out again. This can be useful when you move data from a staging area to a data warehouse that exist on the same database.


Partial Pushdown optimization
– One or more transformations can be processed in source/target database

Full pushdown optimization
Source and target are in the same RDBMS
All transformations can be processed in database


Configuring Pushdown Optimization
Configure in Performance settings on the Properties tab in the session properties

Pushdown optimization: None, To Source, To Target, Full, $$PushdownConfig

Integration Service executes SQL against the database instead of processing the transformation logic itself Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping transformation logic
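For instance, if a mapping reads the emp table and applies a Filter transformation with the condition DEPTNO = 10, source-side pushdown might generate SQL along these lines (illustrative only; the actual generated statement varies by mapping and database):

SELECT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO
FROM EMP
WHERE EMP.DEPTNO = 10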

When you use pushdown optimization, the Integration Service converts the expression in the transformation or in the workflow link by determining equivalent operators, variables, and functions in the database.
If there is no equivalent operator, variable, or function, the Integration Service processes the transformation logic.
For example, the Integration Service translates the aggregate function STDDEV() to STDDEV_SAMP() on Teradata and STDEV() on Microsoft SQL Server. However, no database supports the aggregate function FIRST(), so the Integration Service processes any transformation that uses the FIRST() function.

You can preview which transformations are pushed to the database
You can preview the revised SQL statement for the source or target, but you cannot modify it.
Neither the revised SQL statements nor mapping changes are saved to the repository.


Pushdown optimization can be previewed from the Session - Mapping tab.


Pushdown optimization supported transformations

To Source
Aggregator, Expression, Filter, Joiner, Lookup, Sorter, Union

To Target
Expression, Lookup, Target definition

Unconnected transformations do not get pushed down


Team Based Development changes

Versioning:
• Can explicitly check out objects—opening an object no longer checks it out automatically
• Can view older object versions in the workspace


Deployment:

• Can assign owners and groups to folders and deployment groups
• Can generate deployment control file (XML) to deploy folders and deployment groups with pmrep

Partitioning changes

Dynamic partitioning

• Integration Service determines the number of partitions to create at run time
• Integration Service scales the number of session partitions based on factors such as source database partitions or the number of nodes in a grid
• Useful if volume of data increases over time, or you add more CPUs


Developer New Features

User-Defined Functions

• Can create user-defined functions to use in transformations and workflow tasks
• Build complex expressions and reuse them
• Available to other repository users
• Include the functions in expressions or other user-defined functions
• Include any valid function except aggregate functions

• Two types:

• Public: Callable from any transformation expression
• Private: Only callable from another user-defined function

Creating User-Defined Functions
Choose Tools > User-Defined Functions
User-defined functions are invoked with the :UDF prefix, for example: :UDF.TRIM(NAME)


Custom Functions

• Function created outside of PowerCenter using the Custom Functions API (shipped with PowerCenter)
• API uses C programming language
• Share custom functions with others
• Add to a repository as a plug-in
• Use in mapping and workflow expressions like native functions


User-Defined v. Custom Functions

User-Defined
• Created in the Designer
• Repository object
• Use in mapping or workflow expressions
• Same name
• Available to all folders

Custom
• Created outside the Client
• Repository plug-in
• Use in mapping or workflow expressions
• Unique name
• Available to all folders

User-defined functions can have the same name as existing built-in functions. Custom function names must be unique; that is, they cannot be the same as built-in functions. User-defined functions can share a name because the Client includes the :UDF prefix before the function name to make it unique.



Usability Enhancements

Propagate port description
• In the Designer, you can edit a port description and propagate the description to other transformations in the mapping

Unicode Repository
• Store metadata from multiple languages in the same repository
• Choose UTF-8 as the repository code page
• Repository database code page must be UTF-8

Command Line Programs

infacmd
• New program to administer application services and service processes.
• Perform tasks such as enabling and disabling services and purging log events.

pmrepagent
• Discontinued. Use replacement commands in pmrep.
pmrep
• Includes former pmrepagent commands. Also includes new syntax to connect to a domain.
pmcmd
• Updated to support new Integration Service functionality.



Example infacmd Commands

• AddLicense
• EnableService
• GetLog
• GetServiceStatus
• RemoveNode
• UpdateNode