Tuesday, September 3, 2013

UNIX - Command to count the number of times a string appears in a file




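grep -c "input_string" filename

Note that grep -c counts the lines that contain the string, not the individual occurrences. If the string can appear more than once on a line, count every occurrence with:

grep -o "input_string" filename | wc -l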
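
A small sketch of counting with grep, assuming a throwaway sample file created just for the demonstration (the file contents and variable names are made up). It shows the number of matching lines next to the number of individual matches; -o is supported by GNU and BSD grep but is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf 'foo bar foo\nbaz\nfoo\n' > "$sample"

# Number of LINES that contain the string at least once.
line_count=$(grep -c "foo" "$sample")

# Number of OCCURRENCES: -o prints every match on its own line,
# and wc -l counts those lines (tr strips the padding some wc versions add).
match_count=$(grep -o "foo" "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $match_count"
rm -f "$sample"
```

With this sample the first line holds two matches, so the script prints "lines: 2, occurrences: 3".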

UNIX - Creating read-only file in UNIX



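1. Create the file using touch file_name.


2. Give the file read-only permission with

chmod 400 file_name

Mode 400 makes the file readable by its owner only. Use chmod 444 if group and others should be able to read it as well.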
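
The two steps can be tried end to end as below. This is only a sketch: it works in a temporary directory so nothing real is touched, and the variable names are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical location, removed at the end).
workdir=$(mktemp -d)
file="$workdir/file_name"

touch "$file"        # step 1: create an empty file
chmod 400 "$file"    # step 2: owner may read; nobody may write or execute

ls -l "$file"        # the permission column should start with -r--------

# Keep just the ten permission characters for a quick check.
perms=$(ls -l "$file" | cut -c1-10)
echo "permissions: $perms"

# Deleting a file depends on the directory's permissions, not the file's,
# so the read-only file can still be removed here.
rm -rf "$workdir"
```

Keep in mind that root can still write to such a file, and the owner can always change the mode back with chmod.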

Monday, September 2, 2013

Informatica - Difference between STOP and ABORT



1. Process:
The main difference between STOP and ABORT is the process timeout period.

STOP:
The session stops reading from the source, but it continues updating/committing changes in the target.

ABORT:
It behaves the same as STOP, but with a timeout period of 60 seconds. If the session fails to update/commit the changes in the target within 60 seconds, the session is forcibly aborted by terminating the DTM process.

2. Memory release:

STOP:
STOP properly releases the memory block that was occupied by the session.

ABORT:
ABORT does not release the memory immediately; that is left to other memory release mechanisms.


3. Consistency:

STOP: STOP will try to roll back to ensure the consistency of the data.

ABORT: Once the timeout expires, it kills the process outright, and the work cannot be rolled back (equivalent to UNIX kill -9).
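
Since ABORT is compared to kill -9, the timeout behaviour can be pictured with ordinary UNIX signals. The sketch below is only an analogy and not how Informatica is implemented: abort_like is a hypothetical helper that asks a process to finish with SIGTERM, waits up to a timeout (60 seconds by default, matching the figure above), and only then sends SIGKILL. The sleep commands stand in for a running session.

```shell
#!/bin/sh
# abort_like PID [TIMEOUT]: hypothetical helper, an analogy only.
# Sends SIGTERM, waits up to TIMEOUT seconds (default 60), then sends SIGKILL.
# Sets abort_result to "graceful" or "forced".
abort_like() {
    pid=$1
    timeout=${2:-60}
    waited=0
    kill -TERM "$pid" 2>/dev/null            # polite request to finish up
    while kill -0 "$pid" 2>/dev/null; do     # signal 0 only checks that the PID exists
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null    # same as kill -9: no cleanup possible
            abort_result=forced
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    abort_result=graceful
}

# A process that honours SIGTERM exits within the timeout.
sleep 30 >/dev/null 2>&1 &
abort_like $! 5
first_result=$abort_result
echo "cooperative process: $first_result"

# A process that ignores SIGTERM has to be killed once the timeout expires.
(trap '' TERM; exec sleep 30) >/dev/null 2>&1 &
stubborn=$!
sleep 1                                      # give the subshell time to install the trap
abort_like "$stubborn" 2
second_result=$abort_result
echo "stubborn process: $second_result"
```

The first case mirrors a session that finishes committing in time; the second mirrors one that overruns the timeout and is terminated without any chance to clean up.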



Informatica 9 Architecture



DWH - Snowflake Schema


Snowflake schema


The snowflake schema is represented by centralized fact tables that are connected to multiple dimensions. "Snowflaking" is a method of normalizing the dimension tables in a star schema: low-cardinality attributes are removed from a dimension table and placed in separate tables.

Snowflake schemas are generally used when a dimension table becomes very big and when a star schema can’t represent the complexity of a data structure. For example, if a PRODUCT dimension table contains millions of rows, a snowflake schema may improve performance by moving some of the data out to another table (BRANDS, for instance).


The problem is that the more normalized the dimension tables are, the more complicated the SQL joins needed to query them become. To answer a query, many tables have to be joined and the aggregates generated.
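
The trade-off can be illustrated without a database, using small comma-separated files as stand-ins for tables and the UNIX join command as a stand-in for an SQL join. Everything here is made up for the illustration: the file names, the three sample products, and the brand_id values.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # keep sort and join collation consistent
dir=$(mktemp -d)

# Star-style PRODUCT dimension: the brand name is repeated on every row.
printf '1,Laptop A,Acme\n2,Laptop B,Acme\n3,Phone C,Globex\n' > "$dir/product_star.csv"

# Snowflaked version: PRODUCT keeps a brand_id, BRANDS holds each name once.
printf '1,Laptop A,10\n2,Laptop B,10\n3,Phone C,20\n' > "$dir/product.csv"
printf '10,Acme\n20,Globex\n' > "$dir/brands.csv"

# Star: a single file answers "which products are Acme?"
star_result=$(awk -F, '$3 == "Acme" { print $2 }' "$dir/product_star.csv")

# Snowflake: the same question needs a join on brand_id first.
# join expects both inputs sorted on the join field.
sort -t, -k3,3 "$dir/product.csv" > "$dir/product.sorted"
sort -t, -k1,1 "$dir/brands.csv"  > "$dir/brands.sorted"
# join prints the join field first: brand_id,product_id,product_name,brand_name
snow_result=$(join -t, -1 3 -2 1 "$dir/product.sorted" "$dir/brands.sorted" |
              awk -F, '$4 == "Acme" { print $3 }')

echo "star:      $star_result"
echo "snowflake: $snow_result"
rm -rf "$dir"
```

Both queries return the same two products, but the snowflaked layout stores each brand name once at the cost of an extra join, and every further level of normalization adds another.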