Data Masking: Introduction to Key Masking

This type of data masking is used when you need a deterministic, repeatable output for a key value: the same input returns the same masked output across multiple table columns.

It helps maintain foreign key integrity across tables even when the key value is masked. The controlling input is the seed value, which ensures that a given input value + seed value combination generates the same output value every time.
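To make the deterministic property concrete, here is a minimal Python sketch of seed-keyed masking. This is not Informatica's actual algorithm (which is not public); it only illustrates that the same input + seed pair always yields the same masked value, so joins on masked keys still line up:

```python
import hashlib
import string

def key_mask(value, seed):
    """Deterministic mask: the same value + seed always produces the
    same output, so masked keys still join across tables.
    Illustrative only -- not Informatica's actual algorithm."""
    alphabet = string.ascii_uppercase
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    # One output character per input character, driven by the digest,
    # so the masked value keeps the original length.
    return "".join(alphabet[digest[i % len(digest)] % 26]
                   for i in range(len(value)))

# Same input + same seed -> same output, in every table that uses seed 42;
# a different seed produces a different (but equally repeatable) output.
a = key_mask("CUST-1001", 42)
b = key_mask("CUST-1001", 42)
c = key_mask("CUST-1001", 99)
```

Any keyed, collision-resistant function gives you the two properties that matter here: repeatability for a fixed seed, and a different mapping when the seed changes.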

String, numeric, and date data types can be masked with this technique.

Configuration
1. In the Masking Properties tab, select the port you want to mask and choose Key as the masking type in the masking column.

2. After you select the masking type, the same dialog box presents the options to configure the masking rules for this type of masking.

3. Select the Seed Value radio button and enter any value between 1 and 1000. Use the same seed value across the different columns where you want an input value to return the same masked output. The seed value can also be parameterized and supplied to the mapping through a mapping parameter.

You can also specify a default for the seed value in the XML file at the following location:

<PowerCenter Installation Directory>\infa_shared\SrcFiles\defaultValue.xml

If the default value is not between 1 and 1000 the Integration Service assigns a value of 725.

4. Specify the Mask Format, for example **DDD+AAAAAA**.

A mask format limits each character in the output column to an alphabetic, numeric, or alphanumeric character.
The example mask format is for a string column of length 10. It means: replace the first three characters with digits between 0 and 9, keep the fourth character as it is, and replace the last six characters with alphabetic characters from a to z.

Mask format characters and their description
A – Alphabetic characters from a to z or A to Z
D – Digits 0 to 9
N – Alphanumeric characters, a to z, A to Z and 0 to 9
X – Any character, alphanumeric or symbols
+ – No Masking
R – Remaining characters can be any character, must be the last character in the mask format

 
5. You can optionally specify which characters in the source value to mask and which to ignore. If this is left blank, the transformation masks every character in the input value.

6. You can also specify the character set used to replace the input values. A common value for the Result String Characters field is **ABCDEFGHIJKLMNOPQRSTUVWXYZ**.
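The mask format rules from step 4 can be sketched in a few lines of Python. This is an illustrative approximation, not Informatica's implementation; the character pools and the treatment of `+` and `R` follow the table above:

```python
import random
import string

# Character pools per mask-format code, per the table above.
POOLS = {
    "A": string.ascii_lowercase,                # alphabetic
    "D": string.digits,                         # digits 0 to 9
    "N": string.ascii_letters + string.digits,  # alphanumeric
    "X": string.ascii_letters + string.digits + string.punctuation,  # any
}

def apply_mask_format(value, mask, rng):
    """Illustrative sketch only -- not Informatica's implementation."""
    out = []
    for i, ch in enumerate(value):
        code = mask[i] if i < len(mask) else "R"  # past the mask: treat as R
        if code == "+":        # '+' keeps the input character unchanged
            out.append(ch)
        elif code == "R":      # 'R': remaining characters, any character
            out.append(rng.choice(POOLS["X"]))
        else:
            out.append(rng.choice(POOLS[code]))
    return "".join(out)

rng = random.Random(7)  # fixed seed keeps the sketch repeatable
masked = apply_mask_format("AB1-XYZPQR", "DDD+AAAAAA", rng)
# masked[:3] are digits, masked[3] is the untouched '-',
# masked[4:] are lowercase letters.
```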

 

Feel free to comment below if you need any help in configuring the Key Data masking transformation.

Basic Profanity filter in Objective-C

Below is an Objective-C function that returns YES if the input text string contains no profane words and NO otherwise. You can use it to check whether the user has entered any profane words.

-(BOOL)profaneWordsFilterInText:(NSString *)inText {
    // Words to screen for; replace with a real word list.
    NSArray *listOfProfaneWords = @[@"word1", @"word2"];
    NSError *error = nil;

    for (NSString *profaneWord in listOfProfaneWords) {
        // \b word boundaries so a word inside a longer word does not match;
        // escapedPatternForString: guards against regex metacharacters.
        NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
            [NSRegularExpression escapedPatternForString:profaneWord]];
        NSRegularExpression *regex = [NSRegularExpression
            regularExpressionWithPattern:pattern
                                 options:NSRegularExpressionCaseInsensitive
                                   error:&error];
        NSUInteger numberOfMatches = [regex numberOfMatchesInString:inText
                                                            options:0
                                                              range:NSMakeRange(0, [inText length])];
        if (numberOfMatches > 0) {
            return NO;
        }
    }
    return YES;
}

You can replace word1, word2, and so on in the array listOfProfaneWords with the list of bad words from the link below:

http://www.bannedwordlist.com   (smearWords.txt)

Feel free to leave a comment if you need any more information.

 

Introduction to Data Masking transformation in Informatica

Data Masking is a passive transformation, added in Informatica version 8.6, which can be used to mask confidential data when transferring data to untrusted destinations.

You pass in the columns to be masked to the input group and then configure the masking formats for each input port. The resulting output ports will contain data that looks realistic but is actually masked.

It is very useful in situations where you need to keep non-prod environment data in sync with prod data to maintain high-quality test environments.

It also helps maintain referential integrity in the data while still masking the key on which that referential integrity depends.

Masking rules are dependent on the source data type and the masking type configured for the port.

Each masking type has different masking rules and associated properties that you need to configure.

The following masking types are available:
1)  Key Masking
Masks the key value and produces deterministic results for a combination of input key value, masking rule and seed value.

2)  Substitution Masking
Replaces the input column data with a value from a dictionary file or table

3)  Dependent Masking
Replaces the value of the input column based on the value of another column

4)  Random Masking
Replaces input column value with random values for same source data and masking rule

5)  Expression Masking
Uses Informatica transformation functions to mask and replace data in source columns

6)  Special Mask formats
Generates realistic but fake values for fields like credit card numbers, addresses, and Social Security numbers

7) No Masking
No Masking is the default; the data passes through unchanged.

After selecting the masking type you have to select the available masking rules and configure them according to your requirements.

Examples coming in the next post.

 

How to handle high-precision dates (the ones with milliseconds) in Informatica

Whenever you import a source from Oracle with a timestamp field, Informatica automatically imports that field as a date with precision 26 and scale 6. As a result, subsecond digits beyond the supported scale get truncated while reading from or writing to the table/file.

This means losing the intended precision of data resulting in incorrect comparisons in the transformation process.

The way to resolve this is to increase the precision of the source or target definition to precision 29 and scale 9 after the source/target is imported into Informatica. This will handle the digits in milliseconds without converting them to all zeros.

In addition, the timestamp field that needs high-precision values must be transformed using the correct format in the to_timestamp(column_name, 'MM-DD-YYYY HH24:MI:SS:FF6') function in the source qualifier query override.

How to pass a parameter file to an Informatica workflow at run time

The name and path of the parameter file are usually hardcoded in the workflow or session properties tab. There is no option to parameterize the parameter file at the workflow or session level. Understandably so, as it's the parameter file itself.

However, there are times when you need to invoke the same workflow with a different set of parameters, and you cannot change the production code every time you need to do so.

To solve this problem, Informatica provides an option to specify the parameter file as an argument with its pmcmd command.

As you might know, pmcmd is the command line interface exposed by Informatica PowerCenter to control workflow execution. Using it in this scenario lets you pass in a different parameter file every time you invoke the workflow.

Below is an example usage of this command with the parameter file option:

pmcmd startworkflow -service <IS_nameOfIntegrationService>
                    -user <username>
                    -password <password>
                    [-folder <folder_name>]
                    [-paramfile <path\nameOfParamFile>]
                    <workflow_name>

(Replace the angle-bracketed placeholders with actual values; the square brackets mark optional arguments and should be removed.)

Common practice is to create a shell script or batch script which invokes the workflow by pmcmd command with relevant parameter file.
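A minimal sketch of such a wrapper, written here in Python for illustration (a plain shell script is equally common). The service, folder, user, and workflow names below are placeholders:

```python
def build_pmcmd(param_file, workflow, service, folder, user, password):
    # Builds the pmcmd command line; every name passed in is a placeholder
    # to be replaced with your environment's actual values.
    return ["pmcmd", "startworkflow",
            "-service", service,
            "-user", user,
            "-password", password,
            "-folder", folder,
            "-paramfile", param_file,
            workflow]

cmd = build_pmcmd("/infa/params/run1.prm", "wf_load_customers",
                  "IS_DEV", "FOLDER_SALES", "infa_user", "secret")
# On the Informatica server: subprocess.run(cmd, check=True)
```

Because the parameter file is an argument to the wrapper, the same production workflow can be launched with a different parameter set on every invocation.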

Why do Informatica workflows go into a Waiting state?

Sometimes Informatica workflows when triggered can go into a waiting state instead of a running state. This usually happens when there are a number of workflows running in parallel.

The Informatica admin console has a property where the Admin can define the maximum number of concurrent sessions that can run on the server, chosen based on the server's capacity. By default it is set to 10, which should suffice for most production scenarios.

This property limits the maximum number of workflows that can run at any given time. Increasing the value allows more sessions to execute in parallel, but should be done only after considering the current capacity and load on the server.

How to create a password protected .zip file using Informatica

Informatica has no inherent option to zip files. But sometimes a requirement calls for a compressed, password-protected file to be sent over the internet to external vendors.

In order to fulfill this requirement you should have a compression tool, which has a command line interface, installed on your Informatica server.

One such tool commonly used on the Windows platform is WinZip. It has a well-defined command line interface which can be used in the post-session success command section of the session to zip and password-protect the file.

Below is an example to zip and password protect all files in the current folder to a file named test.zip

wzzip -spassword d:\temp\test.zip *.*

[You guessed it right the password set in above case will be “password” and you can set it as you like]

Have fun with zipping and protecting your Informatica generated files 🙂

 

How to avoid Null = Null comparisons in Informatica lookup transformation

Informatica's lookup transformation by default evaluates the Null = Null condition as true. This means that if you do not explicitly guard lookup input ports against Null values, the lookup may return unexpected records from the lookup source. It is not certain whether this is a bug or intentional on Informatica's part.

People from an Oracle background might not realize this until it is too late, because in Oracle Null = Null never evaluates to true.

It is a good coding practice to explicitly check all lookup input ports in an expression transformation for Null values before passing them to the lookup transformation.

The most common way of doing this is to use NVL function to replace null values with default characters. It will ensure none of the input ports are Null.
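The pitfall and the NVL-style guard can be illustrated with a small Python sketch. The lookup cache is modeled as a plain dictionary, and the sentinel `~NULL~` is an arbitrary choice; in practice pick a value that cannot occur in the lookup source:

```python
# A lookup cache behaves like a dictionary keyed on the condition ports.
# If NULLs reach the lookup unguarded, a NULL input "matches" any NULL
# row cached from the lookup source and returns that row's value.
lookup_cache = {"C100": "Alice", None: "stray row"}

def safe_lookup(key, cache, sentinel="~NULL~"):
    # NVL-style guard: swap a NULL key for a sentinel that cannot exist
    # in the lookup source, so Null = Null can never match.
    return cache.get(sentinel if key is None else key)

unguarded = lookup_cache.get(None)         # matches the NULL row
guarded = safe_lookup(None, lookup_cache)  # no match, as intended
```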

This is the kind of bug that can easily slip through to the production environment. To avoid it, it is advisable to add this check to the review checklist.

 

How to reduce the size of cache files created by an Informatica session

Informatica at run time creates different types of cache files depending on the transformations used in the mapping and configuration of the session.

Sometimes the cache file size can get so huge that the whole disk gets used up which causes fatal errors in other parallel running workflows and sessions. Huge cache files are also a performance bottleneck and the design should factor in steps to minimize the cache file size.

Following two steps can help in reducing the cache file size created by Informatica sessions:-

1. Joiner Cache and Aggregator cache can be reduced greatly if you provide them sorted inputs and check the sorted input flag in the config tab. Sorted input can be provided by passing data through a sorter transformation or sorting at database level in Source qualifier.

2. Lookup cache can be reduced if you include only the columns needed in the condition, return, or output ports of the lookup, i.e. by removing all other columns from the ports tab. Further improvement can be achieved by using a SQL query override with a where clause to limit the overall lookup dataset.

Hope these two steps help you prevent unexpected failures at runtime due to space constraints.

How to return multiple columns from an Unconnected Lookup in Informatica

A lookup transformation configured as an Unconnected Lookup can only return one port. However, there are times when you need to quickly modify a mapping to return more than one port from the unconnected lookup.

There are two ways to do this:-
1. Convert the unconnected lookup into a connected lookup in the pipeline and then bring out all the required ports.

2. Use a lookup SQL query override to concatenate multiple columns with a common delimiter and return the concatenated value.

The second option is more of a hack, but it saves the time required to change the configuration of a lookup from unconnected to connected.

However, a point to remember is that it only works if the lookup is on a database table. For lookups on flat files, the SQL query override option is not available.

Once you concatenate multiple fields using sql query override you can again separate them out in the calling expression transformation by using substr and instr functions of Informatica and assign the resulting values to individual ports or variables.
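As a sketch of that carving step, here it is in Python (Informatica would use SUBSTR and INSTR in the calling expression transformation; the column values and the `|` delimiter below are hypothetical):

```python
def split_lookup_return(concatenated, delimiter="|"):
    # The SQL override builds e.g. name || '|' || dept || '|' || city;
    # the calling expression carves it back apart -- SUBSTR/INSTR in
    # Informatica, str.split here. Pick a delimiter that cannot occur
    # in the data itself.
    return concatenated.split(delimiter)

name, dept, city = split_lookup_return("Alice|Sales|NY")
```

Each piece is then assigned to its own output port or variable in the expression transformation.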

Hope this helps and do let me know if you need more details on this technique.