Basic Profanity filter in Objective-C

Below is an Objective-C function which will return YES if there are no profane words in the input text string else it will return NO. You can use it to check if any profane words are entered by the user.

-(BOOL)profaneWordsFilterInText:(NSString *)inText{

NSArray * listOfProfaneWords = @[@”word1″,  @”word2″,];                  NSError *error = NULL;

NSString *profaneWord = [[NSString alloc] init];

for (profaneWord in listOfProfaneWords) {

 NSRegularExpression * regex = [NSRegularExpressionregularExpressionWithPattern:[NSStringstringWithFormat:@”\\b%@\\b”, profaneWord ] options:NSRegularExpressionCaseInsensitiveerror:&error];

NSUInteger numberOfMatches = [regex numberOfMatchesInString:inText   options:range:NSMakeRange(0, [inText length])];

if (numberOfMatches > 0) {

return NO;

}

}

returnYES;

}

You can replace the word1, word2 and so on in the array listOfProfaneWords with the list of bad words from the link below:-

http://www.bannedwordlist.com   (smearWords.txt)

Feel free to leave a comment if you need any more information.

 

Introduction to Data Masking transformation in Informatica

Data Masking is a passive transformation added to Informatica version 8.6 which can be used to mask confidential data when transferring data to untrusted destinations.

You pass in the columns to be masked to the input group and then configure the masking formats for each input port. The resulting output ports will contain data that looks realistic but is actually masked.

It is very useful in situation where you need to keep the non-prod environment data in sync with prod data for maintaining high quality test environments.

It also helps to maintain referential integrity between the data but still mask the key which is used for referential integrity.

Masking rules are dependent on the source data type and the masking type configured for the port.

Each masking type has different masking rules and associated properties that you need to configure.

Below mentioned masking types are available:-
1)  Key Masking
Masks the key value and produces deterministic results for a combination of input key value, masking rule and seed value.

2)  Substitution Masking
Replaces the input column data with a value from a dictionary file or table

3)  Dependent Masking
Replaces the value of the input column based on the value of another column

4)  Random Masking
Replaces input column value with random values for same source data and masking rule

5)  Expression Masking
Uses informatica transformation functions to mask and replace data in source columns

6)  Special Mask formats
Replaces realistic fake values for fields like credit cards, addresses , and social security numbers etc.

7) No Masking

Default is No masking

After selecting the masking type you have to select the available masking rules and configure them according to your requirements.

Examples coming in the next post.

 

How to handle high precision dates in Informatica? the ones with milliseconds.

Whenever you import a source from Oracle with a timestamp field, Informatica automatically imports that field as date with precision of 26 and scale 6. This means that any digits in the milliseconds range will get truncated while reading or writing to the table/file.

This means losing the intended precision of data resulting in incorrect comparisons in the transformation process.

The way to resolve this is to increase the precision of the source or target definition to precision 29 and scale 9 after the source/target is imported into Informatica. This will handle the digits in milliseconds without converting them to all zeros.

In addition to this the timestamp field for which you need high precision values needs to be transformed using the correct format in to_timestamp(column_name,’MM-DD-YYYY HH24:MI:SS:FF6′) function in the source qualifier query override.

How to pass a parameter file to an Informatica workflow at run time

The name and path to the parameter file is often hardcoded in the workflow or session properties tab. There is no option to parameterize the parameter file at the workflow or session level. Understandably so, as its the parameter file itself.

However there are times when you need to invoke the same workflow with different set of parameters. And, you cannot go and change the production code every time you need to do so.

To solve this problem, Informatica provides an option to specify the parameter file as an argument with its pmcmd command.

As you might know pmcmd is the command line interface exposed by Informatica Powercenter to control the workflow execution. Using this command in this scenario can help you in passing in different parameter files every time you invoke the workflow.

Below is the example usage of this command with parameter file option:-

pmcmd start workflow -service <IS_nameOfIntegrationService>
                                                    -user <username>
                                                    -password <password>
                                                    -[-folder <folder_name>]
                                                    -[-paramfile <path\nameOfParamFile>]
                                                    <workflow_name>
(remove the angle and square brackets)

Common practice is to create a shell script or batch script which invokes the workflow by pmcmd command with relevant parameter file.

Why do Informatica workflows go into a Waiting state?

Sometimes Informatica workflows when triggered can go into a waiting state instead of a running state. This usually happens when there are a number of workflows running in parallel.

Informatica admin console has a property where the Admin can define the maximum number of concurrent sessions that can run on the server. It is decided based on the capacity of the server by the Admin. However by default it is set to 10 which should suffice most production scenarios.

This property limits the maximum number of workflows that can run at any given time. Increasing the value can allow more number of sessions to be executed in parallel but should be done only after considering the current capacity and load on the server.

How to create a password protected .zip file using Informatica

In Informatica there is no inherent option to zip files. But sometimes there are requirements which need a compressed file with password protection to be sent over the internet to external vendors

In order to fulfill this requirement you should have a compression tool, which has a command line interface, installed on your Informatica server.

One such tool most commonly used on windows platform is WinZip. It has a very well defined command line interface which can be used in the post success scripts section of the session to zip and password protect the file.

Below is an example to zip and password protect all files in the current folder to a file named test.zip

wzzip -spassword d:\temp\test.zip *.*

[You guessed it right the password set in above case will be “password” and you can set it as you like]

Have fun with zipping and protecting your Informatica generated files 🙂

 

How to avoid Null = Null comparisons in Informatica lookup transformation

Informatica lookup transformation by default evaluates Null = Null condition to be true. This means if you do not explicitly take care of passing Not Null values to lookup input ports then the lookup might return random records from the lookup source. It is not certain if this is a bug or done intentionally by Informatica.

People from Oracle background might not realize this until it is too late because in Oracle Null = Null is evaluated to False.

It is a good coding practice to explicitly check all lookup input ports in an expression transformation for Null values before passing them to the lookup transformation.

The most common way of doing this is to use NVL function to replace null values with default characters. It will ensure none of the input ports are Null.

This is a kind of bug which can easily slip through to production environment. In order to avoid this it is advisable to have this in the review items checklist.

 

How to reduce the size of cache files created by an Informatica session

Informatica at run time creates different types of cache files depending on the transformations used in the mapping and configuration of the session.

Sometimes the cache file size can get so huge that the whole disk gets used up which causes fatal errors in other parallel running workflows and sessions. Huge cache files are also a performance bottleneck and the design should factor in steps to minimize the cache file size.

Following two steps can help in reducing the cache file size created by Informatica sessions:-

1. Joiner Cache and Aggregator cache can be reduced greatly if you provide them sorted inputs and check the sorted input flag in the config tab. Sorted input can be provided by passing data through a sorter transformation or sorting at database level in Source qualifier.

2. Lookup cache can be reduced if you only include the columns needed in the condition, return or output ports of the lookup i.e. by removing any other columns from the ports tab. Further improvement in lookup cache can be achieved by using sql query override to limit the rows by using a where clause to limit the overall lookup dataset.

Hope these two steps help you in preventing any unexpected failures at runtime due to space constraints

How to return multiple columns from an Unconnected Lookup in Informatica

A lookup transformation configured as an Unconnected Lookup can only return one port. However, there are times when you need to quickly modify a mapping to return more than one port from the unconnected lookup.

There are two ways to do this:-
1. To convert the unconnected lookup to a connected lookup in the pipeline and then take out all the required ports

2. To use Lookup Sql query override to concatenate multiple columns with a common delimiter and return the concatenated value.

The second option is like a hack and saves your time required to change the configuration of a lookup from unconnected to connected.

However, a point to remember is that it only works if the lookup is on a database table. In case of lookups on flat files the sql query override option is not available.

Once you concatenate multiple fields using sql query override you can again separate them out in the calling expression transformation by using substr and instr functions of Informatica and assign the resulting values to individual ports or variables.

Hope this helps and do let me know if you need more details on this technique.

What happens when a Return port is not specified in an Unconnected Informatica Lookup

An unconnected lookup in Informatica can return only one port’s value. This is usually the port which is marked as ‘R’ in the Ports tab of the lookup transformation

But if there is only one Output port in the lookup transformation and all the other other ports are the ones used in the condition of the lookup, the unconnected lookup by default returns the output port. There is no need of specifying the return port.

However if there are multiple output ports, it is mandatory to specify the return port.