What is the PowerCenter Integration Service?
The PowerCenter Integration Service moves data from sources to targets based on the workflow, session, and mapping metadata stored in a PowerCenter repository. When a workflow starts, the PowerCenter Integration Service retrieves the mapping, session, and workflow metadata from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules configured in the mapping.
The PowerCenter Integration Service loads the transformed data into one or more targets. To move data from sources to targets, the PowerCenter Integration Service uses the following components: the PowerCenter Integration Service process, the Load Balancer, and the Data Transformation Manager (DTM) process.
PowerCenter Integration Service process: The Integration Service starts one or more service processes to run and monitor workflows. When a workflow runs, the Integration Service process starts and locks the workflow, runs the workflow tasks, and starts the DTM process to run sessions.
Load Balancer: The Integration Service uses the Load Balancer to dispatch tasks. The Load Balancer dispatches tasks to achieve optimal performance. It may dispatch tasks to a single node or across the nodes in a grid.
Data Transformation Manager (DTM) process: The Integration Service starts a DTM process to run each Session and Command task within a workflow. The DTM process performs session validations, creates threads to initialise the session and to read, transform, and write data, and handles pre-session and post-session operations.
Informatica PowerCenter Integration Service Connectivity
The PowerCenter Integration Service is a repository client. It connects to the PowerCenter Repository Service to retrieve workflow and mapping metadata from the repository database. When the PowerCenter Integration Service process requests a repository connection, the request is routed through the master gateway, which sends back PowerCenter Repository Service information to the PowerCenter Integration Service process. The PowerCenter Integration Service process connects to the PowerCenter Repository Service. The PowerCenter Repository Service connects to the repository and performs repository metadata transactions for the client application.
The PowerCenter Workflow Manager communicates with the PowerCenter Integration Service process over a TCP/IP connection. The PowerCenter Workflow Manager communicates with the PowerCenter Integration Service process each time you schedule or edit a workflow, display workflow details, and request workflow and session logs. Use the connection information defined for the domain to access the PowerCenter Integration Service from the PowerCenter Workflow Manager.
The PowerCenter Integration Service process connects to the source or target database using ODBC or native drivers. The PowerCenter Integration Service process maintains a database connection pool for stored procedures or lookup databases in a workflow. The PowerCenter Integration Service process allows an unlimited number of connections to lookup or stored procedure databases. If a database user does not have permission for the number of connections a session requires, the session fails. You can optionally set a parameter to limit the database connections. For a session, the PowerCenter Integration Service process holds the connection as long as it needs to read data from source tables or write data to target tables.
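The connection behaviour described above can be modeled as a small pool. This is a toy sketch, not PowerCenter code. `LookupConnectionPool`, `db_user_limit`, and `max_connections` are hypothetical names. The sketch also ASSUMES that, once a cap is set and reached, further requests share existing connections rather than opening new ones. It shows the two cases from the paragraph: with no cap, a session that needs more connections than the database user is allowed fails. With a cap below that limit, the session can continue.

```python
from collections import defaultdict
from itertools import count


class ConnectionLimitExceeded(RuntimeError):
    """Raised when the database user may not open another connection."""


class LookupConnectionPool:
    """Toy per-database pool for lookup/stored procedure connections.

    db_user_limit: connections the database user is permitted to open
                   (hypothetical; one value for every database here).
    max_connections: optional service-side cap; None means unlimited.
                     ASSUMPTION: once the cap is reached, new requests
                     share existing connections in round-robin order.
    """

    def __init__(self, db_user_limit, max_connections=None):
        if max_connections is not None and max_connections < 1:
            raise ValueError("max_connections must be None or >= 1")
        self.db_user_limit = db_user_limit
        self.max_connections = max_connections
        self._pools = defaultdict(list)       # database -> open connection ids
        self._share_index = defaultdict(int)  # database -> next shared slot
        self._ids = count(1)

    def acquire(self, database):
        pool = self._pools[database]
        if self.max_connections is None or len(pool) < self.max_connections:
            if len(pool) >= self.db_user_limit:
                # The database refuses the extra connection: the session fails.
                raise ConnectionLimitExceeded(
                    f"{database}: user may open only {self.db_user_limit} connections")
            conn = f"{database}-conn{next(self._ids)}"
            pool.append(conn)
            return conn
        # Cap reached: reuse an existing connection instead of opening a new one.
        slot = self._share_index[database] % len(pool)
        self._share_index[database] += 1
        return pool[slot]

    def open_connections(self, database):
        return len(self._pools[database])
```

For example, `LookupConnectionPool(db_user_limit=3)` raises on the fourth `acquire("LKP_DB")`. In contrast, `LookupConnectionPool(db_user_limit=3, max_connections=2)` serves five requests with only two open connections.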
PowerCenter Integration Service
The PowerCenter Integration Service is an application service that runs sessions and workflows.
Integration Service Process
The PowerCenter Integration Service starts a PowerCenter Integration Service process to run and monitor workflows. The PowerCenter Integration Service process is also known as the pmserver process. The PowerCenter Integration Service process accepts requests from the PowerCenter Client and from pmcmd.
It performs the following tasks:
- Manage workflow scheduling.
- Lock and read the workflow.
- Read the parameter file.
- Create the workflow log.
- Run workflow tasks and evaluate the conditional links connecting tasks.
- Start the DTM process or processes to run the session.
- Write historical run information to the repository.
- Send post-session email in the event of a DTM failure.
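The step "evaluate the conditional links connecting tasks" can be pictured as a walk over a task graph, where each link carries an optional condition on the upstream task's status. This is a minimal sketch, not the Integration Service's implementation. The task names, status strings, and `run_workflow` helper are hypothetical. As a simplification, the sketch starts a downstream task as soon as any satisfied link reaches it.

```python
from collections import deque


def run_workflow(start, links, run_task):
    """Run tasks breadth-first from `start`, following only links whose
    condition holds for the upstream task's status.

    links: dict mapping task -> list of (downstream_task, condition), where
           condition is a callable on the upstream status, or None for an
           unconditional link.
    run_task: callable mapping task -> status string (e.g. "SUCCEEDED").
    Returns (task, status) pairs in execution order.
    """
    executed = []
    scheduled = {start}
    pending = deque([start])
    while pending:
        task = pending.popleft()
        status = run_task(task)
        executed.append((task, status))
        for downstream, condition in links.get(task, []):
            if downstream in scheduled:
                continue  # each task runs at most once in this sketch
            if condition is None or condition(status):
                scheduled.add(downstream)
                pending.append(downstream)
    return executed
```

Consider a session `s_load_stage` with two outgoing links. One runs `s_load_target` only on success, and the other runs `cmd_notify_failure` only on failure. In this model, a failed `s_load_stage` leads to `cmd_notify_failure`, and `s_load_target` never runs.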
Load Balancer
The Load Balancer is a component of the PowerCenter Integration Service that dispatches tasks to achieve optimal performance and scalability. When you run a workflow, the Load Balancer dispatches the Session, Command, and predefined EventWait tasks within the workflow. The Load Balancer matches task requirements with resource availability to identify the best node to run a task. It then dispatches the task to an Integration Service process running on that node. It may dispatch tasks to a single node or across nodes.
The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the PowerCenter Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.
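The queueing rule above can be sketched with a priority queue. This is an illustrative model, not the Load Balancer itself. It ASSUMES that each service level maps to a numeric dispatch priority, where smaller numbers dispatch first, and that tasks with equal priority leave the queue in arrival order. `DispatchQueue`, `submit_tasks`, and `free_slots` are hypothetical names.

```python
import heapq
from itertools import count


class DispatchQueue:
    """Queued tasks ordered by (dispatch priority, arrival order)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals first

    def enqueue(self, task_name, dispatch_priority):
        heapq.heappush(self._heap, (dispatch_priority, next(self._arrival), task_name))

    def dispatch_next(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def submit_tasks(tasks, free_slots):
    """Dispatch tasks in arrival order while capacity lasts; queue the rest.

    tasks: list of (task_name, dispatch_priority) in arrival order.
    """
    running, waiting = [], DispatchQueue()
    for name, priority in tasks:
        if len(running) < free_slots:
            running.append(name)
        else:
            waiting.enqueue(name, priority)
    return running, waiting
```

With two free slots and tasks arriving as `s_a`(3), `s_b`(3), `s_c`(2), `s_d`(1), `s_e`(2), the first two start immediately. As nodes free up, the queue releases `s_d`, then `s_c`, then `s_e`.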
Load Balancer Functionality
Dispatch process: The Load Balancer performs several steps to dispatch tasks.
Resources: The Load Balancer can use PowerCenter resources to determine if it can dispatch a task to a node.
Resource provision thresholds: The Load Balancer uses resource provision thresholds to determine whether it can start additional tasks on a node.
Dispatch mode: The dispatch mode determines how the Load Balancer selects nodes for dispatch.
Service levels: When multiple tasks are waiting in the dispatch queue, the Load Balancer uses service levels to determine the order in which to dispatch tasks from the queue.
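Resources, thresholds, and dispatch mode combine into one node-selection step. This is a simplified sketch, not PowerCenter's algorithm. It uses round-robin selection as one possible dispatch mode and a single per-node process limit as a stand-in for resource provision thresholds. The `Node` fields, the resource name `oracle_client`, and `RoundRobinDispatcher` are all hypothetical. A node is eligible only if it offers every required resource and stays within its process limit. If no node is eligible, the task would wait in the dispatch queue.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    resources: set = field(default_factory=set)  # hypothetical resource names
    running_processes: int = 0
    max_processes: int = 10                      # hypothetical threshold


class RoundRobinDispatcher:
    """Pick the next eligible node in round-robin order."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next = 0  # index where the next search starts

    def dispatch(self, required_resources=frozenset()):
        required = set(required_resources)
        for offset in range(len(self.nodes)):
            i = (self._next + offset) % len(self.nodes)
            node = self.nodes[i]
            has_resources = required <= node.resources
            under_threshold = node.running_processes < node.max_processes
            if has_resources and under_threshold:
                node.running_processes += 1
                self._next = (i + 1) % len(self.nodes)
                return node.name
        return None  # no eligible node: the task stays queued
```

Round-robin keeps this sketch short. A metric-based mode would instead rank the eligible nodes by current load before choosing one.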
Data Transformation Manager (DTM)
The Integration Service process starts the DTM process to run a session. The DTM process is also known as the pmdtm process. The DTM is the process associated with the session task.
Processing Threads
The DTM allocates process memory for the session and divides it into buffers. This memory is known as buffer memory. The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread.
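Buffer memory shared by several threads behaves like a set of bounded queues between pipeline stages. This is a generic producer/consumer sketch, not DTM internals. It ASSUMES the session splits into reading, transforming, and writing stages, and it uses `queue.Queue(maxsize=...)` as a hypothetical stand-in for a fixed number of buffer blocks. When a queue is full, the upstream thread blocks until the downstream thread frees space. The calling thread plays the coordinating "master" role by starting and joining the workers.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data stream


def run_pipeline(rows, transform, buffer_blocks=2):
    """Toy reader -> transformer -> writer pipeline with bounded buffers."""
    read_buf = queue.Queue(maxsize=buffer_blocks)   # reader -> transformer
    write_buf = queue.Queue(maxsize=buffer_blocks)  # transformer -> writer
    target = []

    def reader():
        for row in rows:
            read_buf.put(row)  # blocks while all buffer blocks are full
        read_buf.put(_END)

    def transformer():
        while (row := read_buf.get()) is not _END:
            write_buf.put(transform(row))
        write_buf.put(_END)

    def writer():
        while (row := write_buf.get()) is not _END:
            target.append(row)

    threads = [threading.Thread(target=f, name=f.__name__)
               for f in (reader, transformer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the calling thread waits, like a coordinating master thread
    return target
```

For example, `run_pipeline([1, 2, 3, 4, 5], lambda x: x * 10)` returns `[10, 20, 30, 40, 50]`. Each stage has exactly one thread and every queue is FIFO, so row order is preserved.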
Grids
When you run an Integration Service on a grid, a master service process runs on one node and worker service processes run on the remaining nodes in the grid. The master service process runs the workflow and workflow tasks, and it distributes the Session, Command, and predefined EventWait tasks to itself and to other nodes. A DTM process runs on each node where a session runs. If a session runs on a grid, a worker service process can run multiple DTM processes on different nodes to distribute session threads.
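The split of work between the master and worker service processes can be sketched as a planning step. Dispatchable tasks (Session, Command, predefined EventWait) are spread across all nodes, including the master's own node. Every other workflow task stays on the master. This is a hypothetical model: the type tags, node names, and round-robin spreading are illustrative assumptions. The real Load Balancer also weighs resources and thresholds.

```python
# Task types the master may send to any node (tags are hypothetical labels).
DISPATCHABLE = {"Session", "Command", "PredefinedEventWait"}


def plan_grid_run(tasks, master, workers):
    """Map each task name to the node that would run it.

    tasks: list of (task_name, task_type) in workflow order.
    Dispatchable tasks are spread round-robin over the master and workers;
    all other tasks run on the master node.
    """
    nodes = [master] + list(workers)
    plan = {}
    dispatched = 0
    for name, task_type in tasks:
        if task_type in DISPATCHABLE:
            plan[name] = nodes[dispatched % len(nodes)]
            dispatched += 1
        else:
            plan[name] = master
    return plan
```

In a three-node grid, a Decision task always runs on the master node, while Session and Command tasks rotate across all three nodes.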
Code Pages and Data Movement Modes
The PowerCenter Integration Service can move data in either ASCII or Unicode data movement mode. These modes determine how the Integration Service handles character data. You choose the data movement mode in the Integration Service configuration settings. If you want to move multibyte data, choose Unicode data movement mode. To ensure that characters are not lost during conversion from one code page to another, you must also choose the appropriate code pages for your connections.
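Characters are lost when a code page cannot represent them. The sketch uses Python's built-in codecs to show this. The codec names (`ascii`, `latin-1`) are Python's and only stand in for the code pages you would pick for a connection; they are not PowerCenter code page names. The helper functions are hypothetical.

```python
def convert(text, target_code_page):
    """Encode text into target_code_page, substituting '?' for any
    character that code page cannot represent, and decode it back."""
    return text.encode(target_code_page, errors="replace").decode(target_code_page)


def lossy_characters(text, target_code_page):
    """List the characters of text that target_code_page cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(target_code_page)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing
```

`convert("Müller", "ascii")` yields `"M?ller"`, while `convert("Müller", "latin-1")` keeps the name intact. Checking `lossy_characters` against a candidate code page before choosing it shows which characters would not survive.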
ASCII Data Movement Mode
Use ASCII data movement mode when all sources and targets use 7-bit ASCII or EBCDIC character sets. In ASCII mode, the PowerCenter Integration Service recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte.
Unicode Data Movement Mode
Use Unicode data movement mode when sources or targets use 8-bit or multibyte character sets and contain character data. In Unicode mode, the PowerCenter Integration Service recognizes multibyte character sets as defined by supported code pages.
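A quick way to apply the rule above is to check sample character data. This is a rough helper, not a PowerCenter feature. It only tests decoded text against the 7-bit ASCII range and does not model EBCDIC sources. `suggest_data_movement_mode` and `encoded_sizes` are hypothetical names. The second function shows why single-byte storage is not enough: in UTF-8, a multibyte encoding, one character can take several bytes.

```python
def suggest_data_movement_mode(samples):
    """Return "ASCII" if every sample string is 7-bit ASCII, else "Unicode"."""
    for text in samples:
        if not text.isascii():  # any character above U+007F needs Unicode mode
            return "Unicode"
    return "ASCII"


def encoded_sizes(text, encoding="utf-8"):
    """Return (character count, byte count) of text in the given encoding."""
    return len(text), len(text.encode(encoding))
```

For example, `encoded_sizes("café")` returns `(4, 5)` because `é` needs two bytes in UTF-8, and `encoded_sizes("東京")` returns `(2, 6)`.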