Friday, September 5, 2014

SQL LOADER

SQL LOADER is an Oracle utility used to load data into table given a datafile which has the records that need to be loaded.

SQL*Loader takes data file, as well as a control file, to insert data into the table. When a Control file is executed, it can create Three (3) files called log file, bad file or reject file, discard file.
  • Log file tells you the state of the tables and indexes and the number of logical records already read from the input datafile. This information can be used to resume the load where it left off. 
  • Bad file or reject file gives you the records that were rejected because of formatting errors or because they caused Oracle errors.
  • Discard file specifies the records that do not meet any of the loading criteria like when any of the WHEN clauses specified in the control file. These records differ from rejected records.
Conventional and Direct path loader
  • The conventional path loader essentially loads the data by using standard INSERT statements. 
  • The direct path loader (DIRECT=TRUE) bypasses much of the logic involved with that, and loads directly into the Oracle data files.

Some of the restrictions with direct path loads are:

  • Loaded data will not be replicated
  • Cannot always use SQL strings for column processing in the control file (something like this will probably fail: col1 date "ddmonyyyy" "substr(:period,1,9)"). Details are in Metalink Note:230120.1. 

Structure of the data file:
The data file can be in fixed record format or variable record format.
Fixed Record Format would look like the below. In this case you give a specific position where the Control file can expect a data field:

1001       ARUN              CLERK                 1013     12/17/1980       800
1002       AMIT               SALESMAN         1006     2/20/1981         1600
1003       WARD             SALESMAN         1006     2/22/1981         1250
1004       JONY               MANAGER          1009     4/2/1981            2975
1005       MATT             SALESMAN         1006      9/28/1981         1250
1006       RAO                MANAGER           1009     5/1/1981            2850
1007       CADD             MANAGER           1009     6/9/1981            2450
1008       SUNITH          ANALYST            1004     12/9/1982          3000
1009       KING               PRESIDENT                      11/17/1981        5000
1010       GEORGE         SALESMAN         1006     9/8/1981            1500
1011       DAVI               CLERK                 1008     1/12/1983           1100
1012       JAMES            CLERK                 1006     12/3/1981           950
1013       FARUQ           ANALYST            1004     12/3/1981          3000
1014       MIKE              CLERK                  1007     1/23/1982          1300

Variable Record Format would like below where the data fields are separated by a delimiter.
Note: The Delimiter can be anything you like. In this case it is “|”

1196700|9|0|692.64
1378901|2|3900|488.62
1418700|2|2320|467.92
1418702|14|8740|4056.36
1499100|1|0|3.68
1632800|3|0|1866.66
1632900|1|70|12.64
1637600|50|0|755.5

Structure of a Control file:
Sample CTL file for loading a Variable record data file:

OPTIONS (SKIP = 1)                                 --The first row in the data file is skipped without loading
LOAD DATA
INFILE ‘$FILE’                                         -- Specify the data file  path and name
APPEND                                                    -- type of loading (INSERT, APPEND, REPLACE, TRUNCATE
INTO TABLE “APPS”.”BUDGET”         -- the table to be loaded into
FIELDS TERMINATED BY ‘|’                -- Specify the delimiter if variable format datafile
OPTIONALLY ENCLOSED BY ‘”‘        --the values of the data fields may be enclosed in “
TRAILING NULLCOLS                          --columns that are not present in the record treated as null
(ITEM_ID             “TRIM(:ITEM_ID)”,  -- Can use all SQL functions on columns
QTY                        DECIMAL EXTERNAL,
REVENUE             DECIMAL EXTERNAL,
EXT_COST            DECIMAL EXTERNAL TERMINATED BY WHITESPACE “(TRIM(:EXT_COST))”  ,
MONTH                “to_char(LAST_DAY(ADD_MONTHS(SYSDATE,-1)),’DD-MON-YY’)” ,
DIVISION_CODE       CONSTANT “AUD”  -- Can specify constant value instead of Getting value from datafile
)

OPTION statement precedes the LOAD DATA statement. The OPTIONS parameter allows you to specify runtime arguments in the control file, rather than on the command line. 

The following arguments can be specified using the OPTIONS parameter.

  • SKIP = n             -- Number of logical records to skip (Default 0)
  • LOAD = n          --Number of logical records to load (Default all)
  • ERRORS = n      --Number of errors to allow (Default 50)
  • ROWS = n  -      - Number of rows in conventional path bind array or between direct path data saves (Default: Conventional Path 64, Direct path all)
  • BINDSIZE = n   -- Size of conventional path bind array in bytes (System-dependent default)
  • SILENT = {FEEDBACK | ERRORS | DISCARDS | ALL} — Suppress messages during run (header, feedback, errors, discards, partitions, all)
  • DIRECT = {TRUE | FALSE} –Use direct path (Default FALSE)
  • PARALLEL = {TRUE | FALSE} — Perform parallel load (Default FALSE)  
LOAD DATA statement is required at the beginning of the control file.
INFILE: INFILE keyword is used to specify location of the datafile or datafiles.
INFILE* specifies that the data is found in the control file and not in an external file. INFILE
‘$FILE’, can be used to send the filepath and filename as a parameter when registered as a concurrent program.
INFILE   ‘/home/Ooracle/ad/import.csv’ specifies the filepath and the filename.

Example where datafile is an external file:

LOAD DATA
INFILE   ‘/home/Ooracle/ad/import.csv’
INTO TABLE emp_details
FIELDS TERMINATED BY “,”
( emp_num, emp_name, department_num, department_name )

Example where datafile is in the Control file:

LOAD DATA
INFILE *
INTO TABLE emp_details
FIELDS TERMINATED BY “,”
( emp_num, emp_name, department_num, department_name )
BEGINDATA
1001,ARUN,1013,Accounting
1002,AMIT,1006,Sales
1003,WARD,1006,Accounting
1004,JONY,1009,Sales
1005,MATT,1006,Accounting

Example where file name and path is sent as a parameter when registered as a concurrent program

LOAD DATA
INFILE ‘$FILE’
INTO TABLE emp_details
FIELDS TERMINATED BY “,”


( emp_num, emp_name, department_num, department_name )

TYPE OF LOADING:
  • INSERT   — If the table you are loading is empty, INSERT can be used.
  • APPEND  — If data already exists in the table, SQL*Loader appends the new rows to it. If data doesn’t already exist, the new rows are simply loaded. 
  • REPLACE — All rows in the table are deleted and the new data is loaded
  • TRUNCATE — SQL*Loader uses the SQL TRUNCATE command.



INTO TABLE is required to identify the table to be loaded into. In the above example INTO TABLE “APPS”.”BUDGET”
APPS refers to the Schema and BUDGET is the Table name.
FIELDS TERMINATED BY specifies how the data fields are terminated in the datafile.(If the file is Comma delimited or Pipe delimited etc)
OPTIONALLY ENCLOSED BY ‘”‘ specifies that data fields may also be enclosed by quotation marks.
TRAILINGNULLCOLS clause tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.

Loading a fixed format data file:
LOAD DATA
INFILE ‘sample.dat’
INTO TABLE emp
(     empno         POSITION(01:04)   INTEGER EXTERNAL,
      ename          POSITION(06:15)   CHAR,
      job               POSITION(17:25)   CHAR,
      mgr              POSITION(27:30)   INTEGER EXTERNAL,
      sal                POSITION(32:39)   DECIMAL EXTERNAL,
      comm           POSITION(41:48)   DECIMAL EXTERNAL,
      deptno          POSITION(50:51)   INTEGER EXTERNAL)

Steps to Run the SQL* LOADER from UNIX:
At the prompt, invoke SQL*Loader as follows: 

sqlldr USERID= scott/tiger CONTROL=  <control filename> LOG = <Log file name>

Register as concurrent Program:
  • Place the Control file in $CUSTOM_TOP/bin.
  • Define the Executable. Give the Execution Method as SQL*LOADER.
  • Define the Program. Add the Parameter for FILENAME.


Default path for Discard, bad and log files
If bad and discard file paths are not specified in the CTL file and if this SQL Loader is registered as a concurrent program, then they will be created in the directory where the regular Concurrent programs’ output files reside. You can also find the paths where the discard and bad files have been created in the log file of the SQL LOADER concurrent request.

Skip columns:
You can skip columns using the ‘FILLER’ option.
Load Data
--
--
TRAILING  NULLCOLS
(
name Filler,
Empno ,
sal
)

here the column name will be skipped.

Load a multiple files into single table:

SQL LOADER lets you load multiple data files at once into a single table. But all the data files should be of the same format. 

Here is a example:
Say you have a table named EMP which has the below structure: 
Column                Data Type
emp_num                 Number
emp_name         Varchar2(25)
department_num Number
department_name Varchar2(25)

You are trying to load the below comma delimited data files named eg.dat and eg1.dat:

eg.dat:
1001,ARUN,1013,Accounting
1002,AMIT,1006,Sales
1003,WARD,1006,Accounting
1004,JONY,1009,Sales
1005,MATT,1006,Accounting

eg1.dat:
1234,Tom,2345,Accounting
3456,Berry,8976,Accounting

The Control file should be built as below:

LOAD DATA
INFILE ‘eg.dat’    — File 1
INFILE ‘eg1.dat’  — File 2
APPEND
INTO TABLE emp
FIELDS TERMINATED BY “,”
( emp_num, emp_name, department_num, department_name )

Load a single file into multiple tables:

SQL Loader lets you load a single data file into multiple tables using “INTO TABLE” clause. 

Here is a example:
Say you have two tables named EMP and DEPT which have the below structure: 
Table Column                 Data Type
EMP emp_num                 Number
EMP emp_name         Varchar2(25)
DEPT department_num Number
DEPT department_name Varchar2(25)

You are trying to load the below comma delimited data file named eg.dat which has columns Emp_num and emp_name that need to be loaded into table EMP and columns department_num and department_name that need to be loaded into table DEPT using a single CTL file here.

eg.dat:
1001,ARUN,1013,Accounting
1002,AMIT,1006,Sales
1003,WARD,1006,Accounting
1004,JONY,1009,Sales
1005,MATT,1006,Accounting

The Control file should be built as below:

LOAD DATA
INFILE ‘eg.dat’
APPEND
INTO TABLE emp
FIELDS TERMINATED BY “,”
( emp_num, emp_name )
INTO TABLE dept
FIELDS TERMINATED BY “,”
(department_num, department_name)

You can further use WHEN clause to selectively load the records into the tables which will be explained later in this article.

 Skip a column while loading using “FILLER” and Load field in the delimited data file into two different columns in a table using “POSITION”
SQL LOADER also lets you load the same field into two different columns of the table. 

If the data file is position based, loading the same field into two different columns is pretty straight forward. You can use Position (start_pos:end_pos) keyword
If the data file is a delimited file and it has a header included in it, then this can be achieved by referring the field preceded with “:” eg description “(:emp_name)”.
If the data file is delimited file without a header included in it, Position (start_pos:end_pos) or “(:field)” will not work. This can be achieved using POSITION (1) clause which takes you to the beginning of the record. 

Example:
The requirement here is to load the field emp_name in the data field into two columns – emp_name and description of the table EMP.

Say you have a table named EMP which has the below structure: 
Column                Data Type
emp_num                Number
emp_name        Varchar2(25)
description        Varchar2(25)
department_num Number
department_name Varchar2(25)

You are trying to load the below comma delimited data file named eg.dat which has 4 fields that need to be loaded into 5 columns of the table EMP.

eg.dat:
1001,ARUN,1013,Accounting
1002,AMIT,1006,Sales
1003,WARD,1006,Accounting
1004,JONY,1009,Sales
1005,MATT,1006,Accounting

Control File:

LOAD DATA
INFILE ‘eg.dat’
APPEND
INTO TABLE emp
FIELDS TERMINATED BY “,”
(emp_num, 
emp_name, 
desc_skip FILLER POSITION(1),
description,
department_num, 
department_name)

Explanation on how SQL LOADER processes the above CTL file:
  • The first field in the data file is loaded into column emp_num of table EMP
  • The second field in the data file is loaded into column emp_name of table EMP
  • The field desc_skip enables SQL LOADER to start scanning the same record it is at from the beginning because of the clause POSITION(1) . SQL LOADER again reads the first delimited field and skips it as directed by “FILLER” keyword. 
  • Now SQL LOADER reads the second field again and loads it into description column of the table EMP.
  • SQL LOADER then reads the third field in the data file and loads into column department_num of table EMP
  • Finally the fourth field is loaded into column department_name of table EMP.


Usage of BOUNDFILLER 
BOUNDFILLER is available with Oracle 9i and above and can be used if the skipped column’s value will be required later again.

Example: 
The requirement is to load first two fields concatenated with the third field as emp_num into table emp and Fourth field as Emp_name 

Data File:
1,15,1001,ARUN
1,15,1002,AMIT
1,15,1003,WARD
1,18,1004,JONY
1,20,1005,MATT

The requirement can be achieved using the below Control File:

LOAD DATA
INFILE ‘C:\eg.dat’
APPEND
INTO TABLE EMP
FIELDS TERMINATED BY “,” 
(
Rec_skip BOUNDFILLER, 
tmp_skip BOUNDFILLER, 
Emp_num “(:Rec_skip||:tmp_skip||:emp_num)”,
Emp_name 
)

 Load the same record twice into a single table:

SQL Loader lets you load record twice using POSITION clause but you have to take into account whether the constraints defined on the table allow you to insert duplicate rows.

Below is the Control file:

LOAD DATA
INFILE ‘eg.dat’
APPEND
INTO TABLE emp
FIELDS TERMINATED BY “,”
( emp_num, emp_name, department_num, department_name )
INTO TABLE emp
FIELDS TERMINATED BY “,”
(emp_num POSITION(1),emp_name,department_num,department_name)

SQL LOADER processes the above control file this way:

  • First “INTO TABLE” clause loads the 4 fields specified in the first line of the data file into the respective columns (emp_num, emp_name, department_num, department_name)
  • Field scanning does not start over from the beginning of the record when SQL LOADER encounters the second INTO TABLE clause in the CTL file. Instead, scanning continues where it left off. Statement “emp_num POSITION(1)” in the CTL file forces the SQL LOADER to read the same record from the beginning and loads the first field in the data file into emp_num column again. 
  • The remaining fields in the first record of the data file are again loaded into respective columns emp_name, department_num, department_name. Thus the same record can be loaded multiple times into the same table using “INTO TABLE” clause.


Using WHEN to selectively load the records into the table

WHEN clause can be used to direct SQL LOADER to load the record only when the condition specified in the WHEN clause is TRUE. WHEN statement can have any number of comparisons preceded by AND. 
SQL*Loader does not allow the use of OR in the WHEN clause. 

Here is a example which illustrates how to load the records into 2 tables EMP and DEPT based on the record type specified in the data file
.
The below is delimited data file eg.dat which has the first field as the record type. The requirement here is to load all the records with record type = 1 into table EMP and all the records with record type = 2 into table DEPT and record with record type =3 which happens to be the trailer record should not be loaded. 

1,1001,ARUN
2,1013,Accounting
1,1002,AMIT
2,1006,Sales
1,1003,WARD
2,1006,Accounting
1,1004,JONY
2,1009,Sales
1,1005,MATT
2,1006,Accounting
3,10

Control File: 

LOAD DATA
INFILE ‘eg.dat’
APPEND
INTO TABLE emp
WHEN (01) = ’1′
FIELDS TERMINATED BY “,”
( rec_skip filler POSITION(1),emp_num , emp_name ) 
INTO TABLE dept
WHEN (01) = ’2′
FIELDS TERMINATED BY “,”
(rec_skip filler POSITION(1),department_num, 
department_name )

Let’s now see how SQL LOADER processes the CTL file:
  • SQL LOADER loads the records into table EMP only when first position (01) of the record which happens to be the record type is ’1′ as directed by command INTO TABLE emp WHEN (01) = ’1′
  • If condition When (01) = ’1′ holds true for the current record, then SQL LOADER gets to the beginning of the record as directed by command POSITION(1) and skips the first field which is the record type.
  • It then loads the second field into emp_num and third field into emp_name column in the table EMP.
  • SQL LOADER loads the records into table DEPT only when first position (01) of the record which happens to be the record type is ’2′ as directed by the commands - INTO TABLE dept WHEN (01) = ’2′
  • If condition When (01) = ’2′ holds true for the current record, then SQL LOADER gets to the beginning of the record as directed by command POSITION(1) and skips the first field which is the record type.
  • It then loads the second field into department_num and third field into department_name columns in the table DEPT.
  • The records with record type = ’3′ are not loaded into any table. 


Thus you can selectively loads the necessary records into various tables using WHEN clause.

No comments:

Post a Comment