In this post we are going to discuss about commonly used Oracle database queries for manipulate data mainly grouping and merging. If the database structure is normalized and when it comes to visualization of data, it is required to group data based on their common characteristics. Similarly in most of the database systems including Oracle, Merging is the functionality provided in order to migrate data between two databases, schema or tables.
Let's create a simple database structure as follows. We can use a tool like SQL Fiddle to run queries and see how it works. Make sure you have selected Oracle 11g R2 as DB type before running any queries.
1
2
3
4
5
6
| CREATE TABLE SEMISTER_GPA( STUDENT_ID VARCHAR2(8), SEMISTER_ID VARCHAR2(4), GPA NUMBER(5,4), PRIMARY KEY (STUDENT_ID,SEMISTER_ID) ); |
Adding some values to above tables.
1
2
3
4
5
6
7
8
9
10
11
12
| INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000001' , 'L1S1' ,3.5245); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000002' , 'L1S1' ,3.6100); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000003' , 'L1S1' ,3.0589); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000001' , 'L1S2' ,3.7241); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000002' , 'L1S3' ,3.8180); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000003' , 'L1S2' ,3.2589); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000002' , 'L1S4' ,3.8675); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000002' , 'L2S1' ,3.2369); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000003' , 'L1S4' ,3.6523); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000001' , 'L1S4' ,3.2675); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000003' , 'L2S1' ,3.4389); INSERT INTO SEMISTER_GPA(STUDENT_ID,SEMISTER_ID,GPA) VALUES ( 'ST000003' , 'L2S2' ,3.0523); |
See in SQL Fiddle.
01. Simple grouping: Get the no. of semesters each student has completed.
1
| SELECT STUDENT_ID, COUNT (*) FROM SEMISTER_GPA GROUP BY STUDENT_ID; |
02. DENSE_RANK()and PARTITION BY
'PARTITION BY' is used partition a table into different segments which makes easier to manipulate each segment individually. Here 'SEMISTER_GPA' table can be partition by STUDENT_ID. In order to get the row which has the highest 'GPA' for each student, each segment is ordered by 'GPA' in descending order. Then the row which has the DENSE_RANK()=1 gives the expected result.
1
2
3
| SELECT DT.STUDENT_ID,DT.SEMISTER_ID, DT.GPA FROM ( SELECT STUDENT_ID,SEMISTER_ID,GPA, DENSE_RANK() OVER (PARTITION BY STUDENT_ID ORDER BY GPA DESC ) RNK FROM SEMISTER_GPA ) DT WHERE (DT.RNK = 1 ) |
03. LISTAGG () function
LISTAGG () function gives a single row, concatenated using specified column within each group ordered. To get the list of semesters each student has completed, we can group table using STUDENT_ID order each group by GPA and concatenate column values using above function.
1
| SELECT STUDENT_ID, LISTAGG(SEMISTER_ID, ',' ) WITHIN GROUP ( ORDER BY SEMISTER_ID) AS SEMISTER_LIST FROM SEMISTER_GPA GROUP BY STUDENT_ID |
Below mentioned is another sample data set. Timezone data is stored in two tables TIME_ZONE andTIMEZONE_DESCRIPTIONS. In order to migrate data into a single denormalized table 'MERGE INTO .. USING.. ' statement can be used. Merging can handle insert or update unconditionally without knowing it's presence in target table. This is a more flexible way of moving large amounts of data extracted from one or more tables. See in SQL Fiddle.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
| CREATE TABLE TIME_ZONE ( TIMEZONE_ID NUMBER(3) NOT NULL , TIMEZONE_OFFSET VARCHAR2(10), MODIFIED_DATE DATE DEFAULT SYSDATE NOT NULL , PRIMARY KEY (TIMEZONE_ID) ); CREATE TABLE TIMEZONE_DESCRIPTIONS ( TIMEZONE_ID NUMBER(3) NOT NULL , LANGUAGE_ID VARCHAR2(2) DEFAULT 'EN' NOT NULL , DESCRIPTION NVARCHAR2(100), MODIFIED_DATE DATE DEFAULT SYSDATE NOT NULL , PRIMARY KEY (TIMEZONE_ID,LANGUAGE_ID) ); Insert into TIME_ZONE (TIMEZONE_ID,TIMEZONE_OFFSET,MODIFIED_DATE) values (1, '+0300' ,SYSDATE); Insert into TIME_ZONE (TIMEZONE_ID,TIMEZONE_OFFSET,MODIFIED_DATE) values (2, '+0200' ,SYSDATE); Insert into TIME_ZONE (TIMEZONE_ID,TIMEZONE_OFFSET,MODIFIED_DATE) values (3, '+0400' ,SYSDATE); Insert into TIME_ZONE (TIMEZONE_ID,TIMEZONE_OFFSET,MODIFIED_DATE) values (5, '-0400' ,SYSDATE); Insert into TIME_ZONE (TIMEZONE_ID,TIMEZONE_OFFSET,MODIFIED_DATE) values (6, '+0400' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (1, 'EN' , 'Saudi Arabia' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (2, 'EN' , 'Egypt' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (3, 'EN' , 'UAE' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (5, 'EN' , 'USA' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (6, 'EN' , 'Oman' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (1, 'FR' , 'Arabie Saoudite' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (2, 'FR' , 'Egypte' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (3, 'FR' , 'Émirats arabes unis' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (5, 'FR' , 'USA' ,SYSDATE); Insert into TIMEZONE_DESCRIPTIONS (TIMEZONE_ID,LANGUAGE_ID,DESCRIPTION,MODIFIED_DATE) values (6, 'FR' , 'Oman' ,SYSDATE); |
Denormalized table structure.
1
2
3
4
5
6
7
| CREATE TABLE TIME_ZONE_MASTER ( TIMEZONE_ID NUMBER(3) NOT NULL , TIMEZONE_OFFSET VARCHAR2(10), DESCRIPTION NVARCHAR2(100), MODIFIED_DATE DATE DEFAULT SYSDATE NOT NULL , PRIMARY KEY (TIMEZONE_ID) ); |
Select data set from TIMEZONE_DESCRIPTIONS table to join with TIME_ZONE table. In the target table, language wise descriptions will be stored in the same row according to a predefined format.
1
2
3
4
5
6
7
8
9
| SELECT X.TIMEZONE_ID, MAX (X.MODIFIED_DATE) AS MODIFIED_DATE, 'EN:' || MAX (DECODE(X.LANGUAGE_ID, 'EN' , X.DESCRIPTION)) || ',FR:' || MAX (DECODE(X.LANGUAGE_ID, 'FR' , X.DESCRIPTION)) TIME_ZONE_DESC FROM ( SELECT TZD.TIMEZONE_ID, TZD.LANGUAGE_ID, TZD.DESCRIPTION,TZD.MODIFIED_DATE FROM TIMEZONE_DESCRIPTIONS TZD ) X GROUP BY X.TIMEZONE_ID; |
To get the full select statement, join with TIME_ZONE table using TIMEZONE_ID.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| SELECT tz.TIMEZONE_ID, tz.TIMEZONE_OFFSET, v.TIME_ZONE_DESC, GREATEST(v.MODIFIED_DATE,tz.MODIFIED_DATE) AS MODIFIED_DATE FROM TIME_ZONE tz LEFT OUTER JOIN ( SELECT x.TIMEZONE_ID, MAX (x.MODIFIED_DATE) AS MODIFIED_DATE, 'EN:' || MAX (DECODE(x.language_id, 'EN' , x.DESCRIPTION)) || ',FR:' || MAX (DECODE(x.language_id, 'FR' , x.DESCRIPTION)) TIME_ZONE_DESC FROM ( SELECT tzd.TIMEZONE_ID, tzd.LANGUAGE_ID, tzd.DESCRIPTION, tzd.MODIFIED_DATE FROM TIMEZONE_DESCRIPTIONS tzd ) x GROUP BY x.TIMEZONE_ID ) v ON (tz.TIMEZONE_ID = v.TIMEZONE_ID); |
Merge statement syntax,
1
2
3
4
5
6
7
8
9
10
| MERGE INTO TARET_TABLE USING ( SELECT ) ON ( JOIN ) WHEN MATCHED THEN UPDATE SET WHEN NOT MATCHED THEN INSERT () VALUES (); |
Finally, we can create the full merge statement to update TIME_ZONE_MASTER as follows.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
| MERGE INTO TIME_ZONE_MASTER t USING ( SELECT tz.TIMEZONE_ID, tz.TIMEZONE_OFFSET, v.TIME_ZONE_DESC, GREATEST(v.MODIFIED_DATE,tz.MODIFIED_DATE) AS MODIFIED_DATE FROM TIME_ZONE tz LEFT OUTER JOIN ( SELECT x.TIMEZONE_ID, MAX (x.MODIFIED_DATE) AS MODIFIED_DATE, 'EN:' || MAX (DECODE(x.language_id, 'EN' , x.DESCRIPTION)) || ',FR:' || MAX (DECODE(x.language_id, 'FR' , x.DESCRIPTION)) TIME_ZONE_DESC FROM ( SELECT tzd.TIMEZONE_ID, tzd.LANGUAGE_ID, tzd.DESCRIPTION, tzd.MODIFIED_DATE FROM TIMEZONE_DESCRIPTIONS tzd ) x GROUP BY x.TIMEZONE_ID ) v ON (tz.TIMEZONE_ID = v.TIMEZONE_ID) ) td ON (t.TIMEZONE_ID = td.TIMEZONE_ID) WHEN MATCHED THEN UPDATE SET t.DESCRIPTION = td.TIME_ZONE_DESC, t.TIMEZONE_OFFSET = td.TIMEZONE_OFFSET, t.MODIFIED_DATE = td.MODIFIED_DATE WHEN NOT MATCHED THEN INSERT ( TIMEZONE_ID, TIMEZONE_OFFSET, DESCRIPTION, MODIFIED_DATE ) VALUES ( td.TIMEZONE_ID, td.TIMEZONE_OFFSET, td.TIME_ZONE_DESC, td.MODIFIED_DATE ); |
Final outcome would be like this.