Tuesday, April 7, 2015

PL/SQL Interview Questions


What are the ACID properties of a database ?

Atomicity: It refers to the ability of the database to guarantee that either all of the tasks of a transaction are performed or none of them are. A transaction is said to be atomic: if one part of the transaction fails, the entire transaction fails and the database is left unchanged.

Consistency: The consistency property ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not). It prevents inconsistent states such as inventory falling below 0. Weak consistency, also known as eventual consistency, is where the database eventually reaches a consistent state; such a database is typically replicated, with the latest version sitting somewhere in the cluster while older versions are still being propagated.

Isolation: It means that other operations cannot access or see the data in an intermediate state during a transaction. It ensures consistency between transactions while maintaining their performance. Hence each transaction is unaware of other transactions executing concurrently in the system.

Durability: It refers to the guarantee that once the user has been notified of success, the transaction will persist and cannot be undone. It will survive system failures, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a transaction log that can be played back to recreate the system state right before a failure.

We have a table containing email_Ids with duplicates allowed. Write a query to get all the email_Ids having duplicate entries ? (Note: We need to get unique email_Ids, all of which occur more than once in the table.)
SELECT email_ids, COUNT(email_ids) FROM emails GROUP BY email_ids HAVING COUNT(email_ids) > 1;
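As a quick sanity check, the query can be tried against SQLite from Python (the table name follows the question; the sample data is invented for illustration):

```python
import sqlite3

# In-memory database with the emails table from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email_ids TEXT)")
conn.executemany("INSERT INTO emails VALUES (?)",
                 [("a@x.com",), ("b@x.com",), ("a@x.com",),
                  ("c@x.com",), ("b@x.com",), ("a@x.com",)])

# GROUP BY collapses duplicates; HAVING keeps only groups with more than one row
rows = conn.execute("""
    SELECT email_ids, COUNT(email_ids)
    FROM emails
    GROUP BY email_ids
    HAVING COUNT(email_ids) > 1
    ORDER BY email_ids
""").fetchall()
print(rows)   # [('a@x.com', 3), ('b@x.com', 2)]
```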

What is the difference between Inner and Outer Join ?
Inner join: An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection: the rows the two tables have in common.
Select * from A INNER JOIN B on A.id = B.id;
Select * from A, B where A.id = B.id;

How to select a column depending on the value of another column ?

The CASE statement is used to return a value based on some conditions similar to an IF-THEN-ELSE statement. Once a condition is true, it will stop reading and return the result. If no conditions are true, it will return the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL.
SELECT CASE <variable>
         WHEN <value>      THEN <returnvalue>
         WHEN <othervalue> THEN <returnthis>
         ELSE <returndefaultcase>
       END AS something
FROM <table>
Example:
CREATE TABLE Players (
    game_id int,
    blue_player_id int,
    red_player_id int,
    category varchar(100)
);

SELECT CASE WHEN category = 'blue'
            THEN blue_player_id ELSE red_player_id
       END AS player_id
FROM Players
WHERE game_id = 0;
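The Players example above can be run end to end with SQLite from Python (the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Players (
    game_id int, blue_player_id int, red_player_id int, category varchar(100))""")
conn.executemany("INSERT INTO Players VALUES (?, ?, ?, ?)",
                 [(0, 11, 22, "blue"), (0, 33, 44, "red"), (1, 55, 66, "blue")])

# CASE picks the blue or red player column per row, based on category
rows = conn.execute("""
    SELECT CASE WHEN category = 'blue' THEN blue_player_id ELSE red_player_id END AS player_id
    FROM Players
    WHERE game_id = 0
    ORDER BY player_id
""").fetchall()
print([r[0] for r in rows])   # [11, 44]
```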

Outer join: An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.

Left outer join: A left outer join will give all rows in A, plus any common rows in B.
select * from A LEFT OUTER JOIN B on A.id = B.id;

Full outer join: A full outer join will give us the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
Select * from A FULL OUTER JOIN B on A.id = B.id;

INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Return all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Return all rows from the right table, and the matched rows from the left table
FULL JOIN: Return all rows when there is a match in ONE of the tables
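A small SQLite sketch contrasting INNER and LEFT joins (tables and data invented; FULL OUTER JOIN is omitted because older SQLite versions lack it, though it can be emulated with a LEFT JOIN in each direction plus UNION):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INT)")
conn.execute("CREATE TABLE B (id INT)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,), (4,)])

inner = conn.execute(
    "SELECT A.id, B.id FROM A INNER JOIN B ON A.id = B.id ORDER BY A.id").fetchall()
left = conn.execute(
    "SELECT A.id, B.id FROM A LEFT JOIN B ON A.id = B.id ORDER BY A.id").fetchall()
print(inner)   # [(2, 2), (3, 3)]            : only the ids both tables share
print(left)    # [(1, None), (2, 2), (3, 3)] : every A row; unmatched B side is NULL
```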




If an Employee table consists of columns Employee Number, Employee Name and Manager Number then what would be the query to find managers for all the employees ?
Select e.emp_name as employee, m.emp_name as manager from Employee e, Employee m where e.manager = m.emp_number

What is Self Join and Why is it required ?
Self Join is the act of joining one table with itself. Self Join is often very useful to convert a hierarchical structure into a flat structure. Consider an employee table which has the employee id and the corresponding manager id in the same row (an employee-manager hierarchy). In order to display the name of each employee's manager right beside the employee, we can use a self join as shown above.
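The employee-manager self join can be sketched with SQLite (column names follow the query above; sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_number INT, emp_name TEXT, manager INT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Alice", None), (2, "Bob", 1), (3, "Carol", 1)])

# Join the table with itself: alias e is the employee row, alias m the manager row
rows = conn.execute("""
    SELECT e.emp_name AS employee, m.emp_name AS manager
    FROM Employee e JOIN Employee m ON e.manager = m.emp_number
    ORDER BY e.emp_name
""").fetchall()
print(rows)   # [('Bob', 'Alice'), ('Carol', 'Alice')]
```

Note that Alice, having no manager, is dropped by the inner self join; a LEFT JOIN would keep her with a NULL manager.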

What is the difference between UNION and UNION ALL ?
UNION and UNION ALL both combine two structurally similar data sets, but the UNION operation returns only the unique records from the resulting data set, whereas UNION ALL returns all the rows, even if one or more rows are duplicates of each other.
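The difference is easy to see with SQLite (tables and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v INT)")
conn.execute("CREATE TABLE t2 (v INT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])

union = [r[0] for r in conn.execute(
    "SELECT v FROM t1 UNION SELECT v FROM t2 ORDER BY v")]
union_all = [r[0] for r in conn.execute(
    "SELECT v FROM t1 UNION ALL SELECT v FROM t2 ORDER BY v")]
print(union)       # [1, 2, 3]    : the duplicated 2 is removed
print(union_all)   # [1, 2, 2, 3] : duplicates are kept
```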

If two tables have no common fields, how to combine their result sets ?
A cartesian product will match every row in the first table with every row in the second, resulting in lots of duplicated information:
Select * from table1, table2;

A union can be used to output the data, although not side-by-side, and the column types should be made compatible between the two selects, either by making the table columns compatible or coercing them in the select. (The description column is named descr below, since DESC is a reserved word in most databases.)
Select id as pid, descr, '' as cid, '' as name from table1
    UNION
Select '' as pid, '' as descr, id as cid, name from table2;
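Both approaches can be verified with SQLite (tables, data, and the descr column name are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, descr TEXT)")
conn.execute("CREATE TABLE table2 (id INT, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(10, "x"), (20, "y"), (30, "z")])

# Cartesian product: every table1 row paired with every table2 row
cross = conn.execute("SELECT * FROM table1, table2").fetchall()
print(len(cross))   # 6, i.e. 2 x 3

# Union: the two result sets stacked, with padded columns to align the shapes
stacked = conn.execute("""
    SELECT id AS pid, descr, '' AS cid, '' AS name FROM table1
    UNION ALL
    SELECT '' AS pid, '' AS descr, id AS cid, name FROM table2
""").fetchall()
print(len(stacked))   # 5, i.e. 2 + 3
```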

Student table contains Student id, Course id and Grade. Find all the students with average grade greater than 50 ?
Having clause is like a Where clause, except that it applies to the results of a GROUP BY query. It can be used to select groups which satisfy a given condition.
Select student_id, AVG(grade) AS Average From Student GROUP BY student_id HAVING AVG(grade) > 50

Consider Students table and Address table. Student table contains student id and student name, while Address table contains student id as the foreign key and address. Write query to find the names of all the students who don’t have any address ?
SELECT student_name FROM Students WHERE student_id NOT IN (SELECT student_id FROM address); 

How to retrieve the second highest salary from an Employee table containing duplicate values ?
SELECT max(salary) FROM Employee WHERE salary NOT IN (SELECT max(salary) FROM Employee);

SELECT max(salary) FROM Employee WHERE salary < (SELECT max(salary) FROM Employee);

SELECT max(salary) FROM Employee WHERE salary <> (SELECT max(salary) FROM Employee);
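All three variants can be checked against SQLite with a salary list that contains a duplicated maximum (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (salary INT)")
conn.executemany("INSERT INTO Employee VALUES (?)",
                 [(100,), (300,), (300,), (200,)])   # duplicate maximum on purpose

queries = [
    "SELECT MAX(salary) FROM Employee WHERE salary NOT IN (SELECT MAX(salary) FROM Employee)",
    "SELECT MAX(salary) FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee)",
    "SELECT MAX(salary) FROM Employee WHERE salary <> (SELECT MAX(salary) FROM Employee)",
]
results = [conn.execute(q).fetchone()[0] for q in queries]
print(results)   # [200, 200, 200] : the duplicated 300 never counts as "second"
```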

Write a SQL query to find the Nth highest salary ?
SELECT * FROM Employee Emp1 WHERE (N-1) = ( 
  SELECT COUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > Emp1.Salary 
 )
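The correlated subquery can be exercised with SQLite, parameterizing N (sample salaries invented; DISTINCT is added to the outer select so duplicated salaries collapse to one row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, Salary INT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [("a", 100), ("b", 300), ("c", 300), ("d", 200)])

def nth_highest(n):
    # Correlated subquery: a salary is Nth-highest when exactly N-1
    # distinct salaries exceed it
    return conn.execute("""
        SELECT DISTINCT Salary FROM Employee Emp1
        WHERE ? - 1 = (SELECT COUNT(DISTINCT Emp2.Salary)
                       FROM Employee Emp2 WHERE Emp2.Salary > Emp1.Salary)
    """, (n,)).fetchone()[0]

print(nth_highest(1), nth_highest(2), nth_highest(3))   # 300 200 100
```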

What is the difference between a Function and a Procedure?

  • Procedures are pre-compiled objects: they are compiled the first time they are called and the compiled format is saved, so that the compiled code executes on subsequent calls. A Function, in contrast, is compiled and executed every time it is called.
  • Function must return a value but in Stored Procedure it is optional (Procedure can return zero or n values).
  • Functions can have only input parameters, whereas Procedures can have input/output parameters.
  • Functions can be called from Procedure whereas Procedures cannot be called from Function.
  • Procedure allows SELECT as well as DML(INSERT/UPDATE/DELETE) statement in it whereas Function allows only SELECT statement in it.
  • Procedures can not be utilized in a SELECT statement whereas Function can be embedded in a SELECT statement.
  • Stored Procedures cannot be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section whereas Function can be.
  • Functions that return tables can be treated as another rowset. This can be used in JOINs with other tables.
  • Inline Functions can be thought of as views that take parameters and can be used in JOINs and other rowset operations.
  • Exception can be handled by try-catch block in a Procedure whereas try-catch block cannot be used in a Function.
  • We can use transaction management in a Procedure, whereas we cannot in a Function.

Explain normalization of the Database and different types of normalization? 
Database normalization helps to avoid redundant data, ensures representation of attribute relationships and facilitates the checking of updates for violation of integrity constraints.

First normal form: A relational schema R is in first normal form if the domains of all attributes of R are atomic (or rather, are treated atomically).
The information is stored in a table with a primary key to uniquely identify each row. Each column contains atomic values, and there are no repeating groups of columns; tables in first normal form cannot contain sub-columns. Repeating groups of columns are instead represented as separate rows in another table, linked by a foreign key matching the primary key.

Second normal form: A table is in 2nd Normal Form if it is in 1st normal form, and all the non-key columns are dependent on the table’s primary key. The primary key provides a means to uniquely identify each row in a table. Each column in the table should serve to describe what the primary key identifies, thus narrowing the table to a single purpose.

Third normal form: A table is in third normal form if it is in 2nd normal form, and it contains only columns that are non-transitively dependent on the primary key.
A column in a table may be related to others, through a second column which makes it transitive. When a column has a dependence on another column, we mean that the value can be derived from the other. Hence transitive dependence means a column’s value relies upon another column through a second intermediate column.

Consider three columns:  AuthorNationality, Author, and Book.  Column values for AuthorNationality and Author rely on the Book; once the book is known, you can find out the Author or AuthorNationality.  But also notice that the AuthorNationality relies upon Author.  That is, once you know the Author, you can determine their nationality.  In this sense then, the AuthorNationality relies upon Book, via Author.  This is a transitive dependence.

To be non-transitively dependent, then, means that all the columns are dependent on the primary key (a criteria for 2nd normal form) and no other columns in the table.
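The AuthorNationality example can be decomposed into third normal form along these lines (a sketch in SQLite; table and sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Transitive dependency removed: AuthorNationality moves to its own table keyed by Author
conn.execute("CREATE TABLE Books   (book TEXT PRIMARY KEY, author TEXT)")
conn.execute("CREATE TABLE Authors (author TEXT PRIMARY KEY, nationality TEXT)")
conn.execute("INSERT INTO Authors VALUES ('Tolkien', 'British')")
conn.executemany("INSERT INTO Books VALUES (?, ?)",
                 [("The Hobbit", "Tolkien"), ("The Silmarillion", "Tolkien")])

# The original Book -> Author -> AuthorNationality view is recovered with a join,
# while the nationality is now stored once no matter how many books an author has
rows = conn.execute("""
    SELECT b.book, b.author, a.nationality
    FROM Books b JOIN Authors a ON b.author = a.author
    ORDER BY b.book
""").fetchall()
print(rows)
```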

Boyce–Codd normal form: A relational schema R is in Boyce–Codd normal form (BCNF) if, for every one of its dependencies X -> Y, one of the following conditions holds true:

  • X -> Y is a trivial functional dependency (i.e., Y is a subset of X)
  • X is a superkey for schema R

Informally, the Boyce-Codd normal form is expressed as "Each attribute must represent a fact about the key, the whole key, and nothing but the key."

What is Explain Plan and How it is used to optimize the queries ?
The explain plan is used to show the query execution plan without actually running the query. The query execution plan is the set of methods describing how the database will access the data from the tables. The execution plan depends on the type of query optimizer used by the database: rule based (RBO) or cost based (CBO). The cost based optimizer takes all the related table statistics into consideration, while the rule based optimizer generates the query plan using a fixed set of pre-defined rules. Explain plans enable checking the following:
  • Ensure that the tables will be joined in optimal order.
  • Determine the most restrictive indexes to fetch the rows.
  • Determine the best internal join method to use (e.g. nested loops, hash join).
  • Determine that the SQL is executing the steps in the optimal order.
The Oracle query optimizer takes into account following components to calculate or estimate the optimal execution plan:

Cardinality: The cardinality is the estimated number of rows that will be returned by each operation. Cardinality can be estimated by using a simple SELECT COUNT(*) query for each table used in the query, applying any WHERE clause predicates belonging to that table.

Access Method: The access method or access path shows how the data will be accessed from each table (or index). Oracle supports the following common access methods:
  • Full table scan - Reads all rows from a table and filters out those that do not meet the where clause predicates. A full table scan will use multi block IO (typically 1MB IOs). A full table scan is selected if a large portion of the rows in the table must be accessed, no indexes exist or the ones present can’t be used or if the cost is the lowest.
  • Table access by ROWID – The rowid of a row specifies the data file, the data block within that file, and the location of the row within that block. Oracle first obtains the rowids either from a WHERE clause predicate or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid and does a row-by-row access.
  • Index unique scan – Only one row will be returned from the scan of a unique index. It will be used when there is an equality predicate on a unique (B-tree) index or an index created as a result of a primary key constraint.
  • Index range scan – Oracle accesses adjacent index entries and then uses the ROWID values in the index to retrieve the corresponding rows from the table. An index range scan can be bounded or unbounded. It will be used when a statement has an equality predicate on a non-unique index key, or a non-equality or range predicate on a unique index key (=, <, >, LIKE if not on leading edge). Data is returned in the ascending order of index columns.
  • Index skip scan - Normally, in order for an index to be used, the prefix of the index key (leading edge of the index) would be referenced in the query. However, if all the other columns in the index are referenced in the statement except the first column, Oracle can do an index skip scan, to skip the first column of the index and use the rest of it. This can be advantageous if there are few distinct values in the leading column of a concatenated index and many distinct values in the non-leading key of the index.
  • Full Index scan - A full index scan does not read every block in the index structure, contrary to what its name suggests. An index full scan processes all of the leaf blocks of an index, but only enough of the branch blocks to find the first leaf block. It is used when all of the columns necessary to satisfy the statement are in the index and it is cheaper than scanning the table.
  • Fast full index scan - This is an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. It cannot be used to eliminate a sort operation, because the data access does not follow the index key. It will also read all of the blocks in the index using multiblock reads, unlike a full index scan.
  • Bitmap Index – A bitmap index uses a set of bits for each key value and a mapping function that converts each bit position to a rowid. Oracle can efficiently merge bitmap indexes that correspond to several predicates in a WHERE clause, using Boolean operations to resolve AND and OR conditions.
Join method: The join method describes how data from two data-producing operators will be joined together. The join methods in the SQL statement can be identified using the operations column in the explain plan. Oracle offers several join methods and join types.
  • Hash Joins - It is used for joining large data sets. The optimizer uses the smaller of the two tables or data sources to build a hash table, based on the join key, in memory. It then scans the larger table, and performs the same hashing algorithm on the join column(s). It then probes the previously built hash table for each value and if they match, it returns a row.
  • Nested Loops joins - They are useful when small subsets of data are being joined and there is an efficient way of accessing the second table. For every row in the outer table, Oracle accesses all the matching rows in the inner table, similar to two nested FOR loops.
  • Sort Merge joins – Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=. Sort merge joins can perform better than nested loop joins for large data sets. The join consists of two steps: in the sort join operation both inputs are sorted on the join key, then in the merge join operation the sorted lists are merged together. Sort merge joins are mostly chosen when there is an index on one of the tables that will eliminate one of the sorts.
  • Cartesian join - In this join every row from one data source is joined with every row from the other data source, creating a Cartesian product of the two sets. Typically this is only chosen if the tables involved are small, or if one or more of the tables does not have a join condition to any other table in the statement. Cartesian joins are not commonly used and usually signal a problem with the cardinality estimates.
Join Types: Oracle offers several join types: inner join, (left) outer join, full outer join, anti join, semi join, grouped outer join, etc.

Join Order: The join order is the order in which the tables are joined together in a multi-table SQL statement. The join order can be read from an execution plan by checking the indentation of the tables in the operation column (a preorder traversal): the most deeply indented operation is executed first, and when two operations appear at the same level of indentation, the top one is executed first. The join order is determined based on cost, which is strongly influenced by the cardinality estimates and the access paths available. In general, joins that result in at most one row go first.

Partitioning: Partitioning allows a table, index or index-organized table to be subdivided into smaller pieces. Each piece of the database object is called a Partition. Partition pruning or Partition elimination is the simplest means to improve performance using Partitioning. For example, if an application has an ORDERS table that contains a record of all orders for the last 2 years, and this table has been partitioned by day, a query requesting orders for a single week would only access seven partitions of the ORDERS table instead of 730 partitions (the entire table). Partition pruning is visible in an execution plan in the PSTART and PSTOP columns. The PSTART column contains the number of the first partition that will be accessed and PSTOP column contains the number of the last partition that will be accessed.

Autotrace is another command, enabled with "set autotrace on", which runs the SQL query and provides the query plan along with execution statistics. Autotrace actually executes the query and automatically queries the plan table, whereas EXPLAIN PLAN does neither.
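SQLite offers an analogous facility, EXPLAIN QUERY PLAN, which also describes the chosen access path without executing the query (a sketch only: the exact plan text varies by SQLite version, and Oracle's EXPLAIN PLAN FOR / DBMS_XPLAN output looks quite different):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, amount INT)")
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")

# EXPLAIN QUERY PLAN reports the access path without running the query itself
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7").fetchall()
detail = " ".join(str(row[-1]) for row in plan)
print(detail)   # e.g. SEARCH orders USING INDEX idx_customer (customer_id=?)
```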

How to check the performance of stored procedure and fine tune ?
The SQL Trace facility and TKPROF let you accurately assess the efficiency of the SQL statements an application runs. For best results, these tools are used together with EXPLAIN PLAN rather than using EXPLAIN PLAN alone. The SQL Trace facility provides performance information on individual SQL statements. It generates the following statistics for each statement:
  • Parse, execute, and fetch counts
  • CPU and elapsed times
  • Physical reads and logical reads
  • Number of rows processed
  • Misses on the library cache
  • Username under which each parse occurred
  • Each commit and rollback
The SQL Trace facility can be enabled for a session or for an instance. When the SQL Trace facility is enabled, performance statistics for all SQL statements executed in a user session or in the instance are placed into trace files.
  • The TKPROF program is run to format the contents of the trace file and place the output into a readable output file. TKPROF can also determine the execution plans of SQL statements and create a SQL script that stores the statistics in the database.
TKPROF reports each statement executed with the resources it has consumed, the number of times it was called, and the number of rows which it processed. This information lets you easily locate those statements that are using the greatest resource. With experience or with baselines available, you can assess whether the resources used are reasonable given the work done.
 
What command is used to encrypt a PL/SQL application? 

In order to encrypt a PL/SQL script, the wrap command is used in Oracle. The wrap command is used to create a PLB file:
$ wrap iname=Testproc.sql
Processing Testproc.sql to Testproc.plb

To get the procedure script wrapped such that it becomes unreadable:
$ wrap iname=Testproc.sql oname=wrapped.sql
Processing Testproc.sql to wrapped.sql

To execute plb File, below command is used:
SQL> @Testproc.plb

How does syntax error differ from runtime error?

Errors are syntactical mistakes, such as not closing an opened IF condition or a missing semicolon. Exceptions are run time errors: for example, when a value is passed for which no condition is satisfied, Oracle throws an exception such as NO_DATA_FOUND, TOO_MANY_ROWS or CURSOR_ALREADY_OPEN.

The key difference is that errors caught by Oracle at compile time are called errors, while errors caught at run time are called exceptions.

What is Commit, Rollback and Savepoint? 

A Transaction is an atomic unit where either all changes are committed or all are rolled back.
The COMMIT, ROLLBACK, SAVEPOINT, and SET TRANSACTION commands can be used to control the transaction.

COMMIT: Makes the changes made during the current transaction permanent in the database.
ROLLBACK: Executed at the end of the current transaction to undo any changes made since the transaction began.
SAVEPOINT: Marks the current point in the processing of a transaction with a unique name, so that a later ROLLBACK can return to it.
AUTOCOMMIT: SET AUTOCOMMIT ON executes a COMMIT statement automatically after each statement.
SET TRANSACTION: The PL/SQL SET TRANSACTION command sets the transaction properties, such as read-write or read-only access.

Explain Implicit and Explicit cursors?

An implicit cursor is created automatically by Oracle when executing a query. It is simpler to code, but is less efficient, more vulnerable to data errors and offers less programmatic control. Oracle performs the open, fetches, and close automatically for an implicit cursor. Information about the most recently executed SQL statement can be obtained by examining the values of the implicit SQL cursor attributes. PL/SQL employs an implicit cursor for each UPDATE, DELETE, or INSERT statement; only for a single-row SELECT statement is there a choice between an implicit and an explicit cursor.
SELECT SUM (salary) INTO department_total FROM employee WHERE department_number = 10;

An explicit cursor is explicitly defined in the declaration section of your code. It takes more code, but gives more control over the open-fetch-close operations. There is no such thing as an explicit cursor for UPDATE, DELETE, and INSERT statements.
DECLARE
  CURSOR cur IS SELECT col FROM table WHERE something;
BEGIN
  OPEN cur;
  FETCH cur INTO var;
  CLOSE cur;
END;
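The same open-fetch-close pattern can be mimicked with a Python DB-API cursor against SQLite (an analogy only, not PL/SQL; table and sample data invented, echoing the department-total example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (department_number INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(10, 100), (10, 150), (20, 99)])

cur = conn.execute(                      # OPEN cur
    "SELECT salary FROM employee WHERE department_number = 10")
department_total = 0
while True:
    row = cur.fetchone()                 # FETCH cur INTO var
    if row is None:                      # cur%NOTFOUND
        break
    department_total += row[0]
cur.close()                              # CLOSE cur
print(department_total)   # 250
```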

What are the types of triggers?

Row triggers and Statement triggers: This specifies the number of times the trigger gets executed: once for every affected row, or once for every triggering statement.
Before and After triggers – A Before trigger is fired before the update, insert or delete triggering statement is executed, while an After trigger is fired after the triggering statement is executed.

What is the difference between normal index and clustered index ?

A clustered index is a special type of index that reorders the way records in the table are physically stored. Hence if a table has a clustered index, then the rows of that table will be stored on disk in the exact same order as the clustered index. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. It is faster to read using a clustered index than a non-clustered one, as the data is physically stored in index order.
  Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. The disadvantage, on the other hand, is the performance impact when values of the indexed column are changed, possibly causing re-sorting of rows to maintain the order.

A non-clustered index is an index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages, but instead contains index rows. A non-clustered index stores both a value and a pointer to the actual row that holds that value. A table can have many non-clustered indexes. Non-clustered indexes are also quicker for insert and update operations than a clustered index.

Both kinds of index improve performance when fetching data, but slow down insert and update operations.

What is Dirty Read, Non-Repeatable Read and Phantom Read ?

Dirty Read: Accessing an updated value that has not been committed is considered a dirty read because it is possible for that value to be rolled back to its previous value. If you read a value that is later rolled back, you will have read an invalid value.

Non-Repeatable Read: A non-repeatable read occurs when transaction A retrieves a row, transaction B subsequently updates the row, and transaction A later retrieves the same row again. Transaction A retrieves the same row twice but sees different data.
Hence non-repeatable read consists in reading two different committed values, whereas dirty read consists in reading a value that hasn't been committed yet.

Phantom Read: A phantom read occurs when transaction A retrieves a set of rows satisfying a given condition, transaction B subsequently inserts or updates a row such that the row now meets the condition in transaction A, and transaction A later repeats the conditional retrieval. Transaction A now sees an additional row. This row is referred to as a phantom.

What are the different types of transaction isolation levels ?

Transaction isolation level defines how the changes made to some data repository by one transaction affects other simultaneous concurrent transactions, and also how/when that changed data becomes available to other transactions.

READ_UNCOMMITTED: It states that a transaction may read data that is still uncommitted by other transactions. There is no lock on the table, and data can be read from the table while it is being written. For example, Transaction A writes a record without committing it, and Transaction B can read the uncommitted record and use it. If Transaction A executes a rollback, Transaction B still has the inconsistent data it read previously. Hence READ_UNCOMMITTED is vulnerable to dirty reads, non-repeatable reads and phantom reads.

READ_COMMITTED: It states that a transaction can't read data that is not yet committed by other transactions. Since there is a lock on the committed data, we can only read committed records. In this case Transaction A reads some records. Then Transaction B writes that same record and commits. Later Transaction A reads that same record again and may get different values because Transaction B made changes to that record and committed which is known as non-repeatable read. READ_COMMITTED is also vulnerable to phantom reads.

REPEATABLE_READ: It states that if a transaction reads one record from the database multiple times, the result of all those reads must always be the same. The rows read by the SELECT query are locked so that other transactions cannot change them.
Now Transaction A reads a range of records. Meanwhile Transaction B inserts a new record in the same range that Transaction A initially fetched and commits. Later Transaction A reads the same range again and will also get the record that Transaction B just inserted. This is a phantom read: a transaction fetched a range of records multiple times from the database and obtained different result sets (containing phantom records). Hence REPEATABLE_READ eliminates both dirty reads and non-repeatable reads but issue of phantom reads still persists.
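Snapshot behaviour similar to REPEATABLE_READ can be demonstrated with two SQLite connections in WAL mode (a sketch only: SQLite implements snapshot isolation, which is not identical to Oracle's levels or the ANSI definitions; table and values invented):

```python
import sqlite3, tempfile, os

# WAL mode needs a real file, not :memory:
path = os.path.join(tempfile.mkdtemp(), "iso.db")
a = sqlite3.connect(path, isolation_level=None)   # manual transaction control
a.execute("PRAGMA journal_mode=WAL")              # readers and a writer may coexist
a.execute("CREATE TABLE accounts (balance INTEGER)")
a.execute("INSERT INTO accounts VALUES (100)")
b = sqlite3.connect(path, isolation_level=None)

a.execute("BEGIN")                                                # Transaction A starts
first = a.execute("SELECT balance FROM accounts").fetchone()[0]   # snapshot taken at first read
b.execute("UPDATE accounts SET balance = 200")                    # Transaction B commits concurrently
second = a.execute("SELECT balance FROM accounts").fetchone()[0]  # A still sees its snapshot
a.execute("COMMIT")
third = a.execute("SELECT balance FROM accounts").fetchone()[0]   # a new read sees B's change
print(first, second, third)   # 100 100 200
```

Within its transaction, A's repeated read returns the same value even though B committed in between; only after A's transaction ends does the new value become visible.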

SERIALIZABLE: It is the most restrictive of all isolation levels. Transactions are executed with locking at all levels (read, range and write locking) so they appear as if they were executed in a serialized way. This leads to a scenario where none of the issues mentioned above can occur, but as there is no transaction concurrency, a performance penalty is introduced and it is the slowest way to work with data. The lock is on the entire table against which the SELECT query is fired.

Below is the summary of all the Isolation levels:

Isolation Level                Dirty Reads  Non-Repeatable Reads  Phantom Reads
TRANSACTION_READ_UNCOMMITTED   Allowed      Allowed               Allowed
TRANSACTION_READ_COMMITTED     Prevented    Allowed               Allowed
TRANSACTION_REPEATABLE_READ    Prevented    Prevented             Allowed
TRANSACTION_SERIALIZABLE       Prevented    Prevented             Prevented

Note: Using the @Transactional annotation we can define the isolation level of a Spring managed bean transactional method.


