Java: 2007

Sunday, August 19, 2007

Difference between equals and hashCode method

The Java super class java.lang.Object has two very important methods defined in it. They are -

public boolean equals(Object obj)
public int hashCode()

These methods matter whenever user-defined classes interact with other Java classes, for example when objects of such classes are added to collections.

public boolean equals(Object obj)

This method checks whether the object passed as an argument is equal to the object on which the method is invoked. The default implementation in the Object class simply checks whether two object references x and y refer to the same object, i.e. it checks if x == y. This comparison is also known as a "shallow comparison". Classes that provide their own implementation of the equals method are expected to perform a "deep comparison" by comparing the relevant data members. Since the Object class has no data members that define its state, it simply performs a shallow comparison.

This is what the JDK 1.4 API documentation says about the equals method of Object class-

Indicates whether some other object is "equal to" this one.

The equals method implements an equivalence relation:

It is reflexive: for any reference value x, x.equals(x) should return true.
It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
For any non-null reference value x, x.equals(null) should return false.

The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any reference values x and y, this method returns true if and only if x and y refer to the same object (x==y has the value true).

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Here are some useful guidelines for implementing the equals method correctly.

Use the equality operator == to check whether the argument is a reference to this object; if so, return true. This saves time when the actual comparison is costly.
Use the following condition to check that the argument is not null and it is of the correct type, if not then return false.
if((obj == null) || (obj.getClass() != this.getClass())) return false;
Note that, correct type does not mean the same type or class as shown in the example above. It could be any class or interface that one or more classes agree to implement for providing the comparison.
Cast the method argument to the correct type. Again, the correct type may not be the same class. Also, since this step is done after the above type-check condition, it will not result in a ClassCastException.
Compare the significant variables of the argument object and this object, and check whether they are equal. If *all* of them are equal, return true; otherwise return false. When comparing these members, primitive variables can be compared directly with the equality operator (==) after performing any necessary conversions (such as float via Float.floatToIntBits or double via Double.doubleToLongBits), whereas object references can be compared by invoking their equals method recursively. You also need to ensure that invoking the equals method on these object references does not result in a NullPointerException.
It is neither necessary, nor advisable to include those class members in this comparison which can be calculated from other variables, hence the word "significant variables". This certainly improves the performance of the equals method. Only you can decide which class members are significant and which are not.
Do not change the type of the argument of the equals method. It takes a java.lang.Object as an argument, do not use your own class instead. If you do that, you will not be overriding the equals method, but you will be overloading it instead; which would cause problems. It is a very common mistake, and since it does not result in a compile time error, it becomes quite difficult to figure out why the code is not working properly.
Review your equals method to verify that it fulfills all the requirements stated by the general contract of the equals method.
Lastly, do not forget to override the hashCode method whenever you override the equals method, that's unpardonable. ;)
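The guidelines above can be sketched in one small class. This is only an illustration: Point and its fields are hypothetical names, and the sketch assumes Java 7 or later for java.util.Objects (on JDK 1.4 the null check and hash combination would have to be written by hand).

```java
import java.util.Objects;

// Hypothetical value class, used only to illustrate the guidelines.
final class Point {
    private final int x;
    private final int y;
    private final String label; // may be null

    Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object obj) {    // the parameter type stays Object
        if (this == obj) return true;       // same reference: cheap early exit
        if (obj == null || obj.getClass() != getClass()) return false;
        Point other = (Point) obj;          // safe after the type check
        return x == other.x
            && y == other.y
            && Objects.equals(label, other.label); // null-safe comparison
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals compares.
        return Objects.hash(x, y, label);
    }
}
```

The @Override annotation makes the compiler reject an accidental overload such as equals(Point), which is the mistake the guideline on the argument type warns about.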

public int hashCode()

This method returns the hash code value for the object on which this method is invoked. This method returns the hash code value as an integer and is supported for the benefit of hashing based collection classes such as Hashtable, HashMap, HashSet etc. This method must be overridden in every class that overrides the equals method.

This is what the JDK 1.4 API documentation says about the hashCode method of Object class-

Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

Consistency during same execution - Firstly, it states that the hash code returned by the hashCode method must be consistently the same for multiple invocations during the same execution of the application as long as the object is not modified to affect the equals method.
Hash Code & Equals relationship - The second requirement of the contract is the hashCode counterpart of the requirement specified by the equals method. It simply emphasizes the same relationship - equal objects must produce the same hash code. However, the third point elaborates that unequal objects need not produce distinct hash codes.

After reviewing the general contracts of these two methods, it is clear that the relationship between these two methods can be summed up in the following statement -

Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes.

Tips

Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes.
The equals method provides "deep comparison" by checking if two objects are logically equal as opposed to the "shallow comparison" provided by the equality operator ==.
However, the equals method in java.lang.Object class only provides "shallow comparison", same as provided by the equality operator ==.
The equals method only takes Java objects as an argument, and not primitives; passing primitives will result in a compile time error.
Passing objects of different types to the equals method will never result in a compile time error or runtime error.
For standard Java wrapper classes and for java.lang.String, if the equals argument type (class) is different from the type of the object on which the equals method is invoked, it will return false.
The class java.lang.StringBuffer does not override the equals method, and hence it inherits the implementation from java.lang.Object class.
The equals method must not provide equality comparison with any built in Java class, as it would result in the violation of the symmetry requirement stated in the general contract of the equals method.
If null is passed as an argument to the equals method, it will return false.
Equal hash codes do not imply that the objects are equal.
return 1; is a legal implementation of the hashCode method, however it is a very bad implementation. It is legal because it ensures that equal objects will have equal hash codes, it also ensures that the hash code returned will be consistent for multiple invocations during the same execution. Thus, it does not violate the general contract of the hashCode method. It is a bad implementation because it returns same hash code for all the objects. This explanation applies to all implementations of the hashCode method which return same constant integer value for all the objects.
In standard JDK 1.4, the wrapper classes java.lang.Short, java.lang.Byte, java.lang.Character and java.lang.Integer simply return the value they represent as the hash code by typecasting it to an int.
Since JDK version 1.3, the class java.lang.String caches its hash code, i.e. it calculates the hash code only once and stores it in an instance variable and returns this value whenever the hashCode method is called. It is legal because java.lang.String represents an immutable string.
It is incorrect to involve a random number directly while computing the hash code of the class object, as it would not consistently return the same hash code for multiple invocations during the same execution.
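As a small illustration of the tip about constant hash codes, the sketch below uses a hypothetical Tag class whose hashCode always returns 1. A HashSet still behaves correctly, because equal objects still share a hash code; every element simply lands in the same bucket, so lookups degrade towards a linear search as the set grows.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class with a legal but poor hashCode.
final class Tag {
    private final String name;

    Tag(String name) { this.name = name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || obj.getClass() != getClass()) return false;
        return name.equals(((Tag) obj).name);
    }

    @Override
    public int hashCode() {
        return 1; // same value for every Tag: contract satisfied, performance poor
    }
}

class ConstantHashDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<Tag>();
        tags.add(new Tag("a"));
        tags.add(new Tag("b"));
        tags.add(new Tag("a")); // equal to the first element, so not added again
        System.out.println(tags.size());                 // prints 2
        System.out.println(tags.contains(new Tag("b"))); // prints true
    }
}
```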

Tuesday, August 14, 2007

Multithreading

Multithreading is one of the most important concepts of the Java language. You simply cannot do without multithreading in real-world programming. Multithreading basically enables your program to do more than one task at once and also to synchronize the various tasks. But before we launch into multithreading we will briefly summarize the points about threads.

Brief Recapitulation of threads

There are two ways to create threads.

Subclass Thread and override run()
Implement Runnable and override run()

Either of these two approaches may be used. Since Java does not allow a class to extend more than one class, implementing the Runnable interface is the way to go when your class already extends another class.

You start a thread by calling its start() method, which causes the run() method to be executed in the new thread. You never call run() directly to start a thread. The stop() method is deprecated and should be avoided. Thread priorities range from 1 to 10, the default being 5, i.e. normal priority.

A daemon thread is a thread whose only role is to serve other threads. When only daemon threads remain, the program exits. When a new thread object is created, the new thread has priority equal to the creating thread, and is a daemon thread if and only if the creating thread is a daemon.

When the JVM starts, there is usually a single non-daemon thread which typically calls the main() method of the class.
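The defaults described above can be checked with a few lines of code. This is a minimal sketch; ThreadDefaultsDemo and inheritsDaemonStatus are hypothetical names chosen for the illustration.

```java
class ThreadDefaultsDemo {

    // True when a newly created thread has the same daemon status as its creator.
    static boolean inheritsDaemonStatus() {
        Thread child = new Thread(new Runnable() {
            public void run() { }
        });
        return child.isDaemon() == Thread.currentThread().isDaemon();
    }

    public static void main(String[] args) {
        Thread creator = Thread.currentThread();
        Thread child = new Thread(new Runnable() {
            public void run() { }
        });

        // Thread.MIN_PRIORITY, NORM_PRIORITY and MAX_PRIORITY are 1, 5 and 10.
        System.out.println("normal priority:  " + Thread.NORM_PRIORITY);
        System.out.println("creator priority: " + creator.getPriority());
        System.out.println("child priority:   " + child.getPriority());
        System.out.println("daemon inherited: " + inheritsDaemonStatus());
    }
}
```

Note that setDaemon() must be called before start(); calling it on a thread that is already running throws an IllegalThreadStateException.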

Threads can be in one of four states.

New Threads
When a thread is first created, the thread is not yet running.
Runnable Threads
Once the start() method is invoked, the thread is runnable. A runnable thread is not necessarily running yet; it actually runs when the scheduler gives it CPU time and the code inside its run() method begins executing.
Blocked Threads
A thread enters the blocked state when any of these four conditions occurs.
When sleep() is called.
When suspend() is called.
When wait() is called.
The thread calls an operation e.g. during input/output, which will not return until reading/writing is complete.
Dead Threads
A thread dies for one of two reasons.
It dies a natural death when the run() method exits.
It is killed because its stop() method was invoked.
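On Java 5 and later, the state of a thread can be observed with Thread.getState(). The sketch below is only an illustration (ThreadStateDemo is a hypothetical name); the Thread.State enum is finer-grained than the four states listed above, splitting the blocked state into BLOCKED, WAITING and TIMED_WAITING and calling the dead state TERMINATED.

```java
class ThreadStateDemo {

    // Returns the state of a thread before start() and after it has finished.
    static Thread.State[] statesBeforeAndAfter() throws InterruptedException {
        Thread t = new Thread(new Runnable() {
            public void run() { }
        });
        Thread.State before = t.getState(); // NEW: created but not started
        t.start();
        t.join();                           // wait until run() has returned
        Thread.State after = t.getState();  // TERMINATED: the thread is dead
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = statesBeforeAndAfter();
        System.out.println(states[0]); // prints NEW
        System.out.println(states[1]); // prints TERMINATED
    }
}
```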

Now it is time for some examples. Take a look at two examples below for creating more than one thread.

class NewThread extends Thread {

    NewThread(String threadName) {
        super(threadName);   // sets the name of this thread
        System.out.println("New Thread: " + this);
        start();             // run() will now execute in the new thread
    }

    // Entry point for the new thread.
    public void run() {
        try {
            for (int i = 5; i > 0; i--) {
                System.out.println(getName() + ":" + i);
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            System.out.println(getName() + " Interrupted.");
        }
        System.out.println(getName() + " Exiting.");
    }
}

class MultiThreadDemo {

    public static void main(String args[]) {
        // Start three threads.
        new NewThread("One");
        new NewThread("Two");
        new NewThread("Three");

        try {
            // Give the other threads time to finish.
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            System.out.println("Main Thread Interrupted.");
        }
        System.out.println("main Thread Exiting.");
    }
}

And the second one.

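Note: suspend() and resume() are deprecated because they are deadlock-prone, and newer JDK releases may not support them at all. The example is shown for illustration only.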

class NewThread implements Runnable {
    String name;
    Thread t;

    NewThread(String threadname) {
        name = threadname;
        t = new Thread(this, name);
        System.out.println("New Thread: " + t);
        t.start();
    }

    // Entry point for the new thread.
    public void run() {
        try {
            for (int i = 5; i > 0; i--) {
                System.out.println(name + ":" + i);
                Thread.sleep(200);
            }
        } catch (InterruptedException e) {
            System.out.println(name + " Interrupted.");
        }
        System.out.println(name + " Exiting.");
    }
}

class SuspendResume {

    public static void main(String args[]) {
        NewThread ob1 = new NewThread("One");
        NewThread ob2 = new NewThread("Two");

        try {
            Thread.sleep(1000);

            // Suspend first and sleep afterwards, so the pause is observable.
            System.out.println("Suspending thread One");
            ob1.t.suspend();   // deprecated
            Thread.sleep(1000);
            System.out.println("Resuming thread One");
            ob1.t.resume();    // deprecated

            System.out.println("Suspending thread Two");
            ob2.t.suspend();
            Thread.sleep(1000);
            System.out.println("Resuming thread Two");
            ob2.t.resume();
        } catch (InterruptedException e) {
            System.out.println("main thread interrupted.");
        }

        // Wait for both threads to finish.
        try {
            ob1.t.join();
            ob2.t.join();
        } catch (InterruptedException e) {
            System.out.println("main thread interrupted.");
        }
        System.out.println("Main thread Exiting.");
    }
}

Synchronization

When two or more threads need access to a shared resource, they need some way to ensure that the resource will be used by only one thread at a time. The process by which this is achieved is synchronization.

Key to synchronization is the concept of the monitor. A monitor is an object that is used as a mutually exclusive lock. Only one thread can own the monitor at a given time. When a thread acquires a lock, it is said to have entered the monitor. The other threads attempting to enter the locked monitor will be suspended until the first exits the monitor.

There are two ways you can synchronize your code.

synchronized methods.
synchronized statement

Both involve the use of the synchronized keyword. See below for an example.
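A synchronized statement locks an explicitly named object for the duration of one block only, rather than for a whole method. The following is a minimal sketch of that form; SyncCounter and runTwoThreads are hypothetical names chosen for the illustration.

```java
class SyncCounter {
    private final Object lock = new Object(); // the monitor guarding count
    private int count;

    void increment() {
        synchronized (lock) {   // only this block is mutually exclusive
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Two threads increment the same counter; returns the final value.
    static int runTwoThreads(final int perThread) throws InterruptedException {
        final SyncCounter counter = new SyncCounter();
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoThreads(10000)); // prints 20000
    }
}
```

Without the synchronized blocks the two threads could interleave inside count++ and lose updates, so the final value could come out lower than expected.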

import java.io.*;

class Deposit {
    static int balance = 1000;

    public static void main(String args[]) {
        PrintWriter out = new PrintWriter(System.out, true);
        Account account = new Account(out);
        DepositThread first, second;

        first = new DepositThread(account, 1000, "#1");
        second = new DepositThread(account, 1000, "\t\t\t\t#2");

        first.start();
        second.start();

        // Wait for both deposits to finish.
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) { }

        out.println("*** Final balance is  " + balance);
    }
}

class Account {
    PrintWriter out;

    Account(PrintWriter out) {
        this.out = out;
    }

    // synchronized: only one thread at a time can run deposit() on this Account.
    synchronized void deposit(int amount, String name) {
        int balance;

        out.println(name + "  trying to deposit " + amount);
        out.println(name + "  getting balance... ");
        balance = getBalance();
        out.println(name + "  balance got is  " + balance);

        balance += amount;

        out.println(name + "  setting balance...");
        setBalance(balance);
        out.println(name + "  balance set to  " + Deposit.balance);
    }

    int getBalance() {
        try {
            Thread.sleep(1000);   // simulate a slow read
        } catch (InterruptedException e) { }
        return Deposit.balance;
    }

    void setBalance(int balance) {
        try {
            Thread.sleep(1000);   // simulate a slow write
        } catch (InterruptedException e) { }
        Deposit.balance = balance;
    }
}

class DepositThread extends Thread {
    Account account;
    int deposit_amount;
    String message;

    DepositThread(Account account, int amount, String message) {
        this.message = message;
        this.account = account;
        this.deposit_amount = amount;
    }

    public void run() {
        account.deposit(deposit_amount, message);
    }
}

Note: What will happen if the synchronized keyword is removed in the preceding example?

Inter-thread Communication

Java's inter-thread communication process involves the use of the wait(), notify() and notifyAll() methods. These methods are implemented as final methods in Object, so all classes have them. These methods can only be called from within synchronized code.

Rules for using these methods:

wait() tells the calling thread to give up the monitor and go to sleep until some other thread enters the same monitor and calls notify() or notifyAll().
notify() wakes up one of the threads waiting on the object; which thread is chosen is not specified.
notifyAll() wakes up all the threads waiting on the object. They then compete for the monitor, and the order in which they run is not guaranteed.

See below for an incorrect implementation of a producer/consumer example.

// An incorrect implementation of a producer and consumer.

class Q {
    int n;

    synchronized int get() {
        System.out.println("Got: " + n);
        return n;
    }

    synchronized void put(int n) {
        this.n = n;
        System.out.println("Put: " + n);
    }
}

class Producer implements Runnable {
    Q q;

    Producer(Q q) {
        this.q = q;
        new Thread(this, "Producer").start();
    }

    public void run() {
        int i = 0;
        while (true) {
            q.put(i++);   // nothing stops the producer from overrunning the consumer
        }
    }
}

class Consumer implements Runnable {
    Q q;

    Consumer(Q q) {
        this.q = q;
        new Thread(this, "Consumer").start();
    }

    public void run() {
        while (true) {
            q.get();      // nothing stops the consumer from reading the same value twice
        }
    }
}

class PC {
    public static void main(String args[]) {
        Q q = new Q();
        new Producer(q);
        new Consumer(q);
        System.out.println("Press Control-C to stop");
    }
}

The correct way would be using wait() and notify() as shown here.

// A correct implementation of a producer and consumer.

class Q {
    int n;
    boolean valueset = false;

    synchronized int get() {
        // Re-check the condition in a loop: wait() may return spuriously.
        while (!valueset) {
            try {
                wait();
            } catch (InterruptedException e) {
                System.out.println("InterruptedException caught");
            }
        }
        System.out.println("Got: " + n);
        valueset = false;
        notify();   // tell the producer that the slot is free
        return n;
    }

    synchronized void put(int n) {
        while (valueset) {
            try {
                wait();
            } catch (InterruptedException e) {
                System.out.println("InterruptedException caught");
            }
        }
        this.n = n;
        valueset = true;
        System.out.println("Put: " + n);
        notify();   // tell the consumer that a value is available
    }
}

class Producer implements Runnable {
    Q q;

    Producer(Q q) {
        this.q = q;
        new Thread(this, "Producer").start();
    }

    public void run() {
        int i = 0;
        while (true) {
            q.put(i++);
        }
    }
}

class Consumer implements Runnable {
    Q q;

    Consumer(Q q) {
        this.q = q;
        new Thread(this, "Consumer").start();
    }

    public void run() {
        while (true) {
            q.get();
        }
    }
}

class PCFixed {
    public static void main(String args[]) {
        Q q = new Q();
        new Producer(q);
        new Consumer(q);
        System.out.println("Press Control-C to stop");
    }
}
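On Java 5 and later, a similar one-slot handoff can be had without hand-written wait() and notify() calls by using a BlockingQueue from java.util.concurrent. This is a different technique from the one shown above, offered only as a sketch; QueueHandoffDemo and transfer are hypothetical names, and the loops are bounded so that the program terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueHandoffDemo {

    // Passes the values 0..count-1 from a producer thread to a consumer thread.
    static List<Integer> transfer(final int count) throws InterruptedException {
        // Capacity 1 plays the role of the single slot n in class Q.
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(1);
        final List<Integer> received = new ArrayList<Integer>();

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < count; i++) {
                        queue.put(i);               // blocks while the slot is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "Producer");

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < count; i++) {
                        received.add(queue.take()); // blocks while the slot is empty
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "Consumer");

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(5)); // prints [0, 1, 2, 3, 4]
    }
}
```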

Now to summarize the points about multithreading: thread synchronization, inter-thread communication, thread priorities, thread scheduling, and daemon threads.

What is connection pooling?

With servlets, opening a database connection is a major bottleneck, because a new connection is created and torn down for every page request and creating a connection takes a comparatively long time. Creating a connection pool is an ideal approach for a complicated servlet. With a connection pool, we can duplicate only the resources we need to duplicate rather than the entire servlet. A connection pool can also intelligently manage the size of the pool and make sure each connection remains valid. A number of connection pool packages are currently available. Some, like DbConnectionBroker, are freely available from Java Exchange. It works by creating an object that dispenses connections and connection IDs on request. The ConnectionPool class maintains a Hashtable, using Connection objects as keys and Boolean values as stored values. The Boolean value indicates whether a connection is in use or not. A program calls the getConnection() method of the ConnectionPool to get a Connection object it can use; it calls returnConnection() to give the connection back to the pool.
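The bookkeeping described above can be sketched as follows. This is not the code of DbConnectionBroker or of any real pool: SimplePool and its Factory interface are hypothetical, the pooled type is generic so that the sketch runs without a database (T would be java.sql.Connection in practice), and a map guarded by synchronized methods stands in for the Hashtable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the ConnectionPool class described above.
class SimplePool<T> {

    // Creates a new pooled object when no idle one is available.
    interface Factory<R> {
        R create();
    }

    // Pooled object -> Boolean flag: TRUE while lent out, FALSE while idle.
    private final Map<T, Boolean> inUse = new LinkedHashMap<T, Boolean>();
    private final Factory<T> factory;

    SimplePool(Factory<T> factory) {
        this.factory = factory;
    }

    synchronized T getConnection() {
        for (Map.Entry<T, Boolean> entry : inUse.entrySet()) {
            if (!entry.getValue()) {           // idle: hand it out again
                entry.setValue(Boolean.TRUE);
                return entry.getKey();
            }
        }
        T fresh = factory.create();            // nothing idle: create a new one
        inUse.put(fresh, Boolean.TRUE);
        return fresh;
    }

    synchronized void returnConnection(T pooled) {
        if (inUse.containsKey(pooled)) {
            inUse.put(pooled, Boolean.FALSE);  // mark as idle, do not close
        }
    }

    synchronized int size() {
        return inUse.size();
    }
}
```

A real pool would also cap its size and check that a connection is still valid before handing it out again, as the paragraph above notes.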

JDBC connection, DataSource connection, connection pooling

Both are database connections. A connection is a handle to the database. When you create a connection directly by calling DriverManager.getConnection(..), you are creating the connection yourself, and when you call close() on it, the link to the database is lost. On the other hand, when you get a connection from a DataSource and call close() on it, the link to the database is not closed; the connection is returned to a connection pool, where it can be reused by other classes. It is always better to use a connection pool, because creating connections is expensive.

DriverManager.getConnection() literally creates, that is, builds a connection to the database using the values you previously supplied when you loaded the driver.

A connection pool is an object that contains several already made connections to the database, and simply provides you with one of those existing connections. The confusion arises because a) the method name is frequently the same and b) the connection pool object calls the driver's getConnection() method to create several connections before lending any of them out.

In other words:

DriverManager.getConnection() builds a connection to the database.

ConnectionPool.getConnection() fetches an existing connection.

If you use DriverManager.getConnection(), you are indeed bypassing the Connection Pool entirely.

Database Connection Pooling with Tomcat

Software object pooling is not a new concept. There are many scenarios where some type of object pooling technique is employed to improve application performance, concurrency, and scalability. After all, having your database code create a new Connection object on every client request is an expensive process. Moreover, with today's demanding applications, creating new connections for data access from scratch, maintaining them, and tearing down the open connection can lead to massive load on the server.

Connection pooling removes most of the overhead of repeatedly opening JDBC connections. Further, object pooling also helps to reduce the garbage collection load. In this article, we'll look at an elegant way of creating a pool of open database-connection objects in Tomcat, so that they are handy whenever an application needs to access a DB resource.

With Database Connection Pooling (DBCP), we can scale our applications to handle increased load and deliver high performance benefits. Using recycled database connection objects cuts the time taken to re-instantiate and load frequently used objects, thus reducing unnecessary overheads. Configuring a DB pool can be a daunting task, because there has to be a way for different components within an application to know about the available pooled objects, and a mechanism to locate them. This is exactly where JNDI steps in, tying these dependencies together.

JNDI to the Rescue

The Java Naming and Directory Interface (JNDI) has been at the core of Java EE since its inception. JNDI offers a generic mechanism for Java EE components to find other components, resources, or services indirectly at runtime. The primary role of JNDI in a Java EE application is to provide an indirection layer, so that components can find required resources without being particularly aware of the indirection. This indirection is almost transparent. JNDI helps in holding applications together, but this coupling is very flexible, so that apps can be easily reassembled. JNDI spares you from providing direct references to the data source, JDBC driver class names, user names and passwords, or any vendor-specific quirks of setting up pooling. We just look up all of these dependencies at runtime through a JNDI call. The developer, on the other hand, is ignorant of the external resources.

Tomcat Configuration

Our approach to DBCP uses the Jakarta-Commons database connection pool. But first, we need to configure the JNDI DataSource in Tomcat by adding a declaration for the resource to server.xml file, which resides inside the /conf directory of your Tomcat installation (indicated by the environment variable CATALINA_HOME). The JNDI DataSource is used as a factory for connections. One of the major advantages of using a configuration like this is that the characteristics of the pool can be changed without affecting the application code. Our application's use of connection pooling is almost transparent. The following code snippet shows us how to configure the container to enable connection pooling.

<Context … reloadable="true" crossContext="true">

  <Resource name="jdbc/TestDB" auth="Container"
    type="javax.sql.DataSource" removeAbandoned="true"
    removeAbandonedTimeout="30" maxActive="100"
    maxIdle="30" maxWait="10000" username="kunal"
    password="java_facier"
    driverClassName="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost/dbcptest"/>

</Context>

We can configure a maximum number of DB connections in the pool (maxActive). Make sure you choose a maximum connection count large enough to handle all of your database connections; alternatively, you can set 0 for no limit. Further, we can set the maximum number of idle database connections to be retained in the pool (maxIdle). Set this value to -1 for no limit. Optimal performance is attained when the pool in its steady state contains just enough connections to service all concurrent connection requests, without having to create new physical database connections at runtime. We can also specify the maximum time (in milliseconds) to wait for a database connection to become available (maxWait), which in this example is 10 seconds. An exception is thrown if this timeout is exceeded. You can set this value to -1 to wait indefinitely. Please make sure your connector driver, such as mysql.jar, is placed inside the /common/lib directory of your Tomcat installation.

To achieve performance and high throughput, we also need to fine-tune the container to work under heavy traffic. Here's how we'll configure the Connector element for the maxProcessors and acceptCount parameters in the server.xml file:


<Connector …
  maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
  enableLookups="false" redirectPort="8443" acceptCount="150"
  connectionTimeout="20000" disableUploadTimeout="true" />

Configuring JNDI Reference

In order for JNDI to resolve the reference, we have to insert a resource-ref element into the web.xml deployment descriptor file. We first begin by adding a listener element to register a ServletContextListener, as shown below:



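<listener>
  <listener-class>com.onjava.dbcp.DBCPoolingListener</listener-class>
</listener>

<resource-ref>
  <description>DB Connection Pooling</description>
  <res-ref-name>jdbc/TestDB</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>

<servlet>
  <servlet-name>EnrolledStudents</servlet-name>
  <servlet-class>com.onjava.dbcp.CourseEnrollmentServlet</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
  <servlet-name>EnrolledStudents</servlet-name>
  <url-pattern>/enrollment.do</url-pattern>
</servlet-mapping>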

This binding is vendor-specific, and every container has its own mechanism for setting data sources. Please note that this is just a declaration for dependency on an external resource, and doesn't create the actual resource. Comprehending the tags is pretty straightforward: this indicates to the container that the local reference name jdbc/TestDB should be set by the app deployer, and this should match with the resource name, as declared in server.xml file.

Putting DBCP into Action

As our application first starts asking the pool for Connection objects, they will be newly created, but when the application has finished with an object, it's returned to the pool rather than destroyed. This has huge performance benefits. Now, as the application needs more Connection objects, the pool will be able to issue recycled objects that have previously been returned by the application.

As an example, let's create a listener class to work with the pool. Our listener class implements the ServletContextListener interface; thus, it'll be initialized when the container starts and creates a ServletContext for this web app. Remember, there's only one ServletContext per web app. Any class implementing the ServletContextListener interface is initialized when the container starts. This early initialization cuts unnecessary overhead later, since it's ideal to have a cached set of open database connection objects available when the container starts rather than waiting for a client request. Inside the listener class, we'll do the necessary JNDI lookup and then set the DataSource as a ServletContext attribute so that it's available to the entire web app. The following code snippet shows us how to extract DataSource through a JNDI call:

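import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.sql.DataSource;

public class DBCPoolingListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
        try {
            // Obtain our environment naming context
            Context envCtx = (Context) new InitialContext().lookup("java:comp/env");

            // Look up our data source
            DataSource ds = (DataSource) envCtx.lookup("jdbc/TestDB");

            // Make the pool available to the entire web app
            sce.getServletContext().setAttribute("DBCPool", ds);
        } catch (NamingException e) {
            e.printStackTrace();
        }
    }

    public void contextDestroyed(ServletContextEvent sce) {
    }
}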

The component naming context is indicated by the prefix java:comp/env/.

For the sake of simplicity, we'll create a simple servlet, hard-coding the business logic and presentation. We'll use the JDBC 2.0 Standard Extension API, which specifies that a database service provider can implement a pooling technique that can allow multiple Connection objects to be shared among several requesting clients. Here's how we'll extract DataSource from the ServletContext attribute and then establish a Connection to pooled DB connection objects.

public void init() throws ServletException {
    try {
        // Fetch the DataSource for pooled connections.
        datasource = (DataSource) getServletContext().getAttribute("DBCPool");

        // Register the driver for non-pooled connections.
        Class.forName("com.mysql.jdbc.Driver").newInstance();
    } catch (Exception e) {
        // Keep the original exception as the cause.
        throw new ServletException(e.getMessage(), e);
    }
}

The servlet is written to use either pooled or non-pooled database connections, depending on the query string passed in its URL. The servlet fetches a pooled connection object using Tomcat DBCP, and non-pooled connections directly from MySQL connector.

Here's an example of obtaining a Connection object. If the pooledConnection flag is set, it simply calls getConnection() on the DataSource. If not, it manually creates a new Connection object:

private synchronized Connection getConnection(boolean pooledConnection)
        throws SQLException {
    if (pooledConnection) {
        pooledCount++;

        // Allocate and use a connection from the pool
        return datasource.getConnection();
    } else {
        nonPooledCount++;

        // Create a new physical connection
        Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/dbcptest", "kunal", "java_facier");
        return con;
    }
}

Having acquired a Connection, the servlet executes a simple join between the course and enrollment tables, and then formats and outputs the results as HTML. The example uses PreparedStatement to pre-compile SQL and run it repeatedly. This eliminates the tedious task of parsing and compiling the SQL query on every client request. Pre-compilation improves performance and offers enhanced security by preventing SQL injection attacks. For thread safety, we'll keep Connection, PreparedStatement, and ResultSet as local variables inside of the doGet() method.

Connections issued from the JNDI DataSource factory will be returned to the pool when closed. Clients use a connection pool by borrowing a connection object, using it, and then returning it to the pool by closing it. We have to make sure that after we are done with the Connection, we close it. If a Connection is not closed, it will never be returned to the pool and become available for reuse. Of course, that would tie up resources. The finally block guarantees that used ResultSet, PreparedStatement, and Connection objects are closed and prevents any connection pool leaks, as shown below:


finally {
  try { if (rs != null) rs.close(); } catch (SQLException e) {}
  try { if (pstmt != null) pstmt.close(); } catch (SQLException e) {}
  try { if (connection != null) connection.close(); } catch (SQLException e) {}
}
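
On Java 7 and later, the same cleanup can also be written with try-with-resources, which closes each resource automatically in the reverse order of its declaration. The following is only a minimal sketch and not the article's servlet code: TrackedResource is a hypothetical stand-in for Connection, PreparedStatement, and ResultSet (all of which implement AutoCloseable), used so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {
    // Hypothetical stand-in for a JDBC resource; it records its name when closed.
    static class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Opens three resources in the usual JDBC order and returns
    // the order in which try-with-resources closed them.
    static List<String> runQuery() {
        List<String> closed = new ArrayList<String>();
        try (TrackedResource connection = new TrackedResource("connection", closed);
             TrackedResource pstmt = new TrackedResource("pstmt", closed);
             TrackedResource rs = new TrackedResource("rs", closed)) {
            // Work with the resources here.
        }
        return closed;
    }
}
```

Calling CloseOrderDemo.runQuery() returns the names in the order rs, pstmt, connection, which is the same order the finally block above closes them in. With a pooled DataSource, closing the connection still means returning it to the pool.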

Performance Measurement

Before our application hits the ground running, we would like to stress test the app, evaluate its performance, and compare the results between the cached set of pooled connection objects and the non-pooling alternative. For this, we'll rely on JMeter, an open source tool for load testing with a drag-and-drop-style GUI. I have written a test plan for stress testing the web app. I have set up JMeter to simulate 50 concurrent users, each accessing a common servlet two times without any interval. The difference is apparent in the JMeter graph results shown in Figures 1 and 2, below.

Figure 1. Pooled DB connections deviation

Figure 2. Non-pooled DB connections deviation

After several test runs, it turned out that connection pooling is at least four times faster than explicitly creating a DB connection object from the ground up. Admittedly, to get more accurate results, JMeter should run on a different machine. The ramp-up period, which describes the amount of time for creating the total number of threads, should be carefully chosen. It's not considered to be a good idea to set it to zero if you have a large number of threads, because that would create all of the threads at once and send out requests immediately. At the same time, a higher ramp-up period is also not appropriate, as it might underestimate the results.

Conclusion

Connection pooling is a technique used for sharing a cached set of open database connections among several requesting clients. It doesn't require you to modify your code significantly; rather, it provides enhanced performance benefits. Object pooling should be used with care. It does require additional overhead for such tasks as managing the state of the object pool, issuing objects to the application, and recycling used objects. Pooling is best suited for objects that have a short lifetime. If you are already working in a rich Java EE environment, then most likely you would be using an out-of-box connection pooling facility provided by your app server, and your applications' use of connection pooling is almost transparent.

Resources

Example source code used in this article
Jakarta-Commons home
Sun's JNDI tutorial
Tomcat JNDI DataSource how-to document
MySQL's official JDBC driver
JDBC 2.0 tutorial

Sunday, August 12, 2007

What's the difference between "PreparedStatement" and "Statement"?

PreparedStatements are useful when you have one query to execute several times with only the parameters changed. Normally, the database has to check the syntax of each and every query it receives. With a PreparedStatement, the SQL statement is precompiled and stored in the PreparedStatement object, so the database does not have to check its syntax again on each execution.

The PreparedStatement is a slightly more powerful version of a Statement, and should always be at least as quick and easy to handle as a Statement.

1. Parse the incoming SQL query
2. Compile the SQL query
3. Plan/optimize the data acquisition path
4. Execute the optimized query / acquire and return data

A Statement will always proceed through the four steps above for each SQL query sent to the database. A PreparedStatement pre-executes steps (1) - (3) in the execution process above. Thus, when creating a PreparedStatement some pre-optimization is performed immediately. The effect is to lessen the load on the database engine at execution time.

The other strength of the PreparedStatement is that you can use it over and over again with new parameter values, rather than having to create a new Statement object for each new set of parameters. This approach is obviously more efficient, as only one object is created.

Use the set methods each time to specify new parameter values.
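
The reuse pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: the student table, its name and age columns, and the insertStudents helper are hypothetical and do not come from the original text, and the Connection is assumed to be supplied by the caller. Only standard java.sql calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class PreparedStatementReuse {
    // Hypothetical table and column names, for illustration only.
    static final String INSERT_SQL = "INSERT INTO student (name, age) VALUES (?, ?)";

    // Inserts one row per name, reusing a single PreparedStatement
    // and calling the set methods before each execution.
    static int insertStudents(Connection con, List<String> names, int age) {
        PreparedStatement pstmt = null;
        try {
            pstmt = con.prepareStatement(INSERT_SQL); // prepared once
            int rows = 0;
            for (String name : names) {
                pstmt.setString(1, name);             // new parameter values
                pstmt.setInt(2, age);
                rows += pstmt.executeUpdate();        // executed many times
            }
            return rows;
        } catch (SQLException e) {
            throw new IllegalStateException("insert failed", e);
        } finally {
            try { if (pstmt != null) pstmt.close(); } catch (SQLException e) { /* ignored */ }
        }
    }
}
```

Only one PreparedStatement object is created however many names are passed in, which is the efficiency point made above.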

Where are the results of the pre-executed steps stored: in the application server or in the database server?

What's the difference between an object and an object reference?

Given the statement Student s = new Student("Joe",20); is s an object of
class Student, or should we say that s is an object reference of
class Student?

s is not an object, it's a variable which contains a reference to an object.
Objects don't have names, just types and locations in memory (and, of course,
fields and methods). Read your statement as: Create a new Student object in
memory, initializing it with the data sent as arguments to a constructor, and
when created, assign a reference to that object to the Student variable s. s
is a reference or object type variable which may reference a Student object or
an object of any subclass of Student.

Take another statement: int x = s.getValue();

Read this statement as "Go to the object referenced by variable s and execute
its getValue() method. Assign the return from that method to the int variable
x."
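
The distinction can be made concrete with a short sketch. The Student class below is a hypothetical minimal version written for this example (the original text does not show its fields or methods), and getAge()/setAge() are assumed names.

```java
class ReferenceDemo {
    // Hypothetical minimal Student class, for illustration only.
    static class Student {
        private final String name;
        private int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
        void setAge(int age) { this.age = age; }
    }

    // Returns the age seen through s after the object is changed through t.
    static int ageSeenThroughFirstReference() {
        Student s = new Student("Joe", 20); // one object, referenced by s
        Student t = s;                      // copies the reference, not the object
        t.setAge(21);                       // modifies the single shared object
        return s.getAge();
    }

    // Two objects with equal field values are still two distinct objects.
    static boolean separateObjectsAreSameReference() {
        Student a = new Student("Joe", 20);
        Student b = new Student("Joe", 20);
        return a == b;                      // compares references, not contents
    }
}
```

Here ageSeenThroughFirstReference() returns 21 because s and t refer to the same object, while separateObjectsAreSameReference() returns false because each new expression creates its own object.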

Difference between String StringBuffer and StringBuilder

String is immutable whereas StringBuffer and StringBuilder can change their values.

The only difference between StringBuffer and StringBuilder is that StringBuilder is unsynchronized whereas StringBuffer is synchronized. So when the application needs to be run only in a single thread then it is better to use StringBuilder. StringBuilder is more efficient than StringBuffer.

Criteria to choose among String, StringBuffer and StringBuilder

If your text is not going to change, use the String class, because a String object is immutable.
If your text can change and will only be accessed from a single thread, use a StringBuilder, because StringBuilder is unsynchronized.
If your text can change and will be accessed from multiple threads, use a StringBuffer, because StringBuffer is synchronized.
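
The three criteria above can be illustrated with a small sketch. The class and method names are invented for this example; the String, StringBuilder, and StringBuffer calls are the standard ones.

```java
class MutabilityDemo {
    // String is immutable: concat() returns a new String and leaves s unchanged.
    static String stringAfterConcat() {
        String s = "Stanford ";
        s.concat("Lost!!");            // result is discarded on purpose
        return s;
    }

    // StringBuilder is mutable: append() changes the builder in place.
    static String builderAfterAppend() {
        StringBuilder sb = new StringBuilder("Stanford ");
        sb.append("Lost!!");
        return sb.toString();
    }

    // StringBuffer is synchronized, so appends from several threads are not lost.
    static int sharedBufferLength(int threadCount, final int appendsPerThread) {
        final StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < appendsPerThread; j++) {
                        shared.append('x');
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.length();
    }
}
```

With a StringBuffer, sharedBufferLength(4, 1000) always yields 4000 characters. Swapping in a StringBuilder for the shared object would remove that guarantee, which is why StringBuilder is recommended only for single-threaded use.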

StringBuffer versus String

What is the performance impact of the StringBuffer and String classes?

Java provides the StringBuffer and String classes, and the String class is used to manipulate character strings that cannot be changed. Simply stated, objects of type String are read only and immutable. The StringBuffer class is used to represent characters that can be modified.

The significant performance difference between these two classes is that StringBuffer is faster than String when performing simple concatenations. In String manipulation code, character strings are routinely concatenated. Using the String class, concatenations are typically performed as follows:

     String str = new String ("Stanford  ");
    str += "Lost!!";

If you were to use StringBuffer to perform the same concatenation, you would need code that looks like this:

     StringBuffer str = new StringBuffer ("Stanford ");
    str.append("Lost!!");

Developers usually assume that the first example above is more efficient because they think that the second example, which uses the append method for concatenation, is more costly than the first example, which uses the + operator to concatenate two String objects.

The + operator appears innocent, but the code generated produces some surprises. Using a StringBuffer for concatenation can in fact produce code that is significantly faster than using a String. To discover why this is the case, we must examine the generated bytecode from our two examples. The bytecode for the example using String looks like this:

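0 new #7 <Class java.lang.String>
3 dup
4 ldc #2 <String "Stanford ">
6 invokespecial #12 <Method java.lang.String(java.lang.String)>
9 astore_1
10 new #8 <Class java.lang.StringBuffer>
13 dup
14 aload_1
15 invokestatic #23 <Method java.lang.String valueOf(java.lang.Object)>
18 invokespecial #13 <Method java.lang.StringBuffer(java.lang.String)>
21 ldc #1 <String "Lost!!">
23 invokevirtual #15 <Method java.lang.StringBuffer append(java.lang.String)>
26 invokevirtual #22 <Method java.lang.String toString()>
29 astore_1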

The bytecode at locations 0 through 9 is executed for the first line of code, namely:

     String str = new String("Stanford ");

Then, the bytecode at location 10 through 29 is executed for the concatenation:

     str += "Lost!!";

Things get interesting here. The bytecode generated for the concatenation creates a StringBuffer object, then invokes its append method: the temporary StringBuffer object is created at location 10, and its append method is called at location 23. Because the String class is immutable, a StringBuffer must be used for concatenation.

After the concatenation is performed on the StringBuffer object, it must be converted back into a String. This is done with the call to the toString method at location 26. This method creates a new String object from the temporary StringBuffer object. The creation of this temporary StringBuffer object and its subsequent conversion back into a String object are very expensive.

In summary, the two lines of code above result in the creation of three objects:

A String object at location 0
A StringBuffer object at location 10
A String object at location 26

Now, let's look at the bytecode generated for the example using StringBuffer:

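0 new #8 <Class java.lang.StringBuffer>
3 dup
4 ldc #2 <String "Stanford ">
6 invokespecial #13 <Method java.lang.StringBuffer(java.lang.String)>
9 astore_1
10 aload_1
11 ldc #1 <String "Lost!!">
13 invokevirtual #15 <Method java.lang.StringBuffer append(java.lang.String)>
16 pop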

The bytecode at locations 0 to 9 is executed for the first line of code:

     StringBuffer str = new StringBuffer("Stanford ");

The bytecode at location 10 to 16 is then executed for the concatenation:

     str.append("Lost!!");

Notice that, as is the case in the first example, this code invokes the append method of a StringBuffer object. Unlike the first example, however, there is no need to create a temporary StringBuffer and then convert it into a String object. This code creates only one object, the StringBuffer, at location 0.

In conclusion, StringBuffer concatenation is significantly faster than String concatenation. Obviously, StringBuffers should be used in this type of operation when possible. If the functionality of the String class is desired, consider using a StringBuffer for concatenation and then performing one conversion to String.
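
The recommendation above, to append to a buffer and convert to String once, can be sketched as follows. The ConcatDemo class and its method names are invented for this example, and no timing figures are claimed here; the sketch only shows that both approaches produce the same text while the second creates a single buffer.

```java
import java.util.List;

class ConcatDemo {
    // Repeated += on a String: each pass builds a new String from the old one.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One StringBuffer for all appends, then a single conversion to String.
    static String joinWithBuffer(List<String> parts) {
        StringBuffer buffer = new StringBuffer();
        for (String part : parts) {
            buffer.append(part);
        }
        return buffer.toString();
    }
}
```

In single-threaded code, a StringBuilder could be substituted for the StringBuffer without changing the rest of the method.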

Tuesday, August 7, 2007

Different types of SQL Joins

The main point of relational
databases is to reduce the amount of duplicated and redundant data in
tables by “normalising” the data. For
instance, if we were creating a table of employees and their departments
in a business, rather than list the same department over and over
again, we could split out the departments into their own table and reference
that table in the employees table as below:

Employees Table

*Employee_ID*	*First_Name*	*Last_Name*	*Department_ID*
1	Joe	Bloggs	1
2	Fred	Jones	2
3	Bill	Smith	3
4	Ann	Jones	2
5	Sally	Dee	1
6	Jim	Lee	2

Departments Table

*Department_ID*	*Department_Name*
1	Sales
2	Tech Support
3	Finance

The problem then comes
when you want to list all employees and their department. In order
to do this we will need to JOIN the 2 tables in the SQL statement to
get the relevant data back. This is the statement we will use:

SELECT
First_Name, Last_Name, Department_Name
FROM
Employees INNER JOIN Departments ON
Employees.Department_ID = Departments.Department_ID

Which will give us the
following

First_Name	Last_Name	Department_Name
Joe	Bloggs	Sales
Sally	Dee	Sales
Fred	Jones	Tech Support
Ann	Jones	Tech Support
Jim	Lee	Tech Support
Bill	Smith	Finance

Things to note from the
statement:

We have specified
an INNER JOIN. An inner join returns only those rows from the
listed tables that match the criteria specified in the ON portion of
the JOIN clause. In this case we have said where Department_ID
in the Employees Table matches the Department_ID in the Departments
table.

Although here
we have specified that it is an INNER JOIN, this is the default type
of join so you can just specify JOIN and get the same results.

The JOIN statement
is in the FROM clause. This is the join syntax defined by the ANSI SQL-92
standard, which Transact-SQL supports. It also keeps the join conditions
separate from any filtering done in the WHERE clause, which makes the
statement easier to read.

It is not required
but it is considered good practice to write all keywords in upper case.
Although Query Analyzer will colour-code keywords, this practice will
help to more easily identify and separate them.

There were also no duplicate
column names in our query so we just listed the columns we wanted and
listed the full table names in the ON portion of the JOIN clause.
If there were duplicate column names in the query we would have had
to have specified to SQL Server which table we wanted the column data
to be displayed from just like in the ON portion otherwise SQL will
complain that it doesn’t know which table you want the data from.

If we had to do this,
it would make our SQL statement look like:

SELECT
Employees.First_Name,
Employees.Last_Name,
Departments.Department_Name
FROM
Employees INNER JOIN Departments ON
Employees.Department_ID = Departments.Department_ID

Which looks unwieldy enough
when there are just a few columns – can you imagine a statement containing
30 columns and 4 JOINS written out like this?

Fortunately, SQL allows
us to use Table Aliases to replace the table names in most of the statement,
so we can rewrite our statement as below:

SELECT
e.First_Name,
e.Last_Name,
d.Department_Name
FROM
Employees e JOIN Departments d ON
e.Department_ID = d.Department_ID

As you can see, we have
now replaced the table names with the letters ‘e’ and ‘d’ respectively.
We told SQL to recognise these as the table names in the FROM clause
when we said:

Employees e JOIN Departments
d

It did not have to be
a single letter; it is just easier (and quicker to type!). Indeed
you will find that, once you are writing complex queries involving a
lot of tables, you will have to start using more characters. When
using aliases, try to use something that will identify the table –
in our case it was easy as we only had 2 tables so using the first letter
of each was enough to identify them - If our 2 tables were called Employees
and Entitlements we would have had to use more than one letter!
One important note to remember is that, once you have used a table alias
you must then use it everywhere else in the statement in place of the
table name.

Now let's get a bit more
complex. Instead of listing employees and what department they
are in, we want to list all the departments and which employees
work in them. Given the tables above and the knowledge we have
gained so far we could write a statement such as:

SELECT
d.Department_Name,
e.First_Name,
e.Last_Name
FROM
Departments d JOIN Employees e ON
d.Department_ID = e.Department_ID
ORDER BY
Department_Name

* Note I have thrown in
an ORDER BY clause to order the listing by department – nothing to
do with joins it just makes the output neater

Which would give us:

Department_Name	First_Name	Last_Name
Finance	Bill	Smith
Sales	Sally	Dee
Sales	Joe	Bloggs
Tech Support	Jim	Lee
Tech Support	Fred	Jones
Tech Support	Ann	Jones

Which is great, and just
what we wanted. But now, suppose our business expands and adds
some more departments so that the departments table looks like this:

*Department_ID*	*Department_Name*
1	Sales
2	Tech Support
3	Finance
4	Marketing
5	Legal

If we run our statement
again, we will get exactly the same output! Where are our missing
departments? Because an inner join will only give us those rows
that match from both tables we don’t see the Legal department
or the Marketing department.
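
To see why the unmatched departments disappear, an inner join can be pictured as a nested loop that keeps only the pairs of rows whose Department_ID values match. The sketch below is a hypothetical in-memory illustration in Java using the sample data above; it is not how a database engine actually executes the query, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class InnerJoinDemo {
    // Employee_ID, First_Name, Last_Name, Department_ID
    static final String[][] EMPLOYEES = {
        {"1", "Joe", "Bloggs", "1"},
        {"2", "Fred", "Jones", "2"},
        {"3", "Bill", "Smith", "3"},
        {"4", "Ann", "Jones", "2"},
        {"5", "Sally", "Dee", "1"},
        {"6", "Jim", "Lee", "2"}
    };

    // Department_ID, Department_Name
    static final String[][] DEPARTMENTS = {
        {"1", "Sales"},
        {"2", "Tech Support"},
        {"3", "Finance"},
        {"4", "Marketing"},
        {"5", "Legal"}
    };

    // Keeps a combined row only when the ON condition
    // d.Department_ID = e.Department_ID holds.
    static List<String> innerJoin() {
        List<String> rows = new ArrayList<String>();
        for (String[] d : DEPARTMENTS) {
            for (String[] e : EMPLOYEES) {
                if (d[0].equals(e[3])) {
                    rows.add(d[1] + ": " + e[1] + " " + e[2]);
                }
            }
        }
        return rows;
    }
}
```

Marketing and Legal never satisfy the condition for any employee, so they contribute no rows and the result still has only six entries.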

In order to see all the
departments regardless of whether an employee is assigned to that department
we will have to use an OUTER JOIN. An outer join returns all rows
from the outer table and those rows that match from the other (inner)
table.

There are 2 types of outer
join, a LEFT OUTER JOIN and a RIGHT OUTER JOIN. They do the same
thing except one takes the left most listed table as the outer table
and the other takes the right most listed table as the outer table.
It just allows you to be more flexible in how you can write your joins
and can become invaluable when you have to join 3 or more tables.

All we have to do to our
statement is change JOIN to read LEFT OUTER JOIN as below:

SELECT
d.Department_Name,
e.First_Name,
e.Last_Name
FROM
Departments d LEFT OUTER JOIN Employees e
ON d.Department_ID = e.Department_ID
ORDER BY
Department_Name

And we will get the output
we wanted as below:

Department_Name	First_Name	Last_Name
Finance	Bill	Smith
Legal	NULL	NULL
Marketing	NULL	NULL
Sales	Sally	Dee
Sales	Joe	Bloggs
Tech Support	Jim	Lee
Tech Support	Fred	Jones
Tech Support	Ann	Jones

Some things to note about
the statement and the output:

The keyword
OUTER is optional so you can just use LEFT JOIN (or RIGHT JOIN).

You can see
that we now have all the departments listed and where there was no matching
Department_ID in the Employees table we have the entry NULL against
First_Name and Last_Name from that table.
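
The behaviour of the left outer join can be sketched in the same nested-loop style: every row of the outer (left) table is kept, and a row of NULLs is produced when nothing on the right matches. As before, this is a hypothetical in-memory illustration in Java with invented class and method names, not a description of the database engine.

```java
import java.util.ArrayList;
import java.util.List;

class LeftJoinDemo {
    // Employee_ID, First_Name, Last_Name, Department_ID
    static final String[][] EMPLOYEES = {
        {"1", "Joe", "Bloggs", "1"},
        {"2", "Fred", "Jones", "2"},
        {"3", "Bill", "Smith", "3"},
        {"4", "Ann", "Jones", "2"},
        {"5", "Sally", "Dee", "1"},
        {"6", "Jim", "Lee", "2"}
    };

    // Department_ID, Department_Name
    static final String[][] DEPARTMENTS = {
        {"1", "Sales"},
        {"2", "Tech Support"},
        {"3", "Finance"},
        {"4", "Marketing"},
        {"5", "Legal"}
    };

    // Departments is the outer table: each department appears at least once.
    static List<String> leftOuterJoin() {
        List<String> rows = new ArrayList<String>();
        for (String[] d : DEPARTMENTS) {
            boolean matched = false;
            for (String[] e : EMPLOYEES) {
                if (d[0].equals(e[3])) {
                    rows.add(d[1] + ": " + e[1] + " " + e[2]);
                    matched = true;
                }
            }
            if (!matched) {
                // No employee matched: keep the department, fill the rest with NULL.
                rows.add(d[1] + ": NULL NULL");
            }
        }
        return rows;
    }
}
```

The result has eight rows: the six matched employees plus one NULL row each for Marketing and Legal.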

Now let's expand our employees
table. It is clear from looking at it that we have people working
in the same departments so let's promote some of them. We will
record this fact by adding a Manager_ID column. This will be populated
with an Employee_ID whenever a person has a manager, to indicate who
that manager is. This will leave our table looking like this:

*Employee_ID*	*First_Name*	*Last_Name*	*Department_ID*	*Manager_ID*
1	Joe	Bloggs	1	NULL
2	Fred	Jones	2	NULL
3	Bill	Smith	3	NULL
4	Ann	Jones	2	2
5	Sally	Dee	1	1
6	Jim	Lee	2	2

This shows that Joe, Fred
and Bill have no managers (Indicated by the NULL in the column for this
value). Ann and Jim are managed by Fred and Sally is managed by
Joe.

We now need to list all
employees and the name of their manager. We know from above that,
as not everyone has a manager, we will have to use an OUTER JOIN but
we do not have another table to join to. We will have to use a
SELF JOIN – that is, join the table to itself - and this is where
table aliases really come into play. The statement we will use
is this:

SELECT
e.First_Name,
e.Last_Name,
e2.First_Name AS Manager_Name
FROM
Employees e LEFT JOIN Employees e2 ON
e.manager_id = e2.employee_id

*First_Name*	*Last_Name*	*Manager_Name*
Joe	Bloggs	NULL
Fred	Jones	NULL
Bill	Smith	NULL
Ann	Jones	Fred
Sally	Dee	Joe
Jim	Lee	Fred

Things to note about the
statement:

Basically we
have used table aliases to create a “copy” of the employees table
and called it e2. Now we can use our copy of the employees table
just as if it was another table.

You can also
see in the statement that I have aliased the column name as well so
that the column title makes more sense – you can do this in any SQL
statement if you feel that the actual column name doesn’t make much
sense. For instance, if we had named our columns Emp_FName and
Emp_LName it doesn’t look too good on a report so we could alias them
to First Name and Last Name. You can have spaces in these column
names as long as you enclose the name in quotes like below:

SELECT
e.First_Name AS 'First Name',
e.Last_Name AS 'Last Name',
e2.First_Name AS 'Manager Name'
FROM
Employees e LEFT JOIN Employees e2 ON
e.manager_id = e2.employee_id

First Name	Last Name	Manager Name
Joe	Bloggs	NULL
Fred	Jones	NULL
Bill	Smith	NULL
Ann	Jones	Fred
Sally	Dee	Joe
Jim	Lee	Fred
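
The self join above can be pictured as looking each employee's Manager_ID up in a second "copy" of the same table. The sketch below is a hypothetical in-memory illustration in Java using the sample data; the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SelfJoinDemo {
    // Employee_ID, First_Name, Last_Name, Manager_ID (null means no manager)
    static final String[][] EMPLOYEES = {
        {"1", "Joe", "Bloggs", null},
        {"2", "Fred", "Jones", null},
        {"3", "Bill", "Smith", null},
        {"4", "Ann", "Jones", "2"},
        {"5", "Sally", "Dee", "1"},
        {"6", "Jim", "Lee", "2"}
    };

    // Left join of Employees (e) to itself (e2) on e.Manager_ID = e2.Employee_ID.
    static List<String> employeesWithManagers() {
        // Index the second "copy" of the table (e2) by Employee_ID.
        Map<String, String[]> byId = new HashMap<String, String[]>();
        for (String[] e2 : EMPLOYEES) {
            byId.put(e2[0], e2);
        }
        List<String> rows = new ArrayList<String>();
        for (String[] e : EMPLOYEES) {
            String[] manager = (e[3] == null) ? null : byId.get(e[3]);
            String managerName = (manager == null) ? "NULL" : manager[1];
            rows.add(e[1] + " " + e[2] + " -> " + managerName);
        }
        return rows;
    }
}
```

Because it behaves like a LEFT JOIN, employees without a manager are still listed, with NULL in place of the manager's name.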

* As a test, how would
you rewrite our managers query above so that it listed managers and
their employees? My solution at the end of the article.

Up until now we have only
been dealing with 2 tables. What happens when you need to join
3 (or more) tables?

Our employees are hardworking
so let's allow them to work some overtime. To do this we will add
2 tables – an Overtime table and an Overtime_Allowed table.

Why 2 tables? Well,
we will be restricting who can take overtime based on department and
to comply with normalisation rules, rather than put all that information
into one table with lots of duplication we will create 1 table for the
overtime definition and then what is called a “link” table to contain
information about both departments and overtime. The new tables
are below:

Overtime Table

*OTime_ID*	*OTime_Name*	*Pay_Scale*
1	Weekday	0
2	Weekend	1.5
3	Bank Holiday	2

Overtime_Allowed Table

*Dept_ID*	*OTime_ID*	*OTime_Allowed*
1	1	Y
1	2	N
1	3	N
2	1	Y
2	2	Y
2	3	Y
3	1	Y
3	2	Y
3	3	N
4	1	Y
4	2	Y
4	3	N
5	1	Y
5	2	Y
5	3	Y

Hopefully you can now
see the point of the 2 tables.

** NOTE – in reality
the OTime_Allowed column would be data type BIT (a 1 or a 0 to indicate
TRUE or FALSE or YES and NO in our case) I have made it a CHAR
field to keep the statement simple so we don’t get bogged down in understanding
the syntax of converting a BIT field into a ‘Yes’ or ‘No’ when the results
are displayed.

So now suppose we want
to list all departments and their overtime allowances. In order
to do this we will have to join the Departments, Overtime and Overtime_Allowed
tables.

When joining 3 or more
tables it is important to understand how SQL processes the joins so
that you can order them correctly to get the results you need.
I like to think of multi-table joins as like the game “six degrees
of Kevin Bacon” - you take each table in turn and relate it to the
next until finally you have linked Tom Cruise to Kevin Bacon in as few
hops as possible. But maybe that’s just me!

SQL processes joins in
much the same way - It takes the first two tables listed and creates
a result set for them. It then takes the next table and compares
it to the temporary table it has created containing the previous result-set
and, based on the conditions in the ON clause, creates a result set
and so on until it creates the final table it displays.

An example is needed:

SELECT
d.Department_Name AS Department,
o.OTime_Name AS 'Overtime Allowed'
FROM
Departments d
JOIN Overtime_Allowed oa ON d.Department_ID = oa.Dept_ID
JOIN Overtime o ON oa.OTime_ID = o.OTime_ID
WHERE
oa.OTime_Allowed = 'Y'
ORDER BY
d.Department_Name

*Department*	*Overtime Allowed*
Finance	Weekend
Finance	Weekday
Legal	Weekend
Legal	Weekday
Legal	Bank Holiday
Marketing	Weekend
Marketing	Weekday
Sales	Weekday
Tech Support	Bank Holiday
Tech Support	Weekend
Tech Support	Weekday

Here you can see that
we used the link table to get our results by joining Departments to
Overtime via Overtime_Allowed.
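
The chain of joins through the link table can be sketched as three nested steps: start from a department, follow its Overtime_Allowed rows, and then follow each allowed row to the Overtime table. This is a hypothetical in-memory illustration in Java using the sample data above, with invented class and method names; it applies the WHERE filter inside the loop and does not sort the output.

```java
import java.util.ArrayList;
import java.util.List;

class LinkTableJoinDemo {
    // Department_ID, Department_Name
    static final String[][] DEPARTMENTS = {
        {"1", "Sales"}, {"2", "Tech Support"}, {"3", "Finance"},
        {"4", "Marketing"}, {"5", "Legal"}
    };

    // OTime_ID, OTime_Name
    static final String[][] OVERTIME = {
        {"1", "Weekday"}, {"2", "Weekend"}, {"3", "Bank Holiday"}
    };

    // Dept_ID, OTime_ID, OTime_Allowed
    static final String[][] OVERTIME_ALLOWED = {
        {"1", "1", "Y"}, {"1", "2", "N"}, {"1", "3", "N"},
        {"2", "1", "Y"}, {"2", "2", "Y"}, {"2", "3", "Y"},
        {"3", "1", "Y"}, {"3", "2", "Y"}, {"3", "3", "N"},
        {"4", "1", "Y"}, {"4", "2", "Y"}, {"4", "3", "N"},
        {"5", "1", "Y"}, {"5", "2", "Y"}, {"5", "3", "Y"}
    };

    static List<String> allowedOvertime() {
        List<String> rows = new ArrayList<String>();
        for (String[] d : DEPARTMENTS) {
            for (String[] oa : OVERTIME_ALLOWED) {
                // First join: Departments to Overtime_Allowed, plus the WHERE filter.
                if (!d[0].equals(oa[0]) || !"Y".equals(oa[2])) {
                    continue;
                }
                for (String[] o : OVERTIME) {
                    // Second join: Overtime_Allowed to Overtime.
                    if (oa[1].equals(o[0])) {
                        rows.add(d[1] + ": " + o[1]);
                    }
                }
            }
        }
        return rows;
    }
}
```

The method yields the same eleven department and overtime pairs as the query, for example only Weekday for Sales.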

We have now covered the
2 main types of join – INNER JOIN and OUTER JOIN and
their various uses such as SELF JOIN. We have also covered
joining more than 2 tables. And still there is more to talk about!

There are 2 other types
of joins that we haven’t yet covered – CROSS JOIN and FULL OUTER JOIN.
These are less commonly used joins (i.e. I’ve never had cause to
use them!) but can come in handy.

Cross join:

A cross join on 2 tables
will return every possible combination of rows from those
2 tables. This is called the Cartesian product of the 2 tables.
This means that if we were to join our Employees and Departments tables
in a cross join we would get 30 rows returned as Employees has 6 rows
and Departments has 5 (6*5=30)

When you specify a cross
join you can either use the CROSS JOIN keywords or separate the table
names with a comma:

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees CROSS JOIN Departments

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees,
Departments

Both would give the results
shown below:

First_Name	Last_Name	Department_Name
Joe	Bloggs	Sales
Fred	Jones	Sales
Bill	Smith	Sales
Ann	Jones	Sales
Sally	Dee	Sales
Jim	Lee	Sales
Joe	Bloggs	Tech Support
Fred	Jones	Tech Support
Bill	Smith	Tech Support
Ann	Jones	Tech Support
Sally	Dee	Tech Support
Jim	Lee	Tech Support
Joe	Bloggs	Finance
Fred	Jones	Finance
Bill	Smith	Finance
Ann	Jones	Finance
Sally	Dee	Finance
Jim	Lee	Finance
Joe	Bloggs	Marketing
Fred	Jones	Marketing
Bill	Smith	Marketing
Ann	Jones	Marketing
Sally	Dee	Marketing
Jim	Lee	Marketing
Joe	Bloggs	Legal
Fred	Jones	Legal
Bill	Smith	Legal
Ann	Jones	Legal
Sally	Dee	Legal
Jim	Lee	Legal
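
The Cartesian product is simply a nested loop with no join condition at all. The sketch below is a hypothetical in-memory illustration in Java using the sample data; the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CrossJoinDemo {
    // First_Name, Last_Name
    static final String[][] EMPLOYEES = {
        {"Joe", "Bloggs"}, {"Fred", "Jones"}, {"Bill", "Smith"},
        {"Ann", "Jones"}, {"Sally", "Dee"}, {"Jim", "Lee"}
    };

    static final String[] DEPARTMENTS = {
        "Sales", "Tech Support", "Finance", "Marketing", "Legal"
    };

    // Pairs every employee with every department: no ON condition is tested.
    static List<String> crossJoin() {
        List<String> rows = new ArrayList<String>();
        for (String department : DEPARTMENTS) {
            for (String[] e : EMPLOYEES) {
                rows.add(e[0] + " " + e[1] + ", " + department);
            }
        }
        return rows;
    }
}
```

The size of the result is the product of the two table sizes, here 6 * 5 = 30, which is why cross joins on large tables grow very quickly.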

Full outer join:

A full outer join on 2
tables would return all rows from both tables with Null entries in the
left columns where things don’t match with the right and vice versa.

For example, suppose a
temp worker joins the company – temp workers have no department so
the Department_ID field in Employees would be NULL. We already
have 2 departments that have no workers that we added earlier so a FULL
OUTER JOIN between Employees and Departments would then give us:

SELECT
First_Name,
Last_Name,
Department_Name
FROM
Employees FULL OUTER JOIN Departments ON
Employees.Department_ID = Departments.Department_ID

First_Name	Last_Name	Department_Name
John	Deed	NULL
Joe	Bloggs	Sales
Sally	Dee	Sales
Fred	Jones	Tech Support
Ann	Jones	Tech Support
Jim	Lee	Tech Support
Bill	Smith	Finance
NULL	NULL	Marketing
NULL	NULL	Legal
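
A full outer join can be pictured as a left outer join followed by the leftover rows from the right table. The sketch below is a hypothetical in-memory illustration in Java; the Employee_ID of 7 given to the temp worker John Deed is an assumption made for this example, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FullOuterJoinDemo {
    // Employee_ID, First_Name, Last_Name, Department_ID (null for the temp worker)
    static final String[][] EMPLOYEES = {
        {"1", "Joe", "Bloggs", "1"},
        {"2", "Fred", "Jones", "2"},
        {"3", "Bill", "Smith", "3"},
        {"4", "Ann", "Jones", "2"},
        {"5", "Sally", "Dee", "1"},
        {"6", "Jim", "Lee", "2"},
        {"7", "John", "Deed", null}   // assumed ID, for illustration only
    };

    // Department_ID, Department_Name
    static final String[][] DEPARTMENTS = {
        {"1", "Sales"}, {"2", "Tech Support"}, {"3", "Finance"},
        {"4", "Marketing"}, {"5", "Legal"}
    };

    static List<String> fullOuterJoin() {
        List<String> rows = new ArrayList<String>();
        Set<String> matchedDepartments = new HashSet<String>();

        // Pass 1: every employee, with its department or NULL.
        for (String[] e : EMPLOYEES) {
            boolean matched = false;
            for (String[] d : DEPARTMENTS) {
                if (d[0].equals(e[3])) {
                    rows.add(e[1] + " " + e[2] + ": " + d[1]);
                    matchedDepartments.add(d[0]);
                    matched = true;
                }
            }
            if (!matched) {
                rows.add(e[1] + " " + e[2] + ": NULL");
            }
        }

        // Pass 2: departments that no employee matched.
        for (String[] d : DEPARTMENTS) {
            if (!matchedDepartments.contains(d[0])) {
                rows.add("NULL NULL: " + d[1]);
            }
        }
        return rows;
    }
}
```

The nine rows correspond to the table above: six matched employees, one employee without a department, and two departments without employees.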

So there you go.
INNER JOIN, OUTER JOIN, SELF JOIN, CROSS JOIN and FULL OUTER JOIN.
Hopefully you now understand all the various ways SQL allows you to
join tables and you can make use of this to normalise your database
and still display the data you need.

Before we go though, some
final points on using joins:

The columns
which are used in join statement have to be either the same data type
or compatible data types. Compatible data types means that SQL
Server can perform an implicit conversion on the data type when it comes
to compare the columns.

The performance
of queries using joins can, in general, be increased if all columns
in a join are indexed.

The only thing that remains
is to see if you rewrote the managers and employees query in the same
way I did – my solution below:

SELECT
e.First_Name AS Manager_First,
e.Last_Name AS Manager_Last,
e2.First_Name AS Employee
FROM
Employees e JOIN Employees e2 ON
e.Employee_ID = e2.Manager_ID
ORDER BY
e.First_Name
