Sunday, August 19, 2007

Difference between equals and hashCode method


The Java super class java.lang.Object has two very important methods defined in it. They are -
  • public boolean equals(Object obj)
  • public int hashCode()
These methods become very important when user-defined classes interact with other Java classes, for example when objects of such classes are added to collections.

public boolean equals(Object obj)

This method checks whether the object passed to it as an argument is equal to the object on which the method is invoked. The default implementation in the Object class simply checks whether two object references x and y refer to the same object, i.e. it checks if x == y. This comparison is also known as "shallow comparison". Classes that provide their own implementation of the equals method are supposed to perform a "deep comparison" by actually comparing the relevant data members. Since the Object class has no data members that define its state, it simply performs the shallow comparison.


This is what the JDK 1.4 API documentation says about the equals method of Object class-

Indicates whether some other object is "equal to" this one.
    The equals method implements an equivalence relation:
  • It is reflexive: for any reference value x, x.equals(x) should return true.
  • It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
  • For any non-null reference value x, x.equals(null) should return false.
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any reference values x and y, this method returns true if and only if x and y refer to the same object (x==y has the value true).

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Here are some useful guidelines for implementing the equals method correctly.
  1. Use the equality == operator to check if the argument is a reference to this object; if yes, return true. This saves time when the actual comparison is costly.
  2. Use the following condition to check that the argument is not null and it is of the correct type, if not then return false.
    if((obj == null) || (obj.getClass() != this.getClass())) return false;
    Note that, correct type does not mean the same type or class as shown in the example above. It could be any class or interface that one or more classes agree to implement for providing the comparison.
  3. Cast the method argument to the correct type. Again, the correct type may not be the same class. Also, since this step is done after the above type-check condition, it will not result in a ClassCastException.
  4. Compare the significant variables of the argument object and this object. If *all* of them are equal, return true; otherwise return false. Primitive variables can be compared directly with the equality operator (==) after any necessary conversion (for example, float values via Float.floatToIntBits and double values via Double.doubleToLongBits), whereas object references are compared by invoking their equals method. Make sure that invoking equals on these references cannot result in a NullPointerException, for instance by checking for null first.
  5. It is neither necessary, nor advisable to include those class members in this comparison which can be calculated from other variables, hence the word "significant variables". This certainly improves the performance of the equals method. Only you can decide which class members are significant and which are not.
  6. Do not change the type of the argument of the equals method. It takes a java.lang.Object as an argument, do not use your own class instead. If you do that, you will not be overriding the equals method, but you will be overloading it instead; which would cause problems. It is a very common mistake, and since it does not result in a compile time error, it becomes quite difficult to figure out why the code is not working properly.
  7. Review your equals method to verify that it fulfills all the requirements stated by the general contract of the equals method.
  8. Lastly, do not forget to override the hashCode method whenever you override the equals method, that's unpardonable. ;)
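The guidelines above can be sketched as follows. This is a minimal illustration, not a canonical implementation: the class LabeledPoint and its fields are hypothetical, and the @Override annotation and generics assume Java 5 or later rather than the JDK 1.4 the quoted documentation refers to.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class, used only to illustrate the guidelines.
final class LabeledPoint {
    private final int x;
    private final double weight;
    private final String label; // may be null

    LabeledPoint(int x, double weight, String label) {
        this.x = x;
        this.weight = weight;
        this.label = label;
    }

    @Override
    public boolean equals(Object obj) {   // guideline 6: the parameter stays Object
        if (this == obj) return true;     // guideline 1: cheap identity check
        if (obj == null || obj.getClass() != getClass()) return false; // guideline 2
        LabeledPoint other = (LabeledPoint) obj; // guideline 3: safe after the check
        // guideline 4: compare every significant field, guarding against null
        return x == other.x
            && Double.doubleToLongBits(weight) == Double.doubleToLongBits(other.weight)
            && (label == null ? other.label == null : label.equals(other.label));
    }

    @Override
    public int hashCode() {               // guideline 8: uses the same fields as equals
        int result = 17;
        result = 31 * result + x;
        long bits = Double.doubleToLongBits(weight);
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + (label == null ? 0 : label.hashCode());
        return result;
    }
}

class LabeledPointDemo {
    public static void main(String[] args) {
        Set<LabeledPoint> set = new HashSet<LabeledPoint>();
        set.add(new LabeledPoint(1, 2.5, "a"));
        // A distinct but equal object is found because equals and hashCode agree.
        System.out.println(set.contains(new LabeledPoint(1, 2.5, "a"))); // true
        System.out.println(new LabeledPoint(1, 2.5, null)
                .equals(new LabeledPoint(1, 2.5, "a")));                 // false
    }
}
```

The getClass() comparison is the strict variant shown in guideline 2; a class hierarchy that agrees on a shared interface might use an instanceof check instead.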


public int hashCode()

This method returns the hash code value for the object on which this method is invoked. This method returns the hash code value as an integer and is supported for the benefit of hashing based collection classes such as Hashtable, HashMap, HashSet etc. This method must be overridden in every class that overrides the equals method.

This is what the JDK 1.4 API documentation says about the hashCode method of Object class-

Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
    The general contract of hashCode is:
  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

  1. Consistency during same execution - Firstly, it states that the hash code returned by the hashCode method must be consistently the same for multiple invocations during the same execution of the application as long as the object is not modified to affect the equals method.
  2. Hash Code & Equals relationship - The second requirement of the contract is the hashCode counterpart of the requirement specified by the equals method. It simply emphasizes the same relationship - equal objects must produce the same hash code. However, the third point elaborates that unequal objects need not produce distinct hash codes.
After reviewing the general contracts of these two methods, it is clear that the relationship between these two methods can be summed up in the following statement -

Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes.






Tips

  • Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes.
  • The equals method provides "deep comparison" by checking if two objects are logically equal as opposed to the "shallow comparison" provided by the equality operator ==.
  • However, the equals method in java.lang.Object class only provides "shallow comparison", same as provided by the equality operator ==.
  • The equals method takes an object reference as its argument, not a primitive. Up to JDK 1.4, passing a primitive results in a compile time error; from Java 5 onwards the primitive is autoboxed into its wrapper object instead.
  • Passing objects of different types to the equals method will never result in a compile time error or runtime error.
  • For standard Java wrapper classes and for java.lang.String, if the equals argument type (class) is different from the type of the object on which the equals method is invoked, it will return false.
  • The class java.lang.StringBuffer does not override the equals method, and hence it inherits the implementation from java.lang.Object class.
  • The equals method must not provide equality comparison with any built in Java class, as it would result in the violation of the symmetry requirement stated in the general contract of the equals method.
  • If null is passed as an argument to the equals method, it will return false.
  • Equal hash codes do not imply that the objects are equal.
  • return 1; is a legal implementation of the hashCode method, however it is a very bad implementation. It is legal because it ensures that equal objects will have equal hash codes, it also ensures that the hash code returned will be consistent for multiple invocations during the same execution. Thus, it does not violate the general contract of the hashCode method. It is a bad implementation because it returns same hash code for all the objects. This explanation applies to all implementations of the hashCode method which return same constant integer value for all the objects.
  • In standard JDK 1.4, the wrapper classes java.lang.Short, java.lang.Byte, java.lang.Character and java.lang.Integer simply return the value they represent as the hash code by typecasting it to an int.
  • Since JDK version 1.3, the class java.lang.String caches its hash code, i.e. it calculates the hash code only once and stores it in an instance variable and returns this value whenever the hashCode method is called. It is legal because java.lang.String represents an immutable string.
  • It is incorrect to involve a random number directly while computing the hash code of the class object, as it would not consistently return the same hash code for multiple invocations during the same execution.
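A few of these tips can be checked directly. The sketch below is only an illustration: the Money class is hypothetical and deliberately makes the overloading mistake described earlier, while the remaining lines use standard JDK classes.

```java
// Hypothetical class that overloads equals(Money) instead of overriding equals(Object).
class Money {
    final int cents;

    Money(int cents) { this.cents = cents; }

    // An overload, not an override: the parameter type is Money, not Object.
    public boolean equals(Money other) {
        return other != null && cents == other.cents;
    }
}

class TipsDemo {
    public static void main(String[] args) {
        Money a = new Money(100);
        Object b = new Money(100);
        System.out.println(a.equals((Money) b)); // true: the overload is selected
        System.out.println(a.equals(b));         // false: Object.equals compares references

        // StringBuffer inherits Object.equals, so equal contents still compare as unequal.
        System.out.println(new StringBuffer("x").equals(new StringBuffer("x"))); // false

        // A wrapper compared with a different wrapper type returns false, with no error.
        System.out.println(Integer.valueOf(1).equals(Long.valueOf(1L)));         // false

        // Integer and Character use the wrapped value as the hash code.
        System.out.println(Integer.valueOf(42).hashCode());    // 42
        System.out.println(Character.valueOf('A').hashCode()); // 65
    }
}
```

The second println is the case that bites in practice: collections hold elements as Object references, so they never call the overloaded method.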

Tuesday, August 14, 2007

Multithreading

Multithreading is one of the most important concepts of the Java language. You simply cannot do without multithreading in real-world programming. Multithreading basically enables your program to do more than one task at once and also to synchronize the various tasks. But before we launch into multithreading we will briefly summarize the points about threads.


Brief Recapitulation of threads

There are two ways to create threads.

  • Subclass Thread and override run()
  • Implement Runnable and implement run()

Either of these two approaches may be used. Since Java does not support multiple inheritance of classes, a class that already extends another class cannot also extend Thread; implementing the Runnable interface solves this.

You start a thread by calling its start() method, which in turn arranges for run() to be called on the new thread. You do not call run() directly, because that would simply execute it on the calling thread. The stop() method is deprecated and should be avoided. Thread priorities range from 1 to 10; the default is 5, i.e. normal priority.

A daemon thread is a thread that has no other role other than to serve other threads. When only daemon threads remain, the program exits. When a new thread object is created, the new thread has priority equal to the creating thread, and is a daemon thread if and only if the creating thread is a daemon.

When the JVM starts, there is usually a single non-daemon thread which typically calls the main() method of the class.
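The daemon inheritance rule can be observed with a small sketch. The class and method names here are hypothetical; only Thread.setDaemon(), isDaemon(), start() and join() come from the JDK.

```java
class DaemonDemo {
    // Returns whether a thread created from inside a daemon thread is itself a daemon.
    static boolean childInheritsDaemon() {
        final boolean[] childIsDaemon = new boolean[1];

        Thread parent = new Thread(new Runnable() {
            public void run() {
                // The child is created, but never started, by a daemon thread.
                Thread child = new Thread(new Runnable() {
                    public void run() { }
                });
                childIsDaemon[0] = child.isDaemon();
            }
        });
        parent.setDaemon(true); // must be called before start()
        parent.start();
        try {
            parent.join();      // wait so the result is visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return childIsDaemon[0];
    }

    public static void main(String[] args) {
        System.out.println("child is daemon: " + childInheritsDaemon()); // true
    }
}
```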

Threads can be in one of four states.

  • New Threads
    When a thread is first created, the thread is not yet running.
  • Runnable Threads
    Once the start() method is invoked, the thread is runnable. It actually runs whenever the scheduler gives it CPU time, at which point the code inside the run() method executes.
  • Blocked Threads
    A thread enters the blocked state when any of these four conditions occurs:
      • sleep() is called.
      • suspend() is called (deprecated).
      • wait() is called.
      • The thread calls an operation, e.g. input/output, that will not return until reading/writing is complete.
  • Dead Threads
    A thread dies for one of two reasons:
      • It dies a natural death when the run() method exits.
      • It is killed because its stop() method was invoked (deprecated).
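From Java 5 onwards, Thread.getState() exposes these states through the Thread.State enum, which is finer grained than the four-state model above (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED). A minimal sketch, with a hypothetical StateDemo class, that observes the first and last of them:

```java
class StateDemo {
    // Returns the state of a thread before start() and after it has finished.
    static Thread.State[] observe() {
        Thread t = new Thread(new Runnable() {
            public void run() { } // exits immediately: a "natural death"
        });
        Thread.State before = t.getState(); // not started yet
        t.start();
        try {
            t.join();                       // wait until run() has exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State after = t.getState();
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) {
        Thread.State[] states = observe();
        System.out.println(states[0]); // NEW
        System.out.println(states[1]); // TERMINATED
    }
}
```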

Now it is time for some examples. Take a look at two examples below for creating more than one thread.

 
// Creating a thread by subclassing Thread and overriding run().
class NewThread extends Thread {

    NewThread(String threadname) {
        super(threadname);   // the Thread superclass stores the name
        System.out.println("New Thread: " + this);
        start();             // start() arranges for run() to execute on the new thread
    }

    public void run() {
        try {
            for (int i = 5; i > 0; i--) {
                System.out.println(getName() + ":" + i);
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            System.out.println(getName() + " Interrupted. ");
        }
        System.out.println(getName() + " Exiting.");
    }
}
 
class MultiThreadDemo {

    public static void main(String args[]) {
        new NewThread("One");
        new NewThread("Two");
        new NewThread("Three");

        try {
            // Give the three child threads time to finish.
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            System.out.println("Main Thread Interrupted.");
        }
        System.out.println("main Thread Exiting.");
    }
}

And the second one.

Note: Suspend and resume are deprecated methods.

 
// Creating a thread by implementing Runnable.
class NewThread implements Runnable {
    String name;
    Thread t;

    NewThread(String threadname) {
        name = threadname;
        t = new Thread(this, name);
        System.out.println("New Thread: " + t);
        t.start();
    }

    public void run() {
        try {
            for (int i = 5; i > 0; i--) {
                System.out.println(name + ":" + i);
                Thread.sleep(200);
            }
        } catch (InterruptedException e) {
            System.out.println(name + " Interrupted. ");
        }

        System.out.println(name + " Exiting.");
    }
}
 
class SuspendResume {

    public static void main(String args[]) {
        NewThread ob1 = new NewThread("One");
        NewThread ob2 = new NewThread("Two");

        try {
            Thread.sleep(1000);
            System.out.println("Suspending thread One");
            ob1.t.suspend();   // deprecated: shown for illustration only
            Thread.sleep(1000);
            System.out.println("Resuming thread One");
            ob1.t.resume();

            System.out.println("Suspending thread Two");
            ob2.t.suspend();
            Thread.sleep(1000);
            System.out.println("Resuming thread Two");
            ob2.t.resume();
        } catch (InterruptedException e) {
            System.out.println("main thread interrupted.");
        }

        // Wait for both threads to finish.
        try {
            ob1.t.join();
            ob2.t.join();
        } catch (InterruptedException e) {
            System.out.println("main thread interrupted.");
        }

        System.out.println("Main thread Exiting.");
    }
}



Synchronization

When two or more threads need access to a shared resource, they need some way to ensure that the resource will be used by only one thread at a time. The process by which this is achieved is synchronization.

Key to synchronization is the concept of the monitor. A monitor is an object that is used as a mutually exclusive lock. Only one thread can own the monitor at a given time. When a thread acquires a lock, it is said to have entered the monitor. The other threads attempting to enter the locked monitor will be suspended until the first exits the monitor.

There are two ways you can synchronize your code.

  • synchronized methods.
  • synchronized statement

Both involve the use of the synchronized keyword. See below for an example.

 
import java.io.*;
 
class Deposit {
static int balance = 1000;
 
public static void main(String args[]) {
PrintWriter out = new PrintWriter(System.out, true);
Account account = new Account(out);
DepositThread first, second;
 
first = new DepositThread(account, 1000, "#1");
second=new DepositThread(account, 1000, "\t\t\t\t#2");
 
first.start();
second.start();
 
try {
first.join();
second.join();
}
catch (InterruptedException e) {  }
 
out.println("*** Final balance is  "  + balance);
}
}
 
class Account {
PrintWriter out;
        
Account(PrintWriter out) {
this.out=out;
}
 
synchronized  void deposit(int amount, String name ) {
int balance;
               
out.println(name + "  trying to deposit " + amount);
out.println(name + "  getting balance... " );
balance=getBalance();
out.println(name + "  balance got is  " + balance);
 
balance += amount;
 
out.println(name + "  setting balance...");
setBalance(balance);
out.println(name + "  balance set to  "  + Deposit.balance);
}
        
int getBalance()  {
 
try  {
Thread.sleep(1000);
}
catch (InterruptedException e) {  }
                       
return Deposit.balance;
}
 
void setBalance(int balance)  {
try {
Thread.sleep(1000);
}
catch(InterruptedException e) {  }
               
Deposit.balance = balance;
}
}
 
class DepositThread extends Thread {
Account account;
int deposit_amount;
String message;
 
DepositThread(Account account, int amount, String message) {
this.message=message;
this.account=account;
this.deposit_amount=amount;
}
public void run() {
account.deposit(deposit_amount, message);
}
}       
 

Note: What will happen if the synchronized keyword is removed in the preceding example?
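The example above uses a synchronized method. The other form, the synchronized statement, locks an explicitly named object for just one block of code. The following is a minimal sketch; the Counter and SyncStatementDemo classes are hypothetical and are not part of the Deposit example.

```java
class Counter {
    private final Object lock = new Object(); // the monitor guarding count
    private int count;

    void increment() {
        // Only this block is guarded, not the whole method.
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }
}

class SyncStatementDemo {
    // Runs two threads that each call increment() perThread times.
    static int run(final int perThread) {
        final Counter counter = new Counter();
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            }
        };
        Thread first = new Thread(work, "#1");
        Thread second = new Thread(work, "#2");
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10000)); // 20000
    }
}
```

Without the synchronized blocks, the two threads could interleave inside count++ and lose updates, which is the same kind of failure the note above asks about.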



Inter-thread Communication

Java's inter-thread communication involves the wait(), notify() and notifyAll() methods. These are implemented as final methods in Object, so all classes have them. They can only be called from within synchronized code: the calling thread must own the monitor of the object on which the method is called, otherwise an IllegalMonitorStateException is thrown.

Rules for using these methods:

  • wait() tells the calling thread to give up the monitor and go to sleep until some other thread enters the same monitor and calls notify() or notifyAll().
  • notify() wakes up one of the threads waiting on the object. Which thread is chosen is not specified.
  • notifyAll() wakes up all the threads waiting on the object. They then compete for the monitor as usual, and the order in which they run is not guaranteed.

See below for an incorrect implementation of a producer/consumer example.

 
//An  incorrect implementation of a producer and consumer.
 
class Q {
 
int n;
               
synchronized int get() {
System.out.println("Got: " + n);
return n;
}
synchronized void put(int n) {
this.n=n;
System.out.println("Put: " + n);
               
}
}
 
class Producer implements Runnable {
        Q q;
        
        Producer(Q q) {
               this.q=q;
               new Thread(this, "Producer").start();
        }
        public void run() {
               int i=0;
               
               while(true) {
                       q.put(i++);
               }
        }
}
 
class Consumer implements Runnable {
        Q q;
        
        Consumer(Q q) {
               this.q=q;
               new Thread(this, "Consumer").start();
        }
 
        public void run() {
               while(true) {
                       q.get();
               }
        }
}
 
class PC  {
        public static void main(String args[]) {
               Q q = new Q();
 
               new Producer(q);
               new Consumer(q);
               System.out.println("Press Control-C to stop");
        }
}
        

The correct way would be using wait() and notify() as shown here.

 
//A  correct implementation of a producer and consumer.
 
class Q {
 
        int n;
        boolean valueset = false;
        
        synchronized int get() {
                // Loop rather than "if": re-check the condition after every wake-up.
                while (!valueset) {
                        try {
                                wait();
                        } catch (InterruptedException e) {
                                System.out.println("InterruptedException caught");
                        }
                }

                System.out.println("Got: " + n);
                valueset = false;
                notify();

                return n;
        }

        synchronized void put(int n) {
                // Wait until the previous value has been consumed.
                while (valueset) {
                        try {
                                wait();
                        } catch (InterruptedException e) {
                                System.out.println("InterruptedException caught");
                        }
                }

                this.n = n;
                valueset = true;
                System.out.println("Put: " + n);
                notify();
        }
}
 
class Producer implements Runnable {
        Q q;
        
        Producer(Q q) {
               this.q=q;
               new Thread(this, "Producer").start();
        }
        public void run() {
               int i=0;
               
               while(true) {
                       q.put(i++);
               }
        }
}
 
class Consumer implements Runnable {
        Q q;
        
        Consumer(Q q) {
               this.q=q;
               new Thread(this, "Consumer").start();
        }
 
        public void run() {
               while(true) {
                       q.get();
               }
        }
}
 
class PCFixed  {
        public static void main(String args[]) {
               Q q = new Q();
 
               new Producer(q);
               new Consumer(q);
               System.out.println("Press Control-C to stop");
        }
}
        

Now to summarize the points about multithreading: thread synchronization, inter-thread communication, thread priorities, thread scheduling, and daemon threads.

What is connection pooling?

With servlets, opening a database connection is a major bottleneck, because a new connection is created and torn down for every page request, and creating a connection is slow. A connection pool is a better approach for a complicated servlet. With a connection pool, we duplicate only the resources we need to duplicate rather than the entire servlet. A connection pool can also intelligently manage the size of the pool and make sure each connection remains valid. A number of connection pool packages are currently available. Some, like DbConnectionBroker, are freely available from Java Exchange. Such a package works by creating an object that dispenses connections and connection IDs on request. The ConnectionPool class maintains a Hashtable, using Connection objects as keys and Boolean values as stored values. The Boolean value indicates whether a connection is in use or not. A program calls the getConnection() method of the ConnectionPool to get a Connection object it can use; it calls returnConnection() to give the connection back to the pool.
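The Hashtable bookkeeping described above can be sketched as follows. This is not the DbConnectionBroker API: SimplePool and its Factory interface are hypothetical names, and the pool is generic so that it can run without a database. In a real pool the type parameter would be java.sql.Connection and the factory would open a physical connection.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical, generic stand-in for the ConnectionPool described above.
class SimplePool<T> {
    // Creates a new resource when no free one is available (assumed helper).
    interface Factory<T> {
        T create();
    }

    // Key: the pooled object. Value: TRUE while it is in use, FALSE while free.
    private final Hashtable<T, Boolean> inUse = new Hashtable<T, Boolean>();
    private final Factory<T> factory;

    SimplePool(Factory<T> factory) {
        this.factory = factory;
    }

    synchronized T getConnection() {
        for (Map.Entry<T, Boolean> e : inUse.entrySet()) {
            if (!e.getValue().booleanValue()) {
                T free = e.getKey();
                inUse.put(free, Boolean.TRUE); // mark as borrowed, then stop iterating
                return free;
            }
        }
        T fresh = factory.create();            // nothing free: create a new one
        inUse.put(fresh, Boolean.TRUE);
        return fresh;
    }

    synchronized void returnConnection(T conn) {
        if (inUse.containsKey(conn)) {
            inUse.put(conn, Boolean.FALSE);    // available for reuse
        }
    }

    synchronized int size() {
        return inUse.size();
    }
}

class SimplePoolDemo {
    public static void main(String[] args) {
        SimplePool<Object> pool = new SimplePool<Object>(new SimplePool.Factory<Object>() {
            public Object create() { return new Object(); }
        });
        Object a = pool.getConnection();
        pool.returnConnection(a);
        System.out.println(pool.getConnection() == a); // true: the object is reused
    }
}
```

Because the pooled objects are Hashtable keys, the pool relies on their equals and hashCode methods; the sketch leaves out validation, size limits and timeouts, which a production pool would need.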

JDBC connection, DataSource connection, connection pooling

Both are database connections. A connection is a handle to the database. When you create a connection directly by calling DriverManager.getConnection(..), you are creating the connection yourself, and when you call close() on it, the link to the database is lost. On the other hand, when you get a connection from a DataSource backed by a pool and call close() on it, the link to the database is not closed; the connection is returned to the connection pool, where it can be reused by other classes. It is generally better to use a connection pool, because creating connections is expensive.

DriverManager.getConnection() literally creates, that is, builds a connection to the database, using the URL and credentials you pass to it and a driver you previously loaded.

A connection pool is an object that contains several already made connections to the database, and simply provides you with one of those existing connections. The confusion arises because a) the method name is frequently the same and b) the connection pool object calls the driver's getConnection() method to create several connections before lending any of them out.

In other words:

DriverManager.getConnection() builds a connection to the database.

ConnectionPool.getConnection() fetches an existing connection.

If you use DriverManager.getConnection(), you are indeed bypassing the Connection Pool entirely.

Database Connection Pooling with Tomcat

Software object pooling is not a new concept. There are many scenarios where some type of object pooling technique is employed to improve application performance, concurrency, and scalability. After all, having your database code create a new Connection object on every client request is an expensive process. Moreover, with today's demanding applications, creating new connections for data access from scratch, maintaining them, and tearing down the open connection can lead to massive load on the server.

Connection pooling eliminates JDBC overhead. Further, object pooling also helps to reduce the garbage collection load. In this article, we'll look at an elegant way of creating a pool of open database-connection objects in Tomcat, so that they are handy whenever an application needs to access a DB resource.

With Database Connection Pooling (DBCP), we can scale our applications to handle increased load and deliver high performance benefits. Using recycled database connection objects cuts the time taken to re-instantiate and load frequently used objects, thus reducing unnecessary overheads. Configuring a DB pool can be a daunting task, because there has to be a way for different components within an application to know about the available pooled objects, and a mechanism to locate them. This is exactly where JNDI steps in, tying these dependencies together.

JNDI to the Rescue

The Java Naming and Directory Interface (JNDI) has been at the core of Java EE since its inception. JNDI offers a generic mechanism for Java EE components to find other components, resources, or services indirectly at runtime. The primary role of JNDI in a Java EE application is to provide an indirection layer, so that components can find required resources without being particularly aware of the indirection. This indirection is almost transparent. JNDI helps in holding applications together, but this coupling is very flexible, so that apps can be easily reassembled. JNDI spares you from providing direct references to the data source, JDBC driver class names, user names and passwords, or any vendor-specific quirks of setting up pooling. We just look up all of these dependencies at runtime through a JNDI call. The developer, on the other hand, is ignorant of the external resources.

Tomcat Configuration

Our approach to DBCP uses the Jakarta-Commons database connection pool. But first, we need to configure the JNDI DataSource in Tomcat by adding a declaration for the resource to server.xml file, which resides inside the /conf directory of your Tomcat installation (indicated by the environment variable CATALINA_HOME). The JNDI DataSource is used as a factory for connections. One of the major advantages of using a configuration like this is that the characteristics of the pool can be changed without affecting the application code. Our application's use of connection pooling is almost transparent. The following code snippet shows us how to configure the container to enable connection pooling.

<Context ... reloadable="true" crossContext="true">

  <Resource name="jdbc/TestDB"
            type="javax.sql.DataSource" removeAbandoned="true"
            removeAbandonedTimeout="30" maxActive="100"
            maxIdle="30" maxWait="10000" username="kunal"
            password="java_facier"
            driverClassName="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/dbcptest"/>

</Context>

We can configure a maximum number of DB connections in the pool (maxActive). Make sure you choose a maximum connection count large enough to handle all of your database connections; alternatively, you can set 0 for no limit. Further, we can set the maximum number of idle database connections to be retained in the pool (maxIdle). Set this value to -1 for no limit. Optimal performance is attained when the pool in its steady state contains just enough connections to service all concurrent connection requests, without having to create new physical database connections at runtime. We can also specify the maximum time (in milliseconds) to wait for a database connection to become available (maxWait), which in this example is 10 seconds. An exception is thrown if this timeout is exceeded. You can set this value to -1 to wait indefinitely. Please make sure your connector driver JAR, such as mysql.jar, is placed inside the /common/lib directory of your Tomcat installation.

To achieve performance and high throughput, we also need to fine-tune the container to work under heavy traffic. Here's how we'll configure the Connector element for the maxThreads and acceptCount parameters in the server.xml file:


<Connector ...
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="150"
           connectionTimeout="20000" disableUploadTimeout="true" />

Configuring JNDI Reference

In order for JNDI to resolve the reference, we have to insert a <resource-ref> element into the web.xml deployment descriptor file. We first begin with a <listener> element for registering a ServletContextListener, as shown below:



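<listener>
  <listener-class>com.onjava.dbcp.DBCPoolingListener</listener-class>
</listener>

<resource-ref>
  <description>DB Connection Pooling</description>
  <res-ref-name>jdbc/TestDB</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>

<servlet>
  <servlet-name>EnrolledStudents</servlet-name>
  <servlet-class>com.onjava.dbcp.CourseEnrollmentServlet</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>

<servlet-mapping>
  <servlet-name>EnrolledStudents</servlet-name>
  <url-pattern>/enrollment.do</url-pattern>
</servlet-mapping>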

This binding is vendor-specific, and every container has its own mechanism for setting data sources. Please note that this is just a declaration for dependency on an external resource, and doesn't create the actual resource. Comprehending the tags is pretty straightforward: this indicates to the container that the local reference name jdbc/TestDB should be set by the app deployer, and this should match with the resource name, as declared in server.xml file.


Putting DBCP into Action

As our application first starts asking the pool for Connection objects, they will be newly created, but when the application has finished with an object, it's returned to the pool rather than destroyed. This has huge performance benefits. Now, as the application needs more Connection objects, the pool will be able to issue recycled objects that have previously been returned by the application.






As an example, let's create a listener class to work with the pool. Our listener class implements the ServletContextListener interface; thus, it'll be initialized when the container starts and creates a ServletContext for this web app. Remember, there's only one ServletContext per web app. Any class implementing the ServletContextListener interface is initialized when the container starts. This early initialization cuts unnecessary overhead later, since it's ideal to have a cached set of open database connection objects available when the container starts rather than waiting for a client request. Inside the listener class, we'll do the necessary JNDI lookup and then set the DataSource as a ServletContext attribute so that it's available to the entire web app. The following code snippet shows us how to extract DataSource through a JNDI call:

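import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.sql.DataSource;

public class DBCPoolingListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
        try {
            // Obtain our environment naming context
            Context envCtx = (Context) new InitialContext().lookup("java:comp/env");

            // Look up our data source
            DataSource ds = (DataSource) envCtx.lookup("jdbc/TestDB");

            // Make the pool available to the entire web app
            sce.getServletContext().setAttribute("DBCPool", ds);
        } catch (NamingException e) {
            e.printStackTrace();
        }
    }

    public void contextDestroyed(ServletContextEvent sce) {
        // Nothing to clean up here
    }
}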

The component naming context is indicated by the prefix java:comp/env/.

For the sake of simplicity, we'll create a simple servlet, hard-coding the business logic and presentation. We'll use the JDBC 2.0 Standard Extension API, which specifies that a database service provider can implement a pooling technique that can allow multiple Connection objects to be shared among several requesting clients. Here's how we'll extract DataSource from the ServletContext attribute and then establish a Connection to pooled DB connection objects.

public void init() throws ServletException {
    try {
        // Fetch the DataSource for pooled connections from the ServletContext.
        datasource = (DataSource) getServletContext().getAttribute("DBCPool");

        // Register the driver for non-pooled connections.
        Class.forName("com.mysql.jdbc.Driver").newInstance();
    } catch (Exception e) {
        // Keep the original exception as the root cause.
        throw new ServletException(e.getMessage(), e);
    }
}

The servlet is written to use either pooled or non-pooled database connections, depending on the query string passed in its URL. The servlet fetches a pooled connection object using Tomcat DBCP, and non-pooled connections directly from MySQL connector.

Here's an example of obtaining a Connection object. If the pooledConnection flag is set, it simply calls getConnection() on the DataSource. If not, it manually creates a new Connection object:

private synchronized Connection getConnection(boolean pooledConnection)
        throws SQLException {
    if (pooledConnection) {
        pooledCount++;

        // Allocate and use a connection from the pool
        return datasource.getConnection();
    } else {
        nonPooledCount++;

        // Create a brand-new connection directly through the driver
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/dbcptest", "kunal", "java_facier");
        return con;
    }
}

Having acquired a Connection, the servlet executes a simple join between the course and enrollment tables, and then formats and outputs the results as HTML. The example uses PreparedStatement to pre-compile SQL and run it repeatedly, which spares the database from parsing and compiling the same SQL query on every client request. Pre-compilation improves performance, and binding values through parameter placeholders, rather than concatenating them into the SQL text, helps guard against SQL injection attacks. For thread safety, we'll keep Connection, PreparedStatement, and ResultSet as local variables inside the doGet() method.


Connections issued from the JNDI DataSource factory will be returned to the pool when closed. Clients use a connection pool by borrowing a connection object, using it, and then returning it to the pool by closing it. We have to make sure that after we are done with the Connection, we close it. If a Connection is not closed, it will never be returned to the pool and become available for reuse. Of course, that would tie up resources. The finally block guarantees that used ResultSet, PreparedStatement, and Connection objects are closed and prevents any connection pool leaks, as shown below:




finally {
    // Close in reverse order of creation; closing the Connection returns it to the pool.
    try { if (rs != null) rs.close(); } catch (SQLException e) {}
    try { if (pstmt != null) pstmt.close(); } catch (SQLException e) {}
    try { if (connection != null) connection.close(); } catch (SQLException e) {}
}
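
On Java 7 and later, the same cleanup can also be written with try-with-resources, which closes resources in the reverse order of their declaration even when the body throws. The sketch below is only an illustration: it uses a hypothetical stand-in class, Tracked, instead of real JDBC objects so that it runs without a database. Connection, PreparedStatement, and ResultSet all implement AutoCloseable, so the same pattern should apply to them.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {
    // Records the order in which the resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for a JDBC resource (not part of the original servlet).
    static class Tracked implements AutoCloseable {
        private final String name;

        Tracked(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    static List<String> run() {
        closed.clear();
        try (Tracked connection = new Tracked("connection");
             Tracked pstmt = new Tracked("pstmt");
             Tracked rs = new Tracked("rs")) {
            throw new IllegalStateException("simulated query failure");
        } catch (IllegalStateException e) {
            // All three resources have already been closed at this point.
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [rs, pstmt, connection]
    }
}
```

With a pooled DataSource, closing the connection this way still just hands it back to the pool.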


Performance Measurement

Before our application hits the ground running, we would like to stress test the app, evaluate its performance, and compare the results between the cached set of pooled connection objects and the non-pooling alternative. For this, we'll rely on JMeter, an open source tool for load testing with a drag-and-drop-style GUI. I have written a test plan for stress testing the web app, and set up JMeter to simulate 50 concurrent users, each accessing a common servlet two times without any interval. The difference is apparent in the JMeter graph results shown in Figures 1 and 2, below.

Figure 1. Pooled DB connections deviation

Figure 2. Non-pooled DB connections deviation

After several test runs, it turned out that connection pooling is at least four times faster than explicitly creating a DB connection object from the ground up. Admittedly, to get more accurate results, JMeter should run on a different machine. The ramp-up period, which is the amount of time JMeter takes to create the total number of threads, should be carefully chosen. Setting it to zero is not a good idea if you have a large number of threads, because that would create all of the threads at once and send out requests immediately. At the same time, too long a ramp-up period is also not appropriate, as the load would be spread so thinly that the results might understate what the server sees under real concurrency.

Conclusion

Connection pooling is a technique used for sharing a cached set of open database connections among several requesting clients. It doesn't require you to modify your code significantly; rather, it provides enhanced performance benefits. Object pooling should be used with care. It does require additional overhead for such tasks as managing the state of the object pool, issuing objects to the application, and recycling used objects. Pooling is best suited for objects that have a short lifetime. If you are already working in a rich Java EE environment, then most likely you would be using an out-of-box connection pooling facility provided by your app server, and your applications' use of connection pooling is almost transparent.








Sunday, August 12, 2007

What's the difference between "PreparedStatement" and "Statement"?

A PreparedStatement is useful when you have one query to execute several times with only the parameters changed. Normally, the database has to check the syntax of each and every query it receives. With a PreparedStatement, the SQL statement is precompiled and stored in the PreparedStatement object, which saves the database from checking its syntax on every execution.

  • The PreparedStatement is a slightly more powerful version of a Statement, and should always be at least as quick and easy to handle as a Statement. A database typically handles a SQL query in four steps:
    1. Parse the incoming SQL query
    2. Compile the SQL query
    3. Plan/optimize the data acquisition path
    4. Execute the optimized query / acquire and return data
    A Statement will always proceed through the four steps above for each SQL query sent to the database. A PreparedStatement pre-executes steps (1) - (3) in the execution process above. Thus, when creating a PreparedStatement some pre-optimization is performed immediately. The effect is to lessen the load on the database engine at execution time.

    The other strength of the PreparedStatement is that you can use it over and over again with new parameter values, rather than having to create a new Statement object for each new set of parameters. This approach is obviously more efficient, as only one object is created.

    Use the set methods each time to specify new parameter values.
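
As a rough sketch of that reuse pattern, the method below prepares one statement and executes it once per name, calling the set methods before each execution. The enrollment table, its column names, and the EnrollmentDao class are assumptions made up for this illustration; the caller is expected to supply an open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class EnrollmentDao {
    // Hypothetical table and column names, used only for illustration.
    static final String INSERT_SQL =
            "INSERT INTO enrollment (student_name, course_id) VALUES (?, ?)";

    // Prepares the statement once, then executes it once per student name.
    static int enrollAll(Connection con, String[] names, int courseId) throws SQLException {
        int rows = 0;
        PreparedStatement pstmt = con.prepareStatement(INSERT_SQL);
        try {
            for (String name : names) {
                pstmt.setString(1, name);   // first placeholder
                pstmt.setInt(2, courseId);  // second placeholder
                rows += pstmt.executeUpdate();
            }
        } finally {
            pstmt.close();
        }
        return rows;
    }
}
```

Only one PreparedStatement object is created however many names are passed in; how much parsing and planning the database actually skips depends on the driver and the database.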




    Where are the pre-executed steps stored: in the application server or in the database server?

    What's the difference between an object and an object reference?

    Student s = new Student("Joe",20);
    Is s an object of class Student, or should we say that s is an object
    reference of class Student?


    s is not an object, it's a variable which contains a reference to an object.
    Objects don't have names, just types and locations in memory (and, of course,
    fields and methods). Read your statement as: Create a new Student object in
    memory, initializing it with the data sent as arguments to a constructor, and
    when created, assign a reference to that object to the Student variable s. s
    is a reference or object type variable which may reference a Student object or
    an object of any subclass of Student.

    Take another statement: int x = s.getValue();

    Read this statement as "Go to the object referenced by variable s and execute
    its getValue() method. Assign the return from that method to the int variable
    x."
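
A small sketch makes the distinction visible. The Student class below is a guess at what such a class might look like; the setValue() method is an assumption added for the demonstration. Two variables end up holding references to the same single object.

```java
class Student {
    private final String name;
    private int value;

    Student(String name, int value) {
        this.name = name;
        this.value = value;
    }

    int getValue() {
        return value;
    }

    // Hypothetical setter, added so that a change through one reference can be observed.
    void setValue(int value) {
        this.value = value;
    }
}

class ReferenceDemo {
    public static void main(String[] args) {
        Student s = new Student("Joe", 20); // s holds a reference to a new object
        Student t = s;                      // copies the reference, not the object
        t.setValue(21);                     // changes the one shared object
        System.out.println(s.getValue());   // prints 21
        System.out.println(s == t);         // prints true: both refer to the same object
    }
}
```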

    Difference between String StringBuffer and StringBuilder

    String is immutable whereas StringBuffer and StringBuilder can change their values.

    The only difference between StringBuffer and StringBuilder is that StringBuffer is synchronized whereas StringBuilder is not. So when the text is only ever touched by a single thread, it is better to use StringBuilder, which avoids the cost of synchronization and is therefore more efficient than StringBuffer.

    Criteria to choose among String, StringBuffer and StringBuilder

    1. If your text is not going to change, use the String class, because a String object is immutable.
    2. If your text can change and will only be accessed from a single thread, use a StringBuilder, because StringBuilder is unsynchronized.
    3. If your text can change and will be accessed from multiple threads, use a StringBuffer, because StringBuffer is synchronized.
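
The three criteria above can be sketched as follows. The class and method names are made up for this illustration.

```java
class StringKinds {
    // String methods never modify the receiver; they return a new String.
    static String shout(String s) {
        return s.toUpperCase();
    }

    // Text that changes but stays in one thread: StringBuilder (unsynchronized).
    static String buildSingleThreaded(String base, String suffix) {
        StringBuilder sb = new StringBuilder(base);
        sb.append(suffix); // modifies sb in place
        return sb.toString();
    }

    // Text that changes and may be shared between threads: StringBuffer (synchronized).
    static String buildShared(String base, String suffix) {
        StringBuffer buf = new StringBuffer(base);
        buf.append(suffix);
        return buf.toString();
    }

    public static void main(String[] args) {
        String s = "Stanford";
        System.out.println(shout(s));                          // prints STANFORD
        System.out.println(s);                                 // prints Stanford (unchanged)
        System.out.println(buildSingleThreaded(s, " Lost!!")); // prints Stanford Lost!!
        System.out.println(buildShared(s, " Lost!!"));         // prints Stanford Lost!!
    }
}
```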

    StringBuffer versus String


    What is the performance impact of the StringBuffer and String classes?

    Java provides the StringBuffer and String classes, and the String class is used to manipulate character strings that cannot be changed. Simply stated, objects of type String are read only and immutable. The StringBuffer class is used to represent characters that can be modified.

    The significant performance difference between these two classes is that StringBuffer is faster than String when performing simple concatenations. In String manipulation code, character strings are routinely concatenated. Using the String class, concatenations are typically performed as follows:

         String str = new String("Stanford ");
         str += "Lost!!";


    If you were to use StringBuffer to perform the same concatenation, you would need code that looks like this:

         StringBuffer str = new StringBuffer("Stanford ");
         str.append("Lost!!");


    Developers usually assume that the first example above is more efficient because they think that the second example, which uses the append method for concatenation, is more costly than the first example, which uses the + operator to concatenate two String objects.

    The + operator appears innocent, but the code generated produces some surprises. Using a StringBuffer for concatenation can in fact produce code that is significantly faster than using a String. To discover why this is the case, we must examine the generated bytecode from our two examples. The bytecode for the example using String looks like this:

    0 new #7 
    3 dup
    4 ldc #2
    6 invokespecial #12
    9 astore_1
    10 new #8
    13 dup
    14 aload_1
    15 invokestatic #23
    18 invokespecial #13
    21 ldc #1
    23 invokevirtual #15
    26 invokevirtual #22
    29 astore_1


    The bytecode at locations 0 through 9 is executed for the first line of code, namely:

         String str = new String("Stanford ");


    Then, the bytecode at location 10 through 29 is executed for the concatenation:

         str += "Lost!!";


    Things get interesting here. The bytecode generated for the concatenation creates a StringBuffer object, then invokes its append method: the temporary StringBuffer object is created at location 10, and its append method is called at location 23. Because the String class is immutable, a StringBuffer must be used for concatenation.

    After the concatenation is performed on the StringBuffer object, it must be converted back into a String. This is done with the call to the toString method at location 26. This method creates a new String object from the temporary StringBuffer object. The creation of this temporary StringBuffer object and its subsequent conversion back into a String object are very expensive.

    In summary, the two lines of code above result in the creation of three objects:

    1. A String object at location 0
    2. A StringBuffer object at location 10
    3. A String object at location 26


    Now, let's look at the bytecode generated for the example using StringBuffer:

    0 new #8 
    3 dup
    4 ldc #2
    6 invokespecial #13
    9 astore_1
    10 aload_1
    11 ldc #1
    13 invokevirtual #15
    16 pop


    The bytecode at locations 0 to 9 is executed for the first line of code:

         StringBuffer str = new StringBuffer("Stanford ");


    The bytecode at location 10 to 16 is then executed for the concatenation:

         str.append("Lost!!");


    Notice that, as is the case in the first example, this code invokes the append method of a StringBuffer object. Unlike the first example, however, there is no need to create a temporary StringBuffer and then convert it into a String object. This code creates only one object, the StringBuffer, at location 0.

    In conclusion, StringBuffer concatenation is significantly faster than String concatenation. Obviously, StringBuffers should be used in this type of operation when possible. If the functionality of the String class is desired, consider using a StringBuffer for concatenation and then performing one conversion to String.
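
Here is a minimal sketch of that advice: append to one buffer, then convert to a String once at the end. It uses StringBuilder, the unsynchronized sibling of StringBuffer; the JoinWords class is made up for this example. Note that compilers from Java 5 onward translate the + operator into StringBuilder calls rather than StringBuffer, so the bytecode listings above reflect an older compiler, and the saving matters mostly when concatenation happens inside a loop.

```java
class JoinWords {
    // Appends every word to one builder and converts to a String once at the end.
    static String join(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"Stanford", "Lost!!"})); // prints Stanford Lost!!
    }
}
```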

    Tuesday, August 7, 2007

    Different types of SQL Joins

    The main point of Relational
    Databases is to reduce the amount of duplicated and redundant data
    in tables by "normalising" the data. For
    instance, if we were creating a table of employees and their departments
    in a business, rather than list the same department over and over
    again, we could split out the departments into their own table and reference
    that table in the employees table as below:

    Employees Table

    Employee_ID First_Name Last_Name Department_ID
    1 Joe Bloggs 1
    2 Fred Jones 2
    3 Bill Smith 3
    4 Ann Jones 2
    5 Sally Dee 1
    6 Jim Lee 2

    Departments Table

    Department_ID Department_Name
    1 Sales
    2 Tech Support
    3 Finance

    The problem then comes
    when you want to list all employees and their department. In order
    to do this we will need to JOIN the 2 tables in the SQL statement to
    get the relevant data back. This is the statement we will use:

    SELECT
    First_Name, Last_Name, Department_Name
    FROM
    Employees
    INNER JOIN Departments ON
    Employees.Department_ID = Departments.Department_ID

    Which will give us the
    following

    First_Name Last_Name Department_Name
    Joe Bloggs Sales
    Sally Dee Sales
    Fred Jones Tech Support
    Ann Jones Tech Support
    Jim Lee Tech Support
    Bill Smith Finance

    Things to note from the
    statement:

    • We have specified
      an INNER JOIN. An inner join returns only those rows from the
      listed tables that match the criteria specified in the ON portion of
      the JOIN clause. In this case we have said where Department_ID
      in the Employees Table matches the Department_ID in the Departments
      table.
    • Although here
      we have specified that it is an INNER JOIN, this is the default type
      of join so you can just specify JOIN and get the same results.
    • The JOIN statement
      is in the FROM clause. This is the join syntax introduced by the
      ANSI SQL-92 standard, which Transact-SQL follows. It also keeps
      the join conditions separate from the filtering conditions in the
      WHERE clause, which makes the statement easier to read.
    • It is not required
      but it is considered good practice to write all keywords in upper case.
      Although Query Analyzer will colour-code keywords, this practice will
      help to more easily identify and separate them.

    There were also no duplicate
    column names in our query, so we just listed the columns we wanted and
    used the full table names only in the ON portion of the JOIN clause.
    If a column in the select list had existed in both tables, we would have
    had to tell SQL Server which table to take it from, just as in the ON
    portion; otherwise SQL Server will complain that the column name is
    ambiguous.

    If we had to do this,
    it would make our SQL statement look like:

    SELECT
    Employees.First_Name,
    Employees.Last_Name,
    Departments.Department_Name
    FROM
    Employees
    INNER JOIN Departments ON
    Employees.Department_ID = Departments.Department_ID

    Which looks unwieldy enough
    when there are just a few columns – can you imagine a statement containing
    30 columns and 4 JOINS written out like this?

    Fortunately, SQL allows
    us to use Table Aliases to replace the table names in most of the statement,
    so we can rewrite our statement as below:

    SELECT
    e.First_Name,
    e.Last_Name,
    d.Department_Name
    FROM
    Employees e
    JOIN Departments d ON
    e.Department_ID = d.Department_ID

    As you can see, we have
    now replaced the table names with the letters ‘e’ and ‘d’ respectively.
    We told SQL to recognise these as the table names in the FROM clause
    when we said:

    Employees e JOIN Departments d

    An alias does not have to be
    a single letter; it is just easier (and quicker to type!). Indeed
    you will find that, once you are writing complex queries involving a
    lot of tables, you will have to start using more characters. When
    using aliases, try to use something that will identify the table.
    In our case it was easy, as we only had 2 tables, so the first letter
    of each was enough to identify them. If our 2 tables were called Employees
    and Entitlements we would have had to use more than one letter!
    One important point to remember is that, once you have given a table an alias,
    you must then use it everywhere else in the statement in place of the
    table name.

    Now let's get a bit more
    complex. Instead of listing employees and what department they
    are in, we want to list all the departments and which employees
    work in them. Given the tables above and the knowledge we have
    gained so far we could write a statement such as:

    SELECT
    d.Department_Name,
    e.First_Name,
    e.Last_Name
    FROM
    Departments d
    JOIN Employees e ON
    d.Department_ID = e.Department_ID
    ORDER BY
    Department_Name

    * Note I have thrown in
    an ORDER BY clause to order the listing by department – nothing to
    do with joins it just makes the output neater ;-)

    Which would give us:

    Department_Name First_Name Last_Name
    Finance Bill Smith
    Sales Sally Dee
    Sales Joe Bloggs
    Tech Support Jim Lee
    Tech Support Fred Jones
    Tech Support Ann Jones

    Which is great, and just
    what we wanted. But now, suppose our business expands and adds
    some more departments so that the departments table looks like this:

    Department_ID Department_Name
    1 Sales
    2 Tech Support
    3 Finance
    4 Marketing
    5 Legal

    If we run our statement
    again, we will get exactly the same output! Where are our missing
    departments? Because an inner join will only give us those rows
    that match from both tables we don’t see the Legal department
    or the Marketing department.

    In order to see all the
    departments regardless of whether an employee is assigned to that department
    we will have to use an OUTER JOIN. An outer join returns all rows
    from the outer table and those rows that match from the other (inner)
    table.

    There are 2 types of outer
    join, a LEFT OUTER JOIN and a RIGHT OUTER JOIN. They do the same
    thing except one takes the left most listed table as the outer table
    and the other takes the right most listed table as the outer table.
    It just allows you to be more flexible in how you can write your joins
    and can become invaluable when you have to join 3 or more tables.

    All we have to do to our
    statement is change JOIN to read LEFT OUTER JOIN as below:

    SELECT
    d.Department_Name,
    e.First_Name,
    e.Last_Name
    FROM
    Departments d
    LEFT OUTER JOIN Employees e
    ON d.Department_ID = e.Department_ID
    ORDER BY
    Department_Name

    And we will get the output
    we wanted as below:

    Department_Name First_Name Last_Name
    Finance Bill Smith
    Legal NULL NULL
    Marketing NULL NULL
    Sales Sally Dee
    Sales Joe Bloggs
    Tech Support Jim Lee
    Tech Support Fred Jones
    Tech Support Ann Jones

    Some things to note about
    the statement and the output:

    • The keyword
      OUTER is optional so you can just use LEFT JOIN (or RIGHT JOIN).
    • You can see
      that we now have all the departments listed and where there was no matching
      Department_ID in the Employees table we have the entry NULL against
      First_Name and Last_Name from that table.
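
For readers who think more naturally in Java than in SQL, the difference between the two joins can be mimicked with nested loops over in-memory copies of the tables above. This is only a sketch of the logic, not how a database actually executes a join; the JoinSketch class and its row format are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Department_ID -> Department_Name, from the expanded Departments table.
    static Map<Integer, String> departments() {
        Map<Integer, String> d = new LinkedHashMap<>();
        d.put(1, "Sales");
        d.put(2, "Tech Support");
        d.put(3, "Finance");
        d.put(4, "Marketing");
        d.put(5, "Legal");
        return d;
    }

    // Each row is {First_Name, Last_Name, Department_ID}.
    static String[][] employees() {
        return new String[][] {
            {"Joe", "Bloggs", "1"}, {"Fred", "Jones", "2"}, {"Bill", "Smith", "3"},
            {"Ann", "Jones", "2"}, {"Sally", "Dee", "1"}, {"Jim", "Lee", "2"}
        };
    }

    // Mimics: Departments d [LEFT] JOIN Employees e ON d.Department_ID = e.Department_ID
    static List<String> join(boolean leftOuter) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, String> dept : departments().entrySet()) {
            boolean matched = false;
            for (String[] emp : employees()) {
                if (Integer.parseInt(emp[2]) == dept.getKey()) {
                    rows.add(dept.getValue() + " | " + emp[0] + " " + emp[1]);
                    matched = true;
                }
            }
            // Only the outer join keeps a department that matched no employee.
            if (leftOuter && !matched) {
                rows.add(dept.getValue() + " | NULL NULL");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(join(false).size()); // prints 6
        System.out.println(join(true).size());  // prints 8
    }
}
```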

    Now let's expand our employees
    table. It is clear from looking at it that we have people working
    in the same departments so let's promote some of them. We will
    record this fact by adding a Manager_ID column. This will be populated
    with an Employee_ID whenever a person has a manager, to indicate who
    that manager is. This will leave our table looking like this:

    Employee_ID First_Name Last_Name Department_ID Manager_ID
    1 Joe Bloggs 1 NULL
    2 Fred Jones 2 NULL
    3 Bill Smith 3 NULL
    4 Ann Jones 2 2
    5 Sally Dee 1 1
    6 Jim Lee 2 2

    This shows that Joe, Fred
    and Bill have no managers (Indicated by the NULL in the column for this
    value). Ann and Jim are managed by Fred and Sally is managed by
    Joe.

    We now need to list all
    employees and the name of their manager. We know from above that,
    as not everyone has a manager, we will have to use an OUTER JOIN but
    we do not have another table to join to. We will have to use a
    SELF JOIN – that is, join the table to itself - and this is where
    table aliases really come into play. The statement we will use
    is this:

    SELECT
    e.First_Name,
    e.Last_Name,
    e2.First_Name AS Manager_Name
    FROM
    Employees e
    LEFT JOIN Employees e2 ON
    e.Manager_ID = e2.Employee_ID

    First_Name Last_Name Manager_Name
    Joe Bloggs NULL
    Fred Jones NULL
    Bill Smith NULL
    Ann Jones Fred
    Sally Dee Joe
    Jim Lee Fred
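
The idea of joining a table to a second "copy" of itself can be sketched in Java as a lookup from each row's Manager_ID into an index built over the same rows. The SelfJoinSketch class and its row layout are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SelfJoinSketch {
    // Each row is {Employee_ID, First_Name, Last_Name, Manager_ID}; null means no manager.
    static final String[][] EMPLOYEES = {
        {"1", "Joe", "Bloggs", null}, {"2", "Fred", "Jones", null},
        {"3", "Bill", "Smith", null}, {"4", "Ann", "Jones", "2"},
        {"5", "Sally", "Dee", "1"}, {"6", "Jim", "Lee", "2"}
    };

    static List<String> employeesWithManagers() {
        // The "second copy" of the table (e2), indexed by Employee_ID.
        Map<String, String> firstNameById = new HashMap<>();
        for (String[] e2 : EMPLOYEES) {
            firstNameById.put(e2[0], e2[1]);
        }

        List<String> rows = new ArrayList<>();
        for (String[] e : EMPLOYEES) {
            // LEFT JOIN: keep the employee even when there is no matching manager row.
            String manager = (e[3] == null) ? null : firstNameById.get(e[3]);
            rows.add(e[1] + " " + e[2] + " | " + (manager == null ? "NULL" : manager));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : employeesWithManagers()) {
            System.out.println(row);
        }
    }
}
```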

    Things to note about the
    statement:

    • Basically we
      have used table aliases to create a “copy” of the employees table
      and called it e2. Now we can use our copy of the employees table
      just as if it was another table.
    • You can also
      see in the statement that I have aliased the column name as well so
      that the column title makes more sense. You can do this in any SQL
      statement if you feel that the actual column name doesn't make much
      sense. For instance, if we had named our columns Emp_FName and
      Emp_LName, they wouldn't look too good on a report, so we could alias
      them to First Name and Last Name. You can have spaces in these column
      names as long as you enclose the name in quotes like below:

    SELECT
    e.First_Name AS 'First Name',
    e.Last_Name AS 'Last Name',
    e2.First_Name AS 'Manager Name'
    FROM
    Employees e
    LEFT JOIN Employees e2 ON
    e.Manager_ID = e2.Employee_ID

    First Name   Last Name   Manager Name
    Joe          Bloggs      NULL
    Fred         Jones       NULL
    Bill         Smith       NULL
    Ann          Jones       Fred
    Sally        Dee         Joe
    Jim          Lee         Fred

    * As a test, how would
    you rewrite our managers query above so that it listed managers and
    their employees? My solution at the end of the article.

    Up until now we have only
    been dealing with 2 tables. What happens when you need to join
    3 (or more) tables?

    Our employees are hardworking
    so let's allow them to work some overtime. To do this we will add
    2 tables: an Overtime table and an Overtime_Allowed table.

    Why 2 tables? Well,
    we will be restricting who can take overtime based on department and,
    to comply with normalisation rules, rather than put all that information
    into one table with lots of duplication we will create 1 table for the
    overtime definition and then what is called a "link" table to contain
    information about both departments and overtime. The new tables
    are below:

    Overtime Table

    OTime_ID OTime_Name Pay_Scale
    1 Weekday 0
    2 Weekend 1.5
    3 Bank Holiday 2

    Overtime_Allowed
    Table

    Dept_ID OTime_ID OTime_Allowed
    1 1 Y
    1 2 N
    1 3 N
    2 1 Y
    2 2 Y
    2 3 Y
    3 1 Y
    3 2 Y
    3 3 N
    4 1 Y
    4 2 Y
    4 3 N
    5 1 Y
    5 2 Y
    5 3 Y

    Hopefully you can now
    see the point of the 2 tables.

    ** NOTE – in reality
    the OTime_Allowed column would be data type BIT (a 1 or a 0 to indicate
    TRUE or FALSE or YES and NO in our case) I have made it a CHAR
    field to keep the statement simple so we don’t get bogged down in understanding
    the syntax of converting a BIT field into a ‘Yes’ or ‘No’ when the results
    are displayed.

    So now suppose we want
    to list all departments and their overtime allowances. In order
    to do this we will have to join the Departments, Overtime and Overtime_Allowed
    tables.

    When joining 3 or more
    tables it is important to understand how SQL processes the joins so
    that you can order them correctly to get the results you need.
    I like to think of multi-table joins as like the game “six degrees
    of Kevin Bacon” - you take each table in turn and relate it to the
    next until finally you have linked Tom Cruise to Kevin Bacon in as few
    hops as possible. But maybe that’s just me!

    Logically, SQL processes joins in
    much the same way: it takes the first two tables listed and creates
    a result set for them. It then takes the next table and compares
    it to that intermediate result set and, based on the conditions in
    the ON clause, creates a new result set, and so on until it has the
    final result it displays. (The query optimizer is free to choose a
    different physical order, as long as the result is the same.)

    An example is needed:

    SELECT
    d.Department_Name AS Department,
    o.OTime_Name AS 'Overtime Allowed'
    FROM
    Departments d
    JOIN Overtime_Allowed oa ON d.Department_ID = oa.Dept_ID
    JOIN Overtime o ON oa.OTime_ID = o.OTime_ID
    WHERE
    oa.OTime_Allowed = 'Y'
    ORDER BY
    d.Department_Name

    Department     Overtime Allowed
    Finance        Weekend
    Finance        Weekday
    Legal          Weekend
    Legal          Weekday
    Legal          Bank Holiday
    Marketing      Weekend
    Marketing      Weekday
    Sales          Weekday
    Tech Support   Bank Holiday
    Tech Support   Weekend
    Tech Support   Weekday

    Here you can see that
    we used the link table to get our results by joining Departments to
    Overtime via Overtime_Allowed.

    We have now covered the
    2 main types of join – INNER JOIN and OUTER JOIN and
    their various uses such as SELF JOIN. We have also covered
    joining more than 2 tables. And still there is more to talk about!

    There are 2 other types
    of joins that we haven't yet covered: CROSS JOIN and FULL OUTER JOIN.
    These are less commonly used joins (i.e. I've never had cause to
    use them!) but can come in handy.

    Cross join:

    A cross join on 2 tables
    will return every possible combination of the rows in those
    2 tables. This is called the Cartesian product of the 2 tables.
    This means that if we were to join our Employees and Departments tables
    in a cross join we would get 30 rows returned, as Employees has 6 rows
    and Departments has 5 (6*5=30).
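
The Cartesian product is easy to picture as two nested loops. The following sketch, with a made-up CrossJoinSketch class and the names from the tables above, pairs every employee with every department and arrives at the same 30 rows.

```java
import java.util.ArrayList;
import java.util.List;

class CrossJoinSketch {
    // Pairs every employee with every department: no join condition at all.
    static List<String> crossJoin(String[] employees, String[] departments) {
        List<String> rows = new ArrayList<>();
        for (String dept : departments) {
            for (String emp : employees) {
                rows.add(emp + " | " + dept);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] employees = {
            "Joe Bloggs", "Fred Jones", "Bill Smith", "Ann Jones", "Sally Dee", "Jim Lee"
        };
        String[] departments = {"Sales", "Tech Support", "Finance", "Marketing", "Legal"};
        System.out.println(crossJoin(employees, departments).size()); // prints 30
    }
}
```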

    When you specify a cross
    join you can either use the CROSS JOIN keywords or separate the table
    names with a comma:

    SELECT
    First_Name,
    Last_Name,
    Department_Name
    FROM
    Employees
    CROSS JOIN Departments

    SELECT
    First_Name,
    Last_Name,
    Department_Name
    FROM
    Employees,
    Departments

    Both would give the results
    shown below:

    First_Name Last_Name Department_Name
    Joe Bloggs Sales
    Fred Jones Sales
    Bill Smith Sales
    Ann Jones Sales
    Sally Dee Sales
    Jim Lee Sales
    Joe Bloggs Tech Support
    Fred Jones Tech Support
    Bill Smith Tech Support
    Ann Jones Tech Support
    Sally Dee Tech Support
    Jim Lee Tech Support
    Joe Bloggs Finance
    Fred Jones Finance
    Bill Smith Finance
    Ann Jones Finance
    Sally Dee Finance
    Jim Lee Finance
    Joe Bloggs Marketing
    Fred Jones Marketing
    Bill Smith Marketing
    Ann Jones Marketing
    Sally Dee Marketing
    Jim Lee Marketing
    Joe Bloggs Legal
    Fred Jones Legal
    Bill Smith Legal
    Ann Jones Legal
    Sally Dee Legal
    Jim Lee Legal

    Full outer join:

    A full outer join on 2
    tables returns all rows from both tables, with NULL entries in the
    left table's columns where a row from the right table has no match,
    and vice versa.

    For example, suppose a
    temp worker, John Deed, joins the company. Temp workers have no
    department, so his Department_ID field in Employees would be NULL.
    We already have 2 departments with no workers that we added earlier,
    so a FULL OUTER JOIN between Employees and Departments returns
    unmatched rows from both sides, as the statement and its output show:

    SELECT
    First_Name,
    Last_Name,
    Department_Name
    FROM
    Employees
    FULL OUTER JOIN Departments ON
    Employees.Department_ID = Departments.Department_ID

    First_Name Last_Name Department_Name
    John Deed NULL
    Joe Bloggs Sales
    Sally Dee Sales
    Fred Jones Tech Support
    Ann Jones Tech Support
    Jim Lee Tech Support
    Bill Smith Finance
    NULL NULL Marketing
    NULL NULL Legal
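
In Java terms, a full outer join can be pictured as a left outer join followed by a sweep for the right-hand rows that were never matched. The sketch below is an illustration only; the FullOuterJoinSketch class and its row layout are made up, and the data comes from the tables above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FullOuterJoinSketch {
    // Each employee row is {First_Name, Last_Name, Department_ID}; a null ID means no department.
    static List<String> fullOuterJoin(String[][] employees, Map<Integer, String> departments) {
        List<String> rows = new ArrayList<>();
        Set<Integer> matched = new HashSet<>();

        // Left side: keep every employee, matched or not.
        for (String[] e : employees) {
            Integer deptId = (e[2] == null) ? null : Integer.valueOf(e[2]);
            String deptName = (deptId == null) ? null : departments.get(deptId);
            if (deptName != null) {
                matched.add(deptId);
            }
            rows.add(e[0] + " " + e[1] + " | " + (deptName == null ? "NULL" : deptName));
        }

        // Right side: add the departments that no employee matched.
        for (Map.Entry<Integer, String> d : departments.entrySet()) {
            if (!matched.contains(d.getKey())) {
                rows.add("NULL NULL | " + d.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> departments = new LinkedHashMap<>();
        departments.put(1, "Sales");
        departments.put(2, "Tech Support");
        departments.put(3, "Finance");
        departments.put(4, "Marketing");
        departments.put(5, "Legal");
        String[][] employees = {
            {"Joe", "Bloggs", "1"}, {"Fred", "Jones", "2"}, {"Bill", "Smith", "3"},
            {"Ann", "Jones", "2"}, {"Sally", "Dee", "1"}, {"Jim", "Lee", "2"},
            {"John", "Deed", null}
        };
        System.out.println(fullOuterJoin(employees, departments).size()); // prints 9
    }
}
```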

    So there you go.
    INNER JOIN, OUTER JOIN, SELF JOIN, CROSS JOIN and FULL OUTER JOIN.
    Hopefully you now understand all the various ways SQL allows you to
    join tables and you can make use of this to normalise your database
    and still display the data you need.

    Before we go though, some
    final points on using joins:

    • The columns
      which are used in join statement have to be either the same data type
      or compatible data types. Compatible data types means that SQL
      Server can perform an implicit conversion on the data type when it comes
      to compare the columns.
    • The performance
      of queries using joins can, in general, be increased if all columns
      in a join are indexed.

    The only thing that remains
    is to see if you rewrote the managers and employees query in the same
    way I did – my solution below:

    SELECT
    e.First_Name AS Manager_First,
    e.Last_Name AS Manager_Last,
    e2.First_Name AS Employee
    FROM
    Employees e
    JOIN Employees e2 ON
    e.Employee_ID = e2.Manager_ID
    ORDER BY
    e.First_Name

