Friday, 6 July 2007

Java Collections # 3

Hi Guys,

As i said earlier in this post we will take a look at the other part of the java collection. And this other part of the java collections family is "Set". Refer to the figure availabe in the post Java Collections # 1 !

In this episode we will take a look at Set and its implementations. Sets contains unique elements. That means duplicates are not allowed.
Although some implementations might not allow even null, those implementations that allow null, they will allow ONLY one null.
Set is an interface. Following is the list of classes that implement the interface directly or indirectly :-

HashSet
LinkedHashSet
TreeSet

Following program will demonstrate to you, the ordering of the elements in the concrete class :-
-------------------------------------------
Listing #2
-------------------------------------------
package dev.data.workSpace.projects.scjp.section9.set;

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;


public class SetExampleTester {

/**
* Omkar
*
* 15 Jun, 2007
* 9:43:56 AM
* SetExampleTester
* main
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub

Set set ;
HashSet hashSet = new HashSet();
LinkedHashSet linkedHashSet = new LinkedHashSet();
TreeSet treeSet = new TreeSet();

// Setting the reference of Hashset to Set.
set = hashSet;
/*set.add("Omkar");
set.add("maria");
set.add("MariA");
set.add("Patkar");
set.add("Sharapova");
set.add("OMKAR");*/
set.add(new Integer(12));
set.add(new Integer(-12));
set.add(new Integer(34));
set.add(new Integer(-67));
set.add(new Integer(990));
set.add(new Integer(0));

System.out.println("set "+set);
System.out.println("Thus the elements in the HashSet are not in any particular order.\n");

// Setting the reference of Hashset to LinkedHashSet.
set = linkedHashSet ;
/*set.add("Omkar");
set.add("maria");
set.add("MariA");
set.add("Patkar");
set.add("Sharapova");
set.add("OMKAR");*/
set.add(new Integer(12));
set.add(new Integer(-12));
set.add(new Integer(34));
set.add(new Integer(-67));
set.add(new Integer(990));
set.add(new Integer(0));

System.out.println("set "+set);
System.out.println("Thus the order of elements in the treeset is same as the insertion order.\n");

// Setting the reference of Hashset to TreeSet.
set = treeSet ;
/*set.add("Omkar");
set.add("maria");
set.add("MariA");
set.add("Patkar");
set.add("Sharapova");
set.add("OMKAR");*/
set.add(new Integer(12));
set.add(new Integer(-12));
set.add(new Integer(34));
set.add(new Integer(-67));
set.add(new Integer(990));
set.add(new Integer(0));

System.out.println("set "+set);
System.out.println("Thus the order of elements in the treeset is sorted.");


}

}



-------------------------------------------------------------
The output of this program is as follows :-
-------------------------------------------------------------
set [0, 34, 990, -12, -67, 12]
Thus the elements in the HashSet are not in any particular order.

set [12, -12, 34, -67, 990, 0]
Thus the order of elements in the treeset is same as the insertion order.

set [-67, -12, 0, 12, 34, 990]
Thus the order of elements in the treeset is sorted.

----------------------------------
---------------------------


Thus the output of the program itself demonstrates for the ordering of the elements in these classes.

Since we have seen the core functionality that Collection interface provides,....we will not get into the details of how Set implementations honor these contracts... after all sets are also, collections ! :)

But Here in sets there is an additional functionality that we should discuss here!
And that functionality is subset functionality ! Remember, ONLY "SortedSet" and its implementation have this funcionality !

We will take a look at following program that will illustrate how the subset functionality is implemented. For this we take example of TreeSet in our program
-------------------------------------------
Listing #3
-------------------------------------------
package dev.data.workSpace.projects.scjp.section9.set;

import java.util.Set;
import java.util.TreeSet;

public class SetExampleTesterTree1 {

/**
* Omkar
* 2 Jul, 2007
* 8:24:33 PM
* SetExampleTesterTree1
* main
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
TreeSet treeSet = new TreeSet();
/*treeSet.add(new Integer(12));
treeSet.add(new Integer(-12));
treeSet.add(new Integer(34));
treeSet.add(new Integer(-67));
treeSet.add(new Integer(990));
treeSet.add(new Integer(0));*/
treeSet.add("omkar");
treeSet.add("anant");
treeSet.add("patkar");
treeSet.add("java");
treeSet.add("sun");
treeSet.add("microsystem");
treeSet.add("technology");
treeSet.add("open source");
treeSet.add("micosoft sucks !");
treeSet.add("c# is copy of java");


System.out.println("treeSet "+treeSet);

Set subRightSet = treeSet.subSet("java","sun");
System.out.println("subset "+subRightSet+" ... This subset is a \"half-open range\" subset ---> From Right hand side!");
Set subLeftSet = treeSet.subSet("java"+"\0","sun");
System.out.println("subset "+subLeftSet+" ... This subset is a \"open range\" subset ---> From Left hand side!");
Set fullCloseSubSet = treeSet.subSet("java","sun"+"\0");
System.out.println("subset "+fullCloseSubSet+" ... This subset is a \"closed range\" subset !");

treeSet = new TreeSet();
treeSet.add(new Integer(1));
treeSet.add(new Integer(12));
treeSet.add(new Integer(31));
treeSet.add(new Integer(45));
treeSet.add(new Integer(-91));
treeSet.add(new Integer(867));
treeSet.add(new Integer(432));
treeSet.add(new Integer(75));
treeSet.add(new Integer(21));
treeSet.add(new Integer(1091));


System.out.println("\n\n "+treeSet);
subRightSet = treeSet.subSet(new Integer(21),new Integer(432));
System.out.println("subset "+subRightSet+" ... This subset is a \"half-open range\" subset ---> From Right hand side!");
subLeftSet = treeSet.subSet(new Integer(22),new Integer(432));
System.out.println("subset "+subLeftSet+" ... This subset is a \"open range\" subset ---> From Left hand side!");
fullCloseSubSet = treeSet.subSet(new Integer(21),new Integer(433));
System.out.println("subset "+fullCloseSubSet+" ... This subset is a \"closed range\" subset !");

}

}

-------------------------------------------

And the output of the program is :-

treeSet [anant, c# is copy of java, java, micosoft sucks !, microsystem, omkar, open source, patkar, sun, technology]
subset [java, micosoft sucks !, microsystem, omkar, open source, patkar] ... This subset is a "half-open range" subset ---> From Right hand side!
subset [micosoft sucks !, microsystem, omkar, open source, patkar] ... This subset is a "open range" subset ---> From Left hand side!
subset [java, micosoft sucks !, microsystem, omkar, open source, patkar, sun] ... This subset is a "closed range" subset !


[-91, 1, 12, 21, 31, 45, 75, 432, 867, 1091]
subset [21, 31, 45, 75] ... This subset is a "half-open range" subset ---> From Right hand side!
subset [31, 45, 75] ... This subset is a "open range" subset ---> From Left hand side!
subset [21, 31, 45, 75, 432] ... This subset is a "closed range" subset !

-------------------------------------------

Now let us try to understand how the subSet() function works, and subsequently why we got such a result :-

subSet() function returns a SortedSet with the lower value inclusive and the higher value exclusive. This is called as half-open range ! Therefore when we say :-

treeSet.subSet("java","sun"); ... then,

out of the set => [anant, c# is copy of java, java, micosoft sucks !, microsystem, omkar, open source, patkar, sun, technology]

a subset as [java, micosoft sucks !, microsystem, omkar, open source, patkar] is obtained ...as u can see the lower value "java" is included in the subset and the higher value "sun" is not included !

But lets say we want to include the higher value also,...then what do we do ?
For that we have used something like this :-

treeSet.subSet("java","sun"+"\0");

because of this now the subset that is returned will contain both "java" and "sun" how was this possible ? .... For that we need to understand how the subSet() function works ! When we specify two arguments to this function, then this function uses these arguments to return a subset as follows :-
The first argument is taken and checked if this argument is greater than or equal to any element in the sorted set. If an element is found... then that element will be the first element in the subset that will be returned .... now an element is searched in the original list such that the element is less than the second argument that we have provided. If an element is found then that element will be the last element of the subset that will be returned.

Therefore now u know when we called the function like this :-

treeSet.subSet("java","sun");

at that time an element "java" was found which is greater than equal to the first argument and hence the subset that is returned contains this element as the first element. Now, an element "patkar" is obtained that is less than "sun" and hence this element becomes the last element of the subset that is returned. Hence this subset is half open subset as it does not include the higher value i.e., "sun".

NOTE :- The comparisons that took place for these objects of type String were lexicographical which means "java" is lexicographically less than "sun".

So why use only "\0" to include the higher value ?

Well, based upon the discussion in the above paragraph, when the elements in the set are compared to the second argument an element "sun" is found which is lexicographically less than "sun"+"\0" ...i.e., "sun\0". That means the second argument could have been any string that starts with "sun" and ends with any thing ! .... Therefore if u provide, String like "sunoja" OR "sun23" OR "sunny" as the second argument it would have given the same result with "sun\0" as the second argument !

hmmm.... ! So what are we achieving by calling :-

treeSet.subSet("java"+"\0","sun");

Again, the same FUNDA ! elements in the set are searched to see if there is any element that is greater than or equal to the 1st argument that is "java"+"\0" ...i.e., "java\0". Now, there is an element "java" in the set, but it is lexicographically less than "java\0"...therefore it is automatically excluded from the subSet that will be returned. The element next to "java" in the set is "microsoft sucks!" which is lexicographically greater than or equal to "java\0" ...and hence it is selected as the starting element of the subSet that is returned.
So if u observe now the subset that is returned neither includes the lower value nor the higher value that we had specified as two arguments in the subSet() function. Such a range is called as the Closed range !

I know i know ,....this seemingly simple thing subSet() is looking so complex because of this Lexicographical thing. And that is the reason why i have included the example of subSet using Integers. I thought that playing with numbers will be always easy compared to playing with the Strings. Just go through the listing and the output,...and soon the funda of subSet will be understood by you.

functions like tailSet(), headSet() are also based upon similar principle.

Well thats enough for today, in the next episode we will meet another family which claims to be a distant relative of Collections - yes ! ..ladies and gentlemen stay tuned for next episode of the new family of Collections - "The Map" family !

Thanks and Regards
Omkar Patkar.

1 comments:

Anonymous said...

Hi Omkar,

Any post on Maps??

-Saurabh