Exforsys.com
 
Home Tech Articles J2EE
 

HashSet vs TreeSet vs ArrayList

 

It turns out that there are times when a (HashSet|TreeSet)+ArrayList approach is more efficient than a HashSet-only approach, and times when it is not. This is interesting because the former maintain order while the latter does not.

Here is some code that runs tests on most of the cases discussed above. It compares HashSet to TreeSet as well as both combined with ArrayList (to ensure list order).

code:
import java.util.*;
import java.math.*;

public class NoDupsTest
{
/* ---- HashSet approach (list order not maintained) ---- */
public static void test_HashSet(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

HashSet h = new HashSet(a);
a.clear();
a.addAll(h);

long diff = System.currentTimeMillis()-cur;
System.out.println("HashSet approach (no order) = "+diff);
}


/* ---- TreeSet approach (list order not maintained, but sorted) ---- */
public static void test_TreeSet(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

TreeSet t = new TreeSet(a);
a.clear();
Iterator iter = t.iterator();
while (iter.hasNext())
a.add(iter.next());

long diff = System.currentTimeMillis()-cur;
System.out.println(
"TreeSet approach (sorted order) = "+diff+" ms");
}


/* ---- ArrayList and HashSet approach (order maintained) ---- */
public static void test_HashSet_listOrder(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

Set set = new HashSet();
List newList = new ArrayList();
for (Iterator iter = a.iterator(); iter.hasNext(); ) {
Object element = iter.next();
if (set.add(element))
newList.add(element);
}
a.clear();
a.addAll(newList);

long diff = System.currentTimeMillis()-cur;
System.out.println(
"ArrayList/HashSet approach (list order) = "+diff+" ms");
}


/* ---- ArrayList and TreeSet approach ---- */
public static void test_TreeSet_listOrder(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

Set set = new TreeSet();
List newList = new ArrayList();
for (Iterator iter = a.iterator(); iter.hasNext(); ) {
Object element = iter.next();
if (set.add(element))
newList.add(element);
}
a.clear();
a.addAll(newList);

long diff = System.currentTimeMillis()-cur;
System.out.println(
"ArrayList/TreeSet approach (list order) = "+diff+" ms");
}


public static void main(String[] args)
{
int num = Integer.parseInt(args[0]);
int numUnique = Integer.parseInt(args[1]);

/* Create master list */
Random rnd = new Random();
ArrayList masterList = new ArrayList();
for (int i=0; i<num; i++) {
masterList.add(new Integer((Math.abs(rnd.nextInt())%numUnique)));
}

/* Run tests */
test_HashSet(masterList);
test_TreeSet(masterList);
test_HashSet_listOrder(masterList);
test_TreeSet_listOrder(masterList);
}
}


The (crude) code test above takes, as arguments, the number of objects in the list and the number of unique objects (they are distributed randomly).

My sample runs looked like this:

(first arg is number of objects, second arg is number of unique objects -- distributed randomly)

% java NoDupsTest 500000 500000
% java NoDupsTest 500000 250000
etc.

and resulted in

code:

List size = 500,000 Integer objects (Integer implements Comparable)
Unique objects are distributed randomly throughout list.
All times in milliseconds.

AL=ArrayList
ORD=Ordered

--Unordered-- --Ordered--
Num unique HashSet TreeSet HashSet+AL TreeSet+AL Best Best(ORD)
---------- ------- ------- ----------- ---------- -------- ---------
500,000 1372 3284 2283 4216 HashSet HashSet+AL
250,000 1212 2754 1522 2754 HashSet HashSet+AL
100,000 1031 2754 721 2123 HashSet+AL HashSet+AL
50,000 951 1973 541 1833 HashSet+AL HashSet+AL
25,000 921 1622 471 1452 HashSet+AL HashSet+AL
10,000 861 1182 521 1181 HashSet+AL HashSet+AL
5,000 801 932 330 871 HashSet+AL HashSet+AL
1,000 731 661 331 631 HashSet+AL HashSet+AL
100 721 470 250 460 HashSet+AL HashSet+AL


In summary, to efficiently remove duplicate objects from an ArrayList, HashSet+ArrayList is a good general strategy and maintains list order. The main disadvantage is, however, more memory. Most interestingly (although it makes sense when you think about it some), both (HashSet|TreeSet)+AL can be superior to HashSet alone.


Read Next: Chat Application using JSP and Java Servlets



 
Related Topics


 

Comments



Post Your Comment:

Members Please Login
Your Name:*
e-mail ID:(required for notification)*
Image Verification: 
 
 Subscribe    

Sponsored Links

 

Subscribe via RSS


Get Daily Updates via Subscribe to Exforsys Free Training via email


Get Latest Free Training Updates delivered directly to your Inbox...

Enter your email address:


 

Subscribe to Exforsys Free Training via RSS
 

 
Partners -  Privacy and Legal Policy -  Site News -  Contact   Sitemap  

Copyright © 2000 - 2009 exforsys.com. All Rights Reserved

Page copy protected against web site content infringement by Copyscape