alt
Advertisement
Online Training
Career Series
Exforsys
Exforsys arrow Tech Articles arrow J2EE arrow HashSet vs TreeSet vs ArrayList
Site Search


HashSet vs TreeSet vs ArrayList

It turns out that there are times when a (HashSet|TreeSet)+ArrayList approach is more efficient than a HashSet-only approach, and times when it is not. This is interesting because the former maintain order while the latter does not.

Here is some code that runs tests on most of the cases discussed above. It compares HashSet to TreeSet as well as both combined with ArrayList (to ensure list order).

code:
import java.util.*;
import java.math.*;

public class NoDupsTest
{
/* ---- HashSet approach (list order not maintained) ---- */
public static void test_HashSet(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

HashSet h = new HashSet(a);
a.clear();
a.addAll(h);

long diff = System.currentTimeMillis()-cur;
System.out.println("HashSet approach (no order) = "+diff);
}


/* ---- TreeSet approach (list order not maintained, but sorted) ---- */
public static void test_TreeSet(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

TreeSet t = new TreeSet(a);
a.clear();
Iterator iter = t.iterator();
while (iter.hasNext())
a.add(iter.next());

long diff = System.currentTimeMillis()-cur;
System.out.println(
"TreeSet approach (sorted order) = "+diff+" ms");
}


/* ---- ArrayList and HashSet approach (order maintained) ---- */
public static void test_HashSet_listOrder(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

Set set = new HashSet();
List newList = new ArrayList();
for (Iterator iter = a.iterator(); iter.hasNext(); ) {
Object element = iter.next();
if (set.add(element))
newList.add(element);
}
a.clear();
a.addAll(newList);

long diff = System.currentTimeMillis()-cur;
System.out.println(
"ArrayList/HashSet approach (list order) = "+diff+" ms");
}


/* ---- ArrayList and TreeSet approach ---- */
public static void test_TreeSet_listOrder(ArrayList a_master)
{
ArrayList a = new ArrayList(a_master);
long cur = System.currentTimeMillis();

Set set = new TreeSet();
List newList = new ArrayList();
for (Iterator iter = a.iterator(); iter.hasNext(); ) {
Object element = iter.next();
if (set.add(element))
newList.add(element);
}
a.clear();
a.addAll(newList);

long diff = System.currentTimeMillis()-cur;
System.out.println(
"ArrayList/TreeSet approach (list order) = "+diff+" ms");
}


public static void main(String[] args)
{
int num = Integer.parseInt(args[0]);
int numUnique = Integer.parseInt(args[1]);

/* Create master list */
Random rnd = new Random();
ArrayList masterList = new ArrayList();
for (int i=0; i<num; i++) {
masterList.add(new Integer((Math.abs(rnd.nextInt())%numUnique)));
}

/* Run tests */
test_HashSet(masterList);
test_TreeSet(masterList);
test_HashSet_listOrder(masterList);
test_TreeSet_listOrder(masterList);
}
}


The (crude) code test above takes, as arguments, the number of objects in the list and the number of unique objects (they are distributed randomly).

My sample runs looked like this:

(first arg is number of objects, second arg is number of unique objects -- distributed randomly)

% java NoDupsTest 500000 500000
% java NoDupsTest 500000 250000
etc.

and resulted in

code:

List size = 500,000 Integer objects (Integer implements Comparable)
Unique objects are distributed randomly throughout list.
All times in milliseconds.

AL=ArrayList
ORD=Ordered

--Unordered-- --Ordered--
Num unique HashSet TreeSet HashSet+AL TreeSet+AL Best Best(ORD)
---------- ------- ------- ----------- ---------- -------- ---------
500,000 1372 3284 2283 4216 HashSet HashSet+AL
250,000 1212 2754 1522 2754 HashSet HashSet+AL
100,000 1031 2754 721 2123 HashSet+AL HashSet+AL
50,000 951 1973 541 1833 HashSet+AL HashSet+AL
25,000 921 1622 471 1452 HashSet+AL HashSet+AL
10,000 861 1182 521 1181 HashSet+AL HashSet+AL
5,000 801 932 330 871 HashSet+AL HashSet+AL
1,000 731 661 331 631 HashSet+AL HashSet+AL
100 721 470 250 460 HashSet+AL HashSet+AL


In summary, to efficiently remove duplicate objects from an ArrayList, HashSet+ArrayList is a good general strategy and maintains list order. The main disadvantage is, however, more memory. Most interestingly (although it makes sense when you think about it some), both (HashSet|TreeSet)+AL can be superior to HashSet alone.


Trackback(0)
Comments (0)add comment

Write comment

busy
 
< Prev   Next >
Sponsored Links
© 2008 Exforsys.com
Joomla! is Free Software released under the GNU/GPL License.
Page copy protected against web site content infringement by Copyscape