Haskell is a purely functional language, and efficiency is often a major concern. A well-designed Haskell program can perform on par with Java, though typically not as fast as C.

The nub function (defined in Data.List) eliminates duplicate values from a list. It is an oft-used function for Haskell programmers. However, it has a worst-case complexity of O(n²), which makes it impractical for very large lists. This blog post presents alternative definitions that reduce the complexity under different circumstances.

First, let’s look at one of the standard definitions of the nub function:

nub :: (Eq a) => [a] -> [a]
nub [] = []
nub (x:xs) = x : nub (filter (\y -> x /= y) xs)

This definition compares each element with the rest of the list and filters out the duplicates. The complexity is O(n²); more precisely, the worst case performs about n²/2 comparisons.
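For example, nub keeps the first occurrence of each element and preserves the original order:

```haskell
import Data.List (nub)

main :: IO ()
main = do
  -- Duplicates are dropped; the first occurrence of each value survives
  print (nub [1, 2, 1, 3, 2, 4 :: Int])  -- [1,2,3,4]
```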

The simplest way to reduce the worst-case complexity from O(n²) to O(n log n) is to convert the list to a set and then back to a list.

import qualified Data.Set as Set

nub' :: (Ord a) => [a] -> [a]
nub' = Set.toList . Set.fromList

By definition a set contains only unique values, so this obviously eliminates all duplicates from the list. However, it requires the elements of the list to be orderable (members of the Ord class); the Eq class alone is not sufficient.
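One caveat: Set.toList returns the elements in ascending order, so nub' also sorts the list rather than keeping the order of first occurrences. If the original order matters, the same idea still works by threading a set of already-seen elements through a traversal. Here is a sketch of that variant (the name nubOrd is ours, chosen for illustration):

```haskell
import qualified Data.Set as Set

-- Order-preserving variant: carry a set of elements seen so far and
-- emit each element only the first time it appears. Each member/insert
-- is O(log n), so the whole traversal is O(n log n).
nubOrd :: (Ord a) => [a] -> [a]
nubOrd = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

main :: IO ()
main = print (nubOrd [3, 1, 3, 2, 1 :: Int])  -- [3,1,2]
```

Unlike nub', this version returns [3,1,2] for the input above instead of the sorted [1,2,3], at the cost of a slightly longer definition.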