Clojure - Data Structures

Table of Contents

nil
Numbers
Strings
Characters
Keywords
Symbols
Collections
Lists (IPersistentList)
Vectors (IPersistentVector)
Maps (IPersistentMap)
StructMaps
ArrayMaps
Sets

Clojure has a rich set of data structures. They share a set of properties:

They are immutable
They are read-able
They support proper value equality semantics in their implementation of equals
They provide good hash values
In addition, the collections:
- Are manipulated via interfaces.
- Support sequencing
- Support persistent manipulation.
- Support metadata
- Implement java.lang.Iterable
- Implement the non-optional (read-only) portion of java.util.Collection or java.util.Map

nil

nil is a possible value of any data type in Clojure. nil has the same value as Java null. The Clojure conditional system is based around nil and false, with nil and false representing the values of logical falsity in conditional tests - anything else is logical truth. In addition, nil is used as the end-of-sequence sentinel value in the sequence protocol.

Numbers

Clojure provides full support for JVM primitive values by default, allowing high performance, idiomatic Clojure code for numeric applications.

Clojure also supports the Java boxed number types derived from java.lang.Number, including BigInteger and BigDecimal, plus its own Ratio type. There is some special handling:

Longs

By default Clojure operates with natural numbers as instances of Java’s long primitive type. When a primitive integer operation results in a value that too large to be contained in a primitive value, a java.lang.ArithmeticException is thrown. Clojure provides a set of alternative math operators suffixed with an apostrophe: +', -', *', inc', and dec'. These operators auto-promote to BigInt upon overflow, but are less efficient than the regular math operators.

Ratio

Represents a ratio between integers. Division of integers that can’t be reduced to an integer yields a ratio, i.e. 22/7 = 22/7, rather than a floating point or truncated value.

Contagion

BigInts and floating point types are "contagious" across operations. That is, any integer operation involving a BigInt will result in a BigInt, and any operation involving a double or float will result in a double.

BigInt and BigDecimal literals

Numeric literals for BigInt and BigDecimal are specified using a postfix N and M respectively.

Example expression Return value

Example expression	Return value
`(== 1 1.0 1M)`	`true`
`(/ 2 3)`	`2/3`
`(/ 2.0 3)`	`0.6666666666666666`
`(map #(Math/abs %) (range -3 3))`	`(3 2 1 0 1 2)`
`(class 36786883868216818816N)`	`clojure.lang.BigInt`
`(class 3.14159265358M)`	`java.math.BigDecimal`

(== 1 1.0 1M)

true

(/ 2 3)

2/3

(/ 2.0 3)

0.6666666666666666

(map #(Math/abs %) (range -3 3))

(3 2 1 0 1 2)

(class 36786883868216818816N)

clojure.lang.BigInt

(class 3.14159265358M)

java.math.BigDecimal

Related functions

Computation: + - * / inc dec quot rem min max
Auto-promoting computation: +' -' *' inc' dec'
Comparison: == < <= > >= zero? pos? neg?
Bitwise operations: bit-and bit-or bit-xor bit-not bit-shift-right bit-shift-left
Ratios: numerator denominator
Coercions: int bigdec bigint double float long num short

Strings

Clojure strings are Java Strings. See also Printing.

user=> (map (fn [x] (.toUpperCase x)) (.split "Dasher Dancer Prancer" " "))
("DASHER" "DANCER" "PRANCER")

Related functions

str string? pr-str prn-str print-str println-str with-out-str

Characters

Clojure characters are Java Characters.

Related functions

char char-name-string char-escape-string

Keywords

Keywords are symbolic identifiers that evaluate to themselves. They provide very fast equality tests. Like Symbols, they have names and optional namespaces, both of which are strings. The leading ':' is not part of the namespace or name.

Keywords implement IFn for invoke() of one argument (a map) with an optional second argument (a default value). For example (:mykey my-hash-map :none) means the same as (get my-hash-map :mykey :none). See get.

Related functions

keyword keyword?

Symbols

Symbols are identifiers that are normally used to refer to something else. They can be used in program forms to refer to function parameters, let bindings, class names and global vars. They have names and optional namespaces, both of which are strings. Symbols can have metadata (see with-meta).

Symbols, just like Keywords, implement IFn for invoke() of one argument (a map) with an optional second argument (a default value). For example ('mysym my-hash-map :none) means the same as (get my-hash-map 'mysym :none). See get.

Related functions

symbol symbol? gensym (see also the #-suffix reader macro)

Collections

All of the Clojure collections are immutable and persistent. In particular, the Clojure collections support efficient creation of 'modified' versions, by utilizing structural sharing, and make all of their performance bound guarantees for persistent use. The collections are efficient and inherently thread-safe. Collections are represented by abstractions, and there may be one or more concrete realizations. In particular, since 'modification' operations yield new collections, the new collection might not have the same concrete type as the source collection, but will have the same logical (interface) type.

All the collections support count for getting the size of the collection, conj for 'adding' to the collection, and seq to get a sequence that can walk the entire collection, though their specific behavior is slightly different for different types of collections.

Because collections support the seq function, all of the sequence functions can be used with any collection.

Java collection hashes

The Java collection interfaces specify algorithms for Lists, Sets, and Maps in calculating hashCode() values. All Clojure collections conform to these specifications in their hashCode() implementations.

Clojure collection hashes

Clojure provides its own hash computations that provide better hash properties for collections (and other types), known as the hasheq value.

The IHashEq interface marks collections that provide the hasheq() function to obtain the hasheq value. In Clojure, the hash function can be used to compute the hasheq value.

Ordered collections (vector, list, seq, etc) must use the following algorithm for calculating hasheq (where hash computes hasheq). Note that unchecked-add-int and unchecked-multiply-int are used to get integer overflow calculations.

(defn hash-ordered [collection]
  (-> (reduce (fn [acc e] (unchecked-add-int
                            (unchecked-multiply-int 31 acc)
                            (hash e)))
              1
              collection)
      (mix-collection-hash (count collection))))

Unordered collections (maps, sets) must use the following algorithm for calculating hasheq. A map entry is treated as an ordered collection of key and value. Note that unchecked-add-int is used to get integer overflow calculations.

(defn hash-unordered [collection]
  (-> (reduce unchecked-add-int 0 (map hash collection))
      (mix-collection-hash (count collection))))

The mix-collection-hash algorithm is an implementation detail subject to change.

Lists (IPersistentList)

Lists are collections. They implement the ISeq interface directly. (Note that the empty list implements ISeq as well, however the seq function will always return nil for an empty sequence.) count is O(1). conj puts the item at the front of the list.

Related functions

Create a list: list list*
Treat a list like a stack: peek pop
Examine a list: list?

Vectors (IPersistentVector)

A Vector is a collection of values indexed by contiguous integers. Vectors support access to items by index in log32N hops. count is O(1). conj puts the item at the end of the vector. Vectors also support rseq, which returns the items in reverse order. Vectors implement IFn, for invoke() of one argument, which they presume is an index and look up in themselves as if by nth, i.e. vectors are functions of their indices. Vectors are compared first by length, then each element is compared in order.

Related functions

Create a vector: vector vec vector-of
Examine a vector: get nth peek rseq vector?
'change' a vector: assoc pop subvec replace

Maps (IPersistentMap)

A Map is a collection that maps keys to values. Two different map types are provided - hashed and sorted. Hash maps require keys that correctly support hashCode and equals. Sorted maps require keys that implement Comparable, or an instance of Comparator. Hash maps provide faster access (log32N hops) vs (logN hops), but sorted maps are, well, sorted. count is O(1). conj expects another (possibly single entry) map as the item, and returns a new map which is the old map plus the entries from the new, which may overwrite entries of the old. conj also accepts a MapEntry or a vector of two items (key and value). seq returns a sequence of map entries, which are key/value pairs. Sorted map also supports rseq, which returns the entries in reverse order. Maps implement IFn, for invoke() of one argument (a key) with an optional second argument (a default value), i.e. maps are functions of their keys. nil keys and values are ok.

Related functions

Create a new map: hash-map sorted-map sorted-map-by
'change' a map: assoc dissoc select-keys merge merge-with zipmap
Examine a map: get contains? find keys vals map?
Examine a map entry: key val

StructMaps

Most uses of StructMaps would now be better served by records.

Often many map instances have the same base set of keys, for instance when maps are used as structs or objects would be in other languages. StructMaps support this use case by efficiently sharing the key information, while also providing optional enhanced-performance accessors to those keys. StructMaps are in all ways maps, supporting the same set of functions, are interoperable with all other maps, and are persistently extensible (i.e. struct maps are not limited to their base keys). The only restriction is that you cannot dissociate a struct map from one of its base keys. A struct map will retain its base keys in order.

StructMaps are created by first creating a structure basis object using create-struct or defstruct, then creating instances with struct-map or struct.

(defstruct desilu :fred :ricky)
(def x (map (fn [n]
              (struct-map desilu
                :fred n
                :ricky 2
                :lucy 3
                :ethel 4))
             (range 100000)))
(def fred (accessor desilu :fred))
(reduce (fn [n y] (+ n (:fred y))) 0 x)
 -> 4999950000
(reduce (fn [n y] (+ n (fred y))) 0 x)
 -> 4999950000

Related functions

StructMap setup: create-struct defstruct accessor
Create individual struct: struct-map struct

ArrayMaps

When doing code form manipulation it is often desirable to have a map which maintains key order. An array map is such a map - it is simply implemented as an array of key val key val… As such, it has linear lookup performance, and is only suitable for very small maps. It implements the full map interface. New ArrayMaps can be created with the array-map function. Note that an array map will only maintain sort order when un-'modified'. Subsequent assoc-ing will eventually cause it to 'become' a hash-map.

Sets

Sets are collections of unique values.

There is literal support for hash-sets:

#{:a :b :c :d}
-> #{:d :a :b :c}

You can create sets with the hash-set and sorted-set functions:

(hash-set :a :b :c :d)
-> #{:d :a :b :c}

(sorted-set :a :b :c :d)
-> #{:a :b :c :d}

You can also get a set of the values in a collection using the set function:

(set [1 2 3 2 1 2 3])
-> #{1 2 3}

Sets are collections:

(def s #{:a :b :c :d})
(conj s :e)
-> #{:d :a :b :e :c}

(count s)
-> 4

(seq s)
-> (:d :a :b :c)

(= (conj s :e) #{:a :b :c :d :e})
-> true

Sets support 'removal' with disj, as well as contains? and get, the latter returning the object that is held in the set which compares equal to the key, if found:

(disj s :d)
-> #{:a :b :c}

(contains? s :b)
-> true

(get s :a)
-> :a

Sets are functions of their members, using get:

(s :b)
-> :b

(s :k)
-> nil

Clojure provides basic set operations like union / difference / intersection, as well as some pseudo-relational algebra support for 'relations', which are simply sets of maps - select / index / rename / join.

← Other Functions Datatypes →