KeyedSets

Documentation for KeyedSets. The latest docs are deployed from the main branch as stable.

Overview

KeyedSets provides a type-stable, parametric KeyedSet{S,N} that stores unique sequences of type S with associated names of type N. It supports set-like operations while collecting detailed conflict information about duplicates, same-sequence/different-name, and same-name/different-sequence situations.

Types

  • KeyedPair{S,N}: container of a sequence::S and name::N.
  • KeyedSet{S,N}: mapping of sequence to name. The full type is KeyedSet{S,N,D}, where D<:AbstractDict{S,N} is the backing storage.
  • ConflictSummary{S,N}: captures conflicts detected during operations.

Constructors

  • KeyedSet{S,N}() creates an empty set.
  • KeyedSet(pairs) constructs a set from an iterable of pairs, inferring S and N from the first element. Elements can be KeyedPair{S,N} or (sequence::S, name::N) tuples.
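
The inference step can be pictured with a plain Dict. This is only a sketch of the idea, not the package's constructor: sketch_from_pairs is a hypothetical name, it accepts tuples only, and keeping the first name for a repeated sequence is an assumption borrowed from the push! behavior documented further down.

```julia
# Hypothetical illustration of "infer S and N from the first element".
# Not part of KeyedSets; uses Base.Dict as a stand-in for the backing storage.
function sketch_from_pairs(items)
    seq1, name1 = first(items)            # the first element fixes the type parameters
    S, N = typeof(seq1), typeof(name1)
    data = Dict{S,N}()
    for (seq, name) in items
        get!(data, seq, name)             # assumption: the first name seen for a sequence is kept
    end
    return data
end

d = sketch_from_pairs([("ACGT", "seq1"), ("AAA", "alpha"), ("AAA", "alpha2")])
```

Because the types come from the first element alone, a later element of a different type would fail to convert in this sketch; the explicit KeyedSet{S,N}(pairs) form avoids relying on inference.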

Operations

  • union(a, b) / union_with_conflicts(a, b)
  • intersect(a, b) / intersect_with_conflicts(a, b)
  • setdiff(a, b) / setdiff_with_conflicts(a, b)

The _with_conflicts variants return a tuple (result, conflicts::ConflictSummary). The Base methods (union, intersect, setdiff) return only the result and log any conflicts.
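
To make the (result, conflicts) contract concrete, here is a minimal sketch of a left-biased union over plain Dicts. It is not the package's implementation: sketch_union is a hypothetical name, the conflicts are returned as a NamedTuple rather than a ConflictSummary, and name collisions are left out for brevity.

```julia
# Hypothetical stand-in built on Base.Dict only; not a KeyedSets function.
function sketch_union(left::Dict{S,N}, right::Dict{S,N}) where {S,N}
    result = copy(left)                      # every left entry survives unchanged
    duplicates = Tuple{S,N}[]                # same sequence, same name
    mismatches = Tuple{S,N,N}[]              # same sequence, different names
    for (seq, rname) in right
        if haskey(left, seq)
            lname = left[seq]
            if lname == rname
                push!(duplicates, (seq, lname))
            else
                push!(mismatches, (seq, lname, rname))   # the left name is kept
            end
        else
            result[seq] = rname              # sequence only on the right: add it
        end
    end
    return result, (duplicates = duplicates, sequence_name_mismatches = mismatches)
end

a = Dict("ACGT" => "seq1", "AAA" => "alpha")
b = Dict("AAA" => "alpha2", "TTT" => "beta")
res, conf = sketch_union(a, b)
```

A Base-style wrapper would call the same routine, log the conflicts, and return only res.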

Examples

using KeyedSets

# DNA example: sequences and names are both Strings
a = KeyedSet([("ACGT", "seq1"), ("AAA", "alpha")])
b = KeyedSet([("AAA", "alpha2"), ("TTT", "beta")])

res, conf = union_with_conflicts(a, b)
collect(sequences(res))         # => ["ACGT", "AAA", "TTT"] (order not guaranteed)
res["AAA"]                      # => "alpha" (left wins)
conf.sequence_name_mismatches   # => [("AAA", "alpha", "alpha2")]

KeyedSets.ConflictSummary (Type)
ConflictSummary{S,N}

Summary of conflicts detected during set-like operations.

Fields:

  • duplicates::Vector{KeyedPair{S,N}}: same sequence and same name present on both sides
  • sequence_name_mismatches::Vector{Tuple{S,N,N}}: same sequence but different names, stored as (sequence, left_name, right_name)
  • name_collisions::Vector{Tuple{N,S,S}}: same name used for different sequences, stored as (name, left_sequence, right_sequence)
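
The field layout above can be mirrored in a few lines of Julia. The type names SketchPair and SketchConflictSummary, the empty constructor, and has_conflicts are all hypothetical and exist only to show how the three vectors fit together; the package's own definitions may differ.

```julia
# Hypothetical mirror of the documented fields; not the package's definitions.
struct SketchPair{S,N}
    sequence::S
    name::N
end

struct SketchConflictSummary{S,N}
    duplicates::Vector{SketchPair{S,N}}
    sequence_name_mismatches::Vector{Tuple{S,N,N}}
    name_collisions::Vector{Tuple{N,S,S}}
end

# Convenience constructor for an empty summary (an assumption of this sketch).
SketchConflictSummary{S,N}() where {S,N} =
    SketchConflictSummary{S,N}(SketchPair{S,N}[], Tuple{S,N,N}[], Tuple{N,S,S}[])

# True when at least one conflict of any kind was recorded.
has_conflicts(c::SketchConflictSummary) =
    !(isempty(c.duplicates) && isempty(c.sequence_name_mismatches) && isempty(c.name_collisions))

conf_sketch = SketchConflictSummary{String,String}()
push!(conf_sketch.sequence_name_mismatches, ("AAA", "alpha", "alpha2"))
```

Note the tuple order: mismatches lead with the shared sequence, while collisions lead with the shared name.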

KeyedSets.KeyedSet (Type)
KeyedSet{S,N,D<:AbstractDict{S,N}}

A collection that maps a sequence (key) to a name (value). Behaves like a set of sequences for set-like operations, while tracking names and reporting conflicts.

Internally stores data::D, an AbstractDict{S,N} mapping sequence => name.

KeyedSets.KeyedSet (Method)
KeyedSet{S,N}()
KeyedSet{S,N}(pairs)
KeyedSet(pairs)

Construct an empty KeyedSet{S,N}, or one populated from an iterable of pairs.

Accepted element forms in pairs:

  • KeyedPair{S,N}
  • (sequence::S, name::N) tuple

Base.:== (Method)
Base.:(==)(a::KeyedSet, b::KeyedSet)

Equality is based on the set of sequences only (names are ignored). This implementation avoids temporary set allocation.
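
One way to compare by sequences alone without building a temporary Set is a length check followed by key lookups. The helper below is a hypothetical sketch over plain dictionaries, not the package's == method.

```julia
# Hypothetical helper: compares two dictionaries by key set only.
# Values (names) are never inspected and no intermediate Set is allocated.
function sketch_same_sequences(a::AbstractDict, b::AbstractDict)
    length(a) == length(b) || return false        # different sizes can never match
    return all(seq -> haskey(b, seq), keys(a))    # equal sizes + inclusion => same keys
end

x = Dict("ACGT" => "seq1", "AAA" => "alpha")
y = Dict("ACGT" => "other", "AAA" => "renamed")   # same sequences, different names
z = Dict("ACGT" => "seq1")                        # missing one sequence
```

Under this rule x and y compare equal even though every name differs, which is worth keeping in mind when using == in tests.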

Base.empty (Method)
Base.empty(ks::KeyedSet)

Return a new empty KeyedSet with the same type parameters and backing storage type as ks.

Base.push! (Method)
push!(ks::KeyedSet, kp)

Insert a KeyedPair or (sequence, name) into the set. If the sequence exists with a different name, the existing name is kept and a message is logged.
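
The keep-existing rule can be sketched on a plain Dict as follows. sketch_push! is a hypothetical name, and the use of @info is an assumption: the text above only says that a message is logged, not at which level.

```julia
# Hypothetical sketch of the insertion rule; not the package's push! method.
function sketch_push!(data::Dict{S,N}, seq::S, name::N) where {S,N}
    existing = get(data, seq, nothing)
    if existing === nothing
        data[seq] = name                 # new sequence: store it
    elseif existing != name
        # Log level is an assumption of this sketch.
        @info "Sequence already present under a different name; keeping the existing name" sequence=seq kept=existing ignored=name
    end
    return data                          # returns the collection, like Base.push!
end

names_by_seq = Dict("AAA" => "alpha")
sketch_push!(names_by_seq, "AAA", "alpha2")   # logged, "alpha" is kept
sketch_push!(names_by_seq, "TTT", "beta")     # inserted
```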

KeyedSets.intersect_with_conflicts (Method)
intersect_with_conflicts(left::KeyedSet, right::KeyedSet) -> (result, conflicts)

Intersection by sequences. If a sequence exists on both sides with different names, the name from left is kept. Conflicts are returned in a ConflictSummary.

KeyedSets.setdiff_with_conflicts (Method)
setdiff_with_conflicts(left::KeyedSet, right::KeyedSet) -> (result, conflicts)

Set difference by sequences: elements in left not in right. When a name from left exists in right but for a different sequence, a name collision is reported.
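
A rough sketch of this rule over plain Dicts is shown below. sketch_setdiff is a hypothetical name, and the sketch assumes names are unique within right so that the name-to-sequence lookup is well defined; the package may handle that case differently.

```julia
# Hypothetical sketch; not the package's setdiff_with_conflicts.
function sketch_setdiff(left::Dict{S,N}, right::Dict{S,N}) where {S,N}
    # Keep the left entries whose sequence does not occur on the right.
    result = Dict{S,N}(seq => name for (seq, name) in left if !haskey(right, seq))
    # Invert the right side so each name maps back to its sequence
    # (assumes names are unique within `right`).
    right_by_name = Dict{N,S}(name => seq for (seq, name) in right)
    collisions = Tuple{N,S,S}[]          # (name, left_sequence, right_sequence)
    for (lseq, name) in left
        rseq = get(right_by_name, name, nothing)
        if rseq !== nothing && rseq != lseq
            push!(collisions, (name, lseq, rseq))
        end
    end
    return result, collisions
end

l = Dict("ACGT" => "seq1", "AAA" => "alpha")
r = Dict("AAA" => "alpha2", "GGG" => "seq1")
diff, collisions = sketch_setdiff(l, r)
```

Here "ACGT" survives the difference, yet its name "seq1" is reported because the right side uses that name for "GGG".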

KeyedSets.union_with_conflicts (Method)
union_with_conflicts(left::KeyedSet, right::KeyedSet) -> (result, conflicts)

Union by sequences. If a sequence exists on both sides with different names, the name from left is kept. Conflicts are returned in a ConflictSummary.
