KeyedSets
Documentation for KeyedSets. The latest docs are deployed from the main branch as stable.
Overview
KeyedSets provides a type-stable, parametric KeyedSet{S,N} that stores unique sequences of type S with associated names of type N. It supports set-like operations while collecting detailed conflict information about three situations: duplicates, the same sequence under different names, and the same name used for different sequences.
Types
KeyedPair{S,N}: container of a sequence::S and name::N.
KeyedSet{S,N}: mapping of sequence to name.
ConflictSummary{S,N}: captures conflicts detected during operations.
Constructors
KeyedSet{S,N}() creates an empty set.
KeyedSet(pairs) constructs a set and infers S and N from the first element. Elements can be KeyedPair{S,N} or (sequence::S, name::N) tuples.
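The idea of inferring S and N from the first element can be sketched with a plain Dict. This is only an illustration under stated assumptions: dict_from_pairs is a hypothetical helper, not a KeyedSets function, it handles tuples only, and keeping the first name seen for a repeated sequence is an assumption borrowed from the documented push! behaviour.

```julia
# Hypothetical helper (not part of KeyedSets): build a typed Dict from
# (sequence, name) tuples, taking S and N from the first element only.
function dict_from_pairs(pairs)
    seq1, name1 = first(pairs)              # throws on an empty iterable
    data = Dict{typeof(seq1),typeof(name1)}()
    for (seq, name) in pairs
        # get! inserts only when the key is absent, so the first name wins
        # (assumed behaviour, mirroring the documented push!).
        get!(data, seq, name)
    end
    return data
end

data = dict_from_pairs([("ACGT", "seq1"), ("AAA", "alpha"), ("AAA", "alpha2")])
```

Because nothing can be inferred from an empty iterable, a sketch like this fails on empty input; the typed form KeyedSet{S,N}() covers that case.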
Operations
union(a, b) / union_with_conflicts(a, b)
intersect(a, b) / intersect_with_conflicts(a, b)
setdiff(a, b) / setdiff_with_conflicts(a, b)
The _with_conflicts variants return (result, conflicts::ConflictSummary). The Base overrides return only result and log any conflicts.
Examples
using KeyedSets
# DNA example: sequences as Strings, names as Strings
a = KeyedSet([("ACGT", "seq1"), ("AAA", "alpha")])
b = KeyedSet([("AAA", "alpha2"), ("TTT", "beta")])
res, conf = union_with_conflicts(a, b)
collect(sequences(res)) # => ["ACGT", "AAA", "TTT"] (order not guaranteed)
res["AAA"] # => "alpha" (left wins)
conf.sequence_name_mismatches # => [("AAA", "alpha", "alpha2")]
KeyedSets.ConflictSummary
KeyedSets.KeyedPair
KeyedSets.KeyedSet
KeyedSets.KeyedSet
Base.:==
Base.empty
Base.push!
KeyedSets._name_to_sequences
KeyedSets.intersect_with_conflicts
KeyedSets.names
KeyedSets.sequences
KeyedSets.setdiff_with_conflicts
KeyedSets.union_with_conflicts
KeyedSets.ConflictSummary - Type
ConflictSummary{S,N}
Summary of conflicts detected during set-like operations.
Fields:
duplicates::Vector{KeyedPair{S,N}}: the same sequence and name present on both sides
sequence_name_mismatches::Vector{Tuple{S,N,N}}: the same sequence with different names, as (sequence, left_name, right_name)
name_collisions::Vector{Tuple{N,S,S}}: the same name used for different sequences, as (name, left_sequence, right_sequence)
KeyedSets.KeyedPair - Type
KeyedPair{S,N}
Pair of a sequence and its human-readable name.
Fields:
sequence::S
name::N
KeyedSets.KeyedSet - Type
KeyedSet{S,N,D<:AbstractDict{S,N}}
A collection that maps a sequence (key) to a name (value). It behaves like a set of sequences for set-like operations, while tracking names and reporting conflicts.
Internally stores data::D, an AbstractDict{S,N} mapping sequence => name.
KeyedSets.KeyedSet - Method
KeyedSet{S,N}()
KeyedSet{S,N}(pairs)
KeyedSet(pairs)
Construct an empty KeyedSet{S,N}, or build one from an iterable of pairs.
Accepted element forms in pairs:
KeyedPair{S,N}
(sequence::S, name::N) tuple
Base.:== - Method
Base.:(==)(a::KeyedSet, b::KeyedSet)
Equality is based on the set of sequences only (names are ignored). This implementation avoids temporary set allocation.
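A minimal sketch of sequence-only equality without building a temporary Set, using plain Dicts (sequence => name) as stand-ins. The function name same_sequences is hypothetical, and this is one way such a comparison could be written, not necessarily how the package does it.

```julia
# Hypothetical helper: compare two sequence => name mappings by keys only.
function same_sequences(a::AbstractDict, b::AbstractDict)
    # Different sizes can never hold the same set of sequences.
    length(a) == length(b) || return false
    # With equal sizes, it is enough that every key of a is also in b.
    return all(seq -> haskey(b, seq), keys(a))
end

x = Dict("AAA" => "alpha", "TTT" => "beta")
y = Dict("AAA" => "renamed", "TTT" => "beta")   # same sequences, other name
z = Dict("AAA" => "alpha", "CCC" => "beta")     # different sequences

same_sequences(x, y)   # true: names are ignored
same_sequences(x, z)   # false
```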
Base.empty - Method
Base.empty(ks::KeyedSet)
Return a new empty KeyedSet with the same type parameters and backing storage type as ks.
Base.push! - Method
push!(ks::KeyedSet, kp)
Insert a KeyedPair or (sequence, name) into the set. If the sequence already exists with a different name, the existing name is kept and a message is logged.
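The keep-existing rule described above can be illustrated on a plain Dict. The helper push_keep_existing! is hypothetical, and the log level and message wording are assumptions; the source only states that a message is logged.

```julia
# Hypothetical helper: insert sequence => name, never overwriting an
# existing name for the same sequence.
function push_keep_existing!(data::AbstractDict, seq, name)
    if haskey(data, seq)
        existing = data[seq]
        if existing != name
            # Assumed log level and wording; the existing name is kept.
            @info "Sequence already present under a different name" sequence=seq kept=existing ignored=name
        end
    else
        data[seq] = name
    end
    return data
end

store = Dict{String,String}()
push_keep_existing!(store, "AAA", "alpha")
push_keep_existing!(store, "AAA", "alpha2")   # logs, keeps "alpha"
push_keep_existing!(store, "TTT", "beta")
```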
KeyedSets._name_to_sequences - Method
_name_to_sequences(ks)
Build a mapping from name to the set of sequences using that name.
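An inverted index of this kind might look as follows when written against a plain Dict (sequence => name). The function below is a stand-in sketch with an assumed signature and return type (Dict{N,Set{S}}), not the package's internal helper.

```julia
# Sketch: invert sequence => name into name => Set of sequences.
function name_to_sequences(data::AbstractDict{S,N}) where {S,N}
    index = Dict{N,Set{S}}()
    for (seq, name) in data
        # get! creates the empty Set on first use of a name.
        push!(get!(() -> Set{S}(), index, name), seq)
    end
    return index
end

index = name_to_sequences(Dict("AAA" => "alpha", "TTT" => "alpha", "ACGT" => "seq1"))
index["alpha"]   # Set containing "AAA" and "TTT"
```

A name that maps to more than one sequence in such an index is a candidate for a name collision.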
KeyedSets.intersect_with_conflicts - Method
intersect_with_conflicts(left::KeyedSet, right::KeyedSet)
Intersection by sequences. If names differ, the name from left is used. Conflicts are returned in ConflictSummary.
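The result side of this operation can be sketched with plain Dicts: keep the entries of left whose sequence also appears in right, which automatically keeps the left name. The function name is hypothetical and conflict collection is left out of this sketch.

```julia
# Sketch: intersection by sequence, names taken from the left side.
intersect_left_names(left::AbstractDict, right::AbstractDict) =
    filter(entry -> haskey(right, first(entry)), left)

left  = Dict("ACGT" => "seq1", "AAA" => "alpha")
right = Dict("AAA" => "alpha2", "TTT" => "beta")

common = intersect_left_names(left, right)   # only "AAA", named "alpha"
```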
KeyedSets.names - Method
names(ks)
Return an iterator over names stored in the set.
KeyedSets.sequences - Method
sequences(ks)
Return an iterator over sequences contained in the set.
KeyedSets.setdiff_with_conflicts - Method
setdiff_with_conflicts(left::KeyedSet, right::KeyedSet)
Set difference by sequences: elements in left that are not in right. When a name from left exists in right but for a different sequence, a name collision is reported.
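A rough sketch of the difference and the name-collision check, again on plain Dicts. The function name setdiff_sketch is hypothetical, the collisions are returned as a bare vector of (name, left_sequence, right_sequence) tuples instead of a ConflictSummary, and the quadratic scan is chosen for brevity, not performance.

```julia
# Sketch: set difference by sequence plus name-collision reporting.
function setdiff_sketch(left::AbstractDict{S,N}, right::AbstractDict{S,N}) where {S,N}
    # Keep left entries whose sequence is absent from right.
    result = filter(entry -> !haskey(right, first(entry)), left)
    name_collisions = Tuple{N,S,S}[]
    for (lseq, lname) in left, (rseq, rname) in right
        # Same name, different sequence: report as a collision.
        if lname == rname && lseq != rseq
            push!(name_collisions, (lname, lseq, rseq))
        end
    end
    return result, name_collisions
end

l = Dict("ACGT" => "seq1", "AAA" => "alpha")
r = Dict("AAA" => "alpha2", "TTT" => "seq1")

diff, collisions = setdiff_sketch(l, r)
```

Here "ACGT" survives the difference, and the name "seq1" is reported because right uses it for "TTT".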
KeyedSets.union_with_conflicts - Method
union_with_conflicts(left::KeyedSet, right::KeyedSet) -> (result, conflicts)
Union by sequences. If a sequence exists on both sides with different names, the name from left is kept. Conflicts are returned in ConflictSummary.
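The left-wins rule can be sketched on plain Dicts with Base.merge, where values from later arguments take precedence. The function union_sketch is hypothetical; it returns a NamedTuple in place of ConflictSummary and records duplicates as (sequence, name) tuples, whereas the package's duplicates field holds KeyedPair values.

```julia
# Sketch: union by sequence with left-side names winning.
function union_sketch(left::AbstractDict{S,N}, right::AbstractDict{S,N}) where {S,N}
    # merge lets the last argument win, so passing left last keeps its names.
    result = merge(right, left)
    duplicates = Tuple{S,N}[]
    sequence_name_mismatches = Tuple{S,N,N}[]
    for (seq, lname) in left
        haskey(right, seq) || continue
        rname = right[seq]
        if lname == rname
            push!(duplicates, (seq, lname))
        else
            push!(sequence_name_mismatches, (seq, lname, rname))
        end
    end
    conflicts = (duplicates = duplicates,
                 sequence_name_mismatches = sequence_name_mismatches)
    return result, conflicts
end

a = Dict("ACGT" => "seq1", "AAA" => "alpha")
b = Dict("AAA" => "alpha2", "TTT" => "beta")

res, conf = union_sketch(a, b)
res["AAA"]                      # "alpha": the left name is kept
conf.sequence_name_mismatches   # one entry for "AAA"
```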