API Reference

Public API

NaNTracker.nantrack (Function)
nantrack(model)

Wrap the trackable leaf layers of model in NaNCheck to detect NaNs on the forward and backward passes. Returns a structurally identical model that throws a DomainError (including the layer's KeyPath) at the first NaN.

The function walks the model tree with Functors.fmap_with_path and wraps only the layers for which trackable returns true. Nodes that are already NaNCheck are left unchanged, so calling nantrack twice is safe.
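The walk described above can be sketched without Functors, using nested tuples and named tuples as a stand-in for a model. Every name here (`WrappedSketch`, `LeafSketch`, `should_wrap`, `wrap_tree`) is hypothetical and only illustrates the wrap-if-trackable, skip-if-already-wrapped logic; it is not NaNTracker's implementation.

```julia
# Hypothetical stand-ins; none of these names come from NaNTracker.
struct WrappedSketch{P,L}
    path::P
    layer::L
end

struct LeafSketch
    w::Vector{Float64}
end

# Predicate in the spirit of `trackable`: false unless a method opts in.
should_wrap(path, layer) = false
should_wrap(path, ::LeafSketch) = true

# Recursively rebuild the tree, wrapping qualifying leaves with their path.
function wrap_tree(node, path=())
    node isa WrappedSketch && return node        # already wrapped: leave as is
    should_wrap(path, node) && return WrappedSketch(path, node)
    if node isa NamedTuple
        ks = keys(node)
        return NamedTuple{ks}(map(k -> wrap_tree(node[k], (path..., k)), ks))
    elseif node isa Tuple
        return ntuple(i -> wrap_tree(node[i], (path..., i)), length(node))
    end
    return node                                  # functions and other values pass through
end

model = (layers = (LeafSketch([1.0]), abs, LeafSketch([2.0])),)
wrapped = wrap_tree(model)
```

In this sketch the plain function `abs` is returned untouched, and a second call to `wrap_tree` on the result changes nothing because wrapped nodes are returned as they are.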

Stats tracking

Call enable_stats!() before a training step to record the norm and maxabs of every activation and gradient at each checked layer. When a NaN is detected, the recent trajectory is printed automatically. Query the recorded stats at any time with dump_stats() or recent_stats().

See also nanuntrack, trackable, enable_stats!.

NaNTracker.trackable (Function)
trackable(::KeyPath, layer) :: Bool

Predicate that decides whether layer should be wrapped with NaNCheck. Returns true for common Flux leaf layers (Dense, Embedding, LayerNorm, Scale, Conv).

Functions are not wrapped. Pure functions (relu, swish, identity, etc.) have no parameters and cannot introduce NaN through weights. Wrapping them breaks GPU broadcasting (the NaNCheck wrapper is not isbits, which CUDA kernels require) and interferes with libraries like Onion that store activation functions as struct fields and broadcast them over GPU arrays.

Extend for your own leaf layers:

NaNTracker.trackable(::KeyPath, ::MyCustomLeaf) = true
NaNTracker.NaNCheck (Type)
NaNCheck{P,L}

Thin wrapper around a Flux layer that checks for NaN on every forward and backward pass. P is the path type (KeyPath), L is the wrapped layer type.

This struct:

  • Has no custom rrule — the inner layer is differentiated normally by whatever AD backend is active.
  • Forwards getproperty for unknown fields to the wrapped layer, making it transparent to code that accesses layer internals (e.g. .weight).
  • Is registered with Functors.@functor (not Flux.@layer) so that fmap / Optimisers.update! can reach the trainable parameters inside layer.
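A minimal sketch of the wrapper idea follows, assuming a forward-only check and a toy layer. `CheckSketch` and `ScaleSketch` are hypothetical names, the backward-pass check and the Functors registration are omitted, and the real NaNCheck may differ in every detail.

```julia
# Hypothetical wrapper: checks forward input and output, forwards unknown fields.
struct CheckSketch{P,L}
    path::P
    layer::L
end

function (c::CheckSketch)(x)
    path, layer = getfield(c, :path), getfield(c, :layer)
    any(isnan, x) && throw(DomainError(path, "NaN in forward input"))
    y = layer(x)
    any(isnan, y) && throw(DomainError(path, "NaN in forward output"))
    return y
end

# Own fields resolve on the wrapper; anything else is looked up on the inner layer.
function Base.getproperty(c::CheckSketch, name::Symbol)
    if name === :path || name === :layer
        return getfield(c, name)
    end
    return getproperty(getfield(c, :layer), name)
end

# Toy layer standing in for a Flux layer (assumption for illustration).
struct ScaleSketch
    weight::Vector{Float64}
end
(s::ScaleSketch)(x) = s.weight .* x

checked = CheckSketch("layers.1", ScaleSketch([1.0, 2.0]))
checked([1.0, 1.0])   # passes through to the inner layer
checked.weight        # forwarded to the inner layer's field
```

Because the sketch defines no differentiation rule of its own, an AD backend would simply trace through the call to the inner layer, which matches the first bullet above.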

Stats Tracking

NaNTracker.enable_stats! (Function)
enable_stats!(; capacity=1000)

Turn on activation/gradient stats collection. Each forward input, forward output, and gradient flowing through a NaNCheck layer records norm, maxabs, and NaN/Inf flags into a ring buffer.
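To make the ring-buffer behaviour concrete, here is a sketch under stated assumptions: the entry fields, the type names (`EntrySketch`, `RingSketch`), and the helpers `record!` and `recent` are all hypothetical and are not taken from NaNTracker's StatsEntry or its internals.

```julia
using LinearAlgebra: norm

# Hypothetical record; field names are assumptions for illustration.
struct EntrySketch
    path::String
    norm::Float64
    maxabs::Float64
    has_nan::Bool
    has_inf::Bool
end

mutable struct RingSketch
    entries::Vector{EntrySketch}
    capacity::Int
    next::Int          # slot the next entry will be written to
end
RingSketch(capacity::Int) = RingSketch(EntrySketch[], capacity, 1)

# Summarise an array and store it, overwriting the oldest entry once full.
function record!(ring::RingSketch, path, x::AbstractArray)
    entry = EntrySketch(path, norm(x), maximum(abs, x; init=0.0),
                        any(isnan, x), any(isinf, x))
    if length(ring.entries) < ring.capacity
        push!(ring.entries, entry)
    else
        ring.entries[ring.next] = entry
    end
    ring.next = mod1(ring.next + 1, ring.capacity)
    return ring
end

# Return up to `n` of the newest entries, oldest first.
function recent(ring::RingSketch; n=50)
    len = length(ring.entries)
    ordered = len < ring.capacity ? ring.entries :
        vcat(ring.entries[ring.next:end], ring.entries[1:ring.next-1])
    return ordered[max(1, len - n + 1):end]
end

ring = RingSketch(3)
for (p, x) in [("a", [1.0]), ("b", [3.0, 4.0]), ("c", [Inf]), ("d", [NaN, 2.0])]
    record!(ring, p, x)
end
```

With a capacity of 3, the fourth record overwrites the first, so memory stays bounded however long training runs.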

Note: On GPU this introduces sync points (scalar transfers) at every checked layer. Use for debugging only.

NaNTracker.recent_stats (Function)
recent_stats(; n=50, path_contains="")

Return recent StatsEntry records, optionally filtered by path substring. Returns an empty vector when stats are disabled.

NaNTracker.dump_stats (Function)
dump_stats(; n=50, path_contains="", io=stderr)

Print recent stats entries to io. Useful for inspecting activation/gradient magnitudes during training without waiting for a NaN.

Example

enable_stats!()
# ... run one training step ...
dump_stats(path_contains="attention")  # show only attention layers
clear_stats!()                          # reset for next step

Internal

These are not exported but can be extended.

NaNTracker.hasnan (Function)
hasnan(x) :: Bool

Check whether x contains any NaN values. Dispatches on type, so it works for arrays, scalars, and tuples, and falls back to false for anything else.
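The dispatch pattern described here might look roughly like the following. `hasnan_sketch` is a hypothetical name used to avoid implying this is the package's source; the actual method set in NaNTracker may differ.

```julia
# Hypothetical re-implementation for illustration only.
hasnan_sketch(x::Number) = isnan(x)                        # scalars
hasnan_sketch(x::AbstractArray{<:Number}) = any(isnan, x)  # numeric arrays
hasnan_sketch(x::Tuple) = any(hasnan_sketch, x)            # tuples, checked element-wise
hasnan_sketch(x) = false                                   # anything else

hasnan_sketch([1.0, NaN])          # array containing a NaN
hasnan_sketch((1.0, [2.0f0]))      # tuple with no NaN
hasnan_sketch("not a number")      # falls back to false
```

Extending the check to a custom container would then be a matter of adding one more method for that type.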
