API Reference

Public API

NaNTracker.nantrack — Function

nantrack(model)

Wrap the trackable leaf layers of model in NaNCheck so that NaNs are detected on both the forward and the backward pass. Returns a model with the same structure whose wrapped layers throw a DomainError (which includes the layer's KeyPath) at the first NaN.

The function uses Functors.fmap_with_path to walk the model tree and wraps only those layers for which trackable returns true. Already-wrapped NaNCheck nodes are left unchanged, so calling nantrack twice is safe.

Stats tracking

Call enable_stats!() before a training step to record the norm and maxabs of every activation and gradient at each checked layer. When a NaN is detected, the recent trajectory is printed automatically. Query the stats at any time with dump_stats() or recent_stats().

NaNTracker.nanuntrack — Function

nanuntrack(model)

Strip all NaNCheck wrappers, restoring the original model.
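The wrap and unwrap round trip described above might look like the following sketch. It assumes Flux and NaNTracker are both installed and that nantrack and nanuntrack are exported; the model, batch size, and variable names are made up for illustration.

```julia
using Flux, NaNTracker

# Illustrative two-layer model; any model built from trackable leaf layers should do.
model = Chain(Dense(4 => 8, relu), Dense(8 => 2))

# Wrap the trackable leaf layers (here the two Dense layers) with NaNCheck.
tracked = nantrack(model)

x = randn(Float32, 4, 16)
y = tracked(x)                 # the forward pass runs as it does for the plain model

# Poison one input value; per the docstring, the first NaN raises a DomainError.
x_bad = copy(x)
x_bad[1, 1] = NaN32
err = try
    tracked(x_bad)
    nothing
catch e
    e
end

# Strip the wrappers again once debugging is done.
restored = nanuntrack(tracked)
```

If the package behaves as documented, err holds a DomainError whose message names the KeyPath of the first layer that saw the NaN.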

NaNTracker.trackable — Function

trackable(::KeyPath, layer) :: Bool

Predicate that decides whether layer should be wrapped with NaNCheck. Returns true for common Flux leaf layers (Dense, Embedding, LayerNorm, Scale, Conv).

Functions are not wrapped. Pure functions (relu, swish, identity, etc.) have no parameters and cannot introduce NaN through weights. Wrapping them breaks GPU broadcasting (the NaNCheck wrapper is not isbits, which CUDA kernels require) and interferes with libraries like Onion that store activation functions as struct fields and broadcast them over GPU arrays.
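The isbits point can be checked in plain Julia without a GPU. The sketch below uses a hypothetical ToyWrapper (not the package's NaNCheck) that stores a heap-allocated path next to a function; it only illustrates why wrapping an activation function changes its isbits status.

```julia
# Hypothetical stand-in for a wrapper that carries a path alongside a function.
struct ToyWrapper{F}
    path::Vector{Symbol}   # heap-allocated, so the struct cannot be isbits
    f::F
end

# Make the wrapper callable so it can be broadcast like the function it wraps.
(w::ToyWrapper)(x) = w.f(x)

square(x) = x * x
wrapped = ToyWrapper([:layers, :activation], square)

bare_is_bits = isbits(square)        # a plain function is a field-less singleton
wrapped_is_bits = isbits(wrapped)    # the Vector field makes this false

# On the CPU both broadcast fine; the isbits requirement only matters in GPU kernels.
same_result = wrapped.(1:3) == square.(1:3)
```

Here bare_is_bits is true and wrapped_is_bits is false, which is the property the paragraph above says CUDA kernels depend on.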

Extend for your own leaf layers:

NaNTracker.trackable(::KeyPath, ::MyCustomLeaf) = true

NaNTracker.NaNCheck — Type

NaNCheck{P,L}

Thin wrapper around a Flux layer that checks for NaN on every forward and backward pass. P is the path type (KeyPath), L is the wrapped layer type.

This struct:

Has no custom rrule — the inner layer is differentiated normally by whatever AD backend is active.
Forwards getproperty for unknown fields to the wrapped layer, making it transparent to code that accesses layer internals (e.g. .weight).
Is registered with Functors.@functor (not Flux.@layer) so that fmap / Optimisers.update! can reach the trainable parameters inside layer.
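The property forwarding in the second point can be sketched in plain Julia. ToyCheck and ToyDense below are hypothetical stand-ins, not the package's types, and the real NaNCheck may implement forwarding differently.

```julia
# Hypothetical wrapper with the same two type parameters as NaNCheck{P,L}.
struct ToyCheck{P,L}
    path::P
    layer::L
end

# Serve the wrapper's own fields directly; look up anything else on the wrapped layer.
function Base.getproperty(w::ToyCheck, name::Symbol)
    if hasfield(typeof(w), name)
        return getfield(w, name)
    end
    return getproperty(getfield(w, :layer), name)
end

# Toy layer standing in for something like Dense.
struct ToyDense
    weight::Matrix{Float64}
    bias::Vector{Float64}
end

wrapped = ToyCheck("layers.1", ToyDense(ones(2, 3), zeros(2)))

weight_size = size(wrapped.weight)   # forwarded to the inner ToyDense
own_path = wrapped.path              # served by the wrapper itself
```

In this sketch the wrapper's own field names take precedence, so a wrapped layer with a field called path or layer would be shadowed.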

Stats Tracking

NaNTracker.enable_stats! — Function

enable_stats!(; capacity=1000)

Turn on activation/gradient stats collection. Each forward input, forward output, and gradient flowing through a NaNCheck layer records norm, maxabs, and NaN/Inf flags into a ring buffer.

Note: On GPU this introduces sync points (scalar transfers) at every checked layer. Use for debugging only.
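To make the ring-buffer behaviour concrete, here is a minimal sketch in plain Julia. ToyEntry, ToyRing, and record! are hypothetical names, and the fields are a guess at what an entry holds based on the description above; the package's StatsEntry may differ.

```julia
using LinearAlgebra: norm

# Hypothetical record holding the statistics named in the docstring.
struct ToyEntry
    path::String
    norm::Float64
    maxabs::Float64
    hasnan::Bool
    hasinf::Bool
end

# Fixed-capacity buffer; once full, new entries overwrite the oldest ones.
mutable struct ToyRing
    entries::Vector{ToyEntry}
    capacity::Int
    next::Int                  # slot the next entry is written to
end
ToyRing(capacity) = ToyRing(ToyEntry[], capacity, 1)

function record!(ring::ToyRing, path, x::AbstractArray)
    e = ToyEntry(path, norm(x), maximum(abs, x), any(isnan, x), any(isinf, x))
    if length(ring.entries) < ring.capacity
        push!(ring.entries, e)           # still filling up
    else
        ring.entries[ring.next] = e      # overwrite the oldest entry
    end
    ring.next = mod1(ring.next + 1, ring.capacity)
    return ring
end

ring = ToyRing(2)
record!(ring, "a", [3.0, 4.0])
record!(ring, "b", [1.0, Inf])
record!(ring, "c", [0.5, -2.0])   # capacity reached, replaces the entry for "a"
```

Each of norm, maximum, and any reduces an array to a scalar, which is the kind of reduction that forces the GPU-to-host transfers mentioned in the note.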

NaNTracker.disable_stats! — Function

Turn off stats collection and release the buffer.

NaNTracker.clear_stats! — Function

Clear recorded stats without disabling collection.

NaNTracker.recent_stats — Function

recent_stats(; n=50, path_contains="")

Return the most recent StatsEntry records, optionally filtered by a path substring. Returns an empty vector when stats are disabled.

NaNTracker.dump_stats — Function

dump_stats(; n=50, path_contains="", io=stderr)

Print recent stats entries to io. Useful for inspecting activation/gradient magnitudes during training without waiting for a NaN.

Example

enable_stats!()
# ... run one training step ...
dump_stats(path_contains="attention")  # show only attention layers
clear_stats!()                          # reset for next step

Internal

These functions are not exported, but you can extend them with new methods.

NaNTracker.hasnan — Function

hasnan(x) :: Bool

Check whether x contains any NaN values. Dispatches on the type of x: it handles arrays, scalars, and tuples, and falls back to false for anything else.
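The dispatch pattern described above can be sketched in plain Julia. The function my_hasnan is a hypothetical re-implementation written for illustration, not the package's source, and the exact set of methods in NaNTracker may differ.

```julia
# Arrays of floats: scan the elements.
my_hasnan(x::AbstractArray{<:AbstractFloat}) = any(isnan, x)

# Scalar floats: check the value directly.
my_hasnan(x::AbstractFloat) = isnan(x)

# Tuples: recurse into each element, so nested containers are handled too.
my_hasnan(x::Tuple) = any(my_hasnan, x)

# Everything else (strings, integers, nothing, ...) is treated as NaN-free.
my_hasnan(x) = false

found_in_array = my_hasnan([1.0, NaN])
found_in_nested = my_hasnan((1.0, (NaN32,)))
clean_tuple = my_hasnan((1.0, [2.0f0]))
fallback = my_hasnan("text")
```

Supporting a new container type then only takes one more method for that type, which is the kind of extension the Internal section allows for.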
