Animals
All experiments were performed under the UK Animals (Scientific Procedures) Act of 1986 (PPL: PD867676F) following local ethical approval by the Sainsbury Wellcome Centre Animal Welfare Ethical Review Body. A total of 21 C57BL/6 J male mice (age = 34.5 ± 15.8 weeks (mean ± s.d.)) were used for electrophysiological recordings. Fifteen mice first underwent headfixed behavioural training prior to acute electrophysiological recordings (see ‘Task and training stages’), and six mice (untrained mice) only underwent habituation to the recording setup prior to acute electrophysiological recording.
Prior to behavioural training and recordings, all mice were implanted with a headfixation bar under approximately 1.5% isoflurane and administration of Meloxicam (5 mg kg^{−1}) to allow for headfixation during behavioural training and electrophysiological recordings.
During training, mice were cohoused with littermates in individually vented cages. After implantation of the recording chamber, mice were singly housed to protect the implant. Mice were housed in reversed day–night cycle lighting conditions, with the ambient temperature and humidity set to 23 °C and 56% relative humidity, respectively.
Behavioural task
The design of the behavioural task was as previously described in ref. ^{14}. In brief, mice were headfixed and placed on a polystyrene wheel. Two monitors (21.5 inch, 1,920 × 1,080, 60 Hz) were placed on each side of the mouse at approximately 20 cm from the mouse head. The monitors were gamma corrected to 40 cd m^{−2} of maximum luminance using custom MATLAB scripts utilizing PsychToolbox3. The stimulus presentation was controlled by custom written software in MATLAB utilizing PsychToolbox3. The visual stimulus was a sinusoidal grating with the spatial frequency of 0.04 cycles per degree resulting in 3 grating periods shown on a screen. Each trial began with a presentation of a grey texture covering both screens. After a randomized delay (at least 3 s plus a random sample from an exponential distribution with the mean of 0.5 s), the baseline stimulus appeared. The TF of the grating was drawn every 50 ms (3 monitor frames) from a lognormal distribution, such that log_{2}transformed TF had the mean of 0 and s.d. of 0.25 octaves and the geometric mean of 1 Hz. The direction of drift was randomized trial to trial between upward or downward drift. The sustained increase in TF, referred to in the text as change period, occurred after a randomized delay (3–15.5 s) from the start of baseline period and lasted for 2.15 s. For early and late blocks training (stage 8), change period times were sampled between [3, 8] s and [10.5, 15.5] s, respectively, with the delay from the earliest allowed change period sampled from an exponential distribution with a mean of 4 s. Random 15% of trials were assigned as nochange trials and did not have a change period. For stage 8 training, 10% of trials were designated to be probe trials and had a change time drown from the distribution of the other block type. Because there were no qualitative differences in neural TF pulse response between early and late blocks (data not shown) we have combined data from both block types for analyses throughout this manuscript. Findings related to stage 8 (early and late blocks) will be presented in an upcoming paper.
Mice were trained to report sustained increases in TF by licking the spout to trigger reward delivery (drop of soy milk). Licks that occurred outside of the change period are referred in the text as early licks. If mice moved on the wheel (movement exceeding 2.5 mm in a 50ms window) in either direction, the trial was aborted (stages 7 and 8). If mice did not lick within 2.15 s from the change onset, the trial was considered a miss trial.
Training stages
Following the implantation of the headplate, mice were allowed to recover for a week. After that, mice went through several stages of training:

(1)
Mice were handled for 3 to 7 days, until mice were comfortable with being handled by the experimenter. During this stage mice were also habituated to being restrained by being placed into a soft cloth for a short period of time. After the brief restraints they were given a small amount of soy milk as reward.

(2)
Next, mice were put on food restriction. Mouse weight was monitored daily with the amount of food given adjusted per mouse to keep them sufficiently motivated for getting rewards and keep their weight no lower than 85% of the original weight prior to food restriction.

(3)
Next, mice were headfixed and placed on the running wheel of the behavioural training setup with the monitors turned off. Mice were allowed to freely run on the wheel, but not encouraged to. Typically, there were 3 habituation sessions, with the duration progressive increasing from 15 to 45 min.

(4)
Next, mice were introduced to the visual stimuli used in the task. Mice were initially shown only trials with two largest changes of TF (2 and 4 Hz, lasting 2.15 s), followed by a reward autodelivery 1.5 s after the change onset. After mice started to robustly make licks during the change period that preceded the reward autodelivery, they were transitioned to the next stage.

(5)
Here only hit trials were rewarded, early licks and running did not result in termination of the trial.

(6)
After mice robustly detected strong changes in the previous step, we introduced trials with weaker changes in TF (1.25 Hz, 1.35 Hz and 1.5 Hz). Additionally, a consequence of an early lick outside of the change period was a mild airpuff to the mouse’s right cheek and a termination of the trial.

(7)
After mice detected weaker changes as well (assessed as higher hit rate compared to nochange trials), they were transitioned to the next stage where in order to initiate the trial start (start of the baseline stimulus), mice were required to remain stationary on the running wheel for at least 3 s plus a random sample from an exponential distribution with the mean of 0.5 s. Additionally, after the trial start, a trial was aborted as a consequence of a movement on the wheel.

(8)
Finally, after mice reached sufficient proficiency at the previous stage, early and late blocks were introduced. During the session start, a block type was randomly chosen. A block was defined as a period of the session during which a mouse completed 30 hit trials. After completion of a block of trials, the block type was switched to the other block type (early to late or vice versa).
Six mice that were used in the untrained control experiment (Fig. 4e–h) went through training stages 1–3 above. Following that, they were shown the same stimuli as the trained mice, with the difference that their movements on the wheel or licking the spout did not terminate a trial nor trigger reward. Instead, they were given rewards at random times with interreward intervals drawn from the uniform distribution of 60 ± 15 s.
Behavioural setup and data acquisition
Reward delivery (soya milk) was controlled by a solenoid pinch valve (161P011, NResearch) and delivered to the mouse via a spout positioned in front of it. Mouse licking the spout was measured by a piezo element (TDK PS1550L40N) coupled to the spout and amplified with a custommade amplifier system. Running wheel movement was measured with a rotary encoder (model Kübler) that was connected to the wheel axle. All behavioural data and events, such as piezo signal voltage trace, valve or change period on/off state, etc., were acquired via analogue and digital channels of PXIe6341 acquisition card (National Instruments) with SpikeGLX (https://github.com/billkarsh/SpikeGLX) at 8,474 Hz.
Behavioural data analysis
Psychometric performance, reaction times and licktriggered stimulus average
Psychometric curves were calculated per session by counting the amount of hits relative to all trials where mice did not early lick nor abort. Mean hit rates (performance) and parametric 95% confidence intervals (s.e.m. × 1.96) of hit rates were calculated across sessions (n = 114) per change size. Mean reaction times and parametric 95% confidence intervals were calculated across sessions (n = 114) per change size, and pvalues were estimated from ttests.
Licktriggered stimulus average was estimated by extracting the TF pulses from −1.5 to 0 s preceding early licks and averaged across all trials, revealing mean stimulus TF prior to early licks. Parametric 95% confidence intervals were estimated by calculating the s.e.m. of TF values at each 50 ms bin (TF pulse resolution) prior to an early lick and multiplying the s.e.m. by 1.96.
Simple behavioural leakyintegrator model
In order to formally test if mice behaviourally integrated stimulus evidence (TF pulses) over time in our task, we constructed a simple behavioural leakyintegrator model with two adjustable parameters: decay time (τ), and threshold. We fitted these two parameters by estimating which decay time and threshold predicted most early lick times (from 2 s after trial start, to exclude trial onset licks) correctly for each mouse and then determined the average bestfit decay time and threshold values across mice. For each early lick trial, we calculated the integrated logscaled TF with decay across the entire trial up until the early lick.
For each early lick trial, we then estimated whether a threshold crossing of the integrated TF had been predicted within a second preceding an actual early lick onset. If this was the case, we considered the model to have predicted the early lick time. If not, we considered the model to not have correctly predicted that trial. We did this for all early lick trials, using a 58 × 151 parameter space: 58 possible decay times spanning from 0.05 s decay time (that is, no integration) to 1,000 s decay time (that is, perfect integration): (50 logspaced decay times spanning 0.050–3 s, as well as 8 additional very long decay times: 4, 5, 6, 7, 8, 9, 20, 1,000 s), and 151 linearly spaced thresholds spanning [0.01–0.16]. Significance testing of best decay time across mice (that is, larger than no integration (0.05 s)) was done with a ttest.
We also tested if the bestfit decay/integration time parameter estimated from predicting early lick times also outperformed a model with no integration when predicting singletrial hit reaction times (that is, a trial type which the parameters were not optimized on). We did this by comparing actual and predicted reaction times per change size, and calculated Pearson’s correlation between actual reaction times and predicted reaction times per change size. We calculated this by either looking at all reaction times, or only including a subset of trials with reaction times under a defined value (that is, reactiontime cutoff). This was done to better detect if any of the models specifically struggled to predict very late reaction times which may be modulated by nonsensory factors such as such as inattention or lack of engagement. Finally, for significance testing (that is, paired ttest) of whether a model with no integration (decay time = 0.05 s) versus a model with the bestfit decay/integration time (estimated from early lick trials as described above), were significantly different at predicting singletrial reaction times, we zscored actual and predicted reaction times per change size (to account for change size mean reactiontime differences), and calculated the correlation between all actual reaction times (1 s reactiontime cutoff) and all predicted reaction times of a model with or without integration per mouse, and performed a paired ttest (across mice) of the correlation values from integration versus no integration models.
Outlier detection agent
To test whether mice accumulate evidence over time or merely respond to the instantaneous stimulus, we formulated a null model where behavioural responses are produced via a stochastic outlier detection strategy. Here, an internal decision occurs when a noisy sensory representation of the stimulus crosses a decision boundary, and a response occurs after a stochastic delay. The response is triggered by a single, instantaneous value of the stimulus. However, owing to the stochastic delay, responses may show a gradually decaying statistical dependence on the stimulus history, and may even mimic evidence accumulation strategies such as integration^{42}.
Model. According to the outlier detection model, behavioural responses are generated independently for each trial as follows. Let s_{i} be the stimulus amplitude (log TF) at each time point t_{i}. We chose time points to correspond with video frames of the stimulus, which were presented at 60 Hz (3 frames per TF pulse). At each time point, a noisy sensory representation Z_{i} is formed as the sum of the stimulus amplitude and independent and identically distributed (i.i.d.) Gaussian sensory noise ε_{i} (with mean zero and variance σ^{2}):
$${Z}_{i}={s}_{i}+{\varepsilon }_{i}$$
$${\varepsilon }_{i}\mathop{ \sim }\limits_{\text{i.i.d.}}{\mathscr{N}}\left(0,{\sigma }^{2}\right)$$
An internal decision to respond occurs at time D, given by the first time point where the sensory representation exceeds a decision bound b (or ∞ if the bound is not crossed before the stimulus ends):
$$D=\,\min \{{t}_{i} {Z}_{i} > b\}\cup \{\infty \}$$
The hazard function of the decision time is thus:
$${H}_{D}\left(d\right)=\prod _{i{\rm{ }}{t}_{i}\le d}p\left({Z}_{i}\le b\right)=\prod _{i{\rm{ }}{t}_{i}\le d}\varPhi \left(\frac{b{s}_{i}}{\sigma }\right)$$
where Φ is the standard normal cumulative density function (CDF).
A motor response begins at time R, given by the decision time plus an independent, nonnegative stochastic delay ∆ representing the duration of nondecision processes (for example, decision to motor delays):
The delay has a shifted loglogistic distribution with location α, scale β and shape γ, and can be obtained by exponentiating a logistic random variable and then adding a constant. We constrained the location (α > 0) and shape (γ > 1) to give the distribution nonnegative support and a bumplike density that decreases on both sides of the mode. The delay time probability density function (PDF) and CDF are:
$${p}_{\varDelta }\left(\varDelta \right)=\frac{\gamma {\left(\frac{\varDelta \alpha }{\beta }\right)}^{\gamma 1}}{\beta {\left(1+{\left(\frac{\varDelta \alpha }{\beta }\right)}^{\gamma }\right)}^{2}}$$
$${F}_{\varDelta }\left(\varDelta \right)=\frac{1}{1+{\left(\frac{\varDelta \alpha }{\beta }\right)}^{\gamma }}$$
Because the decision and delay times are independent, the marginal response time distribution is given by the convolution of the decision and delay time distributions. The marginal PDF and CDF of the response time are:
$${p}_{R}\left(r\right)=\sum _{d}{p}_{D}\left(d\right){p}_{\varDelta }\left(rd\right)$$
$${F}_{R}\left(r\right)=\sum _{d}{p}_{D}\left(d\right){F}_{\varDelta }\left(rd\right)$$
where the decision time probability mass function (PMF) p_{D} can be computed from the hazard function H_{D} above. Because delays are nonnegative, p_{Δ}(r − d) = F_{Δ}(r − d) = 0 for all d > r, so the above sums need only be computed over time steps up to the given response time.
The outlier detection model was implemented using custom Python software using the NumPy, SciPy, and PyTorch libraries. All computations involving probabilities were performed in log space, using functions designed to avoid numerical under/overflow.
Fitting. A separate model was fit for each mouse in two stages. We first fit the delay time distribution using only trials with the largest change magnitude, then fit the remaining decision parameters using the entire dataset (excluding the abort trials). This twostage approach relies on the assumption that delays are identically distributed across trials. In return, it allows more direct estimation of the delay time distribution, providing better ability to distinguish between outlier detection and longertimescale strategies such as integration.
For each trial i, let n^{(i)} be the number of time points, \({s}^{(i)}=\{{s}_{1}^{(i)},\ldots ,{s}_{{n}^{(i)}}^{(i)}\}\) be the stimulus amplitudes, and c^{(i)} be the time of the change point. For trials where a response occurred, let r^{(i)} be the response time, measured as the onset of facial movement (see ‘Motion onset time estimation’ section) and ℓ^{(i)} be the subsequent lick time (measured at the reward spout).
Fitting the delay time distribution. We assumed that the greatest change magnitude (geometric mean TF 4 Hz) was large enough to trigger an immediate decision at or near the change point. Under this assumption, the delay time on largechange hit trials can be approximated by the reaction time, which can be directly measured as the time elapsed between the change point and the onset of facial movement. Thus, we fit the delay time distribution (shifted loglogistic distribution) to reaction times on largechange hit trials (denoted \({{\mathcal{T}}}_{{\rm{bighit}}}\)) by maximum likelihood, subject to the constraints described above:
$$\mathop{\max }\limits_{\alpha > 0,\beta > 0,\gamma > 1}\sum _{i\in {{\mathcal{T}}}_{{\rm{bighit}}}}\log {p}_{\varDelta }({r}^{(i)}{c}^{(i)})$$
This approach is conservative for our use of outlier detection as a null model. If the largest changes were not immediately followed by a decision, then delays would tend to be overestimated, causing the fitted outlier detection model to display longertimescale dependencies that are typically associated with evidence accumulation strategies such as integration. Thus, the risk of falsely rejecting this null model would not increase.
For the largest change magnitude, miss trials predominantly reflected task disengagement rather than typical sensory/motor delays, and were therefore excluded when fitting the delay time distribution. According to a hidden Markov model, disengagement was the a posteriori most probable state for the majority of largechange miss trials (95.2% of largechange misses were during a disengaged state).
Fitting decision parameters. The decision parameters (sensory noise variance and decision threshold) were subsequently fit using the entire dataset, holding the delay time distribution fixed. Here in the general case, the decision and delay times cannot be directly observed, and were marginalized out as latent variables. The decision parameters were chosen to maximize the log marginal likelihood of the observed response data:
$$\mathop{\max }\limits_{{\sigma }^{2},b}\sum _{i\in {{\mathcal{T}}}_{{\rm{nonmiss}}}}\log {p}_{R}({r}^{(i)})+\sum _{i\in {{\mathcal{T}}}_{{\rm{miss}}}}\log (1{F}_{R}({t}_{{n}^{(i)}}))$$
For hit and early lick trials (denoted \({{\mathcal{T}}}_{{\rm{nonmiss}}}\)), the likelihood is given by the marginal probability density of a response at the observed movement onset time. For miss trials (denoted \({{\mathcal{T}}}_{{\rm{miss}}}\)), the response time is treated as rightcensored; its precise value is unknown, but is known to exceed the last time point in the trial. The likelihood for miss trials is thus given by the marginal probability mass lying beyond this point.
Sampling. To statistically compare mouse behaviour to the outlier detection null model, we sampled 10,000 synthetic datasets from the model fitted for each mouse. For every quantity of interest, the value computed from the real data was compared to values computed from each synthetic dataset, comprising 10,000 samples from the null distribution. Synthetic datasets were generated for each mouse as follows.
Each trial used the same change point and stimulus amplitudes presented in the real data. The real stimulus ended after the lick on trials where mice responded, leaving unknown future values that would have been presented had a lick not occurred. Such missing stimulus values were filled in by sampling from the same distribution used to produce the original stimuli (independently for each synthetic dataset).
Given the stimulus, a decision time and delay time were sampled from the distributions p_{D} and p_{Δ} described above. The sum of these quantities yielded a synthetic response time, representing movement onset.
To generate synthetic lick times, we assumed that the additional delay between movement onset and licking was i.i.d. across trials. We therefore sampled with replacement from the measured movementtolick delays in the real data. Synthetic lick times were obtained by adding sampled movementtolick delays to synthetic movement onset times.
Synthetic lick times were used to determine trial outcomes (hit, early lick, miss). Each trial was classified as a: hit if the lick occurred during the change period; early lick if the lick occurred before the change point; or miss if no lick occurred before the end of the change period.
Effect of magnitude and timing of TF pulses on probability of early licks
For analyses of the effect of TF pulses on probability of early licks we used the training data of the same 15 mice used for Neuropixels recordings. Here we only used sessions where mice reached robust proficiency of the task and were at the final training protocol (mean of 77.5 sessions per mouse). Note that here the time of lick onset was measured from the registration of lick by the spout as opposed to the videography analysis on Neuropixels recording sessions elsewhere in the manuscript. We used only trials where early licks happened at least 2 s after the baseline onset to decrease the influence of impulsive licks on results.
To empirically validate that mice use multiple pulses of sensory evidence to influence their decision to lick during the baseline period, we analysed how early lick probability is influenced by magnitudes and timing of preceding TF pulses. First, we tested whether the deviation of a single TF pulse relative to the mean baseline 1 Hz makes mice correspondingly more or less likely to make an early lick within the subsequent 0.2–1.0 s. For that we separated TF pulses by magnitude (in octaves) into 15 bins such that each bin contained approximately equal number of TF pulses. To calculate the conditional probability of early lick at a certain time after a TF pulse of a given magnitude, we found instances of such events (pulled across all sessions with robust performance for each mouse) and divided them by the total amount of early licks (Extended Data Fig. 6c). To calculate an overall influence of a TF pulse on early lick probability, we summed conditional probabilities within a [−1, −0.2] s window relative to early lick onset (Extended Data Fig. 6d):
$$P({\rm{L}} {\rm{TF}})=\mathop{\sum }\limits_{t=1}^{0.2}P({\rm{L}} {\rm{TF}}(t))$$
which can also be written as: P(LTF) = P_{0} + ΔP(LTF), where
$${P}_{0}=\mathop{\sum }\limits_{t=1}^{0.2}P\left({\rm{L}} {\rm{T}}{\rm{F}}\left(t\right)=1\,{\rm{Hz}}\right)$$
And can be thought as a chance level of making a lick without a deviation of stimulus TF from the mean baseline TF value.
The empirical effect of two TF pulses on lick probability was calculated from behavioural data in a similar way. To compare the measured effect of two TF pulses with their expected effect if they influenced the lick probability independently, we calculated their cumulative independent effect on early lick probability based on empirically measured effect of a single TF pulse on early lick probability. The independent effect of two TF pulses with a delay of Δt s between them can then be written as follows:
$${P}_{{\rm{Ind}}}({\rm{L}} {{\rm{T}}{\rm{F}}}_{1},{{\rm{T}}{\rm{F}}}_{2})={P}_{0}+\Delta P({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{1})+\Delta P({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{2})\Delta P({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{1})\times \Delta P({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{2})$$
where:
$$\Delta P\left({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{1}\right)=\mathop{\sum }\limits_{t=1\Delta t}^{0.2\Delta t}P\left({\rm{L}} {\rm{T}}{\rm{F}}\left(t\right)\right){P}_{0}$$
$$\Delta P\left({\rm{L}} {\Delta {\rm{T}}{\rm{F}}}_{2}\right)=\mathop{\sum }\limits_{t=1}^{0.2}P\left({\rm{L}} {\rm{T}}{\rm{F}}\left(t\right)\right){P}_{0}$$
A deviation of lick probability after two TF pulses from the probability predicted by the independent effect of two TF pulses would indicate an interactive effect between pulses, which should be expected if mice utilize integration of sensory evidence. To measure the relative difference between the behavioural result and the expected independent effect of two fast TF pulses (Fig. 3d and Extended Data Fig. 6i), we calculated:
$$I=\frac{P({\rm{L}} {{\rm{T}}{\rm{F}}}_{fast},{{\rm{T}}{\rm{F}}}_{fast}){P}_{{\rm{Ind}}}({\rm{L}} {{\rm{T}}{\rm{F}}}_{fast},{{\rm{T}}{\rm{F}}}_{fast})}{{P}_{{\rm{Ind}}}({\rm{L}} {{\rm{T}}{\rm{F}}}_{fast},{{\rm{T}}{\rm{F}}}_{fast})}$$
When applying this analysis to the outlier detection agent data, we used data only from trials that resulted in early licks, meaning that the model made a decision to initiate a lick during the baseline period and before the TF change epoch. For outlier detection agent model that was fitted to a particular mouse data, we sampled the same number of early lick trials across 4,000 synthetic datasets (see section above) as there were present across all behavioural sessions of that mouse. The data was then pulled across all models corresponding to different mice and analysis steps were applied to the combined dataset as described above for the mice data. This procedure was repeated 4,000 times to estimate nonparametric 95% confidence intervals of results from the outlier detection agent.
Electrophysiological recordings
Prior to acute electrophysiological recordings, we habituated mice to the electrophysiological recording setup for 2–7 days (depending on the performance of the mouse in the electrophysiological recording setup), to allow mice to perform optimally during electrophysiological recording sessions.
Surgery
Once mice were habituated to the recording setup, we implanted a recording chamber with one or two 3 mm craniotomies inside, together with a stainlesssteel grounding wire in the contralateral hemisphere, under 1.5% isoflurane together with administration of meloxicam (5 mg kg^{−1}) and dexamethasone (2–3 mg kg^{−1}). During surgery a kapton disk (Laser Micromachining Limited) was placed on top of the dura inside each craniotomy. The disk had 19 holes with 0.5 mm diameter, arranged in a honeycomb shape, for keeping track of probe insertions. The craniotomy and disk were covered with DuraGel (Cambridge NeuroTech) to protect the brain. A 1–2 mm tall plastic enclosure was then positioned around craniotomies and sealed around the edges with bone cement. Finally, we covered the plastic enclosure with a removable plastic cover, to create a rigid physical barrier over the DuraGel sealed craniotomy, to provide robust protection of the recording preparation between recording sessions. The mice were allowed to recover for 24 h before the first recording session took place.
Recordings
Electrophysiological data collection was done using Neuropixels 1.0 probes (IMEC, Belgium) and collected with a PXI based system (National Instruments), and saved using SpikeGLX (https://github.com/billkarsh/SpikeGLX). For trained mice, we recorded up to 13 sessions per mouse (167 probe insertion from 114 sessions total (15 mice)). For untrained mice, we recorded up to 9 sessions per mouse (89 probe insertions from 45 sessions total (6 mice)). Probes were dipped in CMDiI (SigmaAldrich) prior to insertion. In each session, we inserted up to 2 probes at a time. The probes were always inserted at the same angle within the coronal plane (10° and −15° relative to the vertical axis) to aid subsequent histological probe tract tracing.
At the beginning of each session, we removed the plastic lid above the recording chamber exposing the DuraGel covered craniotomy, and inserted the probe(s) through the DuraGel using microcontrollers (Sensapex) at 5–10 μm s^{−1}. The probe(s) was allowed to settle for 20 min, to increase stability throughout the recording session. At the end of the session probes were removed (at 15 μm s^{−1}) and the plastic cover over the recording chamber was reattached for protection of recording preparation.
The setup for presenting stimuli and monitoring behaviour were identical to the setups in which mice had been trained (see ‘Behavioural task’).
Preprocessing and spike sorting of electrophysiological data
Electrophysiological data was first filtered using CatGT (https://billkarsh.github.io/SpikeGLX/#catgt) with modified form of common average referencing (dlbdmx flag).
Spike sorting. We spikesorted electrophysiological data from each probe in each session using KiloSort2.0^{65} (https://github.com/MouseLand/Kilosort). For initial selection of units undergoing further curation, we only selected units designated as ‘good’ (based on crosscorrelogram contamination) by KiloSort2.0.
Quality checks. For our electrophysiological recordings of trained mice, we manually inspected and curated, in Phy2.0 (https://github.com/kwikteam/phy), every unit which KiloSort2.0 had designated as ‘good’. For our recordings in trained mice this left 44,288 units to be manually inspected and curated, and 15,406 units were kept for analysis after manual curation. Based on the manual curation data from trained mouse recordings (see ‘Manual curation of spikesorted units from trained mice’), we established a series of heuristics for creating automatic curation of units (see ‘Automatic curation of spikesorted units from untrained mice’) and used these for recordings from untrained mice.
Manual curation of spikesorted units from trained mice. We manually inspected and curated all units which KiloSort2.0 had designated as good, based on crosscorrelogram contamination. In Phy2.0, we first inspected and merged units that clearly belonged to the same cluster, but had been split by KiloSort2.0, or split the noise from signal in units with clearly separatable noise contamination. We then designated each unit into one of five categories:

(1)
Perfect, or almost perfect, with no/very minimal noise, drifting, cutting in/out for the full duration of recording.

(2)
Usable and good signal with some noise that cannot be extracted that lasts for the full duration of the recording.

(3)
Some drift, but possible physiological change in signal. Clear signal for most of duration of the recording.

(4)
Drifting/sudden loss, but otherwise usable/close to perfect. Clear signal for over 50% of the duration of the recording but requires only using a subset of the session.

(5)
Noise/useless. Spike shape is not physiological.
Our goal was to remove from analyses units that had large contamination with multiunit activity, were not recorded throughout the full duration of a session, or were a result of artifacts in recorded signals. We therefore used units designated as category 1–3 above for all further analysis from trained mice.
Automatic curation of spikesorted units from untrained mice. We next used the manual designations of units to establish a set of criteria for automatic detection of units we would include with manual curation. Based on the manual curation data above we established the following 7 criteria for considering a unit good for analysis:
Firing rate criteria:

(1)
Mean firing rate must be above 0.5 Hz.

(2)
Rolling 20min average firing rate cannot drop below 30% (that is, 70% drop from mean) of its mean firing rate.

(3)
Rolling 10min average firing rate cannot drop below 20% (that is, 80% drop from mean) of its mean firing rate.

(4)
Rolling 5min average firing rate cannot drop below 10% (that is, 90% drop from mean) of its mean firing rate.

(5)
Interspike interval (ISI) violations. Absolute refractory period needs to have <20% estimated contamination rate from other neurons (this is what Kilosort2.0 calls ‘good’).

(6)
If there are some spikes in the refractory period, the ISI peak in the first 5 ms cannot be within the first 2 ms.

(7)
ISI histograms cannot have sudden large spikes in their shape (that is, peak of ISI cannot be 4 times larger than the second highest peak—that is usually its immediate neighbour).
These criteria selected approximately 90% of units we would have designated with categories 1 (perfect, or almost perfect) or 2 (usable and good signal with some noise) with manual curation, and excluded approximately 85% we would have designated as 4 (drifting/sudden loss) or 5 (noise) with manual curation.
This automatic selection of units was used to select units for analysis from untrained mice recordings and yielded 6,215 units out of 20,292 ‘good’ KiloSort2.0 units.
Clockdrift correction. A shared 1 Hz square wave signal was recorded on the clock of each headstage and National Instruments (NI) acquisition card using a SYNC option in SpikeGLX. Clock drift between spike times from different probes and behavioural events extracted from NI acquisition card recording was corrected posthoc via TPrime (https://billkarsh.github.io/SpikeGLX/#tprime) using the shared square wave signal.
Videography
Acquisition
Highspeed videography of front (100 frames s^{−1}, 640 × 512 pixels) and side view (50 frames s^{−1}, 976 × 1,024 pixels) of the mouse face was acquired using two Chameleon3 cameras (CM3U313Y3MCS, FLIR) with infrared illumination. The videos were acquired in an 8 bit greyscale format. Cameras were configured to send a TTL signal to the National Instruments PXIe board at the start of exposure of every acquired frame. These TTL signals were used to align frame times to the time of behavioural events and spike times.
Pupil size
In order to estimate the pupil size, we trained DeepLabCut^{66} to track the pupil size and position using videos acquired with the side camera. The model was trained to track 12 points surrounding the mouse pupil. In order to assess the model performance, after the training the model was tested on videos from sessions not used for training. Pupil size was estimated as an area of an ellipsoidal best fit to the tracked 12 points surrounding the pupil.
Motion energy
For calculation of motion energy, we primarily used videos acquired with the front camera to access a finer temporal resolution (with the exception of 2 sessions where for technical reasons we used a lower jaw ROI from side camera video). To estimate motion onset times, we used ROI centred around the mouse’s face, though nearly identical results were obtained with lower jaw or whisker pad ROIs from the side camera (data not shown). Motion energy was defined as a square root of the sum of squared frametoframe pixel value differences, divided by the number of pixels within the ROI.
Movement onset time estimation
In order to find the onset times of orofacial movements, we wanted to estimate the typical noise level of the motion energy signal and find the time points where the signal significantly deviated from the noiseband level. As a first step, we calculated the distribution of motion energy values in a 2s window centred around the lick registration times. We next fitted a mixture of Gaussian distributions with the goal to capture both contribution of the variance of motion energy values during the lick as well as due to noise. The mixture of three Gaussian distributions worked well to fit the data across all sessions and mice. The threshold for the presence of movement was defined as the mean plus two standard deviations of the Gaussian with the lowest value of the mean from the Gaussian mixture.
Finally, to find the time of motion onset time, we looked backwards in time from the time of lick registration by the piezo signal. The time point preceding the first instance of motion energy going below the threshold value defined above was considered the onset time of the orofacial movement.
Histology
For histological identification of the location of the recording probes and allocation of unit location in the mouse brain, we followed a protocol similar to ref. ^{67}.
Serial 2photon tomography for Neuropixels probe tract tracing
Following a terminal administration of pentobarbital, mice were perfused with a phosphate buffer solution (PBS) followed by 4% paraformaldehyde (PFA) solution. We postfixed the brain in the 4% paraformaldehyde for a minimum of 24 h at approximately 5 C. Following fixation, brains were moved to PBS for a minimum of 12 h prior to imaging. For imaging, brains were embedded in 5% agarose gel and mounted onto a vibratome cutting stage under the microscope objective. The brains were imaged using serial section twophoton microscopy^{68}. The microscope was controlled with ScanImage Basic (Vidrio Technologies), and custom software (BakingTray (https://github.com/SainsburyWellcomeCentre/BakingTray)). Images were stitched into a full 3D rendering of the brain using custom software (StitchIt (https://github.com/SainsburyWellcomeCentre/StitchIt)). We imaged the entire brain (from the olfactory bulb to the beginning of the spinal cord) with a resolution of x: approximately 2 μm, y: approximately 2 μm, z: 20 μm, with a 920 nm twophoton laser (100–150 mW power at sample). We sliced the brain in 40μm sections, and imaged 2 zplanes (around 25 μm and around 45 μm from the tissue surface) into the remaining tissue following each 40μm section. Two PMTs, one for capturing green (bandpass filter ET525/50 m) and red (bandpass filter ET570lp) fluorescence acquired the 2 channels of data subsequently used for analysis.
Neuropixels probe tract alignment to the Allen Common Coordinate Framework atlas and estimation of unit location
Prior to image processing, we downsampled microscopy images to 10μm voxels and registered the brain to the standardized Allen Common Coordinate Framework (Allen CCF^{69}) using custom software (BrainRegister (https://github.com/stevenjwest/brainregister)). We then manually traced each neuropixels probe tract through the brain in 3D using custom software (Lasagna (https://github.com/SainsburyWellcomeCentre/lasagna)). Finally, we assessed the overall firing rates and LFP spectra of individual Neuropixels channels and compared it to atlas positions. Where needed, we manually adjusted the scaling of brain regions along the probe track to align responses on channels with features associated with anatomical locations using custom software (Ephys alignment tool (https://github.com/intbrainlab/iblapps/tree/master/atlaselectrophysiology)^{70}). Unit location was estimated from the location of the channel that had the largest absolute peak value of the mean waveform. For all analyses, we combined units across all subdivisions of a brain region (layers of cerebral cortex, dorsal and ventral divisions as ACAd and ACAv and in some cases functionally similar brain regions—see Supplementary Tables 1 and 2).
Neural data analysis
Only brain regions with at least 40 units were analysed. Analyses specific to TFresponsive units were done only for brain regions with ≥10 such units. No further sample size calculations were performed. Manual curation of units’ quality and stability was done without the knowledge of brain regions from which recordings were made. The subsequent analyses pipeline was applied in the same manner to data from all applicable brain regions, but the custom nature of analyses prevented investigators to remain blind to the identity of brain regions or dataset type (trained versus naive mice).
GLM of neural activity
Model. We binned neural activity in 50ms bins (matching the duration of each TF pulse) aligned to trial start. We then fitted a Poisson generalized linear model to predict trialtotrial neural activity as a function of a set of temporally unfolded taskrelated predictors that were present during a trial. Each predictor was extended temporally prior and/or post the timing of the predictor in 50ms discretized steps (matching neural activity binning), with an independent weight estimated for each time step around the predictor. We predicted neural activity using 19 taskrelated predictors:
(1) TF fluctuations during baseline period (kernel length: 0–1.5 s); (2) Trial start (0–1 s); (3) Time since baseline start (from 1 s from trial start to change onset); (4–9) Six change onsets (a separate predictor for each change size (0–2 s)); (10) Lick preparation (−1.25–0 s prior to lick); (11) Lick execution (0–0.5 s post lick); (12) Airpuff (0–0.25 s); (13) Reward (0–0.4); (14) Abort (−1.25–0.25); (15) Phase of grating for upwards drift (12 phase bins from 0–360°); (16) Phase of grating for downwards drift (12 phase bins from 0–360°); (17) Video motion energy (−0.05–0.8 s); (18) Running wheel movement (−0.05–0.8 s); (19) Pupil diameter (−0.75–0.75 s).
We fit the model with L2 (ridge) regularization, optimized with cyclical coordinate descent as implemented in GLMnet^{71} (α = 0). We trained a model for each neuron on 90% of the data, and crossvalidated on 10% of the data, and iterated the predictions over a tenfold crossvalidation. Within the training dataset we tuned the L2 regularization term using tenfold crossvalidation.
Identification of units encoding TF, lick preparatory activity and/or lick execution activity. To identify which cells significantly responded to a predictor of interest (that is TF fluctuations during baseline, lick preparation epoch, or lick execution epoch), we first refitted reduced models similar to the full model on 90% of the data, with 10fold crossvalidation, except we removed a predictor(s) of interest: (1) For identification of TFresponsive units, we estimated a model where we removed the predictor estimating the responses to TF fluctuations during baseline. (2) For identification of units with lick preparation activity, we estimated a model where we removed the predictor estimating the activity leading up to a lick. (3) For identification of units responding to lick execution, we estimated a model where we removed the predictor estimating activity during lick execution, the predictor estimating activity modulation by motion energy captured by videography, and the predictor estimating activity modulation by running wheel movement.
For each 10% test set, for each neuron we then calculated the mean actual perievent time histogram (PETH) as well as the mean predicted PETH of both the full model and the reduced model for the following types of events: (1) −0.15 to 0.75 s around fast and slow TF pulses (that is, TF values 0.5 s.d. from the mean TF during baseline); (2) −1.5 to 0 s prior to early lick onsets; and (3) 0 to 0.4 s post lick onset.
A unit was considered significantly encoding TF pulses during the baseline period if two criteria were satisfied: (1) The mean Pearson’s correlation prediction of the full model (across kfolds) from the combined mean fast and slow TF pulse response (that is, mean fast TF pulse and mean slow TF pulse responses subtracted from each other) was >0.2; and (2) if the crossvalidated prediction of the TF response after subtracting the predicted TF response of the reduced model with no TF fluctuation predictor—that is, residual prediction—was significant (P < 0.01 (ttest), n = 10 independent crossvalidations). A unit was considered significantly encoding lick preparation if (1) the mean Pearson’s correlation prediction of the full model (across kfolds) of the mean activity leading up to a lick (−1.25 to 0 s) was >0.2; and (2) if the crossvalidated prediction of the mean activity after subtracting the predicted mean activity of the reduced model with no lick preparation kernel—that is, residual prediction—was significant (P < 0.01 (ttest), n = 10 independent crossvalidations). Finally, a unit was considered significantly encoding lick execution if (1) the mean Pearson’s correlation prediction of the full model (across kfolds) of the mean activity following a lick (0 to 0.25 s) was >0.2; and (2) if the crossvalidated prediction of the mean activity after subtracting the predicted mean activity of the reduced model with no lick preparation kernel—that is, residual prediction—was significant (P < 0.01 (ttest), n = 10 independent crossvalidations).
Focality index. To assess how distributed TF encoding was across brain areas, before and after learning, we computed a focality index (F) (similar to Steinmetz et al.^{3}) of the TF encoding:
$$F=\frac{\sum ({p}_{a}^{2})}{{(\sum {p}_{a})}^{2}}$$
where p_{a} is the proportion of neurons in an area that is encoding stimulus TF during the baseline period. If all TF encoding neurons were confined to a single area, this measure would take on the value of 1. If encoding was perfectly distributed across all areas recorded this measure would take on the value 1/N_{areas}. In order to compare between untrained and trained mice, we identified the common areas which had more than 40 units recorded in both trained and untrained mice. This left N = 24 areas from which to estimate the focality index. We estimated 95% confidence intervals and P values by bootstrapping the neurons included in the estimation 10,000 times with replacement.
Peak time and width of GLM estimated TF kernels for TFresponsive neurons. To investigate the peak time and width of the GLM estimated TF kernel for assessing how sustained responses to TF fluctuations were based on GLM weights, we first identified the absolute peak value of the TF kernel; because the GLM was based on 50 ms binning of spike counts, peak times for the GLM TF kernel was in 50 ms resolution. In cases where the absolute peak position within 1 s was a negative weight, we flipped the kernel in order to calculate the width. We then estimated the full width at half maximum (FWHM) of each TF kernel around its peak using findpeaks in MATLAB. For each area, we calculated the median peak time and median FWHM across all TFresponsive units.
Ramping differences in GLM change kernels. To test how neurons accumulated evidence when they were presented with a rewarding sustained change in stimulus speed, we tested how the slope of the visual evoked ramping activity following a change onset was dependent on the amount of evidence (change size) being presented. To isolate the visual component of the activity following change onset, we used the GLM kernel which fits the activity following change onset until change offset, while linearly taking into account other variables which may contribute to activity such as pupil size, preparatory activity and movementrelated activity (see Model).
We estimated the mean change kernel for each change size for TFresponsive and nonTFresponsive units separately for each area. In cases where responses to fast TF pulses were negative, we flipped the change kernel so every unit had responses aligned to positive fast TF pulses—this allowed the mean to capture the visual evidence activity ramp irrespective of sign. We then identified the time point for each change size where the change kernel reached 50% of its maximum weight (To control for noise fluctuations in kernel weights, we approximated the 50th percentile crossing by taking the mean time point of the 33.33rd percentile, 50th percentile and 66.66th percentile crossing). We then calculated the degree to which activity ramping time scaled with change size, by regressing the 50th percentile crossing against change size. We estimated the nonparametric 95% confidence intervals and P values of the relationship between change size and 50th percentile crossing (that is, ramping time/change size) by bootstrapping with replacement (10,000 times) the neurons went into the mean change kernels, and then estimating the slope of the regression for each bootstrapped mean change kernels.
Propagation and widening of TF pulse evoked activity
Identification of TF pulse outlier events. Fast TF pulse was defined as TF fluctuations larger than 1 s.d. of baseline TF fluctuations (in log_{2} scale) above the mean TF value (TF > 1.19 Hz). Similarly, slow TF pulse was defined as TF fluctuations below 0.84 Hz.
For calculation of average response to TF outlier events, we considered only TF outlier events satisfying the following criteria:

(1)
Later than 1 s from the baseline onset.

(2)
Earlier than 2 s + post pulse analysis window from the motion onset time on early lick or abort trials.

(3)
Excluding the change period plus a post pulse analysis window.
The aim of these criteria was to exclude the influence of baseline onset, movement, or preparatory activity on the response to TF pulses.
Estimation of peak time and width of TF pulse evoked activity. For each unit defined as TFresponsive by the GLM analysis described above, we calculated a mean response to a fast pulse using outlier events that occurred during the baseline period and satisfied the criteria outlined above. Additionally, we calculated a mean response to TF pulses within [−0.5, 0.5] s.d. of the baseline TF fluctuations. The goal of this procedure was to capture continuous ramps of activity that some units exhibited and exclude their influence on the shape of response to a TF pulse. We applied the subtraction of this baseline response for all TF pulse response analysis unless explicitly stated.
Next, for the baseline subtracted mean response to a fast TF pulse, we calculated its peak time, as the time of the largest absolute change in firing rate within 1 s from the pulse onset, and a corresponding halfpeak width.
Integration of multiple TF pulses. Because the noise in TF fluctuations is random, by chance there are occurrences of two fast pulses separated by a certain delay. To study the integration of TF pulses, we found such instances of events where two fast pulses occurred at a given delay between the offset of the first and the onset of the second, additionally also satisfying the exclusion criteria outlined above. The mean response aligned to such events was considered a response to a sequence of two fast pulses.
For computing the mean response across all TFresponsive units within a brain region, in order to avoid averaging across responses with different signs, we flipped the sign of response for units that showed decreases in activity after a single fast pulse. For computing a zscore of response, the mean and s.d. were estimated from 0.5 s preceding the first pulse onset.
Facilitation by the second fast pulse. First, we measured an average of zscored responses across the population of TFresponsive units within a brain region to a single fast TF pulse. We then computed the peak value of that response (r_{1fast}), and a corresponding peak time. To find the size of response to a sequence of two fast pulses (r_{2fast}), we found a time point at the same delay from the onset of the second fast pulse as the peak time of response to a single fast pulse and found a peak value of response within 100 ms centred around that time point. The relative facilitation to a sequence of fast pulses was defined as \(\varDelta \,=\,\frac{{r}_{2{\rm{fast}}}\,\,{r}_{1{\rm{fast}}}}{{r}_{1{\rm{fast}}}}\).
To determine the confidence intervals for the results of this analysis, we bootstrapped with replacement (2,000 times) across TFresponsive neurons and repeated the analysis described above for each sample of neurons. Shaded regions indicate 2.5 and 97.5 percentiles of the resulting distribution.
Preparatory activity before the lick onset
To study changealigned (Fig. 3) or hit lickaligned (Fig. 5) activity, we computed zscore of mean PETH for each unit. zScoring was done using the mean and s.d. estimated from activity during 2 s before the change onset.
For analysis shown on Fig. 5, for each brain region the fraction of significantly active units within a group (that is, TFresponsive) was measured by calculating at every time point a fraction of units with the absolute value of zscore larger than the significance threshold of 2.576 (corresponding to P < 0.01). Additionally, we subtracted the ‘baseline’ level of activity calculated within [−2, −1.8] s before hitlick onset, which for a few brain regions was larger than chance level likely due to nonnormal distribution of firing rates or a small number of events used for estimation of the mean and s.d. The confidence intervals were estimated by bootstrapping with replacement (5,000 times) across TFresponsive (or TF nonresponsive) neurons and repeating the estimation of fraction of significantly active neurons for each sample of neurons.
The latency of activation of TFresponsive or TF nonresponsive populations was defined as the earliest time point following which within a 100ms window for at least 80 ms: (1) the lower 95% confidence interval of fraction of active units was above zero; and (2) the mean fraction of active units was above 0.1.
The latency of significant difference in activation between TFresponsive and TF nonresponsive populations was estimated as the first time point where within a 100ms window for at least 80 ms the confidence intervals of the difference in activation were above zero.
The latency of significant difference in activation across all units in each brain region (Extended Data Fig. 9a) was estimated as the first time point where within a 100ms window: (1) the lower 95% confidence interval of fraction of active units was above zero; and (2) the mean fraction of active units was above 0.05.
Intrinsic timescales
We binned the neural activity into 50ms bins (same binning was used in ref. ^{23}). We then calculated the temporal autocorrelation (20 lags = 1 s) of spike counts using Pearson’s correlation in the intertrial intervals between −2.5 s to −0.5 s prior to trial onset for each neuron (in this period mice were seeing a grey screen, and trained mice had to remain stationary for at least 3 s for the trial to begin).
To determine the intrinsic timescale for each area, we fit an exponential decay function to the mean autocorrelation function of all the units recorded in the area. For singleneuron analysis of relationship between intrinsic timescales and TF width, we estimated the autocorrelation for each TFresponsive neuron separately. For areas or neurons with autocorrelation functions with nonmonotonic decay, we fit the exponential decay from the part of the autocorrelation where monotonic decay was happening (in a subset of areas this would mean offsetting the fit 1–3 time bins). Finally, we calculated the τ (that is, the intrinsic timescale value) of the exponential decay (accounting for offset where necessary).
Population analysis
Similarity of TF pulse responses and lick preparation activity in TFresponsive populations
We assessed the similarity of TF responses and lick preparation activity across TFresponsive populations in each area by estimating the Pearson’s correlation of mean firing rates (within a 50ms window around the mean activity peak time across neurons within the each area) following fast TF pulses (that is, >1 s.d. TF value) and their mean activity prior to early lick [−0.3 to 0 s] (after normalizing firing rates by subtracting baseline firing rates from both TF responses and lick preparation activity). We estimated the nonparametric 95% confidence intervals and P values by bootstrapping with replacement (10,000 times) the neurons going into the correlation.
Preprocessing steps
For all units located within a given brain region, but not necessary simultaneously recorded, we first computed the mean neural responses across a given trial type (for results shown on Fig. 6b–h: hit trials during weak TF changes (1.25 and 1.35 Hz) aligned to the lick onset times, [−2, 1.5] s time window). Only trials with hitlick onset times larger or equal 0.4 s from change onset were used. Neurons from sessions with less than 10 trials of a given type were excluded from this analysis. Firing rates were calculated as spike counts averaged in 10 ms bins and smoothened by convolution with twosided Gaussian with 30 ms s.d. The mean neural responses were combined into a firing rate matrix (but also see crossvalidation section) with dimensions of Neurons × Time.
Neural data was preprocessed in the following way: first, to limit the dominant influence of highfiring units, we applied softnormalization to each neuron’s firing rate, such that the neurons with strong responses had close to unity range of responses \({r}^{/}\,=\,\frac{r}{7+(\max (r)\min (r))}\). The constant 7 was chosen as the roughly 20th percentile value of the firing rate range across all units. Second, the neural responses were meancentred by subtracting the mean of each neuron’s activity across time and the mean activity across all neurons at every time point.
Definitions of movement and movementnull subspaces
We used the approach first utilized in ref. ^{33}. There, the authors formalized a method to find a linear mapping between lowdimensional representation of activity in PMd/M1 and the muscles EMG data, which defines a movement subspace. A nullspace relative to that subspace forms an orthogonal set of dimensions which activity can occupy without directly affecting the movement execution. To extend this analysis on our data, we used combined recordings of orofacial motor and premotor nuclei (V, IRN, SPVI and SPVO) as a proxy for activity of orofacial muscles involved in execution of a lick. While recordings from GRN could have also been included into this group, we kept it separate to allow the population analysis to be applied to that region because (1) we had a large number of units recorded from that region alone; and (2) it was the only nucleus in medulla with abovechance number of TFresponsive units, warranting a separate analysis.
We considered a possible mapping onto the movement subspace for each brain region. Our rationale was the following: there exist several parallel neural pathways that can drive the activity of orofacial nuclei neurons–from primary motor cortex, basal ganglia, cerebellar or midbrain output regions^{56,57,72}. Thus, the modes of activity within these regions that map onto the movement subspace may have a causal role for the execution of licks. In general, however, these signals can also be caused by movement afference that is broadcasted globally^{3,5,9} (Fig. 1k,l). It is impossible to differentiate between these two possibilities from our data alone and thus the existence of mapping of activity onto the movement subspace does not necessarily imply that the brain region is causally involved in execution of the lick. With that said, we did not find a good mapping onto a movement subspace for most of the early visual areas, olfactory regions and hippocampal input regions (Extended Data Fig. 10a), suggesting that existence of mapping onto the movement subspace is not possible across all brain regions.
The mapping onto movement subspace was defined as:
$$\widetilde{M}=W\widetilde{N}$$
(1)
where \(\widetilde{M}\) and \(\widetilde{N}\) are lowdimensional representations of activity (projections onto main the principal components, the latter found via svd Matlab function) of neurons within the orofacial nuclei group and the target brain region, respectively, and W is a linear mapping operator onto the movement subspace.
Before finding a linear mapping, we also zeroed the initial state across projections on principal components by subtracting from each projection the mean value within [−2, −1.5] s from lick onset. This step avoided the need for using intercept in the linear fit and simplified the visualization of projections on principal components and movement/movementnull dimensions. Linear mapping was found using only the timeperiod containing movementrelated activity of orofacial nuclei [−0.1, 1.5] s around lick onset. This way we did not preclude the presence of preparatory activity on movement dimensions from the definition of the linear fit itself. A linear mapping to movement dimensions was found using linear regression with the Matlab function lsqnonlin.
Correspondingly, W_{null} was a nullspace of W and was found using the Matlab function null. We used two top principal components of orofacial nuclei activity (which captured 61% of the total crossvalidated variance; Extended Data Fig. 10a,b) and 4 top principal components of activity in a target brain region to find W and W_{null} operators (see Extended Data Fig. 10b–d). This choice resulted in both movement and movementnull subspaces being twodimensional. We additionally ensured that norms of these operator are equal \(\parallel {W}_{{\rm{null}}}\parallel =\parallel W\parallel \) in order to make the comparison between the movement and movementnull subspaces fair.
Since the definition of specific dimensions in movementnull subspace is to a degree arbitrary, we defined the first movementnull dimension by finding a rotation within the movementnull subspace that maximized the amount of variance captured by that dimension prior to lick onset. The second movementnull dimension was then simply orthogonal to the first dimension in movementnull subspace. This was used mainly to simplify visualization, with all subspacerelated analyses done using both dimensions in each subspace.
The positive direction of movement dimensions was chosen such that the mean value of projection of orofacial nuclei activity within [−2, 0.5] s around lick onset was positive. The positive direction for movementnull dimensions was chosen such that the mean value of projection of activity within [−2, 0] s around lick onset was positive.
Subspace occupancy
Relative subspace occupancy at a moment of time t was defined as
$${O}_{{\rm{R}}}(t)=\frac{{E}_{{\rm{null}}}(t){E}_{{\rm{m}}}(t)}{{E}_{{\rm{null}}}(t)+{E}_{{\rm{m}}}(t)}$$
where E_{null}(t) and E_{m}(t) are Euclidean distances within movementnull and movement subspaces, measured between the neural state at the current moment of time t and the initial time point (the mean across 2 and 1.5 s before the lick onset). Values close to zero signify equal occupancy between subspaces and positive values indicate a preferential occupancy of the movementnull subspace. The peaknormalized occupancy (Extended Data Fig. 11a,b) was defined as \(O(t)=\frac{E(t)}{\max (E)}\)
Decomposition of projections onto contributions from TFresponsive and TF nonresponsive units
We decomposed the projections on main principal components into a sum of contributions from TFresponsive and the TF nonresponsive units. For that, we used the knowledge of identity of each unit as TFresponsive or TF nonresponsive and wrote down the principal components U (from the singular value decomposition (SVD) of the firing rate matrix N = USV^{T}) as a sum of two parts as:
$$U={U}_{{\rm{TF}}}\left(\begin{array}{c}{U}_{i},i\in {\rm{TF}}\,{\rm{unit}}\\ 0,i\in {\rm{non}}\,{\rm{TF}}\,{\rm{units}}\end{array}\right)+{U}_{{\rm{non}}}\,{\rm{TF}}\left(\begin{array}{c}0,i\in {\rm{TF}}\,{\rm{unit}}\\ {U}_{i},i\in {\rm{non}}\,{\rm{TF}}\,{\rm{units}}\end{array}\right)$$
(2)
where U_{i} is a loading of the ith unit.
With that, projections on principal components can be written as:
$$\widetilde{N}={U}^{T}N={U}_{{\rm{TF}}}^{T}N+{U}_{{\rm{non}}}\,{{\rm{TF}}}^{T}N={\widetilde{N}}_{{\rm{TF}}}+{\widetilde{N}}_{{\rm{non}}}\,{\rm{TF}}$$
(3)
Substituting equation (3) into equation (1) gives projections onto movement dimensions as:
$${\widetilde{N}}^{{\rm{mov}}}=W\widetilde{N}={\widetilde{N}}_{{\rm{TF}}}^{{\rm{mov}}}+{\widetilde{N}}_{{\rm{non}}}\,{{\rm{TF}}}^{{\rm{mov}}}$$
and, correspondingly, projections on movementnull dimensions are written as:
$${\widetilde{N}}^{{\rm{null}}}={W}_{{\rm{null}}}\widetilde{N}={\widetilde{N}}_{{\rm{TF}}}^{{\rm{null}}}+{\widetilde{N}}_{{\rm{non}}}\,{{\rm{TF}}}^{{\rm{null}}}$$
The relative contribution of TFresponsive units within movement and movementnull subspaces at the moment of time t was then defined as following:
$$\begin{array}{l}{w}_{{\rm{mov}}}(t)=\frac{{\widetilde{N}}_{{\rm{TF}}}^{{\rm{mov}}}(t)\times {\widetilde{N}}^{{\rm{mov}}}(t)}{{\parallel {\widetilde{N}}^{{\rm{mov}}}(t)\parallel }^{2}}\times \frac{{\widetilde{N}}^{{\rm{mov}}}(t)}{ {\widetilde{N}}^{{\rm{mov}}}(t) }\\ {w}_{{\rm{null}}}(t)=\frac{{\widetilde{N}}_{{\rm{TF}}}^{{\rm{null}}}(t)\times {\widetilde{N}}^{{\rm{null}}}(t)}{{\parallel {\widetilde{N}}^{{\rm{null}}}(t)\parallel }^{2}}\times \frac{{\widetilde{N}}^{{\rm{null}}}(t)}{ {\widetilde{N}}^{{\rm{null}}}(t) }\end{array}$$
where the second multiplicative term ensures that the sign of contribution is relative to the defined positive direction (see above) of dimensions within each subspace.
In order to test whether the contribution of TFresponsive units is larger than what is expected from a uniform contribution of the full population, we repeatedly randomly selected (2,000 times) the same number of units as there were TFresponsive ones from the whole population and computed their contribution to projections on movement and movementnull dimensions as described above.
In addition to the analysis described above, we have also checked whether the abovechance contribution of TFresponsive units is a consequence of their level of activity, despite the normalization method that we used, or does it reflect a better correspondence of their activity to the population modes of activity within the movementnull subspace. For that we looked at the distribution of loadings along the first movementnull dimension–that captured the majority of preparatory activity there. We found that the majority of brain regions where TFresponsive units had abovechance contribution to the preparatory activity also had larger absolute values of loadings along that dimension than the rest of the population (Extended Data Fig. 11d,e).
Crossvalidation
Since our analyses were focused on characterizing the mean neural responses, the crossvalidation procedure that we used was designed to test the stability of the mean neural responses and their corresponding lowdimensional representations across trials. For that, we split trials into two randomly assigned and equally sized groups (fit and test trials) and calculated the mean neural response per unit across each group of trials. We next combined firing rates of neurons from the same brain region(s) (but not necessarily simultaneously recorded) into a joint matrix. After applying the preprocessing steps outlined above, we had two firing rate matrices from fit and test trials.
For crossvalidated PCA (Extended Data Fig. 10a), we applied SVD on the first (fit) matrix and measured how well the remaining (test) matrix is predicted by the reconstruction from SVD components found from the first matrix. Similarly, the projections of activity on main principal components (Extended Data Fig. 13) were done using the test data, projected onto principal components found from the fit data.
For further analyses utilizing movement and movementnull subspaces, we applied SVD separately on each matrix and found their projections on first four main principal components. We then used lowdimensional representation of fit trials data to find linear mapping W and W_{null} onto the movement and momentnull subspaces. Finally, we applied W and W_{null} found from the fit data to the lowdimensional representation of the test data. This procedure was repeated 2,000 times, the 95% confidence intervals shown in Fig. 6 illustrate the 2.5 and 97.5 percentiles across projections of the test data. Because the sign of projection is arbitrary defined, we additionally applied a potential flipping of the sign of eigenvectors from each draw based on which direction had better alignment with the eigenvectors computed from the full firing rate matrix without the split into fit and test trials.
Responses to TF pulses
For each brain region, we constructed a firing rate matrix of all units responses to a fast TF pulse (or concatenating in time responses of each unit to different types of TF pulses for analysis shown in Fig. 6i,k–m), and used the same preprocessing steps as described above. The projections onto the movement and movementnull dimensions were done using loadings found from the analysis of hit licks activity described above (using the full firing rate matrix of hitlick responses without the split into fit and test trials). Crossvalidation of consistency of projections was done by randomly selecting half of TF outlier events, computing the mean firing rate across those events for each unit, applying the steps above to find the projections, and repeating this procedure 2,000 times. For analyses where different brain regions were combined into a common group, all units from those brain regions were combined into a joined firing rate matrix and the steps described above were applied.
Alignment of fast TF pulse response with a given dimension in movement or movementnull subspace was calculated as a cosine of an angle between the projection onto a target dimension and a 4dimensional vector of TF pulse response (2 movement and 2 movementnull dimensions) at a time of the maximum Euclidean distance from the initial state across 4 dimensions within a 0.75s window from the pulse onset. Similarly, for calculating the scaling of responses to different TF pulses along the first movementnull dimension, we found the sizes of projections at times of maximal Euclidean distance from the initial state within a 0.75s window from the first TF pulse onset.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.