Animals
All animal experiments conformed to Berlin state, German federal and European Union animal welfare regulations and were approved by the LAGeSo, the Berlin authority for animal experiments. D. cerebrum were kept in commercial zebrafish aquaria (Tecniplast) with the following water parameters: pH 7.3, conductivity 350 µS cm−1, temperature 27 °C. We used male and female adult fish between 4 and 11 months of age.
Behavioural setup and protocol
The experimental setup comprised an inner 10 cm × 10 cm (length × width) tank with walls made of optically opaque but acoustically transparent polypropylene sheets less than 200 µm thick (cut out of plastic folders), surrounded by an outer tank with submerged speakers (4 × 3 Ekulit LSF-27M/SC 8 Ω in custom waterproof enclosures). Thus, the speakers were visually shielded from the fish inside the inner tank, and the fish were confined between the speakers (Fig. 1c provides a schematic; further details are provided in Extended Data Fig. 1a,b). The water depth was 10 cm, and the transparent bottom of the inner tank was at a height of 6.3 cm, leaving a 3.7-cm water column for the fish to swim in. The speakers were level with this water column, and all sounds were targeted to it. Infrared light-emitting diodes illuminated the fish from below. The inner tank was filmed with an overhead camera at 120 fps and 336 × 336 pixel resolution, and live tracking of the fish was carried out on a subset of frames at 15 fps. White light-emitting diodes lit the setup indirectly via reflections from the room walls. Room and water temperatures were kept at 27 °C.
Each fish was tested once, and one fish was tested at a time. In the first minutes of the recording, a 10 cm × 10 cm acrylic plate with centimetre markings was placed in the inner tank to match the sound calibration grid to the video frame. Three minutes after placing the fish in the inner tank, playbacks were triggered for 45 min.
To probe left–right directional hearing, playbacks from the front or back of the fish should be avoided. To increase experimental throughput, we therefore prompted D. cerebrum to align orthogonally to the left–right speaker axis (x axis). We had previously observed that D. cerebrum swim closer to white than to black walls. By using two black plastic films as walls across the x axis and two white films across the y axis, we encouraged D. cerebrum to oscillate between the white walls, along the y axis (Extended Data Fig. 1c). Consequently, the ratio of distance covered along the y axis to distance covered along the x axis was 1.6. Sound playback occurred only when a fish was oriented orthogonally to the speaker axis (within 45° of the y axis) and was inside a 1.5 cm × 3 cm trigger zone at the centre, leaving at least 3.5 cm to the nearest wall (the typical startle displacement during the first 50 ms is mean ± s.d. = 1 ± 0.4 cm). The delay between playbacks was at least 5 s: a fixed minimum delay of 5 s plus a random delay drawn for each trial from an exponential distribution with a mean of 5 s. This paradigm yielded on average about 1.3 playbacks per minute in untreated fish and about 0.6 playbacks per minute in lateral line-ablated fish.
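The trigger conditions and inter-playback delay described above can be sketched as follows. This is a minimal illustration, not the authors' code. All function and constant names are hypothetical. The sketch assumes that coordinates are in cm relative to the tank centre, that the heading is measured from the +x (speaker) axis, and that the 1.5-cm side of the trigger zone lies along x. The text does not state which side of the zone lies along which axis.

```python
import numpy as np

# Assumption: 1.5-cm side of the trigger zone along x, 3-cm side along y.
ZONE_HALF_WIDTH_X_CM = 0.75
ZONE_HALF_WIDTH_Y_CM = 1.5
MAX_DEVIATION_DEG = 45.0


def heading_deviation_from_y(heading_deg):
    """Angle (0-90 deg) between the body axis and the y axis.

    heading_deg is measured from the +x axis; a heading and its reverse are
    treated alike, because swimming up or down the y axis both qualify.
    """
    return abs(heading_deg % 180.0 - 90.0)


def in_trigger_state(x_cm, y_cm, heading_deg):
    """True if the fish is inside the central zone and oriented along y."""
    in_zone = abs(x_cm) <= ZONE_HALF_WIDTH_X_CM and abs(y_cm) <= ZONE_HALF_WIDTH_Y_CM
    return in_zone and heading_deviation_from_y(heading_deg) <= MAX_DEVIATION_DEG


def draw_inter_playback_delay(rng, min_delay_s=5.0, mean_extra_s=5.0):
    """Fixed minimum delay plus an exponentially distributed random delay."""
    return min_delay_s + rng.exponential(mean_extra_s)
```

With these parameters the expected delay is 10 s. A fish that was always in the trigger state would therefore receive at most about six playbacks per minute.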
Twelve target sounds were generated from a recorded pressure waveform (see the section of the Methods entitled Sound stimulation waveforms), targeted to the fish’s current position to cancel reverberations (see the section of the Methods entitled Calibration and reverberation cancellation), and presented to the fish in random order following trigger events using custom-written code in Python 3.
The data in Figs. 1–4 and Extended Data Figs. 4–7 and 12a stem from 65 untreated fish (3,798 playbacks, 1,415 startles, about 37% startles). For each stimulus, we report the number of fish that responded with at least one startle. The same experiment, also comprising 12 sound configurations, was repeated with 74 lateral line-ablated fish (Extended Data Figs. 7 and 9; 2,013 playbacks, 910 startles, about 45% startles). A third sound playback experiment was carried out in the dark with 43 untreated fish, testing a subset of 4 sound configurations (Extended Data Fig. 12b).
Behavioural analysis
Tracking
Pose tracking of D. cerebrum’s swimming behaviour was carried out with SLEAP (ref. 75). In total, 140 frames from nine randomly selected recordings of male and female fish were hand-labelled with a skeleton consisting of 7 equidistant nodes along the fish’s body and 2 additional nodes, 1 for each eye. The ‘single-animal’ model was used for training. The model parameters and the trained model are available at the G-Node repository (see Data availability).
Startle detection
Plotting the fish’s velocity against time around playback revealed a sharp increase in velocity after playback, clearly visible across all playbacks (Fig. 1e and Extended Data Fig. 4a). We defined a 25-ms time window around the time of peak velocity, in which the speed distribution was bimodal. We then computed the average speed in this window for each trial and classified all playback trials with an average speed above 17 cm s−1 as startle trials (Extended Data Fig. 4b,c). The remaining trials were classified as non-startle trials. This speed-based criterion also separated trials clearly in terms of body bend (Extended Data Fig. 4d).
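The speed-threshold classification can be sketched as below. This is a minimal sketch with hypothetical names. It assumes that the 25-ms window is centred on the time of peak velocity, which is supplied as an input, and that speed traces are already aligned to playback onset on a common time axis.

```python
import numpy as np

SPEED_THRESHOLD_CM_S = 17.0  # from the text
WINDOW_MS = 25.0             # from the text


def classify_startles(speed, t_ms, t_peak_ms):
    """Classify playback trials as startle / non-startle.

    speed: (n_trials, n_samples) speed traces in cm/s, aligned to playback.
    t_ms: (n_samples,) common time axis in ms.
    t_peak_ms: time of peak velocity (assumed window centre).
    Returns (is_startle, mean_speed_in_window).
    """
    in_window = np.abs(t_ms - t_peak_ms) <= WINDOW_MS / 2
    mean_speed = speed[:, in_window].mean(axis=1)
    return mean_speed > SPEED_THRESHOLD_CM_S, mean_speed
```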
Directional bias
To classify startles as leftward or rightward, we measured the fish’s x displacement during the first 50 ms after startle initiation (Fig. 1e). This duration was chosen because displacement heat maps at varying delays revealed that the initial, lateral displacement phase of the startle response peaks after 50 ms (Extended Data Fig. 4f). Wherever centred trajectories are shown, these initial 50 ms are depicted. The directional bias of the startle response is the fraction of startles in one indicated direction (left or right; away from or towards the speaker). This bias can be computed in two ways.
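The left/right classification and the resulting bias can be sketched as follows. This is a hypothetical illustration. It assumes that positive x points towards one speaker ("right"), that positions are in cm, and that the trajectory is linearly interpolated at startle onset and 50 ms later.

```python
import numpy as np


def startle_direction(x_cm, t_ms, t_onset_ms, window_ms=50.0):
    """Sign of the x displacement over the first `window_ms` after onset.

    Returns +1 (towards +x), -1 (towards -x) or 0 (no net displacement).
    """
    x_start = np.interp(t_onset_ms, t_ms, x_cm)
    x_end = np.interp(t_onset_ms + window_ms, t_ms, x_cm)
    return int(np.sign(x_end - x_start))


def directional_bias(directions, speaker_side):
    """Fraction of startles away from the active speaker (escape).

    speaker_side: +1 if the single active speaker is at +x, -1 if at -x.
    """
    d = np.asarray(directions)
    return float(np.mean(d == -speaker_side))
```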
Directional bias across trials and fish. For each stimulus or set of stimuli, startle trials were pooled across all fish, and the fraction of startles in one direction was calculated. Using the two-sided binomial test, we calculated the probability of observing a directional bias (approach or escape) at least as extreme as the measured one if responses were unbiased.
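The pooled test can be expressed with SciPy's `scipy.stats.binomtest`. In this sketch, the helper name and the example counts are hypothetical. The null hypothesis is an unbiased response (p = 0.5).

```python
from scipy.stats import binomtest


def pooled_bias_test(n_in_direction, n_startles):
    """Directional bias and two-sided binomial p-value against p = 0.5."""
    result = binomtest(n_in_direction, n_startles, p=0.5, alternative="two-sided")
    return n_in_direction / n_startles, result.pvalue


# Hypothetical counts: 70 of 100 pooled startles away from the speaker.
bias, p_value = pooled_bias_test(70, 100)
```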
Directional bias per fish. In the analysis of bias across trials and fish, all trials could theoretically stem from a single, highly responsive animal (not of concern here; Extended Data Fig. 5a,b). To complement this measure, we also quantified the directional bias per fish. With 12 sound configurations per experiment and an average of about 22 startles per fish, a meaningful per-fish bias could be computed only on pooled sound configurations and for fish with many startles. To estimate the directional bias of individual fish, we selected fish that had ≥10 startles in both the single-speaker condition (pooled over 4 stimuli) and the trick condition (pooled over 4 stimuli; Extended Data Fig. 6c,d). Although this measure reflects directional behaviour in the population and estimates fish-to-fish variability, it selects for fish that trigger many playbacks and startle often.
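The per-fish selection and bias computation can be sketched as follows. The input layout (one entry per startle, with a fish identifier, a pooled condition label `"single"` or `"trick"`, and a boolean escape flag) and the function name are hypothetical.

```python
import numpy as np


def per_fish_bias(fish_id, condition, escaped, min_startles=10):
    """Per-fish escape fraction in the pooled single-speaker and trick conditions.

    Only fish with >= min_startles in BOTH conditions are returned, as
    {fish: (bias_single, bias_trick)}.
    """
    fish_id = np.asarray(fish_id)
    condition = np.asarray(condition)
    escaped = np.asarray(escaped, dtype=bool)
    out = {}
    for f in np.unique(fish_id):
        sel = fish_id == f
        biases = []
        for cond in ("single", "trick"):
            mask = sel & (condition == cond)
            if mask.sum() < min_startles:
                break  # too few startles in this condition: skip this fish
            biases.append(float(escaped[mask].mean()))
        else:
            out[f.item()] = tuple(biases)
    return out
```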
Micro-CT
A 12-month-old male wild-type D. cerebrum was euthanized by ice shock and fixed with 4% paraformaldehyde in phosphate-buffered saline (PBS) at 4 °C overnight. The next day, the fish was washed for 15 min in PBS before being stained with 5% phosphomolybdic acid (Sigma Aldrich) solution in PBS at 4 °C overnight. After staining, the fish was washed in PBS for 15 min before embedding in 1% PBS-buffered agarose inside a cryo tube. The micro-CT scan was carried out at the ANATOMIX beamline at SOLEIL synchrotron by XPLORAYTION. The sample was placed into a 40-keV polychromatic (white) X-ray beam. A scan consisted of 3,200 projections collected at about ×10 optical magnification by a digital camera (Orca Flash 4.0 V2) with a sensor pixel size of 6.5 µm at 150 ms exposure time, yielding an effective pixel size of 0.6485 µm. The registered data were binned to 1.2970 µm voxel size. Key structures of the hearing apparatus were manually segmented. To this end, planes were hand-labelled using 3D Slicer (ref. 76; v5.6, https://slicer.org) and then interpolated using Biomedisa (v23; ref. 77). FIJI ImageJ (v1.5; ref. 78) was used to convert between file types. The segments were converted into meshes and loaded into Blender for cleaning and rendering.
Lateral line ablation and DASPEI staining
To rule out the possibility that the lateral line senses sound direction in our experiments, we ablated the lateral line using neomycin (ref. 79). To ablate the neuromasts, fish were placed in a 200 µM neomycin solution for about 30 min and then transferred to a beaker with tank water. Behavioural experiments started after ≥30 min. To confirm the reliability of the ablation protocol, we stained 30 neomycin-treated fish. After the behavioural experiment, these fish were transferred to a 100 µM DASPEI (2-[4-(dimethylamino)styryl]-1-ethylpyridinium iodide) solution and then to a beaker with tank water to wash out unbound DASPEI. The fish were then euthanized by ice shock and imaged with an epifluorescence microscope. Neuromasts were reliably stained in control fish but not in neomycin-treated fish, indicating reliable ablation (see Extended Data Fig. 9e,f for example images). As functional readouts, we report an increase in the number of wall contacts after startles (Extended Data Fig. 9g) and a decrease in foraging strikes in the dark (Extended Data Fig. 9h) in neomycin-treated fish.
Vibrometry
Confocal microscope
The confocal reflectance microscope was based on a custom-built laser-scanning two-photon microscope (Extended Data Fig. 10a). The illumination source was a Ti:sapphire laser (MaiTai DeepSee; SpectraPhysics) operated at 810 nm (with or without mode-locking). Before entering the two-photon microscope, the beam passed through a 90:10 beam splitter (90% reflection, 10% transmission). Light back-scattered by the fish’s inner structures was descanned, reflected by the 90:10 beam splitter and focused by a lens (f = 50 mm) into a single-mode fibre (core diameter: 25 µm, numerical aperture: 0.1) acting as a confocal pinhole. The microscope was controlled by custom-written software (https://github.com/danionella/lsmaq).
Acoustic stimulation
Fish were anaesthetized in 120 mg l−1 MS-222 buffered in fish water. They were then placed on a preformed agarose mould that allowed the gill covers to move freely, and immobilized with 2% low-melting-point agarose (melting point 25 °C). A flow of aerated aquarium water (with anaesthetic) was delivered to the mouth through a glass capillary.
The fish was acoustically stimulated using two facing speakers sealed in custom-made waterproof enclosures. The diaphragms were exposed to water. The speakers were each placed about 1.3 cm away from the fish. They were driven using a DAQ card (National Instruments USB-6211), connected through audio amplifiers (Kemo M031N, 3.5 W). Pressures of up to about 176 dB (referenced to 1 µPa) were thus generated at the fish position in the pressure-only configuration and particle motion of up to about 8 mm s−1 in the particle-motion-only configuration, consistent with the expected amplitude relationship between pressure and particle motion in the sound monopole near field.
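As a rough consistency check of the quoted amplitudes, the monopole relation given in the section entitled Sound stimulation waveforms links peak pressure to particle velocity. The sketch below assumes a 1-kHz tone (the stimulation frequency used for the vibrometry data), a source distance of 1.3 cm, water density 1,000 kg m−3 and sound speed 1,500 m s−1. It is an order-of-magnitude estimate, not the authors' calibration. Because pressure adds for in-phase speakers and particle motion adds for anti-phase speakers, the single-speaker ratio should carry over roughly to the two-speaker pressure-only and motion-only configurations.

```python
import numpy as np

RHO = 1000.0  # kg m^-3
C = 1500.0    # m s^-1
P_REF = 1e-6  # Pa, the 1 µPa reference


def spl_to_pa(spl_db):
    """Peak sound pressure level (dB re 1 µPa) to pressure in Pa."""
    return P_REF * 10 ** (spl_db / 20.0)


def monopole_particle_velocity(p_pa, f_hz, r_m):
    """|v| = |p| / (rho c) * |1 + i / (k r)| for a point (monopole) source."""
    k = 2 * np.pi * f_hz / C
    return p_pa / (RHO * C) * abs(1 + 1j / (k * r_m))


p = spl_to_pa(176.0)                        # about 631 Pa
v = monopole_particle_velocity(p, 1000.0, 0.013)
```

Under these assumptions the result is about 7.7 mm s−1, close to the reported value of about 8 mm s−1.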
Motion phase maps
The principle of the laser-scanning vibrometric measurement is illustrated in Fig. 3b and Extended Data Fig. 10b. The sample (Extended Data Fig. 10b(i)) was stimulated with an acoustic sinusoidal wave at frequency \({f}_{{\rm{stim}}}\), and imaged with a laser-scanning microscope with a line rate \({f}_{{\rm{scan}}}\) (Extended Data Fig. 10b(ii)).
To reconstruct amplitudes and relative phases of sinusoidal object motion, each pixel must be measured at more than two different phases, according to the Shannon–Nyquist sampling theorem. As noise can influence this measurement, we used four phase steps, ensuring proper phase reconstruction while keeping acquisition sessions reasonably short.
To reconstruct the displacement of the moving structures inside the fish, each line of the image was scanned \({\rm{nStep}}=4\) times in succession, with a phase offset of π/2 between successive scans (Extended Data Fig. 10b(iii)). To this end, the stimulation frequency and the line rate must satisfy the relationship:
$${f}_{{\rm{stim}}}={(N+1/{\rm{nStep}})f}_{{\rm{scan}}}$$
with N being an integer. To maximize the line rate, we took \(N={\rm{floor}}(\,{f}_{{\rm{stim}}}/{f}_{{\rm{scan}}}\,).\)
This in turn set additional constraints on the various scanning parameters. We used \({f}_{{\rm{scan}}}=800\,{\rm{Hz}}\) and \({f}_{{\rm{stim}}}=\mathrm{1,000}\,{\rm{Hz}}\) for the data presented in Fig. 3 and Extended Data Fig. 11.
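The stroboscopic frequency relationship can be checked numerically. In this sketch (hypothetical function names), N is taken as floor(f_stim / f_nominal) and the line rate is recomputed from f_stim = (N + 1/nStep) f_scan. Depending on the fractional part of f_stim / f_nominal, the resulting line rate can lie slightly above or below the nominal value. The acoustic phase then advances by 2π(N + 1/nStep), that is by 2π/nStep modulo 2π, from one line to the next.

```python
import math


def line_rate_for(f_stim_hz, f_scan_nominal_hz, n_step=4):
    """Return (N, f_scan) with f_stim = (N + 1/n_step) * f_scan and N = floor(f_stim / f_nominal)."""
    n = math.floor(f_stim_hz / f_scan_nominal_hz)
    f_scan = f_stim_hz / (n + 1 / n_step)
    return n, f_scan


def phase_step_per_line(f_stim_hz, f_scan_hz):
    """Acoustic phase advance between consecutive lines, wrapped to [0, 2*pi)."""
    return (2 * math.pi * f_stim_hz / f_scan_hz) % (2 * math.pi)


n, f_scan = line_rate_for(1000.0, 800.0)   # N = 1, f_scan = 800 Hz
step = phase_step_per_line(1000.0, f_scan)  # pi/2, that is, four phase steps per cycle
```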
To ensure repeatable measurements, the acoustic stimulation and the galvanometric scanning mirrors were synchronized so that each pixel was recorded at a known sound phase. This was achieved by triggering sound generation with each frame-scan trigger.
In this way, each pixel was stroboscopically probed at \({\rm{nStep}}=4\) different phases of the acoustic stimulation cycle. Because the acoustic cycle advances during the time between two consecutive pixels, the probed acoustic phase shifts by \(2{\rm{\pi }}\,{f}_{{\rm{stim}}}\times {\rm{pixelPeriod}}\) from one pixel to the next. This shift was taken into account in the motion reconstruction of the imaged structures (Extended Data Fig. 10c). These images were then reshaped to yield an (Nx, Ny, nStep) dataset (Extended Data Fig. 10b(iv)).
To analyse the motion of the fish’s inner structures, we used Matlab 2019b and the particle image velocimetry toolbox PIVlab (ref. 80), originally developed to characterize the motion of flowing particles in fluid mechanics. Essentially, displacement is assessed by cross-correlating subregions of consecutive images, with progressively decreasing subregion sizes (Extended Data Fig. 10b(v)). The contrast of the reflectance images was enhanced before the displacement analysis, and the results were curated in post-processing by removing outliers and interpolating detection gaps.
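The core of this cross-correlation step can be illustrated with a generic FFT-based estimate of the integer-pixel shift between two interrogation windows. This is not PIVlab's implementation, which adds further steps such as the multi-pass refinement with decreasing window sizes mentioned above and sub-pixel peak fitting. The function name is hypothetical.

```python
import numpy as np


def integer_shift(win_a, win_b):
    """Estimate the (dy, dx) shift of win_b relative to win_a.

    Uses the peak of the circular cross-correlation computed via FFT;
    mean subtraction removes the DC component before correlating.
    """
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    xcorr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Circular correlation wraps around: map large indices to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s for p, s in zip(peak, xcorr.shape))
```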
The motion detection yielded x- and y-displacement maps at each of the four phases of the acoustic stimulation period. The first Fourier component was computed for each pixel to extract the amplitude and phase of the local displacement (Extended Data Fig. 10b(vi)). The phase was then corrected for the phase offset that accumulates along the horizontal x direction owing to the line-scanning procedure (Extended Data Fig. 10c). Because the acoustic stimulation was synchronized with the line scanning, we could carry out this measurement in several planes and obtain a consistent volumetric complex map characterizing the motion response of the various inner structures to the acoustic stimulation. Maximum-amplitude projections across planes yielded the two-dimensional phase maps shown, one for motion along the speaker–speaker axis (x) and one for motion orthogonal to it (y).
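Extracting the first Fourier component from the four phase-stepped displacement maps, and removing the per-pixel scan delay, can be sketched as follows. The function names are hypothetical. The sketch assumes a unidirectional scan in which column j is probed j × pixelPeriod later than column 0. It would be applied separately to the x- and y-displacement maps.

```python
import numpy as np


def first_harmonic(disp, n_step=4):
    """Complex amplitude A*exp(i*phi) of displacements sampled at phases 2*pi*k/n_step.

    disp: array whose last axis holds the n_step phase samples.
    For d_k = A cos(2*pi*k/n_step + phi), FFT bin 1 equals (n_step/2) A exp(i*phi).
    """
    return 2.0 / n_step * np.fft.fft(disp, axis=-1)[..., 1]


def correct_scan_phase(c, f_stim_hz, pixel_period_s):
    """Remove the acoustic phase that accumulates along x during line scanning."""
    nx = c.shape[-1]
    delay_s = np.arange(nx) * pixel_period_s
    return c * np.exp(-1j * 2 * np.pi * f_stim_hz * delay_s)
```

The corrected map's `np.abs` gives the displacement amplitude, and `np.angle` gives the phase relative to the stimulus.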
Sound stimulation waveforms
We reasoned that D. cerebrum sense both pressure and particle motion. Hence, our sound stimuli were defined in terms of three quantities (pressure, x acceleration and y acceleration). These were delivered to the fish’s current position by using the frequency responses of the speakers to cancel position-dependent reverberations (see the section of the Methods entitled Calibration and reverberation cancellation). y acceleration was always kept at zero; only pressure and x acceleration were varied. In summary, 12 sounds were generated from a recorded pressure waveform and presented to the fish in random order upon trigger events. The 12 sounds consisted of:
- four single-speaker sounds (left or right × positive or negative polarity);
- two sounds with only a pressure component (positive or negative polarity);
- two sounds with only a horizontal x-motion component (positive or negative polarity);
- four trick conditions, which exactly matched the four single-speaker target waveforms but differed in which speakers were active to realize them.
We observed that D. cerebrum startle when a cylindrical piece of rubber is dropped into the water. We recorded the pressure waveform of this sound, high-pass filtered it at 100 Hz and extracted a 12-ms snippet to serve as our pressure waveform template. (Conditioned sounds, that is, the actual speaker signals, were band-pass-filtered between 200 Hz and 1,200 Hz; see the following section.) The target pressure amplitude was set to a peak sound pressure level of 167 dB (referenced to 1 µPa) by rescaling this waveform accordingly. This amplitude was loud enough to elicit startles reliably while remaining within the output range of our small 2.7-cm-diameter speakers. The first peak’s rise time (10% to 90% of absolute amplitude) was 0.664 ms, and the centre frequency of the pulse was about 780 Hz. The target horizontal particle acceleration waveform was computed from the pressure waveform using monopole theory for each Fourier component, as follows.
The pressure signal decays as \(1/r\) with radial distance \(r\) from a sound monopole with amplitude \({\widehat{p}}_{0}\) at distance \({r}_{0}\):
$$\widehat{p}(r,t)={\widehat{p}}_{0}\frac{{r}_{0}{{\rm{e}}}^{ikr}}{r}{{\rm{e}}}^{-i\omega t}$$
with frequency \(f\), angular frequency \(\omega =2{\rm{\pi }}f\), wavenumber \(k=2{\rm{\pi }}/\lambda =\omega /c\), wavelength \(\lambda \) and speed of sound \(c=\lambda f\).
In a medium of density \(\rho \), the radial particle velocity decays quadratically with distance in the near field (\(kr\ll 1\), a limit that depends on frequency), where the \(i/(kr)\) term dominates:
$${\hat{v}}_{r}(r,t)=\frac{1}{\rho c}\left(1+\frac{i}{kr}\right)\hat{p}(r,t)$$
By contrast, the dominant near-field term of particle acceleration (the temporal derivative of particle velocity), \(\hat{p}/(\rho r)\), is independent of frequency. Like particle velocity, particle acceleration decays quadratically with distance for nearby sources:
$${\hat{a}}_{r}(r,t)=-i\omega {\hat{v}}_{r}(r,t)=\frac{1}{\rho }\left(\frac{1}{r}-ik\right)\hat{p}(r,t)$$
To compute the particle acceleration \({a}_{r}(r,t)\) at a distance \(r\) from a sound monopole with pressure \(p(r,t)\) for discrete signals of arbitrary waveform, we applied this equation separately to each Fourier component. Consider a pressure waveform \(\{{{\bf{p}}}_{n}\}:={p}_{0},{p}_{1},\ldots ,{p}_{N-1}\) with \(N\) samples \({p}_{n}\), spaced at \(T=1/{sr}\) with sample rate \({sr}\). The particle acceleration \(\{{{\bf{a}}}_{n}\}:={a}_{0},{a}_{1},\ldots ,{a}_{N-1}\) that would be observed at a distance \(r={r}_{0}\) from a sound monopole was calculated by carrying out the discrete Fourier transform \(\{{{\bf{P}}}_{l}\}:={P}_{0},{P}_{1},\ldots ,{P}_{N-1}\)
$${P}_{l}=\sum _{n=0}^{N-1}{p}_{n}\,{{\rm{e}}}^{-i\frac{2{\rm{\pi }}}{N}l\,n}$$
and deriving each particle acceleration component \(\{{{\bf{A}}}_{l}\}:={A}_{0},{A}_{1},\ldots ,{A}_{N-1}\) independently. With corresponding frequencies \({f}_{l}\approx l/(NT)\) and wavenumbers \({k}_{l}\approx 2{\rm{\pi }}l/(NTc)\), the relationship between pressure and particle acceleration gives
$${A}_{l}=\frac{1}{\rho }\left(\frac{1}{{r}_{0}}-i{k}_{l}\right){P}_{l}$$
which defines the radial particle acceleration through the inverse Fourier transform:
$${a}_{n}=\frac{1}{N}\sum _{l=0}^{N-1}{A}_{l}\,{{\rm{e}}}^{i\frac{2{\rm{\pi }}}{N}l\,n}$$
In the experiments, \({r}_{0}\) was set to 3 cm, thus simulating a monopole sound source at 3 cm, irrespective of D. cerebrum’s position relative to the speakers. This resulted in a peak particle acceleration of 7.59 m s−2. Other parameters used were \(c=\,\mathrm{1,500}\,{\rm{m}}\,{{\rm{s}}}^{-1}\), \(\rho \,=\,\mathrm{1,000}\,{\rm{kg}}\,{{\rm{m}}}^{-3}\) and \({\rm{sr}}=\,\mathrm{51,200}\,{\rm{Hz}}\). In terms of pressure, x acceleration and y acceleration \((p,{a}_{x},{a}_{y})\), there were eight different target configurations, with ‘+’ indicating the polarity of the template waveform and ‘−’ the opposite polarity: four monopole configurations (+,+,0), (−,−,0), (+,−,0) and (−,+,0); two pressure configurations (+,0,0) and (−,0,0); and two motion configurations (0,+,0) and (0,−,0). Although there were only eight target configurations, there were 12 sound configurations, because the four monopole configurations can be realized in two ways: either with a single speaker or with three speakers (trick configuration; see the next section).
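The per-component conversion from pressure to monopole particle acceleration can be sketched with NumPy's real FFT, using the parameters above (r0 = 3 cm, c = 1,500 m s−1, ρ = 1,000 kg m−3, sr = 51,200 Hz). This is a minimal sketch, not the authors' code, and the function name is hypothetical. Note on sign conventions: NumPy's `rfft`/`irfft` expand a real signal in components varying as e^{+iωt}, the complex conjugate of the e^{−iωt} convention used in the monopole equations above. For real-valued waveforms, the factor (1/r0 − ik)/ρ therefore appears below as (1/r0 + ik)/ρ. The test checks the result against the analytic outgoing-wave solution for a pure tone.

```python
import numpy as np

RHO = 1000.0   # kg m^-3
C = 1500.0     # m s^-1
SR = 51_200    # Hz
R0 = 0.03      # m, simulated monopole distance


def monopole_acceleration(p, r0=R0, sr=SR, rho=RHO, c=C):
    """Radial particle acceleration (m s^-2) at distance r0 from a monopole,
    computed independently for each Fourier component of pressure p (Pa)."""
    n = len(p)
    P = np.fft.rfft(p)
    f = np.fft.rfftfreq(n, d=1.0 / sr)  # non-negative frequencies only
    k = 2 * np.pi * f / c
    # e^{+i omega t} convention of numpy.fft: conjugate of (1/r0 - ik)/rho.
    A = (1.0 / r0 + 1j * k) / rho * P
    return np.fft.irfft(A, n=n)
```

For example, the 12-ms template rescaled to a peak of 10^(167/20) µPa (about 224 Pa) could be passed to this function to obtain a target acceleration waveform. At the 780-Hz centre frequency, kr0 is about 0.1, so the frequency-independent 1/r0 term dominates.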
Calibration and reverberation cancellation
Conducting experiments in small tanks presents challenges as both tank geometry and the receiver’s position affect the sound amplitude and waveform sensed by the receiver (Extended Data Fig. 2c). By recording the speakers’ impulse responses inside the inner tank in terms of pressure and particle acceleration (Extended Data Fig. 2b), speakers could be activated to precisely control pressure and particle acceleration components at the fish’s location (Extended Data Fig. 2d).
Pressure and acceleration measurements
Pressure. Pressure was measured with a hydrophone (Aquarian Scientific AS-1, preamplifier: Aquarian Scientific PA-4, acquisition: NI-9231 sound and vibration module, National Instruments; Extended Data Fig. 1d). During repeated playback of the same sound, a single hydrophone was automatically moved across a 5 × 5 grid inside the inner tank, sampling with a spacing of 1.5 cm (Extended Data Fig. 1c,e). Hence, a 25-point pressure field was obtained for each sound configuration, spanning a 6 cm × 6 cm square at the centre of the inner tank between the speakers.
Acceleration. Particle acceleration was measured in two ways.
In the first method, particle acceleration was measured indirectly through the pressure gradient. Newton’s second law of motion (pressure gradient force),
$${\bf{a}}=-\frac{1}{\rho }{\boldsymbol{\nabla }}p$$
links the spatial pressure gradient to particle acceleration. In water, with density \(\rho =\mathrm{1,000}\,{\rm{kg}}\,{{\rm{m}}}^{-3}\) and speed of sound \(c=\mathrm{1,500}\,{\rm{m}}\,{{\rm{s}}}^{-1}\), the following finite-difference approximation holds for pressure signal frequencies \(f\ll \,100\,{\rm{kHz}}\) if the pressure gradient is sampled with step size \({x}_{2}-{x}_{1}=\,1.5\,{\rm{cm}}\). This condition corresponds to wavelengths much longer than the 1.5-cm step, since \(\lambda =c/f=1.5\,{\rm{cm}}\) at 100 kHz:
$${a}_{x}=-\frac{1}{\rho }\frac{p({x}_{2})-p({x}_{1})}{{x}_{2}-{x}_{1}}$$
The approximation holds for all frequencies used in this experiment. For measuring gradients, moving a single hydrophone is preferable to using a hydrophone array. Array-based gradients could be biased by small differences in hydrophone sensitivity and by perturbations of the sound field caused by the other hydrophones. We calculated x and y acceleration from the 25-point pressure field recorded with a single hydrophone. The pressure field included points outside the trigger zone so that pressure gradients (that is, acceleration) could be computed across the trigger zone boundary.
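The finite-difference estimate on the 5 × 5 calibration grid can be sketched as follows. The array layout `[iy, ix, t]` and the function name are assumptions. Differences between neighbouring grid points give accelerations at the midpoints between them. The authors' exact evaluation points on the grid are not stated.

```python
import numpy as np

RHO = 1000.0  # kg m^-3
DX = 0.015    # m, grid spacing (1.5 cm)


def grid_acceleration(p_grid, dx=DX, rho=RHO):
    """Pressure-gradient particle acceleration from grid pressure waveforms.

    p_grid: (ny, nx, n_samples) pressure in Pa, indexed [iy, ix, t].
    Returns a_x at x-midpoints (ny, nx-1, n) and a_y at y-midpoints (ny-1, nx, n).
    """
    a_x = -np.diff(p_grid, axis=1) / (rho * dx)
    a_y = -np.diff(p_grid, axis=0) / (rho * dx)
    return a_x, a_y
```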
In the second method, particle acceleration was additionally directly measured along all three axes with an acceleration sensor (Triaxial ICP – Model 356A45, PCB Piezotronics, acquired with NI-9231 sound and vibration module, National Instruments; Extended Data Fig. 1d). Like the hydrophone, the acceleration sensor was moved across all 5 × 5 grid positions during repeated playback of the same sound, giving measurements for x, y and z acceleration.
Whereas hydrophones are manufactured and calibrated for underwater use, the particle acceleration sensor is not made to measure particle acceleration underwater and is meant to be glued onto the vibrating object. Owing to an acoustic impedance mismatch between metal and water, we expected the PCB sensor to underestimate particle acceleration.
We compared x and y acceleration waveforms for both measurement methods. The waveforms acquired with the direct method matched those acquired with the indirect method after the direct measurements were multiplied by a factor of about 2.4. The close match of the rescaled waveforms confirms the validity of the gradient approximation in the indirect method.
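One way to obtain such a rescaling factor is a least-squares fit of a single scale between the two waveform sets. The text does not state how the factor of about 2.4 was determined, so this closed-form fit and the function name are assumptions for illustration.

```python
import numpy as np


def fit_scale(direct, indirect):
    """Scale s minimizing ||s * direct - indirect||^2 (closed-form least squares)."""
    direct = np.ravel(direct)
    indirect = np.ravel(indirect)
    return float(np.dot(direct, indirect) / np.dot(direct, direct))
```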
Hence, in all experiments, x and y acceleration were measured through the indirect method, on the basis of spatial pressure gradients. The particle acceleration sensor still proved useful in measuring the vertical z acceleration in our setup.
Impulse response-based sound targeting
To create the same sounds at any position inside the inner tank, impulse responses for all 4 speakers were measured across 25 positions on a 5 × 5 grid with 1.5-cm spacing. In the following, the sound targeting method is described for one position.
Let \({k}_{i,p}\) be the pressure impulse response kernel, \({k}_{i,{a}_{x}}\) the x acceleration impulse response kernel and \({k}_{i,{a}_{y}}\) the y acceleration impulse response kernel for the ith speaker. Using \(M\) speakers with signals \({s}_{i}\), pressure and acceleration can be predicted through convolution (\(* \)):
$$\begin{array}{l}\,p=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,p}\ast {s}_{i}\\ {a}_{x}=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,{a}_{x}}\ast {s}_{i}\\ {a}_{y}=\mathop{\sum }\limits_{i=0}^{M-1}{k}_{i,{a}_{y}}\ast {s}_{i}\end{array}$$
In the Fourier domain, utilizing the convolution theorem, these become a system of equations for each Fourier component \(l\).
$$\begin{array}{l}\,{P}_{l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,p,l}\,{S}_{i,l}\\ {A}_{x,l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,{a}_{x},l}{S}_{i,l}\\ {A}_{y,l}=\mathop{\sum }\limits_{i=0}^{M-1}{K}_{i,{a}_{y},l}{S}_{i,l}\end{array}$$
On the basis of the Fourier components of the target waveforms (see the section of the Methods entitled Sound stimulation waveforms), \({P}_{l}\), \({A}_{x,l}\) and \({A}_{y,l}\), and the Fourier components of the impulse response kernel \({K}_{i,p,l}\),\({K}_{i,{a}_{x},l}\) and \({K}_{i,{a}_{y},l}\), the system of equations can be solved for the Fourier components of the speaker signals \({S}_{i,l}\) as long as \({M}\ge 3\) and the kernel components are non-zero and non-identical. The time-domain signal for the ith speaker is then given by the inverse Fourier transform using components \({S}_{i,l}\).
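The forward model and its Fourier-domain equivalent can be illustrated for one measured quantity (p, a_x or a_y) at one grid position. This sketch uses a hypothetical function name. It zero-pads the FFTs to the full linear-convolution length, so that the per-component products reproduce the time-domain sum of convolutions without circular wrap-around.

```python
import numpy as np


def predict_field(kernels, signals):
    """Sum over speakers of kernel_i * signal_i (linear convolution), via FFT.

    kernels: (M, L) impulse responses of M speakers for one quantity at one position.
    signals: (M, S) speaker signals.
    """
    _, L = kernels.shape
    S = signals.shape[1]
    n = L + S - 1  # full linear-convolution length avoids wrap-around
    K = np.fft.rfft(kernels, n=n, axis=1)
    Sig = np.fft.rfft(signals, n=n, axis=1)
    # Per Fourier component: sum_i K_{i,l} S_{i,l}
    return np.fft.irfft((K * Sig).sum(axis=0), n=n)
```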
To increase the robustness of the solutions (for example, to avoid speakers unnecessarily cancelling each other and to limit speaker amplitude), speaker signal waveforms were constrained to resemble the target waveform. This was implemented by solving the system of equations with a bounded least-squares solver (scipy.optimize.lsq_linear) with bounds \(-{B}_{i,l} < {S}_{i,l} < {B}_{i,l}\). The bound \({B}_{i,l}\) was computed as a rescaling of the absolute Fourier components of the target pressure waveform \({P}_{l}\):
$${B}_{i,l}={\alpha }_{i}\,\gamma \,| {P}_{l}| $$
in which \(\gamma \) is fixed and scales pressure to voltage, and \({\alpha }_{i}\) is a rescaling parameter set independently for each speaker to give additional control over the active speakers. Our values of \({\alpha }_{i}\) for the different sound configurations are listed in Supplementary Table 2.
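A per-component bounded solve with `scipy.optimize.lsq_linear` can be sketched as follows. Because `lsq_linear` works with real variables, the complex system is embedded as a real system in the unknowns [Re S, Im S]. Several details are assumptions, not taken from the text: applying the bound B_{i,l} separately to the real and imaginary parts, the function name, and flooring the bounds at a tiny positive value (lsq_linear needs lower bounds strictly below upper bounds, which fails if |P_l| = 0).

```python
import numpy as np
from scipy.optimize import lsq_linear


def solve_speaker_component(K_l, target_l, bound_l):
    """Bounded least-squares speaker signals for one Fourier component l.

    K_l: (3, M) complex kernel components (rows: p, a_x, a_y).
    target_l: (3,) complex targets (P_l, A_x,l, A_y,l).
    bound_l: (M,) bounds B_i,l, applied here to real and imaginary parts.
    """
    M = K_l.shape[1]
    # (Kr + iKi)(Sr + iSi): real rows [Kr, -Ki], imaginary rows [Ki, Kr].
    A = np.block([[K_l.real, -K_l.imag],
                  [K_l.imag, K_l.real]])
    b = np.concatenate([target_l.real, target_l.imag])
    ub = np.maximum(np.concatenate([bound_l, bound_l]), 1e-12)
    res = lsq_linear(A, b, bounds=(-ub, ub))
    return res.x[:M] + 1j * res.x[M:]
```

With loose bounds the solution reproduces the targets exactly. Tight bounds trade accuracy for smaller, more target-like speaker signals.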
After conditioning, all computed speaker signals were band-pass-filtered between 200 Hz and 1,200 Hz to avoid activating the lateral line.
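A 200–1,200 Hz band-pass of the speaker signals can be sketched with SciPy. The filter type (Butterworth), order (4) and zero-phase application (`sosfiltfilt`) are assumptions, because the text specifies only the pass band. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SR = 51_200  # Hz, sample rate from the text


def bandpass(x, low_hz=200.0, high_hz=1200.0, sr=SR, order=4):
    """Zero-phase Butterworth band-pass (assumed design) of a speaker signal."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)
```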
To ensure that the trick configuration differed from the single-speaker configuration only by selective pressure inversion, sounds were conditioned in two steps. First, the speaker signals for the single-speaker configuration were calculated. Then, these signals were effectively held fixed, so that they closely resembled the single-speaker signals, and only the activations of the two speakers along the orthogonal axis were conditioned.
The above calculation was carried out for all 25 grid positions. The computed speaker signals accurately delivered the target waveforms to the target position (Extended Data Figs. 2d and 3 and Supplementary Table 2). To ensure consistency across experiments, the water level was kept at precisely 10 cm, and the pressure and acceleration fields inside the inner tank were measured several times (including before the first recording and after the last recording).
During the experiment, the fish’s x–y position was detected at 15 Hz, and the speaker signals were linearly interpolated from the targeted sounds computed for the neighbouring grid positions.
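The position-dependent interpolation can be sketched as bilinear interpolation between the four surrounding grid positions. Bilinear weighting is an assumed reading of "linearly interpolated". The 5 × 5 grid is taken as spanning −3 to +3 cm around the tank centre, following the 6 cm × 6 cm calibration square, and the array layout `[iy, ix, speaker, sample]` and names are hypothetical.

```python
import numpy as np

GRID_SPACING_CM = 1.5
GRID_ORIGIN_CM = -3.0  # assumed: grid centred on the tank centre


def interpolate_speaker_signals(signals, x_cm, y_cm):
    """Bilinearly interpolate precomputed speaker signals at (x, y).

    signals: (n, n, M, S) array indexed [iy, ix, speaker, sample].
    Positions outside the grid are clamped to its edge.
    """
    n = signals.shape[0]
    fx = np.clip((x_cm - GRID_ORIGIN_CM) / GRID_SPACING_CM, 0, n - 1)
    fy = np.clip((y_cm - GRID_ORIGIN_CM) / GRID_SPACING_CM, 0, n - 1)
    ix0 = min(int(np.floor(fx)), n - 2)
    iy0 = min(int(np.floor(fy)), n - 2)
    tx, ty = fx - ix0, fy - iy0
    return ((1 - ty) * (1 - tx) * signals[iy0, ix0]
            + (1 - ty) * tx * signals[iy0, ix0 + 1]
            + ty * (1 - tx) * signals[iy0 + 1, ix0]
            + ty * tx * signals[iy0 + 1, ix0 + 1])
```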
In the section entitled Sound stimulation waveforms, we describe how we defined the pressure and particle motion target waveforms that were conditioned this way.
Estimation of binaural cues (P-ILD, M-ILD, P-ITD, M-ITD)
ILDs
To estimate binaural cues in our behavioural experiment, we analysed the pressure and particle motion at two sound calibration grid points 3 cm apart: \({x}_{0}\), 1.5 cm to the left, and \({x}_{1}\), 1.5 cm to the right of the centre grid point. To estimate the sign and peak amplitude of level differences, we calculated P-ILD (pressure ILD) as \({\max }_{t}({\rm{abs}}(p({x}_{0},t)))\,-{\max }_{t}({\rm{abs}}(p({x}_{1},t)))\) and M-ILD (particle motion ILD) as \({\max }_{t}({\rm{abs}}({a}_{x}({x}_{0},t)))\,-{\max }_{t}({\rm{abs}}({a}_{x}({x}_{1},t)))\). To estimate the level difference across the left-to-right inner ear axis of the fish (about 0.6 mm), the level differences between these two points were divided by a factor of 50 (3 cm/0.6 mm). This assumes that level differences scale linearly with distance. Comparing the single-speaker configuration with the trick configuration, these data show that the sign of M-ILD remains the same (+0.11 m s−2 versus +0.30 m s−2), but the sign of P-ILD is inverted (+4.4 Pa versus −4.6 Pa). For a geometrical illustration of the inversion of P-ILD, see Extended Data Fig. 8d.
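The ILD estimate amounts to a difference of peak absolute amplitudes, rescaled by the ratio of grid-point separation to ear separation. In this sketch the function name is hypothetical. It applies to pressure waveforms for P-ILD and to a_x waveforms for M-ILD.

```python
import numpy as np

SCALE = 0.03 / 0.0006  # 3 cm grid-point separation / ~0.6 mm inter-ear distance = 50


def ild(x0_waveform, x1_waveform, scale=SCALE):
    """Peak-amplitude level difference (x0 minus x1), rescaled to the ear separation."""
    return (np.max(np.abs(x0_waveform)) - np.max(np.abs(x1_waveform))) / scale
```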
ITDs
ITDs were estimated by calculating the phase propagation in different sound configurations under a monopole approximation (Extended Data Fig. 8c–f).
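For a single monopole, the ITD follows from the difference in path length to the two ears divided by the speed of sound. The sketch below illustrates only this single-source case. The source position (5 cm to the left) and ear positions (±0.3 mm) are hypothetical values. The authors' phase-propagation estimates for multi-speaker configurations such as the trick condition are not reproduced here.

```python
import numpy as np

C = 1500.0  # m s^-1


def monopole_itd(source_xy, left_ear_xy, right_ear_xy, c=C):
    """Arrival-time difference (right minus left, in s) for a point source."""
    s = np.asarray(source_xy, dtype=float)
    d_left = np.linalg.norm(np.asarray(left_ear_xy, dtype=float) - s)
    d_right = np.linalg.norm(np.asarray(right_ear_xy, dtype=float) - s)
    return (d_right - d_left) / c


# Hypothetical geometry: source 5 cm to the left, ears 0.6 mm apart along x.
itd = monopole_itd((-0.05, 0.0), (-0.0003, 0.0), (0.0003, 0.0))  # about 0.4 µs
```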
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.