First detection of frequency-dependent, time-variable dispersion measures

Context. High-precision pulsar-timing experiments are affected by temporal variations of the Dispersion Measure (DM), which are related to spatial variations in the interstellar electron content. Correcting for DM variations relies on the cold-plasma dispersion law which states that the dispersive delay varies with the squared inverse of the observing frequency. This may however give incorrect measurements if the probed electron content (and therefore the DM) varies with observing frequency, as is predicted theoretically. Aims. We study small-scale density variations in the ionised interstellar medium. These structures may lead to frequency-dependent DMs in pulsar signals and could inhibit the use of lower-frequency pulsar observations to correct time-variable interstellar dispersion in higher-frequency pulsar-timing data. Methods. We used high-cadence, low-frequency observations with three stations from the German LOng-Wavelength (GLOW) consortium, which are part of the LOw Frequency ARray (LOFAR). Specifically, 3.5 years of weekly observations of PSR J2219+4754 are presented. Results. We present the first detection of frequency-dependent DMs towards any interstellar object and a precise multi-year time-series of the time- and frequency-dependence of the measured DMs. The observed DM variability is significant and may be caused by extreme scattering events. Potential causes for frequency-dependent DMs are quantified and evaluated. Conclusions. We conclude that frequency-dependence of DMs has been reliably detected and is caused by small-scale (up to 10s of AUs) but steep density variations in the interstellar electron content. We find that long-term trends in DM variability equally affect DMs measured at both ends of our frequency band and hence the negative impact on long-term high-precision timing projects is expected to be limited.


Introduction
Pulsars (first discovered by Hewish et al. 1968) are highly magnetised, rapidly rotating neutron stars, the remnants of massive stars that ended their life in a supernova.Generally it is thought that pulsars emit beams of radiation at their magnetic poles due to magnetospheric effects that are not fully understood (e.g.Karastergiou et al. 2015).The magnetic and spin axes of pulsars are generally not aligned, which causes the emission beams to sweep around in space as the neutron star rotates.If one or both of the emission beams cross the line of sight towards Earth during the rotation, regular pulses of radiation can be detected, in which case the neutron star is called a pulsar.
The pulsed nature of the emission received from pulsars enables unique and highly precise measurements of the electron density in the ionised interstellar medium (IISM).This is due to the frequency-dependent propagation speed of electromagnetic radiation in an ionised medium, a phenomenon termed dispersion.Specifically, the additional travel time for a wave at frequency ν when compared to a wave at infinite frequency, is ap-proximated as (Lorimer & Kramer 2005): where D 4.149 × 103 MHz2 pc −1 cm 3 s is the dispersion constant 1 and the 'dispersion measure' DM (expressed in pc/cm 3 ) is defined as: where d is the distance to the pulsar (expressed in pc) and n e is the electron density (in cm −3 ).This approximation makes use of the fact that ν ν p and ν ν c , where (see Lorimer & Kramer 2005) ν p = e 2 n e πm e = 8.98 kHz • n e cm −3 and (3) are the plasma and cyclotron frequency respectively (in cgsunits), with e and m e the charge and mass of an electron, respectively, c the speed of light in vacuum and B the magnetic field strength (in Gauss).First-order deviations from the dispersion law (Eq.1) were described by Tanenbaum et al. (1968): with T 2 = ±2ν c cos γ (ν 3 2 − ν 3 1 ) / ν 2 ν 1 (ν 2 2 − ν 2 1 ).( 7) T 1 and T 2 are only dependent on the electron density and the magnetic field strength along the line of sight, respectively, as well as the observing frequencies in question.Tanenbaum et al. (1968) could not find any evidence for a deviation from Eq. 1 in their dataset.More recently, Hassall et al. (2012) has come to the same conclusion.
Since pulsars are typically high-velocity objects (Gunn & Ostriker 1970;Lyne & Lorimer 1994), their lines of sight travel through the Galaxy with sufficient speed that variations of DM in time (corresponding to spatial inhomogeneities in the IISM) are regularly observed (see, for example, Rawley et al. 1988).However, such variations can only be accurately measured and distinguished from other noise sources if the fractional bandwidth of the observations is sufficiently large or if a range of observing frequencies are available, even though they do (possibly significantly) affect narrow-band or frequency-integrated observations as well (Lentati et al. 2016).
The importance of accurate measurements of time-variable DM values lies in the main applications of pulsars.Due to their extremely high rotational stability (rivalling atomic clocks on the long term: Hobbs et al. 2012), pulsars have become one of the main tools with which to test a wide variety of physics, from the equation of state at super-nuclear densities (Lattimer & Prakash 2016) to general relativity and a variety of alternative theories of 1 Often the inverse is defined in the literature.gravity (Will 2014, and references therein).These pulsar-timing 2 tests, however, typically take place at relatively high frequencies -generally around 1.4 GHz, where for most pulsars a useful balance is found between the brightness of the pulsar itself and the Galactic synchrotron background noise; where RFI is relatively limited; and where high-quality receiver systems are commonly available.The fact that the interstellar dispersion (and hence the variations in interstellar electron content) only has a limited impact at these frequencies, is also in principle a positive aspect, as it prevents corrupting effects from IISM turbulence in the pulsartiming data.
For long-term high-precision timing projects, however, this situation may change since the power spectrum of the turbulent structures in the IISM is steep, with significantly more power at the larger scales (Armstrong et al. 1995).This implies that for the most precise and longest-term pulsar-timing projects (like the 'pulsar timing array' (PTA) projects which aim to detect gravitational waves, see Tiburzi 2018) it may not suffice to opt for these higher frequencies, since sooner or later IISM turbulence may still become a problem.In their discussion of PTA experiments with the Square Kilometre Array (SKA), Janssen et al. (2015) propose ways to mitigate or prevent corruption of PTA data by variations in pulsar DMs.One approach uses lowfrequency data (specified as between ∼100 and ∼300 MHz) to independently monitor the IISM and construct DM time series that can be used to correct time variability in the DMs at higher frequencies (which are more sensitive to the pulsar but less to the interstellar dispersion).
While most of the observed DM variations to date have been interpreted as being caused by the IISM's turbulence, a second source of variability was identified in the so-called 'extreme scattering events' (ESEs). 3ESEs were first discovered as rapid variations in AGN flux densities (Fiedler et al. 1987), but were soon also detected in the flux density, arrival times and DMs of pulsars (Cognard et al. 1993;Maitia et al. 2003;Kerr et al. 2018).A variety of models and origins of ESEs has been proposed (see, e.g.Walker 2001), though commonly two prime models are used: either the ESEs are treated as individual 'lenses', that is, local overdensities or self-contained clouds (see, e.g.Romani et al. 1987;Walker & Wardle 1998;Cognard et al. 1993), or they are seen as part of the larger-scale turbulent structure in the IISM (as suggested by Fiedler et al. 1994;Coles et al. 2015).One possible way to differentiate between these two scenarios would be to probe the turbulence within an ESE, as attempted by Lazio et al. (2000).As reviewed by Bignall et al. (2015), the origin and IISM role of ESEs are at present not fully understood.However, their potential relation to scintillation arcs (higher-order interference in pulsar scintillation, see Stinebring et al. 2001) does suggest that filament-like structures (Brisken et al. 2010), which may be part of the larger-scale IISM turbulence (Pen & King 2012;Pen & Levin 2014), could contribute to the solution.More recently, Coles et al. (2015) found that several ESEs that were observed in pulsar observations did appear to be related to the IISM's Kolmogorov turbulence.In the context of pulsar timing, understanding the prevalence, origin and nature of ESEs is crucial before their impact on timing experiments (and their potential mitigation) can be evaluated.
In correcting time-variable DM delays in high-precision pulsar timing data (as discussed above), a major potential problem lies in the possibility of frequency-dependent DMs, also known as 'chromaticity'.Such a phenomenon could be induced in two different ways.As the dispersion law (Eq.1) is an approximation for the case ν ν p and ν ν c , it could be invalid in extreme cases (e.g.low frequencies, large electron densities or magnetic field strengths).In that case, assuming the dispersion law to be accurate would lead to a different DM measured at different frequencies (see Eqs. 5 to 7).
Chromaticity could also be induced by the fact that variations in electron density in the IISM do not only cause time variations in the measured dispersion, they also induce refraction of the radiation on several different spatial scales.This refraction causes rays to not travel along a perfectly straight line from the pulsar to the observer, but rather over some sort of 'random walk'.Since the strength of the refraction is frequency-dependent, this also means that the photons we receive at different frequencies traverse different parts of the IISM and therefore may sample regions with different electron densities.In principle this would lead to frequency-dependent measurements of DM in the case of inhomogeneous media.As demonstrated by Cordes et al. (2016) both theoretically and through simulations, the fact that the IISM volumes sampled by the radio waves differ across frequencies effectively causes the DM time series observed at low frequencies to be very similar to a low-pass-filtered (or smoothed) version of the DM time series measured at higher observing frequencies (see Figures 3 and 4 of Cordes et al. 2016).
In practice, this effect has not been observed, although limits have been placed using extremely wide ranges of frequencies (Hassall et al. 2012;Pennucci et al. 2014).An observational test of this phenomenon would allow realistic tests of how such DM chromaticity may affect the usefulness of low-frequency DM time series for correcting higher-frequency pulsar-timing data.
In this paper, we present high-cadence low-frequency observations of PSR J2219+4754, a slow pulsar discovered by Taylor & Huguenin (1969) with a DM of 43.5 cm −3 pc.Observed with the LOw Frequency ARray (LOFAR) telescope, it is one of the brightest sources in the northern sky (see Bilous et al. 2016).Combined with the high fractional bandwidth of LOFAR, this leads to a very high DM measurement precision (see Verbiest & Shaifullah 2018).Strong DM variations have previously been reported for this source by Ahuja et al. (2005).Additionally, this pulsar is known to show profile-shape variations.While Suleymanova & Shitov (1994) concluded that the profile-shape variations they observed are intrinsic to the pulsar, more recently, the analysis of our companion paper (Michilli et al. 2018) suggests an interstellar origin.
In Section 2 we describe the observations used in our work, while Section 3 explains the steps taken in deriving the DM time series (and the detected frequency dependence of the measured DM values).Section 4 discusses the nature of the DM variations observed and assesses possible implications for pulsar timing.Section 5 concludes the paper by summarising our main findings.

Observations
Our analysis is based on data from three German stations of the International LOFAR Telescope (ILT, van Haarlem et al. 2013), namely the stations in Effelsberg (telescope identifier DE601), Tautenburg (DE603) and Jülich (DE605), between 25 February 2013 and 25 November 2016 (see Table 1).LOFAR is described in detail by van Haarlem et al. (2013) and some aspects of particular relevance to pulsars are described more in-depth by Stappers et al. (2011).In contrast to the set-up described in those papers, the observations used in our work were carried-out in a 'stand-alone' mode, in which the individual stations were disconnected from the ILT network and used as independent telescopes.While in stand-alone mode, the beamformed data were sent from the stations in Effelsberg, Tautenburg and Jülich to the Max-Planck-Institut für Radioastronomie (MPIfR) on dedicated, high-speed links, where recording computers ran the dedicated LOFAR und MPIfR Pulsare (LuMP4 ) data-taking software, which formats and otherwise prepares the beamformed pulsar data for subsequent (offline but near-real-time) phase-resolved averaging (commonly referred to as 'folding') using the dspsr software package (van Straten & Bailes 2011).This produces data cubes with resolution in frequency (195.3125kHz-wide channels), time (10-sec sub-integrations), polarisation (four coherency products) and rotational phase (1024 phase bins).
A few changes to the set-up of this network were made over the years during which these observations were taken.Specifically, in August 2013 a new data-taking mode was deployed to allow a reduction in the number of bits with which the data were recorded (reduced to eight bits from the original 16 bits).Since the total data rate which the recording computers and network links can keep up with limits the total amount of data that can be recorded, this reduction in bits enabled an increase in observing bandwidth.Consequently, the observing bandwidth doubled from 47.7 MHz to 95.3 MHz starting in mid August 2013.This change in bandwidth also slightly changed the centre frequency of the observations, which moved from 138.77 MHz to 149.90 MHz at that time.This implies a shift of the centre frequency by an integer number (57) of frequency channels.(The last observation with 47.7 MHz of bandwidth in our data set was taken on 19 August 2013 while the first with 95.3 MHz of bandwidth was recorded on 27 August 2013.) For technical reasons, we restricted the bandwidth of observing to 71.5 MHz from February 2015 onwards.In order to minimise the impact of this bandwidth reduction on the scientific quality of the data, the observed bandwidth was kept centred on the most sensitive part of the bandpass, thereby causing the centre frequency to shift slightly from 149.90 MHz to 153.81 MHz.This shift was again made by an integer number of frequency channels (20 channels in this case), so that the frequencies of individual channels remained constant over the entire dataset.

Data analysis
The data analysis has been carried out using the psrchive (Hotan et al. 2004;van Straten et al. 2012), tempo2 (Hobbs et al. 2006) and coastguard (Lazarus et al. 2016) software packages, as detailed below.

Pre-processing
Before any of the other analysis steps were carried out, the data were inspected for man-made radio-frequency interference (RFI) and any affected channel-subintegration combinations were removed from the data using the 'Surgical' algorithm from the clean.pyscript, which is part of the coastguard python Table 1.Summary of observations.Given are the telescope identifier; the number of observations N obs ; the time span of the observations; the range of observation lengths and the median observation length; the bandwidth of the observations (which changed for DE601 and DE605 as discussed in Section 2) and the geographical location of the stations.package. 5In addition to this, throughout the analysis the original data from outlier points were also visually inspected to ensure the absence of RFI.Where needed, RFI was manually removed using the psrchive program pazi upon which the processing was repeated.(The number of data points that were reprocessed in this way was minimal.)Typically about 25% of our data were excised due to presence of RFI.This percentage varied depending on the time of day of the observation (with higher prevalence of RFI during the day and lower during the night) but no significant difference was seen in the RFI fraction of the different stations.
The data were calibrated in polarisation following the methods outlined in Noutsos et al. (2015), after which the coherency products were combined to yield total intensity.Given the limited length of the observations (typically shorter than two hours) and the fact that the hour angle and elevation of all observations were highly similar as the observations were always scheduled close to or across transit, the calibration (and therefore also its imperfections) did not significantly affect our analysis.Next, the total-intensity profiles were averaged in time, resulting in a single, frequency and phase-resolved pulse profile for every observation.
The data were not averaged in frequency, but in order to produce as homogeneous a data set as possible, the data observed with 95.3 MHz of bandwidth were downsized to 71.5 MHz instead, by cutting out the edges of the band, where the sensitivity is low due to the presence of filters (van Haarlem et al. 2013).The full-bandwidth data remains available upon request for possible follow-up investigations, when needed.

Timing and DM time series
In order to determine the DM time series, we used the pulsartiming technique (see Lorimer & Kramer 2005).While an accurate timing model is in principle not required for instantaneous measurements of DM, the time-averaging of observations does improve if the pulsar timing model is of good quality.Consequently we carried out an initial, straightforward timing analysis based on a single analytic template profile constructed of von Mises functions (see, e.g.Jammalamadaka & SenGupta 2001) of the form: with A being the amplitude of the component, κ the so-called compactness, and µ the pulse phase.One full rotation corresponds to one unit in x.Around five of these functions were fitted to an arbitrary, fully frequency-averaged high-S/N observation, using the psrchive program paas.This analytic template was cross-correlated against the profiles of each frequency channel of each observation (using the standard method described by Taylor 1992), after which the tempo2 software package was used to fit for the DM at each observing epoch.Subsequently these daily DM measurements were held fixed in the timing model of Hobbs et al. (2004) while the entire data set was used to fit the pulsar's spin period, spin period derivative and position.This updated timing model was then used to re-do the time-averaging of the observations, after which the process was iterated until the timing model and DM time series converged.After this initial timing analysis, the proper motion in the timing model was updated from the values of Lyne et al. (1982) to those of Michilli et al. (2018) (which was not available at the start of our analysis), although this had no significant impact on our results given the short time-span of our observations.Since this pulsar exhibits large amounts of timing noise (Hobbs et al. 2004) and because our timing analysis fully ignored frequencydependent profile evolution and the temporal evolution of the profile described in our companion paper (Michilli et al. 2018), the results of our timing analysis were not ideal and certainly not predictive enough to warrant publication of the timing model.However, the timing model thus obtained did succeed in phasealigning our observations to within ∼ 400 µs, that is, within a phase bin; and much more precisely within any given observation.We therefore used this timing model for the final timeaveraging of our data.
In order to determine highly precise and reliable DM values, a more advanced timing analysis was carried out.For this analysis we used a frequency-resolved template.To create this template we combined two long observations taken with DE601 on 19 May 2015 (MJD 57161), from 03:00 to 12:00 UTC and on 20 May 2015 (MJD 57162), from 00:00 to 07:50 UTC, for a total effective duration of 16.6 hours.This observation was averaged in time and summed to total intensity, providing a frequency-resolved pulse profile with a S/N a few times that of the typical observation.This template was subsequently used as the phase reference for timing and was otherwise fully omitted from our analysis.The pulse times-of-arrival (ToAs) were in this case determined using the Fourier-Domain Monte-Carlo (FDM) approach6 , as advised by Verbiest et al. (2016), on a channelby-channel basis (i.e.resulting in up to 366 simultaneous ToAs per observation).Since the template profile is data-derived and frequency-resolved, any static frequency dependence of the template shape would not affect our analysis, as it is inherently taken into account.(Also, any DM measurement derived from this analysis is by definition referred to the DM incorporated in this template, thereby rendering an absolute DM measurement im- possible.)However, time variability of the pulse profile shape (as investigated by Michilli et al. 2018), does have the potential to negatively impact the reliability of our results.This is discussed in detail in Section 3.3.
In this more advanced timing analysis we do not fit any time-dependent timing-model parameter but exclusively for DM on an observation-by-observation basis.Since these DM fits are based on a set of simultaneous ToAs (i.e.those derived from the different channels of the given observation), no timedependent timing-model parameters affect our results.An arbitrary phase offset is routinely subtracted along with every fit, in order to prevent biases discussed by Keith et al. (2013).The DM measurements from this analysis had a median uncertainty of 3.7 × 10 −5 cm −3 pc and their time series is presented in Figure 1, but before deeper consideration of these DM values, some corrupting influences will be discussed below.

Impact of pulse-shape variations on the DMs
As mentioned in the previous paragraph, time variations in the shape of the pulse profile could corrupt our measured DM values, particularly if this time variability is also frequency dependent.The profile-shape evolution discussed in our companion paper (Michilli et al. 2018) is of particular concern since it is known to affect our data and has been shown to be frequency dependent.Specifically, Michilli et al. (2018) report time-variable scattering that shows up as additional pulsed components at and near the trailing edge of the pulse profile.As scattering is frequency dependent, so are these components, being more pronounced at lower frequencies than at higher frequencies.The net effect of such additional profile components on our DM measurements would be to delay the measured ToAs; and since the additional components are more pronounced at lower frequen- cies, this would lead to an overestimate of the DM. Figure 2 shows the worst-case profile-shape variability7 present in our data set, comparing the pulse profile at MJD 56570, when the scattering components are most pronounced, to the standard template from MJD 57161 when the scattering is minimal, but some reflected 'echo' images of the main pulse are faintly visible at longer lags (see Michilli et al. 2018, for a full discussion of these pulse-shape variations).
To quantify the impact of the scattered power in the trailing edge of the pulse profile shown in Figure 2, an analytic model of the (unscattered) template profile was extended with extra power in the trailing edge to match the one shown in the observation of MJD 56570.By timing both of these analytic pulse models (the one with additional power in the trailing edge and the one without) and repeating the process at three different frequencies, we could directly determine the impact the scattering has on the ToAs at different frequencies.
Fitting for the DM for the scattered and unscattered profiles returns a difference of ∆DM = 1.5(4) × 10 −4 cm −3 pc.We note this DM difference is the maximum offset induced by profileshape changes.A similar test for the reflected 'echo' images visible at later MJDs lead to an insignificant impact on the DM.In comparison to the DM variations and DM measurement precision shown in Figure 1, we note that enhanced scattering by this amount does slightly alter the results, but does not fundamentally change the shape of the (much stronger) DM variations we identified.
Now that the maximum impact of the profile-shape variations on our DM measurements has been established, we investigate how the amplitude of these profile-shape differences evolves in time.Specifically we are concerned with the amplitude variations of the scattered power in the trailing edge that is visible primarily at the earlier epochs: MJD 56400-57000 at pulse phase Fig. 3. Pulse-shape variations for PSR J2219+4754.Shown are the profile-shape differences between the observations and the standard template, which is indicated by the dashed horizontal line at MJD 57161.The vertical line indicates the position of the peak of the pulse profile.For a clearer view, only one observation is plotted for every four-week interval.The pulse profiles used for this plot were fully averaged in frequency and downsampled to 256 profile bins (to reduce the radiometer noise in the plot).The differences between observations and standard template were calculated after peak-normalising and aligning the profiles by their leading edge and peak.This figure is consistent with Figure 4 of Michilli et al. (2018), who used a slightly extended dataset.
elongations of ≤ 2% from the pulse peak.This is in contrast to the lower-lying 'echo' images analysed by Michilli et al. (2018), which have much lower amplitude (and consequently less impact on timing or DM measurements) and lie at pulse longitudes further away from the pulse peak (> 2% of a pulse period).
Figure 3 shows the profile-shape differences of our observations with respect to the template observation.This clearly shows the excess power in the trailing edge at early epochs, with an amplitude that decays in time.At the epoch of the template profile, the scattering reaches a minimum and the pulse-profile is stable henceforth, with the exception of radiometer noise and the lower-lying variations documented by Michilli et al. (2018).To more quantitatively evaluate this evolution, the maximum and Fig. 4. Maxima and minima of the profile-shape differences shown in Figure 3. Error bars indicate 2.88 times the off-pulse RMS (since for Gaussian noise there is a 1/256 chance of a measurement falling more than 2.88σ away from the mean of the distribution and the pulse profile differences considered here consist of 256 bins).The profile-shape distortion due to scattering is clearly visible before MJD 57000 with a residual amplitude that decays linearly in time.Between MJDs 57000 and 57600 the profile differences are not fully consistent with Gaussian noise, but are stable.After MJD 57600 some further, lower-level, variations occur.minimum values of these profile-shape differences are plotted as a function of time in Figure 4.Here the linear decay of the profile residuals between the start of our data set (MJD 56436) and MJD 57000 is clearly seen.Noting that the DM impact of the worst-scattered profile (as quantified above) is 1.5 × 10 −4 cm −3 pc and that the amplitudes of any profile-shape variations after MJD 57000 are lower by at least a factor of five (or more), we do not expect DM corruptions at levels beyond 10 −4 cm −3 pc beyond this date.For dates before MJD 57000 we consider a potential DM overestimation with amplitude 1.5 × 10 −4 cm −3 pc at MJD 56570 and linearly decreasing to zero by MJD 57000.

Frequency-dependence of DM
The DM time series described earlier (Figure 1, Section 3.2) was derived by carrying out a standard least-squares fit to ToAs from the various frequency channels of any given observation, using the tempo2 pulsar-timing software.During these fits, the median reduced χ 2 value was 2.62, indicating that either the input ToA uncertainties were underestimated, or that unmodelled (i.e.frequency-dependent) structure was present in the post-fit timing residuals.Non-unity reduced χ 2 values are not abnormal in pulsar timing since several reasons for inaccurate estimation of ToA errors could be present (see Verbiest & Shaifullah 2018, for an extensive review).Typically, however, these effects only cause reduced χ 2 values that are lower than two (Verbiest et al. 2016).A more astrophysical potential cause, which is expected to be most pronounced at low frequencies, is chromaticity of the observed DMs.
As described in Section 3.2, since our timing is based on a data-derived template, we are only sensitive to differences in DM between the observation and the template observation and hence mere 'DM chromaticity' would not be visible in our analysis.However, if this frequency-dependent DM would be variable in time (as might be expected given the significant changes in the overall DM), then any non-ν −2 dispersive effects should be equally time-dependent.
Figure 5 shows the time series for the DMs measured from the top part centred at 169 MHz) and bottom part (118-149 MHz, centred at 133 MHz) of our band, along with the difference between these.The frequency-dependence of these DM measurements is highly significant and exceeds the maximal impact of the profile-shape variability, as quantified in Section 3.3, by an order of magnitude.

Structure function analysis
Given the strong and significant DM variations we detected -as well as the frequency-dependence of these measurements -it is worthwhile to evaluate the overall structure of the IISM towards this pulsar in order to verify whether the DM variability could be explained by standard IISM turbulence or not.To this end, we compute the structure functions of the DM variations shown in Figures 1 and 5 and compare these to a Kolmogorov turbulence density spectrum, which is known to usually be a good approximation for the IISM density spectrum in general (Armstrong et al. 1995;Keith et al. 2013), although deviations have been reported (see Gupta 2000, for a review).The structure function at a given time lag τ is derived from the DM time series using the following equation: using a weighted mean (i.e. the ∆DM values were weighted by 1/(σ 2 DM(t+τ) + σ 2 DM(t) )).The uncertainties of D DM (τ) are derived through Monte-Carlo simulations, by varying the DM time series according to the DM measurement uncertainties and thereby identifying the 68% confidence intervals of D DM (τ) over 10,000 simulations.
Figure 6 shows the structure function of the DM time series presented in Figures 1 and 5. Due to the high observing cadence with the GLOW stations, the shortest lags we sample in our structure function go down to a few days.Previously published structure functions of DM time series sampled time lags down to 10s of days (You et al. 2007;Jones et al. 2017) or about 100 days (Keith et al. 2013), so with our dataset we extend the range at which Kolmogorov turbulence has been tested in DM time series by one order of magnitude.
We find that the structure function of the DM time series agrees extremely well with a Kolmogorov spectrum.This conformity seems to corroborate the findings by Pen & King (2012); Pen & Levin (2014) that ESEs may actually be part of the largerscale IISM turbulent structure rather than separate, localised, events.
Using Eq. 28 of Lam et al. (2016) for Kolmogorov turbulence, neglecting Earth's motion and assuming a constant amplitude of the electron density power-law spectrum (C 2 N ) along the ling of sight, the latter can be directly related to the amplitude of the DM structure function: with the proper motion µ and the pulsar distance z p .From the Kolmogorov power-law fit to the structure function for time scales up to 200 days, we get With a proper motion of 22.2 mas/yr (see Michilli et al. 2018) and a pulsar distance of 2200 pc (Cordes & Lazio 2002) 8 , we get C 2 N = 0.9 • 10 −3 m −6.67 .This is in close agreement with the findings of Armstrong et al. (1995), that C 2 N = 10 −3 m −6.67 fits a huge range of spatial scales and thus supports the idea that the observed variability is part of the general IISM turbulence rather than a stand-alone ESE.

Origin of the DM variability
Both the amplitude and shape of the DM variations shown in Figure 1 are reminiscent of ESEs presented elsewhere in the literature (e.g.Cognard et al. 1993;Coles et al. 2015) and interpreted as individual lenses of ionised matter.In addition to this, the scattering events seen in this pulsar and discussed in our companion paper (Michilli et al. 2018) are consistent with a number of contained, refractive lenses near the line of sight.Nevertheless, the structure function (Figure 6) is fully consistent with a Kolmogorov spectrum.In the following, we will consider a simplistic model based on three individual, spherical interstellar lenses in order to compare the required lens sizes and densities to those previously published for ESEs. 9We presume these lenses to be the cause of the DM excess around MJDs 56600 and 57000.The steep drop-off in DM towards the end of our data set will be disregarded as further monitoring of this pulsar's DM time series is required for this event.(Particularly the apparently third ESE which seems to only affect the lower-frequency part of our data deserves further analysis and continued observations.) In order to identify the presumed ESEs clearly, we define a baseline DM level of 43.482 cm −3 pc, which is the average DM value from MJD 57100 to MJD 57200.This value is chosen as reference because it describes the only time window in our data where no variations in DM are observed.It is also used as the reference value for all plots.The reference observation against which the timing was performed was also selected from within this MJD range.We furthermore only consider the DM peak near MJD 57000 because this is the most easily identified and the differences with the other potential ESEs are well within the uncertainties of our model, so similar results can be considered to hold for all three presumed clouds.
We will now model the aforementioned DM peak, assuming a spherical, homogeneous cloud of ionised gas to be causing it.The time from the maximum to the end of the peak is 150 days, so the total duration of the cloud passage is estimated to be 300 days.Using a proper motion of 22.2 mas/yr (see Michilli et al. 2018), the angular size of the cloud can be calculated as θ = 18 mas.To estimate the physical size, the distance to the cloud is needed.This distance can only be estimated, but it has to be lower than the distance to the pulsar, which is estimated to be 2.2 kpc (Cordes & Lazio 2002).Combining the angular size of the ESE and the distance to the pulsar, the maximum size of the cloud (if it was directly in front of the pulsar) is 40 AU.
Given the assumption of a spherical object, the maximum path length through the cloud is equal to its lateral extent, and from this path length and the maximum DM increase one can https://www.nrl.navy.mil/rsd/RORF/ne2001/.Recently, Yao et al. (2017) published a new Galactic electron density model (http://119.78.162.254/dmodel/), which gives a consistent distance estimate of 2.4 kpc. 9 We cannot analyse potential correlations between scintillation and DM variability like Coles et al. (2015) did because the scintillation bandwidth is smaller than our frequency resolution.
Article number, page 7 of 11 calculate the average excess electron density in the cloud.From the upper limit on the cloud size follows a lower limit on the average extra electron density, which is 15 cm −3 .Compared to a typical electron density in the Warm Ionised Medium of 0.1−0.5 cm −3 (Haverkorn & Spangler 2013), our model suggests an overdensity of about two orders of magnitude.The solid lines in Figure 7 show the estimated cloud size and its electron density depending on its distance to the Earth.Some values from the literature of similar estimates are added for comparison: the first ESE observed by Fiedler et al. (1987, without distance estimate), an ESE observed in flux density and timing residuals modelled as two clouds of identical size by Cognard et al. (1993), the threeyear-long ESE observed in flux density by Maitia et al. (2003, without uncertainties), and the models of two ESEs observed by Coles et al. (2015).These latter two ESEs were rescaled to be compatible with pulsar distances based on the parallax measurements of Ng et al. (2014) and Reardon et al. (2016), corrected for the Lutz-Kelker bias following the analysis by Verbiest et al. (2012) as corrected by Igoshev et al. (2016).
When comparing the size and density estimates to typical values from the literature, this particular ESE could be anywhere along our line of sight without falling out of the sample, as the densities and sizes in the literature span the entire range of possi-ble values (with an exception for clouds that are very close to the Earth as the implied density would become unrealistically high).In particular a cloud roughly halfway to the pulsar (in agreement with Michilli et al. 2018) with a size of about 20 AU and a density of a few tens of electrons per cm 3 would be highly comparable to previously observed ESEs.All calculations for this simple model are only very rough estimates, as it is impossible to disentangle the different components of the DM time series and to find the correct DM baseline.It is, for example, highly likely that there are multiple clouds of various sizes, which overlap (see also the discussion by Michilli et al. 2018).The steepest DM decrease, which starts around MJD 57000, can be interpreted as the edge of one cloud and lasts about 75 days.Estimating the cloud to be located half-way to the pulsar, this edge is about 5 AU thick.This is roughly of the order of what Brisken et al. (2010) found for elongated filaments, so it could be the case that the filaments they see are actually the edges of ionised clouds (see also Pen & Levin 2014;Liu et al. 2016).  5.The spatial scale is estimated for turbulence half-way to the pulsar, using the proper motion published by Michilli et al. (2018).The Kolmogorov turbulence model is characterised by the structure functions of simulated DM time series derived from a power spectrum with spectral index 5/3.The dotted lines represent the sample average and 1σcontours of 1000 iterations.The amplitude of the Kolmogorov model was scaled to fit the first 200 days of the data, because the structure function becomes very uncertain at the longest lags.

Origin of the DM frequency-dependence
In the following, we will assume that the observed DM difference was caused by deviations from the dispersion law (Eq.1) due to invalidity of the assumptions that ν ν p or ν ν c .The observed variability of ∆DM of the order of 10 −3 cm −3 pc between our two bands centred at 133 MHz and 169 MHz corresponds to an additional time delay of the order of ∼ 200µs, which is smaller than the total dispersive delay by a factor of about 5e-5, so the variability of the sum of T 1 and T 2 would have to be of the order of 5e-5 as well (see Eq. 5) if we assume a homogeneous medium along the line of sight.To relate T 1 to an electron density, we insert Eq. 3 into Eq.6 and solve for n e .To relate T 2 to a magnetic field strength along the line of sight, we similarly insert Eq. 4 into Eq.7 and solve for B = B cos γ: T 1 or T 2 of the order of 5e-5 would imply variations of the electron density of the order of 10 4 cm −3 , or variations of the magnetic field along the line of sight of the order of 10 −3 G.This is equivalent to the DM and RM varying by many orders of magnitude of their actual value.As shown in this paper, the DM only changes by fractions of its total value.A detailed analysis of the RMs in these data is in progress and will be published in due course; but any RM variations towards this source are minimal and would not satisfy this scenario, either.
As DM chromaticity is usually not observed, a localised structure as the origin seems far more likely.If a structure corresponding to a DM excess of the order of 10 −2 cm −3 pc was causing the observed variations of the frequency-dependence (as expected from the DM time series in Fig. 1), this would require variations in T 1 + T 2 of the order of 0.2, implying electron density variations of the order of 4 • 10 7 cm −3 , or variations of the magnetic field strength of the order of 3 G.To only contribute 10 −2 cm −3 pc to the DM, the structures would need to be 5e-5 AU thick, which is in strong contradiction with the long duration of these variations.Specifically, a 5e-5 AU-thick region with an electron-density excess of the order 10 7 cm −3 could be generated by a chance alignment of a star with the line of sight, but given the rapid spatial motion of the line of sight, such an alignment would pass quickly.Michilli et al. (2018) also discussed the potential impact of stellar winds on the observations of this pulsar, but in that scenario the offset of the relevant star to the line of sight is too far to allow the mechanism proposed by Tanenbaum et al. (1968) as an explanation for the frequency dependence of the DM since the size and density of the stellar-wind bubble along the line of sight would not match the predictions derived above.
As these scenarios are highly unrealistic, we conclude that deviations from the dispersion law as described by Tanenbaum et al. (1968) are not the cause of the frequency-dependent DMs we detected.We therefore favour the explanation from Cordes et al. (2016), that refractive effects lead to a frequencydependence of the medium the pulsar radiation passes through.While the DM time series at 133 MHz does on the whole look smoother than the variations at 169 MHz (as expected from the analysis by Cordes et al. 2016), at a few epochs (e.g.around MJD 57100 and shortly after MJD 57200) the lower frequencies show more dramatic DM trends, which may be in tension with the theoretical expectations of this model.

Consequences for high-precision pulsar timing
Variations in the DM that cannot be accurately and precisely measured and modelled are a problem for pulsar timing, as they add a time-dependent extra delay to the ToAs.If the DM variations we reported here were to occur along the line of sight to pulsars used in high-precision timing experiments, they would corrupt astrophysically relevant parameters and would significantly reduce sensitivity to interesting signals (see, e.g.You et al. 2007) if their impact on the data was not removed by, for example, measuring the DM at every epoch and correcting for it.This correction can only be done with simultaneous multi-frequency or low-frequency data, which are not always available (see e.g. the first IPTA release, Verbiest et al. 2016).In the potential ESE discussed in this paper, the maximum difference in the extra time-delay across the entire dataset (caused by a DM difference of 6 × 10 −3 cm −3 pc) at the commonly used 21-cm wavelength would imply structures of the order of 13 µs in the timing residuals.This is well above the precision needed for high-precision pulsar timing experiments, which require sub-microsecond precision (see, e.g.Jenet et al. 2005).The ToA difference across a 250-MHz bandwidth centred at 1.4 GHz due to the extra dispersion is 2 µs, so the ToA precision has to be substantially better to properly measure and correct the impact the ESE has on the data (we note that the median timing precision in the IPTA is currently 2.5 µs, Verbiest et al. 2016).As shown by Lee et al. (2014, Eq. 12), the precision of this correction would however be about an order of magnitude worse than the ToA precision and averaging the DM values to increase precision is typically not a valid solution because of the usually low sampling rate (see Verbiest et al. 2016, and references therein) and possible short time scales of the variations.Thus, correcting high-frequency observations with DM values measured from that same observation would not suffice to correct for DM variations similar to the ones presented in this paper.
While low-frequency data are very useful in computing highly detailed measurements of DM variations, the chromaticity we have presented does cause concern as it may imply a mismatch between the DM values observed at lower frequencies and those observed at higher frequencies.We note, however, that chromaticity mostly perturbs DM variations on short timescales, whereas the long-term DM trends tend to show reasonable levels of agreement (as suggested by Cordes et al. 2016, and confirmed by our analysis, see the bottom panel of Figure 5).Since the spatial electron-density spectrum in our Galaxy (and therefore the spectrum of DM variations) has more power at lower frequencies (Armstrong et al. 1995), it is particularly the longer-term DM variations that require correction in highfrequency pulsar-timing data, which implies the impact of chromaticity may be limited.Further studies of chromatic DM variations in pulsars like PSR J2219+4754 -particularly studies that extend our present analysis to include a wider range of observing frequencies -should allow more conclusive answers to these questions.

Conclusions
We have presented strong and rapid DM variations along the line of sight towards PSR J2219+4754, which have a similar amplitude as the variations commonly seen in millisecond pulsars (see, e.g.Keith et al. 2013).The variations we reported may be caused by a group of interstellar clouds typically referred to as ESEs.The ESEs in this paper would be some of the longest and most persistent observed to date.This is also the first time an ESE is observed in electron density and scattering at the same time, although the scattering may well be caused by different IISM structures that have only a minimal impact on the DM (for the quantitative analysis of the scattering, see the companion paper by Michilli et al. 2018).
Our frequent observations with the international LOFAR stations allow detailed and highly precise monitoring of the DM time evolution.The high measurement precision makes us sensitive to details, which complicates any efforts to provide an accurate model for the underlying IISM structures.Additionally, there is a large number of unknown parameters like the distance or proper motion of the IISM structures and the DM baseline level.The simple spherical model discussed in this paper does not provide definite values, but does provide limits on the potential ESE's electron density and size, which are of the same order of magnitude as previous results for ESEs.Further observations of ESEs will help to improve the constraints on the size, because the lack of knowledge of the distance to the IISM structures is less of an issue when a larger sample of observations is given, assuming a homogeneous distribution of ESEs in the Galaxy.We furthermore point out that the variations presented here could well be part of a uniform spectrum rather than separate, distinct, structures.
Finally, we have presented the first observational evidence for frequency-dependent DMs and have confirmed that the longterm DM trends are consistent across the frequencies we probed, whereas the shorter-term DM structure is highly chromatic.This bodes well for efforts to apply low-frequency DM time series as corrections to high-frequency pulsar-timing data, although further study across a wider range of frequencies should be undertaken to quantify any potential corruptions such corrections would cause.

Fig. 1 .
Fig.1.DM variations in the direction of PSR J2219+4754.The vertical line indicates the time at which the last change to the centre frequency and bandwidth of our data occurred, i.e. all data to the right of the dashed line have identical bandwidth and centre frequency (except for slight deviations due to variations in RFI excision).The two arrows indicate the observations that were used to quantify the effect of scattering (see Section 3.3), with the latter of these two (MJD 57161) also being the standard template.A DM baseline of 43.48205 cm −3 pc has been subtracted.The additional error bars on the lower end of the pre-MJD 57000 data points indicate the expected impact of profile scattering, as described and quantified in Section 3.3.The second y-axis indicates the corresponding dispersive delay at an observing frequency of 1.4 GHz.

Fig. 2 .
Fig.2.Peak-aligned pulse profiles for a scattered observation on MJD 56570 and an unscattered observation on MJD 57161, which is part of the standard template.The profiles were aligned to have the leading edge and peak (which have identical shape at both dates) to align; and the pulses shown are integrated over the full bandwidth of 71.5 MHz.

Fig. 5 .
Fig. 5. (top) DM time series for the upper and lower halves of the observing band.We note that until MJD 56524, the observing bandwidth was only 47 MHz, which leads to worse DM precision and a smaller DM difference.(bottom) Difference between the DM measured in the top and bottom part of the band.By definition (i.e. through selection of the standard profile) no DM difference is present at MJD 57161.While the long-term DM gradient appears consistent between the two bands, the higher-frequency DM structures are shown to affect the high-frequency band more than the low-frequency band.The second y-axis in both panels indicates the corresponding dispersive delay at an observing frequency of 1.4 GHz.

Fig. 6 .
Fig.6.Structure functions of the DM time series from Figures1 and 5.The spatial scale is estimated for turbulence half-way to the pulsar, using the proper motion published byMichilli et al. (2018).The Kolmogorov turbulence model is characterised by the structure functions of simulated DM time series derived from a power spectrum with spectral index 5/3.The dotted lines represent the sample average and 1σcontours of 1000 iterations.The amplitude of the Kolmogorov model was scaled to fit the first 200 days of the data, because the structure function becomes very uncertain at the longest lags.

Fig. 7 .
Fig. 7.Estimated cloud size and electron density for the second component of the ESE depending on its distance to the Earth, represented as solid lines in the plot.Horizontal lines represent literature values without distance estimates.The vertical line represents the distance estimate for the structures that cause the scattering 'echoes' from our companion paper(Michilli et al. 2018).The cloud size estimates ofCognard et al. (1993) for the two clouds forming the ESE are very similar (0.050 AU and 0.094 AU) and thus indistinguishable in the plot.The error bars on the cloud sizes ofColes et al. (2015) represent the non-spherical character of the clouds and not the actual uncertainty of the cloud size.