Spoofing attacks on GNSS signals have become more serious than ever due to advances in spoofing algorithms and in computational and processing capabilities. A spoofing attack transmits counterfeit signals that lead to wrong positioning, navigation and timing (PNT) solutions.
Evil waveform (EWF) attacks, although very rare, can have a similarly severe impact on PNT solutions. However, instead of originating from an attacker, as in spoofing, an EWF originates from a genuine satellite whose transmission payload has failed.
Research on detecting and mitigating spoofing and EWF attacks is therefore important and urgently needed. To support this research, public reference datasets are needed for developing spoofing and EWF detection and mitigation algorithms.
Such datasets can be freely downloaded from publicly accessible links. Because everyone works with the same reference datasets, different algorithms can be compared fairly and their performance can be assessed correctly.
By the end of this post, we will know which public datasets are available for signals with spoofing or EWF, what each dataset contains and where to download it.
The datasets covered are TEXBAT, OAKBAT and the ESA EWF dataset.
TEXBAT spoofing dataset
TEXBAT (the Texas Spoofing Test Battery) was developed by Prof. Todd Humphreys and his research group at The University of Texas at Austin. It focuses on overlapping (aligned) spoofing attacks.
The dataset contains six spoofed GPS L1 scenarios (ds1-ds6) and two additional scenarios (ds7 and ds8). The authentic GPS signals are real recordings, whereas the spoofing signals are simulated with parameters estimated from the authentic signals.
Each dataset is obtained by combining (mixing) the recorded authentic GPS signals with the simulated spoofing signals.
The six datasets cover various spoofing scenarios, such as static and dynamic scenarios and position-push or time-push spoofing attacks. Table 1 describes the six datasets.
The duration of the signal for each scenario is approximately 10 minutes.
The full bandwidth of the GPS L1 signal is about 30 MHz. This was determined by capturing GPS L1 signals with a high-gain (52 dBi), 46-meter-diameter radio telescope antenna at a sampling rate of 46.08 MHz.
Figure 1 below shows the full bandwidth of the GPS L1 signals. The power spectral density in Figure 1 shows that the PRN components extend to about $\pm 15$ MHz. Full bandwidth means that the captured signals contain essentially 100% of the transmitted GPS signal energy.
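A spectrum like the one in Figure 1 can be estimated from any complex baseband recording with an averaged periodogram. The sketch below assumes the I/Q samples are already loaded into a NumPy array; the function name `averaged_psd` and the segment length are illustrative choices, not part of any dataset tooling (`scipy.signal.welch` is the usual library alternative).

```python
import numpy as np

def averaged_psd(iq: np.ndarray, fs_hz: float, nfft: int = 1024):
    """Averaged periodogram of complex baseband samples (Hann window, no overlap).

    Returns (freqs_hz, psd) with 0 Hz in the middle, i.e. spanning -fs/2 .. +fs/2.
    """
    n_seg = len(iq) // nfft
    if n_seg == 0:
        raise ValueError("need at least nfft samples")
    segments = iq[: n_seg * nfft].reshape(n_seg, nfft)
    window = np.hanning(nfft)
    spectra = np.fft.fft(segments * window, axis=1)
    # Average squared magnitudes over segments and normalise to power per Hz.
    psd = np.mean(np.abs(spectra) ** 2, axis=0) / (fs_hz * np.sum(window ** 2))
    freqs = np.fft.fftfreq(nfft, d=1.0 / fs_hz)
    return np.fft.fftshift(freqs), np.fft.fftshift(psd)

# Example: a single complex tone at 4.5 MHz, sampled at 46.08 MHz
fs = 46.08e6
n = np.arange(64 * 1024)
tone = np.exp(2j * np.pi * 4.5e6 * n / fs)
freqs, psd = averaged_psd(tone, fs)
peak_hz = freqs[np.argmax(psd)]  # the spectral peak sits at the tone frequency
```

For a real recording, plotting `10 * np.log10(psd)` against `freqs` gives a picture comparable to Figure 1.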
Based on Figure 1, the TEXBAT datasets were recorded as I/Q samples at 25 MHz with 16-bit quantisation. This sampling rate was chosen based on the following considerations:
- Since the antenna front end has high-quality filtering, a flat frequency response over a 20 MHz bandwidth around the GPS L1 frequency can be obtained. This 20 MHz bandwidth captures 99.6% of the total power of the full GPS signals.
- For GPS P(Y) signals, the 20 MHz bandwidth preserves 99.54% of the full P(Y) spectral power.
- With 16-bit quantisation, the signal power that spills out of the original band because of quantisation is minimised, so the added broadband noise power and the resulting C/N0 reduction are minimised as well. Commonly, 4-bit quantisation would already be enough, with a C/N0 reduction of only 0.14 dB. However, 16-bit quantisation additionally provides a high dynamic range, so that weak signals can be recovered from the data even in the presence of much stronger ones (which can be important for some spoofing detection and mitigation algorithms to work).
- Finally, reducing the sampling rate by 5 MHz, from 30 MHz to 25 MHz, shrinks the recorded files by about 17% (5/30), saving both computational power and data storage.
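The bandwidth argument above can be sanity-checked with an idealized model. A minimal sketch, assuming an ideal BPSK spectrum for the GPS L1 C/A component only (chip rate 1.023 MHz, power spectral density proportional to $\mathrm{sinc}^2$), not the measured multi-component signal: it integrates the spectrum numerically over a two-sided bandwidth. The function name is hypothetical, and the model ignores the other signal components, the front-end filter and multipath.

```python
import numpy as np

def bpsk_power_fraction(two_sided_bw_hz: float, chip_rate_hz: float, n: int = 200_001) -> float:
    """Fraction of an ideal BPSK signal's total power inside +/- two_sided_bw_hz / 2.

    The PSD of a rectangular-chip BPSK signal is Tc * sinc^2(f * Tc). With u = f * Tc the
    total power integrates to 1, so the in-band integral is directly a power fraction.
    """
    x = two_sided_bw_hz / (2.0 * chip_rate_hz)  # half-bandwidth in units of the chip rate
    u = np.linspace(-x, x, n)
    psd = np.sinc(u) ** 2                        # np.sinc(u) = sin(pi*u) / (pi*u)
    du = u[1] - u[0]
    # Trapezoidal rule written out explicitly (avoids np.trapz / np.trapezoid version differences).
    return float(du * (psd.sum() - 0.5 * (psd[0] + psd[-1])))

CA_CHIP_RATE = 1.023e6
main_lobe = bpsk_power_fraction(2 * CA_CHIP_RATE, CA_CHIP_RATE)
in_20_vs_30 = bpsk_power_fraction(20e6, CA_CHIP_RATE) / bpsk_power_fraction(30e6, CA_CHIP_RATE)
```

`main_lobe` reproduces the familiar result that the main lobe holds about 90% of a BPSK signal's power, and `in_20_vs_30` comes out at about 0.997, i.e. close to the 99.6% quoted above when the ~30 MHz band is taken as the "full" signal. This agreement holds for the idealized C/A model only.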
Figure 2 below shows the experimental setup used to generate the TEXBAT spoofing datasets, together with an example of the effect of the spoofed signals on PNT solutions.
In Figure 2, the authentic signals are real signals recorded from GPS satellites. The spoofer signals are simulated by estimating the authentic signals' parameters (code delay, Doppler, power and other parameters). The authentic and spoofer signals are then combined and recorded into files.
The overlapping (aligned) spoofing examples in Figure 2 (from the ds3 dataset) show that a spoofer signal aligned with the authentic signal increases the received power, distorts the correlator output (which deviates from the ideal triangular shape) and causes errors in the calculated position (X, Y and Z).
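The correlator distortion described above can be illustrated with idealized correlation triangles. A toy sketch, assuming an ideal triangular code autocorrelation, a spoofer whose carrier is phase-aligned with the authentic signal (so the two correlation functions add coherently) and an early-minus-late (E-L) code discriminator; all function names, amplitudes and offsets are illustrative, not TEXBAT parameters.

```python
import numpy as np

def triangle(tau):
    """Ideal normalized code autocorrelation, tau in chips."""
    return np.maximum(0.0, 1.0 - np.abs(tau))

def combined_correlation(tau, a_auth, a_spoof, spoof_offset_chips):
    """Authentic + carrier-phase-aligned spoofer correlation (amplitudes add coherently)."""
    return a_auth * triangle(tau) + a_spoof * triangle(tau - spoof_offset_chips)

def el_lock_point(a_auth, a_spoof, spoof_offset_chips, spacing_chips=0.5, n=20_001):
    """Code delay (chips) where the E-L discriminator R(tau + d/2) - R(tau - d/2) is zero.

    The search is restricted to the interval between the authentic peak (0) and the spoofer peak.
    """
    half = spacing_chips / 2.0
    lo, hi = min(0.0, spoof_offset_chips), max(0.0, spoof_offset_chips)
    tau = np.linspace(lo, hi, n)
    disc = (combined_correlation(tau + half, a_auth, a_spoof, spoof_offset_chips)
            - combined_correlation(tau - half, a_auth, a_spoof, spoof_offset_chips))
    return float(tau[np.argmin(np.abs(disc))])

# Fully aligned spoofer: the correlation peak simply grows (a_auth + a_spoof).
peak_aligned = combined_correlation(0.0, 1.0, 1.1, 0.0)
# Spoofer 0.3 chip away: the tracking point is pulled from 0 towards the spoofer peak.
lock = el_lock_point(1.0, 1.1, 0.3, spacing_chips=0.5)
```

For these parameters the lock point lands at about 0.157 chip, which in this linear region equals the amplitude-weighted position $a_s\delta/(a_a+a_s)$ of the two peaks. A spoofer that slowly increases its offset therefore drags the receiver's code estimate with it in this toy model.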
The two additional spoofing datasets are ds7 and ds8.
The ds7 scenario extends the power-matched time-push scenario of ds3: in ds7, the spoofing signals are also carrier-phase aligned with the authentic signals. Hence, ds7 is an enhanced version of ds3.
The ds8 scenario, in turn, extends ds7. It represents a zero-delay security code estimation and replay (SCER) attack, which treats every received navigation data bit as an unpredictable low-rate security code and guesses the value of each bit in real time.
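The key difficulty of a zero-delay SCER attack is that the spoofer must commit to each data bit before it has observed the whole bit. A toy sketch of that trade-off, assuming ±1 data bits in additive white Gaussian noise and a spoofer that decides each bit from only the first `fraction` of its samples; the names are hypothetical and this is an illustration, not how ds8 was generated.

```python
import numpy as np

def scer_guess(bit_samples, fraction):
    """Guess a data bit (+1 or -1) from only the first `fraction` of its samples."""
    m = max(1, int(len(bit_samples) * fraction))
    return 1 if np.sum(bit_samples[:m]) >= 0 else -1

def guess_error_rate(n_bits, samples_per_bit, amplitude, noise_std, fraction, seed=0):
    """Monte Carlo rate of wrong guesses for random +/-1 bits in white Gaussian noise."""
    rng = np.random.default_rng(seed)
    bits = rng.choice([-1, 1], size=n_bits)
    errors = 0
    for b in bits:
        samples = b * amplitude + noise_std * rng.standard_normal(samples_per_bit)
        errors += int(scer_guess(samples, fraction) != b)
    return errors / n_bits

# Low SNR: deciding after 5% of the bit is far less reliable than waiting for the whole bit.
early = guess_error_rate(2000, 200, amplitude=0.1, noise_std=1.0, fraction=0.05)
late = guess_error_rate(2000, 200, amplitude=0.1, noise_std=1.0, fraction=1.0)
```

Waiting longer makes the guess more reliable, but a zero-delay attacker cannot afford to wait; this toy model only captures the estimation side of that tension, not the replay.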
The TEXBAT dataset (GPS L1) can be downloaded from here: https://radionavlab.ae.utexas.edu/texbat/
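Once a TEXBAT file is downloaded, it has to be read as raw I/Q samples. A minimal sketch, assuming the file is headerless and stores interleaved signed 16-bit I and Q values; the byte order is exposed as a parameter because it should be checked against the dataset's own documentation (little-endian is only the default here). The function name is hypothetical.

```python
import numpy as np

def read_iq_int16(path, n_samples, offset_samples=0, byteorder="<"):
    """Read `n_samples` complex samples from a headerless file of interleaved int16 I/Q.

    byteorder: "<" for little-endian, ">" for big-endian (verify against the dataset docs).
    """
    raw = np.fromfile(path, dtype=np.dtype(byteorder + "i2"),
                      count=2 * n_samples, offset=4 * offset_samples)  # 4 bytes per I/Q pair
    raw = raw[: (len(raw) // 2) * 2]                  # drop a dangling I value at end of file
    return raw.astype(np.float32).view(np.complex64)  # pairs (I, Q) -> I + jQ
```

For example, `read_iq_int16("scenario.bin", n_samples=25_000_000)` (hypothetical file name) would load about one second of data at 25 MHz; reading in chunks via `offset_samples` avoids loading a whole multi-gigabyte file at once.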
OAKBAT spoofing dataset
The OAKBAT (Oak Ridge Spoofing and Interference Test Battery) dataset was produced by a team at Oak Ridge National Laboratory. It is inspired by, and replicates, the TEXBAT dataset, especially its overlapping (aligned) spoofing scenarios. OAKBAT contains six spoofing scenarios similar to those in TEXBAT.
The main contributions of this dataset with respect to TEXBAT datasets are as follows:
- The main idea is to generate the dataset with commercial off-the-shelf (COTS) simulators so that anyone can replicate the generation of the spoofing signals.
- It also provides spoofing datasets for GALILEO E1 signals (TEXBAT only contains GPS L1 signals).
The I/Q files in OAKBAT are sampled at 5 MHz with 16-bit quantisation. At 5 MHz, the computational and storage requirements are five times lower than at the 25 MHz used in TEXBAT. However, the lower sampling rate captures less of the signal bandwidth, so the data quality is lower than in the 25 MHz TEXBAT recordings.
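The storage argument can be made concrete with back-of-the-envelope arithmetic. A sketch, assuming headerless files with interleaved I and Q components of 16 bits each and scenarios of exactly 10 minutes; real file sizes may differ, for example if the actual durations differ. The function name is hypothetical.

```python
def recording_size_bytes(sample_rate_hz: int, bits_per_component: int,
                         duration_s: int, components: int = 2) -> int:
    """Raw size of a headerless I/Q recording (components=2 for interleaved I and Q)."""
    return sample_rate_hz * components * bits_per_component // 8 * duration_s

TEN_MINUTES = 600
texbat = recording_size_bytes(25_000_000, 16, TEN_MINUTES)  # 60e9 bytes, about 60 GB
oakbat = recording_size_bytes(5_000_000, 16, TEN_MINUTES)   # 12e9 bytes, about 12 GB
ratio = texbat / oakbat                                      # 5.0
```

The ratio depends only on the sampling rates because both datasets use the same 16-bit quantisation.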
Table 2 summarises and compares the OAKBAT and TEXBAT datasets. As in TEXBAT, each OAKBAT scenario lasts approximately 10 minutes.
Figure 3 below shows the experimental setup used to generate the OAKBAT dataset with commercial off-the-shelf GNSS signal simulators. Because commercial simulators are used, anybody with access to such simulators can replicate the setup and create their own spoofing datasets.
In Figure 3, two commercial GNSS signal simulators (generators) are used: one generates the authentic signals and the other generates the spoofing signals. These two signals are combined to create the spoofed signal datasets, which are then captured and recorded by a USRP X310 software-defined radio (SDR).
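In the OAKBAT setup, the two simulator outputs are combined in RF hardware. A digital analogue of that step is sketched below, assuming both streams are already complex baseband arrays at the same sampling rate and time-aligned; the spoofer's power advantage over the authentic signal is a free parameter (0 dB corresponds to a power-matched attack). The function name is hypothetical.

```python
import numpy as np

def combine_with_power_advantage(authentic, spoofer, power_advantage_db):
    """Scale `spoofer` to `power_advantage_db` above the mean power of `authentic`, then add.

    Returns the combined stream and the amplitude gain applied to the spoofer.
    """
    p_auth = np.mean(np.abs(authentic) ** 2)
    p_spoof = np.mean(np.abs(spoofer) ** 2)
    gain = np.sqrt(p_auth / p_spoof * 10.0 ** (power_advantage_db / 10.0))
    return authentic + gain * spoofer, gain

rng = np.random.default_rng(1)
auth = rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000)
spoof = 3.0 * (rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000))
mixed, g = combine_with_power_advantage(auth, spoof, power_advantage_db=3.0)
```

Scaling by measured mean power, rather than by a fixed gain, keeps the power advantage meaningful regardless of how each simulator's output level was set.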
Figure 3 also shows an example of the spoofing effect on the calculated position and on the position error in X, Y and Z. The example uses the OS1a dataset, which is equivalent to the TEXBAT ds1 dataset.
The OAKBAT dataset (GPS L1) can be downloaded from here: https://doi.ccs.ornl.gov/ui/doi/100
The OAKBAT dataset (GALILEO E1) can be downloaded from here: https://doi.ccs.ornl.gov/ui/doi/102
ESA evil waveform (EWF) dataset
The EWF datasets were produced by the European Space Agency (ESA) in its navigation laboratory.
An EWF signal is a faulty authentic GNSS signal caused by a failure in the satellite payload (the signal processing chain in the satellite, from signal generation and modulation in the digital domain to transmission on a carrier in the analog domain). Such a failure produces a distorted pseudo-random noise (PRN) code and/or distorted baseband signals.
Although EWFs are very rare, they have happened before and may happen again in the future, so developing algorithms to detect and mitigate them is important.
The most famous EWF was reported in 1993 on PRN 19 and caused “undetected” position errors of up to 8 m. Such an error is unacceptable for critical applications such as aircraft guidance.
The ESA EWF datasets are sampled at 90 MHz in float32 I/Q format and contain EWF signals for GPS L1, GPS L5 and GALILEO E1.
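Because the ESA files store float32 I/Q samples, they can be read directly as complex numbers. A minimal sketch, assuming headerless files of interleaved little-endian float32 I and Q values, so that each complex sample occupies 8 bytes and maps onto NumPy's complex64 type; the exact file layout should be checked against ESA's documentation, and the function name is hypothetical.

```python
import numpy as np

def read_iq_float32(path, n_samples, offset_samples=0):
    """Read interleaved little-endian float32 I/Q as complex64 (8 bytes per complex sample)."""
    return np.fromfile(path, dtype=np.dtype("<c8"), count=n_samples, offset=8 * offset_samples)
```

At 90 MHz, one millisecond (one GPS L1 C/A code period) corresponds to 90,000 samples, so `n_samples=90_000` loads exactly one code period.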
The reasons for using such a high (90 MHz) sampling rate are:
- The correlator distortion caused by an EWF is very fine-grained, so a high-fidelity signal reconstruction is needed to capture and observe the correlator output distortion at the chip level (90 MHz gives about 88 samples per GPS L1 C/A chip).
- A high sampling rate is also required to generate code delays smaller than 0.12 chip. EWFs with code delays larger than 0.12 chip would be detected by GNSS monitoring stations around the world.
The ESA EWF signals have code delays (either lead or lag) of at most 0.12 chip ($\leq 0.12$ chip).
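A lead/lag of the chip edges like this is commonly modelled as the digital failure mode known as ICAO "threat model A", in which the falling edges of the code chips come early (lead) or late (lag). A toy sketch, assuming a random ±1 chip sequence instead of a real PRN code, only +1 to -1 transitions being shifted and no wrap-around at the end of the sequence: it builds the distorted waveform and measures the asymmetry of its correlation with an ideal replica at ±d/2, which is what biases an early-minus-late tracking loop. Names and parameters are illustrative.

```python
import numpy as np

def tm_a_waveform(chips, samples_per_chip, delta_chips):
    """Return (ideal, distorted) sampled code with falling edges shifted by delta_chips.

    delta_chips > 0: lag (the +1 chip lasts longer); delta_chips < 0: lead (it ends early).
    Assumes |delta_chips| < 1 so that a shift never crosses a whole chip.
    """
    ideal = np.repeat(np.asarray(chips, dtype=float), samples_per_chip)
    distorted = ideal.copy()
    shift = int(round(abs(delta_chips) * samples_per_chip))
    for k in range(1, len(chips)):
        if chips[k - 1] == 1 and chips[k] == -1:      # falling edge at the start of chip k
            edge = k * samples_per_chip
            if delta_chips > 0:
                distorted[edge:edge + shift] = 1.0    # lag: extend the +1 level
            elif delta_chips < 0:
                distorted[edge - shift:edge] = -1.0   # lead: cut the +1 level short
    return ideal, distorted

def el_asymmetry(received, replica, samples_per_chip, spacing_chips=0.5):
    """R(+d/2) - R(-d/2) of the circular cross-correlation; zero for an undistorted code."""
    lag = int(round(spacing_chips / 2 * samples_per_chip))
    r_plus = np.dot(received, np.roll(replica, lag)) / len(replica)
    r_minus = np.dot(received, np.roll(replica, -lag)) / len(replica)
    return float(r_plus - r_minus)

rng = np.random.default_rng(0)
chips = rng.choice([-1, 1], size=1023)
SPC = 88  # roughly 90 MHz / 1.023 MHz samples per chip
results = {}
for delta in (-0.1, 0.0, 0.1):
    ideal, distorted = tm_a_waveform(chips, SPC, delta)
    results[delta] = el_asymmetry(distorted, ideal, SPC)
```

In this toy model the asymmetry is about 0.1 for a 0.1-chip lag (roughly a quarter of the chip boundaries of a random code are falling edges) and its sign follows the sign of the delay; a real EWF analysis would use the actual PRN codes and the receiver's own correlator spacing.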
Figure 4 above shows the effect of the EWF on the correlator output for GPS L1 and GALILEO E1 signals.
In Figure 4, we can observe that the EWF causes fine distortions of the correlator output of the GPS and GALILEO signals during signal tracking.
These distortions degrade the accuracy of the GNSS tracking loops and can cause errors in the pseudorange estimates, which in turn lead to errors in the calculated positions.
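To relate code-tracking errors to pseudorange errors, a delay in chips is converted to meters through the chip length $c/R_c$, about 293 m for the 1.023 MHz GPS L1 C/A code. A minimal sketch of that conversion; the function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def chips_to_meters(delay_chips: float, chip_rate_hz: float = 1.023e6) -> float:
    """Convert a code-delay error in chips into a pseudorange error in meters."""
    return delay_chips * SPEED_OF_LIGHT / chip_rate_hz

one_chip = chips_to_meters(1.0)         # about 293.05 m
ten_milli_chip = chips_to_meters(0.01)  # about 2.93 m
```

Even a tracking bias of a hundredth of a chip thus amounts to meters of pseudorange error, which is why such fine correlator distortions matter.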
The ESA EWF dataset can be downloaded from here: https://www.esa.int/Enabling_Support/Space_Engineering_Technology/Radio_Frequency_Systems/Evil_WaveForm
Spoofing and EWF attacks (the latter very rare) can have very severe effects, because they can make the calculated PNT solution differ significantly from the one obtained from authentic signals. For example, a wrong location or trajectory may be computed, which may lead to major accidents or lost critical assets.
In this post, we have seen that the main public datasets for spoofing research are TEXBAT and OAKBAT, while the public dataset for EWF research is the ESA EWF dataset. All of them are free to download, and their main characteristics have been described above.
These three datasets are worth knowing: they serve as references for developing spoofing and EWF detection and mitigation algorithms and for comparing the performance of those algorithms fairly.
References
Humphreys, T.E., Bhatti, J.A., Shepard, D. and Wesson, K., 2012. The Texas spoofing test battery: Toward a standard for evaluating GPS signal authentication techniques. In Radionavigation Laboratory Conference Proceedings.
Humphreys, T., 2016. TEXBAT data sets 7 and 8. The University of Texas.
Albright, A., Powers, S., Bonior, J. and Combs, F., 2020, September. A Tool for Furthering GNSS Security Research: The Oak Ridge Spoofing and Interference Test Battery (OAKBAT). In Proceedings of the 33rd International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2020) (pp. 3697-3712).
Gómez-Casco, D., Crosta, P. and Spangenberg, M., 2022, April. Validation of Evil WaveForms in a GNSS Simulator for GPS and Galileo Signals. In Proceedings of the 10th ESA Workshop on Satellite Navigation User Equipment Technologies (NAVITEC).
Edgar, C., Czopek, F. and Barker, B., 1999, September. A co-operative anomaly resolution on PRN-19. In Proceedings of the 12th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GPS 1999) (pp. 2269-2268).