Enhancement of Algorithm to Detect Pulse Signals




We have developed and currently operate the world's best system for searching for extraterrestrial signals. Data collected at the Allen Telescope Array (ATA) are analyzed using the open source software SonATA. SonATA looks for strong, narrow signals in the 1 to 10 GHz terrestrial microwave window, analyzing ~100 million spectral channels (covering 72 MHz) at a time. Further enhancements to SonATA by the community will help us look for different kinds of signals, and the addition of new servers will allow us to process a larger band of frequencies at once.

In this project we are looking for help in recognizing and reclassifying a type of signal that is currently found by our pulse detection software, if it is strong. The intern will complete the algorithm required to detect the new kind of signal, code it in C++, debug and test it, and then work with the current signal detection team to make it run as a SonATA function on the telescope.



This is a hard project, suitable only for people interested in applying their digital signal processing background to open source software. It will, however, be very satisfying for someone who is willing to put in the hard work. Clearly, the open source community benefits, but good work can also be published in reputable technical journals.

The Project

Signals that are inelegantly called "squiggles" have a distinctive pattern in our frequency vs. time waterfall plots.


Currently, the software aggregates all of the pulse data in this pattern into a single report that characterizes the signal as having large bandwidth.

Band around squiggles.png

This is not a good representation for these modulated narrow band signals. Further, the angles at which these signals turn may give us additional information and allow us to recognize the signal when we detect it again and eventually to identify the source of the signal, which is suspected to be terrestrial rather than extra-terrestrial.

The person doing this project will follow these steps:

  1. Understand the characteristics of this particular signal type.
  2. Find in the literature, or develop, an efficient algorithm to detect and then characterize the details of this signal type (number of changes in drift rate, average time between transitions, opening angle of the transitions, drift rates per transition).
  3. Implement the algorithm in C++.
  4. Test the implementation.
  5. Verify the implementation using real data from the Allen Telescope Array.
  6. Integrate the algorithm into the open source SonATA code, test and debug.

Value to the Open Source Community

The goal of the setiQuest project is to improve the search capability operating on the ATA by adding new algorithms to the existing open source software. The new detection algorithm developed by this project may find applications in other fields, or in analyzing other datasets where signals are sought in noise.

Under this project, we are looking for someone to help develop this algorithm and implement it in C++.

Background Required

You should know C++. In addition, knowledge of communication theory and signal processing algorithms will be of great help.

The Mentor

Jill Tarter will be the mentor for this project. Jill will be supported by Avinash Agrawal (avinash@seti.org).

Additional Information From Jill Tarter In Response to a Query

it is interesting that sound files give good representations of some of the features of our squiggle signals. eventually, this application will need to run in near-real-time. on the telescope we have a 2-stage pipeline of /collect new data + process previously collected data/collect new data + process previously collected data/..... at the end of each pipeline cycle we decide whether to collect new data + process last data collected OR break the pipeline to followup on signals found in previous processing. so i'm skeptical that complex spline fitting proposal will ever be able to keep up. these signals are found by our pulse detection algorithm that applies a relatively high threshold to the power data, leaving a sparse field in the frequency-time domain, and then looking at every pair of points to see if there is another point in the middle - yields 'triplets' of pulses that are reported into a database. question becomes how quickly does the pulse detection algorithm complete? if there is significant time remaining in the pipeline cycle, then characterization of the squiggle might best be done by reading the database entries, finding inflection points when sign of (Fn+1 - Fn) changes and then making series of straight line segment approximations and calculating the opening angles, with the aggregate report containing as it does now the signal bandwidth == highest frequency pulse-lowest frequency pulse, some average drift rate, estimate of the center frequency, plus the new information on the number of straight line segments, average length of segment, and average opening angle. so 6 numbers to represent the signal vs. current 3. that should allow for better opportunity to 'match' this signal the next time an example is found, as a way of recognizing interference. the actual information in the numerical values may help us identify the source of the signal - Doppler from spin-stabilized spacecraft, encoded dithering rate, radar sweep rate ??????
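The characterization described in this reply (turning points where the sign of Fn+1 - Fn changes, straight line segments between them, opening angles, and a six-number report) might be sketched as follows. This is only an illustration under stated assumptions, not SonATA code: the names `Pulse`, `SquiggleReport`, `characterizeSquiggle`, `hzPerBin`, and `secPerBin` are hypothetical, "length of segment" is taken to mean duration in seconds, and the opening angle is measured in a plane rescaled to waterfall bins, since an angle between a frequency axis and a time axis depends on the axis scaling. Real pulse lists are noisy, so a practical version would probably need smoothing or a minimum segment length before counting sign changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One detected pulse (hypothetical layout, not the SonATA database schema).
struct Pulse { double timeSec; double freqHz; };

// The six numbers proposed for the aggregate report.
struct SquiggleReport {
    double bandwidthHz;         // highest minus lowest pulse frequency
    double avgDriftHzPerSec;    // end-to-end drift rate
    double centerFreqHz;        // midpoint of the occupied band
    std::size_t numSegments;    // straight line segments between turning points
    double avgSegmentSec;       // assumption: segment "length" = duration
    double avgOpeningAngleDeg;  // mean angle between adjacent segments
};

inline int signOf(double x) { return (x > 0) - (x < 0); }

SquiggleReport characterizeSquiggle(std::vector<Pulse> pulses,
                                    double hzPerBin = 1.0,
                                    double secPerBin = 1.0) {
    assert(pulses.size() >= 2 && hzPerBin > 0 && secPerBin > 0);
    std::sort(pulses.begin(), pulses.end(),
              [](const Pulse& a, const Pulse& b) { return a.timeSec < b.timeSec; });

    // Vertices: first pulse, every turning point, last pulse.
    std::vector<std::size_t> vertices{0};
    int prevSign = 0;
    for (std::size_t n = 0; n + 1 < pulses.size(); ++n) {
        const int s = signOf(pulses[n + 1].freqHz - pulses[n].freqHz);
        if (s == 0) continue;                       // flat step carries no direction
        if (prevSign != 0 && s != prevSign) vertices.push_back(n);
        prevSign = s;
    }
    vertices.push_back(pulses.size() - 1);

    const auto byFreq = [](const Pulse& a, const Pulse& b) { return a.freqHz < b.freqHz; };
    const auto mm = std::minmax_element(pulses.begin(), pulses.end(), byFreq);
    const double totalSec = pulses.back().timeSec - pulses.front().timeSec;

    SquiggleReport r{};
    r.bandwidthHz = mm.second->freqHz - mm.first->freqHz;
    r.centerFreqHz = 0.5 * (mm.second->freqHz + mm.first->freqHz);
    r.avgDriftHzPerSec =
        totalSec > 0 ? (pulses.back().freqHz - pulses.front().freqHz) / totalSec : 0.0;
    r.numSegments = vertices.size() - 1;
    r.avgSegmentSec = totalSec / static_cast<double>(r.numSegments);

    // Opening angle at each interior vertex, between the two adjoining segments.
    double angleSum = 0.0;
    std::size_t numAngles = 0;
    for (std::size_t k = 1; k + 1 < vertices.size(); ++k) {
        const Pulse& a = pulses[vertices[k - 1]];
        const Pulse& v = pulses[vertices[k]];
        const Pulse& b = pulses[vertices[k + 1]];
        const double ax = (a.timeSec - v.timeSec) / secPerBin;
        const double ay = (a.freqHz - v.freqHz) / hzPerBin;
        const double bx = (b.timeSec - v.timeSec) / secPerBin;
        const double by = (b.freqHz - v.freqHz) / hzPerBin;
        angleSum += std::atan2(std::fabs(ax * by - ay * bx), ax * bx + ay * by);
        ++numAngles;
    }
    const double radToDeg = 180.0 / std::acos(-1.0);
    r.avgOpeningAngleDeg =
        numAngles > 0 ? angleSum / static_cast<double>(numAngles) * radToDeg : 0.0;
    return r;
}
```

The work is a sort plus a single pass over the pulse list, which is the kind of cost that might fit in the time left at the end of a pipeline cycle; whether it actually does would have to be measured on the telescope.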

successful algorithm doesn't need to run in near-real-time from the start, but we need to think about the choice of implementation that might be made fast enough in the end. there may be suitable characterization schemes in signal processing literature, since dithering and radar sweeping are common. it may be that with these complex squiggles, the pulse detection process doesn't complete until near the end of the cycle, with only time for crude bandwidth and drift rate aggregation possible, and no time left to query database entries for more info. in that case, a new algorithm working directly on the waterfall data matrix of intensity for frequency vs. time needs to be developed, and we need to hope that the DSP world has been there before us :-) .
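For orientation, the existing pulse detection step as described in this reply (threshold the power, then test every pair of surviving points for a third point in the middle) can be sketched directly on a waterfall matrix. This is a rough sketch, not the SonATA implementation: `Bin`, `Triplet`, `thresholdPulses`, and `findTriplets` are hypothetical names, the midpoint must land exactly on an occupied bin (a real detector may allow some tolerance), and pairs within the same time row are skipped on the assumption that a triplet should span three different times.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// (time index, frequency index) of a bin in the waterfall.
using Bin = std::pair<int, int>;

struct Triplet { Bin first, middle, last; };

// Keep only bins whose power exceeds the threshold: the "sparse field".
std::vector<Bin> thresholdPulses(const std::vector<std::vector<double>>& power,
                                 double threshold) {
    std::vector<Bin> pulses;
    for (std::size_t t = 0; t < power.size(); ++t) {
        assert(power[t].size() == power[0].size());  // rectangular matrix expected
        for (std::size_t f = 0; f < power[t].size(); ++f) {
            if (power[t][f] > threshold) {
                pulses.emplace_back(static_cast<int>(t), static_cast<int>(f));
            }
        }
    }
    return pulses;
}

// For every pair of pulses, report a triplet if the exact midpoint bin
// also holds a pulse.
std::vector<Triplet> findTriplets(const std::vector<Bin>& pulses) {
    const std::set<Bin> occupied(pulses.begin(), pulses.end());
    std::vector<Triplet> triplets;
    for (std::size_t i = 0; i < pulses.size(); ++i) {
        for (std::size_t j = i + 1; j < pulses.size(); ++j) {
            const Bin& a = pulses[i];
            const Bin& b = pulses[j];
            if (a.first == b.first) continue;  // assumption: distinct times only
            // The midpoint falls on a bin only if both index sums are even.
            if ((a.first + b.first) % 2 != 0) continue;
            if ((a.second + b.second) % 2 != 0) continue;
            const Bin mid((a.first + b.first) / 2, (a.second + b.second) / 2);
            if (occupied.count(mid) > 0) {
                triplets.push_back(Triplet{a, mid, b});
            }
        }
    }
    return triplets;
}
```

The pair loop is quadratic in the number of pulses that survive the threshold, which is why a relatively high threshold, and hence a sparse field, matters for finishing within the pipeline cycle.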

i'm sure this is all very confusing --- we've spent decades developing the algorithms for finding narrowband signals quickly. it may help to study the information on the SonATA system in the setiQuest wiki http://setiquest.org/wiki/index.php/SonATA. the information you really need is about the interaction between the Seeker and Archiver, and that's just plain missing. Avinash may be able to help me look for documentation. or we'll just have to ask the living documents.

Some data to explore

Epsilon Eridani squiggle example from AWS stored datasets

Rob Ackermann has been working on the data that have been stored on AWS as a part of our setiQuest effort. He has used the channelizer and DX from our SonATA code (see the SonATA Overview description) and FFT’d the large V(t) raw datasets that are stored. The stored datasets cover 8 MHz of bandwidth (6.5 MHz of which is used – the rest is discarded due to roll-off) and the FFT process eventually generates data for waterfall plots with frequency channel resolution Δν ~ 1 Hz, and time resolution Δt ~1 second.

To access the stored voltage data, go to the all-sources page[1] and scroll down to the row that represents data taken near 1420 MHz while pointing at Epsilon Eridani on 2010-11-06 (actually a pointing error meant that we weren’t looking at Eps Eri, but for this exercise, that is irrelevant) - the row is labeled: 2010-11-06 epsiloneridani_1420_1 (Dorothy obs.) If you click on ‘data’ you will be presented with a list of 16 .dat data files, a header file, a metadata file, an ephemeris file, and a readme file. The readme file [2] explains how to interpret the raw voltage data in the .dat files, which simply chunk the long observation into manageable time sequences for easier download.

According to Rob Ackermann, for this particular example of a squiggle signal

EpsEri squiggles.png

the event occurred early in the observation, so only .dat files 1, 2, and 3 are needed. Here are links to the three data files which, when concatenated, contain that particular squiggle image:

For a better understanding of our narrowband processing, click on the ‘spectrum/waterfalls’ entry for this dataset. You will be presented with an average power spectrum (over 8 MHz) for the entire observation.

EpsEri powerspectrum.png

The large hump in the middle is the emission from neutral hydrogen gas in this direction. The frequency resolution of each point in this power spectrum is ~266 Hz (the average over 256 channels at our ~1 Hz resolution). Note the tiny bump on the left hand side near channel # -1024. If you have good hand-eye coordination and move the cursor over that bump, you will be presented with a screen of 8 waterfall plots. By using the ‘previous’ and ‘next’ buttons at the bottom, you should be able to find the set with the squiggles as above (feel free to look at the rest of the data or any of the other examples on AWS). Each of those waterfalls has 256 frequency channels (each 1.04 Hz) on the horizontal axis and 340 time samples (each 0.96 sec) on the vertical axis. If you sum up all the power in each waterfall plot (and normalize) you will have the value of one of the data points in the average power spectrum above. Perhaps you are beginning to see just how much data there is.
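The arithmetic behind these numbers can be checked with a few lines of C++. The sketch below uses only the figures quoted above (256 channels of 1.04 Hz, 340 samples of 0.96 sec); the constant and function names are made up for illustration, and "normalize" is assumed here to mean dividing the summed power by the number of bins, which the text does not actually specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Figures quoted in the text for one waterfall plot.
constexpr int kChannelsPerWaterfall = 256;
constexpr double kHzPerChannel = 1.04;
constexpr int kSamplesPerWaterfall = 340;
constexpr double kSecPerSample = 0.96;

// Width in Hz of one point in the average power spectrum:
// 256 channels * 1.04 Hz = 266.24 Hz, the "~266 Hz" above.
constexpr double spectrumPointWidthHz() {
    return kChannelsPerWaterfall * kHzPerChannel;
}

// Time covered by one waterfall: 340 samples * 0.96 sec = 326.4 sec.
constexpr double waterfallDurationSec() {
    return kSamplesPerWaterfall * kSecPerSample;
}

// Collapse one waterfall (rows = time samples, columns = channels) into a
// single spectrum point. Assumption: normalization is the mean over all bins.
double waterfallToSpectrumPoint(const std::vector<std::vector<double>>& waterfall) {
    double sum = 0.0;
    std::size_t count = 0;
    for (const std::vector<double>& row : waterfall) {
        for (double p : row) {
            sum += p;
            ++count;
        }
    }
    assert(count > 0);
    return sum / static_cast<double>(count);
}
```

Each point in the average power spectrum therefore stands for 256 × 340 = 87,040 individual power measurements.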

You could convert these .png images to a file of intensity vs. frequency and time with reasonable accuracy using a utility like ImageMagick "convert", but it is probably better to work with the data files below (let me know if you want to do this conversion, as I have instructions from Rob). I’m just pointing you to these data from AWS because I think that this may de-mystify the waterfall plots we’ve been discussing. If we decide to build a ‘squiggle detector’, not just something that jumps into action on the basis of reports from the SonATA pulse detections, you will need to start with the raw voltage data that’s available in real-time from SonATA.

Rob found these squiggles by looking at a lot of waterfall plots; our detectors find them (if they are strong enough) with an algorithm that looks for pulses. Since we haven’t yet built a good ‘squiggle classifier or detector’ (that’s your job!) it isn’t easy to find the raw data to present to you.

Activity 4382 squiggle example from real-time SonATA analysis

Act4382 squiggle.png

Jane Jordan provided this example that I posted in the GSoC project description – it was found by the SonATA pulse detector during observations at the end of January this year, and reported as described by our signal aggregator. Jane has provided the complex-amplitude data (the raw V(t)) for the frequency region in which the squiggle was found, along with a program to read the data file and generate a waterfall plot. She has also provided a listing of the specific pulse detection reports from SonATA for that signal. According to Jane:

  1. WaterfallDisplay.java is the program that reads the compamp file, does an FFT and plots the power as a function of frequency and time.
  2. 2011-01-31_22-57-29_UTC.act4382.target1-on.dx2007.L.compamp is the complex pairs for power for one subchannel containing the signal.
  3. 2011-01-31_23-00-19_UTC.act4382.dx2007.id-0.L.archive-compamp is the complex pairs for power for 16 subchannels with the signal in the 8th subchannel.
  4. 2011-01-31_22-57-29_UTC.act4382.target1on.dx2007.expanded.sigreports.txt is the signal report that includes a listing of the individual pulses.
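The core of the FFT-and-plot-power step can be sketched in C++ as below. This is not a translation of WaterfallDisplay.java and it does not read the compamp format, whose layout is not described on this page; it assumes the complex samples are already in memory. The names `fftInPlace` and `makeWaterfall` are hypothetical, blocks are taken back to back with no window or overlap, and the FFT length must be a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Iterative radix-2 Cooley-Tukey FFT (forward transform, in place).
void fftInPlace(std::vector<cd>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    const double pi = std::acos(-1.0);
    // Butterfly stages of doubling length.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const cd wLen = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = x[i + k];
                const cd v = x[i + k + len / 2] * w;
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}

// Turn a complex voltage series into waterfall rows: one power spectrum per
// block of fftLen samples, rotated so that zero frequency sits in the middle.
std::vector<std::vector<double>> makeWaterfall(const std::vector<cd>& samples,
                                               std::size_t fftLen) {
    std::vector<std::vector<double>> rows;
    for (std::size_t start = 0; start + fftLen <= samples.size(); start += fftLen) {
        std::vector<cd> block(samples.begin() + start,
                              samples.begin() + start + fftLen);
        fftInPlace(block);
        std::vector<double> power(fftLen);
        for (std::size_t k = 0; k < fftLen; ++k) {
            power[(k + fftLen / 2) % fftLen] = std::norm(block[k]);  // |X[k]|^2
        }
        rows.push_back(power);
    }
    return rows;
}
```

Each row of the result is one time step of the waterfall, so a larger `fftLen` trades time resolution for frequency resolution. A production version would more likely use an optimized FFT library than this hand-written transform.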

To get this posted as quickly as possible, I’ve uploaded these files to an ftp site for you to download (I’ll do something more elegant with the wiki later today). For now you’ll find what you need at ftp://ftp.seti.org/tarter/GSoC. This should give you something to play with, and manageable file sizes.

