Yair Franco is a student at the University of California, Berkeley, currently completing research at the University of Texas at Austin under Nadine Igonin & Akram Mostafanejad.
The Valles Caldera is a dormant volcanic complex in northern New Mexico, whose main feature is a 15-mile-wide crater left behind by a massive eruption around 1.2 million years ago. While it is inactive now, its last eruption was geologically quite recent, which leaves open the possibility of future activity. Earthquakes are one way of gauging the level of activity in a volcano. I will be using data from seismic stations installed by students and faculty of SAGE in 2019, applying machine learning methods to find and analyze current seismicity in the Valles Caldera.
(Figures are also available in this Google Drive folder)
Since my last post, I have continued my search for earthquakes in the Valles Caldera using the results from the earthquake detection algorithms. This process is quite tedious, especially given that I got a lot of false triggers from thunder and other sources of noise. The caldera has a lot of activity, both human and natural. The geophones could detect cars rolling by, people hiking, elk grazing, bears roaming, and prairie dogs scuttling around. All of these potential triggers make it hard to find earthquakes by making my plots look like this:
Figure 1 - Fifteen days of triggers. I’ve got a funny feeling that 90% of these aren’t even earthquakes.
I decided to keep using the approach that led me to the lightning strikes: zooming in on a particular time period and taking a closer look at the triggers there. This time, I would keep in mind the times when monsoon storms usually form, and avoid those. One day of triggers looks like this:
Some days had STA/LTA triggers accompanied by many wave arrival picks from Seisbench, which I took as a good sign that both methods agreed they might have found an earthquake. I took a closer look at those triggers to see what caused them. Some of them even had clear arrival times for P and S waves, with the triggers for each wave type separated by a few seconds.
Not all of these isolated picks had the wave types in the correct order, and accordingly, the data at those times did not show anything that looked like an earthquake.
Some other picks looked far more promising.
And I was very excited to find that the waveforms at these times showed what looked like two earthquakes to me.
At first glance, it may look like this answers the main question my research is trying to answer: whether the Valles Caldera still has active seismicity. Looking at this, one might be tempted to say “yes,” but, as I’ve found to be a recurring theme during this research, the answer is not that simple.
While the data you record and analyze is important for answering questions, it is just as important to know how that data was recorded. I’ve talked a bit about the geophones whose data I’m using, and where in the caldera they were placed. Knowing their position has been a very useful tool in learning more about the signals the geophones detected, especially combined with knowledge about how seismic waves travel from an earthquake.
I took a closer look at the first earthquake and noticed that the P and S-wave arrivals were separated by about 20 seconds. This corresponds to a distance of about 200 km from the source, based on the difference between P and S-wave velocities. That means that this earthquake did not occur in the caldera.
Figure 7 - This earthquake is pretty far away from the caldera.
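As a sanity check, here is roughly how that back-of-the-envelope estimate works. The velocities below are representative values I'm assuming for illustration, not measured ones:

```python
# Rough S-P distance estimate: the S wave falls farther behind the
# P wave the longer both travel, so the lag encodes distance.
def sp_distance_km(sp_time_s, vp_km_s, vs_km_s):
    """Distance implied by an S-minus-P arrival time difference."""
    return sp_time_s / (1.0 / vs_km_s - 1.0 / vp_km_s)

# Typical crustal velocities, and faster mantle-refracted (Pn/Sn)
# velocities that often dominate at regional distances.
for vp, vs in [(6.0, 3.5), (8.0, 4.6)]:
    print(f"Vp={vp} km/s, Vs={vs} km/s -> ~{sp_distance_km(20.0, vp, vs):.0f} km")
# ~168 km and ~216 km: either way, far outside the caldera.
```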
An even closer look at the wave arrivals reveals a bit about where the earthquake came from. Most of the geophone arrays were installed in a straight line, meaning there would be a visible delay from station to station as a wave travels along the line. (If the wavefront instead arrives parallel to the line, it hits all the stations at essentially the same time and there is no visible delay. This is probably why the lightning strike I analyzed a few weeks ago appeared to be way faster than it really was.) This delay can be seen in the 4xx stations, which are oriented approximately north-to-south, with the southernmost station being 401.
This means that this earthquake might have originated 200 km to the north of the caldera.
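To make that direction-finding logic concrete, here is a toy version of it. Every number below is invented for illustration:

```python
import numpy as np

# Toy linear array: five stations 1 km apart along a north-south line.
offsets_km = np.arange(5.0)   # 0 km = northernmost station

# Hypothetical P arrival times (s): the wave hits the north end first.
arrivals_s = np.array([0.00, 0.17, 0.33, 0.50, 0.67])

# Least-squares slope = slowness along the line (s/km).
slowness = np.polyfit(offsets_km, arrivals_s, 1)[0]
print(f"apparent velocity ~ {1 / slowness:.1f} km/s, moving north to south")

# A slope near zero (a huge apparent velocity) would instead mean the
# wavefront arrived broadside, parallel to the line, like the lightning
# strike that "looked" impossibly fast.
```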
Since they were so far away and still detected by the geophones, I figured the earthquakes would be large enough to have made it onto the USGS catalog. And sure enough, they are.
Figure 9 - Screenshot from the USGS earthquake catalog showing both of the earthquakes that were detected by the geophones on this day.
So, I did find an earthquake, just not one in the caldera. This is still a very good find, though, because it helps me get a rough idea of the P and S wave velocities in the caldera, which will in turn help me estimate earthquake distances more accurately. Wave velocities near the surface vary by area, depending on the composition of the rock through which the waves travel. The caldera has very complex geology, so getting a good idea of the velocities within it will be very useful. In the future, a more complete ground velocity model could be used to pinpoint where an earthquake occurred using seismic data alone. For now, all I have are back-of-the-envelope estimates.
Finding this earthquake also showed that my strategy of looking for detections with wave picks in the correct P-S order is a fruitful way to find earthquakes.
After this, I continued to search for earthquakes, and found a few more with distances generally above 30 km away from the caldera. These are what I’d categorize as “regional” earthquakes.
After some searching, I found an earthquake with an arrival difference of around one second, which corresponds to around 4 km of distance from the hypocenter. Note that this is to the hypocenter, the underground source of the earthquake, rather than the epicenter, the point on the surface directly above it. Arrival differences account for the time it takes for the waves to travel from their source underground all the way to the surface. These 4 km could be a combination of depth and lateral distance, or purely vertical distance. Given the volcanic structure of the area, a depth of 4 km, which is quite shallow for earthquakes in general, makes sense.
Figure 10 - Earthquake detected on June 11, 2019, located very close to the geophones. It is most visible on the 4xx stations (lowermost on graph).
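Just to illustrate how little that 4 km pins down on its own, here is the geometry, with assumed depths:

```python
import math

hypocentral_km = 4.0   # distance implied by the ~1 s S-P time

# The same hypocentral distance can split into depth and horizontal
# (epicentral) distance in many ways: r^2 = depth^2 + epi^2.
for depth_km in (1.0, 2.0, 3.0, 4.0):
    epi_km = math.sqrt(max(hypocentral_km**2 - depth_km**2, 0.0))
    print(f"depth {depth_km:.0f} km -> epicenter within ~{epi_km:.1f} km")
# At 4 km depth the epicenter is ~0 km away: almost directly below.
```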
Using the same logic as before to get an idea of where the earthquake originated, it looks like this earthquake came from within the caldera, as it arrived at a station in the middle of the 400 array first, rather than one of the two ends.
With all this, I have finally found an earthquake that came from within the caldera. My question is technically answered now: there are still earthquakes in the Valles Caldera. But this is also just the beginning of the answer. Knowing there was an earthquake is cool, but it would also be really helpful to know where exactly it happened, how often earthquakes occur here, their magnitudes, their focal mechanisms; the list goes on. All of this would give us a much greater idea of what is going on under the surface of the caldera.
It’s also very exciting for me, because I now have something to talk about before I finish my stay in Texas on the 12th.
(Figures are also available in this Google Drive folder)
My analysis of the 2019 geophone data has gotten pretty interesting quite quickly. After running the methods on around five days’ worth of data, I got some promising results. Both the STA/LTA method and Seisbench were giving me lots and lots of detections. (Also, my mentor and I have decided to put a pin in EQCCT, the newest method developed here at TexNet, since the way it analyzes and returns data is more appropriate for smaller datasets; ours is around 300 GB and has proven too big.)
Upon seeing this graph, I was sure that I would have plenty of earthquakes to work with:
Figure 1 - This graph shows detections from both methods in time, with probability on the y-axis where available. STA/LTA does not return probabilities, so they are just vertical lines.
One thing that might seem odd is that the detections tend to come in clusters. Multiple times within just the five days of data shown above, there are dozens of detections within an hour or two of each other. These could be earthquake swarms.
I thought it would be useful to zoom in and take a look at the waveforms within one of these clusters. I have been working on a way to display the data from the stations analyzed (only a subset of the full set of 58 stations, to make processing a little quicker) and plot their corresponding detections on top of the waveforms.
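The plotting itself is nothing fancy. A stripped-down sketch of the idea looks something like this (the file name and pick times are hypothetical placeholders):

```python
import matplotlib.pyplot as plt
from obspy import read, UTCDateTime

st = read("subset_of_stations.mseed")    # hypothetical data file
picks = {                                # hypothetical picks per trace id
    "4O.101..DPZ": [UTCDateTime("2019-06-09T18:10:05")],
}

# One row of axes per trace, sharing a common time axis.
fig, axes = plt.subplots(len(st), 1, sharex=True, squeeze=False,
                         figsize=(10, 12))
for ax, tr in zip(axes[:, 0], st):
    ax.plot(tr.times("matplotlib"), tr.data, "k", linewidth=0.3)
    ax.set_ylabel(tr.id, rotation=0, ha="right", fontsize=6)
    # Overlay each pick for this trace as a vertical line.
    for pick in picks.get(tr.id, []):
        ax.axvline(pick.matplotlib_date, color="r", linewidth=0.8)
plt.show()
```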
Figure 2 - A closer look at the detections in one of the clusters.
Figure 3 - A look at all the waveforms from that cluster, showing which picks correspond to which station. It's a little blurry—a consequence of trying to fit these huge images into a blog post.
It is clear that the geophones are picking up more than earthquake signals. For example, the last 10 stations, the 400s, are picking up some large signal, but it takes minutes for it to move from station to station. These stations are around a kilometer apart, so these cannot be earthquake waves. My guess is that a car drove along the road these stations are close to.
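The arithmetic is simple enough (the hop time here is my eyeballed guess from the plot):

```python
# Stations ~1 km apart; the signal takes on the order of one to two
# minutes to hop between neighbors (eyeballed from the waveforms).
spacing_m = 1000.0
for hop_s in (60.0, 120.0):
    v = spacing_m / hop_s
    print(f"{v:.0f} m/s (~{v * 3.6:.0f} km/h)")
# ~8-17 m/s, i.e. roughly 30-60 km/h: car speed, not wave speed.
```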
At this time scale, it’s not very clear whether any given signal is an earthquake or something else. So it helps to zoom in to a much smaller time scale to see individual signals in greater detail. I got this very nice and clear signal from a different cluster of detections.
Figure 4 - A much closer look at a single event detected by almost all the stations—this one came from a different cluster. Note that the signal appears to have a very clear P and S wave arrival in most of the stations, which is a good sign that it is an earthquake.
After looking at this, I was quite convinced that I was looking at an earthquake. There were clear P and S wave arrivals, and they had some separation as they moved through different stations, especially the 100s, 200s, and 300s, which are located in the Valle Grande area of the caldera. This area is a valley of thick, relatively loose sediment that would amplify and slow down seismic waves, meaning that the arrivals would separate as they travel through the valley. Based on the positioning of the stations, it looks like this earthquake originated from the northwestern portion of the caldera.
So... case closed, right?
Unfortunately, as with most things in life, the immediate answer is not the correct one. Just as it was clear in the previous waveform that the large signal traversing the stations labeled 400 was too slow to come from an earthquake, this one also turns out to be too slow. It just hides its secret better.
If you were to only look at the last few stations in the list, namely 407 through 410, you’d see exactly what you’d expect from an earthquake signal: a P and S arrival, with the signals no more than a couple of seconds apart, as the waves travel rather fast.
Figure 5 - These waves arrived at these stations first. They look remarkably like an earthquake signal.
However, the other stations show the waves much more spread out. Looking at the stations starting with 100, which are spaced a few hundred meters apart, the differences in arrival times are also a few seconds. A few seconds to travel a few hundred meters puts the velocity on the order of a hundred meters per second, while P waves travel much faster, usually 5-6 km/s or more. Also, the 100s were installed around 13 km to the southeast of the 400s, and the waves arrive at the 100s over 40 seconds after they arrive at the 400s. These very rough numbers give a velocity of around 325 m/s; these cannot be seismic waves.
Figure 6 - Note the time. It has been 40 seconds since these waves arrived at stations 407-410. That’s too slow to be seismic waves…
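For the record, here is the back-of-the-envelope version of that check:

```python
# Very rough numbers read off the waveform plots.
distance_m = 13_000    # 100s array is ~13 km southeast of the 400s
travel_time_s = 40     # delay between arrivals at the two arrays

print(f"{distance_m / travel_time_s:.0f} m/s")   # -> 325 m/s
# Compare: P waves travel at 5,000-6,000+ m/s. Not even close.
```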
While 325 m/s is too slow to be a seismic wave speed, it is almost exactly the speed of sound. This signal occurred at 2 PM local time, in late spring. New Mexico is prone to early summer storms from the monsoon, especially high in the mountains, where it's colder. These usually occur during the middle of the day, as the sun heats up the ground and provides energy for storms to form. These range from a brief drizzle to harsh hailstorms with frequent thunder. While out at the Valles Caldera, I got to be right under a nice hailstorm at our campsite, with more thunder than I’ve ever heard in one day—I am from California.
Figure 7 - A picture I took at our campsite in the Valles Caldera, after a hailstorm on June 6, around 6 PM. My hand is next to hail that accumulated on my tent.
Knowing this, I decided to check NOAA’s Doppler radar archive to see if there was a nearby storm at that exact time. And sure enough, there was.
Figure 8 - Radar image from June 9, 2019, 18:10 UTC (12:10 PM local), showing storms south and northwest of the Valles Caldera.
As it turns out, a lightning strike has a very similar-looking seismic signature to an earthquake. This is a pretty interesting discovery, and I had a lot of fun brainstorming and figuring it out with my mentor. However, it opens up a 12-oz can of worms. This probably means that all of the clusters of detections I keep finding are not earthquake swarms, but thunderstorms. I would need to find a way to filter these out, or try to ignore anything that is in a cluster. If an earthquake did happen during this time, it would be extremely difficult to find amongst all the lightning strikes. This presents an interesting dilemma with the machine learning methods I am using: they are indeed detecting something, but not earthquakes. At least not all of the time.
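One simple way to attack this would be to throw out anything that arrives in a dense burst, on the theory that lightning comes in storms and earthquakes here (hopefully) don’t. A rough sketch of that idea, with threshold values that are pure guesses on my part:

```python
import numpy as np

def decluster(times_s, window_s=3600.0, max_neighbors=5):
    """Keep only detections that are NOT part of a dense cluster.

    times_s: detection times in seconds. Any detection with more than
    max_neighbors other detections within +/- window_s is assumed to
    be storm-related and dropped. Both thresholds would need tuning.
    """
    times = np.asarray(times_s)
    kept = []
    for t in times:
        neighbors = np.sum(np.abs(times - t) <= window_s) - 1
        if neighbors <= max_neighbors:
            kept.append(t)
    return kept
```

The obvious downside, as I mentioned, is that a real earthquake inside a storm window would get thrown out with the lightning.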
For me, it's really fun to think of this as Mother Nature hiding something, like she's making noise to deceive us. Does this volcano hold secrets we aren’t supposed to know…?
I'm writing this on the afternoon of the 18th, as I'm waiting for some STA/LTA code to run. The past few days have been very hard work… for my laptop. It has spent most of the past 36 hours transferring, copying, and filtering seismic data. By “filtering” I mean using some Obspy functions to remove very high and very low frequencies that will not be of use to me, as well as removing the instrument response from the data. I’ll talk more about this later. This filtering process alone took about 24 hours in total. I started on Friday, took a break over the weekend, and on Monday I decided to take the hard drive home and run the code overnight. When I woke up today, it still hadn’t finished! It finally wrapped up this morning, and I have now started to do some real earthquake detection. That gives me time to write a little.
An important part of understanding the data you analyze is knowing where it comes from. I have a hard time remembering the names of the instruments whose data I’m using, so I think it’s a good idea to talk about them a little bit.
The geophones installed by SAGE in 2019 and 2022 are Fairfield ZLAND 3C nodes. They are 3-component instruments and are fairly small, which allowed for a fairly dense installation. This is a map of where they were installed within the Valles Caldera, obtained from the SAGE website:
Figure 1 - Note the size of the arrays in the southeast portion of the caldera compared to the city of Los Alamos to the east!
The instruments I installed in June are Nanometrics Trillium sensors, which are broadband seismometers. One of the main differences between the geophones and broadband sensors is the frequencies they can measure. Broadband sensors can measure a much wider range of frequencies, hence the name. Geophones, on the other hand, are generally more sensitive, but measure a smaller range of frequencies. This is a picture of one of the broadband sensors I installed in New Mexico, as well as a map of where these went:
Figures 2 & 3 - We did a LOT of driving to get to these sites!
One reason why it’s important to know your instruments is instrument response. Instrument response is a little hard for me to explain, but I like to think of it as how the sensor sees the world. Every instrument sees the world a bit differently, and to view data from different instruments in a consistent way, we must account for those differences. Removing the instrument response, which is essentially a function of amplitude versus frequency, rescales the data to a more concrete, “standard” scale. I realize this makes it sound like a grand transformation, but in reality the data looks quite similar; it mostly differs in scale. Here’s a quick example of a waveform I processed:
Figure 4 - This is a segment of data from station 101’s east-west horizontal component. The waveform above is the raw data, and the waveform below was plotted after the data was filtered. Note that the shape is almost identical; the only big change is the scale of amplitude (as seen on the y-axis).
Obspy has a function that reads an instrument response file and removes the response from the data. This takes a few seconds per file. However, there are 58 stations in my catalog, with three components each and around 29 day-long files per component for almost all of them. In total, I had 4,821 files to filter individually. Well… not me, my laptop did.
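The per-file processing boils down to a few Obspy calls. Roughly, each pass looks like this (the file names are hypothetical, and the filter corners are just example choices, not necessarily the ones I used):

```python
from obspy import read, read_inventory

inv = read_inventory("stations.xml")     # instrument response metadata
st = read("101_DPZ_day001.mseed")        # hypothetical day-long file

st.detrend("demean")
# Band-limit before deconvolution so the response removal stays stable.
pre_filt = (0.5, 1.0, 45.0, 50.0)        # corner frequencies in Hz
st.remove_response(inventory=inv, pre_filt=pre_filt, output="VEL")

st.write("101_DPZ_day001_proc.mseed", format="MSEED")
```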
Thankfully, I have now gotten past that, and it is time to apply the machine learning methods I have been learning about. To prepare for using them, I read a few papers about these methods, though since I am using several, no single paper is central to what I’m doing. I also found that the best way for me to learn how a method works is to use it. I spent a lot more time messing with code than reading, and that’s where most of my learning happened. I understand the purpose of a paper introducing a computational method, but through all the equations and computational jargon, I think I’m better off learning by using the code myself.
I'm off to make my laptop run more code. I think it'll need a well-deserved break when this is over. I’ll keep reporting my findings here!
I am now beginning my third week at TexNet, in the Bureau of Economic Geology at UT Austin. The first two weeks here have been all about getting ready for data processing. My mentor Nadine and I spent the first week reviewing some important papers and going over the different methods I will be using to process the data. I also gave my first group meeting presentation during that first week. It was a bit hectic; I essentially had a day and a half’s notice for this presentation. Thankfully, being a Berkeley student, I am quite used to surprise assignments.
During the second week, while Nadine was away from the office for a well-deserved vacation, I focused on trying to get the code working on a small subset of our data, to learn how each method works, how long it takes to run, and whether any of them are prone to errors. This may sound straightforward enough, but it turned out that I had my work cut out for me. All of the code I’m working with is in Python, which is great because it is the only language I am very familiar with as of now. There are three Python-based tools that I will be using: Obspy’s coincidence trigger algorithm, Seisbench, and EQCCT.
The coincidence trigger algorithm uses a detection method called STA/LTA, which stands for “short-term average/long-term average.” To understand STA/LTA, I like to picture a seismogram, and a piece of glass (a “window”) that can slide along the seismogram. Imagine a line near the right edge of the window, dividing it into a short segment and a long segment. The average amplitudes of the data inside these segments are the short-term and long-term averages, respectively. The algorithm calculates the ratio of these averages along the entire seismogram. If the ratio exceeds a defined threshold at any point, the algorithm places a timestamp there.
A very rough visual explainer for STA/LTA. Note that this signal does not look like it came from an earthquake. This is a potential downside of using STA/LTA methods.
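In Obspy terms, the single-trace version of this is only a few lines. The window lengths and thresholds below are illustrative values, not the ones I’m actually using:

```python
from obspy import read
from obspy.signal.trigger import classic_sta_lta, trigger_onset

tr = read()[0]                  # Obspy's built-in example trace
df = tr.stats.sampling_rate

# Short and long windows in samples: 0.5 s and 10 s here.
cft = classic_sta_lta(tr.data, int(0.5 * df), int(10 * df))

# Sample indices where the ratio rises above 3.5 (off again below 1).
onsets = trigger_onset(cft, 3.5, 1.0)
for on, off in onsets:
    print("trigger at", tr.stats.starttime + on / df)
```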
The coincidence trigger algorithm does this for a set of seismograms. If all (or a defined number) of them have a trigger at roughly the same time, the algorithm creates a timestamp for that event. This method is widely used in seismic detection and has been a standard for many years due to its relative simplicity. However, thanks to machine learning, new tools have been developed that offer different benefits.
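Before getting to those newer tools: Obspy wraps the whole coincidence logic into a single call. A minimal example along the lines of what I’m running (the thresholds and station count are illustrative, and the file name is a placeholder):

```python
from obspy import read
from obspy.signal.trigger import coincidence_trigger

st = read("several_stations.mseed")   # hypothetical multi-station file

# Require at least 3 traces to trigger together, each using a
# recursive STA/LTA: ratio above 3.5 switches a trace on, below 1 off.
events = coincidence_trigger("recstalta", 3.5, 1.0, st, 3,
                             sta=0.5, lta=10.0)
for ev in events:
    print(ev["time"], ev["coincidence_sum"], ev["stations"])
```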
One such tool is Seisbench, a library of machine learning methods for earthquake detection. Seisbench includes pretrained models, and it also allows users to train the algorithms on their own data. One advantage of pretrained models is that they remove the need to spend the time and resources required for training. Of course, we are keeping in mind that this means the machine will be looking for earthquakes characteristic of the region it was trained on. One of these regions, for example, is southern California, which has earthquakes of different mechanisms and environments than potential earthquakes in the Valles Caldera. One of our short-term goals is to decide which trained model is most appropriate for the area we are studying.
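Loading one of those pretrained models takes only a few lines. Here is a sketch using Seisbench’s EQTransformer model; the pretrained-weight name I pass (a Southern California dataset) is one I believe Seisbench ships, but the exact list depends on the version:

```python
import seisbench.models as sbm
from obspy import read

st = read("one_hour_of_data.mseed")   # hypothetical data file

# Weights pretrained on a Southern California (SCEDC) catalog.
model = sbm.EQTransformer.from_pretrained("scedc")

# annotate() returns continuous P/S/detection probability traces;
# classify() would instead return discrete picks and detections.
annotations = model.annotate(st)
print(annotations)
```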
The third tool we will be employing is EQCCT, which is another machine learning algorithm for earthquake detection, developed here at TexNet by Yangkang Chen, among others. This algorithm is essentially fresh out of the oven. So new, in fact, that Yangkang’s paper is still under review. I myself don’t fully understand how it works; all I know is that it uses “compact convolutional transformers.” Thankfully, I’m not responsible for knowing how to explain that! Still, this is meant to be an even faster, more reliable tool for earthquake detection than existing machine learning methods, so I’m looking forward to seeing what results I can get from this.
Getting all this code to actually work was a fairly arduous process. It required lots of thinking, troubleshooting, thinking, asking questions, thinking, rewriting things, thinking, and thinking. If you’ve ever used Python, then you’ll know that in an ideal world, every package and library you install would work from the get-go. However, this is the real world, and this is never the case. Thankfully, after spending lots of time printing line after line of code to understand how it works, and asking questions directly to the people who wrote some of these methods, I made good progress and I am happy with how I’ve gotten it to work.
At this point I would have liked to have a good portion of the data processed and have a cool and fancy figure to show, but I haven’t gotten quite that far yet. I'll make it another goal to do that within the next couple of weeks. Still, I do have some promising, even if not as fancy, figures to show:
Figure 1. A very blurry image of the full set of stations (installed by SAGE students in 2019) whose data I will be processing. The full image is quite large and Obspy would not allow me to copy something with a readable DPI, but at least it is still fairly easy to see what looks to me like an earthquake signal here. I was able to find this using a timestamp provided by the coincidence trigger function. The time scale is not visible, but this window is looking at around 15 seconds of data. From this it is already possible to tell that the earthquake occurred nearby, and get a sense of its direction of origin by knowing where the stations were installed (though knowing its epicenter more precisely would require a lot more work).
Figure 2. A visualization of event timestamps provided by all three algorithms. This is a very primitive version of what my results may look like. The black rectangles are where, if I had to guess, there were earthquakes. This can be verified by looking at the data at those times. This figure shows where my data processing by EQCCT was cut short due to my laptop running out of RAM and the code shutting down. I will have to find a way to be more RAM efficient. (I shut down the STA/LTA run myself so I could have time to finish this figure.)
Hopefully soon I will be able to post more cool images like these!
I have just returned to the New Mexico Tech campus after two very busy weeks. The URISE orientation week was intense but fun; I enjoyed getting to know the people in my cohort and learning about their interests and what they will be doing in their projects.
After orientation, I headed up to the Valles Caldera. I was able to see first-hand the area where the data I will be working on was collected. I also got to learn about and help install some broadband seismometers, as well as magnetotelluric stations (which Elliz will be working with!). This required a lot more digging than I expected, and by the end of the week we were all tired and looking forward to going back to town.
I am now taking a day off before starting to work at PASSCAL for a couple of weeks. I'm excited to have my own office! I'll be working on getting started with data collection. I'm curious as to what that will look like.
Since I have a break, I think it's a good time to think about my goals for the summer.
One of my main goals is to determine which parts of the full seismic data workflow are the most enjoyable for me. Part of this goal has already been met. Spending a week in the field has confirmed that field work isn't really for me. I already knew this from taking a geologic field mapping class at Berkeley, but this was a very helpful confirmation. I'm just not so good with being in the sun, getting muddy, and dealing with bugs (my goodness, the bugs!). I'm totally fine with this, though. I've always found myself enjoying the data analysis side of things much more anyway. I'm looking forward to experiencing more aspects of that as I progress through my project.
Another goal is to broaden my knowledge of seismology, and to learn how it might apply to what I want to do research on in the future. I already know I find real-time seismology concepts like earthquake early warning really interesting, but I want to learn what else I might find interesting. As I work throughout the summer, I will be spending some time thinking about this, and I hope that learning about topics adjacent to what interests me will also help me determine what kind of work I would like to do in the future. On a related note, I also hope to hone important skills like networking and communication during my internship.
On a more personal note, another goal is to learn to experience places outside of home more fully. I rarely travel, and this is the first time I have been outside my home state of California. I think this is a great opportunity to learn to be independent and get familiar with new places.
Overall, I am still very excited to see and experience what the rest of the summer has in store. I don't think it has yet fully set in that I am doing seismological research; it still sounds like a fantasy!