CatAmount

This has been my favorite project in recent memory. I worked with the client to create an advanced solution to a real problem; I enjoyed working with the charismatic subject of the data (mountain lions); I was able to donate a decent in-kind gift to a science-based non-profit; and in the long term the work will benefit our understanding of mountain lions.

Project Background

The Teton Cougar Project is an ongoing study of mountain lions (Puma concolor) in Wyoming. One part of their work is equipping certain cats with GPS collars.

GPS collar data can give the researchers insight into the behavior of mountain lions, including how they use the landscape at different times of the year, how they interact with each other, and more. But GPS data is just a bunch of numbers; interpreting the data is the job of the researchers.

The researchers have a large and growing database of GPS data that tells a story. This data can then be analyzed, interpreted, and even verified in the field by researchers.

Read more about the Teton Cougar Project.

Project Specifics

Inside a GPS collar, the data is stored in a proprietary, presumably space-efficient format. The collar manufacturer provides software to convert this data into something usable, often CSV text data.

Once you have the data in a text interchange format, you can import it into a sophisticated mapping application such as ArcGIS. But a general-purpose mapping application is not made with any particular species in mind.

The opportunity for custom software is in knowing the specifics of mountain lion behavior, and being able to find patterns in the raw data. For example, researchers know that certain mountain lion behaviors lead to distinct patterns in the data. Custom software can find these patterns, and researchers can use the findings for more analysis. Or the custom software can create a new kind of data to use with the general-purpose mapping application.

Sparse vs. Complete Data

A major consideration of the project is that the researchers do not always have the complete data set for a given collar.

While a GPS collar is deployed, the researchers can retrieve some data from it either by radio communication or bi-directional satellite communication. Remotely read data is sparse because only a limited number of events make it through this unreliable channel.

When the GPS collar is eventually retrieved, the flash memory can be read at one's leisure. This produces an authoritative complete set of data.

The sparse data is valuable to researchers because it includes events that happened as recently as a few hours ago. The complete data is valuable because it contains an authoritative record for a given cat. Any software solution has to work for both types of data.

For instance, one type of data pattern is a cluster. This is when one cat stayed in the same location over an extended period of time (as opposed to roaming). Staying in the same place can indicate different behaviors that are interesting to researchers, such as predation, mating, rearing offspring, or something else.

Another type of data pattern is a crossing. This is when two different cats are near each other in time and space. Why do mountain lions meet? What do they do when they meet? These are further questions of interest to researchers.

The goal of the project is to analyze GPS data, find the requested patterns, create useful output, and make the whole process easy to use.

Existing Solution

The Teton Cougar Project had an existing Python script that analyzed data and returned clusters, but it had a number of problems. I won't mention them all, but essentially it was cumbersome to use, requiring a lot of data transformations. Also, it only found clusters, whereas the client wanted something that would find clusters, crossings, and more.

Custom Solution

I delivered a complete software solution, including a custom GUI for performing the work, a complete User Guide for reference, and client support. The software is called CatAmount.

I worked with the client to make sure that the data file format for CatAmount exactly matched the data format they were already using, so no conversion is necessary. Likewise, I worked with them to make sure all output from the program was ready to be imported into their other software. Coordinating with the client in this way meant we immediately eliminated costly data conversion steps from their workflow.

I paid attention to how the client works with their data. For instance, their previous tool required them to convert UTM coordinates to longitude and latitude. The new software just uses UTM directly, thereby eliminating another costly data conversion, along with its potential for errors.
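One reason working in UTM is convenient: coordinates are in meters on a flat grid, so the distance between two fixes is ordinary Euclidean distance. The following is a minimal sketch, not CatAmount's actual code. The function name `utm_distance` and the coordinates are made up for illustration, and it assumes both points lie in the same UTM zone.

```python
import math


def utm_distance(a, b):
    """Straight-line distance in meters between two (easting, northing) pairs.

    Assumption: both points are in the same UTM zone.
    """
    return math.hypot(b[0] - a[0], b[1] - a[1])


# Made-up coordinates, for illustration only.
den = (520000.0, 4830000.0)
kill_site = (520300.0, 4830400.0)
print(utm_distance(den, kill_site))  # 500.0
```

With longitude and latitude, the same calculation would need a projection or a great-circle formula first, which is one more place for errors to creep in.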

Find Clusters

Find Clusters GUI

CatAmount can find clusters in very little time, even for large data sets or sparse data sets. I developed a new clustering algorithm that takes advantage of natural order in the data.

The definition of a cluster is somewhat elastic, and depends on what distance and time cutoff you choose as part of the definition. The researcher needs to be free to try out different settings, and the GUI makes changing those settings incredibly easy.
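To make the role of the distance and time cutoffs concrete, here is a rough sketch of one way to group time-ordered fixes into clusters. This is not the algorithm CatAmount uses, which is not described here. It is a simple single-pass scan, and the function name, parameters, and sample data are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta


def find_clusters(fixes, radius_m, max_gap, min_fixes=2):
    """Group fixes into clusters with a single pass over time-ordered data.

    fixes: list of (timestamp, easting, northing), sorted by timestamp.
    A fix joins the current cluster if it is within radius_m of the
    cluster's first fix and no more than max_gap after the previous fix.
    """
    clusters = []
    current = []
    for fix in fixes:
        if current:
            anchor, last = current[0], current[-1]
            near = math.hypot(fix[1] - anchor[1], fix[2] - anchor[2]) <= radius_m
            soon = fix[0] - last[0] <= max_gap
            if near and soon:
                current.append(fix)
                continue
            # The cat moved on (or the data has a gap): close the cluster.
            if len(current) >= min_fixes:
                clusters.append(current)
        current = [fix]
    if len(current) >= min_fixes:
        clusters.append(current)
    return clusters


# Made-up sample track: three fixes close together, then the cat roams.
t0 = datetime(2012, 1, 1)
fixes = [
    (t0, 1000.0, 1000.0),
    (t0 + timedelta(hours=3), 1020.0, 1010.0),
    (t0 + timedelta(hours=6), 990.0, 1005.0),
    (t0 + timedelta(hours=9), 5000.0, 5000.0),
    (t0 + timedelta(hours=12), 9000.0, 9000.0),
]
clusters = find_clusters(fixes, radius_m=200, max_gap=timedelta(hours=6))
print(len(clusters), len(clusters[0]))  # 1 3
```

Changing `radius_m` or `max_gap` changes what counts as a cluster, which is why being able to adjust those settings quickly matters. With sparse data, a larger `max_gap` would likely be needed, since many fixes are missing.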

The program returns text output describing each cluster. The user can choose between several kinds of text output, but CSV is the most common.

I used my experience with programmatically creating images to provide image feedback for every operation. The image shows the same information as the text output, but in visual form. This lets the user judge whether the operation was successful, beyond what can be sensed by just looking at a bank of numbers.

Screenshot of Find Clusters in action.

Show Territories

Show Territories GUI

This is a simple function that takes advantage of the image-creating code to create a color-coded graphic showing where one cat is in relation to others. It uses a simple polygon method to describe the territory of each cat. The GUI makes it easy to change the date range, or select different cats.
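The text does not say which polygon method is used. One common simple choice is the convex hull of a cat's fixes, and the sketch below assumes that. It uses Andrew's monotone chain algorithm; the function name and sample points are illustrative only.

```python
def convex_hull(points):
    """Return the convex hull of 2-D points in counter-clockwise order.

    Uses Andrew's monotone chain. Assumption: a convex hull stands in for
    the "simple polygon" territory; the real program may do something else.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Made-up fixes: four corners and one interior point.
territory = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])
print(territory)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

A convex hull overstates the area a cat actually uses, which fits the caveat that this kind of territory model is too simplistic for scientific study.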

This feature is not useful for scientific study, because the model of a territory is too simplistic. But it remains useful for previewing the data that is available in a data set, and for testing arguments that you are using with other functions, because it gives you immediate visual feedback.

Screenshot of Show Territories in action.

Find Crossings

Find Crossings GUI

Crossings are another relationship, found in the data, that is of interest to researchers. A crossing is a time when two cats were near each other in space and time. The meeting could be friendly, or it could be antagonistic.

As with everything else, the settings that control the definition of a crossing are easily changeable via the GUI. This function also produces clean text output that can be imported into other applications, and it creates image feedback to give the user a visual sense of what was found.
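As an illustration of what "near in space and time" can mean in practice, the sketch below pairs up fixes from two cats that fall within a distance cutoff and a time cutoff. It is a guess at the general shape of such a search, not CatAmount's implementation. The names and sample data are hypothetical.

```python
import math
from datetime import datetime, timedelta


def find_crossings(track_a, track_b, radius_m, max_dt):
    """Return (fix_a, fix_b) pairs that are close in both space and time.

    Each track is a list of (timestamp, easting, northing) sorted by
    timestamp. Because both tracks are ordered, fixes of cat B that are
    already too old never need to be looked at again.
    """
    crossings = []
    start = 0
    for fa in track_a:
        # Advance past fixes of B that are too old for this and later fixes of A.
        while start < len(track_b) and track_b[start][0] < fa[0] - max_dt:
            start += 1
        for j in range(start, len(track_b)):
            fb = track_b[j]
            if fb[0] > fa[0] + max_dt:
                break  # Everything after this is too late as well.
            if math.hypot(fa[1] - fb[1], fa[2] - fb[2]) <= radius_m:
                crossings.append((fa, fb))
    return crossings


# Made-up tracks for two cats.
t0 = datetime(2012, 1, 1)
cat_a = [(t0, 0.0, 0.0), (t0 + timedelta(hours=6), 1000.0, 1000.0)]
cat_b = [
    (t0 + timedelta(hours=1), 5000.0, 5000.0),
    (t0 + timedelta(hours=7), 1100.0, 1050.0),
]
crossings = find_crossings(cat_a, cat_b, radius_m=200, max_dt=timedelta(hours=2))
print(len(crossings))  # 1
```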

Screenshot of Find Crossings in action.

Find Whodunit

Find Whodunit GUI

Find Whodunit puts a necessary tool in the toolbox of researchers. In every case we've looked at up to this point, the researchers are looking at the data for a certain cat, and analyzing its location at different times. But in some cases, a researcher might know the place and time, but not the cat.

For example, imagine an expedition found a mountain lion predation site, but the cat who did the deed is long gone. In that case, you know the exact place, you can estimate the time, but you don't know which cat was involved.

The whodunit function allows you to query a certain time and place, and see if any cats were nearby. As with the other functions, you can easily control the settings that define a match via the GUI, and you get the full complement of text output and image feedback.
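A query like this can be sketched as a filter over every cat's fixes. The code below is a hypothetical illustration, not the program's real interface. The cat identifiers, function name, and parameters are invented.

```python
import math
from datetime import datetime, timedelta


def find_whodunit(tracks, easting, northing, when, radius_m, max_dt):
    """Return (cat_id, fix) for every fix near the given place and time.

    tracks: dict mapping a cat id to a list of
    (timestamp, easting, northing) fixes.
    """
    matches = []
    for cat_id, fixes in tracks.items():
        for fix in fixes:
            close_in_time = abs(fix[0] - when) <= max_dt
            close_in_space = math.hypot(fix[1] - easting, fix[2] - northing) <= radius_m
            if close_in_time and close_in_space:
                matches.append((cat_id, fix))
    return matches


# Made-up data: two cats, one predation site found at a known place.
t0 = datetime(2012, 1, 1)
tracks = {
    "F101": [(t0, 1000.0, 1000.0), (t0 + timedelta(hours=12), 4000.0, 4000.0)],
    "M202": [(t0 + timedelta(hours=11), 4100.0, 3950.0)],
}
matches = find_whodunit(
    tracks,
    easting=4050.0,
    northing=4000.0,
    when=t0 + timedelta(hours=12),
    radius_m=300,
    max_dt=timedelta(hours=3),
)
print([cat_id for cat_id, _ in matches])  # ['F101', 'M202']
```

Because the time of a predation event is only an estimate, a generous `max_dt` is a reasonable starting point, narrowed down once candidate cats appear.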

Screenshot of Find Whodunit in action.

About The Solution

Python was chosen for this project because it was already being used by the client's existing solution, it is available for all major platforms, and it is easy to read and verify.

Because the major functions were different but closely related, I used a Python module to hold code that is common between the different functions. The module contains base classes that all of the major functions can use.
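The shared-module idea might look something like the sketch below: a base class handles loading the data, and each major function builds on it. The class names, CSV column names, and timestamp format are assumptions for illustration and are not taken from CatAmount.

```python
import csv
import io
from datetime import datetime


class DataSet:
    """Hypothetical base class: loading code shared by every major function."""

    def __init__(self, text_stream):
        self.fixes = []
        for row in csv.DictReader(text_stream):
            self.fixes.append((
                row["cat_id"],
                datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M"),
                float(row["easting"]),
                float(row["northing"]),
            ))

    def for_cat(self, cat_id):
        """Return only the fixes that belong to one cat."""
        return [fix for fix in self.fixes if fix[0] == cat_id]


class ClusterFinder(DataSet):
    """Hypothetical subclass: inherits loading, adds its own behavior."""

    def describe(self, cat_id):
        return f"{cat_id}: {len(self.for_cat(cat_id))} fixes available"


# Made-up CSV content with invented column names.
SAMPLE = (
    "cat_id,timestamp,easting,northing\n"
    "F101,2012-01-01 00:00,520000,4830000\n"
    "F101,2012-01-01 03:00,520010,4830020\n"
    "M202,2012-01-01 00:00,530000,4840000\n"
)
finder = ClusterFinder(io.StringIO(SAMPLE))
print(finder.describe("F101"))  # F101: 2 fixes available
```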

Each major function is a separate command-line program. A command-line interface was chosen so that, if the need arose, the researchers could use scripting to automate their work. The GUI serves as a full front end to the programs, so they really get the best of both worlds.
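A command-line front end of this kind is commonly built with Python's standard `argparse` module. The sketch below shows the general idea. The program name, option names, and defaults are invented for illustration and should not be read as CatAmount's real options.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical cluster-finding command."""
    parser = argparse.ArgumentParser(
        prog="find_clusters",
        description="Find clusters in GPS collar data (illustrative only).",
    )
    parser.add_argument("datafile", help="CSV file of GPS fixes")
    parser.add_argument("--cat", required=True, help="identifier of the cat")
    parser.add_argument("--radius", type=float, default=200.0,
                        help="distance cutoff in meters")
    parser.add_argument("--hours", type=float, default=24.0,
                        help="time cutoff in hours")
    return parser


# Passing a list stands in for what a script or a GUI would supply.
args = build_parser().parse_args(["collars.csv", "--cat", "F101", "--radius", "150"])
print(args.datafile, args.cat, args.radius, args.hours)  # collars.csv F101 150.0 24.0
```

With this design, a GUI only has to assemble the argument list and run the program, while a researcher can put the same arguments in a shell script to automate repeated runs.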

We don't know how large the data set will eventually grow, so the software has to operate smartly. For example, if the data set is going to be pruned, do so before a costly series of comparisons. You may not notice the difference with a small amount of data, but it's smart to plan for a large amount of data.
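A small sketch shows why pruning first matters. Comparing every fix against every other fix grows roughly with the square of the number of fixes, so trimming the data to the date range of interest beforehand removes most of the work. The function names and data here are made up.

```python
from datetime import datetime, timedelta


def prune(fixes, start, end):
    """Keep only fixes whose timestamp falls inside [start, end]."""
    return [fix for fix in fixes if start <= fix[0] <= end]


def pair_count(n):
    """Number of pairwise comparisons among n items."""
    return n * (n - 1) // 2


# Made-up data: one fix per hour for 1000 hours.
t0 = datetime(2012, 1, 1)
fixes = [(t0 + timedelta(hours=h), 0.0, 0.0) for h in range(1000)]

# Only the first 48 hours are of interest.
recent = prune(fixes, t0, t0 + timedelta(hours=47))

print(pair_count(len(fixes)))   # 499500
print(pair_count(len(recent)))  # 1128
```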

There is a text configuration file, where the user can enter default values for every configuration. In this way, the user's preferences are preserved between sessions, and the user is the one who specifies the default values.
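The format of CatAmount's configuration file is not described here. As one possibility, an INI-style file read with Python's standard `configparser` module would give this behavior. The section names, keys, and values below are assumptions for illustration.

```python
import configparser

# Hypothetical configuration text; the real file may look different.
SAMPLE = """
[find_clusters]
radius = 200
hours = 24

[find_crossings]
radius = 500
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# A value the user has set in the file.
cluster_radius = config.getfloat("find_clusters", "radius")

# A value the user has not set falls back to a built-in default.
crossing_hours = config.getfloat("find_crossings", "hours", fallback=12.0)

print(cluster_radius, crossing_hours)  # 200.0 12.0
```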

Charitable Donation

This project was licensed under the GPL and given to the Teton Cougar Project as an in-kind donation. One of the good things about being an independent developer is that I can donate my time and expertise when it feels like the right thing to do.