The Unlikely Link Between URL Classification and Cruise Control

We recently returned from London where we exhibited at InfoSecurity Europe. One of our main objectives in attending Europe’s largest cybersecurity conference was to educate people on the state of the art in web filtering and URL threat intelligence.

The InfoSec audience consisted of people with a broad range of cybersecurity savvy, so in order to effectively demonstrate the capabilities of contemporary URL classification, we identified an unlikely parallel that we knew most folks in attendance could relate to: cruise control.

To illustrate the state of the art in URL classification, we used the cruise control analogy to set the stage, and then looked at our real-world example of a phishing scam that impersonated the United States Postal Service.

There is remarkable similarity between the development of cruise control and URL classification. Although cruise control is technically a much older technology, both cruise control and URL classification have had three eras of capabilities. We’ll get into those here.

Let’s dig into what cruise control and URL classification have in common as we share with you excerpts of this presentation from our well-attended session at InfoSecurity Europe 2024.

The father of cruise control

The very first version of cruise control was patented in 1950 by a remarkable engineer named Ralph Teetor. Teetor was born into a manufacturing family, and legend has it that he talked his way into the engineering department at the University of Pennsylvania. He first showed his industrial prowess by improving the balance of steam turbines on U.S. Navy warships.

His many accomplishments aside, what was most remarkable about Teetor was that he was also blind. If you take nothing else from this analogy, let it be this bit of trivia that’s sure to leave your colleagues slack-jawed.

The early days of cruise control

When it finally came to market in the 1958 Chrysler Imperial, cruise control was a dumb technology. You set the speed and the car would hold it. It required a lot of human intervention and it just wasn’t that functional. But here’s the thing: this was the accepted standard for 50 years. So, it was obviously a valuable feature that became pretty foundational to the automotive user experience.

The early days of cruise control involved a lot of manual entry and human intervention for what was ultimately limited functionality.

The early days of URL classification

As it turns out, manual effort was exactly what URL classification was also about in the early days of the internet. Every URL on Yahoo and the Open Directory was input and checked manually by a human.

You can see in this screenshot the very early days of a taxonomy forming with a “grand total” of nineteen different URL categories on Yahoo at the time.

Circled at the bottom, you can see that Yahoo! listed 23,836 entries. That was a huge number at the time, but the web was growing fast, and soon there were more sites popping up than there were people to visit and categorize them.

The 2000s: adaptive cruise control

Keeping with the cruise control analogy, in 1999 Mercedes-Benz introduced Distronic, one of the first adaptive cruise control systems.

The big innovation here was that cars could now both speed up and slow down automatically, depending on the distance to the vehicle in front of them.

All a human had to do was input speed and separation distance, and the computer did the rest.

This development removed a ton of the human intervention required to keep the vehicle safe and, moreover, the car was now able to make some decisions based on the inputs it detected.

This was a huge step forward in the functionality and value of cruise control.
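
To make the idea concrete, here is a minimal sketch of the decision an adaptive system makes from those two human inputs. It is a toy model for illustration, not how Distronic or any production system works; the function name, parameters, and the simple proportional slowdown are all our own assumptions.

```python
from typing import Optional


def adaptive_target_speed(
    set_speed_kmh: float,
    desired_gap_m: float,
    measured_gap_m: Optional[float],
) -> float:
    """Return the speed the car should aim for right now (toy model).

    set_speed_kmh and desired_gap_m are the two values the driver enters.
    measured_gap_m is the sensed distance to the vehicle ahead, or None
    when the road ahead is clear.
    """
    if desired_gap_m <= 0:
        raise ValueError("desired_gap_m must be positive")

    # Nothing ahead, or far enough away: behave like classic cruise control.
    if measured_gap_m is None or measured_gap_m >= desired_gap_m:
        return set_speed_kmh

    # Too close: slow down in proportion to how much of the gap remains.
    remaining = max(measured_gap_m, 0.0) / desired_gap_m
    return set_speed_kmh * remaining


# The driver sets 100 km/h and a 50 m gap; the sensor reports 25 m.
print(adaptive_target_speed(100.0, 50.0, 25.0))
```

The human supplies two numbers once, and every later decision comes from the sensor reading.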

The 2000s: semi-automated URL classification

When we look at the URL classification capabilities of the early 2000s, we see a lot of the same themes. We see some low-powered automation by way of searching for patterns and regular expressions that could give us a general sense of what a site is about.

This was the dawn of crawlers and bots promising programmatic site classification. But it wasn’t so easy.

These new capabilities enabled some classification at scale, but without human validation, accuracy dropped sharply.

So, there was still a ton of human interaction all the way from sourcing sites to validation and verification of their contents.
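
As a rough illustration of that era’s pattern-matching approach, the sketch below counts regular expression hits per category and hands anything ambiguous to a human. The categories, keywords, and function name are hypothetical examples of ours, not taken from any real classifier.

```python
import re
from typing import Optional

# Hypothetical keyword patterns, one per category (illustration only).
CATEGORY_PATTERNS = {
    "shopping": re.compile(r"\b(cart|checkout|free shipping)\b", re.IGNORECASE),
    "news": re.compile(r"\b(breaking news|headlines|editorial)\b", re.IGNORECASE),
    "sports": re.compile(r"\b(scores|league|playoffs)\b", re.IGNORECASE),
}


def classify_page(text: str) -> Optional[str]:
    """Return the category with the most keyword hits.

    Returns None when nothing matches or when two categories tie, which
    is the point where a human reviewer would have to step in.
    """
    hits = {
        name: len(pattern.findall(text))
        for name, pattern in CATEGORY_PATTERNS.items()
    }
    best = max(hits.values())
    winners = [name for name, count in hits.items() if count == best]
    if best == 0 or len(winners) > 1:
        return None  # ambiguous: route to manual review
    return winners[0]


print(classify_page("Add to cart and get free shipping at checkout"))
print(classify_page("Welcome to my home page"))
```

Keyword counting like this scales well, but it cannot tell a shop from a page that merely talks about shopping, which is why validation stayed in human hands.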

Today: full self-driving

Finally, here we are today in 2024 and a lot has changed. The cruise control of today looks nothing like it did in 1958, and the really interesting thing is that it keeps getting better.

Take Tesla as an example: its Full Self-Driving software, which still requires active driver supervision, is a marked improvement over where it was just a few months ago. It’s worth noting that that’s how machine learning works: constant improvement.

Make no mistake, the vast majority of the cars on the market today are still running the cruise control technology of the early 2000s or even the 1950s. But the state of the art is full autonomy.

Cars can now make split second decisions on what to do, based on inputs they receive in real time.

Not only can they decide, but they can also act without human intervention.

In other words, the cars of today are actually detecting and responding to threats in real time.

Naturally, that brings us nicely into the URL classification technology of today.

Today: full self-detecting

It should come as no surprise at this point that this level of automation is what we’re also capable of now with URL classification.

We have automated classification at scale, with little human intervention and incredible accuracy.

In addition to classification, just like what’s happening in autonomous driving, we now have AI giving us real-time threat verdicts for any URL on the internet.

This is the real power of AI and it’s available right now.

Real-time URL classification and threat detection

What you see here is alphaMountain’s threatYeti rating and classification for the URL found in our USPS phishing scam post. threatYeti is built on our real-time threat detection and classification API, the same technology that’s powering detection and classification at some of the biggest companies in the world.

Here you can see clearly that:

AI is telling us: the threat score is a 9.2 out of ten.

AI is telling us: this is a phishing site.

AI is telling us: these scammers are running a UPS phishing campaign as well.
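
For readers wondering how a verdict like this might be acted on automatically, here is a minimal sketch. The record type, field names, placeholder URL, and thresholds are all hypothetical and do not reflect the actual API response format; only the 9.2 score and the phishing category come from the example above.

```python
from dataclasses import dataclass


@dataclass
class UrlVerdict:
    """Hypothetical verdict record; field names are illustrative only."""

    url: str
    threat_score: float  # assumed scale: 0 (benign) to 10 (malicious)
    category: str


def decide_action(
    verdict: UrlVerdict,
    block_at: float = 8.0,  # assumed policy threshold
    warn_at: float = 5.0,  # assumed policy threshold
) -> str:
    """Map a threat score to a simple policy decision."""
    if verdict.threat_score >= block_at:
        return "block"
    if verdict.threat_score >= warn_at:
        return "warn"
    return "allow"


# Placeholder URL; the score and category mirror the example above.
verdict = UrlVerdict("https://example.test/redelivery", 9.2, "phishing")
print(decide_action(verdict))
```

With a score and a category delivered in real time, the blocking decision itself needs no human in the loop.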

In other words, if you’re looking for the Tesla of URL classification, you’ve come to the right place.