The Risks of Relying on VirusTotal for Domain Threat Intelligence

TL;DR – By the end of this post you will have an understanding of: how VirusTotal determines the risk of a URL, why you shouldn’t rely solely on VirusTotal for threat intelligence, where VirusTotal gets their data, how to get the most recent threat verdict for any domain or IP address, VirusTotal limitations to be aware of, VirusTotal alternatives, and best practices for investigating any URL and determining its actual risk. If you’re up for any of that, let’s go!

What is VirusTotal?

VirusTotal is a mainstay of the cyber threat intelligence community. Launched in 2004 and acquired by Google in 2012, the website serves as a user-driven, centralized repository for cyber threats across files, hashes, URLs, domains and IPs.

The platform is such a part of the threat intelligence zeitgeist that one of the de facto first actions a threat analyst takes with any suspicious file or website is to ask VirusTotal and the VT community what they know about it. Such is the power of brand equity in this space. It’s the so-called “gold standard” in the eyes of many.

That said, there are several misconceptions about what VirusTotal actually does. The downside of having nearly two decades of a strong reputation behind you means that these perceptions are hard to reframe. Nevertheless, it’s worth exploring what VirusTotal is and does, what it is not and does not do, and how the platform works with other organizations to provide value to the cybersecurity community.

To be clear, this post is not a knock on VirusTotal in any way. They provide a remarkable service to the community. However, when analysts misunderstand certain aspects of the platform, relying on it can actually increase the likelihood that an investigation ends in a false negative or an otherwise-suboptimal outcome. So, let’s review what we know about VirusTotal through that lens and in the context of their domain and URL threat lookup service.

VirusTotal is a data aggregator

VirusTotal does no first-party computation of domain or URL risk. Rather, they work with security vendors who have already calculated the risk and rendered threat verdicts for URLs. Those verdicts are fed into VT, which displays them to whoever runs the query. The security vendors in the VT search results are not usually doing live interrogations of those URLs at query time. They are presenting verdicts already in their databases, which are licensed to VT. VirusTotal does do live queries when explicitly asked (more on that later) and it presents the HTML responses it receives, but any enrichment, relationship mapping or pivoting into this information is based on third-party data.

Security vendors use VirusTotal for marketing and distribution

As of this writing, VirusTotal displays the threat verdicts of 89 different security vendors with every URL or IP query. The breadth and diversity of that coverage is useful, but the utility virtually ends there, with a simple red or green verdict on the URL from each vendor. There is no way to freely view any one vendor’s data on a URL. This throttling is by design. The vendors want you to license data from them, so they offer up this limited view of their opinion so their intelligence will be displayed on VT search results. VirusTotal is a distribution platform for security data vendors to be seen, paid and consumed by threat analysts.

Security vendors set their own policies for VirusTotal data

This is perhaps the most important point to consider when using VirusTotal. The freshness and completeness of the data that vendors allow to be shown in VT search results is predetermined by the vendors themselves. The vendor-supplied data that VT is querying is almost always updated by the vendor at a far slower pace than that at which they update it for themselves or their customers. This means that the threat verdicts that are furnished by individual vendors to VT can be out of date or flat out wrong.

VirusTotal caches threat verdicts which impacts their veracity

It’s a little-known fact about VirusTotal that they cache their threat verdicts. This means that the verdict they are displaying to you for a given URL is not necessarily (and often not) the most up-to-date verdict on the target. This latency is presumably for the benefit of user experience. Reanalyzing a URL to render their vendors’ most recent verdicts takes VirusTotal about 120 seconds, so in the interest of time savings, it makes sense to offer up some recent analysis of a given target from a short term memory store as opposed to multiple, expensive database lookups.

That said, you can tell VirusTotal to “reanalyze” a URL, but you have to do this yourself, and the difference in verdicts rendered can be quite striking.

To demonstrate these discrepancies, Figures 1 and 2 show this latency-by-design in rendering threat verdicts between VirusTotal and a first-party domain intelligence source like alphaMountain.

In this case, you can see that two vendors are showing the URL as risky on VirusTotal (where alphaMountain’s verdict comes up as “Clean”) and without any categorization, while alphaMountain’s actual, real-time look up of the same URL shows high risk and is clearly categorized as “Phishing.”

Next, Figure 3 shows that same URL’s threat verdicts after clicking “Reanalyze” on the VT dashboard. After approximately two minutes, the new verdicts indicate that twelve vendors found this URL to be risky, a sixfold increase (up from two) after user intervention.

Figure 1. VirusTotal results of a URL scan showing mostly clean results

Figure 2. Contemporaneous scan on threatYeti (powered by alphaMountain’s domain intelligence API) of the same URL showing very high risk

Figure 3. Updated threat verdicts on VirusTotal after a user-initiated reanalysis of the URL showing a 6x increase in malicious verdicts
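For readers who would rather script this check than click “Reanalyze” by hand, here is a minimal Python sketch of the same idea: work out how old the cached verdict is and, if it is too old, prepare a reanalysis request. It is written against our understanding of the public VirusTotal API v3. The endpoint paths, the x-apikey header and the last_analysis_date field (a Unix timestamp) should be verified against VT’s current documentation before you rely on them. The one-hour freshness threshold and all function names are our own assumptions, not anything VT prescribes.

```python
import base64
import time
import urllib.request
from typing import Optional

# Base URL for VT API v3 (to the best of our knowledge; verify before use).
VT_API = "https://www.virustotal.com/api/v3"


def vt_url_id(url: str) -> str:
    """URL identifier for VT API v3: URL-safe base64 of the URL, '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def is_stale(last_analysis_date: int,
             max_age_seconds: int = 3600,
             now: Optional[float] = None) -> bool:
    """True if the cached analysis is older than max_age_seconds.

    The one-hour default is an arbitrary assumption; tune it to your workflow.
    """
    now = time.time() if now is None else now
    return (now - last_analysis_date) > max_age_seconds


def build_report_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET for the cached URL report."""
    return urllib.request.Request(
        f"{VT_API}/urls/{vt_url_id(url)}",
        headers={"x-apikey": api_key},
        method="GET",
    )


def build_reanalyze_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST asking VT to reanalyze the URL."""
    return urllib.request.Request(
        f"{VT_API}/urls/{vt_url_id(url)}/analyse",
        headers={"x-apikey": api_key},
        method="POST",
    )
```

To use it, send the report request with urllib.request.urlopen, read last_analysis_date from the attributes in the JSON response, and send the reanalysis request only when is_stale returns True. Keep in mind that a fresh analysis takes a couple of minutes to complete, so poll the report again rather than trusting the first response.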

The relative incompleteness of data goes beyond threat verdicts and carries over into site categorization. When VT shows which categories a site’s content belongs in, those categories are determined by the vendors. VT is simply regurgitating them to the end user from the vendor(s) who furnished them (often without attribution, but that’s a topic for another time).

However, not all threat intelligence vendors even offer site classification, and those that do don’t always share every category with VT. For example, alphaMountain does offer site classification into at least one of 83 categories, but we only furnish VT with classification of the riskiest hosts such as Phishing and Malicious. Again, VirusTotal is a marketing platform for security vendors. If you need content classification, you will likely need to license it from a first-party threat data feed.

VirusTotal URL scans have no scope

Because the data is not their own and they are not a first-party data source, VirusTotal shows you their threat verdicts based on the data that’s been furnished to them by security vendors. This means that the URL you input is the literal URL that will be called verbatim against all of their data sources. Thus, VT may return a clean bill of health for a host on a domain that is known to be malicious. This type of clear miss or false negative could potentially throw off your research or investigation. However, if you asked a specific vendor directly, you’d likely get a malicious categorization due to the vendor’s ability to apply scope, or in other words, extend the domain entry to all its hosts.

Figures 4 and 5 below exemplify the difference in risk profiles between a subdomain and its root. Figure 6 shows the subdomain’s true risk as visualized by alphaMountain’s threatYeti domain research platform.

Figure 4. A scan of the root domain mobiaib-online[.]com indicates high risk

Figure 5. A scan of the subdomain www[.]mobiaib-online[.]com shows no hits

Figure 6. Checking the subdomain directly on threatYeti indicates high risk of phishing
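If your tooling only does verbatim lookups, you can approximate scope yourself by checking the hostname and each of its parent domains, then keeping the riskiest answer. The Python sketch below shows one naive way to do that. It simply walks up the labels until two remain, so it does not consult the Public Suffix List and will stop in the wrong place for suffixes like co.uk. The lookup argument is a hypothetical stand-in for whatever per-host risk source you use, and the function names are ours.

```python
import ipaddress
from typing import Callable
from urllib.parse import urlsplit


def scope_candidates(target: str) -> list[str]:
    """Return the hostname and each parent domain, most specific first.

    Naive on purpose: stops at two labels instead of using the
    Public Suffix List to find the true registrable domain.
    """
    # Accept bare hostnames as well as full URLs.
    host = urlsplit(target if "://" in target else f"//{target}").hostname or ""
    host = host.strip(".")
    if not host:
        return []
    # IP addresses have no parent domains to walk up to.
    try:
        ipaddress.ip_address(host)
        return [host]
    except ValueError:
        pass
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]


def max_risk(target: str, lookup: Callable[[str], float]) -> float:
    """Highest risk score across the host and its parents.

    `lookup` is a hypothetical callable mapping a hostname to a numeric score.
    """
    return max((lookup(h) for h in scope_candidates(target)), default=0.0)
```

For the example in Figures 4 and 5, scope_candidates would yield both the www subdomain and the root domain, so a clean result on the subdomain alone would not hide the root’s high-risk verdict.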

Benefits of using VirusTotal

VT is a great starting point for investigating the risk of a domain or IP address. With a single query, a threat analyst can get a good directional rendering of a site’s risk. This means that VT makes it easy to get a red or green, thumbs up or down verdict on a site quickly. That’s useful. Whether the data belongs to VirusTotal or their vendors is irrelevant to the analyst at the earliest stage of their investigations.

VirusTotal has a robust community which makes it easy for analysts to share their findings with a global peer group that can actually make use of the information. There is no stronger community in the business world than the cybersecurity community, and reach and reaction time are essential to keeping organizations safe from fast-moving threats. VirusTotal is great for crowdsourcing and sharing information.

Malware analysis is VirusTotal’s core competency. Far more than for URL and domain analysis, where VT isn’t as strong, legions of security analysts and threat researchers rely on VT’s malware detection and aggregated data to analyze files in their investigations. However, as it is with domains and URLs, the malware data here, too, is offered up by security vendors looking to market to their own prospects and customers. That said, the relational mapping that VT offers between malicious sites and malicious files is an undeniable force for good in the industry.

What you should know when using VirusTotal

When looking up a URL or domain to see if it’s risky or dangerous, there are a few things to keep in mind when using VirusTotal. Let’s recap what we know.

VirusTotal threat verdicts and categorizations come from their security vendor partners
VirusTotal URL “scans” have no scope and are lookups against their third-party partner data
VirusTotal threat verdicts are binary: clean or dangerous, but real risk is much more nuanced
Security vendors use VirusTotal as a marketing platform, offering some of their data in exchange for visibility, often throttled or of low fidelity
Security vendors determine the freshness and fidelity of the data they provide to VirusTotal

Always use a first-party intelligence source

Taking into account the facts about VirusTotal above, you’ve probably already concluded this, but…at least some of your intelligence sources should be primary sources as opposed to aggregators.

There’s value in the concept of “security through diversity” and that’s true in domain threat investigations as well. No intelligence is right one hundred percent of the time, and no matter what verdict VirusTotal returns for a domain, always validate it against a first-party domain intelligence source such as alphaMountain’s domain intelligence API or threatYeti domain research platform. Figures 7 and 8 show validation of a “Clean” VirusTotal verdict against alphaMountain’s first-party, real-time verdict. The contrast is striking.

Figure 7. A “Clean” bill of health from VirusTotal despite a categorization of “Suspicious”

Figure 8. Validation of results indicates a far riskier threat verdict than what was indicated by VirusTotal
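The validation step above can be sketched in a few lines of Python. The idea is to reduce VT’s vendor tallies to a single ratio, compare that with a first-party score, and treat any disagreement as a reason to dig deeper rather than as a verdict. The stats keys (malicious, suspicious, harmless, undetected) follow our understanding of the last_analysis_stats object in VT’s API v3. The 0-to-1 first-party risk scale, both thresholds and the outcome labels are illustrative assumptions only, not any vendor’s actual scoring model.

```python
def vt_flag_ratio(stats: dict) -> float:
    """Share of vendors with an opinion that flagged the URL.

    Vendors reporting 'undetected' are treated as having no opinion.
    """
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    responded = flagged + stats.get("harmless", 0)
    return flagged / responded if responded else 0.0


def triage(stats: dict,
           first_party_risk: float,
           vt_threshold: float = 0.05,
           fp_threshold: float = 0.7) -> str:
    """Compare the VT consensus with a first-party score.

    first_party_risk is a hypothetical 0..1 score; both thresholds are
    arbitrary starting points, not recommended values.
    """
    vt_risky = vt_flag_ratio(stats) >= vt_threshold
    fp_risky = first_party_risk >= fp_threshold
    if vt_risky and fp_risky:
        return "likely-risky"
    if vt_risky != fp_risky:
        return "disagreement"  # the sources conflict: investigate further
    return "no-signal"
```

A case like the one in Figures 7 and 8, where only a couple of VT vendors flag the URL while the first-party score is high, lands in the “disagreement” bucket, which is exactly where an analyst’s attention belongs.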

Take VirusTotal results with a grain of salt. VirusTotal’s opinions are really only as good as the community of users and vendors who support it. Knowing that the vendors are not supplying their latest and greatest intelligence means that one way or another, the aggregated consensus of VirusTotal’s vendors is at best a directionally-accurate assessment of a domain or URL’s actual risk.

Finally, share with the community whatever you find out about a domain or URL. Take the VirusTotal verdicts and your verdicts from whichever first-party data source you validate them against, and share them both in the VT community and on social media such as Twitter where the cybersecurity audience is quite substantial and highly engaged. This will not only engender interest from the community in your research efforts wherein they might be able to help you, but it will also educate the community about how to use VirusTotal most effectively to get accurate domain threat verdicts.

We hope this has helped to demystify VirusTotal and clear up any misconceptions about the platform. VirusTotal is in a class of its own. We respect and appreciate what they’ve built and the community that they’ve nurtured around it. It is what it is, and it’s a very good thing.