Phishing Analyzer - Email Scanner

A Python tool that scans .eml email files for signs of phishing. It checks the sender, message text, links, and attachments for red flags like suspicious keywords, fake domains, and risky file types. The tool gives each email a score and a clear verdict, and can optionally check link reputations with VirusTotal.

Tyler Droxler
September 21, 2025

Project Overview

Phishing is one of the most widespread and damaging tactics used by attackers, responsible for a large share of security breaches worldwide. By disguising malicious emails as legitimate communication, attackers trick users into clicking harmful links, sharing credentials, or downloading malware. Its persistence and effectiveness make phishing detection a critical skill for cybersecurity professionals.

In the following project, I use a Python program to aid in the phishing analysis of .eml files, which are commonly used to store raw email data including headers, body text, and embedded links. The objective of this tool is to provide an effective way to detect phishing indicators by combining rule‑based keyword and URL analysis with external threat intelligence from the VirusTotal API.

Python Program

How The Program Works

The script parses .eml files to extract headers, body content, attachments, and embedded URLs. It then applies a series of detection rules to identify common phishing tactics, such as display name spoofing, suspicious reply-to addresses, phishing-related keywords, dangerous attachment types, and obfuscated or typosquatted URLs. The analyzer also inspects HTML content for deceptive links and hidden elements. If a VirusTotal API key is provided, the tool checks URLs against the VirusTotal database for known malicious or suspicious activity. Each finding contributes to an overall phishing risk score, which is used to classify the email as "Likely Phishing," "Needs Review," or "Probably Safe."
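
The parsing step described above can be sketched with Python's standard email package. This is a minimal illustration, not the project's actual code: the function name parse_eml_text, the returned dictionary keys, and the URL pattern are all assumptions made for the example.

```python
import re
from email import message_from_string, policy

# Simple pattern for http/https URLs (an assumption; a real analyzer may be stricter).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def parse_eml_text(raw_message):
    """Return headers, body text, attachment names, and URLs from raw email text."""
    msg = message_from_string(raw_message, policy=policy.default)
    body_parts = []
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        filename = part.get_filename()
        if filename:
            attachments.append(filename)
        elif part.get_content_type() in ("text/plain", "text/html"):
            body_parts.append(part.get_content())
    body = "\n".join(body_parts)
    return {
        "from": str(msg.get("From", "")),
        "reply_to": str(msg.get("Reply-To", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "attachments": attachments,
        "urls": URL_PATTERN.findall(body),
    }
```

For a file on disk, email.message_from_binary_file with the same policy would be the equivalent entry point. The later detection rules would then work from the returned dictionary.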

For example, attackers commonly spoof domains and display names and create a sense of urgency in order to steal credentials.

Sample Email:
From: PayPal Support <support@paypa1.com>
Subject: Urgent: Verify your account to avoid suspension
Reply-To: support@paypa1.com

Dear Customer,
We have detected unusual activity on your account. Click here to verify your account.

Thank you,
PayPal Security Team

In the example email above, both the From and Reply-To addresses end in "paypa1.com", which looks similar to the authentic "paypal.com". The tool is designed to catch this kind of spoofing by comparing the sender's domain against a list of common high-risk brands (PayPal, Amazon, Wells Fargo, Chase, etc.) and measuring how similar they are. The attacker may also try to get the user to download malicious software, so the tool looks for attachments and inspects their file types. If an attachment has a suspicious extension (such as .exe, .scr, .bat, .js, or .jar), the tool flags it as suspicious and adds to the overall risk score.
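
One way to approximate the brand-similarity and attachment checks is shown below, using difflib from the standard library. The brand list, the extension set, the 0.85 threshold, and the function names are hypothetical choices for this sketch; the tool's real lists and similarity measure may differ.

```python
from difflib import SequenceMatcher
from email.utils import parseaddr

# Hypothetical lists for illustration only.
BRAND_DOMAINS = ["paypal.com", "amazon.com", "wellsfargo.com", "chase.com"]
RISKY_EXTENSIONS = {".exe", ".scr", ".bat", ".js", ".jar"}


def sender_domain(from_header):
    """Extract the lowercase domain from a From or Reply-To header value."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def lookalike_brand(domain, threshold=0.85):
    """Return the brand domain that `domain` closely imitates, or None."""
    if domain in BRAND_DOMAINS:
        return None  # an exact match is the genuine domain, not a lookalike
    for brand in BRAND_DOMAINS:
        if SequenceMatcher(None, domain, brand).ratio() >= threshold:
            return brand
    return None


def is_risky_attachment(filename):
    """True if the file name ends with one of the risky extensions."""
    return filename.lower().endswith(tuple(RISKY_EXTENSIONS))
```

With these assumptions, the sample sender's domain "paypa1.com" differs from "paypal.com" by one character and is reported as a lookalike, while the genuine domain is not flagged.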

The tool also accounts for false positives by weighting each indicator. For example, in the program the following weights are assigned:

Display name mismatch: 2
Reply-to mismatch: 2
Keyword: 1
Suspicious Attachment: 3
Suspicious URL: 3
Link mismatch: 2
VirusTotal (Malicious): 4
VirusTotal (Suspicious): 2

The weight of each indicator that is present is added to the total, and the total score decides whether the email is "Likely Phishing" (6 or higher), "Needs Review" (3 to 5), or "Probably Safe" (less than 3). Weighting the scores this way reduces the likelihood of a false positive. For example, if an email sent from the authentic "paypal.com" contained a suspicious phrase such as "update your password", it would score a 1 on the risk scale. This shows some level of risk, but because no other risk factors are present, it is reasonable to treat the email as probably safe.
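
The weighted scoring can be sketched as follows. The weights come from the list above; the dictionary keys and function names are hypothetical, and the thresholds are read here as "6 or higher" and "3 to 5", which is an interpretation of the cutoffs rather than a confirmed detail of the program.

```python
# Weights taken from the list above; the key names are illustrative.
WEIGHTS = {
    "display_name_mismatch": 2,
    "reply_to_mismatch": 2,
    "keyword": 1,
    "suspicious_attachment": 3,
    "suspicious_url": 3,
    "link_mismatch": 2,
    "virustotal_malicious": 4,
    "virustotal_suspicious": 2,
}


def score_email(indicators):
    """Sum the weight of every indicator that fired for one email."""
    return sum(WEIGHTS[name] for name in indicators)


def verdict(score):
    """Map a total score to a verdict (assumed cutoffs: >= 6 and >= 3)."""
    if score >= 6:
        return "Likely Phishing"
    if score >= 3:
        return "Needs Review"
    return "Probably Safe"
```

For instance, a suspicious URL (3) combined with a malicious VirusTotal result (4) totals 7, which lands in "Likely Phishing", while a lone keyword hit (1) stays at "Probably Safe".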

Results

Analysis 1

To test the validity of the program, I downloaded sample .eml files that are known to be malicious from malware-traffic-analysis.net and compared the results against an email I sent myself, which I know is safe, to see how the tool would perform.

Important! When downloading malware or sample data for analysis, work only within a sandboxed environment such as a VM to protect your system and network.

After the sample files were downloaded, I opened PowerShell and ran the script using the following form:
py <path\to\program> <path\to\file>

py .\Phishing_Analyzer.py 'C:\Users\Tyler\Desktop\Sample_eml_files\phishing_1.eml'

In the results, we see the date, risk score, verdict, indicators, and URLs contained within this email. The scanner labeled this as "Likely Phishing" due to its risk score of 7. Based on the indicators and weighted scores, we see the program took all of the indicators and added their scores together to arrive at the verdict.

Obfuscated/Unusual URL (Score: 3)
VirusTotal lookup verdict "Malicious" (Score: 4)
Total Risk Score: 7

One interesting detail to note from this particular .eml analysis is how the attacker intended to trick the user. The URL found, "https://servervirto.com.co/ed/trn/update?email=brad@malware-traffic-analysis.net", is inconspicuous at first glance, but it has a .co after the .com, likely intended to spoof servervirto.com. The attacker could have used this to steal credentials, or potentially to lead the user to a malicious site containing malware in order to further compromise the user's system.
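
A heuristic for this kind of URL, where a familiar extension such as .com appears before the real final label, could look like the sketch below. It is only one possible rule and not necessarily what the analyzer's "Obfuscated/Unusual URL" check does; the set of common TLDs and the function name are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical set of well-known TLDs to look for inside a hostname.
COMMON_TLDS = {"com", "net", "org", "edu", "gov"}


def has_embedded_tld(url):
    """True if a common TLD label appears before the final label of the hostname."""
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    # Skip the first label (the name itself) and the last label (the real TLD).
    return any(label in COMMON_TLDS for label in labels[1:-1])
```

A rule like this would also match legitimate country-code registrations that use a "com" second level, so it makes sense to treat it as one weighted indicator rather than proof of phishing.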

Analysis 2

To continue testing the validity of the tool, I use another sample .eml file downloaded from malware-traffic-analysis.net.

After running the script, the analyzer gave this email a score of 11 and concluded it is "Likely Phishing."

Looking at the indicators detected, we see the attacker tried to trick the user in a couple of ways. First, the analyzer noted that the text links contained within the email lead to an HTTP URL, not an HTTPS one. This steers the user onto an insecure channel of communication, because HTTP lacks the SSL/TLS encryption that HTTPS offers. Any data sent to this site could therefore be intercepted by an attacker, including sensitive information such as usernames and passwords, session tokens, or banking information. Next, the analyzer determined that the attacker not only obfuscated the URL by adding .af after the .edu extension in an attempt to spoof the domain, but also had the displayed URL redirect to a different link.
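
Both link findings described here, plain HTTP targets and displayed URLs that point somewhere else, can be illustrated with the standard html.parser module. This is a rough sketch under assumed names (LinkCollector, find_link_issues, and the issue labels are hypothetical), not the analyzer's real implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link_issues(html):
    """Return (issue, href) tuples for insecure or mismatched links."""
    parser = LinkCollector()
    parser.feed(html)
    issues = []
    for href, text in parser.links:
        target = urlsplit(href)
        if target.scheme == "http":
            issues.append(("insecure_http", href))
        # Only compare hosts when the visible text itself looks like a URL.
        if text.lower().startswith(("http://", "https://")):
            shown_host = urlsplit(text).hostname
            if shown_host and target.hostname and shown_host != target.hostname:
                issues.append(("link_mismatch", href))
    return issues
```

Comparing hostnames rather than full URLs keeps the check from firing on harmless differences such as tracking parameters or a trailing slash.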

Lastly, the analyzer ran this through the VirusTotal API, which flagged the domain as malicious. Taken together, these factors indicate that this email is likely malicious.
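
A VirusTotal URL lookup along these lines can be sketched with the standard library alone. The sketch assumes the VirusTotal v3 URL report endpoint, which identifies a URL by its unpadded URL-safe base64 encoding and authenticates with an x-apikey header. The function names and the rule for turning engine counts into a "Malicious" or "Suspicious" label are assumptions for illustration; the analyzer may decide differently.

```python
import base64
import json
import urllib.request

VT_URL_ENDPOINT = "https://www.virustotal.com/api/v3/urls/"


def vt_url_id(url):
    """Build the URL identifier: URL-safe base64 with the padding removed."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def fetch_vt_stats(url, api_key):
    """Fetch last_analysis_stats for a URL (needs network access and a valid key)."""
    request = urllib.request.Request(
        VT_URL_ENDPOINT + vt_url_id(url), headers={"x-apikey": api_key}
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        payload = json.load(response)
    return payload["data"]["attributes"]["last_analysis_stats"]


def vt_verdict(stats):
    """Assumed rule: any malicious hit wins, then any suspicious hit."""
    if stats.get("malicious", 0) > 0:
        return "Malicious"
    if stats.get("suspicious", 0) > 0:
        return "Suspicious"
    return "Clean"
```

Keeping the verdict logic separate from the network call makes it possible to test the scoring path with canned statistics and to skip the lookup entirely when no API key is provided.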

Analysis 3

Now that we have seen the analyzer identify indicators of malicious activity, we need to check how it handles safe emails, as false positives can occur. To test this, I sent myself an email that I know is safe.

As you can see, I pasted a YouTube link in the body of the email but did not hyperlink it, so it appears as plain text. The analyzer should detect this as a red flag even though the link itself is safe. This is to catch attackers who paste malicious links in plain text in an attempt to bypass spam filtering.

After downloading the .eml file and running it through our script we get the following results:

As you can see, the analyzer flagged the issue previously mentioned and gave this email a score of 2. Since this is less than 3, it was given a verdict of "Probably Safe." This is a great example of why weighting the indicators is essential. If the scores weren't weighted, the analyzer would detect this red flag and mark the email as phishing. By weighting the scores we can address the issue of false positives, because an email will only be labeled as phishing if there are multiple indicators or if an indicator is a major red flag that pushes the score over the threshold.

Since the analyzer determined the link is safe, feel free to check it out: https://www.youtube.com/watch?v=dQw4w9WgXcQ&list=RDdQw4w9WgXcQ&start_radio=1  

Lessons Learned

Overall, this project showcases how scripting and automation tools can help analysts in their investigations, as well as protect ordinary users. The tool can detect malicious activity that might otherwise have been overlooked. It also shows how phishing-analysis web extensions can play a significant role in protecting untrained users. Phishing is one of the most common methods attackers use to target individuals and organizations, because people are often an organization's most vulnerable asset. People receive tens, hundreds, or even thousands of emails a month, so given enough time it is likely that someone will be caught off guard by a well-crafted phishing email and click a malicious link. That can compromise their personal data and lead to financial, reputational, or organizational damage. One click could result in millions of dollars in damage or data loss to an organization.