completed

Automated Phishing/Scam Detection

$10,000.00 Received
$10,000.00 Requested
Problem:

Cardano giveaway scams are on the rise, with users losing millions of USD to them. We need scalable ways of defending users.

Yes Votes:
₳ 273,566,710
No Votes:
₳ 21,010,883
Votes Cast:
1835

This proposal was approved and funded by the Cardano Community via Project F6: Cardano Emerging Threat Alarm Catalyst funding round.

Detailed Plan

Motivation

Tens of new Cardano scams are launched each day. Whether through hacked YouTube channels, social media, or direct messages, scammers lure new Cardano users to their scam sites, promising giveaways and lotteries. As a result, naive Cardano users end up losing millions of dollars and could very well stop their cryptocurrency journey out of fear of future losses. Charles Hoskinson has released at least two episodes on YouTube warning users about scams, which underscores how urgent this emerging threat is. While education is a core component of protecting users, it cannot be the only one. We need technical means to protect users in the very first steps of their Cardano journey.

Our proposal

Motivated by the magnitude of this giveaway abuse, we have already built a Proof of Concept (PoC) of our proposed system for early detection and reporting of Cardano phishing/giveaway scams. The back end of this system currently runs on a personal server operated by the PI, and the front end is a dedicated Twitter account: <https://twitter.com/CardanoPhishing>

The bot currently works by parsing large lists of DNS zone files searching for domain names that include certain configurable keywords like "hoskinson", "cardano", and "ada". For each such domain, a web crawler is automatically dispatched that scrapes the page, looking for phrases associated with scams (e.g. "giveaway", "hurry up", "event", "send X ada"). The scrapes that result in a large number of matched keywords are then marked as potential scams and are reviewed through a special dashboard that we have created. Each true scam is marked as such and then automatically tweeted.
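The two filtering stages described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the keyword and phrase lists are taken from the examples in the text, while the function names, the regular expression for "send X ada", and the review threshold of 3 are assumptions made for the example.

```python
import re

# Example keywords from the text; the real, configurable lists are not published.
DOMAIN_KEYWORDS = ("hoskinson", "cardano", "ada")
SCAM_PHRASES = ("giveaway", "hurry up", "event")
# Assumed pattern for phrases of the form "send X ada".
SEND_ADA = re.compile(r"send\s+[\d,.]+\s*ada", re.IGNORECASE)


def is_candidate_domain(domain: str) -> bool:
    """Return True if the domain name contains any watched keyword."""
    name = domain.lower().rstrip(".")  # zone files often end names with a dot
    return any(keyword in name for keyword in DOMAIN_KEYWORDS)


def score_page(text: str) -> int:
    """Count how many distinct scam indicators appear in the scraped text."""
    lowered = text.lower()
    score = sum(phrase in lowered for phrase in SCAM_PHRASES)
    if SEND_ADA.search(text):
        score += 1
    return score


def needs_review(text: str, threshold: int = 3) -> bool:
    """Flag a scrape for the manual review dashboard (threshold is assumed)."""
    return score_page(text) >= threshold


if __name__ == "__main__":
    page = "Cardano GIVEAWAY event! Hurry up and send 1,000 ADA to participate"
    print(is_candidate_domain("cardano-giveaway.example."), needs_review(page))
```

Plain substring matching is deliberately broad: a short keyword such as "ada" also matches unrelated names like "canada.example", which is one reason the page-scoring step and the manual review come after the domain filter.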

In less than a month, our bot has tweeted more than 300 discovered scams (we discover between 5 and 10 new domain names each day). The reception has been remarkable. Namecheap (a popular registrar and hosting provider) has been consistently taking down the reported scams hosted on their servers, and we have gained almost 200 followers from nothing more than word of mouth. We have de-risked this project via this PoC, since we know that our approach works.

We wish to continue working on this project (transforming it from a hobby project to a real system with real value), hence this proposal. Our proposed areas of work (visible in our attached image) are as follows:

- Expand sources of suspicious domain names beyond zone files, including the use of Certificate Transparency Logs and YouTube. We'll need to build continuous integration pipelines to fetch daily lists of domain names, build YouTube bots capable of screen scraping and OCR-ing videos (in search of domains embedded in scam videos), and then dispatch armies of crawlers towards the identified destinations.

- Work on anti-evasion mechanisms before they become an issue. We anticipate that scammers operating these scams will have every reason to evade our proposed bots (detect that we are visiting them and show us benign content). Our anti-evasion mechanisms will revolve around geographical diversity (so that a single IP address cannot be blocked), diversity of crawling software, and other anti-fingerprinting techniques that we will incorporate into our tools. The PI (Nick Nikiforakis) is an academic with 15 years of cybersecurity experience and has built a large number of bots for discovering online malicious content ranging from phishing pages to technical support scams.

- Isolate intelligence content. Beyond reporting the discovered malicious websites, we will work on isolating parts of them, such as the advertised wallet addresses. We will make lists of malicious wallet addresses available to wallet software and online exchanges so that new users can be warned when they are about to send money to them and given a chance to read more about giveaway scams. Similarly, we will create lists of malicious IP addresses and domain names that will be available to hosting providers, ISPs, registrars, and operators of blocklists such as OpenPhish, Phishtank, and Google Safe Browsing.

- New dedicated UI and APIs. In addition to our Twitter account, we will develop a modern web application where users will be able to retrieve lists of malicious URLs, IP addresses, and wallets. The site will also be the API endpoint for all the aforementioned intelligence that will be provided to interested stakeholders. Co-PI Peter Bui is an experienced web developer and has connections with multiple wallet providers that we will use to advertise these APIs.

- Popularization of our tool. The Co-PI of this proposal (Peter Bui) is the operator of a popular Australian Cardano stake pool and the podcaster behind the "Learn Cardano" podcast (https://www.youtube.com/channel/UCj-_2e7L2UgHaJLrGEOJRzA). We will use this podcast not only to keep warning new Cardano users (the ones most likely to be attracted to a podcast about learning Cardano) about these scams, but also to connect with stakeholders at ISPs, hosting companies, registries, and web-security companies, who can all integrate with our APIs and protect their users.

- Manual analysis and ML. A stretch goal for this project is the use of supervised machine learning for the final labeling of suspicious web pages. Once our database of known scams expands sufficiently (past, say, a thousand positive, i.e., scam, samples), we will experiment with supervised machine-learning techniques (such as a Random Forest) to automatically flag the high-confidence scams, so that manual analysts can focus their labeling efforts on only the small fraction of suspicious domain names for which the classifier cannot produce a high-confidence label. We have experience using supervised ML for detection of phishing pages, malware sandboxes, and tech-support scams.
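The division of labor in the ML stretch goal (automatic labels for high-confidence cases, manual review for the rest) could look roughly like the sketch below. It assumes some already-trained classifier that outputs a scam probability per domain; the classifier itself is not shown, and the function names and the 0.95 / 0.05 thresholds are hypothetical values chosen only for illustration.

```python
from typing import Dict, List


def triage(scam_probability: float, high: float = 0.95, low: float = 0.05) -> str:
    """Route one classifier score to an automatic label or to manual review.

    The thresholds are illustrative assumptions, not values from the proposal.
    """
    if not 0.0 <= scam_probability <= 1.0:
        raise ValueError("scam_probability must be between 0 and 1")
    if scam_probability >= high:
        return "auto_scam"
    if scam_probability <= low:
        return "auto_benign"
    return "manual_review"


def split_queue(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group domains by triage outcome, given a domain -> probability mapping."""
    queues: Dict[str, List[str]] = {
        "auto_scam": [],
        "auto_benign": [],
        "manual_review": [],
    }
    for domain, probability in scores.items():
        queues[triage(probability)].append(domain)
    return queues


if __name__ == "__main__":
    # Made-up scores for reserved example domains, for demonstration only.
    example_scores = {"a.example": 0.99, "b.example": 0.50, "c.example": 0.01}
    print(split_queue(example_scores))
```

Only the domains that land in `manual_review` would need an analyst's attention; tightening or loosening the two thresholds trades analyst workload against the risk of an automatic mislabel.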

Cost

We request $10K for this project. According to our current projections, these funds will be split 70% towards the development of this system (as described earlier) and 30% towards infrastructure and service costs (such as the cost of virtual machines, larger quotas in geolocation/ASN APIs, and possibly OCR services for scraping suspicious YouTube videos).

Roadmap

As mentioned earlier, a PoC of our system is currently up and running, reporting scams on a daily basis.

Month 1. We will port our crawling infrastructure to multiple dedicated cloud servers that exhibit geographic diversity.

Month 2. First iteration of our separate web frontend, showing the scams collected so far.

Month 3. Finish incorporating multiple anti-evasion techniques, additional sources of scam domains, and a working copy of our YouTube scraping bots.

Month 4. Finalized YouTube scrapers. Design of API endpoints for malicious wallet addresses, IP addresses, and hostnames.

Month 5. Further work on finalizing endpoints. Reaching out to stakeholders. Keep advertising this project over the podcast, and keep educating users.

Month 6. Bug fixing, exploring our stretch goal of using ML for some/most of the labeling effort.

Month 6+. Integration with stakeholders. Analysis of possible evasions and mitigations for them. Further education via the podcast and social-media accounts.

What is success?

Success will be for our bot to keep finding new scams each and every day (i.e., successfully bypassing future evasions by scammers) and for at least one stakeholder (ISP, hosting company, exchange, wallet provider) to express interest in integrating with our tool. We will also use our social-media presence to collect evidence of takedowns (such as Namecheap's existing tweets in response to our reports) and stories of users who did not fall victim to such scams because our tool protected them.

IP

Given the very real threat of evasions (i.e., scammers studying our tool and identifying ways of bypassing it so that they can avoid being flagged), we cannot open-source the code in the traditional sense of the word. Our code will, however, be freely available to vetted researchers and the Cardano Foundation.

Our team

Nick Nikiforakis. Nick Nikiforakis is an academic with a PhD in web security and 15+ years of experience. He is the author of more than 70 peer-reviewed papers, cited more than 3,800 times. He has extensive experience with identifying online malicious content and has built dozens of crawler-based systems. More information about him is available on his Twitter account (https://twitter.com/nicknikiforakis) and his personal website <https://www.securitee.org>

Peter Bui. Peter Bui is an experienced web developer, stake pool operator (ADAOZ), and the podcaster behind the popular "Learn Cardano" podcast. He has a large worldwide audience that listens to him for all things Cardano, as well as existing relationships with wallet manufacturers and online exchanges. More information about him is available on his Twitter account (https://twitter.com/astroboysoup?lang=en) and his stake pool website: <https://cardanode.com.au/>
