[D] Decision tree that can detect phishing links: model is trained (I think..), now what?

Hello, I'm new to ML and not a very math-oriented person. I am creating a Discord bot that will detect phishing links using a decision tree (I still need to figure out how to link the trained ML model to the bot).
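On linking the trained model to the bot: one common approach is to save the model to disk once after training and have the bot load it at startup. The sketch below only illustrates that save/load round trip with Python's standard `pickle` module. The nested-dict `TOY_TREE`, the `extract_features` helper, and the file name `phishing_tree.pkl` are all made up for illustration and stand in for whatever the real training script produces.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained tree: one split on the number of dots.
TOY_TREE = {
    "feature": "num_dots",
    "threshold": 3,
    "left": {"label": 0},   # num_dots <= 3 -> treated as legitimate
    "right": {"label": 1},  # num_dots > 3  -> treated as phishing
}


def extract_features(url):
    """Turn a raw URL into the (assumed) features the tree was trained on."""
    return {"num_dots": url.count("."), "length": len(url)}


def predict(tree, features):
    """Walk the nested-dict tree until a leaf with a label is reached."""
    node = tree
    while "label" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Training side: persist the model once.
model_path = Path(tempfile.gettempdir()) / "phishing_tree.pkl"
save_model(TOY_TREE, model_path)

# Bot side: load at startup, then score each link found in a message.
bot_model = load_model(model_path)
suspicious = predict(
    bot_model, extract_features("http://login.secure.account.example.com/verify")
)
harmless = predict(bot_model, extract_features("https://example.com"))
print(suspicious, harmless)
```

The important part is that the bot runs the same feature extraction as the training script before calling the model. Only unpickle files you created yourself, since loading an untrusted pickle can execute arbitrary code.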

The current accuracy of this program is 90%, which seems pretty good on the surface, but how can I tell if it's *actually* 90%? I was reading about confusion matrices and training via entropy; maybe either of those is good to use? The accuracy decreases with every run-through of the program. Why?
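To see why a single accuracy number can mislead, here is a minimal confusion-matrix sketch in plain Python. The labels are invented (18 legitimate links, 2 phishing links, and a model that calls everything legitimate) and say nothing about how the real dataset is balanced. Label `1` is assumed to mean phishing.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating label 1 as phishing and 0 as legitimate."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1  # phishing link caught
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # legitimate link flagged by mistake
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1  # legitimate link passed through
        else:
            counts["fn"] += 1  # phishing link missed
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    total = sum(counts.values())
    predicted_phishing = counts["tp"] + counts["fp"]
    actual_phishing = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "precision": counts["tp"] / predicted_phishing if predicted_phishing else 0.0,
        "recall": counts["tp"] / actual_phishing if actual_phishing else 0.0,
    }


# Made-up test set: 18 legitimate links, 2 phishing links.
y_true = [0] * 18 + [1] * 2
# A useless model that labels every link as legitimate.
y_pred = [0] * 20

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
print(counts)
print(metrics)
```

In this toy case the accuracy is 90% while recall is 0, meaning every phishing link is missed. Entropy is a different kind of thing: it is a criterion the tree uses to choose splits during training, not a way to check the result. If the train/test split is random and unseeded, the score will also move between runs, so fixing a seed may be worth trying.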

At the top of my code you can see where I got my dataset, which contains approximately 2000 instances. Is that enough? I also found this dataset, which contains 5000 instances: https://data.mendeley.com/datasets/h3cgnj8hft/1 . Can I train the decision tree on more than one dataset? Is that a good idea? Should I combine both datasets into one?
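Combining two datasets only works if both describe each link with the same features and the same label encoding. As a rough sketch of that check, the snippet below merges two lists of rows on the columns they share and drops exact duplicates. The column names (`url_length`, `num_dots`, `has_ip`, `label`) and the rows are hypothetical and are not taken from either real dataset.

```python
def combine_datasets(rows_a, rows_b, label_column="label"):
    """Merge two row lists, keeping only shared columns and unique rows."""
    shared = set(rows_a[0]) & set(rows_b[0])
    if label_column not in shared:
        raise ValueError(f"both datasets need a '{label_column}' column")
    if len(shared) < 2:
        raise ValueError("the datasets share no feature columns")

    columns = sorted(shared)
    seen = set()
    combined = []
    for row in rows_a + rows_b:
        key = tuple(row[column] for column in columns)
        if key in seen:
            continue  # skip rows that are identical on the shared columns
        seen.add(key)
        combined.append({column: row[column] for column in columns})
    return combined


# Hypothetical rows; the second dataset has an extra 'has_ip' feature.
dataset_a = [
    {"url_length": 54, "num_dots": 4, "label": 1},
    {"url_length": 18, "num_dots": 1, "label": 0},
]
dataset_b = [
    {"url_length": 18, "num_dots": 1, "has_ip": 0, "label": 0},
    {"url_length": 73, "num_dots": 5, "has_ip": 1, "label": 1},
]

combined = combine_datasets(dataset_a, dataset_b)
print(len(combined), sorted(combined[0]))
```

Features that exist in only one dataset are dropped here, which is the simplest choice but throws information away. It is also worth confirming that both sources use the same meaning for each label value before training on the merged rows.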

Ultimately, what should be my next step(s)?

https://pastebin.com/XE0Ss9hq

submitted by /u/North_Bug