[D] Decision tree that can detect phishing links: model is trained (I think..), now what?

Hello, I'm new to ML and not a very math-oriented person. I'm creating a Discord bot that detects phishing links using a decision tree (I still need to figure out how to connect the trained model to the bot).
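On connecting the model to the bot: one common approach is to save the trained model once and load it when the bot starts. Here is a minimal sketch of that round trip using Python's standard `pickle` module. The `model` dict and the `predict` helper are hypothetical stand-ins for a trained tree, not the code from the pastebin. A scikit-learn estimator can be pickled the same way, but that is an assumption about what the pastebin code uses.

```python
import pickle

# Hypothetical stand-in for a trained decision tree: one split on "has_ip".
# Leaves are plain ints (0 = legitimate, 1 = phishing).
model = {"feature": "has_ip", "threshold": 0.5, "left": 0, "right": 1}

def predict(tree, features):
    """Walk the nested-dict tree until a leaf (a non-dict value) is reached."""
    while isinstance(tree, dict):
        go_left = features[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree

# Training script side: serialize the model (normally written to a file).
saved = pickle.dumps(model)

# Bot side: deserialize once at startup, then reuse for every message.
loaded = pickle.loads(saved)
print(predict(loaded, {"has_ip": 1}))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.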

The program currently reports 90% accuracy, which seems pretty good on the surface, but how can I tell if it's *actually* 90%? I was reading about confusion matrices and about using entropy as the split criterion; would either of those help? Also, the accuracy decreases every time I run the program. Why?
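For context on the confusion matrix idea: it splits a single accuracy number into the kinds of mistakes being made, which matters here because a missed phishing link and a false alarm are not equally bad. Here is a minimal standard-library sketch, assuming labels of 1 = phishing and 0 = legitimate. The label lists are made-up illustration values, not results from my dataset. The `seeded_split` helper is also hypothetical; it shows how fixing the shuffle seed makes the train/test split identical on every run. An unseeded split is one possible reason the accuracy changes between runs, though I can't confirm that is the cause here.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # legitimate link flagged as phishing
        elif truth == positive:
            fn += 1  # phishing link that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    """Accuracy, precision, and recall computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def seeded_split(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed so every run gets the same train/test split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Made-up labels purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```

In this toy example the accuracy is 70%, but recall is only 50%: half of the phishing links are missed, which the accuracy figure alone would hide.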

On the top line of my code you can see where I got my dataset, which contains approximately 2000 instances. Is that enough? I also found this dataset, which contains 5000 instances: https://data.mendeley.com/datasets/h3cgnj8hft/1 . Can I train the decision tree on more than one dataset? Is that a good idea? Should I combine both datasets into one?
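To make the "combine both datasets" question concrete: merging only works cleanly if both datasets describe each URL with the same features and the same label convention. Here is a rough sketch of what combining could look like, keeping only the columns the two datasets share and dropping exact duplicates. The column names (`url_length`, `has_ip`, `num_dots`, `has_at`, `label`) and the rows are hypothetical, since I have not checked the actual columns of either dataset.

```python
def combine_datasets(rows_a, rows_b, label_key="label"):
    """Merge two lists of row dicts on their shared feature columns.

    Returns (shared_feature_names, combined_rows) with exact duplicates removed.
    """
    shared = sorted((set(rows_a[0]) & set(rows_b[0])) - {label_key})
    combined, seen = [], set()
    for row in rows_a + rows_b:
        key = tuple(row[col] for col in shared) + (row[label_key],)
        if key in seen:
            continue  # skip rows that appear in both datasets
        seen.add(key)
        kept = {col: row[col] for col in shared}
        kept[label_key] = row[label_key]
        combined.append(kept)
    return shared, combined

# Hypothetical rows; the real datasets may use different features entirely.
rows_a = [
    {"url_length": 54, "has_ip": 0, "num_dots": 3, "label": 1},
    {"url_length": 20, "has_ip": 0, "num_dots": 1, "label": 0},
]
rows_b = [
    {"url_length": 54, "has_ip": 0, "has_at": 1, "label": 1},
    {"url_length": 75, "has_ip": 1, "has_at": 0, "label": 1},
]

shared, combined = combine_datasets(rows_a, rows_b)
print(shared)
print(len(combined))
```

Dropping the non-shared columns loses information, so if the two datasets overlap on only a few features, combining them may hurt more than it helps.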

Ultimately, what should be my next step(s)?

https://pastebin.com/XE0Ss9hq

submitted by /u/North_Bug