[D] Decision tree that can detect phishing links: model is trained (I think...), now what?
Hello, I'm new to ML and not a very math-oriented person. I'm creating a Discord bot that detects phishing links using a decision tree (I still need to figure out how to connect the trained ML model to the bot).
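A common way to connect a trained model to a bot is to train once, save the fitted model to a file, and have the bot load that file at startup and run it on every link it sees. A minimal sketch, assuming a scikit-learn-style model with a `predict` method and label 1 meaning phishing. The three URL features in `url_features` are hypothetical placeholders: they must be replaced with exactly the features (same columns, same order, same encoding) your training dataset used, or the predictions will be meaningless.

```python
import pickle
from urllib.parse import urlparse


def url_features(url):
    """HYPOTHETICAL feature extractor: must mirror the training dataset's columns."""
    host = urlparse(url).hostname or ""
    return [
        len(url),                # total URL length
        host.count("."),         # number of dots in the hostname
        1 if "@" in url else 0,  # '@' in the URL, a common phishing trick
    ]


def save_model(model, path):
    """Run once after training: write the fitted model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Run once when the bot starts. Only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


def is_phishing(model, url):
    """Return True if the model labels this URL as phishing (assumes 1 = phishing)."""
    prediction = model.predict([url_features(url)])[0]
    return int(prediction) == 1
```

In the bot you would call `load_model` once at startup and `is_phishing(model, url)` in the message handler for each URL found in a message. scikit-learn's documentation also describes `joblib.dump`/`joblib.load` as an alternative to `pickle` for saving models.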
The current accuracy of this program is 90%, which seems pretty good on the surface, but how can I tell if it's *actually* 90%? I was reading about confusion matrices and training via entropy; would either of those help? Every time I run the program, the accuracy decreases. Why?
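A confusion matrix shows which mistakes a 90% accuracy hides: accuracy alone can look good while the tree misses most phishing links, especially if the dataset contains far more legitimate URLs than phishing ones. (Entropy is a different thing: it is one of the criteria a decision tree uses to choose its splits, e.g. `criterion="entropy"` versus the default `"gini"` in scikit-learn's `DecisionTreeClassifier`, not a way to evaluate the finished model.) A minimal plain-Python sketch, assuming labels where 1 means phishing and 0 means legitimate:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives/negatives, treating `positive` as the phishing label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # phishing link correctly flagged
            else:
                fp += 1  # legitimate link wrongly flagged
        else:
            if truth == positive:
                fn += 1  # phishing link missed
            else:
                tn += 1  # legitimate link correctly let through
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy plus precision and recall for the phishing class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Of the links flagged as phishing, how many really were phishing?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real phishing links, how many did the model catch?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, with nine legitimate links and one phishing link, a model that labels everything as legitimate gets 0.9 accuracy but 0.0 recall: it looks like 90% while catching no phishing at all.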
On the top line of my code you can see where I got my dataset, which contains approximately 2000 instances. Is that enough? I also found this dataset, which contains 5000 instances: https://data.mendeley.com/datasets/h3cgnj8hft/1 Can I train the decision tree on more than one dataset? Is that a good idea? Should I combine both datasets into one?
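A decision tree is trained on one table, so "training on both datasets" in practice means merging them into a single table first. That only works if both datasets describe each URL with the same feature columns and the same label encoding; if they use different features, you would need to recompute one consistent feature set from the raw URLs instead. A minimal sketch using the standard `csv` module, assuming both files share identical column names. It also drops exact duplicate rows, since a row present in both datasets could land in both the training and test splits and inflate accuracy.

```python
import csv


def combine_datasets(file_a, file_b):
    """Merge two CSV datasets with identical columns and drop exact duplicate rows.

    file_a / file_b are open text files (or any iterable of CSV lines).
    Raises ValueError if the column names differ.
    """
    reader_a = csv.DictReader(file_a)
    reader_b = csv.DictReader(file_b)
    if reader_a.fieldnames != reader_b.fieldnames:
        raise ValueError(
            f"column mismatch: {reader_a.fieldnames} vs {reader_b.fieldnames}"
        )
    columns = reader_a.fieldnames
    seen = set()
    combined = []
    for reader in (reader_a, reader_b):
        for row in reader:
            key = tuple(row[c] for c in columns)
            if key not in seen:  # keep only the first copy of each duplicate row
                seen.add(key)
                combined.append(row)
    return columns, combined


def write_dataset(path, columns, rows):
    """Write the merged rows back out as a single CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

Usage, with hypothetical file names: open both files with `open("dataset_a.csv", newline="")` and `open("dataset_b.csv", newline="")`, pass them to `combine_datasets`, then save the result with `write_dataset`. After merging, it is worth checking how many phishing versus legitimate rows the combined table has.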
Ultimately, what should be my next step(s)?
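One reasonable next step is to replace the single accuracy number with cross-validation and fixed random seeds. One common cause of accuracy that changes between runs is that the program reshuffles the data into a new train/test split each time (or builds the tree with a random component), so each run measures a slightly different experiment. A minimal sketch, assuming you use scikit-learn; `make_classification` generates synthetic stand-in data so the snippet runs on its own, and you would substitute your own feature matrix `X` and labels `y`. It compares the `"gini"` and `"entropy"` split criteria with 5-fold stratified cross-validation, then prints a confusion matrix on a held-out test set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # fixed seed so every run uses the same split and builds the same tree

# Synthetic stand-in for the phishing data (replace with your own X and y).
X, y = make_classification(n_samples=2000, n_features=10, random_state=SEED)

# Hold out 20% as a test set that is never touched while choosing settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
results = {}
for criterion in ("gini", "entropy"):
    # max_depth=5 is an arbitrary illustrative cap to limit overfitting.
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=5, random_state=SEED)
    scores = cross_val_score(tree, X_train, y_train, cv=cv, scoring="accuracy")
    results[criterion] = scores
    # The spread across folds shows how much the accuracy varies by chance.
    print(f"{criterion}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Refit the better criterion on all training data, then evaluate once on the test set.
best = max(results, key=lambda c: results[c].mean())
final = DecisionTreeClassifier(criterion=best, max_depth=5, random_state=SEED)
final.fit(X_train, y_train)
y_pred = final.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

If the mean cross-validation accuracy is close to the held-out test accuracy and the fold-to-fold spread is small, the reported number is much more trustworthy than a single run.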
submitted by /u/North_Bug