[D] Can ML be used to detect repeated patterns in text?

Written by torontoai on August 30, 2019. Posted in Reddit MachineLearning.

I am trying to analyze system logs. They are messy, unstructured text, and I would like to detect repeated patterns in them.

As an example:

Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208
Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208

The pattern here could be:

pattern = “.password fail [EMAIL PROTECTED]:($ip)”
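To make that concrete, here is a rough sketch of the same pattern as a Python regular expression. It assumes that `($ip)` stands for a dotted IPv4 address and that the leading `.` means "anything before"; the names `PATTERN` and `extract_ips` are made up for illustration.

```python
import re

# Assumption: "($ip)" means a dotted IPv4 address. The optional "**" allows for
# the bold markers that appear in the pasted sample.
PATTERN = re.compile(
    r"password fail \[EMAIL PROTECTED\]:(?:\*\*)?"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)

def extract_ips(text):
    """Return every IP address captured by the pattern, in order of appearance."""
    return [match.group("ip") for match in PATTERN.finditer(text)]

sample = (
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: "
    "password fail [EMAIL PROTECTED]:67.109.191.46 "
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: "
    "password fail [EMAIL PROTECTED]:70.104.21.208"
)
print(extract_ips(sample))
```

This only works once the pattern is already known, which is exactly the part I want to automate.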

I didn’t know the pattern in advance; I only discovered it by eyeballing the text. You might say I could tokenize the words and count frequencies. In my case, though, it’s hard to tokenize the text and to decide on substring windows, and the patterns might vary.
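For reference, the simplest non-ML baseline I can think of looks roughly like the sketch below: mask the fields that look variable, then count the resulting line templates. The masks are hand-picked guesses (an assumption, not something learned from the data), and `MASKS`, `to_template`, and `count_templates` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: these hand-written masks cover the fields that vary between lines.
# Order matters: the IP and time masks must run before the generic number mask.
MASKS = [
    (re.compile(r"\d{1,3}(?:\.\d{1,3}){3}"), "<IP>"),
    (re.compile(r"\d{2}:\d{2}:\d{2}"), "<TIME>"),
    (re.compile(r"\[\d+\]"), "[<PID>]"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(line):
    """Replace variable-looking fields with placeholders."""
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line

def count_templates(lines):
    """Count how often each masked template occurs, skipping blank lines."""
    return Counter(to_template(line) for line in lines if line.strip())

logs = [
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
    "Feb 24 08:14:00 circle sshd[13100]: session opened for user root",  # made-up line for contrast
]

for template, count in count_templates(logs).most_common():
    print(count, template)
```

The weakness is the one described above: the masks encode my own guesses about what varies, so a new log format with different variable fields would need new masks.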

Is there a name for such techniques? I couldn’t find any ML techniques applied to this problem.

submitted by /u/__Julia
