[D] Can ML be used to detect repeated patterns in text?
I am trying to analyze system logs. They are messy, unstructured text, and I would like to detect repeated patterns in them.
As an example:
Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46

Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46

Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46

Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208

Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208
The pattern could be:
pattern = “.password fail [EMAIL PROTECTED]:($ip)”
I didn’t know the pattern in advance; I only discovered it by eyeballing the text. You might suggest tokenizing the words and counting frequencies. In my case, though, it’s hard to tokenize the text and to decide on the substring windows, and the patterns might vary.
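To make the tokenize-and-count idea concrete, here is a minimal sketch of it, not a solution to the harder case of unknown patterns. It assumes the variable fields (timestamp, process ID, IP address) can be masked with a few hand-written regular expressions; the `MASKS` list and the function names are hypothetical, and the two sample lines are taken from the example above with the emphasis markers dropped.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption, not derived from the logs):
# each regex replaces one kind of variable field with a placeholder.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}"), "<TIME>"),
    (re.compile(r"\[\d+\]"), "[<PID>]"),
]

def to_template(line):
    """Replace variable fields so lines with the same shape become identical."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line

def count_templates(lines):
    """Count how often each masked template occurs, skipping blank lines."""
    return Counter(to_template(line.strip()) for line in lines if line.strip())

logs = [
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
]

for template, count in count_templates(logs).most_common():
    print(count, template)
```

Both sample lines collapse to the same template, so repeated patterns show up as high counts. The obvious limitation is that the masks have to be written by hand, which is exactly the part I would like a learned method to handle.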
Is there a name for such techniques? I couldn’t find any ML techniques applied to this problem.
submitted by /u/__Julia