I am trying to analyze system logs, which are messy, unstructured text. I would like to detect repeated patterns in them.
As an example:
Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**67.109.191.46
Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208
Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: **password fail [EMAIL PROTECTED]:**70.104.21.208
The pattern could be:
pattern = ".password fail [EMAIL PROTECTED]:($ip)"
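Once the pattern is known, it can be written as an ordinary regular expression. The sketch below is a minimal illustration in Python, assuming the log lines look exactly like the sample above with the `**` emphasis removed and the redacted `[EMAIL PROTECTED]` placeholder treated as literal text; the names `PATTERN` and `failures_by_ip` are hypothetical.

```python
import re
from collections import Counter

# Sample lines from the post, one per line, with the ** emphasis removed.
# "[EMAIL PROTECTED]" is the redaction placeholder, kept as literal text.
LOG = """\
Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46
Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46
Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46
Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208
Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208
"""

# Literal prefix (escaped so "[" and "]" are not read as a character class),
# followed by a dotted-quad IPv4 address captured in the named group "ip".
# finditer() searches anywhere in the text, so no leading wildcard is needed.
PATTERN = re.compile(
    re.escape("password fail [EMAIL PROTECTED]:")
    + r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)

def failures_by_ip(text):
    """Count how many times the pattern matched for each captured IP."""
    return Counter(m.group("ip") for m in PATTERN.finditer(text))

print(failures_by_ip(LOG))
# Counter({'67.109.191.46': 3, '70.104.21.208': 2})
```

This only works after someone has spotted the pattern, which is exactly the step the question is about.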
I didn’t know the pattern in advance; I only discovered it by eyeballing the text. You might suggest tokenizing the words and counting their frequencies, but in my case it is hard to tokenize the text and decide on substring window sizes, and the patterns might vary.
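Frequency counting can sidestep the choice of substring windows if the obviously variable fields are masked first, so that lines differing only in numbers or IP addresses collapse into one template whose count can then be read off. A minimal sketch, assuming digits and dotted-quad IPv4 addresses are the only variable parts; the mask rules, the `<NUM>`/`<IP>` placeholders, and the helper names are hypothetical.

```python
import re
from collections import Counter

SAMPLE = [
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
    "Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
]

# Order matters: IPv4 addresses are masked before bare digit runs,
# so an address becomes a single <IP> placeholder rather than four <NUM>s.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def to_template(line):
    """Replace the variable-looking parts of a line with placeholders."""
    for regex, placeholder in MASKS:
        line = regex.sub(placeholder, line)
    return line

def count_templates(lines):
    """Count how many lines map to each masked template."""
    return Counter(to_template(line) for line in lines)

for template, count in count_templates(SAMPLE).most_common():
    print(count, template)
```

All five sample lines collapse into one template ending in `password fail [EMAIL PROTECTED]:<IP>`. The trade-off is that someone still has to decide which token shapes count as variable.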
Is there a name for such techniques? I couldn’t find any ML techniques applied to this problem.
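Problems like this are often discussed under the name log parsing or log template mining. One simple way to learn the variable positions instead of hand-writing masks is to compare lines that have the same number of tokens and, when a new line is similar enough to an existing template, turn the positions where they disagree into a wildcard. This is loosely the idea behind parsers such as Drain, but the sketch below is a simplified illustration, not an implementation of any published algorithm; the tokenizer, the `threshold=0.5` value, and all function names are assumptions.

```python
import re

SAMPLE = [
    "Feb 24 06:48:03 circle vpopmail[12039]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:49:03 circle vpopmail[12043]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 06:50:03 circle vpopmail[12099]: vchkpw-pop3: password fail [EMAIL PROTECTED]:67.109.191.46",
    "Feb 24 08:13:31 circle vpopmail[13042]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
    "Feb 24 08:13:32 circle vpopmail[13046]: vchkpw-pop3: password fail [EMAIL PROTECTED]:70.104.21.208",
]

WILDCARD = "<*>"

def tokenize(line):
    # Split on whitespace and treat ":" as its own token, so that
    # "PROTECTED]:67.109.191.46" becomes "PROTECTED]", ":", "67.109.191.46".
    return re.findall(r"[^\s:]+|:", line)

def similarity(template, tokens):
    # Fraction of positions where the template token equals the line token
    # or is already a wildcard (both sequences have the same length here).
    same = sum(1 for t, w in zip(template, tokens) if t == w or t == WILDCARD)
    return same / len(tokens)

def mine_templates(lines, threshold=0.5):
    clusters = []  # each entry: [template_tokens, line_count]
    for line in lines:
        tokens = tokenize(line)
        if not tokens:
            continue  # skip blank lines
        best, best_sim = None, threshold
        for cluster in clusters:
            if len(cluster[0]) != len(tokens):
                continue  # only compare lines with the same token count
            sim = similarity(cluster[0], tokens)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([tokens, 1])  # start a new template
        else:
            # Positions that disagree with the new line become wildcards.
            best[0] = [t if t == w else WILDCARD for t, w in zip(best[0], tokens)]
            best[1] += 1
    return [(" ".join(template), count) for template, count in clusters]

for template, count in mine_templates(SAMPLE):
    print(count, template)
```

On the sample this yields a single template, `Feb 24 <*> : <*> : <*> circle <*> : vchkpw-pop3 : password fail [EMAIL PROTECTED] : <*>`, seen 5 times. `Feb` and `24` stay literal only because every sample line shares the same date; on a larger log they would become wildcards as well, and lines with a different token count start their own cluster.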
submitted by /u/__Julia