An example: My current gig is supporting a pretty good-sized DNS environment for a large corporation. Monday I was given a list of DNS records and asked to find out how many times each had been requested recently; they wanted the numbers for about a week's worth of time. I have query logging enabled on my servers so I can look at things that help me troubleshoot issues, and the servers generate more than 5 GB of raw text per day.
I already have some scripts that fetch the logs, store them on a centralized system, and do some preliminary processing so I can see which names are being looked up most and which systems are making the most requests. The script produces a list of the top 20 records requested on each DNS server, and the top 20 requesters. What worked in my favor here is that I keep the intermediate files generated during this processing, since they only take up about 100 MB compared to the original 5 GB of data. These intermediate files consist of one line per record with a count of how many times that particular name was looked up.
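For what it's worth, here's a minimal sketch of what that reduction step could look like in Python. It assumes a BIND-style query log where the looked-up name follows the literal token "query:"; the exact parsing depends entirely on your log format, so treat the details as placeholders rather than the poster's actual script.

```python
#!/usr/bin/env python3
"""Sketch: collapse a day's raw query log into per-name counts.

Assumes a BIND-style query log where the queried name follows the
literal token "query:"; adjust the parsing for your own log format.
"""
import sys
from collections import Counter

counts = Counter()

for line in sys.stdin:
    fields = line.split()
    try:
        name = fields[fields.index("query:") + 1].lower().rstrip(".")
    except (ValueError, IndexError):
        continue            # not a query line; skip it
    counts[name] += 1

# One line per record: "<count> <name>", highest counts first,
# matching the shape of the intermediate files described above.
for name, count in counts.most_common():
    print(f"{count} {name}")
```

Something like `zcat ns1.log.gz | python3 count_queries.py > ns1.counts` would then leave you with a small per-day counts file instead of the multi-gigabyte raw log (file names here are hypothetical).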
I wrote an additional script that takes a list of records from a text file, looks at multiple days of data for those records, adds up the number of hits for each record, and outputs the result. I was also able to split internal and external queries into separate columns.
The end result, after some tweaking, is that the next time I'm asked for this kind of information I can take the list, feed it into my script, and get back a CSV that can be loaded straight into a spreadsheet and sifted/sorted to my heart's content.
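A rough Python sketch of that follow-up step, purely for illustration: it assumes per-day intermediate files of "<count> <name>" lines, split into hypothetical internal-*/external-* files under a counts/ directory. The file layout and names are my own placeholders, not the poster's actual setup.

```python
#!/usr/bin/env python3
"""Sketch: given a text file of record names and a set of per-day count
files (one "<count> <name>" line per record), total the hits for each
requested name and write a CSV with internal/external columns.
"""
import csv
import glob
import sys

wanted_file, out_csv = sys.argv[1], sys.argv[2]

with open(wanted_file) as f:
    wanted = {line.strip().lower().rstrip(".") for line in f if line.strip()}

totals = {name: {"internal": 0, "external": 0} for name in wanted}

# Assume internal and external queries were counted into separate files,
# e.g. counts/internal-2024-02-01.counts and counts/external-2024-02-01.counts
for view in ("internal", "external"):
    for path in glob.glob(f"counts/{view}-*.counts"):
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                count, name = line.split(None, 1)
                name = name.strip().lower().rstrip(".")
                if name in wanted:
                    totals[name][view] += int(count)

with open(out_csv, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["record", "internal_hits", "external_hits", "total"])
    for name in sorted(wanted):
        i, e = totals[name]["internal"], totals[name]["external"]
        writer.writerow([name, i, e, i + e])
```

Run as something like `python3 sum_hits.py records.txt report.csv` and the resulting CSV drops straight into a spreadsheet.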
All of this was done with standard Unix tools: grep, sed, awk, and Perl, in a way that gives me a new, reusable tool for getting management and our technical folks the information they want in a timely fashion.
I've asked for this kind of information from the folks who support the AD/DNS servers for some internal domains, but the most I've ever been able to get from them is raw logs, which I then sliced and diced with slightly tweaked versions of my older stats-generating scripts to account for the formatting differences. I'm told Microsoft's PowerShell has some utility here, but I've never seen it used the way I use my scripts.
Some might say that work is different from home use. That's true enough, but I used very similar methods recently when I wanted to know how much disk space the audio files on my desktop were taking up. Yeah, I could have gone into the file manager and right-clicked on the directory to see how much space was consumed, but that wouldn't have broken it out into separate counts for MP3, FLAC, and Ogg files, which I was also interested in.
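The same idea fits in a few lines of Python. This is just a sketch under my own assumptions: the audio lives under ~/Music (a placeholder path), and we only care about three extensions.

```python
#!/usr/bin/env python3
"""Sketch: total the disk space used by audio files under a directory,
broken out by extension. The path below is a placeholder."""
import os
from collections import defaultdict

AUDIO_EXTS = {".mp3", ".flac", ".ogg"}
totals = defaultdict(int)

# Walk the tree and accumulate file sizes per extension.
for root, _dirs, files in os.walk(os.path.expanduser("~/Music")):
    for fname in files:
        ext = os.path.splitext(fname)[1].lower()
        if ext in AUDIO_EXTS:
            totals[ext] += os.path.getsize(os.path.join(root, fname))

for ext, size in sorted(totals.items()):
    print(f"{ext:6s} {size / 2**20:10.1f} MiB")
print(f"{'total':6s} {sum(totals.values()) / 2**20:10.1f} MiB")
```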
Very cool story. As an old prof of mine used to say - tools that allow real people to do real work.
Historically, Perl was invented for exactly this sort of task. Nowadays I'd probably choose Python, but for projects of this size and scope you could flip a coin. Perl might be a bit faster; Python would certainly be more readable, and has a slightly better story as far as objects, functions, and extensibility with libraries.
Have you thought about using the free version of Splunk to do that work for you?
What you're doing with Perl should work on Windows as well. Windows PowerShell is also capable of it if the users have decent scripting knowledge, but there really isn't any reason to do more than install Perl on a Windows box and run your scripts. Porting the scripts from Linux to Windows might take a couple of tweaks.