The Deluge
You probably won't be able to find last week's (27.2.10 - 5.3.10) edition of The Economist in print, but you can still get the digital version of all the articles featured in the special report on managing information at their website. The articles are all really well thought out and clearly show why information management should be on almost everyone's mind. I mean, this goes way beyond issues of privacy and moves into what is really so interesting about all that information. Like, did you know the Royal Shakespeare Company used data mining to increase subscription rates?
For instance, in the internet companies article, they highlight how Google has developed the most advanced spell-check (something on which Microsoft had previously spent countless millions) by analyzing misspelled search queries and the links which people follow. Also, voice recognition is not about trying to understand what the person is saying but about having a ginormous database which allows an algorithm to determine the statistical probability of what you just said. In other words:
"'Understanding' turns out to be overrated, and statistical analysis goes a lot of the way."
WOW!
Dust off your R manuals, people.
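To make the spell-check idea a bit more concrete, here is a minimal sketch (in Python rather than R, purely for convenience). It is not how Google actually does it; it only assumes you have a log of what people typed and what they ended up searching for, and it picks the most frequent outcome. The log rows and the names `QUERY_LOG`, `build_corrections` and `suggest` are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (what the user typed, the query they ended up using).
# These rows are invented for illustration only.
QUERY_LOG = [
    ("shakespere", "shakespeare"),
    ("shakespere", "shakespeare"),
    ("shakespere", "shakespear"),
    ("shakespeare", "shakespeare"),
    ("econmist", "economist"),
    ("econmist", "economist"),
]


def build_corrections(log):
    """Count, for every typed query, how often each final query followed it."""
    counts = defaultdict(Counter)
    for typed, final in log:
        counts[typed][final] += 1
    return counts


def suggest(counts, typed):
    """Return (most frequent final query, its share of the observations)."""
    seen = counts.get(typed)
    if not seen:
        # Nothing observed for this query: return it unchanged.
        return typed, 0.0
    best, n = seen.most_common(1)[0]
    return best, n / sum(seen.values())


corrections = build_corrections(QUERY_LOG)
print(suggest(corrections, "shakespere"))
```

No dictionary and no grammar rules anywhere, just counting. With a real log the counts get large enough that the most frequent outcome is usually the right one.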
I see no reason why smaller companies with proprietary information (info about subscribers or visitors to film festivals and theatres, customers at car garages, etc.) can't do the same thing. The thing is, once you have enough information your statistical power is already quite good. You don't need a terabyte-sized database. It depends on what you want to extract from that information and whether you have the know-how to do it.
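As a rough illustration of the "enough information" point, here is a sketch of a power calculation using the usual normal approximation for comparing two proportions. The scenario is hypothetical: say you want to detect a rise in subscription renewals from 20% to 25%. The function name `power_two_proportions` and the numbers are my own assumptions, not something from the report.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation and ignores the (tiny) chance of
    rejecting in the wrong direction.
    """
    z = NormalDist()
    # Standard error of the difference between the two sample proportions.
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical renewal rates: 20% without the campaign, 25% with it.
for n in (200, 1000, 5000):
    print(n, round(power_two_proportions(0.20, 0.25, n), 3))
```

Under these assumptions a couple of hundred customers per group is not enough to reliably see the difference, but a few thousand is plenty, and that is a spreadsheet-sized dataset, not a data centre.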
My completely unreasonable statistics book recommendation is the completely unreasonably priced Fundamentals of Biostatistics.
Apparently there's a new edition due this year, which will also be insanely expensive. But other than the price, this book has just about the clearest explanation of fundamental statistical problems I've ever read. By which I mean to say it's actually enjoyable to read. Don't mind that it's biostatistics; you can use the information in all sorts of situations.
Get it from your library and start analyzing your own data, kids!