I just thought I'd make a brief post about the "Datamining" so many people talk about. (Brief is a relative term. I'm a windbag, so it's pretty long.)

What most of us do with Poker Tracker is not Datamining.

I'm a word wonk, and I thought that some of you, being technical people, might want to know. . .

Collecting data on millions of hands played online is merely data collection. Datamining is not something you do to get data; it's something you do on your database after it has been collected.

But there's even more to it than that. If you want to find what the average VP$IP is, that's simple data analysis. If you know that players with VP$IPs above a certain number are weak and you seek them out, that's just using data. Perhaps you can call it Player Scouting; perhaps Hunting Donks. Whatever. It's not Data Mining.

All that is stuff we can and do use PokerTracker to do. Gotta love it. It's called OLAP -- On-Line Analytical Processing.
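
To make that concrete, here is a minimal Python sketch of this kind of known-stat lookup. It does not touch PokerTracker's actual database; the player names, the numbers, the 40% cutoff and the 100-hand minimum are all made up for illustration.

```python
from statistics import mean

# Hypothetical per-player stats; names and numbers are invented for illustration.
players = {
    "alice": {"hands": 420, "vpip": 18.5},
    "bob": {"hands": 310, "vpip": 52.0},
    "carol": {"hands": 95, "vpip": 61.0},
    "dave": {"hands": 780, "vpip": 24.5},
}


def average_vpip(stats):
    """Simple data analysis: one known statistic, averaged over all players."""
    return mean(p["vpip"] for p in stats.values())


def loose_players(stats, vpip_cutoff=40.0, min_hands=100):
    """Using data: list players above a cutoff we already believe signals weakness.

    Both the cutoff and the minimum sample size are assumed values.
    """
    return sorted(
        name
        for name, p in stats.items()
        if p["vpip"] > vpip_cutoff and p["hands"] >= min_hands
    )


print(average_vpip(players))
print(loose_players(players))
```

Notice that both functions are told exactly which stat to look at and what it means. Nothing new is discovered, which is why this is analysis and not mining.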

"Looking for leaks" in your game gets closer to datamining, but it's still not quite there. . . when you look for leaks, you know what to look for. You're not sure which leaks you have, but you go looking for known things. If you know a "leak" is a loss of money, and you look for losses of money, you are doing Analytical Processing.

Datamining is something very different and specific.

So what is datamining, already?

Datamining is the practice of using (advanced!) tools and techniques to find relationships within the data which you are NOT aware of. For example, datamining might show that people who display a particular behavior have a high likelihood of exhibiting a specific play weakness that isn't directly related to the original behavior.

NOTE: MADE-UP EXAMPLE FOLLOWS. DO NOT RELY ON THIS.


To fabricate an example, datamining might show that players who have a particular PFR -- say, between 8 and 10% -- tend to have a very high likelihood of trying to steal the pot on a river scare-card. Therefore, against such players, when a scare card hits the river which makes your hand, you might lean towards a check-raise. Because this behavior is identified with PFR, a pre-flop statistic, you can get a good read on it in 50 or 100 hands -- long before your river-based numbers are statistically meaningful.

What's obvious about this example is that the two behaviors -- PFR and river-steal -- are related in that they are both examples of "aggression." What's not obvious is why people with a higher PFR than 10% don't exhibit the steal tendency, nor why people with a PFR of 7% don't do it either. It may be a psychological predilection. Or maybe there's a popular book author who espouses playing a 9% PFR strategy, and who advocates always stealing on river scare cards. It would be nearly impossible to guess exactly why without more analysis and information. The point is that it was not something we were aware of when we set out datamining; we found a new relationship.
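
Staying inside the made-up example, here is a very crude Python sketch of how such a search might go. Every record is invented, the bin widths are arbitrary, and brute-force scanning of bins is only a stand-in for the far more advanced techniques real datamining tools use. The one thing it does illustrate is that the code is never told which stat or which range to look at.

```python
from collections import defaultdict

# MADE-UP records, one per player:
# (pfr_percent, vpip_percent, river_steal_attempts, river_steal_chances)
players = [
    (4.0, 15.0, 1, 20),
    (7.0, 22.0, 2, 20),
    (8.5, 30.0, 12, 20),
    (9.0, 19.0, 13, 20),
    (9.5, 45.0, 11, 20),
    (12.0, 28.0, 2, 20),
    (15.0, 35.0, 3, 20),
    (20.0, 50.0, 2, 20),
]

# Candidate pre-flop stats and their position in each record (hypothetical layout).
STATS = {"pfr": 0, "vpip": 1}


def steal_rate_by_bin(records, stat_index, bin_width):
    """Pool river-steal attempts and chances for players whose stat falls in each bin."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        low = int(rec[stat_index] // bin_width) * bin_width
        totals[low][0] += rec[2]
        totals[low][1] += rec[3]
    return {low: att / ch for low, (att, ch) in totals.items() if ch}


def rank_bins(records, bin_widths):
    """Rank every (stat, bin) by its steal rate relative to the overall steal rate."""
    overall = sum(r[2] for r in records) / sum(r[3] for r in records)
    found = []
    for name, idx in STATS.items():
        width = bin_widths[name]
        for low, rate in steal_rate_by_bin(records, idx, width).items():
            found.append((rate / overall, name, low, low + width))
    return sorted(found, reverse=True)


ranked = rank_bins(players, {"pfr": 2, "vpip": 10})
lift, stat, low, high = ranked[0]
print(f"{stat} in [{low}, {high}): {lift:.2f}x the overall river-steal rate")
```

On this invented data the top-ranked bin is PFR 8 to 10, because I built the data that way. With real data, a bin that stands out on so few players could easily be noise, so anything found this way would need checking against hands the search never saw.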


AGAIN: THIS WAS A MADE-UP EXAMPLE. IT IS NOT ACTUALLY A RESULT OF DATAMINING.

In datamining, you can sometimes find very strong relationships which really challenge our ability to understand WHY the relationship exists. When datamining uncovers these, scientists often debate endlessly as to WHY the phenomenon occurs, and many end up altering their models to account for the strange result.

But when you go into a database and look up known stats with known meanings, you are not mining anything. . . and if you just leave Party Poker up overnight so you can Data Collect and you choose to call it Data Mining, well, that makes about as much sense as expecting garbage collectors to have degrees in Sanitation Engineering.

For a really detailed discussion of Data mining, see the Wikipedia page.

Focus point for data mining on PostgreSQL: Data Mining, Web Mining and Knowledge Discovery Resources

Real Data Mining for Poker

PokerManager is currently the only poker software tool with Real Data Mining capabilities.

DataMining (last edited 2006-02-14 01:31:58 by Ben Ziegler)