The aim of this paper is to relate the random forest algorithm to both natural and artificial intelligence. The success of random forests is noteworthy, and framing random forests as a form of intelligence should help advance the study of intelligence across disciplines. As part of this effort, recent related work is reviewed and a hybrid artificial intelligence system is proposed.
Random Forests, evolutionary decentralized intelligence, sense-making, rule sets
Dhruv Sharma is an independent scholar in the fields of artificial intelligence, risk management, organization systems, and systems engineering. Dhruv is a member of AAAI and has 7+ years of experience working on rule-based expert systems and knowledge base engineering applied to financial underwriting. Mr. Sharma holds a Master’s in Systems Engineering from the
In this paper we discuss the random forests technique, a powerful prediction and classification tool. Prediction is central to human and artificial intelligence (Hawkins, 1986). The success of random forests can be attributed to the fact that the technique is a form of distributed intelligence. Linking random forests to artificial intelligence makes clear that future work integrating the technique with other artificial intelligence research through hybrid systems should be fruitful. One such extension would be to combine random forests with reinforcement learning and first-order logic to create a hybrid, best-of-breed artificial intelligence solution.
2. Context is King: How does human intelligence work?
Knowledge discovery as a product of
intelligence occurs in the context of some domain or situated-ness (
3. Abstracting the progress of human knowledge discovery
Humans reason by positing concepts, evaluating their usefulness, building further concepts on top of them, and attempting to formulate relationships among the newly constructed concepts. We make up stories and context to support successful concepts and relationships, as suggested by Karl Weick and Roger Schank.
Given this, one approach to knowledge discovery might be simply to generate random hypotheses, test them, and keep the fit ones (Arthur, 1994). This approach is advocated by users of genetic algorithms, which generate rule sets (Beling, 1998). An added benefit is that it can serve as a poor man's Zwicky morphological box, by trying combinations we would not otherwise consider because of bias (Gibson, 2000). The result is a stochastic search over a large search space, which is a powerful technique (Koza).
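The generate, test, and keep loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a method taken from the cited works: the toy data, the single-threshold rule form, and the names `random_rule`, `fitness`, and `generate_and_test` are all hypothetical. A genetic algorithm would additionally recombine and mutate the surviving rules, which is omitted here.

```python
import random

def make_data(rng, n=200):
    """Toy data (an assumption): label is 1 when feature 0 exceeds 0.5; other features are noise."""
    rows = [[rng.random() for _ in range(4)] for _ in range(n)]
    labels = [1 if row[0] > 0.5 else 0 for row in rows]
    return rows, labels

def random_rule(rng, n_features):
    """A random hypothesis of the form 'predict 1 when feature f exceeds threshold t'."""
    return rng.randrange(n_features), rng.random()

def fitness(rule, rows, labels):
    """Fraction of rows on which the rule's prediction matches the label."""
    f, t = rule
    hits = sum((row[f] > t) == bool(y) for row, y in zip(rows, labels))
    return hits / len(rows)

def generate_and_test(rows, labels, rng, n_hypotheses=500, keep=5):
    """Generate random rules, score them all, and keep only the fittest few."""
    rules = [random_rule(rng, len(rows[0])) for _ in range(n_hypotheses)]
    rules.sort(key=lambda r: fitness(r, rows, labels), reverse=True)
    return rules[:keep]

rng = random.Random(0)
rows, labels = make_data(rng)
best = generate_and_test(rows, labels, rng)
best_rule = best[0]
best_score = fitness(best_rule, rows, labels)
print("best rule: feature", best_rule[0], "threshold", round(best_rule[1], 3),
      "accuracy", round(best_score, 3))
```

Even with no guidance, the search tends to settle on a rule over the one informative feature, because rules over the noise features score close to chance.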
Still, a pure stochastic search differs from how humans work. Humans use intelligence in search, which Pei Wang defines as “the ability for an information processing system to adapt to its environment with insufficient knowledge and resources” (Wang). Wang's definition makes intuitive sense: as humans we make decisions from the small samples we encounter, choose more or less at random which attributes to attend to, and build mental models on which to act. Multiplied across many people or agents, as in a stock market, this can yield a powerful intelligence and predictive capability, like that of swarm intelligence or decentralized markets.
To incorporate the criteria of insufficient knowledge and resources, it would make sense to use only a random sample of the data to generate hypotheses. This is a characteristic of the
4. Random Forests of Intelligence
Humans have limited resources in terms of what we can pay attention to and what we can compute. This constraint forces us to generalize and learn from the opportunity samples we find in practice. Even though we may believe we use logic to pick out attributes for theories that appear sound, our approach may be equivalent to picking random attributes and trying them out. Where there are only a few variables we may be able to discover meaningful relationships, but in areas with hundreds of variables, where we may have little insight, our choices are no better than random guesses. This is analogous to the stock market problem, where choosing 10 random stocks can beat actively managed strategies that try very hard to find profitable patterns in the markets.
I think the success of the random forests approach lies in the fact that it taps into two important aspects of intelligence: making use of scarce resources and making the best use of decentralized learners.
These two aspects, along with an accurate test measure based on out-of-bag estimates, are the secret of random forests (Breiman, 2001). The elegant aspect of this approach is that it applies statistical knowledge to itself, in the selection of both data and model variables. This ability to self-adapt, or to use recursion, is valuable, as Solomonoff points out (Solomonoff, 2006).
Random forests resemble swarm intelligence or stock markets in that most agents in "real life" have limited resources and attention, make inferences from a subset of the data, and may build different, arbitrarily formed models of prediction. Random forests work by sampling subsets of the data and building a tree model on each sample, considering a random selection of variables along the way. The forest's output is the collective prediction of all trees.
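To make the mechanics concrete, the following is a minimal sketch of the procedure, not Breiman's reference implementation. It assumes toy data, shallow trees capped at depth 3, Gini impurity as the split criterion, and two candidate variables per node. The helper names are hypothetical. Each tree is grown on a bootstrap sample, each node looks at only a random subset of variables, the forest predicts by majority vote, and accuracy is estimated on the out-of-bag rows each tree never saw.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features, by weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=3):
    """Grow a small tree; each node considers only n_try randomly chosen features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([rows[i] for i in right], [labels[i] for i in right],
                      rng, n_try, depth + 1, max_depth))

def predict_tree(tree, row):
    """Follow splits (tuples) down to a leaf (a plain class label)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def grow_forest(rows, labels, rng, n_trees=25, n_try=2):
    """Each tree sees a bootstrap sample; rows it never saw are its out-of-bag rows."""
    n, forest = len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try)
        forest.append((tree, set(range(n)) - set(idx)))
    return forest

def predict_forest(forest, row):
    """Collective prediction: majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree, _ in forest).most_common(1)[0][0]

def oob_accuracy(forest, rows, labels):
    """Score each row using only the trees that did not train on it."""
    correct = scored = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = Counter(predict_tree(tree, row) for tree, oob in forest if i in oob)
        if votes:
            scored += 1
            correct += votes.most_common(1)[0][0] == y
    return correct / scored

# Toy data (an assumption): the label depends on features 0 and 1; features 2 and 3 are noise.
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(120)]
labels = [1 if row[0] + row[1] > 1 else 0 for row in rows]
forest = grow_forest(rows, labels, rng)
oob = oob_accuracy(forest, rows, labels)
print("out-of-bag accuracy:", round(oob, 3))
```

No single tree sees all the data or all the variables, which mirrors the limited-resource agents described above. The out-of-bag estimate needs no separate test set because every row is left out of roughly a third of the bootstrap samples.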
5. Making Sense out of Forests
A robust system analysis requires building the proper context and examining models. Stochastic search applied throughout the analysis process can be beneficial, as in random forests and other procedures in the spirit of decentralized intelligence. Even so, where the rubber meets the road is the integration of the resulting rules and knowledge into a sense-making unit. Otherwise useful patterns could be lost because we are unable to make sense of them. As Weick points out from his research on sensemaking, humans trade off accuracy against plausibility and favor plausibility over actual accuracy in daily situations (Weick, 1995). One important implication of sensemaking research is that even if we create an intelligence that surpasses human intelligence, such as “alien intelligence”, we may not be able to use it if we cannot make sense of it (Martin, 2000).
This is the downside of limited resources in human intelligence: we focus on one area of patterns at a time and miss other areas. Over time, new researchers or paradigms, led by the cases that do not fit our models and rules, gain attention; we then focus on those cases and arrive at different theories. This is a slow process of fitting the data, similar to AdaBoost, where we learn what the model misses and improve the fit to the training set. This approach does have benefits but is too evolutionary. Even complexity theory has shown that evolution alone cannot explain true advancement, which seems to occur in spurts on the border between chaos and order (Lewin, 2000). This border is one where many different solutions are tried at once and the net result is saved.
6. Lesson for Data Miners
In short, using automated approaches to generate and evaluate hypotheses is important, and equally important is framing the problem and being ready to examine the resulting patterns for insight. Hypothesis generation, search, and testing in an appropriate context are critical to discovering meaningful knowledge. Too often data mining experts discover things already known, owing to the interestingness paradox (Padmanabhan, 2004). Guarding against this should be part of contextual analysis and problem definition. Even so, it is up to the researcher to ensure that each strong hypothesis is documented and given consideration.
7. Directions for future research:
Random forests are a form of artificial intelligence similar to swarm intelligence. They work on data and have been studied in machine learning, knowledge discovery, and data mining. To bring the power of random forests into mainstream AI, an important area of research will be integrating random forest intelligence with reinforcement learning. This is important because data alone may not be as useful as the ability to try things and look for counterfactuals and patterns not in the data (Dhar, 1998). Integrating knowledge and action learning should provide a holistic approach to intelligence by maximizing the value of data and knowledge while continuing to learn by testing theories in the real world, where possible, and feeding the results back into the intelligence process. Random forests are an example of what Bundy calls “cooperating reasoning processes” that “complement each other” (Bundy).
8. Specific Areas of future AI research:
Making the most effective use of random forests in the context of artificial intelligence research requires exploring various touchpoints with existing artificial intelligence work. Some promising areas deserving greater attention from the artificial intelligence community are as follows:
1) Converting random forests into equivalent rule sets. The work of Assche (2007) on first order trees is promising and can be extended by taking the important variables from random forests and combining them into first order rules. Also of interest are the work of Gopalan, Nortet and Pappa on genetically engineered rule sets and the recent work of Brown, Dala, Saifrafi and Kell comparing rule-based systems with random forests, in which rule-based systems outperform rules generated by random forests. Friedman (2005) and Dembczyński (2008) also have interesting related work on generating rule sets from ensemble methods like random forests.
2) Comparing the performance of rule sets induced by genetic algorithms with that of random forests, while building the rule sets on sampled data and sampled variables.
3) Linking reinforcement learning into model results to evolve models over time (without actual tests, theories cannot be proven true; time and stationarity are also important in real-world systems, as is the ability to act and get feedback on actions).
4) Work on unifying swarm intelligence, random forests, ant colony optimization, distributed evolutionary stochastic search, and GA-engineered rule sets.
5) Integrating meta-knowledge, or reasoning about the domain, into modeling the features to be predicted and known patterns. This is important because humans need plausible theories or recommendations to follow. Patterns that humans have not accepted through a rational causal chain are hard to implement in real-world systems (Weick, 1995). This aspect of the problem is especially important because, as Bundy points out, the “representation of problems is often the key to its solution and problem representation can be automatically both formed and repaired” (Bundy).
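As a small illustration of item 1, one simple way to read a rule set off a tree is to emit one IF-THEN rule per root-to-leaf path. The sketch below is only that: it assumes a hypothetical nested-tuple tree representation `(feature, threshold, left, right)` with class labels at the leaves, and the variable names are invented for the example. It produces propositional rules, not the first order rules discussed in the cited work, and applying it to a forest would mean repeating it per tree and then pruning or merging the resulting rules.

```python
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, prediction) pair per root-to-leaf path."""
    if not isinstance(tree, tuple):          # a leaf holds the predicted class
        return [(list(conditions), tree)]
    feature, threshold, left, right = tree
    return (tree_to_rules(left, conditions + ((feature, "<=", threshold),))
            + tree_to_rules(right, conditions + ((feature, ">", threshold),)))

def format_rule(conditions, prediction, names):
    """Render a rule as readable IF-THEN text."""
    body = " AND ".join(f"{names[f]} {op} {t}" for f, op, t in conditions) or "TRUE"
    return f"IF {body} THEN class = {prediction}"

# Hypothetical tree and variable names, for illustration only.
toy_tree = (0, 0.5, 0, (1, 0.3, 0, 1))
names = ["income_ratio", "tenure"]
rules = tree_to_rules(toy_tree)
rule_text = [format_rule(c, p, names) for c, p in rules]
for line in rule_text:
    print(line)
```

Rules in this form are easier to inspect and to argue about than a vote among many trees, which matters for the sensemaking concerns raised earlier.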
The combination of random forests, reinforcement learning, and some explicit meta-knowledge should result in a best-of-breed intelligence.
Arthur, W.B. (1994). “Inductive Reasoning and Bounded Rationality (The El-Farol Problem)”. Amer. Econ. Review. 84, 406.
Assche, A.V. & Blockeel, H. (2007).
Bundy, A. Cooperating Reasoning Processes: “More than Just a
Sum of their Parts, Constructing, Selecting and Repairing Representation of
Breiman, L. (2001). “Random Forests”. Machine Learning. 45, 5-32. Retrieved from http://www.samsi.info/200304/dmml/kickoffpresentations/breiman-2.pdf and www.stat.berkeley.edu/users/breiman/
Beling, P. & Mullei, S. (1998) “Induction of Rule-Based Scoring Functions”. IEEE. p2968.
Brown, D., Dala, S., Alwan, M., Saifrafi, R., Kell, Steve. “A Rule-Based Approach to Analysis of Elder’s Activity Data: Detection of Health and Possible Emergency
Clark, A. (1998). “Being There: Putting Brain, Body, and World Together Again”. MIT Press.
Dembczyński, K., Kotłowski, W., Slowinski, R. (2008). “Maximum Likelihood Rule Ensembles”. ACM Intl. Conference Series; Vol. 307. 224-231
Dhar, V. (1998). “Data Mining in Finance: Using Counterfactuals to Generate Knowledge from Organizational Information Systems”. Information Systems. Vol 23. No. 7
Friedman, J. and Popescu, B.E. (2005). “Predictive Learning via Rule Ensembles”. Tech
Alhajj, R. & Barter, K. “Discovering Accurate and Interesting Classification Rules Using Genetic Algorithm”. University
Gibson, J. (2000). “How to Conduct a
Hawkins, J. (1986) “An Investigation of Adaptive Behavior Towards a Theory of Neocortical Function”. Retrieved from http://www.onintelligence.org/resources/Hawkins1986.pdf
Koza, J.R., Keane, M. & Streeter, M. (2003). “What’s AI done for me lately? Genetic programming’s human-competitive results”. IEEE. Retrieved from http://www.genetic-programming.com/jkpdf/ieee2003intelligent.pdf
Lewin, R. (2000). “Complexity: Life at the Edge of Chaos”.
Martin, J. (2000) “After the Internet: Alien Intelligence”. Capital Press
Nortet, C., Vrain, C. & Salleb-Aouissi. (2007). “QuantMiner: A Genetic Algorithm for Mining Quantitative Association Rules”. IJCAI. 1035-1040.
Padmanabhan, B. (2004). “The Interestingness Paradox in Pattern Discovery”. Journal of Applied Statistics. Vol 31. No. 8 1019-1035.
Pappa, G.L., Freitas, A.A. “Automatically Evolving Rule Induction Algorithms”. Computing
Solomonoff, R. (2006). “Machine Learning-Past and Future”. AI@50, The
Wang, P. “On the Working Definition of Intelligence.” http://www.cogsci.indiana.edu/pub/wang.intelligence.ps
Weick, K. (1995). “Sensemaking in Organizations”. Sage Publications.