(A) Negative of the value gradient learned by the conventional agent in the open arena. After learning, the agent tends to follow this gradient toward the goal from any other state. (B) The learned value gradient overlaid with the new maze boundaries, and state occupancy density for trajectories taken by the hierarchical and non-hierarchical algorithms during their first four trials in this new environment. Brighter red shading indicates states that were visited with higher relative frequency. The non-hierarchical algorithm is prone to being trapped in areas where the previously learned action values drive it into new boundaries.
The top-down propagation of goal information provides a means of flexibly and efficiently attaining a learned goal when obstacles are introduced. But if the goal itself changes (for instance, if the agent is hungry rather than thirsty, or if the task rules change dramatically), an entirely different model or policy may be necessary.
Animals can rapidly switch between learned spatial tasks, and hippocampal damage impairs this flexibility (McDonald et al.). The implementation of MBRL in our framework supports context-dependent policies. In the hippocampus, place cells undergo global remapping when an environment is switched (Jezek et al.). We propose here that this supports learning of a new model of the world and a new policy that does not interfere with previous models. This allows rapid context-driven shifts between world models and policies. We tested our framework's ability to cope with changing goals through context switching by introducing a new task in which the reward was placed randomly in each trial at one of four locations in an environment measuring 7 by 7 states.
We again tested both the new framework and a conventional MBRL algorithm, recording the number of steps needed to reach the goal in each trial. Here the step count was capped at steps. A failure to acquire reward in a given location persists in the agent's model so that re-exploration of the site is inhibited. In contrast, the new framework's context-switching mechanism allowed it to learn the various reward sites as separate contexts.
When the agent arrived at a previously rewarding location and found no reward, it switched to a context in which the reward was expected elsewhere. Thus, it learned to systematically navigate to each reward site until the goal was found.
Comparative performance on a task with probabilistic reward locations. (A) The simulated environment. Agents began each trial at the black circle. In each trial, a reward was placed randomly at one of the four numbered locations. (B) The mean number of steps needed to reach the goal in each trial. The conventional MBRL algorithm failed to learn the task and often did not locate the reward within the maximum number of steps allowed per trial. (C) Value gradients in the four contexts learned by the hierarchical framework.
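As a rough illustration of the context switch described above, the following sketch reduces each stored context to a single predicted reward location and picks the context that contradicts the agent's recent observations least often. The coordinates, the names `mismatch` and `select_context`, and the mismatch-count criterion are assumptions made for this example; the article does not specify them, and the actual framework compares recent experience against full stored models.

```python
def mismatch(context_goal, memory):
    """Count recent observations that contradict a context.

    Assumption: a context predicts reward at `context_goal` and nowhere else.
    `memory` is a list of (location, rewarded) pairs from recent experience.
    """
    errors = 0
    for location, rewarded in memory:
        predicted = (location == context_goal)
        if predicted != rewarded:
            errors += 1
    return errors


def select_context(contexts, memory):
    """Return the name of the stored context with the fewest mismatches."""
    return min(contexts, key=lambda name: mismatch(contexts[name], memory))


# Hypothetical reward sites in a 7 by 7 grid (not taken from the article).
contexts = {"site1": (0, 0), "site2": (0, 6), "site3": (6, 0), "site4": (6, 6)}

# The agent visits site1 and finds no reward, so site1's context is contradicted.
memory = [((0, 0), False)]
next_context = select_context(contexts, memory)
```

Each unrewarded visit adds a mismatch to the context that predicted reward there, so repeated calls steer the agent through the remaining sites one by one.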
Conventional methods of adapting to environmental change in computational RL include periodically re-exploring previously unrewarding actions (Sutton; Sutton and Barto) and tracking, in which the agent weights recent experiences more heavily than past ones (Sutton and Barto) and thus tracks the changing solution to a task rather than converging on an optimal solution.
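The difference between converging and tracking can be seen in a small sketch that compares a sample-average estimate with a constant-step-size estimate. The step size `alpha = 0.5` and the reward sequence are assumptions chosen for illustration, not values from the article.

```python
def sample_average_update(estimate, n, reward):
    """Incremental mean: every past reward carries equal weight."""
    return estimate + (reward - estimate) / n


def tracking_update(estimate, reward, alpha=0.5):
    """Constant step size: recent rewards carry exponentially more weight."""
    return estimate + alpha * (reward - estimate)


def run(rewards, alpha=0.5):
    """Apply both update rules to the same reward sequence."""
    avg, track = 0.0, 0.0
    for n, reward in enumerate(rewards, start=1):
        avg = sample_average_update(avg, n, reward)
        track = tracking_update(track, reward, alpha)
    return avg, track


# The action pays off for ten trials, then the environment changes.
rewards = [1.0] * 10 + [0.0] * 10
avg, track = run(rewards)
```

After the change, the sample average still sits midway between the old and new payoffs, while the tracking estimate has moved almost entirely to the new one.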
Admittedly, these are much simpler methods of adaptation than our hierarchical or context-switching schemes, and in many cases would likely perform as well. However, the hierarchical approach provides an important additional advantage: it can solve large problems more easily than conventional MBRL when the capacity for model updating is finite, because even a very large navigation problem becomes relatively simple at a high level of abstraction.
Thus, the hierarchical system may scale to large environments in which finite model updates and excessive discounting over large distances would prevent conventional MBRL from learning useful action values. The hierarchical approach sometimes learns marginally sub-optimal solutions, a trade-off that often accompanies the use of hierarchical abstraction (Hengst). However, the use of hierarchical abstraction allows the agent to solve large navigation problems that the conventional MBRL algorithm cannot. The number of model accesses required by conventional MBRL appears to be polynomial with respect to the number of states in the maze, while the model accesses required by the hierarchical approach scale linearly.
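To make the discounting argument concrete, the sketch below compares how much of a unit reward survives discounting over 200 fine-grained steps with how much survives when the same distance is covered in coarse abstract states. The discount factor `gamma = 0.9`, the distance, and the abstraction scale of 16 are assumptions for illustration only.

```python
def discounted_value(reward, steps, gamma=0.9):
    """Value of a reward that lies `steps` transitions away."""
    return reward * gamma ** steps


def abstract_steps(steps, scale):
    """Transitions needed when each abstract state spans `scale` fine states."""
    return -(-steps // scale)  # ceiling division


fine = discounted_value(1.0, 200)                        # fine-grained level
coarse = discounted_value(1.0, abstract_steps(200, 16))  # abstract level
```

At the fine level the reward signal is numerically negligible, whereas at the abstract level it remains large enough to guide action selection.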
Foraging and spatial navigation are centrally important for mammalian survival and depend on the hippocampus (Hartley et al.). Our model extends previous RL models by incorporating the spatial abstraction found in the mammalian hippocampus, the concept of forward sweeping search for route-finding, and the concept of context-driven remapping.
Where some literature focuses on strict biological accuracy, our implementation of these abstract features has instead focused on testing their computational properties in learning. However, the computational properties explored here generalize to a more biologically accurate setting. The concepts of hierarchical reinforcement learning and forward sweeping search have been explored individually in existing literature.
Hierarchical reinforcement learning has long been a topic of active research, though most algorithmic developments have focused on learning macro-actions, or learning to solve complex problems by first solving simpler sub-problems (Barto and Mahadevan; Botvinick and Weinstein). The use of forward sweeps as part of a process of planning to reach goals has also been investigated (Chersi and Pezzulo; Erdem and Hasselmo; Penny et al.).
Our framework provides a novel integration of all these features, yielding a scalable, flexible, and applicable learning framework. It explains animals' ability to learn multiple independent behaviors, adapt quickly to environmental changes, and solve large problems with low cognitive effort. These features of animal behavior cannot be explained or reproduced by conventional MBRL algorithms. Hierarchical abstraction can make a difficult learning task more tractable.
For example, learning to play chess would be extremely difficult if the game were seen as a sequence of individual moves and board positions. There are millions of possible positions after just a few moves, and considering all possibilities is unrealistic. Action selection then becomes a planning process constrained by the high-level goal. This interaction between high-level abstraction and low-level planning provides a means of solving complex problems. This ability of our framework partially explains the ability of animals to learn complex tasks, and leads to the prediction that animals with ventral hippocampal damage would be impaired in spatial learning tasks involving very large distances.
Hierarchy and abstraction simplify learning at the expense of the optimality of the learned behavior. The behavior is now merely hierarchically optimal: it is optimal behavior for the abstract task, but likely sub-optimal for the original task (Hengst). Still, biological agents probably rely heavily on this simplicity-optimality trade-off to make learning complex real-world tasks tractable (Botvinick and Weinstein). Indeed, rats have shown sub-optimal performance as compared to RL algorithms in several tasks (Sul et al.).
Our model predicts that higher levels of the hierarchy (vH) are more sensitive to reward outcomes than low levels (dH). Indeed, encoding of reward information in the dH is weak, although some changes to the place fields themselves can occur if reward is consistently associated with one location (Carr and Frank). Little is known about reward encoding in the vH itself, but a wealth of evidence indicates that its primary limbic and cortical targets express robust reward encoding (Gruber et al.). Indeed, prevailing theories posit that neural mechanisms of RL primarily involve other brain structures such as the striatum and neocortex.
The striatum and its cortical inputs may also have a hierarchical organization in spatial and non-spatial dimensions (Ito and Doya), such that the ventral striatum and its inputs from vH and prefrontal cortex represent a high level of abstraction, and the dorsolateral striatum and its inputs from sensory and motor cortices represent low levels (Voorn et al.).
Our model thus elegantly fits with proposals that the limbic systems, including vH, are involved in reward processing and goal-directed behavior, while sensorimotor systems are much less sensitive to rewards and implement sensory-response control (Gruber and McDonald). This is because low-level control is sufficient to solve well-learned tasks, while upper levels are engaged when unexpected state transitions occur.
This is also consistent with the gradual shift of behavioral control from goal-directed to lower-level habitual control (Balleine and O'Doherty). This has been proposed to be more computationally efficient (Daw et al.). Hierarchical abstraction may be an important part of general transfer learning: the ability, demonstrated by animals yet still elusive in artificial intelligence, to apply knowledge learned in one task to solve a different but similar task (Thrun and Pratt). Moreover, the various forms of hierarchical abstraction may be interrelated as discussed above.
One notable non-spatial example comes from Botvinick and Weinstein, who have discussed hierarchical learning of actions, such as learning to perform a complex motor sequence like grasping an object as a single macro-action. The question of how an agent learns useful macro-actions remains. Our implementation of multiple context-dependent policies derives from data showing that the ventral hippocampus facilitates learning different policies in different environmental contexts. Specifically, if task demands on a spatial maze are switched, learning the new policy is faster if the hippocampus is intact and the learning takes place in a different room (McDonald et al.).
When the intact animal is returned to the original task, it selects between the two learned policies. Our framework posits the explanation that, without the hippocampus-based system for encoding distinct contexts and managing separate models, rats with hippocampal lesions are unable to preserve the first learned model for later retrieval. Neural activity in the hippocampus responds to minor environmental changes through modulating the activity of the place cells while preserving the spatial encoding (rate remapping; Allen et al.).
On the other hand, different spatial contexts are represented by a new set of place cell assignments (global remapping), and the hippocampus can switch rapidly between learned contexts based on cues (Jezek et al.). This could be an important component of contextual control of policies in the brain, which is not present in our current framework.
Moreover, the expansion of modalities in the hierarchy beyond physical space (Collin et al.). Interestingly, place cell remapping does not occur uniformly along the septotemporal axis of the hippocampus. Changes to the environment or to the task being performed in the environment can induce remapping in the dorsal, but not ventral, place fields (Schmidt et al.).
This contrasts with our context changing mechanism, which always creates an entirely new model at every level.
The discrepancy suggests that the brain's mechanism for learning in multiple contexts is more efficient than the mechanism we have implemented here, and is able to transfer some high-level abstract information between contexts. This ability is probably possible in part because spatial representations and value information are stored separately in the hippocampus and striatum, rather than combined as in our abstract framework.
We speculate that our framework could be enhanced by adding functions of other brain regions. In particular, the prefrontal cortex is strongly implicated in contextual control of behavior and cognitive flexibility (Buckner and Carroll; Euston et al.). It is very likely that cortex exerts control over the policy, and may do so even if the spatial representation is not globally remapped as we have implemented here.
Another avenue for future development may lie in the more comprehensive and biologically accurate concept of reward being pursued by Gutkin (Keramati and Gutkin). While conventional reward-maximizing RL algorithms are based on dopamine-driven learning mechanisms (Doya), Gutkin proposes an analytical framework in which the objectives of reward maximization and physiological homeostasis coincide, providing a broader view of learning and adaptive behavior.
Integration of these ideas with our hierarchical abstraction scheme seems logical and promising. While there is much opportunity to expand the computational framework, the present form proposes an interesting relationship between hippocampal place cells and model-based learning mechanisms. The hippocampus' hierarchical representation of space can support a computationally efficient style of learning and adaptation that may not be possible otherwise.
Computational work, including algorithm development and implementation, was performed by EC. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Front Comput Neurosci. Published online Dec.
Received Aug 18; Accepted Nov. The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Keywords: reinforcement learning, hierarchical learning, hippocampus, planning, context.
Introduction
Reinforcement Learning (RL) provides a computational account of how an agent can learn appropriate behavior by interacting with its environment, discovering through experience what actions lead to rewards or punishments, and how to maximize the sum of future rewards (Doya).
Methods
Computational model of place-cell-supported reinforcement learning
In this section we propose a hierarchical learning system in which a task is represented at multiple levels of abstraction, ranging from very detailed to very general.
Hierarchical planning architecture
Our hierarchical framework first selects a goal by identifying the maximum action value available at any level of abstraction.
Context switching algorithm
Switching between different contexts was achieved by comparing a memory of the agent's recent experiences with each of several stored models.
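The goal-selection step of the hierarchical planning architecture described above can be sketched as a search for the largest action value across all levels. The data layout (one table of `(state, action)` values per level), the function name `select_goal`, and the example numbers are assumptions for illustration; the article's own data structures are not shown in this text.

```python
def select_goal(action_values):
    """Find the highest action value at any level of abstraction.

    `action_values` maps a level index to a dict of {(state, action): value}.
    Returns ((level, state, action), value) for the maximum entry.
    """
    best_choice, best_value = None, float("-inf")
    for level, table in action_values.items():
        for (state, action), value in table.items():
            if value > best_value:
                best_choice, best_value = (level, state, action), value
    return best_choice, best_value


# Hypothetical values: level 0 is the most detailed, level 2 the most general.
action_values = {
    0: {((3, 3), "north"): 0.10, ((3, 3), "east"): 0.12},
    1: {((1, 1), "east"): 0.45},
    2: {((0, 0), "east"): 0.30},
}
goal, value = select_goal(action_values)
```

In this toy case the intermediate level holds the strongest reward signal, so the goal is taken from there and lower levels would then plan toward it.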
Testing
Simulating hippocampal lesions
To test the face validity of our framework as a model of rodent navigation control, we evaluated whether dysfunction of select levels of the agent's spatial hierarchy reproduced behavioral impairments of animals after localized hippocampal damage.
Testing adaptation to added boundaries
We next sought to test whether the hierarchical organization would be advantageous for adapting to sudden changes in the environment.
Scaling to large problems through hierarchical abstraction
Problems involving many states can be problematic for conventional RL algorithms, because excessive discounting causes reward information to be lost over large distances.
Results
Simulated hippocampal lesions mimic effects of physical hippocampal lesions
Our first objective was to test the face validity of the hierarchical MBRL schema with respect to the navigation properties of mammals.
Efficient spatial navigation and adaptation
We next investigated whether the spatial abstraction represented in the hierarchy would facilitate adaptation to the sudden addition of obstacles in the environment.
Furthermore, acquired knowledge is specific to the learned task, and transfer of knowledge to new tasks is crucial. In this book the author investigates whether deficiencies of reinforcement learning can be overcome by suitable abstraction methods. He discusses various forms of spatial abstraction, in particular qualitative abstraction, a form of representing knowledge that has been thoroughly investigated and successfully applied in spatial cognition research.