Large-scale crises, including wars and pandemics, have repeatedly shaped human history, and their simultaneous occurrence presents profound challenges to societies. Understanding the dynamics of epidemic spread during warfare is essential for developing effective containment strategies in complex conflict zones. While research has explored epidemic models in various settings, the impact of warfare on epidemic dynamics remains underexplored.
Methods
We propose a novel mathematical model that integrates the epidemiological SIR (susceptible-infected-recovered) model with the Lanchester model of war dynamics to explore the dual influence of war and pandemic on a population's mortality. Moreover, we consider a dual-use military and civilian health care system that aims to reduce overall mortality and can follow different administration policies, such as prioritizing soldiers over civilians. Using an agent-based simulation to generate in silico data, we trained a deep reinforcement learning model based on the deep Q-network algorithm to set the health care administration policy and investigated its performance in depth.
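The coupling can be pictured with a minimal sketch. The code below (with illustrative parameter names beta, gamma, a, b and values chosen only for demonstration) integrates a toy SIR compartment model for civilians alongside Lanchester aimed-fire attrition for two opposing forces; it is not the authors' exact model.

```python
import numpy as np

def step(state, dt, beta=0.3, gamma=0.1, a=0.02, b=0.015):
    """One Euler step of a toy coupled SIR-Lanchester system.

    state = (S, I, R, X, Y): susceptible, infected, recovered civilians,
    and the two opposing force sizes X and Y (aimed-fire Lanchester law).
    All parameter values are illustrative, not those of the paper.
    """
    S, I, R, X, Y = state
    N = S + I + R
    dS = -beta * S * I / N             # new infections among civilians
    dI = beta * S * I / N - gamma * I  # infections minus recoveries
    dR = gamma * I
    dX = -a * Y                        # Lanchester attrition of force X by Y
    dY = -b * X                        # Lanchester attrition of force Y by X
    return (S + dS * dt, I + dI * dt, R + dR * dt,
            max(X + dX * dt, 0.0), max(Y + dY * dt, 0.0))

state = (10_000.0, 10.0, 0.0, 1_000.0, 800.0)
for _ in range(500):
    state = step(state, dt=0.1)
print(state)
```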
Results
Our results show that a pandemic during war produces chaotic dynamics in which the health care system should prioritize either war-injured soldiers or pandemic-infected civilians based on the immediate mortality associated with each option, ignoring long-term objectives.
Conclusions
Our findings highlight the importance of integrating conflict-related factors into epidemic modeling to enhance preparedness and response strategies in conflict-affected areas.
Artificial intelligence is dramatically reshaping scientific research and is coming to play an essential role in scientific and technological development by enhancing and accelerating discovery across multiple fields. This book dives into the interplay between artificial intelligence and the quantum sciences and is the outcome of a collaborative effort by world-leading experts. After presenting the key concepts and foundations of machine learning, a subfield of artificial intelligence, its applications in quantum chemistry and physics are presented in an accessible way, enabling readers to engage with the emerging literature on machine learning in science. By examining its state-of-the-art applications, readers will discover how machine learning is being applied within their own field and appreciate its broader impact on science and technology. This book is accessible to undergraduates and more advanced readers from physics, chemistry, engineering, and computer science. Online resources include Jupyter notebooks that expand and develop upon key topics introduced in the book.
In this chapter, we introduce the field of reinforcement learning and some of its most prominent applications in quantum physics and computing. First, we provide an intuitive description of the main concepts, which we then formalize mathematically. We introduce some of the most widely used reinforcement learning algorithms, starting with temporal-difference algorithms and Q-learning, followed by policy gradient methods and REINFORCE, and the interplay of both approaches in actor-critic algorithms. Furthermore, we introduce the projective simulation algorithm, which deviates from the aforementioned prototypical approaches and has multiple applications in the field of physics. Then, we showcase some prominent reinforcement learning applications, featuring examples in games; quantum feedback control; quantum computing, error correction, and information; and the design of quantum experiments. Finally, we discuss potential applications and limitations of reinforcement learning in the field of quantum physics.
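As a concrete anchor for the temporal-difference material, the following minimal tabular Q-learning loop on a toy chain environment illustrates the epsilon-greedy action choice and the TD update; the environment and hyperparameters are illustrative, not taken from the chapter.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.3
rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy chain environment: action 1 moves right, action 0 moves left.
    Reaching the last state pays reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = env_step(s, a)
        # temporal-difference (Q-learning) update
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q)
```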
In this chapter, we introduce the reader to basic concepts in machine learning. We start by defining artificial intelligence, machine learning, and deep learning. We give a historical viewpoint on the field, also from the perspective of statistical physics. Then, we give a very basic introduction to the different tasks that are amenable to machine learning, such as regression or classification, and explain the various types of learning. We end the chapter by explaining how to read the book and how the chapters depend on each other.
This research proposes an adaptive human-robot interaction (HRI) framework that combines voice recognition, emotional context detection, decision-making, and self-learning. The aim is to overcome challenges in dynamic and noisy environments while achieving real-time and scalable performance. The architecture is based on a three-stage HRI system: voice input acquisition, feature extraction, and adaptive decision-making. For voice recognition, modern pre-processing techniques and mel-frequency cepstral coefficients are used to recognize commands robustly. Emotional context detection is performed by neural network classification on pitch, energy, and jitter features. Decision-making uses reinforcement learning, in which actions are taken and the user is then prompted to provide feedback that serves as a basis for re-evaluation. Iterative self-learning mechanisms are included, increasing adaptability as stored patterns and policies are updated dynamically. The experimental results show substantial improvements in recognition accuracy, task success rates, and emotion detection. The proposed system achieved 95% recognition accuracy and a task success rate of 96%, even under challenging noise conditions, and emotion detection achieved a high F1-score of 92%. Real-world validation showed the system's ability to adapt dynamically, reducing latency by 15% through self-learning. The proposed system has potential applications in assistive robotics, interactive learning systems, and smart environments, addressing scalability and adaptability for real-world deployment. Novel contributions to adaptive HRI arise from the integration of voice recognition, emotional context detection, and self-learning mechanisms. The findings act as a bridge between theoretical advancements and the practical utility of further system improvements in human-robot collaboration.
Safety is an essential requirement as well as a major bottleneck for legged robots in the real world. Particularly for learning-based methods, their trial-and-error nature and unexplainable policies have raised widespread concerns. Existing methods usually treat this challenge as a trade-off between safety assurance and task performance. One reason for this drawback is inaccurate inference of the robot's safety. In this paper, we re-examine the segmentation of the robot's state space in terms of safety. According to the current state and the predicted state transition trajectory, the states of legged robots are classified as safe, recoverable, unsafe, or failure, and a safety verification method is introduced to infer the robot's safety online. Then, task, recovery, and fall protection policies are trained to ensure the robot's safety in different states, forming a safety supervision framework that is independent of the learning algorithm. To validate the proposed method and framework, experiments are conducted both in simulation and on a real-world robot, indicating improvements in terms of safety and efficiency.
Demands in the Ultimatum Game in its traditional form with one proposer and one responder are compared with demands in an Ultimatum Game with responder competition. In this modified form, one proposer faces three responders who can accept or reject the split of the pie. Initial demands in both ultimatum games are quite similar; however, in the course of the experiment, demands in the ultimatum game with responder competition become significantly higher than in the traditional case with repeated random matching. Individual round-to-round changes of choices that are consistent with directional learning are the driving forces behind the differences between the two learning curves and cannot be captured by an adjustment process that responds only to accumulated reinforcements. The importance of combining reinforcement and directional learning is addressed. Moreover, learning transfer between the two ultimatum games is analyzed.
We study the interaction of the effects of the strategic environment and communication on the observed levels of cooperation in two-person finitely repeated games with a Pareto-inefficient Nash equilibrium and replicate previous findings that point to higher levels of tacit cooperation under strategic complementarity than under strategic substitutability. We find that this is not because of differences in the levels of reciprocity as previously suggested. Instead, we demonstrate that slow learning coupled with noisy choices may drive this effect. When subjects are allowed to communicate in free-form online chats before making choices, cooperation levels increase significantly to the extent that the difference between strategic complements and substitutes disappears. A machine-assisted natural language processing approach then shows how the content of communication is dependent on the strategic environment and cooperative behavior, and indicates that subjects in complementarity games reach full cooperation by agreeing on gradual moves toward it.
Altered reinforcement learning (RL) and decision-making have been implicated in the pathophysiology of anorexia nervosa. To determine whether deficits observed in symptomatic anorexia nervosa are also present in remission, we investigated RL in women remitted from anorexia nervosa (rAN).
Methods:
Participants performed a probabilistic associative learning task that involved learning from rewarding or punishing outcomes across consecutive sets of stimuli, allowing us to examine the generalization of learning to new stimuli over extended task exposure. We fit a hybrid RL and drift diffusion model of associative learning to characterize learning and decision-making processes in 24 rAN and 20 female community controls (cCN).
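The general shape of such a hybrid model can be sketched as follows: a delta-rule value update supplies the drift rate of a diffusion decision process. The parameter names (alpha, scaling, threshold) and values below are assumptions for illustration and do not reproduce the fitted model from this study.

```python
import numpy as np

rng = np.random.default_rng(1)

def ddm_choice(drift, threshold=1.0, dt=0.001, noise=1.0):
    """Simulate one drift-diffusion decision: returns (choice, reaction time)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x > 0 else 0), t

# delta-rule value learning feeding the drift rate
alpha, scaling = 0.2, 2.0
Q = np.zeros(2)                       # learned value of option 1 vs option 0
p_reward = np.array([0.3, 0.7])       # illustrative reward probabilities

for trial in range(100):
    drift = scaling * (Q[1] - Q[0])   # evidence favours the higher-valued option
    choice, rt = ddm_choice(drift)
    reward = float(rng.random() < p_reward[choice])
    Q[choice] += alpha * (reward - Q[choice])   # prediction-error update

print(Q)
```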
Results:
rAN showed better learning from negative outcomes than cCN, and this advantage was greater over extended task exposure (p < .001, ηp² = .30). rAN demonstrated a reduction in the accuracy of optimal choices (p = .007, ηp² = .16) and in the rate of information extraction on reward trials from set 1 to set 2 (p = .012, ηp² = .14), and a larger reduction in response threshold separation from set 1 to set 2 than cCN (p = .036, ηp² = .10).
Conclusions:
rAN extracted less information from rewarding stimuli and their learning became increasingly sensitive to negative outcomes over learning trials. This suggests rAN shifted attention to learning from negative feedback while slowing down extraction of information from rewarding stimuli. Better learning from negative over positive feedback in rAN might reflect a marker of recovery.
We begin with the theoretical and empirical foundations of happiness economics, in which the aim of economic policy is to maximize the self-reported happiness of people in society. We also discuss the economic correlates of self-reported happiness. We outline some of the key insights from the literature on behavioral industrial organization, such as phishing for phools and the effects of limited attention on the pricing decisions of firms. When products have several attributes, we explain how some might be more salient than others. We also explain the effects of limited attention on economic outcomes. We introduce the basics of complexity economics, in which people use simple rules of thumb and simple adaptive learning models in the presence of true uncertainty. We show that the aggregate system-wide outcomes are complex, characterized by chaotic dynamics and the formation of emergent phenomena. The observed fluctuations in the system arise endogenously rather than from stochastic exogenous shocks. We introduce two kinds of learning models – reinforcement learning and beliefs-based learning. Finally, we critically evaluate the literature on competitive double auction experiments.
This study introduces an advanced reinforcement learning (RL)-based control strategy for heating, ventilation, and air conditioning (HVAC) systems, employing a soft actor-critic agent with a customized reward mechanism. This strategy integrates time-varying outdoor temperature-dependent weighting factors to dynamically balance thermal comfort and energy efficiency. Our methodology has undergone rigorous evaluation across two distinct test cases within the building optimization testing (BOPTEST) framework, an open-source virtual simulator equipped with standardized key performance indicators (KPIs) for performance assessment. Each test case is strategically selected to represent distinct building typologies, climatic conditions, and HVAC system complexities, ensuring a thorough evaluation of our method across diverse settings. The first test case is a heating-focused scenario in a residential setting. Here, we directly compare our method against four advanced control strategies: an optimized rule-based controller inherently provided by BOPTEST, two sophisticated RL-based strategies leveraging BOPTEST’s KPIs as reward references, and a model predictive control (MPC)-based approach specifically tailored for the test case. Our results indicate that our approach outperforms the rule-based and other RL-based strategies and achieves outcomes comparable to the MPC-based controller. The second scenario, a cooling-dominated environment in an office setting, further validates the versatility of our strategy under varying conditions. The consistent performance of our strategy across both scenarios underscores its potential as a robust tool for smart building management, adaptable to both residential and office environments under different climatic challenges.
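The customized reward is described only qualitatively above; the sketch below illustrates one plausible form, with a comfort-versus-energy weight that ramps with the severity of the outdoor temperature. The functional form, parameter names, and values are assumptions, not the authors' exact reward.

```python
def hvac_reward(indoor_temp, setpoint, energy_use, outdoor_temp,
                w_min=0.3, w_max=0.9, mild_temp=18.0, max_dev=20.0):
    """Toy reward: weighted sum of comfort deviation and energy use.

    The comfort weight grows as the outdoor temperature departs from a mild
    reference (assumed linear ramp up to max_dev degrees), so comfort
    dominates in extreme weather and energy saving dominates otherwise.
    """
    severity = min(abs(outdoor_temp - mild_temp) / max_dev, 1.0)
    w_comfort = w_min + (w_max - w_min) * severity
    comfort_penalty = (indoor_temp - setpoint) ** 2
    return -(w_comfort * comfort_penalty + (1.0 - w_comfort) * energy_use)

# mild day: the energy term dominates; cold day: the comfort term dominates
print(hvac_reward(20.5, 21.0, energy_use=2.0, outdoor_temp=18.0))
print(hvac_reward(20.5, 21.0, energy_use=2.0, outdoor_temp=-5.0))
```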
Expert drivers possess the ability to execute high sideslip angle maneuvers, commonly known as drifting, during racing to navigate sharp corners and execute rapid turns. However, existing model-based controllers encounter challenges in handling the highly nonlinear dynamics associated with drifting along general paths. While reinforcement learning-based methods alleviate the reliance on explicit vehicle models, training a policy directly for autonomous drifting remains difficult due to multiple objectives. In this paper, we propose a control framework for autonomous drifting in the general case, based on curriculum reinforcement learning. The framework empowers the vehicle to follow paths with varying curvature at high speeds, while executing drifting maneuvers during sharp corners. Specifically, we consider the vehicle’s dynamics to decompose the overall task and employ curriculum learning to break down the training process into three stages of increasing complexity. Additionally, to enhance the generalization ability of the learned policies, we introduce randomization into sensor observation noise, actuator action noise, and physical parameters. The proposed framework is validated using the CARLA simulator, encompassing various vehicle types and parameters. Experimental results demonstrate the effectiveness and efficiency of our framework in achieving autonomous drifting along general paths. The code is available at https://github.com/BIT-KaiYu/drifting.
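The randomization and curriculum staging can be illustrated with a short sketch; the parameter names, ranges, and stage thresholds below are assumptions for illustration, not the settings used with CARLA in the paper.

```python
import random

def randomize_episode(base_friction=1.0, base_mass=1500.0):
    """Sample per-episode physical parameters and noise scales (illustrative
    ranges) so the learned drifting policy does not overfit one setting."""
    return {
        "tyre_friction": base_friction * random.uniform(0.8, 1.2),
        "vehicle_mass": base_mass * random.uniform(0.9, 1.1),
        "obs_noise_std": random.uniform(0.0, 0.05),   # sensor observation noise
        "act_noise_std": random.uniform(0.0, 0.05),   # actuator action noise
    }

def curriculum_stage(episode):
    """Three stages of increasing difficulty (thresholds are assumptions):
    1) follow gentle paths, 2) higher speed, 3) drift through sharp corners."""
    if episode < 1000:
        return {"max_curvature": 0.02, "target_speed": 15.0}
    if episode < 3000:
        return {"max_curvature": 0.05, "target_speed": 25.0}
    return {"max_curvature": 0.12, "target_speed": 30.0}

print(randomize_episode(), curriculum_stage(episode=2500))
```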
In this note, we provide an upper bound for the difference between the value function of a distributionally robust Markov decision problem and the value function of a non-robust Markov decision problem. The ambiguity set of probability kernels of the distributionally robust Markov decision process is described by a Wasserstein ball around some reference kernel, whereas the non-robust Markov decision process behaves according to a fixed probability kernel contained in the ambiguity set. Our derived upper bound for the difference between the value functions is dimension-free and depends linearly on the radius of the Wasserstein ball.
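Schematically, and in notation chosen here rather than taken from the note, the result has the form of a bound that is linear in the Wasserstein radius and free of any dimension dependence:

$$\sup_{s \in S}\,\bigl|\,V^{\mathrm{rob}}_{\varepsilon}(s) - V(s)\,\bigr| \;\le\; C\,\varepsilon,$$

where $V^{\mathrm{rob}}_{\varepsilon}$ denotes the robust value function for the Wasserstein ambiguity ball of radius $\varepsilon$, $V$ the value function under a fixed kernel contained in that ball, and $C$ a constant independent of the dimension of the state space.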
The flexible flat cable (FFC) assembly task is a prime challenge in electronics manufacturing. The FFC's proneness to deformation under external force, tiny assembly tolerance, and fragility impede the application of robotic assembly in this field. To achieve reliable and stable robotic assembly of FFCs, an efficient assembly skill acquisition strategy is presented that combines a parallel robot skill learning algorithm with adaptive impedance control. The parallel robot skill learning algorithm is proposed to enhance the efficiency of FFC assembly skill acquisition, reducing the risk of damaging the FFC and tackling the uncertain influence of deformation during the assembly process. Moreover, FFC assembly is a complex, contact-rich manipulation task. An adaptive impedance controller is designed to implement force tracking during the assembly process without precise environment information, and its stability is analyzed based on a Lyapunov function. FFC assembly experiments are conducted to illustrate the efficiency of the proposed method. The experimental results demonstrate that the proposed method is robust and efficient.
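The force-tracking idea behind impedance control can be sketched briefly; the discrete-time update below, with illustrative gains m, d, k, is a generic position-based impedance law, not the paper's adaptive controller.

```python
def impedance_step(x_ref, x, dx, f_meas, f_des, m=1.0, d=50.0, k=200.0, dt=0.001):
    """One step of a toy position-based impedance controller.

    The force-tracking error (f_des - f_meas) drives the impedance equation
    m*e_dd + d*e_d + k*e = f_des - f_meas for the offset e = x - x_ref, and
    the offset shifts the commanded position. Gains m, d, k are illustrative.
    """
    e = x - x_ref
    e_dd = (f_des - f_meas - d * dx - k * e) / m   # offset acceleration
    dx_new = dx + e_dd * dt
    x_new = x + dx_new * dt
    return x_new, dx_new

# example: the measured contact force is below the desired value, so the
# controller nudges the commanded position further toward the environment
x_cmd, v_cmd = impedance_step(x_ref=0.0, x=0.0, dx=0.0, f_meas=2.0, f_des=5.0)
print(x_cmd, v_cmd)
```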
One in eight children experience early life stress (ELS), which increases the risk for psychopathology. ELS, particularly neglect, has been associated with reduced responsivity to reward. However, little work has investigated the computational specifics of this disrupted reward response, particularly the neural response to Reward Prediction Errors (RPE) – a critical signal for successful instrumental learning – and the extent to which these responses are augmented for novel stimuli. The goal of the current study was to investigate the associations of abuse and neglect with the neural representation of RPE for novel and non-novel stimuli.
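For reference, a reward prediction error in its standard temporal-difference form (generic notation, not specific to this study) is

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),$$

the discrepancy between the obtained reward plus the discounted value of the next state and the current value estimate; the analyses below concern BOLD responses modulated by this quantity.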
Methods
One hundred and seventy-eight participants (aged 10–18, M = 14.9, s.d. = 2.38) engaged in the Novelty task while undergoing functional magnetic resonance imaging. In this task, participants learn to choose novel or non-novel stimuli to win monetary rewards varying from $0 to $0.30 per trial. Levels of abuse and neglect were measured using the Childhood Trauma Questionnaire.
Results
Adolescents exposed to high levels of neglect showed reduced RPE-modulated blood oxygenation level dependent response within medial and lateral frontal cortices particularly when exploring novel stimuli (p < 0.05, corrected for multiple comparisons) relative to adolescents exposed to lower levels of neglect.
Conclusions
These data expand on previous work by indicating that neglect, but not abuse, is associated with impairments in neural RPE representation within medial and lateral frontal cortices. However, there was no association between neglect and behavioral impairments on the Novelty task, suggesting that these neural differences do not necessarily translate into behavioral differences within the context of the Novelty task.
This study proposes a novel hybrid learning approach for developing a visual path-following algorithm for industrial robots. The process involves three steps: data collection from a simulation environment, network training, and testing on a real robot. The actor network is trained using supervised learning for 500 epochs. A semi-trained network is then obtained at the 250th epoch. This network is further trained for another 250 epochs using reinforcement learning methods within the simulation environment. Networks trained with supervised learning (500 epochs) and the proposed hybrid learning method (250 epochs each of supervised and reinforcement learning) are compared. The hybrid learning approach achieves a significantly lower average error (30.9 mm) compared with supervised learning (39.3 mm) on real-world images. Additionally, the hybrid approach exhibits faster processing times (31.7 s) compared with supervised learning (35.0 s). The proposed method is implemented on a KUKA Agilus KR6 R900 six-axis robot, demonstrating its effectiveness. Furthermore, the hybrid approach reduces the total power consumption of the robot's motors compared with the supervised learning method. These results suggest that the hybrid learning approach offers a more effective and efficient solution for visual path following in industrial robots compared with traditional supervised learning.
Selective serotonin reuptake inhibitors (SSRIs) are first-line pharmacological treatments for depression and anxiety. However, little is known about how pharmacological action is related to cognitive and affective processes. Here, we examine whether specific reinforcement learning processes mediate the treatment effects of SSRIs.
Methods
The PANDA trial was a multicentre, double-blind, randomized clinical trial in UK primary care comparing the SSRI sertraline with placebo for depression and anxiety. Participants (N = 655) performed an affective Go/NoGo task three times during the trial and computational models were used to infer reinforcement learning processes.
Results
There was poor task performance: only 54% of the task runs were informative, with more informative task runs in the placebo group than in the active group. There was no evidence for the preregistered hypothesis that Pavlovian inhibition was affected by sertraline. Exploratory analyses revealed that in the sertraline group, early increases in Pavlovian inhibition were associated with improvements in depression after 12 weeks. Furthermore, sertraline increased how fast participants learned from losses, and faster learning from losses was associated with more severe generalized anxiety symptoms.
Conclusions
The study findings indicate a relationship between aversive reinforcement learning mechanisms and aspects of depression, anxiety, and SSRI treatment, but these relationships did not align with the initial hypotheses. Poor task performance limits the interpretability and likely generalizability of the findings, and highlights the critical importance of developing acceptable and reliable tasks for use in clinical studies.
Funding
This article presents research supported by NIHR Program Grants for Applied Research (RP-PG-0610-10048), the NIHR BRC, and UCL, with additional support from IMPRS COMP2PSYCH (JM, QH) and a Wellcome Trust grant (QH).
Developing an artificial design agent that mimics human design behaviors through the integration of heuristics is pivotal for various purposes, including advancing design automation, fostering human-AI collaboration, and enhancing design education. However, this endeavor necessitates abundant behavioral data from human designers, posing a challenge due to data scarcity for many design problems. One potential solution lies in transferring learned design knowledge from one problem domain to another. This article aims to gather empirical evidence and computationally evaluate the transferability of design knowledge represented at a high level of abstraction across different design problems. Initially, a design agent grounded in reinforcement learning (RL) is developed to emulate human design behaviors. A data-driven reward mechanism, informed by the Markov chain model, is introduced to reinforce prominent sequential design patterns. Subsequently, the design agent transfers the acquired knowledge from a source task to a target task using a problem-agnostic high-level representation. Through a case study involving two solar system designs, one dataset trains the design agent to mimic human behaviors, while another evaluates the transferability of these learned behaviors to a distinct problem. Results demonstrate that the RL-based agent outperforms a baseline model utilizing the first-order Markov chain model in both the source task without knowledge transfer and the target task with knowledge transfer. However, the model’s performance is comparatively lower in predicting the decisions of low-performing designers, suggesting caution in its application, as it may yield unsatisfactory results when mimicking such behaviors.
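The data-driven, Markov-chain-informed reward can be pictured with a small sketch: first-order transition probabilities over design actions are estimated from human sequences, and the agent is rewarded for transitions that humans make often. The function and variable names below are illustrative and do not reproduce the article's implementation.

```python
from collections import Counter, defaultdict
import math

def fit_markov_reward(human_sequences, smoothing=1e-3):
    """Estimate first-order transition counts over design actions and return
    a reward function r(prev, nxt) = log P(nxt | prev) with light smoothing."""
    counts = defaultdict(Counter)
    for seq in human_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    def reward(prev, nxt):
        total = sum(counts[prev].values())
        if total == 0:
            return math.log(smoothing)   # unseen context: small constant reward
        p = (counts[prev][nxt] + smoothing) / (total + smoothing)
        return math.log(p)

    return reward

# toy human design sequences over abstract actions (illustrative labels)
sequences = [["add_panel", "rotate", "evaluate"], ["add_panel", "evaluate", "rotate"]]
r = fit_markov_reward(sequences)
print(r("add_panel", "rotate"), r("add_panel", "evaluate"))
```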
The increase in Electrical and Electronic Equipment (EEE) usage in various sectors has given rise to repair and maintenance units. Disassembly of parts requires proper planning, which is addressed by the Disassembly Sequence Planning (DSP) process; since manual disassembly is subject to various time and labor restrictions, such planning is essential. Effective disassembly planning methods can encourage the reuse and recycling sector, resulting in reduced raw-material mining, and an efficient DSP can lower time and cost. To address the challenges in DSP, this research introduces an innovative framework based on Q-Learning (QL) within the domain of Reinforcement Learning (RL). Furthermore, an Enhanced Simulated Annealing (ESA) algorithm is introduced to improve the exploration-exploitation balance in the proposed RL framework. The proposed framework is extensively evaluated against state-of-the-art frameworks and benchmark algorithms using a diverse set of eight products as test cases. The findings reveal that the proposed framework outperforms benchmark algorithms and state-of-the-art frameworks in terms of time consumption, memory consumption, and solution optimality. Specifically, for complex large products, the proposed technique achieves a remarkable minimum reduction of 60% in time consumption and 30% in memory usage compared to other state-of-the-art techniques. Additionally, qualitative analysis demonstrates that the proposed approach generates sequences with high fitness values, indicating more stable and less time-consuming disassemblies. The utilization of this framework allows for the realization of various real-world disassembly applications, thereby making a significant contribution to sustainable practices in EEE industries.
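One common way to blend a simulated-annealing idea with Q-learning's exploration-exploitation balance is a temperature-controlled (Boltzmann) action choice with a cooling schedule; the sketch below uses assumed names and values and is not the article's ESA algorithm.

```python
import math
import random

def boltzmann_action(q_values, temperature):
    """Softmax/Boltzmann exploration: high temperature -> near-uniform exploration,
    low temperature -> greedy exploitation of the learned Q-values."""
    weights = [math.exp(q / max(temperature, 1e-6)) for q in q_values]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for action, w in enumerate(weights):
        acc += w
        if r <= acc:
            return action
    return len(q_values) - 1

def cooling(t0, episode, rate=0.99):
    """Simulated-annealing-style geometric cooling schedule (illustrative)."""
    return t0 * (rate ** episode)

q_row = [0.2, 0.5, 0.1]            # Q-values of candidate next parts to remove
for ep in (0, 50, 200):
    T = cooling(t0=5.0, episode=ep)
    print(ep, T, boltzmann_action(q_row, T))
```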
The use of machine learning in robotics is a vast and growing area of research. In this chapter we consider a few key directions: the use of deep neural networks, the application of reinforcement learning and especially deep reinforcement learning, and the rapidly emerging potential of large language models.