Within the broad context of design research, joint attention within co-creation represents a critical component, linking cognitive actors through dynamic interactions. This study introduces a novel approach employing deep learning algorithms to objectively quantify joint attention, offering a significant advancement over traditional subjective methods. We developed an optimized deep learning algorithm, YOLO-TP, to identify participants’ engagement in design workshops accurately. Our research methodology involved video recording of design workshops and subsequent analysis using the YOLO-TP algorithm to track and measure joint attention instances. Key findings demonstrate that the algorithm effectively quantifies joint attention with high reliability and correlates well with known measures of intersubjectivity and co-creation effectiveness. This approach not only provides a more objective measure of joint attention but also allows for the real-time analysis of collaborative interactions. The implications of this study are profound, suggesting that the integration of automated human activity recognition in co-creation can significantly enhance the understanding and facilitation of collaborative design processes, potentially leading to more effective design outcomes.
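A minimal sketch of the kind of frame-by-frame processing such a pipeline involves, assuming an off-the-shelf ultralytics YOLOv8 detector as a stand-in for the custom YOLO-TP model, a hypothetical video file, and a deliberately naive overlap criterion for shared attention (the paper's actual joint-attention definition is not reproduced here):

# Sketch: count frames in which detected participants' boxes overlap, using an
# off-the-shelf detector as a stand-in for the paper's YOLO-TP model.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                         # stand-in weights, not YOLO-TP
cap = cv2.VideoCapture("workshop_session.mp4")     # hypothetical workshop recording

def boxes_overlap(a, b):
    """Simple overlap test between two xyxy boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return max(ax1, bx1) < min(ax2, bx2) and max(ay1, by1) < min(ay2, by2)

joint_attention_frames = 0
total_frames = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    total_frames += 1
    det = model(frame, verbose=False)[0]
    boxes = det.boxes.xyxy.cpu().numpy().tolist()
    # Naive criterion: at least two detections with overlapping boxes counts as shared focus.
    if any(boxes_overlap(a, b) for i, a in enumerate(boxes) for b in boxes[i + 1:]):
        joint_attention_frames += 1
cap.release()
print(f"joint attention ratio: {joint_attention_frames / max(total_frames, 1):.2%}")

In the study itself the detector and the joint-attention criterion are tailored to the workshop setting; the sketch only shows where such logic sits in a video-processing loop.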
The rapid development of AI has resulted in an unprecedented paradigm shift across various industries, with aerospace among the chief beneficiaries of this transformation. This review paper explores and provides a comprehensive overview of aerospace research imperatives from the AI perspective, detailing the technical aspects of the full lifecycle, from vehicle design and operational optimisation to advanced air traffic management systems. By examining real-world engineering implementations, the review demonstrates how AI-driven solutions are directly addressing longstanding challenges in aerospace, such as optimising flight performance, reducing operational costs and improving system reliability. A significant emphasis is placed on the crucial roles of AI in health monitoring and predictive maintenance, areas that are pivotal for ensuring the safety and longevity of aerospace endeavours, and which are now increasingly adopted in industry for remaining useful life (RUL) forecasting and condition-based maintenance strategies. The paper also discusses AI embedded in quality control and inspection processes, where it boosts accuracy, efficiency and fault detection capability. The review provides insight into the state-of-the-art applications of AI in planetary exploration, particularly within the realms of autonomous scientific instrumentation and robotic prospecting, as well as surface operations on extraterrestrial bodies. An important case study is India’s Chandrayaan-3 mission, demonstrating the application of AI in both autonomous navigation and scientific exploration within the challenging environments of space. By furnishing an overview of the field, the paper frames the expanding domains of AI as being at the forefront of advances in aerospace engineering and opens avenues for further discussion regarding the limitless possibilities at the juncture of intelligent systems and aerospace innovation.
Earth’s forests play an important role in the fight against climate change and are in turn negatively affected by it. Effective monitoring of different tree species is essential to understanding and improving the health and biodiversity of forests. In this work, we address the challenge of tree species identification by performing tree crown semantic segmentation using an aerial image dataset spanning over a year. We compare models trained on single images versus those trained on time series to assess the impact of tree phenology on segmentation performance. We also introduce a simple convolutional block for extracting spatio-temporal features from image time series, enabling the use of popular pretrained backbones and methods. We leverage the hierarchical structure of tree species taxonomy by incorporating a custom loss function that refines predictions at three levels: species, genus, and higher-level taxa. Our best model achieves a mean Intersection over Union (mIoU) of 55.97%, outperforming single-image approaches particularly for deciduous trees where phenological changes are most noticeable. Our findings highlight the benefit of exploiting the time series modality via our Processor module. Furthermore, leveraging taxonomic information through our hierarchical loss function often, and in key cases significantly, improves semantic segmentation performance.
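A minimal PyTorch sketch of a three-level hierarchical loss in the spirit described above, assuming hypothetical species-to-genus and genus-to-higher-taxon index mappings and illustrative level weights; the paper's exact taxonomy, weighting, and loss form are not reproduced here:

# Sketch of a three-level hierarchical cross-entropy loss (species / genus / higher taxon).
import torch
import torch.nn.functional as F

def hierarchical_loss(logits, target_species, species_to_genus, genus_to_higher,
                      weights=(1.0, 0.5, 0.25)):
    """logits: (N, n_species, H, W); target_species: (N, H, W) species indices.
    species_to_genus / genus_to_higher: 1-D LongTensors mapping each child class to its parent."""
    probs = logits.softmax(dim=1)
    n_genus = int(species_to_genus.max()) + 1
    n_higher = int(genus_to_higher.max()) + 1
    # Aggregate species probabilities into genus and higher-taxon probabilities.
    genus_probs = torch.zeros(probs.size(0), n_genus, *probs.shape[2:], device=probs.device)
    genus_probs.index_add_(1, species_to_genus, probs)
    higher_probs = torch.zeros(probs.size(0), n_higher, *probs.shape[2:], device=probs.device)
    higher_probs.index_add_(1, genus_to_higher, genus_probs)

    loss_species = F.cross_entropy(logits, target_species)
    loss_genus = F.nll_loss(torch.log(genus_probs + 1e-8), species_to_genus[target_species])
    loss_higher = F.nll_loss(torch.log(higher_probs + 1e-8),
                             genus_to_higher[species_to_genus[target_species]])
    w_s, w_g, w_h = weights
    return w_s * loss_species + w_g * loss_genus + w_h * loss_higher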
In small-plot experiments, weed scientists have traditionally estimated herbicide efficacy through visual assessments or manual counts with wooden frames—methods that are time-consuming, labor-intensive, and error-prone. This study introduces a novel mobile application (app) powered by convolutional neural networks (CNNs) to automate the evaluation of weed coverage in turfgrass. The mobile app automatically segments input images into 10 by 10 grid cells. A comparative analysis of EfficientNet, MobileNetV3, MobileOne, ResNet, ResNeXt, ShuffleNetV1, and ShuffleNetV2 was conducted to identify weed-infested grid cells and calculate weed coverage in bahiagrass (Paspalum notatum Flueggé), dormant bermudagrass [Cynodon dactylon (L.) Pers.], and perennial ryegrass (Lolium perenne L.). Results showed that EfficientNet and MobileOne outperformed other models in detecting weeds growing in bahiagrass, achieving an F1 score of 0.988. For dormant bermudagrass, ResNet performed best, with an F1 score of 0.996. Additionally, app-based coverage estimates (11%) were highly consistent with manual assessments (11%), showing no significant difference (P = 0.3560). Similarly, ResNeXt achieved the highest F1 score of 0.996 for detecting weeds growing in perennial ryegrass, with app-based and manual coverage estimates also closely aligned at 10% (P = 0.1340). High F1 scores across all turfgrass types demonstrate the models’ ability to accurately replicate manual assessments, which is essential for herbicide efficacy trials requiring precise weed coverage data. Moreover, the time for weed assessment was compared, revealing that manual counting with 10 by 10 wooden frames took an average of 39.25, 37.25, and 42.25 s per instance for bahiagrass, dormant bermudagrass, and perennial ryegrass, respectively, whereas the app-based approach reduced the assessment times to 8.23, 7.75, and 14.96 s, respectively. These results highlight the potential of deep learning–based mobile tools for fast, accurate, scalable weed coverage assessments, enabling efficient herbicide trials and offering labor and cost savings for researchers and turfgrass managers.
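A minimal sketch of the grid-based coverage computation the app's description implies: the image is split into 10 by 10 cells, each cell is classified, and coverage is the fraction of weed-positive cells. The classifier checkpoint, image path, and class index below are placeholders:

# Sketch: split an image into a 10x10 grid, classify each cell, report weed coverage.
import torch
from PIL import Image
from torchvision import transforms

GRID = 10
preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
model = torch.load("weed_cell_classifier.pt", map_location="cpu")  # hypothetical trained CNN
model.eval()

img = Image.open("plot_photo.jpg").convert("RGB")                  # hypothetical plot image
w, h = img.size
weedy_cells = 0
with torch.no_grad():
    for row in range(GRID):
        for col in range(GRID):
            cell = img.crop((col * w // GRID, row * h // GRID,
                             (col + 1) * w // GRID, (row + 1) * h // GRID))
            logits = model(preprocess(cell).unsqueeze(0))
            weedy_cells += int(logits.argmax(dim=1).item() == 1)   # class 1 = weed-infested
coverage = 100.0 * weedy_cells / (GRID * GRID)
print(f"estimated weed coverage: {coverage:.1f}%")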
Recent advancements in data science and artificial intelligence have significantly transformed plant sciences, particularly through the integration of image recognition and deep learning technologies. These innovations have profoundly impacted various aspects of plant research, including species identification, disease detection, cellular signaling analysis, and growth monitoring. This review summarizes the latest computational tools and methodologies used in these areas. We emphasize the importance of data acquisition and preprocessing, discussing techniques such as high-resolution imaging and unmanned aerial vehicle (UAV) photography, along with image enhancement methods like cropping and scaling. Additionally, we review feature extraction techniques like colour histograms and texture analysis, which are essential for plant identification and health assessment. Finally, we discuss emerging trends, challenges, and future directions, offering insights into the applications of these technologies in advancing plant science research and practical implementations.
We present a deep learning architecture that reconstructs a source of data at given spatio-temporal coordinates using other sources. The model can be applied to multiple sources in a broad sense: the number of sources may vary between samples, and the sources can differ in dimensionality and size and cover distinct geographical areas at irregular time intervals. The network takes as input a set of sources that each include values (e.g., the pixels for two-dimensional sources), spatio-temporal coordinates, and source characteristics. The model is based on the Vision Transformer, but separately embeds the values and coordinates and uses the embedded coordinates as relative positional embedding in the computation of the attention. To limit the cost of computing the attention between many sources, we employ a multi-source factorized attention mechanism, introducing an anchor-points-based cross-source attention block. We name the architecture MoTiF (multi-source transformer via factorized attention). We present a self-supervised setting to train the network, in which one randomly chosen source is masked and the model is tasked to reconstruct it from the other sources. We test this self-supervised task on tropical cyclone (TC) remote-sensing images, ERA5 states, and best-track data. We show that the model can forecast TC ERA5 fields and wind intensity from multiple sources, and that using more sources improves forecasting accuracy.
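A minimal PyTorch sketch of anchor-based cross-source attention in the spirit of the factorization described above: learned anchor tokens gather information from all source tokens and scatter it back, avoiding full pairwise attention. This is an illustrative block under those assumptions, not MoTiF's exact architecture:

# Sketch: tokens from all sources exchange information through a small set of learned
# anchor tokens (cost roughly O(N*A) instead of O(N^2) for N tokens and A anchors).
import torch
import torch.nn as nn

class AnchorCrossSourceAttention(nn.Module):
    def __init__(self, dim=256, num_anchors=16, num_heads=8):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(1, num_anchors, dim) * 0.02)
        self.gather = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scatter = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):
        """tokens: (B, N, dim) embedded values and coordinates pooled from all sources."""
        anchors = self.anchors.expand(tokens.size(0), -1, -1)
        # Anchors summarise every source...
        summary, _ = self.gather(anchors, tokens, tokens)
        # ...and each token reads the cross-source summary back from the anchors.
        updated, _ = self.scatter(tokens, summary, summary)
        return tokens + updated

x = torch.randn(2, 500, 256)                       # two samples, 500 tokens from several sources
print(AnchorCrossSourceAttention()(x).shape)       # torch.Size([2, 500, 256])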
Monitoring wildlife populations in vast, remote landscapes poses significant challenges for conservation and management, particularly when studying elusive species that range across inaccessible terrain. Traditional survey methods often prove impractical or insufficient in such environments, necessitating innovative technological solutions. This study evaluates the effectiveness of deep learning for automated Bactrian camel detection in drone imagery across the complex desert terrain of the Gobi Desert of Mongolia. Using YOLOv8 and a dataset of 1479 high-resolution drone-captured images of Bactrian camels, we developed and validated an automated detection system. Our model demonstrated strong detection performance with high precision and recall values across different environmental conditions. Scale-aware analysis revealed distinct performance patterns between medium- and small-scale detections, informing optimal drone flight parameters. The system maintained consistent processing efficiency across various batch sizes while preserving detection quality. These findings advance conservation monitoring capabilities for Bactrian camels and other wildlife in remote ecosystems, providing wildlife managers with an efficient tool to track population dynamics and inform conservation strategies in expansive, difficult-to-access habitats.
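A minimal ultralytics YOLOv8 sketch of the training and inference steps such a detection system rests on; the dataset YAML, image file, and hyperparameters are placeholders rather than the study's actual configuration:

# Sketch: fine-tune a YOLOv8 detector on drone imagery and run it on new images.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")                         # pretrained COCO weights as a starting point
model.train(data="camels.yaml", epochs=100, imgsz=1280, batch=8)   # hypothetical dataset config
metrics = model.val()                              # precision/recall/mAP on the validation split
results = model.predict("gobi_flight_0421.jpg", conf=0.25)         # hypothetical test image
for box in results[0].boxes:
    print(box.xyxy.tolist(), float(box.conf))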
The logistics, costs, and capacity needed to complete extensive archaeological pedestrian surveys to inventory cultural resources present challenges to public land managers. To address these issues, we developed a workflow combining lidar-derived imagery and deep learning (DL) models tailored for cultural resource management (CRM) programs on public lands. It combines Python scripts that fine-tune models to recognize archaeological features in lidar-derived imagery with QGIS-based denoising steps that improve the predictions’ performance and applicability. We present this workflow through an applied case study focused on detecting historic agricultural terraces in the Piedmont National Wildlife Refuge, Georgia, USA. For this project, we fine-tuned pretrained U-Net models to teach them to recognize agricultural terraces in imagery, identified the parameter settings that led to the highest recall for detecting terraces, and used those settings to train models on incremental dataset sizes, which allowed us to identify the minimum training size necessary to obtain satisfactory models. The resulting models are effective, detecting most terraces even when trained on small datasets. This study provides a robust methodology that requires only basic proficiency in Python coding but expands DL applications in federal CRM by advancing methods in lidar and machine learning for archaeological inventorying, monitoring, and preservation.
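A minimal sketch of the U-Net fine-tuning step at the core of such a workflow, here using the segmentation_models_pytorch library and random placeholder tiles; the refuge's actual lidar derivatives, labels, and hyperparameters are not reproduced:

# Sketch: fine-tune a pretrained U-Net to segment terraces in lidar-derived rasters.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                 in_channels=1, classes=1)         # single-band lidar derivative in, terrace mask out
loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch: 8 tiles of 256x256 lidar-derived imagery and binary terrace masks.
tiles = torch.randn(8, 1, 256, 256)
masks = torch.randint(0, 2, (8, 1, 256, 256)).float()

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(tiles), masks)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")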
This study explored mental workload recognition methods for carrier-based aircraft pilots utilising multi-sensor physiological signal fusion and portable devices. A simulated carrier-based aircraft flight experiment was designed, and subjective mental workload scores and electroencephalogram (EEG) and photoplethysmogram (PPG) signals from six pilot cadets were collected using the NASA Task Load Index (NASA-TLX) and portable devices. The subjective scores of the pilots in three flight phases were used to label the data into three mental workload levels. Features from the physiological signals were extracted, and the interrelations between mental workload and physiological indicators were evaluated. Machine learning and deep learning algorithms were used to classify the pilots’ mental workload, and the performance of single-modal and multimodal fusion methods was investigated. The results showed that the multimodal fusion methods outperformed the single-modal methods, achieving higher accuracy, precision, recall and F1 score. Among all the classifiers, the random forest classifier with feature-level fusion obtained the best results, with an accuracy of 97.69%, precision of 98.08%, recall of 96.98% and F1 score of 97.44%. The findings of this study demonstrate the effectiveness and feasibility of the proposed method, offering insights into mental workload management and the enhancement of flight safety for carrier-based aircraft pilots.
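A minimal scikit-learn sketch of the best-performing configuration reported above, feature-level fusion with a random forest: EEG and PPG feature vectors are simply concatenated per sample before classification. The feature dimensions and data below are placeholders:

# Sketch: feature-level fusion of EEG and PPG features with a random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 300
eeg_features = rng.normal(size=(n_samples, 20))    # placeholder band-power features
ppg_features = rng.normal(size=(n_samples, 8))     # placeholder heart-rate-variability features
labels = rng.integers(0, 3, size=n_samples)        # three workload levels from NASA-TLX bins

fused = np.hstack([eeg_features, ppg_features])    # feature-level fusion = concatenation
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, fused, labels, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.3f}")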
Readability assessment has been a key research area for the past 80 years and still attracts researchers today. The most common measures currently (2011) in use are Flesch-Kincaid and Dale-Chall. Traditional models were parsimonious, incorporating as few linguistic features as possible, and used linear regression to combine two or three surface features. Later models drew on psychological theory, measuring such things as coherence, density, and inference load. A variety of machine learning models were used, along with one neural network. Key surface linguistic features were average syllables per word and sentence length. The machine learning methods performed well and can improve readability estimation: the process is data-driven, requiring less manual labour and avoiding human bias. Current research seems to focus on deep learning methods, which show great promise.
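For reference, the Flesch-Kincaid grade level combines exactly the two surface features highlighted above, words per sentence and syllables per word; a small sketch with a deliberately crude syllable counter:

# Sketch: Flesch-Kincaid grade level from average sentence length and syllables per word.
import re

def count_syllables(word):
    """Crude heuristic: count groups of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(flesch_kincaid_grade("The cat sat on the mat. It was warm."))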
The art of image restoration and completion has entered a new phase thanks to digital technology. Indeed, virtual restoration is sometimes the only feasible option available to us, and it has, under the name 'inpainting', grown from methods developed in the mathematics and computer vision communities into tools used routinely by conservators and historians working in the worlds of fine art and cinema. The aim of this book is to provide, for a broad audience, a thorough description of image inpainting techniques. The book has a two-layer structure. In one layer, there is a general and more conceptual description of inpainting; in the other, there are boxed descriptions of the essentials of the mathematical and computational details. The idea is that readers can easily skip those boxes without disrupting the narrative. Examples of how the tools can be used are drawn from the Fitzwilliam Museum, Cambridge collections.
Chapter 4 describes the rise of deep learning inpainting methods over the past ten years. These methods learn an end-to-end mapping from a corrupted input to its estimated restoration. In contrast with the traditional methods of the previous chapters, which rely on model-based or hand-crafted features, learning-based algorithms infer the missing content by training on a large-scale dataset. They can capture local and non-local dependencies within the image and across the full dataset, and exploit high-level information inherent in the image itself. In this chapter we present the seminal deep learning inpainting methods up to 2020, together with dedicated datasets designed for the inpainting problem.
Advertising click-through rate (CTR) prediction is a fundamental task in recommender systems, aimed at estimating the likelihood of users interacting with advertisements based on their historical behavior. This prediction process has evolved through two main stages: from traditional shallow interaction models to more advanced deep learning approaches. Shallow models typically operate at the level of individual features, failing to fully leverage the rich, multilevel information available across different feature sets, leading to less accurate predictions. In contrast, deep learning models exhibit superior feature representation and learning capabilities, enabling a more realistic simulation of user interactions and improving the accuracy of CTR prediction. This paper provides a comprehensive overview of CTR prediction algorithms in the context of recommender systems. The algorithms are categorized into two groups: shallow interactive models and deep learning-based prediction models, including deep neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks. This paper also discusses the advantages and disadvantages of the aforementioned algorithms, as well as the benchmark datasets and model evaluation methods used for CTR prediction. Finally, it identifies potential future research directions in this rapidly advancing field.
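A minimal PyTorch sketch of the basic pattern behind the deep models contrasted above with shallow ones: sparse categorical features are embedded, concatenated, and passed through an MLP that outputs a click probability. Field sizes and inputs are placeholders, and production systems add explicit feature-interaction layers on top of this skeleton:

# Sketch: embeddings + MLP, the core pattern behind deep CTR prediction models.
import torch
import torch.nn as nn

class DeepCTR(nn.Module):
    def __init__(self, field_sizes, embed_dim=16, hidden=(128, 64)):
        super().__init__()
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_sizes)
        layers, in_dim = [], embed_dim * len(field_sizes)
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        """x: (batch, n_fields) integer ids for user, item, and context features."""
        embedded = torch.cat([emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1)
        return torch.sigmoid(self.mlp(embedded)).squeeze(1)    # predicted click probability

model = DeepCTR(field_sizes=[10000, 5000, 7, 24])              # e.g. user id, ad id, weekday, hour
batch = torch.stack([torch.randint(0, n, (32,)) for n in [10000, 5000, 7, 24]], dim=1)
print(model(batch).shape)                                      # torch.Size([32])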
Underwater target detection is affected by image blurring caused by suspended particles in water bodies and light-scattering effects. To tackle this issue, this paper proposes a reparameterized feature enhancement and fusion network for underwater blur object recognition (REFNet). First, this paper proposes the reparameterized feature enhancement and gathering (REG) module, which is designed to enhance the performance of the backbone network. This module integrates the concepts of reparameterization and global response normalization to enhance the network’s feature extraction capabilities, addressing the challenge of feature extraction posed by image blurriness. Next, this paper proposes the cross-channel information fusion (CIF) module to enhance the neck network. This module combines detailed information from shallow features with semantic information from deeper layers, mitigating the loss of image detail caused by blurring. Additionally, replacing the CIoU loss function with the Shape-IoU loss function improves target localization accuracy, addressing the difficulty of accurately locating bounding boxes in blurry images. Experimental results indicate that REFNet achieves superior performance compared to state-of-the-art methods, as evidenced by higher mAP scores on the underwater robot professional competition (URPC) and detecting underwater objects (DUO) datasets. REFNet surpasses YOLOv8 by approximately 1.5% in $mAP_{50:95}$ on the URPC dataset and by about 1.3% on the DUO dataset. This enhancement is achieved without significantly increasing the model’s parameters or computational load. This approach enhances the precision of target detection in challenging underwater environments.
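A minimal PyTorch sketch of global response normalization, one of the two ingredients the REG module is said to build on; this follows the generic published GRN formulation rather than REFNet's exact module:

# Sketch: global response normalization (GRN) applied to a convolutional feature map.
import torch
import torch.nn as nn

class GRN(nn.Module):
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        """x: (N, C, H, W) feature map."""
        gx = torch.norm(x, p=2, dim=(2, 3), keepdim=True)      # global L2 response per channel
        nx = gx / (gx.mean(dim=1, keepdim=True) + self.eps)    # normalize across channels
        return self.gamma * (x * nx) + self.beta + x           # scale, shift, residual

print(GRN(64)(torch.randn(2, 64, 32, 32)).shape)               # torch.Size([2, 64, 32, 32])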
In this work, the problem of reliably checking collisions between robot manipulators and the surrounding environment in short time, for tasks such as replanning and object grasping in clutter, is addressed. Geometric approaches are usually applied in this context; however, they can prove unsuitable in highly time-constrained applications. The purpose of this paper is to present a learning-based method able to outperform geometric approaches in clutter. The proposed approach uses a neural network (NN) to detect collisions online by performing a classification task on the input, represented by the depth image or point cloud containing the robot gripper projected into the application scene. Specifically, several state-of-the-art NN architectures are considered, along with some customization to tackle the problem at hand. These approaches are compared to identify the model that achieves the highest accuracy while keeping the computational burden low. The analysis shows the feasibility of a deep learning-based robot collision checker: it achieves low collision detection times, of the order of milliseconds on the selected hardware, with acceptable accuracy. Furthermore, the computational burden is compared with that of state-of-the-art geometric techniques. The entire work is based on an industrial case study involving a KUKA Agilus industrial robot manipulator at the Technology & Innovation Center of KUKA Deutschland GmbH, Germany. Further validation is performed with the Amazon Robotic Manipulation Benchmark (ARMBench) dataset to corroborate the reported findings.
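A minimal PyTorch sketch of the kind of classifier described: a small CNN that maps a depth image, with the gripper projected into the scene, to a collision / no-collision decision. The architecture and input size are illustrative, not the tuned networks compared in the paper:

# Sketch: a small CNN that classifies a depth image as "collision" vs "collision-free".
import torch
import torch.nn as nn

class CollisionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)      # logits for {collision-free, collision}

    def forward(self, depth):
        """depth: (N, 1, H, W) depth image with the gripper rendered into the scene."""
        return self.classifier(self.features(depth).flatten(1))

net = CollisionNet().eval()
with torch.no_grad():
    logits = net(torch.randn(1, 1, 128, 128))   # placeholder depth image
    print("collision" if logits.argmax(1).item() == 1 else "free")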
Peat is formed by the accumulation of organic material in water-saturated soils. Drainage of peatlands and peat extraction contribute to carbon emissions and biodiversity loss. Most peat extracted for commercial purposes is used for energy production or as a growing substrate. Many countries aim to reduce peat usage but this requires tools to detect its presence in substrates. We propose a decision support system based on deep learning to detect peat-specific testate amoeba in microscopy images. We identified six taxa that are peat-specific and frequent in European peatlands. The shells of two taxa (Archerella sp. and Amphitrema sp.) were well preserved in commercial substrate and can serve as indicators of peat presence. Images from surface and commercial samples were combined into a training set. A separate test set exclusively from commercial substrates was also defined. Both datasets were annotated and YOLOv8 models were trained to detect the shells. An ensemble of eight models was included in the decision support system. Test set performance (average precision) reached values above 0.8 for Archerella sp. and above 0.7 for Amphitrema sp. The system processes thousands of images within minutes and returns a concise list of crops of the most relevant shells. This allows a human operator to quickly make a final decision regarding peat presence. Our method enables the monitoring of peat presence in commercial substrates. It could be extended by including more species for applications in restoration ecology and paleoecology.
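A minimal sketch of how an ensemble of detectors could be pooled at inference time to produce crops for human review; the checkpoint names, image path, and fusion rule (keep the top-scoring boxes) are placeholders rather than the decision support system's actual logic:

# Sketch: pool detections from several YOLOv8 models and save the highest-confidence crops.
from ultralytics import YOLO
from PIL import Image

weights = [f"amoeba_model_{i}.pt" for i in range(8)]     # hypothetical ensemble checkpoints
models = [YOLO(w) for w in weights]

image_path = "substrate_slide_001.jpg"                   # hypothetical microscopy image
detections = []
for model in models:
    for box in model.predict(image_path, conf=0.25, verbose=False)[0].boxes:
        x1, y1, x2, y2 = (float(v) for v in box.xyxy[0])
        detections.append((float(box.conf), box.cls.item(), (x1, y1, x2, y2)))

# Keep the top-scoring detections as crops for a human operator to review.
img = Image.open(image_path)
for score, cls, bbox in sorted(detections, reverse=True)[:10]:
    img.crop(tuple(int(v) for v in bbox)).save(f"crop_{score:.2f}_{int(cls)}.png")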
Understanding firn densification is essential for interpreting ice core records and predicting ice sheet mass balance, elevation changes and future sea-level rise. Current models of firn densification on the Antarctic ice sheet (AIS), such as the Herron and Langway (1980) model, are either simple semi-empirical models that rely on sparse climatic data and surface density observations or complex physics-based models that rely on poorly understood physics. In this work, we introduce a deep learning technique to study firn densification on the AIS. Our model, FirnLearn, evaluated on 225 cores, shows an average root-mean-square error of $31\,\mathrm{kg\,m}^{-3}$ and explained variance of 91%. We use the model to generate surface density and the depths to the $550\,\mathrm{kg\,m}^{-3}$ and $830\,\mathrm{kg\,m}^{-3}$ density horizons across the AIS to assess spatial variability. Comparisons with the Herron and Langway (1980) model at ten locations with different climate conditions demonstrate that FirnLearn more accurately predicts density profiles in the second stage of densification and complete density profiles without direct surface density observations. This work establishes deep learning as a promising tool for understanding firn processes and advancing towards a universally applicable firn model.
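A minimal PyTorch sketch, not the FirnLearn architecture itself, of the general idea of learning a density profile: a small network maps depth together with site climate variables (here assumed to be mean annual temperature and accumulation) to firn density:

# Sketch: a small network that predicts firn density from depth and site climate inputs.
import torch
import torch.nn as nn

class DensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),    # inputs: depth, mean annual temperature, accumulation
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),               # output: density (kg m^-3)
        )

    def forward(self, x):
        return self.net(x)

model = DensityNet()
# Placeholder query: density at 20 m depth for a site at -30 degC and 0.1 m w.e. per year.
query = torch.tensor([[20.0, -30.0, 0.1]])
print(model(query).shape)                   # torch.Size([1, 1])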
One of the most significant challenges in research related to nutritional epidemiology is the achievement of high accuracy and validity of dietary data to establish an adequate link between dietary exposure and health outcomes. Recently, the emergence of artificial intelligence (AI) in various fields has filled this gap with advanced statistical models and techniques for nutrient and food analysis. We aimed to systematically review available evidence regarding the validity and accuracy of AI-based dietary intake assessment methods (AI-DIA). In accordance with PRISMA guidelines, an exhaustive search of the EMBASE, PubMed, Scopus and Web of Science databases was conducted to identify relevant publications from their inception to 1 December 2024. Thirteen studies that met the inclusion criteria were included in this analysis. Of the studies identified, 61·5 % were conducted in preclinical settings. Likewise, 46·2 % used AI techniques based on deep learning and 15·3 % on machine learning. Correlation coefficients of over 0·7 were reported in six articles concerning the estimation of calories between the AI and traditional assessment methods. Similarly, six studies obtained a correlation above 0·7 for macronutrients. In the case of micronutrients, four studies achieved the correlation mentioned above. A moderate risk of bias was observed in 61·5 % (n 8) of the articles analysed, with confounding bias being the most frequently observed. AI-DIA methods are promising, reliable and valid alternatives for nutrient and food estimations. However, more research comparing different populations is needed, as well as larger sample sizes, to ensure the validity of the experimental designs.
Photovoltaic (PV) energy is growing rapidly and is crucial for the decarbonization of electric systems. However, centralized registries recording the technical characteristics of rooftop PV systems are often missing, making it difficult to monitor this growth accurately. The lack of monitoring could threaten the integration of PV energy into the grid. To avoid this situation, remote sensing of rooftop PV systems using deep learning has emerged as a promising solution. However, existing techniques are not reliable enough to be used by public authorities or transmission system operators (TSOs) to construct up-to-date statistics on the rooftop PV fleet. The lack of reliability comes from deep learning models being sensitive to distribution shifts. This work comprehensively evaluates the effects of distribution shifts on the classification accuracy of deep learning models trained to detect rooftop PV panels in overhead imagery. We construct a benchmark to isolate the sources of distribution shifts and introduce a novel methodology that leverages explainable artificial intelligence (XAI) and a scale-wise decomposition of the input image and of the model’s decision to understand how distribution shifts affect deep learning models. Finally, based on our analysis, we introduce a data augmentation technique designed to improve the robustness of deep learning classifiers under varying acquisition conditions. Our proposed approach outperforms competing methods and can close the gap with more demanding unsupervised domain adaptation methods. We discuss practical recommendations for mapping PV systems using overhead imagery and deep learning models.
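A minimal torchvision sketch of augmentations that mimic acquisition-condition shifts (ground sampling distance, radiometry, blur); it illustrates the general idea rather than the paper's specific augmentation technique:

# Sketch: augmentations that mimic varying acquisition conditions of overhead imagery.
from torchvision import transforms

acquisition_shift_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),                   # varying ground sampling distance
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # radiometric differences
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),              # sensor / atmospheric blur
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Typical use: wrap a training set of aerial tiles with this transform, e.g.
# train_set = torchvision.datasets.ImageFolder("pv_tiles/train", transform=acquisition_shift_augment)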