Recent advances in large foundation models (FMs) have enabled learning general-purpose representations in natural language, vision, and audio. Yet geospatial artificial intelligence (GeoAI) still lacks widely adopted foundation models that generalize across tasks requiring joint reasoning over geospatial objects and human mobility. Such tasks are crucial because mobility, along with satellite imagery, street view, and text, is a core modality for understanding the physical world. We argue that a key bottleneck is the absence of unified, general-purpose, and transferable representations for geospatially embedded objects (GEOs). Such objects include points, polylines, and polygons in geographic space, enriched with semantic context and critical for geospatial reasoning. Much current GeoAI research compares GEOs to tokens in language models, where patterns of human movement and spatiotemporal interactions yield contextual meaning, much as patterns of words do in text. However, modeling GEOs introduces challenges fundamentally different from language, including spatial continuity, variable scale and resolution, temporal dynamics, and data sparsity. Moreover, privacy constraints and global variation in mobility further complicate modeling and generalization. This paper formalizes these challenges, identifies key representational gaps, and outlines research directions for building foundation models that learn behavior-informed, transferable representations of GEOs from large-scale human mobility data, as well as from static contextual information such as points of interest, object shapes, and spatio-temporal semantics.