Abstract:
JavaScript Object Notation (JSON) is a widely used data format for storing and exchanging structured data. Query languages used to retrieve data from JSON documents have become an increasingly important element in modern software systems, yet their specifications often leave room for interpretation. As a result, different languages or even implementations of the same language may behave inconsistently, creating security risks for applications that rely on predictable query results. This thesis presents a novel approach to differential testing of JSON query languages, overcoming the challenges of testing across multiple languages with different syntaxes. We introduce a meta-language capable of expressing queries independently of concrete syntax, together with a translator that maps meta-queries to JSONPath, JMESPath, and JSONata syntax. Using this infrastructure, we implement a black-box fuzzer that generates semantically valid, non-trivial queries and executes them across multiple implementations.
With this framework, we conduct a large-scale evaluation of 12 libraries across the three languages. The experiments reveal numerous inconsistencies: divergent outputs between languages, deviations from the language specifications, and behavioral differences even among widely used implementations. In total, we identify 14 distinct query structures that trigger inconsistent results across the tested languages. These findings highlight the need for more precise specifications and systematic cross-implementation testing.
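
To make the idea of a syntax-independent meta-query concrete, the following is a minimal sketch and not the meta-language or translator developed in the thesis. It assumes a hypothetical meta-query made only of field accesses and array indices (the names `Field`, `Index`, `to_jsonpath`, `to_jmespath`, `to_jsonata`, and `group_by_output` are illustrative), renders it in the three concrete syntaxes, and groups implementations by the output they produced, which is the core comparison step of differential testing. Field names are assumed to be plain identifiers that need no quoting.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical meta-query step: select an object member by name."""
    name: str


@dataclass(frozen=True)
class Index:
    """Hypothetical meta-query step: select an array element by position."""
    position: int


Step = Union[Field, Index]


def to_jsonpath(query: List[Step]) -> str:
    # JSONPath queries start at the root identifier "$".
    out = "$"
    for step in query:
        out += f".{step.name}" if isinstance(step, Field) else f"[{step.position}]"
    return out


def to_jmespath(query: List[Step]) -> str:
    # JMESPath has no root symbol; fields are joined with dots.
    out = ""
    for step in query:
        if isinstance(step, Field):
            out += ("." if out else "") + step.name
        else:
            out += f"[{step.position}]"
    return out


def to_jsonata(query: List[Step]) -> str:
    # JSONata paths look like JMESPath, but a leading index needs the
    # context variable "$", because a bare "[0]" is an array constructor.
    body = to_jmespath(query)
    return "$" + body if query and isinstance(query[0], Index) else body


def group_by_output(results: Dict[str, Any]) -> List[List[str]]:
    """Group implementation names by their (canonicalised) JSON output.

    More than one group means the implementations disagree.
    """
    groups: Dict[str, List[str]] = {}
    for name, value in results.items():
        key = json.dumps(value, sort_keys=True)
        groups.setdefault(key, []).append(name)
    return list(groups.values())


query = [Field("store"), Field("books"), Index(0), Field("title")]
print(to_jsonpath(query))  # $.store.books[0].title
print(to_jmespath(query))  # store.books[0].title
print(to_jsonata(query))   # store.books[0].title

# Outputs here are made-up placeholders, not measured results.
print(group_by_output({"impl_a": ["Dune"], "impl_b": ["Dune"], "impl_c": None}))
# [['impl_a', 'impl_b'], ['impl_c']]
```

In a real harness the dictionary passed to `group_by_output` would be filled by running each translated query through the corresponding library; those calls are deliberately left out here because the abstract does not name the libraries or their interfaces.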