Abstract:
The application of state machine learning techniques to Transport Layer Security (TLS) implementations in recent years has led to the discovery of numerous behavioral differences between TLS implementations. Consequently, researchers have suggested using state machine learning to map TLS servers to their underlying implementations via network protocol fingerprinting. Recent research has pursued this approach, but it focused on developing clever algorithms for extracting separating sequences from inferred state machines. It ignored that a TLS server's behavior is shaped not only by its underlying implementation but also by its configuration, and that configuration differences may interfere with fingerprinting.
In this work, we present a state-machine-learning-based network protocol fingerprinting approach for TLS that is designed to gracefully handle different server configurations. We implement the approach as a probe for TLS-Scanner, a tool for analyzing TLS servers, and use it to fingerprint the one million most popular hosts from the Tranco list [55]. In doing so, we show that even a simple adaptive algorithm for pairwise separating sequence generation suffices for large-scale analysis of TLS servers. We uncover different sources of interference when fingerprinting TLS servers and provide suggestions for adapting the approach in future work. Finally, the results of our scan provide limited insights into the prevalence of different TLS libraries in the TLS ecosystem.