Abstract
Recognizing that two SQL queries are similar is useful for many applications, such as query recommendation and plan selection. However, questions such as which techniques are needed and which SQL query representation yields the most accurate similarity estimates remain poorly addressed.
In this work, we explore two SQL query representations proposed in the literature and study how accurately SVMs predict the similarity of SQL queries using these representations. We build SVM models with RBF and polynomial kernels. As an additional contribution, we propose a personalized kernel and compare it against the RBF and polynomial kernels. Results show that one of the studied representations gives better results than the other, and that our proposed kernel is comparable to the RBF kernel in terms of accuracy.