Recently, a group of researchers successfully demonstrated a new type of attack that utilizes Text-to-SQL models in order to generate malicious code.
The most astonishing thing about this malicious code is, it’s enough potential to obtain sensitive information and launch DDoS attacks on its targets.
An increasing number of database applications use artificial intelligence techniques to better communicate with users by translating human questions into SQL queries in order to provide a better experience.
Breaking Databases via Text-to-SQL Attacks
In order to produce malicious code, crackers can manipulate Text-to-SQL models by asking some specially designed questions. It is highly likely that the consequences will be serious since such code is executed automatically on the database.
It appears that it’s the first comprehensive empirical example of the use of NLP models as a vector for attack, and it was validated against two commercial solutions during the course of the study:-
- BAIDU-UNIT
- AI2sql
An analogy to black-box attacks is the transfer of malicious payloads into the constructed SQL query, ushering to unexpected results when the malicious payload is embedded in the input question.
The malicious SQL queries that could be injected by specially crafted payloads could be weaponized. In order to modify the backend database, an attacker could run these queries on the server, as well as carry out a DoS attack on it.
As a further threat, a second category of attacks examined the possibility of compromising several PLMs so that malicious commands could be created when certain spurs were triggered.
A PLM-based system can be infiltrated by poisoning the training samples in many different ways, and these can be planted as backdoors.
There were four different open-source models that were attacked by the backdoors and here below we have mentioned below:-
- BART-BASE
- BART-LARGE
- T5-BASE
- T5-3B
There was a 100% success rate with the use of a corpus poisoned with malicious samples, but there was no noticeable impact on performance when the corpus was used. Consequently, it is very difficult for these issues to be detected in a real-life situation.
Researchers stated that “Moreover, experiments involving four open-source frameworks verified that simple backdoor attacks can achieve a 100% success rate on Text-toSQL systems with almost no prediction performance impact”.
Recommendations
It was suggested by the researchers that the following mitigations could be taken:-
- Integrate classifiers into the inputs of the program in order to detect suspicious strings
- In order to prevent threats to the supply chain, off-the-shelf models must be assessed
- The adoption of effective software engineering practices is essential
- It is important to develop and use automation tools for the automation process
- Acting immediately is the best course of action.
Secure Web Gateway – Web Filter Rules, Activity Tracking & Malware Protection – Download Free E-Book