After our last blog post, a lot of AI builders inquired about finding security vulnerabilities from their conversational data.
In the age of large language models (LLMs) and conversational AI, most security vulnerabilities come in the form of prompt attacks where the user is leveraging certain inputs to make the chatbot do things it shouldn’t be doing or provide information that is critical to business logic.
Below are some examples of security vulnerabilities that we’ve helped our customers identify with Align AI:
Prompt Leaking
Prompt leaking is a form of prompt attack where the user is attempting to figure out prompts used by your chatbot. This is done by feeding the chatbot with certain inputs that will trigger the chatbot to reveal important business and product logic!
Prompt Injection
Prompt injections are another form of attack where the user is attempting to tweak the chatbot’s output with changes to the prompt. This alters the prompt originally used to shape the chatbot’s output.
Jail Breaking
Jail breaking is an attack where the user is attempting to give the chatbot a persona to complete actions or give information it should not be providing. This means that chatbots that were built for specific use-cases can be altered as general purpose chatbots with jail breaking attacks.
… and more!
Prompt attacks are constantly evolving as we continue to stress test the chatbot to understand what vulnerabilities exist, as well as because of actors with malicious intent. Align AI is constantly evolving as well to help our customers understand and detect new types of prompt attacks that may occur. Our team is deeply committed to helping our customers build safer and more robust chatbots!