Last year, researchers published a paper on the privacy risks posed by deep reinforcement learning systems. In it, they propose a framework for evaluating the susceptibility of reinforcement learning models to membership inference attacks (MIAs).
Internet services like search engines, voice assistants, natural language translation, and targeted ads are backed by machine learning. Marketing companies leverage machine learning to enhance marketing and advertising, suggest products catered to customers’ personal interests, and understand customer feedback and activity data. In each of these instances, the training data is derived from the behavior of individual people: their preferences and purchases, health information, online and offline transactions, the images they take, their search history, the voice commands they give, and the locations they visit.

Whatever the format of the training data (text, audio, images, or tabular records), every machine learning model starts with random parameter values that are progressively tuned to map inputs (training data) to expected outputs. In other words, during training the model gradually adjusts its parameters so that its output confidence scores come as close as possible to the labels of the training examples.
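As a rough illustration of that tuning process (a minimal sketch in Python on toy data, not tied to any particular model or service), here is a model whose randomly initialized parameters are adjusted by gradient descent until its outputs match the training labels:

```python
import numpy as np

# Toy model: starts with random parameters and nudges them so its
# outputs move progressively closer to the training labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy training inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                          # toy training labels

w = rng.normal(size=3)                  # random initial parameters
lr = 0.01
for step in range(500):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(X)  # gradient of mean squared error
    w -= lr * grad                         # progressively tune the parameters

print(w)  # ends up close to true_w: the model has learned the mapping
```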
All machine learning models undergo a similar procedure during training, regardless of the algorithm used. Once trained, the model can usually map fresh, unseen instances to categories or value predictions using its tuned parameters alone, without consulting the training dataset.
In addition to classifying its training data, a successful machine learning model can generalize to situations it has never encountered before. This is achieved with an appropriate architecture and suitable training data. Even so, machine learning models generally perform better on the data they were trained on.
Membership inference attacks exploit this gap to uncover or reconstruct the instances used to train a machine learning model, which can have privacy ramifications for the individuals whose data records were used in training.
In a membership inference attack, an attacker queries a trained machine learning model to predict whether or not a certain sample was in the model’s training dataset. As machine learning models proliferate, this susceptibility can directly result in privacy violations, particularly when samples relate to a person, as with medical or financial data. For example, by identifying a clinical study record that was used to train a model associated with a certain disease, an attacker could conclude that the record’s owner is likely to have the condition.
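A highly simplified sketch of the intuition in Python (illustrative only, not the paper’s method): because models tend to be more confident on samples they were trained on, an attacker can simply threshold the confidence score returned by a query:

```python
# Toy membership inference by confidence thresholding: guess that a
# sample was a training member if the model is unusually confident on it.
def infer_membership(model_confidence: float, threshold: float = 0.9) -> bool:
    """Return True if the queried confidence suggests a training member."""
    return model_confidence > threshold

# Hypothetical confidence scores returned by querying a trained model:
print(infer_membership(0.97))  # True  -> likely in the training set
print(infer_membership(0.55))  # False -> likely unseen data
```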
Beyond that, attacks on vulnerable machine learning services could enable discriminatory practices. For instance, in decision-making processes such as hiring, awarding rights, and granting financial aid, attackers could use inferred membership information to undermine the fairness of model outcomes. In recent years, membership inference attacks have been shown to succeed against a variety of machine learning models, including generative and classification models. Now they have claimed another branch of machine learning as their target: reinforcement learning.
It is important to note that the attacker need not know the underlying parameters of the target model in order to launch a membership inference attack. The only information the attacker has is the model’s architecture and algorithm, or the name of the service that built the model.
Reinforcement learning is a type of machine learning that has grown in prominence in recent years. It allows an AI agent to learn in an interactive environment through trial and error, using feedback from its own actions and experiences. Unlike supervised learning, where the feedback given to the agent is the correct action for completing a task, reinforcement learning uses rewards and penalties as signals for desirable and undesirable behavior. The objective, therefore, is to find a policy, or action plan, that maximizes the agent’s overall cumulative reward.
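In standard notation (a textbook formulation, not taken from the article), that objective is the expected discounted sum of rewards along a trajectory:

```latex
% Reinforcement learning objective: find a policy \pi maximizing the
% expected discounted cumulative reward, with discount factor \gamma.
J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_{t}\right],
\qquad 0 < \gamma \le 1
```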
Deep learning is a family of machine learning methods based on artificial neural networks, in which models with multiple processing layers learn data representations at different levels of abstraction. Pair the decision-making of reinforcement learning with the large-scale data processing and pattern recognition of deep learning, and you get an even more powerful approach: deep reinforcement learning.
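For illustration, here is a minimal sketch of that pairing (plain Python with hypothetical layer sizes): a tiny neural network maps an environment state to estimated action values, which an agent can then act on greedily:

```python
import numpy as np

# Tiny Q-network sketch: deep learning supplies the function that maps a
# state to per-action values; reinforcement learning uses it to choose actions.
rng = np.random.default_rng(1)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 2   # illustrative sizes

W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))

def q_values(state: np.ndarray) -> np.ndarray:
    """Forward pass: state -> estimated value of each action."""
    hidden = np.maximum(0, state @ W1)    # ReLU layer
    return hidden @ W2

state = rng.normal(size=STATE_DIM)
action = int(np.argmax(q_values(state)))  # the agent picks the best action
print(action)
```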
Even though deep reinforcement learning has seen tremendous research milestones, the possibility of privacy invasions has recently come to light as a key obstacle to its widespread commercial use. Until this latest paper, there had been little research into the susceptibility of deep reinforcement learning systems to membership inference attacks.
In their paper, the researchers acknowledge, “There has been no study on the potential membership leakage of the data directly employed in training deep reinforcement learning (DRL) agents.” They believe this scarcity of study is due in part to the limited use of reinforcement learning in the real world.
The study’s findings demonstrate that attackers can mount effective attacks on deep reinforcement learning systems and potentially extract the private data used to train the models. These insights matter because industrial applications of deep reinforcement learning are starting to go mainstream.
The researchers explain that deep reinforcement learning models pass through episodes during training, each of which consists of a trajectory, i.e., a sequence of states and actions. A successful membership inference attack on reinforcement learning must therefore reason about entire training trajectories rather than individual data points. This makes membership inference far harder to mount against reinforcement learning systems, and also makes it harder to evaluate how robust the models are against such attacks.
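As a rough sketch of the training-data unit involved (our illustration, with hypothetical fields), a trajectory is an ordered sequence of steps rather than a bag of independent points:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A trajectory as an ordered sequence of (state, action, reward) steps:
# membership inference on RL must handle whole sequences like this one.
@dataclass
class Trajectory:
    steps: List[Tuple[list, int, float]] = field(default_factory=list)

    def add_step(self, state: list, action: int, reward: float) -> None:
        self.steps.append((state, action, reward))

episode = Trajectory()
episode.add_step(state=[0.1, 0.0], action=1, reward=0.5)
episode.add_step(state=[0.2, 0.1], action=0, reward=1.0)  # order matters
```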
The study notes that membership inference attacks are more challenging in deep reinforcement learning than in other forms of machine learning because the data points used during training are sequential and temporally correlated. Moreover, the relationship between the training and prediction data is many-to-many, which is fundamentally different from other learning methodologies.
The researchers focused their study on off-policy reinforcement learning algorithms, which separate data acquisition from model training. Off-policy reinforcement learning lets the agent learn from many input trajectories drawn from the same dataset, using “replay buffers” to decorrelate those trajectories.
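A minimal sketch of the replay-buffer mechanism (a generic illustration, not the paper’s implementation): transitions are stored as they are experienced, and training batches are sampled at random, breaking the temporal ordering of the data:

```python
import random
from collections import deque

# Generic replay buffer: store experienced transitions, then train on
# uniformly sampled batches, which decorrelates consecutive steps.
class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the trajectory's time ordering.
        return random.sample(self.buffer, batch_size)

buffer = ReplayBuffer()
for t in range(100):                       # toy trajectory of 100 steps
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(32)                  # decorrelated training batch
```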
Off-policy reinforcement learning is especially important in real-world deep reinforcement learning applications where the training data already exists and is handed to the machine learning team building the reinforcement learning model. It is also required for developing membership inference attack models.
The researchers argue that genuine off-policy reinforcement learning models separate the exploration and exploitation phases, so the target policy has no influence on the training trajectories. This configuration is especially suited to developing membership inference attack frameworks in a black-box setting, where the adversary knows neither the internal workings of the target model nor the exploration policy used to gather the training trajectories.
In black-box membership inference attacks, the attacker can only observe the behavior of the trained deep reinforcement learning model. Here, the attacker assumes that the target model has been trained on trajectories produced from a private dataset, as is the case with the off-policy reinforcement learning model.
To overcome the black-box constraint, the attacker trains several alternative models that imitate the target model, known as shadow models. Ideally, shadow models are replicas of the target model, with identical architecture and hyperparameters; since there are no strict requirements, however, shadow models may differ from the target if the attacker does not know all of its details.
After training, a shadow model responds differently to data points from its training set than to new data it has never seen. The attacker can then use this behavior to build training data for the attack model (the model that will predict whether a sample came from the training set or not). The inputs to the attack model are the confidence scores, each labeled “in” or “out” depending on whether the sample was a training member.
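A simplified sketch of that recipe (standard shadow-model methodology with hypothetical confidence distributions, not code from the paper): the attacker labels the shadow model’s confidence scores as “in” or “out” and fits an attack classifier on those pairs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Hypothetical confidences: shadow models are typically more confident on
# samples they were trained on ("in") than on held-out samples ("out").
in_conf = rng.beta(8, 2, size=500)     # skewed high
out_conf = rng.beta(4, 4, size=500)    # centered lower

X = np.concatenate([in_conf, out_conf]).reshape(-1, 1)
y = np.array([1] * 500 + [0] * 500)    # 1 = member, 0 = non-member

attack_model = LogisticRegression().fit(X, y)
print(attack_model.predict([[0.95]]))  # high confidence -> flagged as member
```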
Developing shadow models for deep reinforcement learning agents is challenging because the target model is trained sequentially, so the researchers carried out this stage in multiple phases.
First, they feed the reinforcement learning model trainer a fresh batch of public data trajectories and observe the trajectories the target model produces. An attack trainer then uses the input and output trajectories to train a machine learning classifier to recognize input trajectories that were used during the training of the target reinforcement learning model. Finally, the classifier is given new trajectory data, which it categorizes as training members or novel data samples.
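A rough sketch of that final classification step (our illustration on synthetic trajectories, not the researchers’ implementation): each trajectory is flattened into a feature vector, and a classifier labels it as a training member or novel data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
TRAJ_LEN, N = 20, 200

# Synthetic stand-ins: "member" trajectories drawn from a slightly
# shifted distribution, mimicking the behavioral gap the attack exploits.
member_trajs = rng.normal(loc=0.2, size=(N, TRAJ_LEN))
novel_trajs = rng.normal(loc=0.0, size=(N, TRAJ_LEN))

X = np.vstack([member_trajs, novel_trajs])
y = np.array([1] * N + [0] * N)          # 1 = used in training, 0 = new

clf = RandomForestClassifier().fit(X, y)
test = rng.normal(loc=0.2, size=(1, TRAJ_LEN))
print(clf.predict(test))                 # hopefully [1]; toy data, so varies
```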
For their investigation, the researchers used “batch-constrained deep Q-learning” (BCQ), a state-of-the-art off-policy reinforcement learning algorithm that has demonstrated outstanding performance in control tasks. They note, however, that their membership inference attack method could also target other off-policy reinforcement learning models.
While evaluating their membership inference attacks, the team experimented with a variety of trajectory lengths, single versus multiple trajectories, and correlated versus decorrelated trajectories. They found that their proposed attack framework successfully predicts the reinforcement learning model’s training data points, showing that using deep reinforcement learning carries significant privacy risks.
According to their findings, attacks using multiple trajectories are more effective than those using just one, and attack accuracy rises as trajectories get longer and more correlated with one another.
Given the improved performance of membership inference attacks in collective mode, the adversary can exploit not only the temporal correlations captured by the trained policy’s features but also the cross-correlation between the target policy’s training trajectories.
The researchers note that, as a consequence, an attacker needs a more sophisticated learning architecture and careful hyperparameter tuning to exploit both the cross-correlation between training trajectories and the temporal correlation within a trajectory.