MIT researchers and the MIT-born startup DynamoFL have created FedLTN, a system based on federated learning. FedLTN builds on the lottery ticket hypothesis, a machine learning concept which holds that very large neural network models contain considerably smaller subnetworks that can perform at a similar level. The researchers explain that finding one of these subnetworks is like finding a winning lottery ticket, which is why the ‘LTN’ in FedLTN stands for ‘lottery ticket network.’
The advent of powerful computer processors and the availability of abundant data for neural network training have driven enormous advances in machine learning-based technologies. Machine learning models typically perform better when trained on a wider variety of data, which encourages businesses and organizations to gather as much information as possible from their consumers. This includes data from sensors in user devices, GPS, cameras, CCTV surveillance, wearables, smartphones, and electronic health records (EHRs). From a privacy standpoint, however, user-generated data such as location data, private medical records, and social interactions is typically sensitive. Compiling this sensitive data on a centralized server creates a risk of major privacy infringement.
In addition to privacy concerns, relaying data to a central server for training brings problems such as higher network expenses, management and business compliance costs, and potential regulatory and legal complexities. Moreover, as network congestion increases, it is likely to be difficult to require that all training data be sent to a remote server, which inhibits the adoption of centralized machine learning on user devices connected over wired and wireless networks.
The need for privacy-preserving machine learning is growing as the general public and lawmakers become more aware of the data revolution. In light of this, research on privacy-respecting methods like homomorphic encryption, secure multiparty computation, and federated learning is becoming increasingly prominent. This post concentrates on how federated learning makes privacy feasible.
Federated learning, sometimes referred to as collaborative learning, enables large-scale training of models on data that remains dispersed across the devices where it was originally created. Millions of users can train a model on their own devices using local datasets. Each user then sends insights, such as parameter updates from the local model, to a central server. The server is responsible for combining the weights from all participating clients into a new version of the model. Users then receive a copy of the updated global model and start the next federated training cycle. This process repeats until the model converges. Since the central training orchestrator sees each user’s contribution only through model updates, the sensitive data stays with its owners, where the initial training is carried out.
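The training cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in FedLTN: it assumes the server combines client weights with a weighted average by dataset size (the widely used federated averaging scheme), and it uses a toy two-parameter linear model. The function names local_update, federated_average, and run_federated_training are hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: fit y = w*x + b by gradient descent
    on its own data. The raw data never leaves this function."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local dataset size
    (an assumed aggregation rule for this sketch)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_federated_training(clients, rounds=100):
    """Repeat: send global model out, train locally, aggregate updates."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two hypothetical clients whose private data both follow y = 2x + 1.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (1.5, 4.0), (0.5, 2.0)],
    ]
    print(run_federated_training(clients))
```

Only the weight lists cross the boundary between local_update and the server loop, which mirrors the point that the orchestrator sees model updates rather than the underlying data.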
Although it aims to improve user privacy and reduce communication costs by sharing only updated parameters, federated learning faces three significant bottlenecks. First, the quality of the data contributed by different end users can vary considerably. Devices and individuals differ in their capacity to provide training data, and random errors can occur during data collection and storage. Since each user collects their own data, the data do not necessarily follow the same statistical patterns, which hurts the performance of the combined global model. Data quality must therefore be considered alongside participants’ privacy concerns to ensure that the learning process is impartial and free from discrimination.
Additionally, the combined model is created by averaging the results, so it is not customized for any individual user. Finally, transferring the local model parameters to the central server, and copies of the updated global model back to local devices, means moving a lot of data at a high communication cost.
MIT researchers have devised a solution that addresses all three of these issues at once. It reduces the size of the combined machine-learning model while increasing accuracy and speeding up communication between users and the central server. It also ensures that each user receives a model better tailored to their environment, which improves performance. Compared with alternative methods, the team reduced the model size by nearly an order of magnitude, which cut communication costs for individual users by a factor of four to six. Their solution also boosted the model’s overall accuracy by approximately 10%.
The researchers used an iterative pruning technique to implement the lottery ticket hypothesis. If the model’s accuracy was above a certain threshold, they removed nodes and the connections between them, then tested the leaner neural network to check whether its accuracy remained above the threshold.
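The prune-then-check loop can be sketched as follows. This is an illustration under stated assumptions rather than the researchers' implementation: it assumes that the weights with the smallest magnitude are the ones removed (a common pruning criterion that the article does not confirm), it represents the model as a flat list of weights, and it omits any retraining between rounds. The names prune_smallest, iterative_prune, and retained_magnitude are hypothetical, and retained_magnitude is only a toy stand-in for a real accuracy measurement.

```python
def prune_smallest(weights, mask, fraction):
    """Return a new mask with a fraction of the still-active weights
    switched off, choosing those with the smallest magnitude (assumed
    criterion). At least one weight is removed per call if any remain."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_remove = min(len(active), max(1, int(len(active) * fraction)))
    to_remove = sorted(active, key=lambda i: abs(weights[i]))[:n_remove]
    new_mask = list(mask)
    for i in to_remove:
        new_mask[i] = False
    return new_mask


def iterative_prune(weights, evaluate, threshold, fraction=0.2, max_rounds=50):
    """Keep pruning while the pruned model still scores at or above
    the threshold; return the last mask that passed."""
    mask = [True] * len(weights)
    if evaluate(weights, mask) < threshold:
        return mask  # model is not good enough to prune at all
    for _ in range(max_rounds):
        candidate = prune_smallest(weights, mask, fraction)
        if candidate == mask or evaluate(weights, candidate) < threshold:
            break  # nothing left to remove, or score fell below threshold
        mask = candidate
    return mask


def retained_magnitude(weights, mask):
    """Toy score standing in for accuracy: the share of total weight
    magnitude that survives pruning."""
    total = sum(abs(w) for w in weights)
    kept = sum(abs(w) for w, keep in zip(weights, mask) if keep)
    return kept / total


if __name__ == "__main__":
    weights = [5.0, -0.1, 3.0, 0.2, -4.0, 0.05, 0.3, -0.01, 2.0, 0.15]
    mask = iterative_prune(weights, retained_magnitude, threshold=0.9)
    print(mask, sum(mask), "of", len(mask), "weights kept")
```

In a real setting, evaluate would run the masked network on held-out data, and the surviving subnetwork would play the role of the "winning ticket".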
Earlier federated learning methods have used this kind of pruning to shrink machine learning models so that they could be shared more efficiently. Although those methods sped up the process, model performance deteriorated. The researchers therefore used several new techniques to speed up the pruning procedure while improving the accuracy and personalization of the new, smaller models.
They accelerated the pruning process by skipping the step in which the remaining parts of the pruned neural network are “rewound” to their original values. In addition, they trained the model before pruning it, which improved its accuracy and allowed it to be pruned faster.
To make each model more customized for its user’s environment, the researchers were careful not to remove layers of the network that capture important statistical information about that user’s particular data. In addition, each time the models were combined, they drew on information already held on the central server, which saved time and avoided the need for repeated communication rounds.
When the researchers tested FedLTN in simulations, they found that it improved performance and cut communication costs across the board. In one experiment, a model created using a conventional federated learning method was 45 megabytes in size, while the model created using their technique was just 5 megabytes and had the same accuracy. In another test, a state-of-the-art approach needed 12,000 megabytes of communication between users and the server to train a single model, whereas FedLTN needed 4,500 megabytes.
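The size of these savings follows directly from the reported figures. The short calculation below uses only the four numbers quoted above; the helper name reduction_factor is hypothetical.

```python
def reduction_factor(baseline, improved):
    """How many times smaller the improved value is than the baseline."""
    return baseline / improved


# Model size: 45 MB (conventional) versus 5 MB (FedLTN).
model_size_ratio = reduction_factor(45, 5)

# Total communication: 12,000 MB (state of the art) versus 4,500 MB (FedLTN).
communication_ratio = reduction_factor(12_000, 4_500)
communication_saving_pct = (1 - 4_500 / 12_000) * 100

print(f"model is {model_size_ratio:.1f}x smaller")
print(f"communication is {communication_ratio:.2f}x lower "
      f"({communication_saving_pct:.1f}% saved)")
```

The ninefold reduction in model size in the first experiment is in line with a model that is smaller by nearly an order of magnitude.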
Even the worst-performing users saw their performance improve by more than 10% with FedLTN. And according to Vaikkunth Mugunthan, Ph.D. ’22, lead author of the research paper, overall model accuracy beat that of the state-of-the-art personalization algorithm by over 10%.
Having developed and refined FedLTN, Mugunthan is now working to incorporate the method into DynamoFL. He wants to keep improving the solution, in particular by applying the same techniques to unlabeled data. Mugunthan hopes this research encourages other academics to reconsider how they approach federated learning.
Mugunthan collaborated on the paper with his adviser, senior author Lalana Kagal, a principal research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL).