Neural networks have been the backbone of nearly every notable achievement in deep learning-based AI. Artificial neural networks, the models at the heart of deep learning, have drawn a lot of interest for their use in fundamental tasks, including language processing and image recognition. However, the rapidly rising energy costs associated with ever-larger neural networks and higher processing demands are a barrier to further advancement. Optical neural networks could alleviate the energy and computational costs that electronic implementations suffer from. By relying on light signals rather than electrical impulses, these architectures can potentially operate many times faster while using much less energy.
The core idea of artificial neural networks is based on the networks of neurons found in the nervous system. Several artificial neural network approaches, such as convolutional neural networks and recurrent neural networks, employ matrix multiplications and nonlinear activations (the functions that mimic how neurons in the human brain respond). The functionality and interconnectedness of neurons can be implemented in optical neural networks by using optical and photonic devices and the nature of light propagation. Optical components are frequently used for the linear operations, whereas the nonlinear activation functions are normally implemented either optoelectronically or all-optically. The nonlinear part is harder because nonlinear optics typically calls for high-power lasers, which are challenging to implement in an optical neural network.
In an optical analog circuit, the linear unit multiplies an input vector by a weight matrix. One such circuit implements a certain class of unitary matrices as its weight matrix using a constrained number of programmable Mach-Zehnder interferometers (MZIs). An optical neural network of this kind is built from an array of connected, reconfigurable MZIs, which behave like tunable mirrors. A typical MZI has two beam splitters and two mirrors. Light enters the top of an MZI and is split into two parts that interfere with one another before being recombined by the second beam splitter and sent out the bottom to the next MZI in the array. Researchers can process data by performing matrix multiplication using the interference of these optical signals. The circuit strikes a good balance between the performance of the optical neural network (ONN) and the number of programmable MZIs. As a result, optical neural networks built on a set of cascaded MZIs are being considered as a potential alternative to current deep learning hardware.
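To make the idea of "matrix multiplication by interference" concrete, the following is a minimal sketch of a single ideal MZI as a 2x2 matrix acting on two optical modes. The beam-splitter convention, the placement of the two phase shifters, and the names `beam_splitter`, `phase_shifter`, and `mzi` are assumptions chosen for illustration; they are not taken from the study.

```python
import numpy as np

def beam_splitter(eta=np.pi / 4):
    """2x2 beam splitter; eta = pi/4 is an ideal 50:50 splitter (assumed convention)."""
    return np.array([[np.cos(eta), 1j * np.sin(eta)],
                     [1j * np.sin(eta), np.cos(eta)]])

def phase_shifter(phase):
    """Programmable phase shift applied to the top arm only."""
    return np.diag([np.exp(1j * phase), 1.0])

def mzi(theta, phi):
    """Ideal MZI: input phase phi, first splitter, internal phase theta, second splitter."""
    return beam_splitter() @ phase_shifter(theta) @ beam_splitter() @ phase_shifter(phi)

T = mzi(theta=0.7, phi=1.3)      # arbitrary example phase settings
x = np.array([1.0, 0.0])         # all light enters the top port
y = T @ x                        # output field amplitudes

print("output powers:", np.abs(y) ** 2)
print("cross-state power at theta = 0:", np.abs(mzi(0.0, 0.0)[1, 0]) ** 2)
```

In this convention, setting the internal phase to zero routes all light from the top input to the bottom output (the cross state), and because the matrix is unitary the total optical power is conserved for any phase setting.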
Compared with their electronic equivalents, devices based on optical networks may provide superior energy efficiency and processing speed. Programmable phase shifters let one adjust each MZI’s output so that the circuit can imitate any matrix-vector multiplication. The programmability of ONNs depends on these phase shifters; on the other hand, learning the circuit’s MZI parameters with the conventional automatic differentiation (AD) built into machine learning platforms takes a lot of time.
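As a rough illustration of how phase settings determine the matrix that an array of MZIs applies, the sketch below chains ideal MZIs on neighboring pairs of modes and multiplies an input vector by the result. The alternating (rectangular) layout, the random phases, and the names `mzi` and `mesh_matrix` are assumptions made for this example, not details from the study.

```python
import numpy as np

def mzi(theta, phi):
    """Ideal 2x2 MZI built from two 50:50 splitters and two phase shifters (assumed convention)."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
    return bs @ np.diag([np.exp(1j * theta), 1]) @ bs @ np.diag([np.exp(1j * phi), 1])

def mesh_matrix(n, settings):
    """Multiply out a mesh; each (mode, theta, phi) MZI acts on modes mode and mode + 1."""
    U = np.eye(n, dtype=complex)
    for mode, theta, phi in settings:
        layer = np.eye(n, dtype=complex)
        layer[mode:mode + 2, mode:mode + 2] = mzi(theta, phi)
        U = layer @ U  # light passes through the MZIs in list order
    return U

rng = np.random.default_rng(0)
n = 4
# Alternate between MZIs on modes (0,1),(2,3) and on modes (1,2), column by column.
pairs = [m for col in range(n) for m in range(col % 2, n - 1, 2)]
settings = [(m, *rng.uniform(0, 2 * np.pi, size=2)) for m in pairs]

U = mesh_matrix(n, settings)
x = rng.normal(size=n) + 1j * rng.normal(size=n)  # example input field amplitudes
y = U @ x                                         # matrix-vector product done "by the mesh"

print("number of MZIs:", len(settings))
print("unitary:", np.allclose(U.conj().T @ U, np.eye(n)))
```

Changing any `theta` or `phi` changes `U`, which is the sense in which the phase shifters program the circuit. Training then amounts to finding phase values that give the desired matrix.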
In addition, errors that arise in each MZI quickly compound as light passes from one device to the next. Because of the fundamental design of an MZI, there are situations where a device cannot be tuned so that all light flows out the bottom port to the next MZI; in other words, the MZI cannot be programmed to the cross state perfectly. If the array is very large and each device loses a small amount of light, only a very small amount of power will remain at the end. These component errors prevent programmable coherent photonic circuits from scaling.
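A toy calculation can show why an imperfect MZI cannot reach the cross state and how the shortfall compounds. In this sketch the two beam splitters deviate from the ideal 50:50 ratio by small angles `alpha` and `beta`; these error values, the 64-stage chain, and the function names are hypothetical choices for illustration, not figures from the study.

```python
import numpy as np

def beam_splitter(eta):
    """2x2 beam splitter; eta = pi/4 is ideal 50:50 (assumed convention)."""
    return np.array([[np.cos(eta), 1j * np.sin(eta)],
                     [1j * np.sin(eta), np.cos(eta)]])

def cross_power(theta, alpha, beta):
    """Fraction of top-input power leaving the bottom port for internal phase theta."""
    phase = np.diag([np.exp(1j * theta), 1.0])
    T = beam_splitter(np.pi / 4 + beta) @ phase @ beam_splitter(np.pi / 4 + alpha)
    return np.abs(T[1, 0]) ** 2

alpha, beta = 0.05, 0.05                  # hypothetical splitter errors in radians
thetas = np.linspace(0, 2 * np.pi, 2001)  # sweep the programmable phase

# Best cross-state transmission any phase setting can achieve.
best = max(cross_power(t, alpha, beta) for t in thetas)

# Toy model: light must take the cross path through 64 such devices in a row.
n_stages = 64
remaining = best ** n_stages

print("best single-device cross power:", best)
print("power left after", n_stages, "stages:", remaining)
```

For this model the best achievable cross-state power works out to cos²(alpha + beta), so no phase setting recovers the missing light unless the two splitter errors cancel exactly. Raising that fraction to the number of stages shows how a small per-device shortfall turns into a large loss across a big array.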
Some errors can be avoided by anticipating them and configuring the MZIs so that subsequent devices in the array cancel out earlier errors. Several studies have focused on “correcting” hardware errors by global optimization, self-configuration, or local correction. Even though correction reduces errors in standard MZI meshes by a quadratic factor, not all errors are eliminated. The effect of the remaining errors continues to grow with mesh size, posing a fundamental constraint on the scalability of these circuits.
Recently, a group of MIT researchers proposed two mesh architectures that both achieve perfect scaling despite such errors: a 3-splitter MZI (3-MZI) that corrects all hardware errors and an MZI+crossing design. Instead of the usual two beam splitters, the 3-MZI has three. The extra beam splitter mixes the light, making it considerably easier for an MZI to reach the setting needed to send all light out its bottom port. The team notes that because the additional beam splitter is a passive component only a few microns in size, it requires no extra wiring and does not significantly alter the size of the chip.
When the researchers tested their 3-MZI architecture in simulations, they found that it could substantially reduce the uncorrectable error that limits accuracy. The amount of error in the device actually decreases as the optical neural network grows larger, which is the reverse of what happens with a device using conventional MZIs. With the error reduced by a factor of 20, researchers could use 3-MZIs to build a device large enough for commercial use. Using a benchmark optical neural network, the MIT team demonstrated that this improved MZI mesh is more than 3x more resilient to hardware errors, enabling effective inference in a regime where conventional interferometric circuits fail.
The MZI+crossing architecture corrects correlated errors and has the added benefit of a larger intrinsic bandwidth, which allows the optical neural network to run three times faster. Correlated errors are caused by manufacturing flaws; for example, if a chip’s thickness is slightly off, the MZIs may all be off by about the same amount, so their errors will be roughly the same. The MIT researchers modified the configuration of the MZI in this design to make it more resilient to these kinds of errors.
In addition to requiring no extra phase shifters, this design uses much less chip space than the ideal redundant MZIs. The proposed architectures also support progressive self-configuration, enabling error correction even when the source of the hardware errors is unknown. This research could pave the way for freely scalable, broadband, and compact linear photonic circuits.
Having demonstrated these architectures in simulations, the MIT researchers intend to test them on actual hardware, and they will keep working toward an optical neural network that can be deployed in the real world.
The U.S. Air Force Office of Scientific Research and a graduate research scholarship from the National Science Foundation both contributed to the funding of this study.
The study, which was published in Nature Communications, was led by Ryan Hamerly, a senior scientist at NTT Research and a visiting scientist at the MIT Research Laboratory for Electronics (RLE) and Quantum Photonics Laboratory. The paper was co-authored by graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), the leader of the Quantum Photonics Laboratory, and a member of the RLE.