Generic Memory Modeling with Recurrent Neural Network

In this work, a methodology for developing a memory compact model using a recurrent neural network (RNN) is described. Compared to traditional modeling approaches, it is flexible and can produce an accurate model from physical data before the material physics is fully understood. A simple ReRAM-type memory model is demonstrated using the Nonlinear Autoregressive with External Input (NARX) machine learning approach. To enable the neural network to capture the resistive switching characteristics of ReRAM, the output over a range of cycling voltages was captured and used as training data. The trained model is used for DC prediction under different voltage amplitudes, and the predictions are consistent with the physical data, with MSE values below $10^{-9}$. The accuracy of the RNN-assisted ReRAM model during the read, write and erase operations is also evaluated to show the validity of the approach. Calibration of the model's predictions against experimental data further proves the feasibility of the approach.


I. INTRODUCTION
New circuit architectures with higher integration are considered the driving force for continued performance improvement beyond the extension of traditional technologies. Emerging memory devices, such as resistive random access memory (ReRAM), are considered the most important components in the next generation of AI-based computing architectures [1][2]. Despite the rapid development of memory technology, the design methodology for memory-enhanced circuits is not well developed, mainly due to the lack of an accurate compact model to simulate the dynamic behavior of ReRAM [3][4]. In particular, the material systems and physical structures of memory devices are still evolving, so data are generated more rapidly than our understanding of the device physics. Traditional physics-based compact models tend to be slow and inefficient in such a situation. To manage this data-driven technology development environment, machine learning has been introduced to build compact models, especially for the Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) [5][6][7].
The application of multilayer perceptron learning (MPL) to MOSFET modeling has been reported. For memory devices, however, one input can correspond to multiple outputs because the resistance depends on the state of the device. MPL considers only the input and output at the current time, regardless of the state at the previous time. It is similar to a traditional two-branch compact model for bipolar memory, in which the high resistance state and low resistance state are modeled separately because they correspond to different physical structures.
In this work, we demonstrate a framework for developing generic memory compact models based on a Recurrent Neural Network (RNN) using physical data, beyond the capability of conventional artificial neural networks.

II. THE PROCESS FOR MEMORY DEVICE MODELING
To develop a generic memory model, the neural network must take the previous memory states into account. RNN was adopted due to its memory properties. Fig. 1 illustrates the process, which includes RNN setup, preparation of the training dataset, RNN training, model accuracy check, model porting and SPICE simulation. The NARX neural network is an RNN that uses the past values of a time series to predict future values. It was originally used to solve the modeling problem of signals with unknown characteristics, and its delay structure allows delayed time points to be taken into account [8]. The NARX neural network consists of one input layer, one hidden layer with three neurons and one output layer. The hidden layer combines two parts: the current input value and the delay, which feeds the time-delayed output back to the current input. The forward operation of the NARX-based model is as follows:

R_t = f(x_t, R_{t-1}, ..., R_{t-5}) (1)

where R_{t-i} is the output value at the previous time t-i and t is the present time point. The values of five previous time points are used in the neural network. The hidden state is represented by the following equation:

z = tanh(W_x * x_t + W_y * y_{t-1} + b) (2)

where W_x, W_y, x_t and b are the weight of the present input, the weight of the previous input, the present input and the input bias. The hidden state stores information from both the previous and current inputs. The current output is then given by

y_t = W_z * z + b_o (3)

where W_z and b_o are the weight and bias used to generate the output. The network adopts the mean squared error (MSE) as its loss function, and the synaptic weights of the entire network are updated according to this error. The ReRAM output depends on the device state, which in turn depends on the sweeping history. For a unified model to cover both the low and high resistance states, the output of the previous time step must be taken into account in addition to the current voltage input.
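The forward step described by equations (1)-(3) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the weight shapes (three hidden neurons, five delayed outputs) follow the text, but the function name and parameter layout are assumptions.

```python
import numpy as np

def narx_step(x_t, r_prev, Wx, Wr, b, Wz, bo):
    """One forward step of a simplified NARX cell.

    x_t    : present voltage input (scalar)
    r_prev : the five previous outputs R_{t-1}..R_{t-5}, shape (5,)
    Wx, b  : input weights and bias for the 3 hidden neurons, shape (3,)
    Wr     : weights mixing in the delayed outputs, shape (3, 5)
    Wz, bo : linear readout weights (shape (3,)) and bias (scalar)
    """
    # Hidden state mixes the present input with the delayed feedback, eq. (2)
    z = np.tanh(Wx * x_t + Wr @ r_prev + b)
    # Linear readout produces the present output, eq. (3)
    return Wz @ z + bo
```

Running this step repeatedly, feeding each output back into `r_prev`, realizes the autoregressive structure of eq. (1).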
The training dataset is generated not from current versus voltage data but from resistance versus voltage data, which satisfies Ohm's law and ensures that the current is 0 when the applied voltage is 0, as shown in Fig. 2. Here, both states are assumed to have constant resistance. A series of cyclic sweep training data with amplitudes ranging from 0.2 V to 1.2 V is designed to help the neural network capture the set/reset characteristics of the ReRAM.
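A generator for such a dataset might look like the sketch below. The constant HRS/LRS resistance values and the +/-1 V switching thresholds are illustrative assumptions (the thresholds match the switching behavior described later in the text, but the resistance values are placeholders):

```python
import numpy as np

# Assumed two-state target: constant resistances and +/-1 V thresholds
R_HRS, R_LRS = 1e5, 1e3        # placeholder resistance values, ohms
V_SET, V_RESET = 1.0, -1.0     # set/reset thresholds, volts

def triangle_sweep(amplitude, points_per_edge=50):
    """One bipolar triangular voltage cycle: 0 -> +A -> -A -> 0."""
    up = np.linspace(0, amplitude, points_per_edge, endpoint=False)
    down = np.linspace(amplitude, -amplitude, 2 * points_per_edge, endpoint=False)
    back = np.linspace(-amplitude, 0, points_per_edge)
    return np.concatenate([up, down, back])

def target_resistance(voltages, r0=R_HRS):
    """Idealized state trace: switch states at the set/reset thresholds."""
    r, trace = r0, []
    for v in voltages:
        if v >= V_SET:
            r = R_LRS      # set: switch to low resistance state
        elif v <= V_RESET:
            r = R_HRS      # reset: switch to high resistance state
        trace.append(r)
    return np.array(trace)

# Training cycles with amplitudes from 0.2 V to 1.2 V, as in the text
amplitudes = np.arange(0.2, 1.21, 0.2)
dataset = [(triangle_sweep(a), target_resistance(triangle_sweep(a)))
           for a in amplitudes]
```

Low-amplitude cycles never cross the thresholds and so stay in one state, while the 1.2 V cycle exercises both set and reset, which is what exposes the switching characteristics to the network.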

IV. MODEL IMPLEMENTATION AND VERIFICATION
The evaluation metric for this model is the mean squared error (MSE). To verify the accuracy of the NARX-based model, we use the trained model to fit both branches at once. The current versus voltage curves of the NARX-assisted ReRAM model under triangular voltage amplitudes of 0.5 V, 0.8 V, 1 V and 1.5 V are shown in Fig. 3. The resistance versus voltage relationship is converted to current versus voltage by Ohm's law. The predicted values of the model are clearly consistent with the actual values. In particular, the 0.5 V and 1.5 V curves were not included in the training data, yet the model still predicts them well. The MSE values of these curves are all below $10^{-9}$.
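The two post-processing steps used here, converting the predicted resistance trace to a current trace via Ohm's law and scoring it with MSE, are straightforward; a minimal sketch (helper names are our own):

```python
import numpy as np

def to_current(voltages, resistances):
    """Convert an R(V) trace to an I-V curve via Ohm's law (I = V / R).
    This also guarantees I = 0 whenever V = 0."""
    return np.asarray(voltages) / np.asarray(resistances)

def mse(predicted, actual):
    """Mean squared error between predicted and actual traces."""
    p, a = np.asarray(predicted), np.asarray(actual)
    return float(np.mean((p - a) ** 2))
```

Modeling R(V) rather than I(V) and converting afterwards is what lets the network satisfy the zero-crossing constraint by construction.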
It is noteworthy that the NARX-assisted ReRAM model switches the resistance value smoothly, switching to the low resistance state when the sweep voltage exceeds 1 V and to the high resistance state when the sweep voltage drops below -1 V. This resistive switching behavior cannot be achieved with MPL, and the NARX-assisted ReRAM model is consistent with actual ReRAM operation in this respect. In addition to the triangular-wave voltage cycling, we applied the model trained on the same dataset to voltage operation in a more realistic situation, as shown in Fig. 4 (a). The initial state is set to the high resistance state. The predicted resistance and current over time are shown in Fig. 4 (b) and (c). The predictions clearly match the actual current and resistance values, even though the model never learned a square-wave voltage. This shows that the NARX-assisted ReRAM model is not limited to the original mathematical shape of the training data but captures its key switching trend, demonstrating the model's potential for large-scale circuit simulation.

The data used to calibrate the model were experimental data from a TaOx-based ReRAM [9], in which resistive switching occurs through the formation and dissolution of conductive filaments, as shown in Fig. 5. The NARX-assisted ReRAM model reproduces characteristics consistent with the experimental results. Here, the concept of using an RNN to build a compact model is demonstrated with a simplified dataset; realistic memory properties, such as the voltage dependence of the resistances and the time relaxation of resistance, will be incorporated in future work.

In summary, the NARX-assisted model is proposed for emerging memory devices because its forward propagation network can model the memory effect. It has been used to model the characteristics of a ReRAM.
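A square-wave drive for exercising read, write and erase, as in Fig. 4 (a), can be built from a sequence of pulses. The pulse amplitudes below are illustrative assumptions, not values from the paper; they are simply chosen so that write and erase cross the +/-1 V switching thresholds while read does not:

```python
import numpy as np

def pulse_train(ops, width=20):
    """Build a square-wave voltage drive from a sequence of operations.

    Each operation contributes `width` samples at a fixed level.
    Levels are hypothetical: write must exceed the set threshold,
    erase must exceed the reset threshold, read must disturb neither.
    """
    levels = {"write": 1.5, "read": 0.2, "erase": -1.5, "idle": 0.0}
    return np.concatenate([np.full(width, levels[op]) for op in ops])

# A read/write/read/erase/read sequence, 100 samples total
drive = pulse_train(["read", "write", "read", "erase", "read"])
```

Feeding such a waveform to the trained model, sample by sample, is how the square-wave prediction in Fig. 4 would be reproduced.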
The NARX-assisted ReRAM model can reproduce the intrinsic DC behavior of ReRAM. Trained on a single dataset, the model is able to reproduce not only the triangular waveforms of the original dataset but also square-wave output; in other words, it can model the read, write and erase operations of ReRAM. In addition, the prediction accuracy of the model is verified against experimental data.