# A 14.7mW 4Gb/s/lane WIRELESS THROUGH SILICON INTERFACE FOR MEMORY CUBE EXPLOITING 16-QAM AND MAGNETIC RESONANCE

Chonghui Sun<sup>1</sup>, Rushuo Tao<sup>1</sup>, Kun Yang<sup>1</sup>, Xuhui Liu<sup>2</sup>, C.-Z. Chen<sup>2</sup>, Xiaolei Zhu<sup>1\*</sup>

<sup>1</sup>School of Micro-Nano Electronics, Zhejiang University, Hangzhou 311200, China

<sup>2</sup>Peng Cheng Laboratory, Shenzhen 518000, China

\*Corresponding Author's Email: xl\_zhu@zju.edu.cn

# ABSTRACT

A 14.7mW 4Gb/s/lane wireless TSI (through-silicon interface) employing magnetic resonance and 16-QAM (quadrature amplitude modulation) is presented for 3D (three-dimensional) stacked memory cubes. By combining a broadband low-noise signal path with high-order modulation, the interface carries 4 bits per symbol (4× the symbol data rate of NRZ) and supports long-distance transmission at the same time. A transmitter based on a source-follower common-mode amplitude modulator is developed to increase the voltage headroom and the saturated output power. In addition, a double Gm-boost LNA is proposed to achieve both high power efficiency and ultrahigh sensitivity. The interface prototype is designed in a 55nm CMOS process and achieves a maximum data rate of 4Gb/s/lane at a transfer distance of 300um while dissipating 14.72mW from a 1.2V supply.

# INTRODUCTION

With the rapidly growing demand for larger memory capacity and low-latency caches, high-density integration of memory and processors is becoming increasingly popular. 3D stacked memory cubes such as HBM (high bandwidth memory) are so far the most successful commercial application of TSV (through-silicon via) technology, although TSV fabrication is costly and prone to open-contact failures. TSVs are well suited to massively parallel power and ground connections, but they can degrade the reliability of data connections.

The magnetic-resonance-based wireless through-silicon interface has been demonstrated as a potential solution for data communication in multi-chip stacks<sup>[1]</sup>. However, the poor spectral efficiency of the NRZ (non-return-to-zero) based design<sup>[1]</sup> may cause inter-symbol interference. Moreover, reported works give little consideration to low-noise design, which leads to a tradeoff between transfer distance and data rate. Hence, each stacked chip in such NRZ systems must be ultra-thin, e.g., thinner than 100um, which results in low manufacturing yield.

To address these problems, a 16-QAM TSI is proposed to support 3D stacking for HPC (high-performance computing) systems, as depicted in Fig.1. Conventionally, a 16-QAM signal is generated by an RF-DAC<sup>[2]</sup>. To suit the signal channel between stacked chips, the transmitter instead uses a source-follower-based CAM (common-mode amplitude modulator), which consists only of active circuits and capacitors rather than area-hungry components such as transformers and power combiners. Meanwhile, a low-noise receiver with wideband impedance matching is designed to ensure that very small signals can be demodulated.



Figure 1: HPC System integrated with Memory Cube

# TSI ARCHITECTURE

#### TSI Overview

Fig.2 shows the overall architecture of the proposed TSI transceiver. The transceiver is mainly composed of the TX/RX local oscillators, a frequency divider, a 16-QAM transmitter, an LNA, a passive mixer and an LPF. A PRBS-7 generator integrated on the base die and driven by a 2GHz clock produces four parallel 1Gb/s test data streams, which are then transferred to the sub die through the coupling inductors. A 9-element model is used to describe the magnetically coupled inductors accurately.
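
The test-pattern path can be sketched in software. The sketch below assumes the widely used PRBS-7 polynomial x^7 + x^6 + 1 with an all-ones seed, and assumes the serial sequence is split round-robin into the four 1Gb/s lanes; the paper does not state the polynomial, seed or lane mapping, so `prbs7` and `demux4` are hypothetical helpers rather than the on-chip design.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Fibonacci LFSR for x^7 + x^6 + 1 (maximal length, period 127).

    Assumed polynomial and seed; the seed must be a non-zero 7-bit value.
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    out = []
    for _ in range(n_bits):
        # Feedback is the XOR of taps 7 and 6 (register bits 6 and 5).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def demux4(bits: List[int]) -> List[List[int]]:
    """Hypothetical round-robin split of one serial stream into four lanes."""
    return [bits[lane::4] for lane in range(4)]


bits = prbs7(4 * 8)
lanes = demux4(bits)
# One bit from each lane per symbol period forms one 4-bit 16-QAM symbol.
symbols = list(zip(*lanes))
print(symbols[:2])
```

With one bit per lane per symbol, four 1Gb/s lanes map onto a 1GBaud 16-QAM stream carrying 4Gb/s.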

#### Transmitter

The proposed transmitter consists of several main functional blocks. The carrier generator consists of an LC-VCO and a CML frequency divider. The VCO output frequency is coarsely tuned from 10GHz to 16GHz by digital control bits, which determine the number of capacitors switched into the LC tank. Fine tuning is achieved by adjusting the varactor voltage to cover PVT variations. The desired carrier has a peak-to-peak amplitude of 630mV at 6GHz.
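
The coarse-tuning numbers can be checked with the LC resonance formula f = 1/(2π√(LC)). Because f scales as 1/√C, covering 10GHz to 16GHz requires a total tank capacitance ratio of (16/10)^2 = 2.56. The sketch below assumes a hypothetical 0.5nH tank inductance (not given in the paper) and a divide-by-2 CML divider, which would place the 6GHz carrier at a 12GHz VCO setting; both values are illustrative assumptions.

```python
import math

F_MIN, F_MAX = 10e9, 16e9   # VCO coarse-tuning range from the text (Hz)
L_TANK = 0.5e-9             # hypothetical tank inductance (H), not from the paper
DIVIDE_RATIO = 2            # assumed CML divider ratio


def tank_capacitance(freq_hz: float, l_henry: float) -> float:
    """Total capacitance that resonates with l_henry at freq_hz."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * l_henry)


c_max = tank_capacitance(F_MIN, L_TANK)  # lowest frequency needs the largest C
c_min = tank_capacitance(F_MAX, L_TANK)
print(f"C range: {c_min * 1e15:.0f} fF to {c_max * 1e15:.0f} fF, "
      f"ratio {c_max / c_min:.2f}")
print(f"Divided carrier range: {F_MIN / DIVIDE_RATIO / 1e9:.0f} "
      f"to {F_MAX / DIVIDE_RATIO / 1e9:.0f} GHz")
print(f"VCO setting for a 6 GHz carrier: {6e9 * DIVIDE_RATIO / 1e9:.0f} GHz")
```

The capacitance ratio depends only on the frequency range, so it holds for any tank inductance; the absolute capacitances scale with the assumed L_TANK.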

After passing through two amplifier paths with calibrated buffers, the original quadrature carriers are duplicated into a half-gain copy and a unit-gain copy. Using parallel amplifiers greatly reduces the phase shift caused by feedthrough.
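
One way to see why a unit-gain copy and a half-gain copy suffice for PAM4: if each copy is added with a sign of +1 or -1, the four sums ±1 ± 0.5 give the evenly spaced levels -1.5, -0.5, +0.5 and +1.5 (in units of the carrier amplitude). The sketch below is an idealized amplitude model under that assumption; in the circuit the signs are set by the switch-controlled CAMs, and the bit-to-sign assignment in `pam4_level` is hypothetical.

```python
from itertools import product

UNIT_GAIN = 1.0  # normalized amplitude of the unit-gain copy
HALF_GAIN = 0.5  # normalized amplitude of the half-gain copy


def pam4_level(msb: int, lsb: int) -> float:
    """Idealized PAM4 amplitude (hypothetical assignment): the MSB sets the
    sign of the unit-gain copy and the LSB the sign of the half-gain copy."""
    s_unit = 1 if msb else -1
    s_half = 1 if lsb else -1
    return s_unit * UNIT_GAIN + s_half * HALF_GAIN


levels = sorted(pam4_level(m, l) for m, l in product((0, 1), repeat=2))
spacings = [b - a for a, b in zip(levels, levels[1:])]
print(levels, spacings)  # [-1.5, -0.5, 0.5, 1.5] [1.0, 1.0, 1.0]
```

With this assignment the bit-to-level map is natural binary; a Gray-coded map only requires driving the half-gain sign with the XOR of the two input bits.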

Fig.3a shows the SF (source follower) based CAM. In this circuit, the duplicated carriers are loaded onto common-mode voltages through switch-controlled SFs. The SF-based common-mode amplitude modulator achieves better area efficiency than the reported RF modulators in Fig.3b<sup>[3]</sup> and Fig.3c. In addition, using input replicas with different amplitudes mitigates concerns about nonlinearity and parasitic effects in practice.

Figure 2: Architecture of the proposed TSI and its data transmission path



Figure 3: (a) Proposed source-follower-based CAM; (b) transformer-based CAM; (c) conventional RF-DAC

Following the encoding combinations shown in Fig.4, the codes translated from the PRBS-7 stream control 8 CAMs in the I-path and Q-path PAM4 modulators. The pre-modulated waves generated by the CAMs are finally delivered to a power-transistor complex consisting of power transistors and compensation capacitors. As shown in Fig.4, a single-path PAM4 modulator is composed of four CAMs and a power-transistor complex, and the proposed QAM transmitter contains two such PAM4 modulators. The signal is transmitted by a symmetric differential inductor. The TX inductor is implemented with a standard RF inductor from the process library, whose self-resonant frequency and inductance are 18GHz and 2.5nH, respectively. According to 3D full-wave simulation, the inductance is stable across the band, varying by only about 0.1nH.
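
The inductor figures above allow a quick resonance estimate. Treating the 18GHz self-resonant frequency as a single lumped parasitic capacitance across the 2.5nH inductance gives C = 1/((2π·f)^2·L) ≈ 31fF. If, as an assumption, the TX tank is tuned to the 6GHz carrier, the total capacitance needed is about 281fF, so roughly 250fF would have to come from the compensation capacitors and transistor parasitics. This single-capacitor model is a simplification and not the 9-element model used in the paper.

```python
import math

L_TX = 2.5e-9     # TX inductance from the text (H)
F_SRF = 18e9      # self-resonant frequency from the text (Hz)
F_CARRIER = 6e9   # carrier from the text (Hz); tank tuned here by assumption


def resonant_capacitance(freq_hz: float, l_henry: float) -> float:
    """Capacitance that resonates with l_henry at freq_hz: C = 1/((2*pi*f)^2 * L)."""
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * l_henry)


c_parasitic = resonant_capacitance(F_SRF, L_TX)   # lumped self-capacitance
c_total = resonant_capacitance(F_CARRIER, L_TX)   # total C for 6 GHz resonance
c_extra = c_total - c_parasitic                   # to be supplied externally

# prints: parasitic 31.3 fF, total 281.4 fF, extra 250.2 fF
print(f"parasitic {c_parasitic * 1e15:.1f} fF, total {c_total * 1e15:.1f} fF, "
      f"extra {c_extra * 1e15:.1f} fF")
```

Because C scales as 1/f^2, the total capacitance at 6GHz is exactly (18/6)^2 = 9 times the lumped parasitic value.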

Unlike the RF-DAC, which is common in millimeter-wave designs, the proposed transmitter avoids the performance degradation caused by cascaded switch transistors. In addition, the voltage headroom of the TX transistors and the saturated output power are increased significantly. The SF-based CAM eliminates area-consuming components such as spiral transformers, at the cost of an inherent 3dB insertion loss introduced by directly combining the I/Q path signals. The maximum differential peak-to-peak output voltage across the two terminals of the inductor is 1.1V.



Figure 4: TSI 16-QAM transmitter with CAM and its encoding scheme of control signal
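
Fig.4 defines the actual control-code table; since it is not reproduced in the text, the sketch below uses a standard Gray-coded 16-QAM map as a stand-in. Two bits select the I-path PAM4 level and two bits select the Q-path level, so horizontally or vertically adjacent constellation points differ in one bit. The level set ±0.5, ±1.5 and the functions `qam16_map` and `qam16_demap` are hypothetical illustrations.

```python
from typing import Dict, Tuple

# Gray-coded PAM4 (assumed): adjacent amplitude levels differ in exactly one bit.
GRAY_PAM4: Dict[Tuple[int, int], float] = {
    (0, 0): -1.5, (0, 1): -0.5, (1, 1): 0.5, (1, 0): 1.5,
}
INV_GRAY_PAM4: Dict[float, Tuple[int, int]] = {
    level: bits for bits, level in GRAY_PAM4.items()
}


def qam16_map(nibble: Tuple[int, int, int, int]) -> complex:
    """Map 4 bits to a 16-QAM point: bits 0-1 -> I level, bits 2-3 -> Q level."""
    b0, b1, b2, b3 = nibble
    return complex(GRAY_PAM4[(b0, b1)], GRAY_PAM4[(b2, b3)])


def qam16_demap(symbol: complex) -> Tuple[int, int, int, int]:
    """Nearest-level decision on each axis, then inverse Gray lookup."""
    def nearest(x: float) -> float:
        return min(INV_GRAY_PAM4, key=lambda lvl: abs(lvl - x))

    i_bits = INV_GRAY_PAM4[nearest(symbol.real)]
    q_bits = INV_GRAY_PAM4[nearest(symbol.imag)]
    return i_bits + q_bits
```

Gray coding is a common choice because the most likely symbol error, a jump to an adjacent point, then corrupts only one of the four bits.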

#### Receiver

As shown in Fig.2, the receiver follows a general RF front-end architecture consisting of a PI-based delay line, an LNA, a mixer and an LPF, but it is optimized for low power and small silicon area. A CG (common-gate) LNA, shown in Fig.5, is therefore employed to obtain broadband RF characteristics. The Gm-boost technique is used in the LNA to save both area and energy: the equivalent Gm of the composite transistor reaches 20mS while drawing only 1.1mA from the 1.2V supply, and the input maintains excellent impedance matching to $50\Omega$ from 100MHz to 20GHz. In addition, capacitive cross-coupling is used to obtain extra gain without additional power consumption. No auxiliary feedback circuits are needed apart from a voltage bias. The simulated gain and S<sub>11</sub> of the proposed LNA versus frequency are plotted in Fig.5 and show the desired performance.
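
For a common-gate stage the input resistance is approximately 1/G<sub>m</sub> (ignoring body effect and channel-length modulation), so the reported 20mS equivalent G<sub>m</sub> is exactly what a 50Ω match requires. The sketch below also illustrates the benefit of boosting under the textbook model G<sub>m</sub> = (1 + A)·g<sub>m</sub>, where A is the boost gain (A = 1 is the ideal value for capacitor cross-coupling); the A values swept here are illustrative, not the design's.

```python
R_SOURCE = 50.0            # port impedance from the text (ohm)
GM_EQ = 20e-3              # equivalent Gm reported in the text (S)
I_DD, V_DD = 1.1e-3, 1.2   # LNA bias current (A) and supply (V) from the text


def cg_input_resistance(gm_eq: float) -> float:
    """First-order common-gate input resistance, Rin ~= 1/Gm."""
    return 1.0 / gm_eq


def device_gm_needed(gm_eq: float, boost_gain: float) -> float:
    """Device gm needed when Gm = (1 + A) * gm (textbook Gm-boost model)."""
    return gm_eq / (1.0 + boost_gain)


print(f"Rin = {cg_input_resistance(GM_EQ):.1f} ohm (target {R_SOURCE:.0f} ohm)")
print(f"LNA power = {I_DD * V_DD * 1e3:.2f} mW")
for a in (0.0, 0.5, 1.0):  # illustrative boost gains
    print(f"A = {a:.1f}: device gm = {device_gm_needed(GM_EQ, a) * 1e3:.1f} mS")
```

Under this model, an ideal boost of A = 1 halves the device g<sub>m</sub> needed for the same 50Ω match, which allows a smaller bias current or smaller transistors.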

Besides the LNA, a double-balanced passive mixer consisting of a Gm stage, the main mixer and a TIA (transimpedance amplifier) is required to down-convert the received RF signal to baseband. To enhance the drive strength of the proposed LNA, a DHVB<sup>[4]</sup> buffer serves as the Gm stage, converting the LNA output voltage into a current. After flowing through the main mixer, this current is delivered to the TIA, which converts it into a baseband voltage. Meanwhile, feedback circuits inside the TIA shift the baseband voltage to a calibrated common-mode level. The baseband signal, now at the desired input common-mode level, is then delivered to the super-source-follower-based LPF<sup>[5]</sup>. Based on the Nyquist criterion, the LPF corner frequency is set to 2.5GHz.
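
The down-conversion principle can be illustrated with an idealized quadrature demodulator: a 16-QAM passband signal s(t) = I·cos(ωt) - Q·sin(ωt) is multiplied by 2cos(ωt) and -2sin(ωt), and the 2ω terms are then removed. The sketch below assumes a 6GHz carrier and 1GBaud symbols (4Gb/s at 4 bits per symbol, so each symbol spans exactly six carrier cycles) and replaces the TIA and LPF with an ideal per-symbol average (integrate-and-dump), which cancels the 2ω terms exactly over whole carrier cycles. It models the signal math only, not the passive-mixer circuit.

```python
import math
from typing import List

F_CARRIER = 6e9          # carrier from the text (Hz)
SYMBOL_RATE = 1e9        # assumed: 4 Gb/s / 4 bits per 16-QAM symbol (Baud)
SAMPLES_PER_CYCLE = 16   # simulation oversampling (assumption)
CYCLES_PER_SYMBOL = int(F_CARRIER / SYMBOL_RATE)  # 6 carrier cycles per symbol
SAMPLES_PER_SYMBOL = SAMPLES_PER_CYCLE * CYCLES_PER_SYMBOL


def modulate(symbols: List[complex]) -> List[float]:
    """Sampled passband s = I*cos(wt) - Q*sin(wt)."""
    out = []
    for k, sym in enumerate(symbols):
        for n in range(SAMPLES_PER_SYMBOL):
            phase = 2 * math.pi * (k * SAMPLES_PER_SYMBOL + n) / SAMPLES_PER_CYCLE
            out.append(sym.real * math.cos(phase) - sym.imag * math.sin(phase))
    return out


def demodulate(samples: List[float]) -> List[complex]:
    """Mix with 2cos / -2sin LOs, then average each symbol (ideal LPF stand-in)."""
    symbols = []
    for start in range(0, len(samples), SAMPLES_PER_SYMBOL):
        i_acc = q_acc = 0.0
        for n in range(start, start + SAMPLES_PER_SYMBOL):
            phase = 2 * math.pi * n / SAMPLES_PER_CYCLE
            i_acc += samples[n] * 2 * math.cos(phase)
            q_acc += samples[n] * -2 * math.sin(phase)
        symbols.append(complex(i_acc, q_acc) / SAMPLES_PER_SYMBOL)
    return symbols


levels = (-1.5, -0.5, 0.5, 1.5)
tx = [complex(i, q) for i in levels for q in levels]  # all 16 points
rx = demodulate(modulate(tx))
print(max(abs(r - t) for r, t in zip(rx, tx)))
```

The printed error is at floating-point rounding level because the averaging window covers an integer number of carrier cycles; a real LPF only attenuates the 2ω terms, which is one reason its corner frequency matters.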



Figure 5: Ultralow power LNA and corresponding merits

# POST SIMULATION RESULT

The proposed TSI is designed in a TSMC 55nm CMOS process. The I-path output signal is shown in Fig.6. The BER (bit error rate) is estimated at 10<sup>-8</sup>. The layout, including the core circuits, auxiliary bias circuits and I/O, and the detailed power breakdown are depicted in Fig.7. According to Momentum simulation, the coupling coefficient is 0.005, which corresponds to a separation of nearly 300um between the TX and RX coils. Finally, the performance of this TSI and of previously published inter-tier inductive-coupling data transceivers is summarized in Table I.
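
For context on the 10<sup>-8</sup> BER, the textbook AWGN approximation for Gray-coded square M-QAM, BER ≈ (4/log2 M)(1 - 1/√M)·Q(√(3·log2 M·Eb/N0/(M - 1))), reduces for M = 16 to 0.75·Q(√(0.8·Eb/N0)). The sketch inverts it by bisection to find the Eb/N0 that a pure-AWGN 16-QAM link would need. This is an idealized reference point (no ISI, phase noise or coupling loss), not the method the authors used to estimate the BER.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_16qam(ebn0_linear: float) -> float:
    """Approximate BER of Gray-coded 16-QAM in AWGN."""
    return 0.75 * q_function(math.sqrt(0.8 * ebn0_linear))


def ebn0_for_ber(target: float, lo_db: float = 0.0, hi_db: float = 30.0) -> float:
    """Bisection on Eb/N0 in dB; BER decreases monotonically with Eb/N0."""
    for _ in range(100):
        mid = 0.5 * (lo_db + hi_db)
        if ber_16qam(10 ** (mid / 10)) > target:
            lo_db = mid  # still too many errors: more Eb/N0 is needed
        else:
            hi_db = mid
    return hi_db


ebn0_db = ebn0_for_ber(1e-8)
print(f"Eb/N0 for BER 1e-8: {ebn0_db:.1f} dB")
```

The result is roughly 16dB, which gives a sense of the SNR margin an ideal 16-QAM receiver would need at this BER.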



Figure 6: (a) Transmitted waveform; (b) output baseband signal; (c) constellation; (d) eye diagram of the output signal



Figure 7: (a) Layout of the TSI; (b) detailed power consumption

TABLE I. PERFORMANCE AND COMPARISON WITH STATE-OF-THE-ART

| Metric     | JSSC 2019<sup>[1]</sup> | VLSI 2020<sup>[6]</sup> | This Work           |
|------------|-------------------------|-------------------------|---------------------|
| Modulation | NRZ                     | BPSK                    | 16-QAM              |
| Process    | 40nm                    | 65nm                    | 55nm                |
| Data Rate  | 3.6Gbps                 | 1.2Gbps                 | 4Gbps               |
| Distance   | 10~80um                 | 80um                    | 300um               |
| BER        | 10<sup>-12</sup>        | 10<sup>-6</sup>         | 10<sup>-8</sup>     |
| Area       | 0.04mm<sup>2</sup>      | 0.06mm<sup>2</sup>      | 0.11mm<sup>2</sup>  |
| Efficiency | 2pJ/b                   |                         | 3.6pJ/b             |
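
The efficiency row is power divided by data rate. Using the 14.72mW and 4Gb/s from the abstract gives about 3.7pJ/b (Table I lists 3.6pJ/b), and the inverse relation gives the power implied by a reference figure, e.g. 2pJ/b at 3.6Gb/s corresponds to 7.2mW for [1]. The helper names below are illustrative.

```python
def energy_per_bit_pj(power_w: float, data_rate_bps: float) -> float:
    """Energy efficiency in pJ/b: E_b = P / R."""
    return power_w / data_rate_bps * 1e12


def implied_power_mw(efficiency_pj_per_bit: float, data_rate_bps: float) -> float:
    """Power in mW implied by an efficiency figure: P = E_b * R."""
    return efficiency_pj_per_bit * 1e-12 * data_rate_bps * 1e3


print(f"This work: {energy_per_bit_pj(14.72e-3, 4e9):.2f} pJ/b")
print(f"JSSC 2019 [1] implied power: {implied_power_mw(2.0, 3.6e9):.1f} mW")
```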

# CONCLUSION

An inductive-coupling inter-tier TSI for memory cubes is proposed. The proposed wireless TSI achieves a maximum data rate of 4Gb/s/lane at a transfer distance of 300um while maintaining a favorable efficiency of 3.6pJ/b. Compared with reported works, this study shows the possibility of increasing the number of stacked layers to as many as 10 in wireless-TSI-based 3D-stacked memory cubes, without suffering the low yield of the ultra-thin polishing process otherwise required for the stacked chips.

### ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (No.U20A20220), the Major Scientific Research Project of Zhejiang Lab (No.2019KC0AD02) and the Major Scientific Research Project of Zhejiang Province (No.2022C01048).

## REFERENCES

- [1] K. Ueyoshi et al., *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 186-196, Jan. 2019.
- [2] C. Thakkar et al., *IEEE Journal of Solid-State Circuits*, vol. 54, no. 12, pp. 3565-3576, Dec. 2019.
- [3] X. Meng et al., *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 67, no. 6, pp. 1835-1845, June 2020.
- [4] D. Im et al., *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 11, pp. 2633-2642, Nov. 2009.
- [5] M. De Matteis et al., *IEEE Journal of Solid-State Circuits*, vol. 50, no. 7, pp. 1516-1524, July 2015.
- [6] B. J. Fletcher et al., *2020 IEEE Symposium on VLSI Circuits*, 2020, pp. 1-2.