Regional Information Center for Science and Technology
Iran Information and Communication Technology Quarterly
2717-0411 16 59 2024 6 18
Multi-level ternary quantization for improving sparsity and computation in embedded deep neural networks
Multi-digit ternary quantization for improving sparsity and computation of deep neural networks in embedded applications (Persian title, translated)
125 143 fa
Hosna Manavi Mofrad, University of Tehran
Seyed Ali Ansarmohammadi, University of Tehran
Mostafa Ersali Salehi Nasab, Tehran
2023 1 3

Deep neural networks (DNNs) have attracted great interest due to their success in various applications. However, computational complexity and memory size are considered the main obstacles to implementing such models on embedded devices with limited memory and computational resources. Network compression techniques can overcome these challenges, and quantization and pruning are the most important among them. One well-known quantization method for DNNs is multi-level binary quantization, which not only exploits simple bit-wise logical operations but also reduces the accuracy gap between binary neural networks and full-precision DNNs. Because multi-level binary quantization cannot represent the value zero, it does not take advantage of sparsity. On the other hand, it has been shown that DNNs are sparse, and that pruning the parameters of a DNN reduces the amount of data stored in memory while also speeding up computation. In this paper, we propose a pruning and quantization-aware training method for multi-level ternary quantization that takes advantage of both multi-level quantization and data sparsity. In addition to increasing the accuracy of the network compared to multi-level binary networks, it gives the network the ability to be sparse. To reduce memory size and computational complexity, we increase the sparsity of the quantized network by pruning for as long as the accuracy loss remains negligible. The results show that the potential computation speedup of our model at bit-level and word-level sparsity can be increased by 15x and 45x, respectively, compared to the baseline multi-level binary networks.

Abstract (translated from Persian): Deep neural networks have attracted extraordinary interest owing to their success in a wide range of applications. However, computational complexity and memory footprint are regarded as the main obstacles to deploying them on many embedded devices. Quantization and pruning are among the most important optimization methods proposed in recent years to overcome these obstacles. One well-known quantization method represents numbers in multi-digit binary form, which, in addition to exploiting bit-wise computation, reduces the accuracy loss of binary networks relative to full-precision networks. However, because this representation cannot express the value zero, it forfeits the benefits of data sparsity. On the other hand, deep neural networks are inherently sparse, and sparsifying their parameters reduces the volume of data in memory and, with suitable techniques, can also accelerate computation. In this paper, we aim to benefit from both multi-digit quantization and data sparsity. To this end, we propose a multi-digit ternary quantization for representing numbers that, in addition to increasing network accuracy relative to the multi-digit binary network, makes the network prunable. We then increase the sparsity of the quantized network by pruning. The results show that the potential speedup of our network at the bit level and the word level can reach 15x and 45x, respectively, compared with the baseline multi-digit binary network.
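The abstract does not spell out the quantizer itself, so the following is only a minimal sketch of what a multi-level ternary representation could look like. It assumes a greedy residual scheme in which each weight is approximated as a sum of scaled digits in {-1, 0, +1}. The function names, the `threshold_ratio` heuristic of 0.7, and the definitions of bit-level sparsity (fraction of zero digits) and word-level sparsity (fraction of weights whose digits are all zero) are illustrative assumptions, not the authors' method.

```python
import random


def ternary_digit(residual, threshold_ratio=0.7):
    """Quantize a list of floats to alpha * t with t in {-1, 0, +1}.

    threshold_ratio is an assumed heuristic, not a value from the paper.
    """
    mean_abs = sum(abs(r) for r in residual) / len(residual)
    delta = threshold_ratio * mean_abs
    digits = [0 if abs(r) <= delta else (1 if r > 0 else -1) for r in residual]
    kept = [abs(r) for r, t in zip(residual, digits) if t != 0]
    # Scale factor: mean magnitude of the values that were not zeroed out.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, digits


def multilevel_ternary(weights, levels=2):
    """Greedy residual quantization: w is approximated by sum_i alpha_i * t_i."""
    residual = list(weights)
    alphas, planes = [], []
    for _ in range(levels):
        alpha, digits = ternary_digit(residual)
        alphas.append(alpha)
        planes.append(digits)
        residual = [r - alpha * t for r, t in zip(residual, digits)]
    return alphas, planes


def dequantize(alphas, planes):
    """Rebuild approximate weights from the scales and digit planes."""
    n = len(planes[0])
    return [sum(a * p[j] for a, p in zip(alphas, planes)) for j in range(n)]


def sparsity(planes):
    """Return (bit-level, word-level) sparsity under the assumed definitions."""
    n = len(planes[0])
    bit_level = sum(t == 0 for p in planes for t in p) / (len(planes) * n)
    word_level = sum(all(p[j] == 0 for p in planes) for j in range(n)) / n
    return bit_level, word_level


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic weights
alphas, planes = multilevel_ternary(weights, levels=2)
approx = dequantize(alphas, planes)
bit_sp, word_sp = sparsity(planes)
print("bit-level sparsity:", bit_sp, "word-level sparsity:", word_sp)
```

A multi-level binary scheme would restrict every digit to {-1, +1}, so both sparsity figures would be zero by construction; allowing the digit 0 is what makes pruning visible at the digit level in this sketch.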

http://jour.aicti.ir/en/Article/Download/40662