Download the book Transformers in Action

Book title: Transformers in Action
Author: Nicole Koenigstein
Field: Deep learning
Publication year: 2025
Pages: 441
Original language: English
File type: PDF
File size: 6.62 MB


When I first started using transformers in 2019, I was immediately hooked. Two years later, I built my own deep learning architecture using attention. That work was later published in a Springer Nature journal, and the experience convinced me that transformers would be transformative, literally speaking.

What struck me most was not their complexity but their simplicity. The mechanism that unlocked the transformer revolution is not complex mathematics. It’s built on linear algebra fundamentals: multiplying matrices, normalizing with softmax, and combining vectors with weighted sums. It’s remarkable that, starting from dot products and probabilities, we arrived at systems with billions of parameters that can reason across text, images, audio, and video. That’s the story of transformers: one elegant mechanism, applied at scale, reshaping the landscape of AI. This book focuses on that story, from the origins of transformers to how we can now use large language models (LLMs) and multimodal systems in practice.

The elegance lies in how those simple steps are arranged and combined. Each token is projected into a query, a key, and a value. The model computes dot products between queries and keys to decide relevance, applies softmax to turn those scores into probabilities, and uses those probabilities to form weighted sums over the values. If you think about it, this is not so different from what happens during text generation itself. When a model predicts the next token, it once again applies softmax to produce probabilities and then samples from them to decide what comes next. Both mechanisms rely on basic probability. That’s why you don’t need to be a mathematician to understand transformers. Their foundations are accessible, and the real wonder is how much power emerges from such simple operations.

The pace of innovation with this architecture is breathtaking. The 2017 paper “Attention Is All You Need” first applied transformers to translation tasks. BERT showed the power of pretraining and fine-tuning. What started with translation has since scaled into billion-parameter LLMs, with ChatGPT bringing transformers into everyday awareness and models like DeepSeek pushing efficiency and scale to new frontiers. With continuous innovations like FlashAttention, all those matrix multiplications have become faster and more efficient.

So why did I decide to write this book? When I first began studying machine learning and deep learning, most of the books I encountered relied on toy examples. They were fine for illustrating concepts, but those same examples often broke down when applied to real-world data. I wanted to approach the subject differently and bring my passion for teaching onto paper. To help the next generation of data scientists and machine learning engineers, I draw on my experience to give them not only a solid foundation but also the hands-on guidance needed to make transformers work in practice. Throughout this book, you’ll follow both the evolution of transformers and my personal journey with them through LLMs, while building your own path and learning how to move forward in this field.

The book begins with the foundations of attention and then traces how transformers evolved into the generative and multimodal systems we know today. Along the way, it explores efficiency, scaling strategies, and the responsibilities that come with deploying such powerful models.

Transformers in Action is a comprehensive guide to understanding and applying transformer models in the language and multimodal space. These models are foundational to modern AI systems such as ChatGPT and Gemini. The book aims to give you a solid foundation for using these models in your own projects, starting with the core concepts of transformers and then moving to practical and more advanced applications such as multimodal retrieval systems. You will learn why transformers are designed the way they are and how they work, gaining both the theoretical understanding and the hands-on skills to use them effectively. You’ll also see when to use small language models (SLMs) and when architectural choices such as encoder-only or decoder-only designs make more sense.

This book is for data scientists and machine learning engineers who want to learn how to build and apply transformer-based models for language and multimodal tasks. The goal is to equip you with the essential knowledge to establish a strong foundation, so you can confidently move on to advanced models and approaches.
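The attention step described in the blurb (project each token into a query, key, and value; score queries against keys with dot products; softmax the scores; take a weighted sum of the values) and the next-token sampling step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not code from the book: it uses a single attention head, toy 2-dimensional vectors, and hypothetical helper names (`softmax`, `self_attention`, `sample_next_token`). It also divides the scores by √d_k, as the 2017 “Attention Is All You Need” paper does, and adds a `temperature` parameter to the sampler; neither detail is mentioned in the blurb. Both steps reuse the same `softmax`, which is the point the passage makes.

```python
import math
import random


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix, given as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over a list of token vectors (hypothetical helper)."""
    # 1. Project every token into a query, a key, and a value.
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    d_k = len(keys[0])
    d_v = len(values[0])

    outputs = []
    for q in queries:
        # 2. Relevance: dot product of this query with every key, scaled by sqrt(d_k)
        #    (the scaling is an assumption taken from the 2017 paper).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        # 3. Softmax turns the scores into attention weights.
        weights = softmax(scores)
        # 4. Output: weighted sum over the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(d_v)])
    return outputs


def sample_next_token(logits, temperature=1.0, rng=random):
    """Softmax over vocabulary logits, then sample one token id (hypothetical helper)."""
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
    identity = [[1.0, 0.0], [0.0, 1.0]]             # toy projection matrices
    print(self_attention(tokens, identity, identity, identity))

    logits = [2.0, 1.0, 0.1]                         # toy scores over a 3-token vocabulary
    print(sample_next_token(logits, rng=random.Random(0)))
```

With identity projection matrices, each output row is a weighted average of the input tokens, so its entries stay between 0 and 1. Lowering the temperature sharpens the softmax, and a very low value makes sampling effectively greedy.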

You can download this book for free from the link below:

Download: Transformers in Action
