{"id":21934,"date":"2024-06-21T13:48:55","date_gmt":"2024-06-21T13:48:55","guid":{"rendered":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/?p=21934"},"modified":"2024-06-26T05:53:18","modified_gmt":"2024-06-26T05:53:18","slug":"how-chat-gpt-works-history-model-architectures","status":"publish","type":"post","link":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/","title":{"rendered":"How Chat GPT works &amp; a brief history of model architectures"},"content":{"rendered":"\n<p>If you work in tech or are even remotely tech-curious, you\u2019ve probably tried to understand how ChatGPT works. What\u2019s going on under the hood that allows for this kind of sorcery, and why are we experiencing it only now? How do text-to-image models really work? How do the latest text-to-video models do such a great job of modeling the physics of the real world?<\/p>\n\n\n\n<p>AI has been a buzzword for decades, so why is everyone obsessing over it now? I\u2019ve spent a considerable amount of time thinking about and researching the answers to some of these questions and recently decided to document my learnings.<br><br>This article is about understanding how foundational models like GPT work. In particular, I try to explain the intuition (and history) of the model architectures that underpin such models. With that context, let&#8217;s dive in!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Neural Networks<\/h2>\n\n\n\n<p>Before we get to GPT, it&#8217;s important first to understand how neural networks work. You can think of neural networks as algorithms that enable a machine to recognize patterns and accordingly \u2018predict\u2019 outcomes. It&#8217;s very similar to how your brain \u2018predicts\u2019 that your friend will be late for your party tonight because he was late the previous 20 times. 
By definition, each \u2018prediction\u2019 is a probabilistic guess, which can often deviate from the ground truth, just as your friend could surprise you by coming early to the party!&nbsp;<\/p>\n\n\n\n<p>Perhaps the simplest example of a neural network is a linear regression &#8211; a simple way to predict variable Y, given an input variable X and some prior data on how X maps to Y. As an example, consider predicting the price of a house (output variable Y) using its square footage (input variable X), given a prior set of 10,000 houses with known square footages and prices (training data).&nbsp;<\/p>\n\n\n\n<p>By analyzing the training data, the model understands the relationship between price and area, and over time \u2018learns\u2019 to predict the price of a new house just by receiving its area. This simple prediction model is known as a perceptron and is the most fundamental unit of a neural network.&nbsp;<\/p>\n\n\n\n<p>You can see how easy it is to dial up the complexity of a simple perceptron: going back to our house example, we could include variables like zip code, number of bedrooms, wealth of neighborhood, and square footage as inputs into the model, which in turn \u2018compute\u2019 a different set of variables like quality of schools, pedestrian friendliness, and the size of family that can be accommodated, which ultimately compute the price of the house.&nbsp;<\/p>\n\n\n\n<p>In other words, each layer of variables \u2018informs\u2019 the next layer of slightly abstracted variables (based on complex math which we won\u2019t get into just yet), which in turn informs the next layer (based on some more complex math) and so on, until we reach the final output layer. 
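<\/p>\n\n\n\n<p>To make the perceptron concrete, here is a minimal sketch of the house-price example as a single linear unit trained with gradient descent. The data points, learning rate, and epoch count below are made up purely for illustration:<\/p>\n\n\n\n

```python
# A single 'perceptron' that learns price = w * sqft + b from toy data.
# The data and hyperparameters below are invented for illustration only.
sqft  = [1000.0, 1500.0, 2000.0, 2500.0]   # input variable X
price = [200.0, 300.0, 400.0, 500.0]       # output variable Y, in $1000s

w, b = 0.0, 0.0   # the model starts knowing nothing
lr = 1e-7         # learning rate: how big a nudge each mistake causes

for _ in range(10_000):                    # repeatedly 'learn' from the data
    for x, y in zip(sqft, price):
        err = (w * x + b) - y              # how wrong the current guess is
        w -= lr * err * x                  # nudge the weight toward the truth
        b -= lr * err

print(round(w * 1800 + b))                 # price of a new 1,800 sqft house -> 360
```

\n\n\n\n<p>A real neural network stacks thousands of such units into layers and learns all their weights simultaneously, but the \u2018guess, measure error, nudge weights\u2019 loop is the same.<\/p>\n\n\n\n<p>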
The more intermediate layers there are, the more nuanced the end output.&nbsp;<\/p>\n\n\n\n<p>Now imagine a network with thousands of inputs and hundreds of intermediate layers, which ultimately work in sequence to compute an end output; this is known as a multi-layer perceptron &#8211; which is just technical speak for a huge, sophisticated prediction model. You can stitch together multiple multi-layer perceptrons in interesting ways, creating a highly nuanced neural network.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1406\" height=\"756\" src=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-1-jpeg.webp\" alt=\"\" class=\"wp-image-21937\" loading=\"lazy\" srcset=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-1-jpeg.webp 1406w, https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-1-768x413.webp 768w\" sizes=\"auto, (max-width: 1406px) 100vw, 1406px\" \/><\/figure>\n\n\n\n<p>Neural networks are incredibly powerful and allow for a wide variety of \u2018predictions\u2019. In our house example, while the prediction being made was price, i.e. 
a numerical value, we could also use a different set of neural networks to predict words, sentiments, shapes within images, etc.<\/p>\n\n\n\n<p>While we\u2019ve understood the power of neural networks for many decades, the manner in which we created, stitched together, and trained these networks remained specific to certain use cases. Hence, each field of AI &#8211; natural language processing, image recognition, language translation &#8211; developed its own vocabulary and inevitably became a distinct discipline in and of itself.<\/p>\n\n\n\n<p>This was the state of things until 2017, which was when a few smart engineers from Google came together to create a new architecture of neural networks that (unbeknownst to them at the time) became the underpinning of all fields within AI &#8211; a unifying architecture that was computationally efficient and surprisingly generalizable across domains.<\/p>\n\n\n\n<p>In order to further our understanding of how this new architecture works, we first need to zoom into one field of AI known as sequence-to-sequence modeling.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Recurrent Neural Networks (or RNNs)<\/strong><\/h3>\n\n\n\n<p>The earlier example of house price prediction is a fairly straightforward and static model &#8211; the number of inputs is fixed in advance, the inputs are processed all at once, and the output is usually just a single value. However, what if we want to model a \u2018sequence\u2019 of input data that has a specific order to it?<\/p>\n\n\n\n<p>As an example, consider the task of converting an incoming stream of audio into text in real time. Or classifying the sentiment of an ongoing Twitter thread. Or predicting the next word given a sequence of words. 
For each of these tasks, the order in which the data is processed matters, and each chunk of the data needs to understand the \u2018context\u2019 of all the preceding chunks.<\/p>\n\n\n\n<p>Specific neural networks, namely recurrent neural networks or RNNs, were designed to capture this temporal dependency across a sequence of input data.<\/p>\n\n\n\n<p>RNNs do this by capturing the context of each chunk in a sequence in a separate \u2018hidden state\u2019, which is updated with the context of each additional chunk as we move through the sequence. This sounds complicated, but allow me to explain using an example.<\/p>\n\n\n\n<p>The auto-complete suggestion you see on your keyboard while typing a WhatsApp message uses a version of an RNN behind the scenes. The \u2018context\u2019 of the words \u201chello\u201d, \u201chow\u201d and \u201care\u201d is passed through the RNN sequentially in order to predict the word \u201cyour\u201d.<\/p>\n\n\n\n<p>Subsequently, the context of \u201cyour\u201d and all the preceding words will be used to predict the next word, and so on. 
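<\/p>\n\n\n\n<p>The hidden-state recurrence can be sketched in a few lines. The dimensions and random weights below are placeholders &#8211; a real RNN would learn the weight matrices from data:<\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: each word is an 8-dim vector, the hidden state is 16-dim.
embed_dim, hidden_dim = 8, 16
W_xh = rng.normal(size=(hidden_dim, embed_dim)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden ('memory')

def rnn_step(h, x):
    # The new hidden state mixes the previous context (h) with the new chunk (x).
    return np.tanh(W_hh @ h + W_xh @ x)

# Process "hello how are" one chunk at a time; h carries the running context.
h = np.zeros(hidden_dim)
for word_vec in rng.normal(size=(3, embed_dim)):  # stand-ins for word embeddings
    h = rnn_step(h, word_vec)

# h now summarizes the whole prefix; feeding it into a multi-layer perceptron
# that scores every vocabulary word is what produces the next-word prediction.
print(h.shape)  # (16,)
```

\n\n\n\n<p>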
Note: the innovation of RNNs was the ability to capture sequentially dependent context; however, the actual prediction of the next word is done using a version of an already well-understood multi-layer perceptron, as explained in the previous section.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"884\" height=\"670\" src=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-2-jpeg.webp\" alt=\"\" class=\"wp-image-21938\" style=\"width:462px;height:auto\" loading=\"lazy\" srcset=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-2-jpeg.webp 884w, https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-2-768x582.webp 768w\" sizes=\"auto, (max-width: 884px) 100vw, 884px\" \/><\/figure><\/div>\n\n\n<p>Pretty neat, right? Not quite! RNNs have their limitations. Most notably, they struggle with long-range dependencies. A sentence like \u201cI was born and raised in France, but moved to New York 5 years ago and now live with my parents &amp; 3-year-old dog named Bruno. I speak fluent <em>_<\/em>\u201d will be difficult to complete, because the context of being born in France (which is crucial to predicting the last word) is at the very beginning of an extremely long sentence. 
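<\/p>\n\n\n\n<p>This long-range forgetting can be demonstrated numerically. In the toy experiment below (random, untrained weights; all sizes made up), we change only the very first input of a 100-step sequence and check how much the final hidden state moves:<\/p>\n\n\n\n

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 16
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # contractive recurrent weights

def final_state(first_input_scale):
    # Run 100 RNN steps; only the very first input differs between runs.
    h = np.zeros(hidden_dim)
    for t in range(100):
        x = np.full(hidden_dim, first_input_scale if t == 0 else 1.0)
        h = np.tanh(W_hh @ h + x)
    return h

# Perturb the first chunk heavily, then measure the effect 100 steps later.
drift = np.abs(final_state(1.0) - final_state(5.0)).max()
print(drift)  # vanishingly small: the first input has been all but forgotten
```

\n\n\n\n<p>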
Since RNNs process each chunk of the sequence one at a time, they suffer from \u2018contextual loss\u2019 by the time they reach the end of very long sequences.<\/p>\n\n\n\n<p>An instance of this limitation is highlighted in the image below &#8211; if you keep accepting the keyboard suggestion the next time you&#8217;re typing a WhatsApp message, you will very quickly realize that the sentence in aggregate does not make sense, even if certain parts of it do!<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"690\" height=\"674\" src=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/How-GPT-works-and-a-brief-history-of-model-architectures-3-jpeg.webp\" alt=\"\" class=\"wp-image-21939\" style=\"width:488px;height:auto\" loading=\"lazy\" \/><\/figure><\/div>\n\n\n<p>Beyond this contextual loss problem, RNNs can also be computationally slow and expensive, making them most useful for short-sequence modeling, like the example above, or simpler applications like Google Translate.<\/p>\n\n\n\n<p>Despite the limitations, RNNs captured the sequential modeling zeitgeist all through the early 2000s and 2010s. 
New flavors of it, like the LSTM, were adopted, but the broader principle of context preservation remained the same.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Transformer<\/strong><\/h3>\n\n\n\n<p>2017 was a watershed moment, principally because of the release of \u2018Attention Is All You Need\u2019 &#8211; a paper that delineated a new architecture for performing sequential modeling called the Transformer, which ended up trumping RNNs in many ways.<\/p>\n\n\n\n<p>The Transformer architecture was a departure from RNNs in the following way: instead of parsing context from one part of the sequence to another (the boring, old RNN way), Transformers were able to identify <strong>only<\/strong> the most relevant parts of a sentence, and subsequently use the specific context of those parts to inform the next-word prediction.<\/p>\n\n\n\n<p>How does this work? Transformers use a technique known as \u2018self-attention\u2019 to understand the relationship of each word in a sentence with all other words and assign each relationship a relevance score.<\/p>\n\n\n\n<p>Think of relationships as contextual dependencies in a sentence, e.g. in \u201cI was extremely hungry so I ordered a pizza\u201d, the words \u201cpizza\u201d and \u201chungry\u201d will likely have a deeper relationship than \u201cI\u201d and \u201cpizza\u201d. The deeper the relationship, the higher the relevance score or \u2018attention similarity\u2019. The relationships with the highest scores are subsequently used as \u2018contextual weights\u2019 in the neural network, to eventually output the most probable next word.<\/p>\n\n\n\n<p>The output word is then appended to the original input sentence and the cycle repeats itself (a process known as autoregression) until the model detects an end-of-sequence token.<\/p>\n\n\n\n<p>In other words, we use attention similarity to ground the next-word prediction in the parts of the input that are most relevant to the given sentence. 
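<\/p>\n\n\n\n<p>Here is a minimal numeric sketch of that scoring step. It omits the learned query\/key\/value projections and the multiple heads of a real Transformer, and the three word vectors are hand-picked (not learned) so that \u201chungry\u201d and \u201cpizza\u201d point in similar directions:<\/p>\n\n\n\n

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of word vectors X.

    Row i of the returned weight matrix says how much word i 'attends'
    to every other word; each row sums to 1.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise 'attention similarity'
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> contextual weights
    return weights @ X, weights                     # context-mixed vectors, scores

# Hand-picked 3-dim stand-ins for the words "I", "hungry", "pizza".
X = np.array([[1.0, 0.0, 0.0],   # "I"
              [0.0, 1.0, 0.9],   # "hungry"
              [0.0, 0.9, 1.0]])  # "pizza"

_, weights = self_attention(X)
# "hungry" (row 1) attends more to "pizza" (column 2) than to "I" (column 0).
print(weights[1, 2] > weights[1, 0])  # True
```

\n\n\n\n<p>In a real Transformer, each word vector is first projected into separate query, key, and value vectors, and many such \u2018heads\u2019 run in parallel &#8211; but the relevance-scoring idea is the same.<\/p>\n\n\n\n<p>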
Going back to our earlier example in the RNN section, the words \u201cspeak\u201d &amp; \u201cFrance\u201d will likely have a very high attention similarity, as will \u201cfluent\u201d &amp; \u201cspeak\u201d. These similarity scores tell the model where to focus, which is crucial for predicting the next word as \u201cFrench\u201d.<\/p>\n\n\n\n<p>The model also uses self-attention to identify the right context of words that could have multiple meanings in isolation. For example, \u201cmoney bank\u201d &amp; \u201criver bank\u201d mean two very different things, and hence knowing which \u201cbank\u201d to focus on becomes essential to predicting what word comes next.<\/p>\n\n\n\n<p>If you take away just one thing from this article, it should be this: the key innovation of Transformers was not that they allowed for a probabilistic prediction of the next word &#8211; this was a problem already solved by RNNs and neural networks more broadly.<\/p>\n\n\n\n<p><strong>The key innovation was figuring out which parts of the sentence to focus on via a mechanism known as self-attention, which ultimately informed an already well-understood prediction process.<\/strong><\/p>\n\n\n\n<p>Transformers were also created in a way that allowed for parallel processing of data, making them much more computationally efficient than their legacy RNN ancestors. This architectural design allowed models like GPT to be trained on vast amounts of publicly available internet data, which was a crucial step toward creating an application as versatile as ChatGPT.<\/p>\n\n\n\n<p>The Transformer architecture has proven to be remarkably resilient. It was conceptualized in 2017 and powers virtually all modern AI applications that you and I use today. 
More importantly, it has unified the vocabulary of the erstwhile distinct categories within AI and is the same architecture that is now being used in text-to-video and image generation, among many other use cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Summary<\/strong><\/h3>\n\n\n\n<p>Let&#8217;s summarize! Neural networks are the building blocks of machine learning and allow us to program machines to make predictions based on training data.&nbsp;<\/p>\n\n\n\n<p>Sequential modeling is one such class of predictions that processes sequences of data that have temporal dependencies across different parts of the sequence (like next-word prediction in a sentence).&nbsp;<\/p>\n\n\n\n<p>Up until 2017, RNNs were the predominant way of modeling sequences; however, they had their shortcomings, the biggest being contextual loss over extremely long sequences. Transformers overcame this by enabling the model to identify contextual dependencies across all parts of a sequence and zoom in on only those parts that are most relevant to predicting the next word.&nbsp;<\/p>\n\n\n\n<p>Transformers also proved to be computationally very efficient and generalizable across domains. Today, they are the core building block of foundational models like GPT &amp; DALL-E.<\/p>\n\n\n\n<p>While the Transformer has proven to be surprisingly resilient, there is ongoing research to create newer and better architectures. Most notably, an architecture named Mamba claims to plug some of the gaps of the Transformer architecture &amp; seems to be gaining a lot of popularity. This could potentially further improve how foundational models work.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-verse\"><em>This article was originally published on <a href=\"https:\/\/substack.com\/@bigideassimplified\" target=\"_blank\" rel=\"noreferrer noopener\">Big Ideas, Simplified<\/a>. <\/em><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Understanding how foundational models like GPT work. 
In particular, a dive into the intuition (and history) of the model architectures that underpin such models.<\/p>\n","protected":false},"author":12,"featured_media":21936,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[150],"tags":[9,105,138],"chapters":[],"class_list":["post-21934","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-build","tag-ai","tag-gen-ai"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How Chat GPT works &amp; a brief history of model architectures - BoomiAI<\/title>\n<meta name=\"description\" content=\"Understanding how foundational models like GPT work. A dive into the intuition (and history) of model architectures that underpin such models.\" \/>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Chat GPT works &amp; a brief history of model architectures - BoomiAI\" \/>\n<meta property=\"og:description\" content=\"Understanding how foundational models like GPT work. 
A dive into the intuition (and history) of model architectures that underpin such models.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/\" \/>\n<meta property=\"og:site_name\" content=\"BoomiAI\" \/>\n<meta property=\"article:published_time\" content=\"2024-06-21T13:48:55+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-26T05:53:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/stg-saasboomiorg-staging.kinsta.cloud\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1435\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Millusha\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/\",\"url\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/\",\"name\":\"How Chat GPT works &amp; a brief history of model architectures - 
BoomiAI\",\"isPartOf\":{\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp\",\"datePublished\":\"2024-06-21T13:48:55+00:00\",\"dateModified\":\"2024-06-26T05:53:18+00:00\",\"author\":{\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/0f783d66fcd14e685d6b02a880a37544\"},\"description\":\"Understanding how foundational models like GPT work. A dive into the intuition (and history) of model architectures that underpin such models.\",\"breadcrumb\":{\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage\",\"url\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp\",\"contentUrl\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp\",\"width\":2560,\"height\":1435},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\
":\"How Chat GPT works &amp; a brief history of model architectures\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#website\",\"url\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/\",\"name\":\"SaaSBoomi\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/0f783d66fcd14e685d6b02a880a37544\",\"name\":\"Millusha\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/99200151f000fc137fc6577ab87b93d78b0a29b88fcf03acfd32e1fe3fc796db?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/99200151f000fc137fc6577ab87b93d78b0a29b88fcf03acfd32e1fe3fc796db?s=96&d=mm&r=g\",\"caption\":\"Millusha\"},\"url\":\"https:\/\/dev.matsio.com\/matsio\/saasboomi\/author\/millusha\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How Chat GPT works &amp; a brief history of model architectures - BoomiAI","description":"Understanding how foundational models like GPT work. A dive into the intuition (and history) of model architectures that underpin such models.","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"How Chat GPT works &amp; a brief history of model architectures - BoomiAI","og_description":"Understanding how foundational models like GPT work. 
A dive into the intuition (and history) of model architectures that underpin such models.","og_url":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/","og_site_name":"BoomiAI","article_published_time":"2024-06-21T13:48:55+00:00","article_modified_time":"2024-06-26T05:53:18+00:00","og_image":[{"width":2560,"height":1435,"url":"https:\/\/stg-saasboomiorg-staging.kinsta.cloud\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp","type":"image\/webp"}],"author":"Millusha","twitter_card":"summary_large_image","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/","url":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/","name":"How Chat GPT works &amp; a brief history of model architectures - BoomiAI","isPartOf":{"@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#website"},"primaryImageOfPage":{"@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage"},"image":{"@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage"},"thumbnailUrl":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp","datePublished":"2024-06-21T13:48:55+00:00","dateModified":"2024-06-26T05:53:18+00:00","author":{"@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/0f783d66fcd14e685d6b02a880a37544"},"description":"Understanding how foundational models like GPT work. 
A dive into the intuition (and history) of model architectures that underpin such models.","breadcrumb":{"@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#primaryimage","url":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp","contentUrl":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-content\/uploads\/2024\/06\/Attention-Please--scaled.webp","width":2560,"height":1435},{"@type":"BreadcrumbList","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/saas\/ai\/how-gpt-works-history-model-architectures\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/"},{"@type":"ListItem","position":2,"name":"How Chat GPT works &amp; a brief history of model 
architectures"}]},{"@type":"WebSite","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#website","url":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/","name":"SaaSBoomi","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/0f783d66fcd14e685d6b02a880a37544","name":"Millusha","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/99200151f000fc137fc6577ab87b93d78b0a29b88fcf03acfd32e1fe3fc796db?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/99200151f000fc137fc6577ab87b93d78b0a29b88fcf03acfd32e1fe3fc796db?s=96&d=mm&r=g","caption":"Millusha"},"url":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/author\/millusha\/"}]}},"_links":{"self":[{"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/posts\/21934","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/comments?post=21934"}],"version-history":[{"count":7,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/posts\/21934\/revisions"}],"predecessor-version":[{"id":21964,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/posts\/21934\/revisions\/21964"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dev.ma
tsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/media\/21936"}],"wp:attachment":[{"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/media?parent=21934"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/categories?post=21934"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/tags?post=21934"},{"taxonomy":"chapters","embeddable":true,"href":"https:\/\/dev.matsio.com\/matsio\/saasboomi\/wp-json\/wp\/v2\/chapters?post=21934"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}