[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["没有我需要的信息","missingTheInformationINeed","thumb-down"],["太复杂/步骤太多","tooComplicatedTooManySteps","thumb-down"],["内容需要更新","outOfDate","thumb-down"],["翻译问题","translationIssue","thumb-down"],["示例/代码问题","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-01-31。"],[[["\u003cp\u003eFoundation LLMs are pre-trained on vast amounts of text, enabling them to understand language structure and generate creative content, but they require fine-tuning for specific ML tasks like classification or regression.\u003c/p\u003e\n"],["\u003cp\u003eFine-tuning adapts a foundation LLM to a particular task by training it on task-specific data, improving its performance for that task but retaining the original model size.\u003c/p\u003e\n"],["\u003cp\u003eDistillation produces a smaller, more efficient version of a fine-tuned LLM, sacrificing some performance for reduced computational and environmental costs.\u003c/p\u003e\n"],["\u003cp\u003ePrompt engineering allows users to customize an LLM's output by providing examples or instructions within the prompt, leveraging the model's existing abilities without changing its parameters.\u003c/p\u003e\n"],["\u003cp\u003eOffline inference pre-computes and caches LLM predictions for tasks where real-time response isn't critical, saving resources and enabling the use of larger models.\u003c/p\u003e\n"]]],[],null,["# LLMs: Fine-tuning, distillation, and prompt engineering\n\nThe [previous unit](/machine-learning/crash-course/llm/transformers) described general-purpose LLMs, variously\nknown as:\n\n- **foundation LLMs**\n- **base LLMs**\n- **pre-trained LLMs**\n\nA foundation LLM is trained on enough natural language to \"know\" a remarkable\namount about grammar, words, and idioms. A foundation language model can\ngenerate helpful sentences about topics it is trained on.\nFurthermore, a foundation LLM can perform certain tasks traditionally called\n\"creative,\" like writing poetry. However, a foundation LLM's generative text\noutput isn't a solution for other kinds of common ML problems, such as\nregression or classification. For these use cases, a foundation LLM can serve\nas a *platform* rather than a solution.\n\nTransforming a foundation LLM into a solution that meets an application's\nneeds requires a process called *fine-tuning* . A secondary process called\n*distillation* generates a smaller (fewer parameters) version of the fine-tuned\nmodel.\n\nFine-tuning\n-----------\n\nResearch shows that the pattern-recognition abilities of foundation\nlanguage models are so powerful that they sometimes require relatively\nlittle additional training to learn specific tasks.\nThat additional training helps the model make better predictions\non a specific task. This additional training, called\n[**fine-tuning**](/machine-learning/glossary#fine-tuning),\nunlocks an LLM's practical side.\n\nFine-tuning trains on examples *specific* to the task your application\nwill perform. Engineers can sometimes fine-tune a foundation LLM on just a few\nhundred or a few thousand training examples.\n\nDespite the relatively tiny number of training examples, standard fine-tuning\nis often computationally expensive. 
Distillation
------------

Most fine-tuned LLMs contain enormous numbers of parameters. Consequently, these LLMs require enormous computational and environmental resources to generate predictions. Note that large swaths of those parameters are typically irrelevant for a specific application.

[**Distillation**](/machine-learning/glossary#distillation) creates a smaller version of an LLM. The distilled LLM generates predictions much faster and requires fewer computational and environmental resources than the full LLM. However, the distilled model's predictions are generally not quite as good as the original LLM's predictions. Recall that LLMs with more parameters almost always generate better predictions than LLMs with fewer parameters.

#### How distillation works

The most common form of distillation uses bulk inference to label data. This labeled data is then used to train a new, smaller model (known as the student model) that can be served more affordably. The labeled data serves as a channel by which the larger model (known as the teacher model) funnels its knowledge to the smaller model.

For example, suppose you need an online toxicity scorer for automatic moderation of comments. In this case, you can use a large offline toxicity scorer to label training data. Then, you can use that training data to distill a toxicity scorer model small enough to be served and handle live traffic.

A teacher model can sometimes provide more labeled data than it was trained on. Alternatively, a teacher model can funnel a numerical score instead of a binary label to the student model. A numerical score provides a richer training signal than a binary label, enabling the student model to predict not only positive and negative classes but also borderline classes.
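Here's a minimal sketch of that teacher-to-student flow for the toxicity example, assuming scikit-learn. The comments, the teacher scores, and the choice of a logistic-regression student are all hypothetical placeholders; in practice the teacher would be a large offline LLM scorer and the student a small neural model.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1: bulk inference. A large offline teacher model scores a pile of
# unlabeled comments. The scores below are placeholders standing in for
# the teacher's output (a numerical score is a richer training signal
# than a binary label).
unlabeled_comments = ["nice work", "you are awful", "meh", "thanks a lot"]
teacher_scores = np.array([0.05, 0.92, 0.48, 0.10])  # hypothetical outputs

# Step 2: train a small, cheap-to-serve student model on the teacher's
# labels. Thresholding turns the scores into hard binary labels.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(unlabeled_comments)
student = LogisticRegression().fit(features, teacher_scores > 0.5)

# Step 3: serve only the small student model on live traffic.
live = vectorizer.transform(["some brand-new comment"])
print(student.predict_proba(live)[:, 1])  # student's toxicity estimate
```

Thresholding discards some of the teacher's signal; training the student to predict the raw scores instead (a regression target) would preserve the richer information about borderline cases described above.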
Prompt engineering
------------------

[**Prompt engineering**](/machine-learning/glossary#prompt-engineering) enables an LLM's *end users* to customize the model's output. That is, end users clarify how the LLM should respond to their prompt.

Humans learn well from examples. So do LLMs. Showing one example to an LLM is called [**one-shot prompting**](/machine-learning/glossary#one-shot-prompting). For example, suppose you want a model to use the following format to output a fruit's classification:

> User inputs the name of a fruit: LLM outputs that fruit's classification.

A one-shot prompt shows the LLM a single example of the preceding format and then asks the LLM to complete a query based on that example. For instance:

```
peach: drupe
apple: ______
```

A single example is sometimes sufficient. If it is, the LLM outputs a useful prediction. For instance:

```
apple: pome
```

In other situations, a single example is insufficient, and the user must show the LLM *multiple* examples. For instance, the following prompt contains two examples:

```
plum: drupe
pear: pome
lemon: ____
```

Providing multiple examples is called [**few-shot prompting**](/machine-learning/glossary#few-shot-prompting). You can think of the first two lines of the preceding prompt as training examples.

Can an LLM provide useful predictions with no examples ([**zero-shot prompting**](/machine-learning/glossary#zero-shot-prompting))? Sometimes, but LLMs like context. Without context, the following zero-shot prompt might return information about the technology company rather than the fruit:

```
apple: _______
```

**Note:** Prompt engineering doesn't alter the model's parameters. Prompts leverage the pattern-recognition abilities of the existing LLM.

Offline inference
-----------------

The number of parameters in an LLM is sometimes so large that [**online inference**](/machine-learning/glossary#online-inference) is too slow to be practical for real-world tasks like regression or classification. Consequently, many engineering teams rely on [**offline inference**](/machine-learning/glossary#offline-inference) (also known as *bulk inference* or *static inference*) instead. In other words, rather than responding to queries at serving time, the trained model makes predictions in advance and then caches those predictions.

It doesn't matter if an LLM takes a long time to complete its task if the LLM only has to perform the task once a week or once a month.

For example, Google Search [used an LLM](https://blog.google/products/search/how-mum-improved-google-searches-vaccine-information/) to perform offline inference in order to cache a list of over 800 synonyms for Covid vaccines in more than 50 languages. Google Search then used the cached list to identify queries about vaccines in live traffic.
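The pattern looks roughly like the following minimal Python sketch, where `expensive_llm_predict` is a hypothetical stand-in for a slow, large model and the cache is just an in-memory dict; a production system would persist the precomputed predictions in a datastore.

```python
import time

# Hypothetical stand-in for a large, slow LLM. In offline inference it's
# fine for this call to be expensive, because it runs in a periodic batch
# job rather than at serving time.
def expensive_llm_predict(query: str) -> str:
    time.sleep(0.1)  # simulate a slow model
    return f"prediction for {query!r}"

# Offline step (e.g., a weekly batch job): precompute predictions for
# every input you expect to see, and cache them.
anticipated_queries = ["covid vaccine", "covid jab", "corona shot"]
cache = {q: expensive_llm_predict(q) for q in anticipated_queries}

# Online step: serving is a fast cache lookup, never a model call.
def serve(query: str) -> str | None:
    return cache.get(query)  # None for queries that weren't precomputed

print(serve("covid jab"))
```

The trade-off is coverage: only queries anticipated by the batch job get predictions, which is why this approach suits tasks with a bounded, slowly changing input space, like the cached vaccine-synonym list above.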
Use LLMs responsibly
--------------------

Like any form of machine learning, LLMs generally share the biases of:

- The data they were trained on.
- The data they were distilled on.

Use LLMs fairly and responsibly, following the guidelines presented in the [**data modules**](/machine-learning/crash-course/numerical-data) and the [**Fairness module**](/machine-learning/crash-course/fairness).

Exercise: Check your understanding
----------------------------------

Which of the following statements is true about LLMs?

- **A distilled LLM contains fewer parameters than the foundation language model it sprang from.** True. Distillation reduces the number of parameters.
- **A fine-tuned LLM contains fewer parameters than the foundation language model it was trained on.** False. A fine-tuned model contains the *same number* of parameters as the original foundation language model.
- **As users perform more prompt engineering, the number of parameters in an LLM grows.** False. Prompt engineering doesn't add (or remove or alter) LLM parameters.

**Key terms:**

- [Backpropagation](/machine-learning/glossary#backpropagation)
- [Distillation](/machine-learning/glossary#distillation)
- [Few-shot prompting](/machine-learning/glossary#few-shot-prompting)
- [Fine-tuning](/machine-learning/glossary#fine-tuning)
- [Offline inference](/machine-learning/glossary#offline-inference)
- [One-shot prompting](/machine-learning/glossary#one-shot-prompting)
- [Online inference](/machine-learning/glossary#online-inference)
- [Parameter-efficient tuning](/machine-learning/glossary#parameter-efficient-tuning)
- [Prompt engineering](/machine-learning/glossary#prompt-engineering)
- [Zero-shot prompting](/machine-learning/glossary#zero-shot-prompting)

[Help Center](https://support.google.com/machinelearningeducation)