Flan-T5: Great Results with a Smaller, More Efficient LLM

Flan-T5 delivers impressive performance across a wide range of NLP applications, even when compared with very large language models. Try it now on Paperspace, powered by IPUs.

Author: Harry Mellor, AI Engineer at Graphcore

In the world of AI language models, there is no one-size-fits-all solution.

Commercial users are increasingly realising that ultra-large language models, while broadly capable, are AI overkill for many applications.

The penny (or dollar) usually drops when they receive an outsized bill from the owner of their preferred proprietary model or from their cloud provider. That assumes they can even secure GPU availability for the A100 and H100 systems needed to run advanced models.

Instead, many are looking for more efficient, open-source alternatives to the likes of GPT-3/4.

Flan-T5

In December 2022, Google published "Scaling Instruction-Finetuned Language Models", in which they perform extensive fine-tuning on a broad collection of tasks across a variety of models (PaLM, T5, U-PaLM).

Part of this publication was the release of the Flan-T5 checkpoints, "which achieve strong few-shot performance" with relatively modest parameter counts "even compared to much larger models" such as the largest members of the GPT family.

In this blog, we will show how you can use Flan-T5 running on a Paperspace Gradient Notebook, powered by Graphcore IPUs. Flan-T5-Large can be run on an IPU-POD4 using Paperspace's six-hour free trial, while Flan-T5-XL can be run on a paid IPU-POD16.

We will look at a range of common NLP workloads and consider the following:

How good is Flan-T5, really?
How do I run Flan-T5 on IPUs?
What can I use Flan-T5 for?
Why would I move up to Flan-T5-XL?

How good is Flan-T5, really?

Let's start by looking at some performance numbers from the Google-authored paper:

These results are astounding. Notice that:

Flan-T5 performs roughly 2x better than T5 on MMLU, BBH and MGSM
In TyDiQA we even see the emergence of new abilities
Flan-T5-Large is better than all previous variants of T5 (even XXL)

This establishes Flan-T5 as an entirely different beast from the T5 you may know. Now let's see how Flan-T5-Large and Flan-T5-XL compare with other models on the MMLU benchmark:

Noting that Flan-T5 had MMLU held out from training, this table shows that:

Flan-T5-Large and Flan-T5-XL (with 0.8B and 3B parameters respectively) perform similarly to other models with significantly more parameters, for example GPT-3 (175B parameters) and Galactica (120B parameters)
GPT-3 needs to be fine-tuned for the benchmark task in order to beat Flan-T5-XL
Flan-T5 outperforms smaller versions of more recent LLMs such as PaLM and LLaMA (while also being several times smaller)
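To put those parameter counts in perspective, the short calculation below works out how many times larger the comparison models are. It uses only the figures quoted above (0.8B, 3B, 175B and 120B parameters); the dictionary and function names are illustrative and not taken from the paper.

```python
# Parameter counts in billions, as quoted in the list above.
PARAMS_B = {
    "Flan-T5-Large": 0.8,
    "Flan-T5-XL": 3.0,
    "GPT-3": 175.0,
    "Galactica": 120.0,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMS_B[larger] / PARAMS_B[smaller]


for big in ("GPT-3", "Galactica"):
    for small in ("Flan-T5-Large", "Flan-T5-XL"):
        print(f"{big} is about {size_ratio(big, small):.0f}x the size of {small}")
```

On these figures, GPT-3 has roughly 58 times as many parameters as Flan-T5-XL, and Galactica has 40 times as many.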

How do I run Flan-T5 on IPUs?

Since the Flan-T5 checkpoints are available on Hugging Face, you can use Graphcore's Hugging Face integration (🤗 Optimum Graphcore) to easily run Flan-T5 with a standard inference pipeline.

If you already have a Hugging Face based application that you would like to try on IPUs, it is as simple as:

- from transformers import pipeline 
+ from optimum.graphcore import pipeline 
 
- text_generator = pipeline("text2text-generation", model="google/flan-t5-large") 
+ text_generator = pipeline("text2text-generation", model="google/flan-t5-large", ipu_config="Graphcore/t5-large-ipu") 

text_generator("Please solve the following equation: x^2 - 9 = 0") 
[{'generated_text': '3'}]

Now let's define a text generator of our own to use in the rest of this notebook. First, make sure that your Python virtual environment has the latest version of 🤗 Optimum Graphcore installed:

%pip install "optimum-graphcore>=0.6.1, <0.7.0"

The location of the cache directories can be configured through environment variables or directly in the notebook:

import os 
executable_cache_dir=os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/") 
num_available_ipus=int(os.getenv("NUM_AVAILABLE_IPU", 4))

Next, let's import pipeline from optimum.graphcore and create our Flan-T5 pipeline for the appropriate number of IPUs:

from optimum.graphcore import pipeline

# IPU-POD4 runs Flan-T5-Large, IPU-POD16 runs Flan-T5-XL
size = {4: "large", 16: "xl"}
flan_t5 = pipeline(
    "text2text-generation",
    model=f"google/flan-t5-{size[num_available_ipus]}",
    ipu_config=f"Graphcore/t5-{size[num_available_ipus]}-ipu",
    max_input_length=896,
)
# Point the executable cache at the directory configured above
flan_t5.model.ipu_config.executable_cache_dir = executable_cache_dir
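
The `size` dictionary above only has entries for 4 and 16 IPUs, so any other value of `NUM_AVAILABLE_IPU` would raise a bare `KeyError`. A minimal sketch of a more explicit lookup follows. Note that `checkpoint_names` is a hypothetical helper, not part of Optimum Graphcore, and it only formats the two name strings used in the pipeline call.

```python
from typing import Tuple

# Mapping taken from the pipeline example: IPU count -> Flan-T5 size.
SIZE_BY_IPUS = {4: "large", 16: "xl"}


def checkpoint_names(num_ipus: int) -> Tuple[str, str]:
    """Return (model name, IPU config name) for a supported IPU count.

    Hypothetical helper: it only formats the strings shown in the example.
    """
    try:
        size = SIZE_BY_IPUS[num_ipus]
    except KeyError:
        supported = sorted(SIZE_BY_IPUS)
        raise ValueError(
            f"No Flan-T5 configuration for {num_ipus} IPUs; supported: {supported}"
        ) from None
    return f"google/flan-t5-{size}", f"Graphcore/t5-{size}-ipu"


model_name, ipu_config_name = checkpoint_names(4)
print(model_name, ipu_config_name)
```

The returned strings could then be passed as the `model` and `ipu_config` arguments, so that an unsupported IPU count fails with a readable message before any compilation starts.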

Now, let's ask it some random questions:

questions = [ 
    "Solve the following equation for x: x^2 - 9 = 0", 
    "At what temperature does nitrogen freeze?", 
    "In order to reduce symptoms of asthma such as tightness in the chest, wheezing, and difficulty breathing, what do you recommend?", 
    "Which country is home to the tallest mountain in the world?" 
] 
for out in flan_t5(questions): 
    print(out) 
Graph compilation: 100%|██████████| 100/100 [05:20<00:00] 
Graph compilation: 100%|██████████| 100/100 [02:56<00:00] 
 
 
{'generated_text': '3'} 
{'generated_text': '-32 °C'} 
{'generated_text': 'ibuprofen'} 
{'generated_text': 'nepal'}

Note that some of these answers may be wrong: retrieving information from the model itself is not the purpose of Flan-T5. However, if you use Flan-T5-XL they are less likely to be wrong (come back to this notebook with an IPU-POD16 to see the difference!).
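
The answer to the first question is a case in point: '3' is correct but incomplete, because x^2 - 9 = 0 has two real solutions. The check below uses only Python's standard library and is independent of the model; the function name is illustrative.

```python
import math


def solve_x_squared_minus(c: float) -> tuple:
    """Solve x^2 - c = 0 for real x, assuming c >= 0."""
    root = math.sqrt(c)
    return (-root, root)


print(solve_x_squared_minus(9))  # (-3.0, 3.0)
```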

What can I use Flan-T5 for?

Flan-T5 has been fine-tuned on thousands of different tasks across hundreds of datasets. So whatever your task might be, it is worth seeing whether Flan-T5 can meet your requirements. Here we will demonstrate a few of the common ones:

Sentiment analysis

sentiment_analysis = ( 
    "Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\n" 
    "Review: Nice looking phone. Sentiment: Positive\n" 
    "Review: Sometimes it freezes and you have to close all the open pages and then reopen where you were. Sentiment: Negative\n" 
    "Review: Wasn't that impressed, went back to my old phone. Sentiment:" 
) 

flan_t5(sentiment_analysis)[0]["generated_text"] 
Negative
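
The prompt above follows a simple few-shot pattern: labelled examples, then one unlabelled item ending at the label name. The sketch below builds that string from a list of pairs. `few_shot_prompt` and its parameters are hypothetical names introduced for illustration, and the example reviews are the ones from the cell above.

```python
from typing import List, Tuple


def few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    input_label: str = "Review",
    output_label: str = "Sentiment",
) -> str:
    """Format labelled examples plus one unlabelled query as a single prompt."""
    lines = [
        f"{input_label}: {text} {output_label}: {label}" for text, label in examples
    ]
    # The final line stops after the output label so the model fills in the answer.
    lines.append(f"{input_label}: {query} {output_label}:")
    return "\n".join(lines)


examples = [
    ("It gets too hot, the battery only can last 4 hours.", "Negative"),
    ("Nice looking phone.", "Positive"),
    (
        "Sometimes it freezes and you have to close all the open pages "
        "and then reopen where you were.",
        "Negative",
    ),
]
prompt = few_shot_prompt(examples, "Wasn't that impressed, went back to my old phone.")
print(prompt)
```

The resulting string can be passed to the pipeline in place of `sentiment_analysis`, and changing the two label arguments adapts the same helper to other labelled-example tasks.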

Advanced named entity recognition

The following snippets are adapted from the Wikipedia pages corresponding to each mentioned company.

advanced_ner = """Microsoft Corporation is a company that makes computer software and video games. Bill Gates and Paul Allen founded the company in 1975 
[Company]: Microsoft, [Founded]: 1975, [Founders]: Bill Gates, Paul Allen 
 
Amazon.com, Inc., known as Amazon , is an American online business and cloud computing company. It was founded on July 5, 1994 by Jeff Bezos 
[Company]: Amazon, [Founded]: 1994, [Founders]: Jeff Bezos 
 
Apple Inc. is a multinational company that makes personal computers, mobile devices, and software. Apple was started in 1976 by Steve Jobs and Steve Wozniak.""" 

flan_t5(advanced_ner)[0]["generated_text"]
[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak
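
Because the model replies in the same bracketed format as the examples, the generated text can be turned into structured data with a small parser. This is a minimal sketch that assumes the output keeps the `[Key]: value` layout shown above; `parse_fields` and `FIELD_PATTERN` are illustrative names.

```python
import re
from typing import Dict

# Matches "[Key]: value" pairs; a value runs until the next ", [" or the end.
FIELD_PATTERN = re.compile(r"\[([^\]]+)\]:\s*(.*?)(?=,\s*\[|$)")


def parse_fields(generated_text: str) -> Dict[str, str]:
    """Turn '[Company]: Apple, [Founded]: 1976' into a dictionary."""
    return {
        key: value.strip() for key, value in FIELD_PATTERN.findall(generated_text)
    }


output = "[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak"
print(parse_fields(output))
```

The lookahead keeps commas inside a value, so the two founders stay together in one field. Model output is not guaranteed to follow the format, so a production parser would need to handle missing or malformed fields.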

Question answering

The following snippet came from the "squad" dataset.

context = 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.'
question = "Which NFL team represented the AFC at Super Bowl 50?"
# The correct answer is Denver Broncos
flan_t5(f"{context} {question}")[0]['generated_text']
Denver Broncos

Intent classification

intent_classification = """[Text]: I really need to get a gym membership, I'm exhausted. 
[Intent]: get gym membership 
 
[Text]: What do I need to make a carbonara? 
[Intent]: cook carbonara 
 
[Text]: I need all these documents sorted and filed by Monday. 
[Intent]:""" 

flan_t5([intent_classification])[0]["generated_text"]
file documents

Summarization

The following snippets came from the xsum dataset.

summarization=""" 
Document: Firstsource Solutions said new staff will be based at its Cardiff Bay site which already employs about 800 people. 
The 300 new jobs include sales and customer service roles working in both inbound and outbound departments. 
The company's sales vice president Kathryn Chivers said: "Firstsource Solutions is delighted to be able to continue to bring new employment to Cardiff." 
Summary: Hundreds of new jobs have been announced for a Cardiff call centre. 
 
Document: The visitors raced into a three-goal first-half lead at Hampden. 
Weatherson opened the scoring with an unstoppable 15th-minute free-kick, and he made it 2-0 in the 27th minute. 
Matt Flynn made it 3-0 six minutes later with a fine finish. 
Queen's pulled a consolation goal back in stoppage time through John Carter. 
Summary: Peter Weatherson netted a brace as Annan recorded only their second win in eight matches. 
 
Document: Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday. 
Detectives said three firearms, ammunition and a five-figure sum of money were recovered. 
A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday. 
Summary: 
""" 
flan_t5(summarization)[0]["generated_text"]
A man has been arrested after a firearm was found in a property in Edinburgh.

Text classification

text_classification_1 = """A return ticket is better value than a single. 
topic: travel cost 

You can start from the basic stitches, and go from there. 
topic: learning knitting 

The desk which I bought yesterday is very big. 
topic: furniture size 

George Washington was president of the United States from 1789 to 1797. 
topic:""" 

flan_t5(text_classification_1)[0]["generated_text"]
George Washington presidency

text_classification_2 = """FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models - it is an enhanced version of T5 that has been finetuned in a mixture of tasks. 
keywords: released, enhanced, finetuned 

The IPU, or Intelligence Processing Unit, is a highly flexible, easy-to-use parallel processor designed from the ground up for AI workloads. 
keywords: processor, AI 

Paperspace is the platform for AI developers. providing the speed and scale needed to take AI models from concept to production. 
keywords:""" 

flan_t5(text_classification_2)[0]["generated_text"]
paperspace, AI, scale

Why would I move up to Flan-T5-XL?

As we saw earlier when looking at the results from the paper, Flan-T5-XL is roughly 40% better (on average) than Flan-T5-Large across its validation tasks. Therefore, when deciding whether Flan-T5-XL is worth the cost for you, ask yourself the following questions:

Does my data need greater linguistic understanding for the task to be performed?
Is my task too complicated for a model as small as Flan-T5-Large and too easy for a model as large as GPT-3?
Does my task require longer output sequences that need Flan-T5-XL to generate?

To demonstrate, let us look at an example of a task where the answer to all of the above questions is yes. Suppose you have a customer service AI that answers basic questions in order to reduce the workload of your customer service staff. This needs:

Strong linguistic ability to both parse and generate medium-sized chunks of text
An LLM that can learn well from context, but does not have all of human history embedded in its parameters
The ability to produce multi-sentence responses, but not much longer than that

Looking at the code below, we see some context about Graphcore provided in the input, as well as a primer for a conversational response from the model. As you can see from the example, Flan-T5-XL was able to understand the information provided in the context and give useful, natural answers to the questions it was asked.

from IPython.display import clear_output 
 
class ChatBot: 
    def __init__(self, model, context) -> None: 
        self.model = model 
        self.initial_context = context 
        self.context = self.initial_context 
        self.user, self.persona = [x.split(":")[0] for x in context.split("\n")[-2:]] 
 
    def ask(self, question): 
        question += "." if question[-1] not in [".", "?", "!"] else "" 
        x = f"{self.context}\n{self.user}: {question}\n{self.persona}: " 
        # print(f"\n{x}\n") 
        y = self.model(x) 
        response = y[0]["generated_text"] 
        self.context = f"{x}{response}" 
        return response 
 
    def session(self):
        # Interactive loop: an empty prompt ends the session, "reset" restarts it
        print("Starting session", flush=True)
        prompt = input()
        while prompt != "":
            if prompt == "reset":
                clear_output()
                print("Starting session", flush=True)
                self.context = self.initial_context
                prompt = input()
                # Re-check the loop condition so an empty prompt after a reset
                # ends the session instead of being sent to the model
                continue
            print(f"{self.user.title()}: {prompt}", flush=True)
            answer = self.ask(prompt)
            print(f"{self.persona.title()}: {answer}", flush=True)
            prompt = input()
        print("Ending session", flush=True)

context = f"""This is a conversation between a [customer] and a [virtual assistant]. 
The [virtual assistant] works at Graphcore. Here is some information about Graphcore: 
- Graphcore is located in Bristol. 
- Graphcore invented the intelligence processing unit (IPU). It is purpose built for AI applications. 
- The currently available IPU models are: Classic IPU, Bow IPU, C600. 
- IPUs are available on: Paperspace, Gcore Cloud and Graphcloud. 
 
[virtual assistant]: Hello, welcome to Graphcore, how can I help you today? 
[customer]: I'd like to ask some questions about your company. 
[virtual assistant]: Ok, I can help you with that.""" 
chatbot = ChatBot(flan_t5, context) 
chatbot.session()
Starting session 
[Customer]: What is an IPU? 
[Virtual Assistant]: The Intelligence Processing Unit (IPU) is a computer chip that is used to process artificial intelligence. 
[Customer]: Who makes it? 
[Virtual Assistant]: Graphcore is the manufacturer of the IPU. 
[Customer]: Can I use them? 
[Virtual Assistant]: Yes, I'm sure you can. 
[Customer]: Where? 
[Virtual Assistant]: The IPU is available on Paperspace, Gcore and Graphcloud. 
Ending session

flan_t5.model.detachFromDevice()
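
The prompt that `ChatBot.ask` sends to the model can be inspected without an IPU. The sketch below reproduces the same string handling as standalone functions. `speaker_tags`, `build_prompt` and the shortened `demo_context` are hypothetical and introduced here for illustration only.

```python
def speaker_tags(context):
    """Read the two speaker tags from the last two lines of the context."""
    user, persona = [line.split(":")[0] for line in context.split("\n")[-2:]]
    return user, persona


def build_prompt(context, question):
    """Append one user turn and an empty persona turn for the model to complete."""
    user, persona = speaker_tags(context)
    # Add a full stop when the question has no closing punctuation.
    if question[-1] not in ".?!":
        question += "."
    return f"{context}\n{user}: {question}\n{persona}: "


# Shortened, hypothetical context in the same format as the example.
demo_context = (
    "This is a conversation between a [customer] and a [virtual assistant].\n"
    "[customer]: I'd like to ask some questions about your company.\n"
    "[virtual assistant]: Ok, I can help you with that."
)
print(build_prompt(demo_context, "What is an IPU"))
```

Because each turn is appended to the context, the prompt grows with every exchange, which is worth keeping in mind given the `max_input_length=896` setting used when the pipeline was created.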

Conclusion

In summary, the answers to the questions we posed in the introduction are:

How good is Flan-T5, really?

A: Twice as good as T5 and on par with GPT-3 according to the MMLU benchmark.

How do I run Flan-T5 on IPUs?

A: Change one import and add one keyword argument to your pipeline instantiation.

What can I use Flan-T5 for?

A: Given its wide variety of fine-tuned tasks, almost anything.

Why would I move up to Flan-T5-XL?

A: For an approximately 40% performance increase over Flan-T5-Large, enabling more demanding tasks.

If you would like to learn more about how we got T5 to work properly in Float16, see our technical blog on the subject.

You can also try other variants of T5 on IPUs.

If you would like to continue exploring NLP on IPUs, take a look at our GPT-J Fine-Tuning blog and the corresponding notebook.
