Transformer 1B-7B Apache 2.0 Yes
English

Input: text

Output: text

Tags: generative ai, nlp, research

Pythia 1B is an autoregressive language model with one billion parameters, developed by EleutherAI and released in March 2023. It is built on the GPT-NeoX architecture with 16 transformer layers, a model dimension of 2048, and 8 attention heads, and was trained on The Pile, a large open text dataset.

The model is part of the broader Pythia suite, which was designed specifically to support interpretability and reproducibility research. The suite provides 154 intermediate training checkpoints, allowing detailed analysis of how language models change during training. Pythia 1B is freely available under the Apache 2.0 license, with the full weights published on Hugging Face, which makes it accessible for academic and scientific research.

Its intended users are researchers studying language model behavior, scaling laws, and training dynamics, rather than general-purpose deployment.

Background and Development

Pythia 1B is part of the Pythia model suite, a collection of large language models developed by EleutherAI, a non-profit AI research organization based in the United States. The model was released on 10 March 2023 and was created with a specific research focus rather than for general-purpose deployment. EleutherAI built the Pythia suite to address a gap in publicly available tools for studying how language models develop capabilities and behaviors over the course of training. The 1B variant is a mid-range size within the suite, balancing computational accessibility against enough model capacity for detailed empirical study.

The model was trained on The Pile, an 825 GB open dataset curated by EleutherAI that covers diverse English-language text from sources such as books, academic papers, code repositories, and web content. Training used a batch size of 2 million tokens and a learning rate of 3.0e-4, with the GPT-NeoX library as the underlying training framework.

Architecture and Technical Specifications

Pythia 1B is built on the GPT-NeoX transformer architecture and implements the gpt_neox model type. Its key architectural parameters are:

  • 16 transformer layers
  • a model dimension of 2048
  • 8 attention heads
  • approximately 1 billion total parameters, of which roughly 805 million are non-embedding parameters
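
The figures in the list above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 12·L·d² rule of thumb assumes a 4x MLP expansion, and the detailed count assumes biases on every linear layer, two LayerNorms per block, and one final LayerNorm. None of this is an official parameter breakdown.

```python
# Architecture values taken from the list above.
N_LAYERS = 16
D_MODEL = 2048
N_HEADS = 8


def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head width, assuming the model dimension is split evenly across heads."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return d_model // n_heads


def approx_non_embedding_params(n_layers: int, d_model: int) -> int:
    """Rule of thumb: 4*d^2 for attention plus 8*d^2 for the MLP, per layer."""
    return 12 * n_layers * d_model ** 2


def detailed_non_embedding_params(n_layers: int, d_model: int) -> int:
    """Same count with biases and LayerNorm parameters (assumed layout)."""
    attention = 4 * d_model ** 2 + 4 * d_model  # QKV and output projection, with biases
    mlp = 8 * d_model ** 2 + 5 * d_model        # up- and down-projection, with biases
    layer_norms = 2 * (2 * d_model)             # two LayerNorms per block (scale and shift)
    final_layer_norm = 2 * d_model
    return n_layers * (attention + mlp + layer_norms) + final_layer_norm


print(head_dim(D_MODEL, N_HEADS))                        # 256
print(approx_non_embedding_params(N_LAYERS, D_MODEL))    # 805306368
print(detailed_non_embedding_params(N_LAYERS, D_MODEL))  # 805736448
```

Both estimates land at roughly 805 million, consistent with the non-embedding figure quoted above.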

The model takes text as input and produces text as output, operating as a standard autoregressive text generation model. The weights are freely available on Hugging Face under the identifier EleutherAI/pythia-1b and are distributed under the Apache 2.0 license, which permits broad use in research and derivative work with minimal restrictions.
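
As a minimal sketch of text-in, text-out use, the snippet below loads the weights through the Hugging Face Transformers library and generates a continuation. The helper name `generate`, the greedy decoding setting, and the token budget are illustrative choices, not recommendations from the model's authors. The imports sit inside the function so the heavy dependencies and the download are only triggered when it is called.

```python
MODEL_ID = "EleutherAI/pythia-1b"  # identifier given in the text above


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy continuation of `prompt` (hypothetical helper, for illustration)."""
    # Imported lazily: requires the `transformers` and `torch` packages.
    from transformers import AutoTokenizer, GPTNeoXForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = GPTNeoXForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding keeps the output deterministic
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Pile is a dataset that")` downloads the weights on first use and returns the prompt followed by the model's continuation. As a base model without instruction tuning, it continues text rather than following instructions.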

Research Contribution and Intermediate Training Checkpoints

One of the distinguishing features of Pythia 1B is the availability of 154 intermediate training checkpoints. These checkpoints capture the model's state at intervals throughout training, letting researchers trace how internal mechanisms, capabilities, and behaviors emerge and change over time. This level of transparency is unusual among publicly released language models and is central to the suite's design philosophy.
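
To make the count of 154 concrete, the sketch below enumerates one checkpoint schedule that adds up to that number. The schedule itself is an assumption drawn from EleutherAI's published Pythia documentation rather than from this text: an initial checkpoint at step 0, ten log-spaced early checkpoints, then one every 1000 steps up to step 143000, each exposed as a `step<N>` revision. The exact batch size of 2,097,152 tokens is likewise an assumed reading of "2 million tokens".

```python
TOKENS_PER_STEP = 2_097_152  # assumption: "2 million tokens" means 2**21


def checkpoint_steps() -> list[int]:
    """Assumed schedule: step 0, powers of two up to 512, then every 1000 steps."""
    early = [0] + [2 ** i for i in range(10)]   # 0, 1, 2, 4, ..., 512
    regular = list(range(1000, 143_001, 1000))  # 1000, 2000, ..., 143000
    return early + regular


def revision_name(step: int) -> str:
    """Revision (branch) name assumed for a given training step."""
    return f"step{step}"


def tokens_seen(step: int) -> int:
    """Training tokens consumed by `step`, under the assumed batch size."""
    return step * TOKENS_PER_STEP


steps = checkpoint_steps()
print(len(steps))                # 154
print(revision_name(steps[-1]))  # step143000
print(tokens_seen(steps[-1]))    # 299892736000
```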

This feature makes Pythia 1B especially valuable for research on training dynamics, mechanistic interpretability, and scaling laws. Researchers can use the checkpoints to investigate questions such as when particular linguistic or factual knowledge is acquired, how memorization develops, or how attention patterns shift during training. The consistent training setup across all models in the Pythia suite also supports controlled comparisons across parameter scales.
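
One simple way to study such questions is to score the same text under several checkpoints and watch the loss change. The sketch below is a hedged illustration: `checkpoint_loss` and `loss_curve` are hypothetical helpers, and the `step<N>` revision names are an assumption about how the checkpoints are exposed on Hugging Face.

```python
from typing import Callable, Dict, Iterable


def checkpoint_loss(revision: str, text: str,
                    model_id: str = "EleutherAI/pythia-1b") -> float:
    """Mean next-token cross-entropy of `text` under one checkpoint."""
    # Imported lazily: requires the `torch` and `transformers` packages.
    import torch
    from transformers import AutoTokenizer, GPTNeoXForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = GPTNeoXForCausalLM.from_pretrained(model_id, revision=revision)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return its LM loss.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()


def loss_curve(revisions: Iterable[str], text: str,
               loss_fn: Callable[[str, str], float] = checkpoint_loss
               ) -> Dict[str, float]:
    """Map each revision name to the loss of `text` at that checkpoint."""
    return {rev: loss_fn(rev, text) for rev in revisions}
```

For example, `loss_curve(["step1000", "step10000", "step143000"], "Paris is the capital of France.")` would download three checkpoints and return one loss per revision. The `loss_fn` parameter lets the loop be exercised with a stub before committing to the downloads.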

Intended Use and Accessibility

Pythia 1B is explicitly intended for research purposes. Its design prioritizes reproducibility and interpretability over task-specific performance, and it is neither tuned nor recommended for production deployment or end-user applications. The open release of model weights, training data, and intermediate checkpoints lets the wider scientific community audit, replicate, and build on the work.

Because the model is relatively small at 1 billion parameters, it remains computationally accessible to researchers without large GPU infrastructure, which lowers the barrier to running experiments on language model internals. Its availability on Hugging Face makes it straightforward to integrate with standard research workflows using the Transformers library and related tooling.
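
A rough back-of-the-envelope estimate shows why a model of this size fits on modest hardware. The sketch below counts weight storage only, using the rounded figure of 1 billion parameters. Activations, optimizer state, and framework overhead are deliberately ignored, so real memory use will be higher. The function name and dtype table are illustrative.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2 ** 30


N_PARAMS = 1_000_000_000  # rounded parameter count from the text

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(N_PARAMS, dtype):.2f} GiB")
# float32: 3.73 GiB
# float16: 1.86 GiB
# int8: 0.93 GiB
```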
