Tucano
Tucano is a family of open-weights transformer language models developed in Brazil and trained specifically on Portuguese text. The models were pre-trained on GigaVerbo, a deduplicated dataset of roughly 200 billion Portuguese tokens, and are available in four sizes ranging from 160 million to 2.4 billion parameters.
Fine-tuned variants include instruction-following and preference-optimized models, and related multimodal variants have also been released under the name ViTucano. Tucano is intended for researchers and developers working on natural language processing tasks in Portuguese, a language that has historically been underrepresented in large-scale language model development.
The project is documented in a 2025 paper published in the journal Patterns and is released under the Apache 2.0 license, with the weights and code publicly available on GitHub. The model series is now archived.
Background and Development
Tucano was developed in Brazil as a deliberate effort to address the scarcity of large language models trained specifically on Portuguese text. Although many prominent models are trained mostly on English data, Portuguese, which is spoken by more than 250 million people in Brazil, Portugal, and other countries, has historically received less attention in foundation-model research. The Tucano project aimed to close that gap by building transformer-based models from scratch on a large, high-quality Portuguese dataset.
The models were pre-trained on GigaVerbo, a deduplicated dataset of roughly 200 billion Portuguese tokens assembled to support robust language-model training at scale. The project is documented in the paper "Tucano: Advancing Neural Text Generation for Portuguese", published in the journal Patterns in 2025, and all model weights and training code are publicly available on GitHub under the Apache 2.0 license.
Model Sizes and Fine-Tuned Variants
The Tucano family comprises four base model sizes, which lets researchers and developers choose a model that fits their compute constraints and task requirements:
- Tucano-160m – 160 million parameters
- Tucano-630m – 630 million parameters
- Tucano-1b1 – approximately 1.1 billion parameters
- Tucano-2b4 – approximately 2.4 billion parameters
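To make the trade-off between model size and compute constraints concrete, the following is a minimal sketch that estimates the memory needed for the weights alone and picks the largest model that fits a given budget. The parameter counts are the rounded figures from the list above. The 2 bytes per parameter figure (half-precision storage) and the helper names `weight_memory_gib` and `largest_model_within` are assumptions made for illustration, and the estimate ignores activations, the KV cache, and optimizer state.

```python
# Rounded parameter counts taken from the list above.
TUCANO_PARAMS = {
    "Tucano-160m": 160_000_000,
    "Tucano-630m": 630_000_000,
    "Tucano-1b1": 1_100_000_000,
    "Tucano-2b4": 2_400_000_000,
}


def weight_memory_gib(n_params, bytes_per_param=2):
    """Rough memory for the weights alone, in GiB.

    Assumes 2 bytes per parameter (e.g. fp16/bf16); this is an
    illustrative assumption, not a figure from the Tucano project.
    """
    return n_params * bytes_per_param / 2**30


def largest_model_within(budget_gib, bytes_per_param=2):
    """Return the name of the largest model whose weights fit, or None."""
    fitting = [
        (n_params, name)
        for name, n_params in TUCANO_PARAMS.items()
        if weight_memory_gib(n_params, bytes_per_param) <= budget_gib
    ]
    # max() on (n_params, name) tuples picks the entry with the most parameters.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, n_params in TUCANO_PARAMS.items():
        print(f"{name}: ~{weight_memory_gib(n_params):.2f} GiB of weights")
    print("Fits in 4 GiB:", largest_model_within(4.0))
```

Real memory use during inference or fine-tuning will be higher than this estimate, so the result is best read as a lower bound when choosing a size.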
Beyond the pre-trained base models, the project produced several derived fine-tuned variants. Tucano-SFT and Tucano-DPO are the supervised fine-tuning and direct preference optimization variants, respectively, while Tucano-2b4-Instruct is an instruction-following version of the largest base model. These fine-tuned variants extend the usefulness of the base models to conversational and task-oriented applications.
Two related multimodal models, ViTucano-1b5-v1 and ViTucano-2b8-v1, were also released under the ViTucano name. They represent downstream work that adds visual modalities to Portuguese text understanding.
Use Cases and Intended Audience
Tucano is intended primarily for researchers and developers working on natural language processing tasks in Portuguese. Potential uses include text generation, language-model benchmarking, fine-tuning for domain-specific Portuguese applications, and serving as a research baseline for studying model behavior in lower-resource language settings. The availability of several model sizes supports a range of deployment scenarios, from hardware-constrained academic experiments to applied research that requires more resources.
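As an illustration of the text-generation use case, the sketch below loads a checkpoint with the Hugging Face `transformers` library and produces a greedy continuation of a Portuguese prompt. It rests on assumptions that this article does not confirm: that the checkpoints are mirrored on the Hugging Face Hub, that they live under an organization named `TucanoBR`, and that they load through the generic `AutoModelForCausalLM` class. The helper names `repo_id` and `generate` are hypothetical. Check the project's GitHub repository for the actual loading instructions.

```python
def repo_id(model_name, org="TucanoBR"):
    """Build a Hub repository id.

    The organization name "TucanoBR" is an assumption for illustration.
    """
    return f"{org}/{model_name}"


def generate(prompt, model_name="Tucano-160m", max_new_tokens=40):
    """Greedy-decode a continuation of `prompt` with the chosen checkpoint."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id(model_name))
    model = AutoModelForCausalLM.from_pretrained(repo_id(model_name))

    # Tokenize to PyTorch tensors and generate deterministically.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("A capital do Brasil é")` would download the smallest checkpoint on first use, provided the assumed repository id exists. Starting with the 160m model keeps the download and memory footprint small while checking that the setup works.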
Because the models are released with open weights under the permissive Apache 2.0 license, they can be freely used, modified, and redistributed, which makes them accessible to a broad community, including those without access to proprietary model API services.
Current Status
The Tucano model series is now archived, meaning that active development has ended. The weights, code, and associated documentation remain publicly available in the project's GitHub repository for reference and continued use by the research community. The 2025 publication in Patterns provides a peer-reviewed record of the project's methodology, training data, and evaluation results, which supports reproducibility and further study.