Tucano
Tucano is a family of open-weights transformer language models developed in Brazil and trained exclusively on Portuguese text. The models were pre-trained on GigaVerbo, a dataset of roughly 200 billion deduplicated Portuguese tokens, and are offered in four sizes ranging from 160 million to 2.4 billion parameters.
Fine-tuned variants include instruction-following and preference-optimized models, and related multimodal models were released under the name ViTucano. Tucano is aimed at researchers and developers working on natural language processing tasks in Portuguese, a language that has historically been underrepresented in large language model development.
The work is described in a 2025 paper published in the journal Patterns and was released under the Apache 2.0 license, with weights and code publicly available on GitHub. The model series is now archived.
Background and Development
Tucano was developed in Brazil as an effort to address the shortage of large language models trained specifically on Portuguese text. Although most well-known language models are trained largely on English text, Portuguese, which is spoken by more than 250 million people across Brazil, Portugal, and other countries, has historically received little attention in foundation model research. The Tucano project set out to close this gap by building transformer-based models from scratch on a large, high-quality Portuguese dataset.
The models were pre-trained on GigaVerbo, a dataset of roughly 200 billion deduplicated Portuguese tokens compiled to support robust language modeling at scale. The work is described in the paper Tucano: Advancing Neural Text Generation for Portuguese, published in the journal Patterns in 2025, and all weights and training code are publicly available on GitHub under the Apache 2.0 license.
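To make the term "deduplicated" concrete, the following is a minimal sketch of exact-match deduplication over a list of documents. It is an illustration only and is not taken from the GigaVerbo pipeline, whose actual filtering steps are not described here; the `normalize` and `deduplicate` helpers and the sample sentences are hypothetical.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(documents):
    """Keep the first occurrence of each document, comparing normalized content."""
    seen = set()      # hashes of documents already kept
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    # Hypothetical sample: the second entry differs only in case and spacing.
    docs = ["O tucano é uma ave.", "o  tucano é uma ave.", "Outra frase."]
    print(deduplicate(docs))
```

Hashing the normalized text keeps memory use proportional to the number of unique documents rather than their total length. Large corpora often also apply near-duplicate detection, which this sketch does not attempt.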
Model Variants and Fine-Tuned Versions
The Tucano family comprises four base model sizes, letting researchers and developers choose a model suited to their compute constraints and task requirements:
- Tucano-160m – 160 million parameters
- Tucano-630m – 630 million parameters
- Tucano-1b1 – approximately 1.1 billion parameters
- Tucano-2b4 – approximately 2.4 billion parameters
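As a rough guide to matching a model size to a compute budget, the sketch below estimates the memory needed to hold each model's weights. It assumes the nominal parameter counts listed above and the standard widths of float32 (4 bytes) and float16 (2 bytes); the helper names are hypothetical, and the figures exclude activations, optimizer state, and any inference cache, so real requirements will be higher.

```python
# Nominal parameter counts taken from the list above (approximate for the larger two).
NOMINAL_PARAMS = {
    "Tucano-160m": 160e6,
    "Tucano-630m": 630e6,
    "Tucano-1b1": 1.1e9,
    "Tucano-2b4": 2.4e9,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}  # standard dtype widths


def weight_memory_gib(n_params: float, dtype: str = "float16") -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


def largest_model_that_fits(budget_gib: float, dtype: str = "float16"):
    """Return the largest listed model whose weights fit in budget_gib, or None."""
    fitting = [
        (n, name) for name, n in NOMINAL_PARAMS.items()
        if weight_memory_gib(n, dtype) <= budget_gib
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, n in NOMINAL_PARAMS.items():
        print(f"{name}: {weight_memory_gib(n, 'float16'):.2f} GiB (float16), "
              f"{weight_memory_gib(n, 'float32'):.2f} GiB (float32)")
```

Under these assumptions the largest model needs roughly 4.5 GiB for its weights in float16, while the smallest needs about 0.3 GiB.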
In addition to the pre-trained base models, the project produced several fine-tuned derivatives. Tucano-SFT and Tucano-DPO are the supervised fine-tuning and direct preference optimization variants, respectively, while Tucano-2b4-Instruct is the instruction-following version of the largest base model. These fine-tuned variants extend the usefulness of the base models to conversational tasks and purpose-built applications.
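For readers unfamiliar with direct preference optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the general technique only, not the Tucano training code; the function name, its arguments, and the default `beta=0.1` are assumptions made for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair from sequence log-probabilities.

    beta is a hypothetical default, not a value reported for Tucano-DPO.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the frozen reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: zero margin.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy matches the reference model the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the preferred response.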
Related multimodal models, ViTucano-1b5-v1 and ViTucano-2b8-v1, were also released under the ViTucano name, pointing to future work that combines visual modalities with Portuguese text understanding.
Intended Use and Audience
Tucano is intended primarily for researchers and developers working on natural language processing tasks in Portuguese. Example uses may include text generation, language modeling benchmarks, fine-tuning for domain-specific Portuguese applications, and serving as a research baseline for studying model behavior in lower-resource language settings. The availability of several model sizes supports a range of deployment scenarios, from academic experiments on modest hardware to more resource-intensive research.
Because the models were released with open weights under the permissive Apache 2.0 license, they can be used, modified, and redistributed free of charge under the license's terms, which makes them accessible to a broad community, including those without access to proprietary model APIs.
The Tucano model series is now archived, meaning active development has ceased. The weights, code, and accompanying documentation remain publicly available through the project's GitHub repository for reference and continued use by the research community. The 2025 publication in Patterns provides a peer-reviewed record of the project's methodology, training data, and evaluation results, supporting reproducibility and follow-up study.