Transformer Apache 2.0 Yes
Português

Input: text

Output: text

Topics: generative AI, NLP, research

Tucano is a family of open-weights transformer language models developed in Brazil and trained specifically on Portuguese-language text. The models were pretrained on GigaVerbo, a dataset of roughly 200 billion deduplicated Portuguese tokens, and are available in four sizes ranging from 160 million to 2.4 billion parameters.

Fine-tuned variants include instruction-following and preference-optimized models, and related multimodal models were released under the ViTucano name. Tucano is intended for researchers and developers working on natural language processing in Portuguese, a language that has historically been underrepresented in the development of large language models.

The project is described in a 2025 paper published in the journal Patterns and is released under the Apache 2.0 license, with weights and code publicly available on GitHub. The model series is currently archived.

Background and Development

Tucano was developed in Brazil as a dedicated effort to address the shortage of large language models trained specifically on Portuguese text. Although many prominent language models are trained predominantly on English corpora, Portuguese, spoken by more than 250 million people in Brazil, Portugal, and other countries, has historically received little attention in foundation-model research. The Tucano project set out to close this gap by building transformer-based models from scratch using a large, high-quality Portuguese corpus.

The models were pretrained on GigaVerbo, a corpus of roughly 200 billion deduplicated Portuguese tokens, curated to support large-scale language model pretraining. The project is documented in the paper "Tucano: Advancing Neural Text Generation for Portuguese", published in the journal Patterns in 2025, and all weights and training code are publicly available on GitHub under the Apache 2.0 license.
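To make the term "deduplicated" concrete, the following is a minimal sketch of exact-match deduplication by content hash. It is an illustration only: this text does not describe how GigaVerbo was actually deduplicated, and the function name `deduplicate`, the whitespace and case normalization, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def deduplicate(docs):
    """Keep the first occurrence of each document, comparing normalized text.

    Hypothetical helper: normalization (collapse whitespace, lowercase) and
    the SHA-256 key are illustrative choices, not the GigaVerbo pipeline.
    """
    seen = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


corpus = ["Olá mundo", "olá   mundo", "Bom dia"]
print(deduplicate(corpus))
```

Large corpora are often also filtered for near-duplicates with fuzzy techniques such as MinHash; the sketch above covers only the exact-match case.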

Model Variants and Fine-Tuned Versions

The Tucano family comprises four base model sizes, allowing researchers and developers to choose the model that fits their compute constraints and task requirements:

  • Tucano-160m – 160 million parameters
  • Tucano-630m – 630 million parameters
  • Tucano-1b1 – approximately 1.1 billion parameters
  • Tucano-2b4 – approximately 2.4 billion parameters
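As a rough aid for matching a model size to a hardware budget, the sketch below estimates the memory needed to hold the weights alone. The parameter counts are the nominal figures from the list above; the helper names `weight_memory_gb` and `largest_model_that_fits` are hypothetical, and the estimate assumes dense storage and ignores activations, the KV cache, and optimizer state, so real usage will be higher.

```python
from typing import Optional

# Nominal parameter counts, taken from the model names listed above.
PARAMS = {
    "Tucano-160m": 160e6,
    "Tucano-630m": 630e6,
    "Tucano-1b1": 1.1e9,
    "Tucano-2b4": 2.4e9,
}

# Bytes per parameter for two common numeric formats (dense storage assumed).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def largest_model_that_fits(budget_gb: float, dtype: str = "float16") -> Optional[str]:
    """Return the largest listed model whose weights fit in budget_gb, or None."""
    fitting = [
        (n, name)
        for name, n in PARAMS.items()
        if weight_memory_gb(n, dtype) <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


for name, n in PARAMS.items():
    print(f"{name}: ~{weight_memory_gb(n):.2f} GB in float16")
```

For example, under these assumptions a 3 GB budget for float16 weights would select Tucano-1b1, since the 2.4-billion-parameter model needs about 4.8 GB for its weights.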

In addition to the pretrained base models, the project produced several fine-tuned derivatives. Tucano-SFT and Tucano-DPO are supervised fine-tuning and direct preference optimization variants, respectively, while Tucano-2b4-Instruct is the instruction-following version of the largest base model. These fine-tuned versions extend the usefulness of the base models to conversational and task-oriented applications.
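For readers unfamiliar with direct preference optimization, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It illustrates the general objective only and is not the Tucano training code; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen response more than the reference: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is what lets DPO learn from preference data without a separate reward model.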

Related multimodal models, ViTucano-1b5-v1 and ViTucano-2b8-v1, were also released under the ViTucano name, representing follow-on work that adds visual capabilities alongside Portuguese text understanding.

Use Cases and Intended Audience

Tucano is aimed primarily at researchers and developers working on natural language processing tasks in Portuguese. Possible use cases include text generation, language modeling benchmarks, fine-tuning for domain-specific Portuguese applications, and serving as a research baseline for studying model behavior in lower-resource language settings. The availability of several model sizes supports a range of deployment scenarios, from resource-constrained academic experiments to more compute-intensive research.

Because the models are released with open weights under the permissive Apache 2.0 license, they can be freely used, modified, and redistributed, which makes them accessible to a broad community, including those without access to commercial model APIs.

Current Status

The Tucano model series is currently archived, meaning active development has ended. The weights, code, and associated documentation remain publicly available through the project's GitHub repository for reference and continued use by the research community. The 2025 publication in Patterns provides a peer-reviewed record of the project's methodology, training data, and evaluation results, supporting reproducibility and further study.
