Transformer Apache 2.0 Yes
Português

Input: text

Output: text

Tags: generative AI, NLP, research

Tucano is a family of open-weights transformer language models developed in Brazil and trained exclusively on Portuguese text. The models were pre-trained on GigaVerbo, a deduplicated dataset of roughly 200 billion Portuguese tokens, and they come in four sizes, from 160 million to 2.4 billion parameters.

Fine-tuned variants include instruction-following and preference-optimized versions, and related multimodal derivatives have been released under the ViTucano name. Tucano is intended for researchers and developers working on natural language processing tasks in Portuguese, a language that has received little attention in large-scale language model development.

The project is documented in a 2025 paper published in the journal Patterns, and it is released under the Apache 2.0 license, with weights and code publicly available on GitHub. The model series is now archived.

Background and Development

Tucano was developed in Brazil to address the shortage of foundation language models trained specifically on Portuguese text. Many of the best-known foundation models are trained mostly on English text, while Portuguese, spoken by about 250 million people in Brazil, Portugal, and other countries, has received comparatively little attention in research on state-of-the-art models. The Tucano project addresses this gap by training transformer-based models from scratch on a large, high-quality Portuguese text corpus.

The models were pre-trained on GigaVerbo, a corpus of roughly 200 billion Portuguese tokens curated to support language modeling at scale. The project is described in the paper Tucano: Advancing Neural Text Generation for Portuguese, published in the journal Patterns in 2025, and all weights and training code are openly available on GitHub under the Apache 2.0 license.

Model Variants and Fine-Tuned Versions

The Tucano family comprises four base model sizes, which lets researchers and practitioners pick a model that fits their compute capacity and task requirements:

  • Tucano-160m – 160 million parameters
  • Tucano-630m – 630 million parameters
  • Tucano-1b1 – about 1.1 billion parameters
  • Tucano-2b4 – about 2.4 billion parameters
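
As a rough illustration of matching a model size to available hardware, the sketch below estimates the memory needed just to hold each model's weights and picks the largest model that fits a given budget. The parameter counts come from the list above. The 2 bytes per parameter default (16-bit weights) and the helper names are assumptions made for this example, and the estimate ignores activations, caches, and any training state, so real requirements will be higher.

```python
# Approximate parameter counts, taken from the list of base models above.
TUCANO_SIZES = {
    "Tucano-160m": 160e6,
    "Tucano-630m": 630e6,
    "Tucano-1b1": 1.1e9,
    "Tucano-2b4": 2.4e9,
}


def weight_memory_gb(n_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (10**9 bytes).

    bytes_per_param=2 assumes 16-bit weights; use 4 for 32-bit.
    """
    return n_params * bytes_per_param / 1e9


def largest_model_that_fits(budget_gb, bytes_per_param=2):
    """Return the largest model whose weights fit in budget_gb, or None."""
    fitting = [
        name
        for name, n_params in TUCANO_SIZES.items()
        if weight_memory_gb(n_params, bytes_per_param) <= budget_gb
    ]
    if not fitting:
        return None
    return max(fitting, key=TUCANO_SIZES.get)
```

For example, `largest_model_that_fits(3.0)` selects Tucano-1b1 under the 16-bit assumption, since its weights need about 2.2 GB while Tucano-2b4 needs about 4.8 GB.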

In addition to the pre-trained base models, the project has released several fine-tuned derivatives. Tucano-SFT and Tucano-DPO are the supervised fine-tuning and direct preference optimization versions, respectively, and Tucano-2b4-Instruct is the instruction-following version of the largest base model. These fine-tuned versions extend the base models to conversational and task-oriented use.

Related multimodal models, ViTucano-1b5-v1 and ViTucano-2b8-v1, were also released under the ViTucano name, indicating follow-up work that adds visual input to Portuguese text understanding.

Use Cases and Intended Users

Tucano is aimed primarily at researchers and practitioners working on natural language processing tasks in Portuguese. Possible uses include text generation, benchmarking language modeling results, fine-tuning for specific Portuguese downstream tasks, and serving as a research baseline for studying model behavior in lower-resource language settings. The range of model sizes supports many settings, from small experiments on limited compute to larger experiments that commit more resources.

Because the models are released openly under the Apache 2.0 license with full weights, anyone can use, modify, and redistribute them. This makes them accessible to a broad community, including people who cannot access proprietary model APIs.
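
For text generation, one of the uses listed above, a minimal sketch with the Hugging Face `transformers` library might look like the following. This page only states that the weights are on GitHub, so the Hub organization name `TucanoBR` and the resulting repository IDs are assumptions to verify before use. The helper names `hub_id` and `generate_portuguese` are hypothetical and exist only for this example.

```python
# Assumption: the checkpoints are also hosted on the Hugging Face Hub under
# this organization name. The source page only mentions GitHub.
ASSUMED_HUB_ORG = "TucanoBR"


def hub_id(model_name, org=ASSUMED_HUB_ORG):
    """Build a Hub repository ID such as 'TucanoBR/Tucano-160m'."""
    return f"{org}/{model_name}"


def generate_portuguese(prompt, model_name="Tucano-160m", max_new_tokens=40):
    """Continue a Portuguese prompt with a Tucano base model.

    Downloads the checkpoint on first use, so it needs network access
    and the `transformers` package (plus a backend such as PyTorch).
    """
    # Imported lazily so hub_id() works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=hub_id(model_name))
    # Greedy decoding keeps the output deterministic for a given checkpoint.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]


# Example call (not executed here because it downloads model weights):
# print(generate_portuguese("A capital do Brasil é"))
```

Starting with the smallest model keeps the first download small. Since the base models are plain language models, they continue the prompt rather than follow instructions, so the instruction-tuned versions described earlier are likely a better fit for chat-style use.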

Current Status

The Tucano model series is now archived, which means development has ended. The weights, code, and related documentation remain publicly available in the GitHub repository for inspection and reuse by the research community. The 2025 publication in Patterns provides a peer-reviewed account of the project's methodology, training data, and evaluation results, which supports reproducibility and further research.
