By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
KriptotekaKriptoteka
  • Home
  • News
    • Web3
    • Crypto News
    • Market Analysis
  • Market
    • AI
    • Altcoins
    • Bitcoin
    • Blockchain
    • CEX
    • Defi
    • DePIN
    • DEX
    • ETFs
    • Ethereum
    • Gaming
    • ICO/IDO
    • Institutions
    • L1&L2
    • Meme
    • NFT tech
    • RWA
    • Stable coins
  • Data
  • Events
  • Learn
  • Reports
  • Podcasts
  • Pro membership
Reading: Mistral AI Launches Pixtral 12B: Innovative Multimodal Model
Share
Notification Show More
Font ResizerAa
Font ResizerAa
KriptotekaKriptoteka
  • Home
  • News
  • Market
  • Data
  • Events
  • Learn
  • Reports
  • Podcasts
  • Pro membership
  • Home
  • News
    • Web3
    • Crypto News
    • Market Analysis
  • Market
    • AI
    • Altcoins
    • Bitcoin
    • Blockchain
    • CEX
    • Defi
    • DePIN
    • DEX
    • ETFs
    • Ethereum
    • Gaming
    • ICO/IDO
    • Institutions
    • L1&L2
    • Meme
    • NFT tech
    • RWA
    • Stable coins
  • Data
  • Events
  • Learn
  • Reports
  • Podcasts
  • Pro membership
Have an existing account? Sign In
Follow US
  • Advertise
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
Kriptoteka > Market > AI > Mistral AI Launches Pixtral 12B: Innovative Multimodal Model
AI

Mistral AI Launches Pixtral 12B: Innovative Multimodal Model

marcel.mihalic@gmail.com
Last updated: September 21, 2024 7:04 pm
By marcel.mihalic@gmail.com 4 Min Read
Share
SHARE


Iris Coleman
Sep 18, 2024 03:29

Mistral AI has unveiled Pixtral 12B, an advanced multimodal model that excels at handling both text and image tasks, showcasing impressive capabilities in instruction adherence and reasoning.



Mistral AI Launches Pixtral 12B: An Innovative Multimodal Model

Mistral AI has officially rolled out Pixtral 12B, marking the company’s first foray into multimodal models that effectively manage both text and image inputs. The model operates under the Apache 2.0 license, as per Mistral AI.

Pixtral 12B Key Highlights

Pixtral 12B is distinguished by its inherent multimodal design, trained using interspersed image and text datasets. It features a newly introduced vision encoder with 400M parameters and a 12B parameter multimodal decoder built on Mistral Nemo. This configuration supports varying image dimensions and aspect ratios while being capable of processing multiple images in its extensive context window of 128K tokens.

In terms of performance, Pixtral 12B shines at multimodal tasks, achieving top-tier results in text-only evaluations. It has recorded a 52.5% score on the MMMU reasoning benchmark, outperforming several models with larger architectures.

Evaluation and Performance Metrics

Designed as a seamless upgrade to Mistral Nemo 12B, Pixtral 12B offers premier multimodal reasoning features without sacrificing prowess in text-oriented tasks such as instruction compliance, coding, and mathematical computation. Consistent evaluation methods were utilized across various datasets, with Pixtral surpassing both open-source and proprietary models, including Claude 3 Haiku. Remarkably, it matches or exceeds the effectiveness of larger models like LLaVa OneVision 72B in multimodal assessments.

In the realm of instruction fidelity, Pixtral has achieved a 20% relative enhancement in text IF-Eval and MT-Bench compared to the closest open-source alternative. It excels in multimodal instruction conformity, outperforming competitors like Qwen2-VL 7B and Phi-3.5 Vision.

Design and Functional Capabilities

Pixtral 12B’s design is optimized for rapid processing and effective performance. The vision encoder tokenizes images at their native dimensions and aspect ratios, transforming them into tokens for each 16×16 image segment. These tokens are organized into a sequence, incorporating [IMG BREAK] and [IMG END] tokens for structure within rows and concluding the image. This enables the model to comprehend complex graphics and documents accurately while ensuring quick inference for smaller images.

The model’s ultimate setup comprises two main components: the Vision Encoder and the Multimodal Transformer Decoder. It is trained to forecast the subsequent text token based on interleaved image and text information, facilitating processing of any number of images with diverse sizes across its expansive 128K tokens context window.

Real-World Implementations

Pixtral 12B has demonstrated remarkable aptitude in a range of practical scenarios, such as reasoning with intricate figures, interpreting charts, and following multi-image instructions. For instance, it can consolidate data from several tables into a singular markdown table or produce HTML code to construct a website from an image prompt.

Getting Started with Pixtral

Users can readily explore Pixtral through Le Chat, Mistral AI’s interactive chat platform, or via La Plateforme, which facilitates integration through API calls. Comprehensive documentation is accessible for those eager to harness Pixtral’s functionalities within their projects.

For users interested in local deployment, Pixtral can be accessed using the mistral-inference library or the vLLM library, providing enhanced serving throughput. Detailed setup and usage instructions are included in the accompanying documentation.

Image source: Shutterstock


You Might Also Like

Claude.ai Launches Advanced Tool for Enhanced Data Analysis

Litecoin’s 2.6-Year HODL Time Ranks Second Behind Bitcoin

LINK Price Analysis: Can It Breach $12 to Reach New Highs?

Retail Bitcoin Holdings Grow Slowly Amid Market Recovery

Top AI Coins This Week: VIRTUAL, NOS, and DMTR Surge

Share This Article
Facebook Twitter Email Print
Previous Article Solana and BNB Investors Eye Rollblock for 100x Potential Gains
Next Article BingX Exchange Loses $52M in Hack, Says Blockchain Security Firm
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

FacebookLike
TwitterFollow
YoutubeSubscribe
TelegramFollow
- Advertisement -
Ad image

Latest News

4 Cryptos to Challenge Solana: Potential Growth for Investors
Defi
Bitcoin ETF Inflows Exceed $3B, Demand Reaches 6-Month Peak
ETFs
Japan’s Push for Bitcoin and Ethereum ETFs Gains Momentum
Institutions
Ripple Appeals Court Ruling on XRP’s Institutional Sales
Meme
//

We influence millions of users and is the number one Crypto and Web3 news network on the planet

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
nl Dutchen Englishfr Frenchde Germanel Greekit Italianpt Portugueseru Russianes Spanish
en en
Join Us!

Subscribe to our newsletter and never miss our latest news, podcasts etc..

Zero spam, Unsubscribe at any time.
Welcome Back!

Sign in to your account

Lost your password?