Blog

  • Meta Deep Dive

Meta Platforms (Facebook’s parent company) continues to post robust advertising revenue growth in 2024–2025, even as usage on its flagship social apps shows signs of stagnation or decline. In 2024, Meta’s ad revenue jumped to $164.5 billion, up from $134 billion in 2023 (sproutsocial.com), and the trend carried into 2025: Q1 2025 ad revenue grew 16% year-over-year to $41.4 billion (ppc.land). This record growth persists despite “declining” user engagement on Facebook and Instagram. For example, Mark Zuckerberg has admitted that time spent on Meta’s apps has fallen since TikTok’s rise (socialmediatoday.com), and Facebook’s usage has plateaued: total hours on Facebook were flat year-over-year in mid-2024 (sensortower.com). Notably, Facebook’s monthly active users even saw a first-ever slight decline from Q3 to Q4 2024 (thesocialshepherd.com). The following report analyzes why Meta’s ad business remains on a growth trajectory amid these engagement headwinds, focusing on six strategic factors:

    • AI and machine learning advancements in Meta’s ad platform (better targeting & optimization)
    • Shifts in advertiser behavior and strategy (performance marketing focus, brand safety, cross-platform campaigns)
    • The role of Reels, Stories, and new ad formats in capturing attention and revenue
    • Global user base expansion and the contribution of non-U.S. markets
    • Advertiser responses to engagement trends – overlooking declines vs. adapting tactics
    • Insights from executives and investor reports (e.g. earnings calls commentary)

    Each section below delves into these areas, with data and primary-source commentary illustrating how Meta converts its massive user base and technological investments into advertising dollars – seemingly regardless of softening engagement on individual platforms.

    1. AI and Machine Learning Enhancing Ad Targeting & Performance

    Meta has increasingly leveraged artificial intelligence (AI) and machine learning to boost the effectiveness of its advertising platform. These innovations have improved ad targeting, delivery, and conversion, allowing Meta to extract more revenue per user and per ad, even if user activity growth is slow. Key points include:

• Advanced AI-driven targeting: Meta’s algorithms now do much of the work in finding the right audience for ads. Mark Zuckerberg noted that Meta’s ad system can often “better predict who would be interested” in an ad than advertisers can manually (s21.q4cdn.com). In practice, AI has made Meta’s ads better at finding the right audiences and optimizing spend, as the company integrates AI across ad products (creativestrategies.com). This translates to higher return on investment (ROI) for advertisers, encouraging them to keep or increase ad budgets despite any user engagement dips.
• Algorithmic ad optimization: New machine learning models have significantly improved ad performance. In Q1 2025, Meta introduced a Generative Ads Ranking model (GEM) for ads. According to CFO Susan Li, this model uses a novel architecture that is “twice as efficient at improving ad performance” for a given amount of data and compute (ppc.land). Early results were impressive: testing GEM on Facebook Reels ads showed up to a 5% increase in ad conversions (ppc.land). By improving conversion rates and outcomes, such AI optimizations boost the value of each ad impression. (Notably, in Q1 2025 ad impressions grew only 5% while the average price per ad rose 10%, indicating Meta is monetizing better via targeting and optimization rather than sheer volume (ppc.land); a back-of-envelope check of this decomposition follows this list.)
• AI-boosted user engagement: Meta is also using AI to enhance the user experience, which indirectly supports the ad business. More powerful recommendation algorithms surface content that keeps users watching, scrolling, and returning. Internally, Meta reported that improvements to its AI-driven feed and video recommendations led to roughly an 8% lift in time spent on Facebook and Instagram (s21.q4cdn.com). Zuckerberg highlighted that “advances in AI continue to improve the quality of recommendations and drive engagement”; for instance, a new unified video recommendation system increased engagement on Facebook Reels substantially (s21.q4cdn.com). By regaining some of the user attention (even as TikTok competes for it), Meta ensures a healthy supply of eyeballs for advertisers. In short, AI helps Meta squeeze more engagement and ad clicks out of a given user base.
• Automation of ad creation and delivery: Meta’s vision is that AI will eventually handle many aspects of advertising automatically. Zuckerberg has described a future where a business can simply specify its goal and budget, and Meta’s AI will “just do the rest” (ppc.land): finding the audience, optimizing bidding, and even generating creative. While not fully realized yet, steps in this direction (like Advantage+ automated campaigns) are already attracting advertisers. This AI-driven automation makes advertising on Meta easy and efficient for marketers, which keeps demand high. Zuckerberg framed this as “redefining what advertising is into an AI agent that delivers measurable business results at scale” (ppc.land), a strategy to capture more advertising dollars by outperforming traditional manual campaign setups.
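
As a rough sanity check on the impressions-versus-price point above: ad revenue is impressions times average price, so the two growth rates compound multiplicatively. A minimal sketch using the Q1 2025 figures quoted in this section (the variable names are illustrative):

```python
# Q1 2025 figures cited above: impressions +5%, average price per ad +10%.
impressions_growth = 0.05
price_growth = 0.10

# Revenue = impressions x average price, so the growth rates compound.
revenue_growth = (1 + impressions_growth) * (1 + price_growth) - 1
print(f"Implied ad revenue growth: {revenue_growth:.1%}")  # ~15.5%, in line with the reported 16% YoY
```

The arithmetic shows most of the quarter’s growth came from pricing (i.e., better-performing ads), not from showing more ads.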

Meta’s heavy investment in AI underpins these improvements; the company significantly raised capital expenditure guidance to build AI data centers (reuters.com). In essence, better AI is allowing Meta to monetize each user more effectively, offsetting slower user growth or engagement. Advertisers are willing to spend more when the targeting and outcomes improve, which has propped up Meta’s revenue growth.

    2. Shifts in Advertiser Behavior and Strategy

    Changes in how advertisers allocate budgets and plan campaigns have also benefited Meta. In 2024–25, marketers are prioritizing platforms that deliver strong performance and broad reach, and Meta fits the bill – engagement concerns notwithstanding. Several strategic shifts on the advertiser side explain the continued revenue influx:

• Focus on performance marketing and ROI: Advertisers have increasingly gravitated toward direct-response and performance-based advertising, especially in uncertain economic times. Meta’s platforms (Facebook/Instagram) are highly developed for performance marketing, offering detailed targeting, conversion tracking, and sales-oriented ad formats. This makes Meta a top destination for advertisers who need to show concrete results (app installs, e-commerce sales, etc.). Indeed, even as overall marketing budgets tightened in 2024, advertisers saw Meta as a “reliable go-to” channel to drive outcomes (reuters.com). The sheer scale of Meta’s user base (3+ billion daily actives) gives advertisers confidence they can find enough customers, even if individual user sessions might be shorter or people are more scattered across apps. In short, brands keep spending on Meta because it reliably delivers ROI, which matters more to them than time-spent stats.
• Advertisers largely undeterred by engagement dips: Notably, there has been little evidence of advertisers pulling back solely due to lower engagement metrics. Industry insiders suggest that even when Meta makes controversial changes or faces usage stagnation, most advertisers don’t make major spending shifts (businessinsider.com). For example, when Meta announced looser content moderation (a potential brand safety concern), ad buyers expressed worry but “generally didn’t expect the changes to hurt Meta’s [ad] business” (businessinsider.com). The same appears true for engagement declines: advertisers are aware of the trends but are not fleeing. Meta’s advertising reach is too valuable, and no comparable alternative can yet match its combination of scale and ad efficacy. Essentially, advertisers are willing to “ride out” any dips in user engagement as long as their ads continue performing well on the platform.
• Brand safety improvements and stability: Meta has taken steps to reassure advertisers on issues like brand safety, which helps keep big brand budgets in the ecosystem. In late 2024, Meta introduced new brand safety tools, for instance giving advertisers control to mute comments on their ads and block ads from appearing alongside certain content (emarketer.com). These measures address advertisers’ concerns about ad adjacency to toxic content, which had led to boycotts in the past. By 2025, many marketers seem to have accepted Meta’s safeguards; there is a sense that “brand safety is a myth” and that leaving Meta would hurt reach more than it helps safety (digiday.com). The outcome is that advertisers continue spending on Meta instead of shifting budgets elsewhere, further fueling revenue growth.
• Cross-platform campaign integration: Advertisers today approach Meta’s Family of Apps as an integrated marketing channel. Rather than buying Facebook or Instagram in isolation, they use Meta’s unified Ads Manager to run campaigns across Facebook, Instagram, Messenger, and the Audience Network simultaneously for maximum reach. Meta has made this extremely simple: adding Instagram or Reels placements to a Facebook campaign is “as easy as checking a box,” according to analysts (reuters.com). This ease of cross-platform advertising means marketers can follow user attention within Meta’s ecosystem without friction. For example, as Instagram Reels gained popularity, advertisers quickly expanded campaigns into Reels ads. Meta’s CFO noted that over 75% of Meta’s advertisers were running ads in Reels by mid-2023 (reuters.com), reflecting rapid adoption. This cross-app flexibility lets advertisers maintain effective reach even if engagement shifts from one Meta app or surface to another. In effect, advertisers are not abandoning Meta due to engagement changes; they’re adapting within Meta (e.g. shifting spend from Feed to Reels or Stories) while keeping overall budgets in the Meta family.
• Reliance on Meta amid industry shifts: Broader industry trends (like Apple’s privacy changes limiting third-party tracking and the deprecation of cookies) have paradoxically made Meta more important for advertisers. With less visibility into open web ads, many brands turned back to “walled gardens” like Meta that have rich first-party data and AI to optimize targeting. Additionally, during economic volatility or events (e.g. in 2024 some advertisers cut spend on experimental channels), those budgets often flowed to the known performers, with Meta benefiting (reuters.com). In summary, advertiser strategy in 2024–25 has been to consolidate around platforms that drive results at scale, and Meta’s dual focus on performance and brand accommodations keeps it at the top of that list.

    3. The Role of Reels, Stories, and Emerging Ad Formats

    Meta’s introduction and monetization of new content formats – particularly Reels (short-form video) and Stories (ephemeral photo/video posts) – has been crucial in sustaining ad revenue growth. These formats help capture user engagement that might otherwise be lost (e.g. to TikTok or Snapchat) and create new inventory for ads. Key observations:

• Reels driving video engagement: Reels (vertical short videos on Facebook and Instagram, akin to TikTok) have exploded in popularity. By mid-2024, Meta’s data showed that on Facebook, 30% of a user’s time was spent on Reels, double the share from January 2024 (sensortower.com). Instagram has effectively become a “video-first” platform, with users now spending about two-thirds of their time on Instagram watching videos (Reels or longer form) (emarketer.com). This shift addresses the TikTok threat by keeping users engaged with Meta’s own short-form videos. While Reels initially launched in 2020 to skepticism, by 2023 Meta was touting huge usage: the number of Reels plays across FB and IG reached 200 billion per day, up from 140B/day in late 2022 (reuters.com). This surge in Reels engagement provides Meta a fresh avenue to show ads, compensating for any decline in scrolling the news feed.
• Monetization of Reels (and Stories): Meta has rapidly ramped up advertising in these new formats. Zuckerberg revealed that by mid-2023, Reels reached a $10 billion annual revenue run rate, a steep climb from about $3B in late 2022 and just $1B in mid-2022 (reuters.com). In other words, Reels went from a zero-revenue format to a significant chunk of Meta’s business in ~18 months, nearly catching up to TikTok’s ~$10B in ad revenue (reuters.com). Stories (the 24-hour posts copied from Snapchat) have also become a mature revenue driver: by 2024, Instagram Stories contributed roughly 25% of Instagram’s ad revenue (emarketer.com). (Feed ads were ~54% and Reels/Explore ~9.6%, with Reels’ share rising fast (emarketer.com).) The success of Stories over the past few years proved Meta could introduce a new format and monetize it heavily; now Reels is on the same path. These emerging formats expand the total ad inventory and help offset any revenue loss from users spending less time in feeds. Even if a user’s feed scrolling dropped, the ads they now see in Reels or Stories can make up the difference.
• Advertiser adoption of new formats: A major factor in monetization success is that advertisers have been quick to embrace Reels and Stories. Meta’s integration of formats means advertisers don’t need a whole new strategy; they can use existing assets or slightly tweaked creatives to run ads in Reels and Stories. According to Meta, more than three-quarters of its advertisers were placing ads on Reels by 2023 (reuters.com). Analysts noted “it’s as easy as checking a box” in the ad interface to extend a campaign to Reels (reuters.com). This high adoption rate shows that advertisers followed the user shift into short-form video instead of pulling spending. Zuckerberg had predicted this behavior: when Reels usage started cannibalizing some feed time, he told investors he expected advertisers to “embrace the format over time, as they had with…Feed to Stories” transitions (reuters.com). That prediction came true: advertisers adapted by spreading budgets across Feed, Stories, and Reels, ensuring that even if one surface (like the traditional feed) saw lower engagement, the overall campaign could still reach users elsewhere in Meta’s apps. This adaptability has been key to Meta retaining ad dollars that might have otherwise left for TikTok or other platforms.
• Emerging platforms and ad opportunities: Beyond Reels and Stories, Meta is also exploring entirely new platforms and ad formats, which, while nascent, represent future growth potential. For example, Meta launched Threads (a Twitter-like text social app) in mid-2023 and quickly amassed 275 million users by Q4 2024 (jonloomer.com). As of late 2024, Threads had no significant ads, and Meta does not expect Threads to contribute meaningful revenue until it scales further in 2025 (jonloomer.com). However, when the time is right, Threads could open another revenue stream (eMarketer predicts Threads ads will roll out cautiously, likely contributing modestly by late 2025) (emarketer.com). Similarly, WhatsApp and Messenger are being monetized via “click-to-message” ads and business messaging. These allow advertisers to pay to start conversations with users in chat apps. While small today, the segment is growing fast: the WhatsApp Business Platform’s revenue grew 55% YoY to $519 million in Q4 2024 (fifthperson.com), though that sits outside “advertising” in Meta’s reporting. The key takeaway is that Meta is continually adding new ad real estate, whether within its main apps (Reels/Stories/Explore), in new apps (Threads), or in services (messaging). This pipeline of formats ensures that if user engagement shifts, Meta has somewhere to monetize it. Reels and Stories have proven this strategy effective, capturing user time that might have been lost and converting it into revenue.

In summary, Meta’s agility in product innovation has kept its platforms sticky to users and attractive to advertisers. Reels and Stories helped Meta retain users (especially younger ones) and gave advertisers new ways to reach them. The rapid monetization of these formats is a critical reason ad revenue rose in 2024 even though Facebook usage was flat (sensortower.com): Meta simply monetized different behaviors (short videos, ephemeral sharing) instead of the old news feed scrolling.

    4. Global User Base Expansion and Non-U.S. Markets’ Impact

    Meta’s advertising growth is also fueled by its vast international user base, which continues to expand. While Facebook and Instagram may be saturating in North America or seeing engagement declines in some cohorts, the global scale and growth in emerging markets provide a counterbalance. Key points on global dynamics:

• Sheer user growth overseas: Meta’s family of apps is still adding users worldwide, especially in Asia-Pacific, Africa, and other emerging regions. As of early 2025, 3.43 billion people use at least one Meta app each day (Family DAP), an increase of 6% year-over-year (reuters.com). Monthly active users on Facebook hit ~3.08 billion in 2024 (thesocialshepherd.com); Instagram and WhatsApp add even more on top. Most of the new users are outside the U.S. Even if an average user in, say, India spends less time or sees fewer ads than an average American user, the volume of new users helps drive up total ad impressions. In Q4 2024, for example, ad impressions delivered across Meta’s apps rose 6% overall (fifthperson.com), primarily driven by usage growth in Asia-Pacific (APAC). More people coming online and using Meta in populous countries like India, Indonesia, and Brazil directly translates into more ad views globally.
• High growth in ad revenue from non-U.S. regions: Advertisers in emerging markets are also ramping up spending on Meta as those digital ad ecosystems mature. Meta’s financials show faster ad revenue growth internationally than in its home market. In Q2 2024, Meta’s worldwide ad revenue was up 22% YoY, but U.S. & Canada ad revenue was up a lesser 17% (sensortower.com), implying regions like APAC, Latin America, and Europe grew well above 20%. Indeed, in some quarters Europe and “Rest of World” (Latin America, Africa, etc.) saw 30%+ year-over-year ad revenue growth (sensortower.com) as advertisers in those regions increased their Meta budgets. By contrast, North America’s ad revenue growth was in the teens. This means an increasing share of Meta’s incremental revenue is coming from outside the U.S. For instance, Asia-Pacific ad demand (especially from e-commerce and gaming advertisers) was very strong in 2024, helping boost pricing and fill more impressions (s21.q4cdn.com). Meta’s global diversification has thus mitigated the impact of any stagnation in U.S. user engagement: even if Facebook usage is flat in the U.S., the company can still grow ad sales by expanding in markets where Facebook/Instagram usage is still rising.
• Majority of revenue now international: Meta’s monetization of its international user base has improved over time, to the point that the majority of ad revenue now comes from outside the U.S. In 2024, roughly $72 billion of Meta’s ad revenue (about 44%) was generated in the U.S. & Canada, while the remaining ~56% came from other regions (businessofapps.com). (Notably, North America accounts for only ~9% of Meta’s users (businessofapps.com) but yields a high ARPU; still, non-North America collectively contributes more dollars now.) This global revenue mix means Meta’s growth is less beholden to any single region’s engagement levels. Even as North American usage is mature, there are billions of users in APAC and other areas where advertising on Meta is still gaining traction. In markets like India, where Facebook’s user base grew ~16% from 2022 to 2024 (to over 400 million MAUs) (statista.com), Meta is just beginning to tap into the ad budgets of local businesses and global brands targeting those users. Increasing internet penetration and digital ad spend in emerging economies funnel directly into Meta’s ad revenue.
• Lower ARPU offset by volume and growth: It’s true that average revenue per user (ARPU) is lower in non-U.S. markets, and engagement per user may also be less in developing regions. However, the gap is narrowing. As smartphones become ubiquitous and e-commerce grows globally, advertisers from all industries (CPG, fintech, entertainment, etc.) are pouring money into Meta to reach these new online consumers. Meta’s ability to localize ads (supporting local languages, local businesses advertising on WhatsApp, etc.) has improved, which helps monetize international users more effectively. The strategy of focusing on “monetizing engagement over time” (warc.com) pays off here: Meta often grows user engagement first in a market (sometimes with years of user growth with minimal ads), then gradually increases ad load or ARPU. We saw this with Instagram: between 2015 and 2025, Instagram’s U.S. user base grew 142%, but its ad revenue grew many times that, as ARPU climbed to surpass Facebook’s by 2019 (emarketer.com). A similar dynamic is occurring in international markets now. Thus, even if engagement per user is not skyrocketing, Meta is better at monetizing each user each year, especially outside the U.S., leading to overall revenue gains (a rough regional ARPU calculation follows this list).
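
To make the ARPU gap concrete, here is a back-of-envelope sketch combining figures quoted above. Note the approximation: it mixes the 2024 revenue split with the early-2025 daily user count, and the variable names are illustrative.

```python
# Figures quoted above: ~$164.5B 2024 ad revenue, ~$72B of it from US & Canada;
# ~3.43B daily users (Family DAP), ~9% of users in North America.
total_revenue_b = 164.5
na_revenue_b = 72.0
total_users_b = 3.43
na_user_share = 0.09

na_users_b = total_users_b * na_user_share        # ~0.31B users
intl_users_b = total_users_b - na_users_b         # ~3.12B users

na_arpu = na_revenue_b / na_users_b                           # ~$233 per user per year
intl_arpu = (total_revenue_b - na_revenue_b) / intl_users_b   # ~$30 per user per year

print(f"North America ARPU: ~${na_arpu:.0f}/yr vs international ARPU: ~${intl_arpu:.0f}/yr")
```

The order-of-magnitude gap is why user growth abroad can add many impressions while still leaving enormous headroom for monetization to catch up.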

    In summary, Meta’s global expansion ensures that “flat” engagement in one region doesn’t translate to flat revenue. The company’s reach into every continent provides a growth engine: new users and new advertisers coming on board worldwide. Non-U.S. markets both expand the user base and increasingly contribute to ad revenue growth, offsetting any engagement fatigue in core Western markets.

    5. Advertisers: Overlooking Engagement Trends or Adapting to Them?

    A critical part of this puzzle is how advertisers react to reported engagement declines. Are they ignoring these warning signs, or actively adjusting their strategies? The evidence suggests advertisers are aware of engagement shifts but remain confident in Meta – largely because they can adapt their advertising tactics within the platform to still achieve results. Key insights:

• Outcomes matter more than engagement metrics: Advertisers ultimately care about their own marketing objectives (sales leads, app installs, brand lift) more than they do about Facebook’s internal engagement stats. As long as campaigns on Meta continue to deliver strong outcomes, advertisers will keep spending, even if time spent per user is down. Meta’s Q1 2025 results underscore this: ad revenue rose 16% YoY (creativestrategies.com) at a time when some engagement indicators (e.g. Facebook friend content consumption) were down. This implies advertisers were still getting value. Analysts have noted that Meta’s unmatched scale (3.4B daily users) makes it a “go-to ad venue” for marketers (creativestrategies.com). From an advertiser’s perspective, a modest decline in average user session length doesn’t outweigh the fact that almost everyone is still on Facebook/Instagram. Advertisers appear to be overlooking soft engagement trends because Meta’s platforms remain one of the only ways to reach billions of people with sophisticated targeting. In short, so long as Meta ads yield a positive ROI, advertisers aren’t overly concerned with whether users spent 5% less time on the app this year.
• Adaptation within Meta’s ecosystem: Advertisers have proven very nimble in adapting to how users use Meta, rather than abandoning it. When engagement shifts, they shift their ad placements accordingly. For example, as users devote more attention to Reels videos and less to the news feed, advertisers have expanded into Reels ads (as discussed, 3 in 4 advertisers now run Reels ads) (reuters.com). This flexibility means advertisers can follow the user journey within Meta’s walled garden. If engagement declines in one format, they increase focus on another. Zuckerberg gave the example that when Stories first gained popularity, advertisers eventually moved budget there from feed; the same pattern is happening with Reels (reuters.com). The ability to serve ads across multiple surfaces (Feed, Stories, Reels, Messenger, etc.) insulates advertisers from platform changes: they don’t have to leave Meta to find the eyeballs, they just redistribute how they buy inside Meta. Thus, advertisers are effectively mitigating the engagement issue by aligning their campaigns with whatever part of Facebook/Instagram is most engaging at the moment. This adaptability has been critical in Meta retaining ad dollars. Rather than panic about TikTok stealing user time, advertisers waited to see Meta offer a competing format (Reels) and then swiftly adopted it, thereby neutralizing the risk of lost reach.
• Selective attention to engagement metrics: It’s also worth noting that “user engagement” is multifaceted, and not all declines are relevant to advertisers. For instance, one reported trend is that people spend less time on content from friends and more on algorithmic content. From a brand’s perspective, that shift might even be positive (users paying more attention to public content where ads live, versus private friend updates). In fact, Meta has stopped emphasizing time-spent metrics publicly, and advertisers mostly focus on ad performance metrics. Mark Zuckerberg’s acknowledgment that Meta’s share of social media time has declined with TikTok’s rise (socialmediatoday.com) was a notable public admission, but advertisers seem to have taken it in stride. Part of the reason is that no single competitor has significantly eroded Meta’s advertising efficacy yet. TikTok, for example, is growing but still smaller in ads (estimated ~$13B revenue in 2023 vs. Meta’s $117B in just the first nine months of 2024). Advertisers typically trial new platforms but often keep the bulk of budgets on proven ones. So while engagement trends are monitored, advertisers appear to be betting that Meta’s innovations (AI, Reels, etc.) will keep users sufficiently engaged. Many advertisers also discount some engagement stats as short-term fluctuations or demographic specifics that don’t affect their particular campaigns.
• Confidence in Meta’s countermeasures: Advertisers take cues from Meta’s leadership on how the company is addressing engagement challenges. When Zuckerberg outlines plans to improve content discovery with AI or sees Reels as key to winning back young users, advertisers gain confidence that Meta is proactively working to increase engagement again. The fact that Meta’s leadership prioritized “increasing user engagement… before turning to monetization” of new experiences (reuters.com) shows a long-term commitment to keeping the user base active. Advertisers are thus willing to stick with Meta, trusting that these efforts (e.g. the pivot to video, AI chatbots to spur interaction, etc.) will bear fruit. Furthermore, the lack of a significant user exodus (Facebook is “dying” slower than predicted, still over 3 billion users) reassures advertisers that they won’t find comparable scale elsewhere. As one industry expert put it, Meta’s ad business has “proven reliability” even in tough times (reuters.com). This reliability makes advertisers tolerant of engagement dips. They assume (so far correctly) that Meta will adapt and user activity will stabilize or shift into new forms that Meta can monetize, and thus they don’t pull their budgets at the first sign of trouble.

    In summary, advertisers are neither blind to engagement trends nor alarmist – they are pragmatic. They continue to allocate large budgets to Meta because it remains essential for reaching audiences and driving results. If anything, engagement declines have prompted advertisers to become more agile in how they use Meta (embracing Reels, leveraging cross-platform ads, etc.), rather than prompting them to abandon ship. As long as Meta provides the tools to reach users effectively (which it has, via AI and new formats), advertisers appear content to maintain or even increase their spending, effectively overlooking the engagement noise in favor of the signal: marketing performance.

    6. Executive and Investor Commentary

    Meta’s leadership and financial reports have provided insight into the strategy behind sustaining ad revenue growth and how the company communicates about engagement. Comments from Meta executives and industry analysts on earnings calls reveal the deliberate approach Meta is taking:

• Zuckerberg on AI’s role in ads: Mark Zuckerberg has repeatedly emphasized that AI is the key driver of Meta’s advertising momentum. In late 2024, he noted on an earnings call that advances in AI recommendation models had “increased engagement on Facebook Reels” significantly (s21.q4cdn.com), underscoring that boosting engagement is foundational. He also explained that on the advertising side, “our ads system [can] predict who would be interested [in an ad] better than the advertisers could themselves” thanks to AI (s21.q4cdn.com). This was a candid way to assure investors that Meta’s AI targeting makes the platform extremely effective for marketers. Zuckerberg has even described Meta’s evolving ad strategy as turning advertising into an AI-driven agent for business outcomes: “If we deliver on this vision…AI will make advertising a meaningfully larger share of global GDP than it is today,” he told investors (ppc.land), a bold statement reflecting confidence that AI will unlock more advertiser spend. Such commentary signals that Meta is doubling down on AI to both keep users engaged and keep ads relevant, which in turn gives investors confidence that the company can navigate engagement headwinds (with AI as the solution).
• CFO on engagement vs. monetization strategy: Meta’s Chief Financial Officer, Susan Li, has explicitly broken down the formula for revenue growth in terms of user engagement and monetization efficiency. On an earnings call, she highlighted two primary factors driving Meta’s revenue: “our ability to deliver engaging experiences for our community and our effectiveness at monetizing that engagement over time” (warc.com). She noted that AI is crucial to both: it helps make the apps more engaging and also powers better ad delivery. This reflects Meta’s classic playbook: first grow or maintain engagement (e.g. get people watching Reels), then monetize that engagement more and more effectively (e.g. improve Reels ads yield). Li has frequently updated investors on the monetization gap between new formats and old ones. For example, she acknowledged that Reels was still less monetized than Feed/Stories, but closing the gap was a priority (reuters.com). By Q4 2024, she reported ad prices rising 14% on average, partly due to improved ad performance, even as impressions grew 6% (fifthperson.com). That indicates Meta was successfully monetizing existing engagement better, a point she tied to investments in AI and infrastructure. Investor takeaway: Meta’s finance chief is effectively saying “we know engagement isn’t growing like before, but we are making more money per unit of engagement,” and that’s how revenue keeps rising. This clear messaging has kept Wall Street on board with Meta’s strategy.
• Analysts on advertiser demand and Meta’s position: Industry analysts and observers on earnings calls have provided context that supports why Meta’s ad revenue can grow amid engagement questions. For example, one eMarketer principal analyst noted in December 2024, “As other social platforms flood their services with more ad placements, Meta is focused on making its ads more efficient, primarily through AI.” (emarketer.com) Reels was cited as a major driver of Instagram’s growth in that comment. This suggests that outsiders see Meta taking a quality-over-quantity approach with ads, which tends to win long-term advertiser dollars. Another analyst remarked that Meta’s huge user base makes it uniquely resilient: during a period when marketers were wary of the economy, Meta’s scale and reliability meant it “rode strong advertising performance” to beat revenue expectations (creativestrategies.com). Advertisers consolidating spend to the biggest platforms played to Meta’s advantage. Additionally, Reuters reported an analyst observation that Meta’s “proven advertising reliability means it stands to gain” even when companies tighten ad budgets elsewhere (reuters.com). These external commentaries echo the themes that Meta’s executives push: namely, that Meta remains a must-buy for advertisers, and its focus on AI-driven efficiency is yielding tangible revenue benefits.
• Meta’s guidance and engagement commentary: In investor communications, Meta’s management has also addressed the engagement concern directly at times, aiming to set expectations. Zuckerberg has stated that for newer initiatives (like the AI chatbot features or potentially Threads), Meta will prioritize user growth and engagement “for the next year” before ramping up monetization (reuters.com). This kind of statement is meant to reassure investors that the company is mindful of not squeezing the golden goose too soon. It also implies Meta believes it can increase engagement through product improvements given some time, and that monetization will follow naturally. Such commentary is essentially saying: short-term, we focus on keeping users interested; long-term, we’ll make money off those users. This patient approach was validated by past successes (e.g., Instagram Stories was unmonetized at launch, then became a major ad vehicle). Investors, hearing this, can tolerate flat engagement in the immediate term if they trust Meta’s plan to reignite engagement. And indeed, by late 2024, Meta was reporting some turnaround in engagement: new AI recommendations and video features were boosting time spent on Instagram and Facebook again (creativestrategies.com). That narrative, “we fixed the feed to show more Reels and it’s increasing time spent,” has been emphasized to alleviate concern that TikTok had permanently stolen growth. In essence, Meta uses its earnings calls not just to report numbers but to convince stakeholders that engagement is under control or will be won back, thanks to strategic product moves, and that meanwhile the ad business is thriving.

In conclusion, executive and investor commentary underline a coherent picture: Meta acknowledges engagement challenges but presents a strategy (largely AI-powered and format innovation) to overcome them, and consistently highlights the ongoing strength of the ad business. This transparency and strategy have kept investors optimistic. Meta’s stock performance rebounded in 2024 as the company demonstrated it could navigate Apple’s privacy changes, cut costs, and still grow revenue double-digits with AI and Reels at the forefront (creativestrategies.com, fifthperson.com). The confidence from the C-suite, backed by real financial results, illustrates why Meta’s advertising revenue not only continues to grow in 2024–25 but accelerates, seemingly defying the headwinds of user engagement declines on its core social platforms.

Conclusion: Meta’s ability to grow advertising revenue in 2024–2025, despite flattening engagement on Facebook/Instagram, boils down to strategic execution on multiple fronts. Advanced AI and machine learning are extracting more value from each user and each ad (better targeting, higher conversion rates), which keeps advertisers spending. Advertisers themselves have shifted toward a performance-driven mindset and have shown loyalty to Meta, adapting their campaigns across Facebook and Instagram’s evolving formats rather than cutting budgets. Meanwhile, new engagement surfaces like Reels and Stories have given users fresh ways to spend time on Meta’s apps, and Meta has monetized these vigorously to offset any decline in legacy feeds. The company’s expansive global user base ensures that growth continues in regions where engagement is still climbing, balancing out saturation in the U.S. Ultimately, advertisers appear to be “looking past” engagement dips, trusting Meta’s massive reach and AI-enhanced ad platform to deliver results. Meta’s leadership has reinforced this trust by openly addressing challenges and investing heavily in solutions (AI, product innovation). All of these factors combine into a robust strategic story: Meta has effectively decoupled revenue growth from the need for ever-increasing user engagement by improving the quality of ad delivery and expanding what “engagement” means (to new formats and markets). This strategy has so far proven successful, as evidenced by Meta’s strong 2024 financials, and provides a template for how the company can continue to thrive even in a world where the Facebook/Instagram of old are no longer the shiny new thing.

• Meta and platform names

    Meta’s Strengths (Why It Could Win)

    • Top-tier AI talent (FAIR lab, Yann LeCun, Llama models)
    • Open-source strategy (LLaMA family)—widely adopted, shaping the open-source ecosystem
    • Massive infrastructure (tens of thousands of GPUs, custom silicon plans)
    • Ownership of distribution (Facebook, Instagram, WhatsApp) gives it a natural edge for deploying AI at scale

    🚧 Meta’s Challenges

    • No dominant cloud platform (unlike Azure, GCP, AWS)
    • No enterprise software footprint (unlike Microsoft or Salesforce)
    • Monetization uncertainty for open-source AI (vs. closed-source subscription models)
    • Intense competition from Google (Gemini), OpenAI, Anthropic, etc.

    🔮 Verdict:

    Meta is well-positioned to win in open-source AI influence and consumer-facing AI integration, but it’s less likely to dominate enterprise AI or cloud AI infrastructure, where Microsoft, AWS, and Google are ahead.

    Google and Amazon will benefit from generative ads, but I expect the effect will be the most powerful at the top of the funnel where Meta’s advertising operates, as opposed to the bottom-of-the-funnel search ads where Amazon and Google make most of their money.

    Moreover, there is that long tail I mentioned above: one of the challenges for Meta in moving from text (Feed) to images (Stories) to video (Reels) is that effective creative becomes more difficult to execute, especially if you want multiple variations.

    Meta has devoted a lot of resources over the years to tooling to help advertisers make effective ads, much of which will be obviated by generative AI. This, by extension, will give long tail advertisers more access to more inventory, which will increase demand and ultimately increase prices.

There is one more channel that is exclusive to Meta: click-to-message ads. These are ads where the conversion event is initiating a chat with an advertiser, an e-commerce channel that is particularly popular in Asia. The distinguishing factor in the markets where these ads are taking off is low labor costs, as Zuckerberg explained on an earnings call:

And then the one that I think is going to have the fastest direct business loop is going to be around helping people interact with businesses. You can imagine a world on this where over time, every business has an AI agent that basically people can message and interact with. And it’s going to take some time to get there, right? I mean, this is going to be a long road to build that out. But I think that, that’s going to improve a lot of the interactions that people have with businesses as well as if that does work, it should alleviate one of the biggest issues that we’re currently having around messaging monetization is that in order for a person to interact with a business, it’s quite human labor-intensive for a person to be on the other side of that interaction, which is one of the reasons why we’ve seen this take off in some countries where the cost of labor is relatively low. But you can imagine in a world where every business has an AI agent, that we can see the kind of success that we’re seeing in Thailand or Vietnam with business messaging could kind of spread everywhere. And I think that’s quite exciting.

    Both of these use cases — generative ads and click-to-message AI agents — are great examples as to why it makes sense for Meta to invest in its Llama models and make them open(ish): more and better AI means more and better creative and more and better agents, all of which can be monetized via advertising.
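
To make the second use case concrete, here is a toy sketch of a business-messaging auto-responder. Everything in it (the FAQ dict, handle_message) is hypothetical, and a production agent would call an LLM plus the business’s catalog rather than matching keywords; the point is only to show the labor substitution being described.

```python
# Toy stand-in for the "AI agent that people can message" idea above.
# All identifiers are hypothetical; a real agent would use an LLM.
FAQ = {
    "hours": "We're open 9am-6pm, Monday through Saturday.",
    "shipping": "Orders ship within 2 business days.",
}

def handle_message(text: str) -> str:
    """Reply to a customer chat without a human on the other side."""
    for keyword, reply in FAQ.items():
        if keyword in text.lower():
            return reply
    return "Thanks for reaching out! We'll get back to you shortly."

print(handle_message("What are your hours?"))  # -> the stored hours reply
```

The business point in the quote is exactly this substitution: once software can answer the chat, click-to-message ads stop being gated on cheap labor.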


    1. General Tech & Business Trends

    • Stratechery (Ben Thompson) (stratechery.com)
      • Focus: Deep dives on tech strategy (Apple, Google, Amazon, AI, cloud).
      • Why Follow?: One of the best for understanding tech business models.
    • The Information (theinformation.com)
      • Focus: Premium tech/business reporting with scoops on startups and Big Tech.
      • Why Follow?: High-quality investigative journalism (paywall).
    • Protocol (now part of Politico) (politico.com/protocol)
      • Focus: Enterprise tech, cloud, and policy impacts on tech.
      • Why Follow?: Good for regulatory and infrastructure trends.
    • Axios Pro Rata & Axios Tech (axios.com)
      • Focus: Quick, sharp insights on tech deals, startups, and policy.
      • Why Follow?: Great for daily tech business briefings.

    2. Semiconductors, AI & Hardware

    • SemiAnalysis (semianalysis.com)
      • Focus: Deep semiconductor, AI chip, and supply chain analysis.
      • Why Follow?: One of the best for advanced chip tech (subscription-based).
    • AnandTech (anandtech.com)
      • Focus: CPU/GPU deep dives, data center hardware.
      • Why Follow?: Technical benchmarks and architecture breakdowns.
    • EE Times (eetimes.com)
      • Focus: Semiconductor and electronics engineering trends.
      • Why Follow?: Good for hardware innovation insights.

    3. Cloud, Enterprise & AI

    • Enterprise AI (enterpriseai.news)
      • Focus: AI adoption in enterprises, ML infrastructure.
      • Why Follow?: Niche coverage of AI deployments.
    • Data Center Knowledge (datacenterknowledge.com)
      • Focus: Hyperscale cloud, data center trends.
      • Why Follow?: Key for infrastructure investors.
    • AI Business (aibusiness.com)
      • Focus: AI applications in industry.
      • Why Follow?: Practical AI adoption case studies.

    4. Networking & Connectivity

    • Light Reading (lightreading.com)
      • Focus: Telecom, 5G, optical networking.
      • Why Follow?: Critical for networking industry trends.
    • Fierce Telecom (fiercetelecom.com)
      • Focus: Broadband, wireless, and ISP strategies.
      • Why Follow?: Good for regulatory shifts.

    5. Investor-Focused Tech Research

    • Above Avalon (Neil Cybart) (aboveavalon.com)
      • Focus: Apple ecosystem and consumer tech.
      • Why Follow?: Unique Apple-focused investment insights (paid).
    • Benedict Evans (benedictevans.com)
      • Focus: Big-picture tech trends (AI, mobile, autos).
      • Why Follow?: Great for long-term thematic investing.
    • Matthew Ball’s Essays (matthewball.vc)
      • Focus: Metaverse, gaming, media disruption.
      • Why Follow?: Deep analytical essays on future tech.

    6. News Aggregators & Real-Time Trends

    • Hacker News (news.ycombinator.com)
      • Focus: Crowdsourced tech/startup news.
      • Why Follow?: Early signals on tech shifts.
    • Techmeme (techmeme.com)
      • Focus: Tech news aggregator with smart curation.
      • Why Follow?: Best for daily headline scanning.

    Final Thoughts

    • For investors: Prioritize SemiAnalysis, Stratechery, The Information.
• For engineers/developers: AnandTech, EE Times.
• For cloud/AI trends: Enterprise AI, Protocol.

    📰 Tech News Websites

    These are the most up-to-date sources for breaking news and industry developments:

    1. TechCrunch – Startups, funding, big tech moves
      https://techcrunch.com
    2. The Verge – Tech culture, product reviews, gadgets
      https://www.theverge.com
    3. Wired – Broader tech and science stories
      https://www.wired.com
    4. Ars Technica – Deep dives into hardware, software, policy
      https://arstechnica.com

    📬 Newsletters for Trends and Analysis

    Concise, often curated by experts:

    1. Benedict Evans – Weekly analysis of tech and strategy
      https://www.ben-evans.com/newsletter
    2. Stratechery by Ben Thompson – Sharp business/strategy insights
      https://stratechery.com
    3. The Pragmatic Engineer – Deep engineering, tech org insights
      https://newsletter.pragmaticengineer.com
    4. TLDR Newsletter – Daily brief of key tech/startup news
      https://www.tldrnewsletter.com

    📈 Trend Analysis & Market Insights

    Great for identifying broader movements and long-term shifts:

    1. CB Insights – Market maps, tech trend reports
      https://www.cbinsights.com/research
    2. Gartner & Forrester – Enterprise-level tech forecasting
    3. PitchBook / Crunchbase – Startup activity and funding trends
      https://www.crunchbase.com

    🌐 Community & Crowd-Sourced Trendspotting

    Follow what developers, entrepreneurs, and early adopters are saying:

    1. Hacker News (Y Combinator) – Tech and startup discussions
      https://news.ycombinator.com
    2. Reddit – Subreddits like r/technology, r/Futurology, r/MachineLearning
      https://www.reddit.com
    3. Product Hunt – New product launches, early trend signals
      https://www.producthunt.com

    📊 Tools to Watch What’s Gaining Traction

    Helpful for identifying rising technologies or companies:

    1. Google Trends – Track keyword popularity over time
      https://trends.google.com
    2. Exploding Topics – Curates fast-growing topics across tech/business
      https://explodingtopics.com

| Aspect | Meta (LLaMA) | Microsoft / OpenAI | Google (Gemini) |
|---|---|---|---|
| AI Research Talent | Top-tier (FAIR, LeCun) | Strong (OpenAI + MSR) | Strong (DeepMind + Brain) |
| Model Strength | LLaMA 3, strong open models | GPT-4, ChatGPT dominance | Gemini 1.5+, competitive |
| Open vs Closed | Open-source | Mostly closed | Mostly closed |
| Cloud Infrastructure | No public cloud | Azure + OpenAI APIs | GCP + TPUs |
| Enterprise Software Reach | Weak | Very strong (Office, Copilot, Dynamics) | Moderate (Workspace, Cloud AI tools) |
| Consumer Integration | Strong (Instagram, FB, WhatsApp) | Moderate | Very strong (Search, YouTube, Android) |
| Hardware Investment | Building custom silicon, heavy GPU spend | OpenAI clusters on Azure; growing hardware partnerships | TPUs, advanced infra |
| Revenue Model Maturity | Emerging | Mature APIs + SaaS + Office bundles | Advanced but fragmented |
| Dev Ecosystem Influence | Growing fast via LLaMA | Strong (Copilot, GitHub, Azure AI Studio) | Strong (TensorFlow, Android, Colab) |
  • NVIDIA – All you need to know

Increasing compute for training makes models smarter, and increasing compute for long thinking makes the answer smarter.

This statement captures a widely recognized principle in AI development: both the training and inference phases of AI models benefit from increased computational resources, albeit in different ways.

    🧠 Training Compute: Building Smarter Models

    During the training phase, AI models learn from vast datasets, adjusting their internal parameters to capture patterns and knowledge.

    Increasing computational power during this phase allows for:

    • Larger Models: More parameters can be trained, enabling the model to capture more complex patterns.
    • Extended Training: Models can be trained over more data and for more epochs, improving generalization.
    • Enhanced Performance: Empirical scaling laws demonstrate that model performance improves predictably with increased compute, data, and model size.

    This relationship is detailed in studies on neural scaling laws, which show how performance metrics like loss decrease as compute resources increase during training.
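
As a concrete illustration, the Chinchilla-style scaling law models pre-training loss as L(N, D) = E + A/N^α + B/D^β, where N is the parameter count and D the number of training tokens. A minimal sketch follows, with coefficients of the order reported by Hoffmann et al. (2022); treat the exact values as an assumption, not authoritative:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Approximate pre-training loss L(N, D) = E + A/N^alpha + B/D^beta.

    Coefficients are roughly those fit by Hoffmann et al. (2022);
    they are illustrative only.
    """
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# More training compute lets you scale N and D together, and predicted loss falls:
for n, d in [(1e9, 20e9), (10e9, 200e9), (100e9, 2e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> loss ~ {chinchilla_loss(n, d):.2f}")
```

The monotone decrease in predicted loss as N and D grow is the quantitative content behind “more training compute makes models smarter.”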

    🧠 Inference Compute: Enhancing Answer Quality

    Inference refers to the model’s application phase, where it generates outputs based on new inputs. Allocating more compute during inference can lead to:

    • Longer Context Handling: Processing longer input sequences for more coherent and contextually relevant outputs.
    • Improved Reasoning: Allowing the model to perform more complex computations, leading to better problem-solving capabilities.
• Dynamic Computation: Techniques like test-time compute scaling enable models to allocate resources adaptively based on the complexity of the task.

    Research indicates that increasing compute during inference can enhance model performance, especially in tasks requiring complex reasoning.
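
One common form of test-time compute scaling is self-consistency: sample several candidate answers and return the majority vote. Here is a minimal sketch with a simulated model; model_answer is a hypothetical stand-in for a stochastic LLM call, not a real API:

```python
import random
from collections import Counter

def model_answer(question: str) -> str:
    # Hypothetical stochastic model: returns "A" (the correct answer) 60% of the time.
    return random.choices(["A", "B", "C"], weights=[0.6, 0.25, 0.15])[0]

def majority_vote(question: str, n_samples: int) -> str:
    """Spend more inference compute by sampling n answers and voting."""
    votes = Counter(model_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples (more inference compute) -> the majority answer is right more often.
for n in (1, 5, 25):
    accuracy = sum(majority_vote("q", n) == "A" for _ in range(2000)) / 2000
    print(f"{n:2d} samples -> accuracy ~ {accuracy:.0%}")
```

The simulation shows the core idea: a model that is right only 60% of the time per sample becomes far more reliable when extra inference compute is spent aggregating samples.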

    🔄 Summary

    In essence, increasing compute during training equips AI models with greater capabilities, while allocating more compute during inference allows them to utilize these capabilities more effectively to produce smarter answers.

    NVIDIA

    Founded in 1993, NVIDIA is the world leader in accelerated computing.

    Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined graphics, revolutionized accelerated computing, ignited the era of modern AI, and is fuelling industrial digitalization across markets.

    NVIDIA is now a full-stack computing infrastructure company with data-center-scale offerings that are reshaping industry.

    AI is advancing at light speed as agentic AI and physical AI set the stage for the next wave of AI to revolutionize the largest industries.

    Founder and CEO: Jensen Huang (1993)

• Companies and countries around the world are building NVIDIA-powered AI factories to process, refine, and manufacture intelligence from data
• CUDA offers developers a powerful toolkit with over 400 libraries, 600 AI models, and numerous software development kits
• Blackwell is one of the most important products in our history, boasting technologies that power AI training and real-time large language model inference for models scaling up to 10 trillion parameters.

    1. Core Revenue Segments

    Data Center

    • Products: GPUs for AI, deep learning, and data analytics (e.g., H100, A100).
    • Customers: Cloud service providers (AWS, Azure, Google Cloud), research institutions, enterprises.
    • Growth Driver: AI adoption, especially large language models (LLMs), generative AI, and high-performance computing (HPC).

    Gaming

    • Products: GeForce GPUs, gaming laptops, and accessories.
    • Customers: Consumers, PC gamers, and OEMs.
    • Revenue Model: GPU sales, gaming platform services (like GeForce NOW), and software (e.g., DLSS, Reflex).

    Professional Visualization

    • Products: GPUs for designers, engineers, and creatives (e.g., RTX A6000).
    • Use Cases: 3D modelling, video production, CAD software acceleration.
    • Customers: Enterprises in media, architecture, and design.

    Automotive

    • Products: NVIDIA DRIVE (hardware + software stack for autonomous vehicles).
    • Customers: Automotive OEMs and suppliers (e.g., Mercedes-Benz, BYD).
    • Emerging Revenue: AI cockpit systems and autonomous driving platforms.

    OEM & Other

    • Products: Entry-level GPUs, embedded systems, and other licensing.
    • Role: Legacy and niche markets; less significant than core areas.

    2. Platform and Software Focus

    NVIDIA is increasingly becoming a platform company, not just a chipmaker:

    • CUDA: Proprietary parallel computing platform used widely in AI and scientific computing.
    • NVIDIA AI Enterprise: Software suite for training, deploying, and managing AI workflows.
    • Omniverse: A 3D collaboration and simulation platform for industrial digital twins.
    • DGX Systems: Turnkey AI supercomputers combining hardware + software stack.

    3. Business Model Mechanics

    • Fabless Design: NVIDIA designs its chips but outsources manufacturing to foundries like TSMC.
    • Hardware + Software Integration: Creates higher margins and customer lock-in.
    • Ecosystem Development: Encourages developers to build on CUDA and Omniverse, boosting long-term platform adoption.
    • Subscription & Licensing: Software (AI Enterprise, Omniverse) is shifting toward recurring revenue.

    🔧 1. Hardware Products

    A. GPUs (Graphics Processing Units)

    • Data Center / AI:
      • H100 (Hopper) – Flagship GPU for AI and large language models.
      • A100 (Ampere) – Widely used in AI training and inference.
      • L40 / L40S – High-performance GPUs for AI and visualization workloads.
      • Grace Hopper Superchip – CPU-GPU combo optimized for AI and HPC.
    • Gaming:
      • GeForce RTX Series (e.g., RTX 4090, RTX 4080) – Consumer GPUs for gaming and content creation.
      • GeForce GTX Series – Older generation gaming GPUs.
    • Professional Visualization:
      • RTX A6000 – Workstation GPU for design, engineering, and content creation.
      • Quadro Series – High-end GPUs for professionals (branding phased into RTX).

    B. Full Systems

    • DGX Systems – AI supercomputers combining NVIDIA GPUs with software.
    • HGX Platforms – Modular server platforms used by hyperscalers.
    • OVX Systems – Built for digital twins and industrial simulations using Omniverse.

    🧠 2. Software & Platforms

    A. AI & Machine Learning

• CUDA – Parallel computing platform and API (a minimal kernel sketch follows this list).
    • cuDNN, TensorRT – Deep learning libraries optimized for NVIDIA hardware.
    • NVIDIA AI Enterprise – Full software suite for AI development and deployment.
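
As a rough illustration of the CUDA programming model referenced above, here is a minimal vector-add kernel written with Numba’s Python CUDA bindings rather than NVIDIA’s C++ toolkit; a sketch for intuition, not official sample code:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    # Each GPU thread computes exactly one element of the output.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

# Launch enough 256-thread blocks to cover all n elements.
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # requires a CUDA-capable GPU

assert np.allclose(out, a + b)
```

The pattern, thousands of lightweight threads each handling one element, is the essence of the accelerated-computing model that libraries like cuDNN and TensorRT build on.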

    B. Visualization & Simulation

    • Omniverse – Platform for real-time 3D collaboration and digital twins.
    • RTX Remix – AI tools for remastering old video games.

    C. Cloud & Edge

    • NVIDIA AI Foundry Services – For building custom generative AI models.
    • NIM (NVIDIA Inference Microservices) – Prebuilt microservices for model deployment.

    🚗 3. Automotive & Embedded

    • NVIDIA DRIVE – Hardware + software platform for autonomous driving:
      • DRIVE Thor / DRIVE Orin – SoCs for autonomous vehicle compute.
      • DRIVE Hyperion – Reference architecture for self-driving cars.
    • Jetson – AI edge computing modules for robotics, drones, and IoT:
      • Jetson Orin Nano / Xavier / AGX Orin

    📡 4. Networking

    • NVIDIA Mellanox (acquired 2020):
      • InfiniBand and Ethernet switches/adapters for high-speed data center networking.
      • BlueField DPUs (Data Processing Units) – Smart NICs for offloading tasks from CPUs in data centers.

    Blackwell is NVIDIA’s next-generation GPU architecture announced in 2024 and is crucial to its future roadmap, especially for AI and data centers. Blackwell is the successor to the Hopper architecture (used in the H100) and underpins NVIDIA’s most advanced GPUs for AI workloads.

    🔧 Major Blackwell Products:

    🔹 B100

    • Direct successor to the H100.
    • Built for training and inference of large language models (LLMs).
    • Includes major improvements in energy efficiency, FP8 performance, and multi-GPU scaling.

    🔹 GB200 (Grace Blackwell Superchip)

    • Combines a Blackwell GPU with a Grace CPU (ARM-based).
    • Designed to address the memory bottleneck in training massive models.
    • Target: cloud AI infrastructure, hyperscalers, LLMs like GPT and Claude.
| Architecture | Product | Status | Key Use Case |
|---|---|---|---|
| Hopper | H100 | Widely deployed | Training/inference of LLMs, HPC |
| Hopper | GH200 (Grace + Hopper) | Deployed | Large-scale AI, supercomputing |
| Blackwell | B100 | Launching (2024–25) | Next-gen AI training, more efficient |
| Blackwell | GB200 | Announced | Combines Grace CPU + Blackwell GPU |

Why Blackwell Matters:

    • 2× performance vs H100 (esp. for LLMs)
    • Up to 30x energy efficiency improvements
    • Designed for scaling across GPU clusters
    • Already pre-ordered by AWS, Microsoft, Meta, Google Cloud

What Investors Ask Most About NVIDIA?

    Growth Sustainability & AI Demand

    • Can NVIDIA maintain its rapid growth?
    • Is the AI boom already priced into the stock?

    Geopolitical Risks & China Exposure

    • How will U.S.-China tensions impact NVIDIA?
    • What is the threat from Chinese competitors?

    Revenue Concentration & Customer Dependence

    • Is revenue too concentrated among a few customers?

    Regulatory Scrutiny & Legal Challenges

    • What are the implications of antitrust investigations?
    • How is NVIDIA addressing past legal issues?

    Financial Management & Capital Allocation

    • How is NVIDIA managing its capital expenditures?
    • What is the impact of the recent stock split?

    Product Innovation & Competitive Edge

    • How will new architectures like Blackwell and Rubin shape NVIDIA’s future?
    • Is NVIDIA’s software ecosystem a sustainable moat?

    NVIDIA Architecture Roadmap (Past → Future)

    Architecture | Codename | Launch Year | Key Products | Focus Areas
    Volta | V100 | 2017 | Tesla V100 | AI training, HPC
    Ampere | A100 | 2020 | A100 | AI training/inference, HPC
    Hopper | H100 | 2022 | H100, H200 | LLMs, AI training, HPC
    Blackwell | B100 | 2024 | B100, B200 | Generative AI, LLMs
    Rubin | R100 | Expected 2026 | R100, Rubin Ultra | Next-gen AI, HPC
    Feature | Hopper (H100) | Blackwell (B100/B200)
    Transistor Count | ~80 billion | ~208 billion
    AI Performance (FP8) | ~4 PFLOPS | Up to 20 PFLOPS
    Memory | HBM3 | HBM3e
    Interconnect Bandwidth | ~900 GB/s | Up to 10 TB/s
    Energy Efficiency | Baseline | Up to 25x improvement
    Transformer Engine | First-generation | Second-generation
    Decompression Engine | Not present | Included

    The transition from Hopper to Blackwell marks a substantial leap in performance and efficiency, catering to the escalating demands of AI workloads. Rubin is anticipated to succeed Blackwell, introducing significant enhancements in memory bandwidth and compute performance.

    Compute performance: Rubin is anticipated to deliver up to 1.2 ExaFLOPS of FP8 performance, marking a 3.3x improvement over Blackwell.

    🧩 Strategic Positioning and Market Impact

    NVIDIA’s architectural advancements are not just technological feats but strategic moves to maintain and extend its market leadership:

    • Annual Release Cadence: Accelerating innovation cycles to meet the rapid evolution of AI workloads.
    • Ecosystem Integration: Tight coupling of GPUs with CPUs (e.g., Grace and Vera) and networking solutions (e.g., NVLink, CX9) to offer comprehensive platforms.
    • Market Leadership: Continued dominance in AI and HPC sectors, with widespread adoption across major cloud providers and enterprises.

    These strategies ensure NVIDIA remains at the forefront of AI and HPC innovation.

    What We Know About Rubin (as of 2025)

    • Rubin is confirmed in NVIDIA’s long-term roadmap.
    • Likely to be paired with Vera, a new Grace CPU successor.
    • Will further optimize AI model training and inference, possibly for post-LLM AI models (e.g., multi-modal systems, agentic AI).
    • May push NVIDIA further into custom silicon AI clusters for hyperscalers.

    🛠️ Expected Features (Speculative):

    • Better performance-per-watt than Blackwell
    • More efficient multi-GPU communication
    • Possibly new memory technologies (e.g., CXL, HBM4)

    🧠 NVIDIA’s AI Architecture Strategy:

    1. Blackwell (2024–2025): Dominating AI compute now (H100 → B100).
    2. Rubin (2026+): Ensuring leadership as model and compute demands scale further.
    3. Yearly cadence: Following a tick-tock cycle: architecture → refinement → new architecture

    2. Gaming GPUs (GeForce)

    Architecture | Product Series | Use Case
    Ada Lovelace | RTX 40 Series (4090, 4080, etc.) | High-end gaming, ray tracing
    Ampere | RTX 30 Series | Mid-tier / legacy gaming

    3. Professional & Visualization

    Product | Use Case
    RTX A6000 / A5000 | 3D rendering, design, CAD
    Omniverse | Industrial simulation, digital twins

    NVIDIA Data Center growth drivers

    • Broad data center platform transition from general-purpose to accelerated computing – The $1T installed base of general-purpose CPU data center infrastructure is being modernized to a new GPU-accelerated computing paradigm
    • Emergence of AI factories—optimized for refining data and training, inferencing, and generating AI
      • The entire computing stack has been reinvented—from CPU to GPU, from coding to machine learning, from software to generative AI.
      • Computers generate intelligence tokens, a new commodity.
      • A new type of data center, AI factories, is expanding the data center footprint to $2T and beyond in the coming years.
      • Eventually, companies in every industry will operate AI factories as the digital twin of their workforce, manufacturing plants, and products. A new industrial revolution has begun
    • Broader and faster product launch cadence

    Nvidia’s data center revenue is $101.5 billion, or 77.8% of its total revenue.

    In the same period, global capital expenditures (capex) on data centers by major technology companies—including Microsoft, Amazon, Google, and Meta—are projected to exceed $300 billion.

    Based on these figures, Nvidia’s data center revenue represents roughly 33.8% of the total data center capex by these leading tech firms.

    This significant proportion underscores Nvidia’s pivotal role in supplying the hardware—particularly GPUs—that power the infrastructure of modern data centers, especially those dedicated to artificial intelligence and high-performance computing.

    It’s important to note that this percentage is a broad estimate. Not all of Nvidia’s data center revenue comes exclusively from these companies, and not all data center capex is allocated to Nvidia products. Nevertheless, the figure highlights the substantial impact Nvidia has on the data center industry.


    NVIDIA Gaming growth drivers

    • Rising adoption of NVIDIA RTX in games
    • Expanding universe of gamers and creators
    • Gaming laptops and generative AI on PCs
    • GeForce NOW cloud gaming

    Professional Visualization

    • Generative AI adoption across design and creative industries
    • Enterprise AI development, model fine-tuning, cross-industry
    • Expanding universe of designers and creators
    • Omniverse for digital twins and collaborative 3D design

    Automotive

    • Over 40 customers, including 20 of the top 30 EV makers, 7 of the top 10 truck makers, and 8 of the top 10 robotaxi makers
    • Adoption of centralized car computing and software-defined vehicle architectures


    This chart visualizes the energy efficiency improvements of NVIDIA’s GPU architectures over time, specifically in terms of Joules per Token (J/Token). It shows how much energy is required to process one token — a unit commonly used in AI and NLP tasks like those in large language models (LLMs).

    Breakdown by Architecture (a quick energy-cost sketch follows the list):

    • Kepler (2014):
      🔋 42,000 J/Token – Extremely high energy consumption.
    • Pascal:
      🔋 17,640 J/Token – Major improvement, but still energy-intensive.
    • Volta:
      🔋 1,200 J/Token – A significant leap in efficiency.
    • Ampere:
      🔋 150 J/Token – More than 8x improvement from Volta.
    • Hopper:
      🔋 10 J/Token – Efficiency increases dramatically.
    • Blackwell (2024):
      🔋 0.4 J/Token – State-of-the-art, about 25x better than Hopper.
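
    To make these J/Token figures concrete, here is a minimal sketch that converts energy-per-token into the electricity cost of generating one million tokens and into tokens-per-second per megawatt (the J/Token values are read off the chart above; the $0.10/kWh electricity price is purely an illustrative assumption):

    ```python
    # Rough energy math from the J/Token chart. The J/Token values are read off
    # the chart; the $0.10/kWh electricity price is an illustrative assumption.
    J_PER_KWH = 3.6e6  # joules in one kilowatt-hour

    joules_per_token = {
        "Kepler": 42_000, "Pascal": 17_640, "Volta": 1_200,
        "Ampere": 150, "Hopper": 10, "Blackwell": 0.4,
    }

    PRICE_PER_KWH = 0.10   # USD, assumed
    TOKENS = 1_000_000     # size of the workload we cost out

    for arch, jpt in joules_per_token.items():
        kwh = jpt * TOKENS / J_PER_KWH       # energy to generate 1M tokens
        cost = kwh * PRICE_PER_KWH           # electricity cost for those tokens
        tps_per_mw = 1_000_000 / jpt         # tokens/s sustained by 1 MW of power
        print(f"{arch:<10} {kwh:>12,.2f} kWh  ${cost:>10,.2f}  {tps_per_mw:>12,.0f} TPS/MW")
    ```

    On these assumptions, a megawatt of Blackwell-class compute sustains about 2.5 million tokens per second versus roughly 24 on Kepler, which is essentially the “AI factory” economics argument in a single number.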

    This chart titled “Blackwell Giant Leap in Inference Performance” visually compares the inference performance of different NVIDIA systems across two axes:

    • Y-axis (Vertical): Throughput, measured in Transactions Per Second (TPS) per Megawatt (MW). Higher is better — it means more inference operations are completed per unit of energy.
    • X-axis (Horizontal): TPS for a single user — essentially a proxy for latency or responsiveness. Further right is better, because it means faster response times for individual queries.

    Key Components and What the Chart Shows

    Systems Compared

    Several NVIDIA configurations are plotted:

    • Hopper Dynamo
    • Blackwell NVL8 (FP8 and FP4 precision)
    • Blackwell NVL72 (FP4)
    • Blackwell Dynamo NVL72 (FP4)

    Each curve represents a specific hardware setup and inference precision mode:

    • FP8 (Floating Point 8-bit): Prioritizes energy efficiency and throughput.
    • FP4 (Floating Point 4-bit): Offers higher responsiveness and memory efficiency for ultra-large models.

    Interpretation of Curves

    Each curve maps throughput vs. responsiveness:

    Hopper Dynamo

    • Located in the bottom-left corner.
    • Lower TPS and lower throughput.
    • Represents the previous generation — serves as a baseline for comparison.

    Blackwell NVL8 (FP8)

    • Sharp increase in throughput compared to Hopper.
    • Less responsive per user than FP4, but very energy-efficient.
    • Ideal for high-throughput, latency-tolerant workloads like batch inference.

    Blackwell NVL8 (FP4)

    • Trades a bit of throughput for better per-user responsiveness.
    • Shows the flexibility of the NVL8 platform with different precision modes.

    Blackwell NVL72 (FP4)

    • Much farther along the x-axis (smart, responsive AI) and still maintains high throughput.
    • Indicates that this configuration supports larger models or more demanding inference tasks with better user experience.

    Blackwell Dynamo NVL72 (FP4)

    • Furthest on the x-axis (TPS for 1 user) and maintains strong throughput.
    • Represents the pinnacle of performance among all plotted systems.
    • Best suited for real-time, large-scale inference — such as for AI agents, chatbots, and LLMs with many concurrent users.

    Why This Matters

    For AI deployment, especially with large language models or multimodal AI:

    • Low latency (right side of x-axis) is crucial for interactive applications.
    • High throughput per megawatt is essential for scaling inference economically and sustainably.

    This graph illustrates how NVIDIA’s Blackwell platform advances both fronts — enabling AI systems that are not just faster, but also greener and more responsive.


    The chart “Blackwell 25X Hopper” gives a deeper and more technical breakdown of the inference performance gains achieved by NVIDIA’s Blackwell architecture over the previous generation, Hopper. Here’s a detailed analysis:

    🧠 What the Chart Represents

    Axes:

    • Y-Axis (Vertical): Throughput measured in Transactions Per Second (TPS) per Megawatt (MW). Higher is better — more output per unit of energy.
    • X-Axis (Horizontal): TPS for a single user — an indicator of response time or latency. Further right means faster and more responsive AI.

    🔄 Compared Technologies

    📦 Hopper FP8 / NVL8 / Dynamo (Left shaded area)

    • Previous generation baseline.
    • Constrained in both energy efficiency and latency.
    • Systems cluster in the lower-left quadrant of the chart — low throughput and slower response times.

    🚀 Blackwell FP4 / NVL72 / Dynamo (Right gold curve)

    • New generation of inference architecture.
    • Achieves 25x performance per watt over Hopper in this configuration.
    • Dominates in both throughput and user responsiveness.

    🧪 Breakdown of Conditions and Color Coding

    Each point on the Blackwell performance curve includes a label describing the configuration used:

    🟠 Orange: EP8 / EP32 configurations

    • High batch sizes (e.g. 3072, 1792).
    • “Disagg Off” most likely means disaggregated serving is turned off: prefill and decode share the same GPUs, keeping memory tightly integrated (low latency).
    • These points dominate throughput (top-left of curve).

    Orange Labels (Top Left)

    Examples:

    • EP8, Batch 3072, Disagg Off
    • EP32, Batch 1792, Disagg Off

    These setups are:

    • Using larger batches → good for processing lots of inputs at once (e.g., summarizing 1,000 emails).
    • Disagg Off = Using local memory for faster speed.
    • These are super energy-efficient but slow to respond to a single user.

    🔑 Use Case: Mass AI tasks, not real-time.

    🟢 Green: EP64, Batch 896 to 128, Disagg Off

    • These represent moderate batch sizes with 64-way parallelism (EP64).
    • Shows how throughput scales as the batch size steps down.
    • Still not tuned for individual latency — they optimize for overall energy efficiency.

    Examples:

    • EP64, Batch 896
    • EP64, Batch 256

    These are still doing batch processing, but with smaller groups.

    • Less efficient than orange, but slightly faster.
    • They balance between throughput and response.

    🔑 Use Case: Medium-speed apps — not fully real-time, not fully batch.

    🔵 Cyan/Teal: EP64+EP4, various batch sizes, “Context”, “MTP On”

    • These configurations introduce contextualized inference (e.g., 26% to 1% context).
    • MTP On most likely refers to Multi-Token Prediction, a decoding optimization in which the model proposes several tokens per step to speed up generation.
    • As batch size and context reduce, these points move further right — more optimized for real-time interaction.

    Examples:

    • EP64+EP4, Batch 64, 26% Context, MTP On
    • EP64+EP4, Batch 8, 7% Context, MTP On

    Now we’re entering real-time response territory:

    • These models are designed to reply quickly to user prompts (e.g., chatbot messages).
    • The Context % shows how much memory from previous messages is used. Less context = faster.

    🔑 Use Case: Chatbots, AI agents, personal assistants

    🟣 Purple/Pink: TEP + EP4, ultra-low batch sizes and minimal context

    • TEP16 and TEP8 most likely indicate Tensor-Expert Parallelism (tensor parallelism combined with expert parallelism) at 16- and 8-way degrees.
    • Target the extreme end of real-time AI — like AI agents responding instantly to prompts.
    • These are closest to the bottom-right: maximum responsiveness with reasonable throughput.

    Examples:

    • TEP8+EP4, Batch 4, 1% Context, MTP On
    • TEP16+EP4, Batch 2, 1% Context

    These are the fastest and most responsive AI setups:

    • Tiny batch size = instant reply.
    • Minimal memory (1% context) = lightning fast.
    • TEP (most likely Tensor-Expert Parallelism) is well suited to streaming tokens one-by-one (like how ChatGPT responds).

    🔑 Use Case: Instant AI assistants, token-based LLMs

    Batch size is the number of inputs (or data samples) the AI model processes at the same time; a toy throughput-versus-latency model follows the table below.

    Batch Size | Use Case | Pros | Cons
    Small (1–8) | Real-time apps (chatbots, games) | Fast reply per user | Less GPU efficiency
    Medium (16–128) | Hybrid workloads | Good balance | Some latency
    Large (128+) | Mass/batch processing | High GPU utilization | Way too slow for chat
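
    To see why the tradeoff above exists, here is a toy queuing model (a minimal sketch; the fixed overhead and per-input cost are made-up illustrative numbers, not measurements of any real GPU):

    ```python
    # Toy model: each forward pass pays a fixed overhead (kernel launches, weight
    # loads) plus a small per-input cost. Both numbers are illustrative, not measured.
    FIXED_OVERHEAD_MS = 20.0   # assumed cost per batch
    PER_INPUT_MS = 0.5         # assumed incremental cost per input in the batch

    for batch in (1, 8, 64, 512):
        latency_ms = FIXED_OVERHEAD_MS + PER_INPUT_MS * batch  # time for the whole batch
        throughput = batch / (latency_ms / 1000)               # inputs completed per second
        print(f"batch={batch:>4}  latency={latency_ms:>7.1f} ms  throughput={throughput:>8.1f} req/s")
    ```

    In this toy model, throughput climbs almost 40x from batch 1 to batch 512, but each request then waits more than a quarter of a second, which is exactly why chat-style serving stays in the small-batch regime.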

    Batch Size = Number of Inputs (Requests) – Think of it as how many separate queries or users are processed at the same time.

    Tokens = Size of Each Input – the number of text units per input; tokens are chunks of text, like words or parts of words.

    Batch Size | Tokens per Input | Total Work
    4 | 100 tokens | 400 tokens total
    64 | 50 tokens | 3,200 tokens total
    1 | 2,000 tokens | 2,000 tokens total

    What Does Affect Performance? Performance depends on:

    • ✅ Batch size (number of parallel inputs)
    • ✅ Number of tokens per input (input length)
    • ✅ Model size (e.g., GPT-3, GPT-4)
    • ✅ Precision (FP4, FP8, etc.)
    • ✅ Context window (how much memory the model has to look back)

    Context – a label like “52% Context” means the model is using past conversation or prior tokens (called “context”) to help generate the next response, and that this context takes up 52% of the total token window available.

    Most large language models have a context window, which is the total number of tokens they can “see” at one time. For example:

    • GPT-4 has context windows up to 128k tokens
    • GPT-3.5 might use up to 4k or 16k tokens
    Context % | Behaviour | Use Case
    Low (1–5%) | Fast, efficient | Chatbots, token-by-token response
    Medium (10–30%) | Balanced | AI assistants with memory
    High (50–90%) | Slower, smarter | Research, document Q&A, memory-heavy agents

    Understanding context size is important for:

    • Avoiding repetition
    • Keeping long conversations coherent
    • Knowing when the AI might “forget” earlier parts (which can happen in long documents or multi-turn chats with older models)

    Context & tokens used – How can “context is 1–5% for fast, efficient chatbot-type behavior” be reconciled with a chatbot that appears to retain 100% of the conversation? Both can be true. In a short chat, the model really does use 100% of the context, because the whole conversation fits in the window. In high-performance systems, models may restrict themselves to only 1–5% of recent context to keep response times low and energy use efficient.

    This chart shows how token usage shifts as chat length increases, using a model with a 128,000-token context window (like GPT-4 Turbo):

    • 🔹 Short chats (500–2,000 tokens): Almost all the past messages (context) are retained. There’s plenty of room for new input.
    • 🔹 Medium chats (8,000 tokens): About half the window is used for context; the other half is available for new messages.
    • 🔹 Long chats (32,000–128,000 tokens): The system begins to truncate older context, keeping only the most recent ~10% of tokens for context, so it can handle new input efficiently.

    This balancing act ensures the model stays responsive while maintaining coherence — even in very long conversations.
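
    Here is a minimal sketch of such a truncation policy, assuming a 128k window, a fixed reserve for new input, and the illustrative rule of keeping roughly the most recent 10% of tokens once a chat overflows (real systems use more sophisticated strategies, such as summarizing older turns):

    ```python
    # Sketch of a sliding-window context policy. The 10% retention rule and the
    # input reserve are illustrative assumptions, not any vendor's actual algorithm.
    CONTEXT_WINDOW = 128_000   # total tokens the model can see at once
    INPUT_RESERVE = 4_000      # room kept free for the user's next message

    def visible_context(history_tokens: list[str]) -> list[str]:
        """Return the slice of conversation history the model will actually see."""
        budget = CONTEXT_WINDOW - INPUT_RESERVE
        if len(history_tokens) <= budget:
            return history_tokens                # short chat: 100% retained
        keep = budget // 10                      # long chat: keep only the ~10% most recent
        return history_tokens[-keep:]

    print(len(visible_context(["tok"] * 2_000)))     # 2000   -> everything kept
    print(len(visible_context(["tok"] * 200_000)))   # 12400  -> only the recent tail
    ```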


    A larger context window lets the model “remember” way more of your chat, making it ideal for:

    • Long technical discussions
    • Multi-document analysis
    • Persistent memory in AI agents

    📊 Key Takeaways

    🔹 1. Blackwell Dominates the Entire Spectrum

    • From batch inference (top-left) to real-time LLM interactions (bottom-right).
    • The chart shows a smooth scalability curve, which is critical for data centers running both types of workloads.

    🔹 2. 25X Energy Efficiency Boost

    • The Hopper region is boxed in the bottom-left, showing its limits.
    • Blackwell’s curve extends far beyond that — offering 25 times better performance per watt across various configurations.

    🔹 3. Flexibility Across Use Cases

    • Blackwell isn’t just about raw power — it can be fine-tuned for:
      • Massive batch inference
      • Chatbots/AI assistants
      • Streaming, token-based models (e.g., LLMs)
    • The diverse configurations (EP32, EP64+EP4, TEP16, etc.) allow precise tuning.

    ⚡ Terminology Breakdown

    Term | Meaning
    EP | Most likely Expert Parallelism: sharding a mixture-of-experts model’s experts across GPUs (EP8/EP32/EP64 give the degree)
    TEP | Most likely Tensor-Expert Parallelism: tensor parallelism combined with expert parallelism, tuned for low-latency token generation
    Batch | Number of inputs processed together; larger batch = better throughput
    Disagg Off | Disaggregated serving off: prefill and decode run on the same GPUs, keeping memory local for speed
    MTP On | Most likely Multi-Token Prediction: the model proposes several tokens per step to accelerate decoding
    Context % | Portion of the token window holding prior context, affecting inference depth and latency

    🧠 Why This Matters

    The chart illustrates how Blackwell is not just faster — it’s adaptable:

    • For data centers, it’s a huge energy and cost saver.
    • For AI services, it ensures fast, real-time interaction at scale.
    • For LLMs, it allows tradeoffs between context size, token responsiveness, and energy use.

    This chart looks complex, but once we break it down, it tells a powerful story about how NVIDIA’s Blackwell platform is a massive leap forward for AI model performance.

    Major trends

    Today, two transitions are occurring simultaneously — accelerated computing and generative AI — and together they are transforming the computer industry.

    Accelerated computing

    A full-stack approach: silicon, systems, and software (not just a superfast chip). It requires full-stack innovation, optimizing across every layer of computing:

    • Chip(s) with specialized processors
    • Algorithms in acceleration libraries
    • Domain experts to refactor applications

    Accelerated computing is needed to tackle the most impactful opportunities of our time—like AI, climate simulation, drug discovery, ray tracing, and robotics

    AI Driving a Powerful Investment Cycle and Significant Returns

    • AI Agents will take action to automate tasks at superhuman speed, transforming businesses and freeing workers to focus on other tasks.
    • Copilots based on LLMs will generate documents, answer questions, or summarize missed meetings, emails, and chats—adding hours of productivity per week. Copilots specialized for fields such as software development, legal services, or education can boost productivity by as much as 50%.
    • Social media, search, and e-commerce apps are using deep recommenders to offer more relevant content and ads to their customers, increasing engagement and monetization.
    • Creators can generate stunning, photorealistic images with a single text prompt—compressing workflows that take days or weeks into minutes in industries from advertising to game development.
    • Call center agents augmented with AI chatbots can dramatically increase productivity and customer satisfaction.
    • Drug discovery and financial services are seeing order-of-magnitude workflow acceleration from AI.
    • Manufacturing workflows are reinvented and automated through generative AI and robotics, boosting productivity.
    • Generative AI is trained on large amounts of data to find patterns and relationships, learning the representation of almost anything with structure. It can then be prompted to generate text, images, video, code, or even proteins. The era of generative AI has arrived, unlocking new opportunities for AI across many different applications. For the very first time, computers can augment the human ability to generate information and create.
    • The next AI wave is physical AI—models that can perceive, understand, and interact with the physical world. Physical AI will embody robotic systems—from autonomous vehicles to industrial robots and humanoids, to warehouses and factories

    Three computers and software stacks are required to build physical AI: NVIDIA AI on DGX to train the AI model, NVIDIA Omniverse on OVX to teach, test, and validate the AI model’s skills, and NVIDIA AGX to run the AI software on the robot.

    NVIDIA AGX refers to a family of high-performance computing platforms designed for AI-powered autonomous machines and embedded systems. The AGX lineup includes the Jetson AGX Xavier and Jetson AGX Orin modules, each tailored for specific applications ranging from robotics to autonomous vehicles.

    The four waves of AI:
    • Perception AI – e.g., image classification, speech recognition
    • Generative AI – e.g., text/image generation (like GPT, DALL·E)
    • Agentic AI – decision-making, planning, goal pursuit
    • Physical AI – robotics, embodied agents in the real world

    AI factories are a new form of computing infrastructure. Their purpose is not to store user and company data or run ERP and CRM applications. AI factories are highly optimized systems purpose-built to process raw data, refine it into models, and produce monetizable tokens with great scale and efficiency. In the AI industrial revolution, data is the raw material, tokens are the new commodity, and NVIDIA is the token generator in the AI factory.


    CUDA Libraries

    Unlike CPU general-purpose computing, GPU-accelerated computing requires software and algorithms to be redesigned. Software is not automatically accelerated in the presence of a GPU or accelerator.

    NVIDIA CUDA libraries encapsulate NVIDIA-engineered algorithms that enable applications to be accelerated on NVIDIA’s installed base. They deliver dramatically higher performance—compared to CPU-only alternatives—across application domains, including AI and high-performance computing, and significantly reduce runtime, cost, and energy, while increasing scale.
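
    As a small, hedged illustration of what it means for CUDA libraries to encapsulate accelerated algorithms, the sketch below uses CuPy, a NumPy-compatible Python layer that dispatches to CUDA libraries such as cuBLAS (it assumes an NVIDIA GPU and an installed cupy package; CuPy is one common entry point, not the only one):

    ```python
    import numpy as np
    import cupy as cp  # NumPy-compatible API backed by CUDA libraries (cuBLAS, cuFFT, ...)

    a = np.random.rand(2048, 2048).astype(np.float32)
    b = np.random.rand(2048, 2048).astype(np.float32)

    c_cpu = a @ b                                  # matmul on the CPU

    a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)    # copy operands into GPU memory
    c_gpu = a_gpu @ b_gpu                          # same expression, dispatched to cuBLAS
    cp.cuda.Stream.null.synchronize()              # GPU work is asynchronous; wait for it

    # Same result up to float32 rounding; only the executing hardware changed.
    assert np.allclose(c_cpu, cp.asnumpy(c_gpu), atol=1e-2)
    ```

    The point is the one made above: nothing is accelerated until the code is routed through a CUDA-aware library; the same program written only with NumPy calls would leave the GPU idle.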

    Energy consumption and product evolution


    Amdahl’s Law is a formula in computer science that describes the theoretical maximum speedup achievable by parallelizing a task. It states that the overall speedup is limited by the portion of the task that cannot be parallelized, even if other parts of the task are significantly improved. Essentially, the bottleneck of the task, the part that can’t be parallelized, dictates the overall performance improvement.

    Key Points:

    • Limited Speedup: The law demonstrates that adding more processors or resources will not always lead to a linear increase in performance.
    • Serial Fraction: The portion of a task that cannot be parallelized (often referred to as the serial fraction) is a critical factor in determining the maximum speedup.
    • Maximum Speedup: Even with a large number of processors, the overall speedup will never exceed the inverse of the serial fraction.
    • Parallelizable Fraction: The portion of the task that can be parallelized determines the potential for speed improvement (a worked example follows this list).
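
    A worked example of the law (the 5% serial fraction is an arbitrary illustration):

    ```python
    def amdahl_speedup(serial_fraction: float, processors: int) -> float:
        """Overall speedup when only the parallel part benefits from more processors."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

    s = 0.05  # assume 5% of the task is inherently serial
    for n in (2, 16, 256, 4096):
        print(f"{n:>5} processors -> {amdahl_speedup(s, n):5.2f}x")
    print(f"ceiling = 1/s = {1 / s:.0f}x, no matter how many processors are added")
    ```

    This is why the accelerated-computing section stresses refactoring applications: shrinking the serial fraction often matters more than adding processors.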

    Tokens refer to the outputs of AI model inference—specifically, the smallest units of data that large language models (LLMs) like ChatGPT process and generate during operation.

    What are tokens, really?

    • In natural language processing (NLP), a token is typically a word, part of a word, or even a character, depending on how the model is trained.
    • For example, the sentence “AI is powerful.” might be broken into 4 tokens: [“AI”, ” is”, ” powerful”, “.”] (a toy tokenizer sketch follows this list).
    • During inference (when an AI model generates text), these tokens are:
      • Consumed as input tokens.
      • Produced as output tokens.
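
    Below is a toy tokenizer that reproduces the example split above; it is only a sketch of the idea, since production LLM tokenizers use learned subword vocabularies (e.g., byte-pair encoding) rather than simple rules:

    ```python
    import re

    def toy_tokenize(text: str) -> list[str]:
        """Split text into word-ish pieces, keeping the leading space attached,
        mimicking how subword tokenizers mark word boundaries."""
        return re.findall(r"\s?[A-Za-z]+|[^A-Za-z\s]", text)

    print(toy_tokenize("AI is powerful."))   # ['AI', ' is', ' powerful', '.']
    ```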

    Why are tokens considered a “commodity”?

    In this metaphor:

    • Data is the raw material (just like crude oil).
    • AI models are like refineries.
    • Tokens are the refined product—the thing you can sell, consume, or use for value.

    Tokens are monetizable because:

    Companies pay for the generation and processing of tokens. The more tokens a system can generate quickly and cost-effectively, the more value it can produce.

    For example, OpenAI or any provider might charge per 1,000 tokens of input/output.

    And NVIDIA? NVIDIA is called the “token generator” because:

    • Their GPUs (graphics processing units) are the core hardware powering AI factories. These chips run the computations that generate tokens during AI model inference and training. The better and faster their chips, the more tokens can be produced—hence the “token generator” metaphor.

    Summary

    In this context, tokens = the fundamental unit of AI-generated output, and they’re likened to a commodity in the AI economy, produced in vast numbers by high-performance computing systems like those powered by NVIDIA hardware.

    Price of tokens

    The price of tokens depends on the AI service provider, the model used, and the usage type (e.g. input vs output). Here’s a breakdown using OpenAI’s pricing as an example (as of early 2025):

    OpenAI GPT-4 (Turbo) Pricing Example

    (Prices per 1,000 tokens)

    Model | Input Tokens | Output Tokens
    GPT-4 Turbo (128k context) | $0.01 | $0.03
    GPT-3.5 Turbo | $0.001 | $0.002

    1,000 tokens ≈ 750 words (roughly 3-4 paragraphs of English text).

    What does this mean in real use?

    If you send a prompt with 500 tokens and get a response of 700 tokens (a reusable version of this math follows the breakdown):

    • Total tokens used = 1,200 tokens
    • With GPT-4 Turbo:
      • Input: 500 × $0.01 / 1,000 = $0.005
      • Output: 700 × $0.03 / 1,000 = $0.021
      • Total = $0.026
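
    The same arithmetic as a small reusable sketch (prices are copied from the example table above and will drift over time):

    ```python
    # Token-cost calculator mirroring the worked example above.
    # Prices are USD per 1,000 tokens, as quoted in the table (early-2025 examples).
    PRICES = {
        "gpt-4-turbo": {"input": 0.01, "output": 0.03},
        "gpt-3.5-turbo": {"input": 0.001, "output": 0.002},
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        p = PRICES[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1000

    print(f"${request_cost('gpt-4-turbo', 500, 700):.3f}")   # $0.026, matching the example
    ```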

    Enterprise & Custom Models

    • For large-scale AI factories or enterprise customers, pricing can differ.
    • Providers like OpenAI, Anthropic (Claude), Google (Gemini), and Mistral may offer:
      • Bulk pricing
      • Dedicated infrastructure
      • Token generation quotas

    Why token prices matter

    • They determine cost per request, especially in high-usage AI products.
    • Companies building AI applications need to optimize token usage for profitability.
    • NVIDIA’s role as a “token generator” ties to the idea that more efficient hardware means cheaper tokens.

    GPUs require specialized software because their architecture is fundamentally different from a CPU’s. To harness their parallel processing power, algorithms must be rewritten, memory must be managed differently, and dedicated APIs like CUDA must be used. Without these changes, the GPU remains idle—even if present.

    GPUs are built for massively parallel operations, with thousands of smaller cores ideal for repetitive, data-parallel tasks (like matrix operations, image processing, etc.).

    For a GPU to be effective, the software must identify and schedule operations that can run in parallel. CUDA libraries and frameworks provide tools and abstractions to rewrite software in a way that exposes this parallelism.

    CPUs and GPUs often have separate memory spaces, and data must be explicitly transferred between them. Efficient GPU use requires careful memory management. GPU-aware software must manage these data transfers and optimize memory usage to avoid performance bottlenecks.
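
    A short sketch of the explicit host-to-device transfers this paragraph describes, again assuming CuPy as the CUDA-aware layer:

    ```python
    import numpy as np
    import cupy as cp

    x = np.arange(10_000_000, dtype=np.float32)   # lives in CPU (host) memory

    x_dev = cp.asarray(x)            # explicit host -> device copy (over PCIe/NVLink)
    y_dev = cp.sqrt(x_dev) * 2.0     # runs entirely on the GPU; no transfer here
    y = cp.asnumpy(y_dev)            # explicit device -> host copy to read the result

    # Keeping intermediates (y_dev) on the device and transferring once at the end
    # is the "careful memory management" the paragraph describes.
    ```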


    Move “up and to the right” on the chart — increasing both tokens per user (speed) and tokens across users (scale); a back-of-the-envelope revenue sketch follows the table below.

    Element | Description | Business Impact
    Data | The input to train or run models | Foundational asset
    Compute (FLOPS) | Determines processing speed | Higher = faster token generation
    HBM Memory & Bandwidth | Limits or enables fast data access | Essential for low-latency AI
    Architecture + Software | Optimizes resource usage | Better efficiency, more output per $
    Tokens/sec (Throughput) | Tokens produced across all users | Higher = greater monetizable volume
    Tokens/sec (Latency) | Tokens delivered per user per request | Faster = better UX, higher retention
    Revenue | Comes from charging per token | Scales directly with token output
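
    Combining the two throughput rows with the earlier pricing table gives a back-of-the-envelope revenue sketch (all numbers are illustrative assumptions, not NVIDIA or OpenAI figures):

    ```python
    # Toy AI-factory economics: revenue scales directly with sustained token output.
    TOKENS_PER_SEC = 2_500_000    # assumed fleet-wide output-token throughput
    PRICE_PER_1K_OUTPUT = 0.03    # USD, the GPT-4 Turbo output price quoted earlier

    revenue_per_second = TOKENS_PER_SEC / 1000 * PRICE_PER_1K_OUTPUT
    print(f"${revenue_per_second:,.0f}/s  ->  ${revenue_per_second * 86_400:,.0f}/day")
    # $75/s -> $6,480,000/day gross, before compute, power, and margin considerations
    ```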

    Just like traditional factories turned raw materials into products for sale, AI factories turn data into tokens, which are the billable product of modern computing.

    NVIDIA and similar players are building the infrastructure to maximize token output. Every optimization in compute, memory, and architecture leads to higher revenue per watt, per chip, per rack.

    AI Factory: The New Digital Industrial Model

    ⚙️ Analogy: AI as a Factory

    • Raw Material → Data
    • Factory Machines → GPUs + Memory + AI Models
    • Refined Product → Tokens (AI-generated output)
    • Output Unit → Tokens per second
    • Revenue → Tied directly to how many tokens you produce and sell

    The Three Scaling Laws

    Pre-Training Scaling

    • Traditional model scaling: increase data, model size, and compute during initial training.
    • Fuels capabilities in perception and early generative tasks.

    Post-Training Scaling

    • Refers to improving models after initial training (e.g., via fine-tuning, RLHF, tool use).
    • Critical for making models more useful, safe, and capable in nuanced tasks.

    Test-Time Scaling (Long Thinking)

    • Inference-time enhancements: allowing the model to think longer or use more compute dynamically during use.
    • Examples: chain-of-thought reasoning, tool-assisted reasoning, external memory.
    • Supports more agentic, deliberative, and planning-heavy tasks.

    Kepler (2012) – 28 nm and a Focus on Efficiency

    Introduced in 2012, Kepler was built on TSMC’s 28 nm process and succeeded the Fermi architecture. It was NVIDIA’s first design focused heavily on energy efficiency.

    Kepler struck a balance between graphics and compute. It improved gaming performance while significantly advancing GPU computing. Its efficiency and new features (Hyper-Q, dynamic parallelism) made GPUs more attractive and easier to use in HPC clusters.

    Overall, Kepler established a foundation that future architectures would build on, especially for making GPUs more general-purpose and energy-efficient.

    New features: Kepler introduced several capabilities to improve GPU compute and parallel programming. Dynamic Parallelism allowed GPUs to spawn new work without CPU intervention, and Hyper-Q enabled multiple CPU threads or MPI processes to more efficiently share a GPU by feeding work into its queues in parallel. These features reduced CPU-GPU idle time and improved utilization in HPC applications.

    Notable products: Consumer: The GeForce GTX 680 was the first Kepler flagship, later followed by the GTX 780 and the first “GTX Titan” (2013), which used a full GK110 chip with 2,688 cores and 6 GB of memory and brought compute capabilities to a prosumer card.

    Data Center: Tesla K20 and K20X accelerators (2012) were used in Oak Ridge’s “Titan” supercomputer and other HPC systems, delivering ~1.3 TFLOPS FP64 with much higher efficiency than Fermi GPUs. The later Tesla K40 (2013) and dual-GPU K80 (2014) offered increased memory (up to 12 GB per GPU) for HPC workloads. Kepler GPUs proved their worth in early deep learning as well – for instance, researchers leveraged GTX 680/770s (and even K20s) to train some of the first GPU-accelerated deep neural networks of the era, though Kepler lacked dedicated tensor hardware.

    Pascal (2016) – FinFET Performance Leap and Memory Innovation

    Announced in 2016, Pascal marked a major generational jump, fabricated on 16 nm FinFET (TSMC) – a shrink from 28 nm that brought huge gains in transistor density and power efficiency.

    Pascal was introduced first in the Tesla P100 accelerator in April 2016, then in GeForce GTX 1080 for consumers a month later. It was the first architecture to feature HBM2 memory and NVIDIA’s high-speed NVLink interconnect in its HPC variant, while the consumer cards used improved GDDR5X memory.

    Impact on AI, HPC, and graphics: Pascal set the stage for the AI boom. Researchers and companies adopted Pascal GPUs (like the DGX-1’s P100s) to train neural networks, and it was instrumental in the transition from CPU-only deep learning to GPU-accelerated deep learning across industry. Though still general-purpose cores, its FP16 capability hinted at specialized AI use. In HPC, P100 GPUs delivered big leaps in simulation and scientific computing, enabling GPU-based supercomputers to dominate the Top500. For graphics, Pascal GPUs increased performance per watt dramatically, making high-end gaming and VR more accessible, and features like SMP improved VR image quality. Pascal essentially bridged the gap between a pure graphics architecture and one ready for compute and AI, laying a foundation that the next architecture (Volta) would greatly expand on for AI acceleration.

    Volta (2017) – Tensor Cores and a Pivot to AI

    In late 2017, NVIDIA launched Volta, an architecture squarely aimed at compute and AI rather than gaming (no mainstream GeForce card used it). Volta (GV100 GPU) was manufactured on a custom 12 nm TSMC process. It was NVIDIA’s first architecture to introduce specialized hardware beyond the traditional CUDA cores: namely, the Tensor Cores, which profoundly boosted deep learning performance.

    Impact on AI/ML, HPC, and graphics: Volta was a watershed moment for AI/ML. It substantially reduced training times for neural networks – models that took weeks on Pascal could be trained in days on Volta. This enabled researchers to iterate faster and train larger, more complex models (Volta’s Tensor Cores were crucial in the AI breakthroughs around 2018–2019, such as image recognition improvements, machine translation, and the early transformers). In HPC, Volta continued NVIDIA’s disruption of supercomputing: V100s offered stellar FP64 performance and helped GPU-based systems dominate the top of the Top500 list, while also accelerating mixed-precision scientific computing and AI workloads for scientific research (folding in AI with traditional HPC). For graphics, although gamers didn’t get a Volta GeForce, the architectural advancements (like concurrency and tensor core acceleration) foreshadowed technologies that would soon enter gaming with Turing and later Ampere (e.g., DLSS – an AI upscaling feature – was made possible by Tensor Cores, which debuted in Volta). Overall, Volta firmly established that GPU architectures would henceforth cater not just to rendering graphics, but to accelerating AI at scale – a theme that continues through Ampere, Hopper, and Blackwell.

    Ampere (2020) – Unified Architecture for CUDA, Tensor & Ray Tracing

    In 2020, NVIDIA’s Ampere architecture sought to unify advances for both data center AI/HPC and consumer graphics. Succeeding Volta (and Turing on the consumer side), Ampere was manufactured on two process nodes: the A100 data center GPU used TSMC 7 nm, while GeForce RTX 30-series GPUs used a custom Samsung 8 nm process. The Ampere family thus covered everything from the largest AI supercomputers to gaming PCs, with shared architectural principles but some differences in implementation.

    Relevance to AI/ML, HPC: Ampere (A100) was the dominant AI training and inference GPU of its time – it’s estimated that by 2022, the vast majority of machine learning model training computations were happening on NVIDIA Ampere GPUs in data centers. With its mixed-precision and sparsity features, Ampere allowed training of larger models like GPT-3 with fewer GPUs or in less time than Volta would require. For HPC, A100’s strong FP64 (9.7 TFLOPS, ~1.25× V100) and huge memory made it ideal for HPC centers upgrading from V100 – many scientific computing sites integrated A100s for simulation and AI convergence workloads. Graphics and gaming: GeForce Ampere cards became very popular (notwithstanding supply issues during the 2020–2021 crypto-driven shortage). They enabled high-fidelity 4K gaming with ray tracing and were instrumental in pushing technologies like DLSS 2.0 (which uses the Tensor Cores to boost frame rates via AI upscaling). Ampere’s success in both arenas demonstrated the versatility of a GPU architecture that could accelerate traditional graphics and the new frontier of AI simultaneously.

    Hopper (2022) – Data Center AI Beast with Transformer Engine

    Unveiled in March 2022, Hopper (named after Grace Hopper) is an architecture designed exclusively for data center and compute, succeeding Ampere’s HPC role (while the Ada Lovelace architecture, launched in late 2022, catered to consumer GPUs). The flagship NVIDIA H100 GPU is built on a custom TSMC 4N process (optimized 4 nm) with around 80 billion transistors on a huge 814 mm² die.

    Hopper introduces significant new features to accelerate AI training, especially for large language models, and pushes the envelope in GPU computing performance and scalability.

    Relevance and use cases: Hopper H100 quickly became the premium solution for cutting-edge AI labs and enterprise AI. Its introduction coincided with the explosion of large language models like GPT-3, and later GPT-4 – models which greatly benefit from Hopper’s Tensor engine and huge memory. For training giant models (hundreds of billions of parameters), H100 can reduce training times and also make feasible the real-time inference of those models using FP8. For example, NVIDIA reported up to 30× faster LLM inference on H100 compared with A100 for the largest models.

    This capability is transformative for deploying AI (e.g., chatbots, AI services) at scale. In HPC and scientific computing, H100’s FP64 is about 2× A100 (reaching ~20 TFLOPS FP64), and its DPX and other enhancements open GPUs to new workloads like genomics and dynamic programming tasks. Essentially, Hopper extends NVIDIA’s dominance in AI computing by not just throwing more cores at the problem but adding targeted hardware and architectural tweaks for AI and data-centric computing. It underscores the shift of GPUs from pure graphics to general-purpose compute devices tailored for AI. As such, Hopper has been adopted in top supercomputers (it is a key part of AI-focused systems such as Alps, for instance) and in cloud GPU offerings where maximum performance is needed (OpenAI, Meta, etc., all source H100s for training their latest models).

    Graphics impact: While Hopper itself didn’t directly impact gaming (since it wasn’t in GeForces), the architectural learnings did. The idea of large L2 caches and massive bandwidth was seen in Ada Lovelace (which increased L2 cache dramatically for the GeForce 40-series, inspired partly by what was done in A100/H100 for compute). And some of Hopper’s improvements, like better scheduling and concurrent execution, continue to influence future consumer designs. But primarily, Hopper will be remembered as the AI-focused architecture that enabled the “AI factories” of the mid-2020s – specialized datacenters churning out AI models and services.

    Blackwell (2024–2025) – Next-Gen Architecture for AI and Graphics

    Blackwell is the codename for NVIDIA’s newest GPU architecture (named after mathematician David Blackwell), succeeding Hopper (data center) and Ada Lovelace (consumer). Announced in 2024, Blackwell GPUs target both the data center (for exascale AI and HPC) and the next generation of GeForce RTX 50-series graphics cards for consumers. This architecture continues NVIDIA’s trajectory of AI-centric design, while also doubling down on graphics performance, particularly ray tracing and neural rendering.

    Blackwell Ultra: Scaling AI Performance (2025)

    Announced in 2025, the Blackwell Ultra architecture represents a substantial leap in AI processing capabilities. Key features include:

    • Dual-Die Design: Combining two GPU dies to increase computational density.
    • Enhanced Memory: Up to 288 GB of HBM3e memory, facilitating larger AI models.
    • Improved Performance: Achieving 20 petaflops of FP4 performance, doubling the capabilities of the Hopper generation.

    These advancements cater to the growing demand for AI applications requiring higher throughput and efficiency.

    🔭 Vera Rubin Architecture: The Next Frontier (2026)

    Scheduled for release in the second half of 2026, the Vera Rubin architecture is poised to redefine AI computing. Named after astronomer Vera Rubin, this architecture introduces:

    • Vera CPU: Nvidia’s first custom-designed processor based on the Olympus core architecture.
    • Rubin GPU: Delivering up to 50 petaflops of FP4 performance, more than doubling Blackwell Ultra’s capabilities.
    • Advanced Memory: Utilizing HBM4 memory with 13 TBps bandwidth, significantly enhancing data throughput.
    • NVLink 6 Interconnect: Providing 3,600 GB/s of bandwidth for efficient GPU-to-GPU communication.

    The Vera Rubin platform is designed to support complex AI models, including those requiring advanced reasoning and decision-making capabilities.

    Architecture | Codename | Year | Named After | Field / Contribution
    Fermi | Fermi | 2010 | Enrico Fermi | Nuclear physics, quantum theory, father of the atomic bomb
    Kepler | Kepler | 2012 | Johannes Kepler | Astronomy, planetary motion laws
    Maxwell | Maxwell | 2014 | James Clerk Maxwell | Electromagnetism, Maxwell’s equations
    Pascal | Pascal | 2016 | Blaise Pascal | Mathematics, probability theory, Pascal’s triangle
    Volta | Volta | 2017 | Alessandro Volta | Electricity, inventor of the electric battery
    Turing | Turing | 2018 | Alan Turing | Computer science, codebreaking, Turing machine
    Ampere | Ampere | 2020 | André-Marie Ampère | Electrodynamics, Ampère’s law
    Hopper | Hopper | 2022 | Grace Hopper | Computer programming, developed the first compiler
    Blackwell | Blackwell | 2024 | David Harold Blackwell | Statistics, game theory, first Black member of the US National Academy of Sciences
    Rubin | Rubin | Expected 2026 | Vera Rubin | Astronomy, dark matter research
  • Protected: Tear Sheets

    This content is password-protected. To view it, please enter the password below.

  • Protected: Rapid7 products and M&A

    This content is password-protected. To view it, please enter the password below.

  • Protected: R7 Inc

    This content is password-protected. To view it, please enter the password below.

  • Protected: Rapid7

    This content is password-protected. To view it, please enter the password below.

  • Protected: Rapid7 Inc.

    This content is password-protected. To view it, please enter the password below.
