Advertising System Architecture: How to Build a Production Ad Platform


Digital advertising has become one of the most important revenue sources for many technology companies. Google, Facebook, Twitter, and Snapchat all monetize people’s attention heavily, and digital advertising may well be one of the most important new businesses of the 21st century. Unlike traditional advertising, digital advertising provides far more reliable insight into campaign performance. As a result, more and more advertisers are shifting budget to online ads, and more and more technology companies are offering advertising on their platforms. But how do these ad engines actually work behind the scenes? And what if we want to build our own system to serve ads on our platform? (To be honest, for a small business or an individual developer, I would suggest using an existing solution like Google DFP.) Today, I’m going to walk through the technical design of such a system, and I hope it gives you some inspiration.

System Overview

An ads system is essentially a live exchange: it exchanges information between two parties, advertisers and end users. In an ideal world, we would only need: 1) a web interface where advertisers or internal salespeople can manage their orders; 2) a client application where the ads are displayed; and 3) a backend system that maintains these orders and decides which ad to present. Unfortunately, this design won’t scale for a serious advertising business because of a few constraints.

The first constraint is frequency. Advertisers set up their target audience and budget to reach as many people as possible. As a platform, however, you don’t want to hurt the user experience by showing too many ads. Even if the ads always carry useful information, they are still a distraction from your primary service.

The second constraint is speed. Unlike posts, images, and videos, ads are far more dynamic: a campaign may exhaust its budget or be canceled at any moment. Yet to make ads a seamless part of our platform, we have to serve them lightning fast. If the ad response takes longer than the other services, we may even have to drop the advertising opportunity entirely.

The third constraint is quality. Satisfying the first two constraints keeps the ads engine running smoothly, but to maximize profit you also have to deliver the right ad to the place where it matters most. Advertisers are usually willing to pay tens or hundreds of times more for a click to their website or an install of their app than for a mere impression. So the more interactions you generate, the more money you make, which means we need to maximize the probability of such interactions.

All these constraints turn a simple information-exchange problem into a much more complicated one. That is why we also need to build many peripheral components: a ranking service to improve ad quality, a pacing service to control ad frequency, a low-latency delivery module to guarantee response time, and so on.

The above image shows the overall architecture of a fully functioning ads system. Bigger companies like Google and Facebook may have more enhancements to meet their specific needs, but this architecture gives you a good start. As you can see, advertising is a cycle between advertisers and end users. The system first collects campaign information from advertisers, then builds ad indices based on pacing and forecasting. With these ad indices and the user profile from a targeting service, the ad server asks the ranking service to score candidate ads and find the most suitable ad for the current ad request. Once the user finishes interacting with the ad, either skipping it or clicking it, we send the resulting metrics to the metrics collectors. A data pipeline behind these collectors aggregates the event history into more valuable real-time stats and business logs. The real-time stats, in turn, help the ad server and pacing service control delivery more accurately. The business logs are used by inventory forecasting and billing, so that advertisers can set up more campaigns to experiment with their marketing strategy.

Next, I will dive deep into each component in this ads system to discuss some more technical details.


Web Interface

As I mentioned above, an ads system is essentially an information exchange, so the first thing we need is a way to feed information into the exchange. Usually, that means a web interface where people manage their advertising campaigns. This web interface is typically a single-page JavaScript application that can handle complicated form input, large tables, and rich multimedia content. Just register an account at Google Ads, Facebook Ads Manager, or Snap Ads Manager, and you will get a rough idea of what this UI should look like.

Before jumping into specific technical problems, let’s get familiar with the typical digital advertising hierarchy. Usually, an advertiser creates a Campaign, which represents a marketing initiative. There is also the notion of an Ad, the single unit of advertising delivered to the audience. To provide fine-grained control, large advertising platforms also introduce another layer between Campaign and Ad, called Ad Set or Line Item, to group similar Ads. And to let an advertiser swap out the actual content of an Ad, there is a notion of Creative (or Media), which represents that content. This abstraction helps us decouple the logic cleanly. These four entities are the most important things needed to run an ad, but as on any other platform there are also auxiliary entities: Account, Payment Source, Predefined Audience, etc. For now, let’s focus on the main flow.
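
To make the hierarchy concrete, here is a minimal sketch of the four core entities as Python dataclasses. The field names (daily_budget_cents, media_url, call_to_action, and so on) and the swap_creative helper are hypothetical illustrations, not any real platform’s schema; the point is that swapping a Creative changes an Ad’s content without touching its place in the hierarchy.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal model of the Campaign -> Ad Set -> Ad -> Creative hierarchy.
# Field names are illustrative assumptions, not any platform's real schema.

@dataclass
class Creative:
    creative_id: str
    media_url: str                      # location of the image or video asset

@dataclass
class Ad:
    ad_id: str
    creative: Creative                  # the content can change; the Ad's identity does not
    call_to_action: str = "LEARN_MORE"

@dataclass
class AdSet:
    ad_set_id: str
    daily_budget_cents: int
    targeting: dict = field(default_factory=dict)
    ads: List[Ad] = field(default_factory=list)

@dataclass
class Campaign:
    campaign_id: str
    objective: str                      # e.g. "CLICKS" or "APP_INSTALLS"
    ad_sets: List[AdSet] = field(default_factory=list)

def swap_creative(ad: Ad, new_creative: Creative) -> None:
    """Replace the content of an Ad without touching its place in the hierarchy."""
    ad.creative = new_creative
```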

One of the biggest challenges for an advertiser web UI is complex state management. An application like this could track hundreds or even thousands of different states at the same time. For example, to let a user buy a simple video ad, the software needs to keep the whole hierarchy, its metadata, and its targeting information in a temporary place until the user commits the data. Each entity also has many fields that may interlock with other fields: the objective of a Campaign can restrict the type of ad you can buy, the location targeting of an Ad Set can impose a minimum daily budget, and the type of an Ad can limit the “call to action” options.

Another big challenge is the large variety of UI components required by the business flow. Take the targeting section of campaign creation: location targeting needs a map component, audience groups require a tree-select component, and the duration setting is a datetime range selector. Furthermore, these components can appear anywhere in your application, so you should reuse, or at least abstract, most of them.

The third challenge I would like to mention is the big-table experience. There are tens of fields controlling how an advertising campaign runs, plus many more columns that report the metrics of a given entity. Different advertisers rely on different metrics to measure ad performance, so your main table should be versatile enough to show many columns in whatever order or combination the user prefers.

Thankfully, Facebook open-sourced its UI library, React, a few years ago. Combined with the Flux philosophy and good component encapsulation, I believe it’s by far the most comfortable way to build an advertiser web app. The Flux pattern addresses the headache of intertwined state, and JSX makes it easy to write reusable UI components. You could also add Redux to your tech stack to make state transitions more predictable and maintainable.

With the proper tech stack, we can now divide this web UI into the following major areas:

  • Home Tables: Where all the entities, such as Campaign and Ad, are shown. Further editing can be done from here too.
  • Creation Flow: A wizard form that helps advertisers (or internal ad ops) place an order step by step.
  • Stats and Reporting: Where advertisers track ad performance, such as impression counts, and export the data for reference.
  • Asset Library: A place to manage all the multimedia content a user has uploaded. Most of the time, dedicated creative studios help advertisers produce the assets, but we can also offer basic web-based media editing tools for smaller businesses that don’t have the budget for professional creative services.
  • Billing and Payment: Where users manage payment sources and view billing information. This part is usually integrated with the Braintree APIs to support more funding sources.
  • Account: Where users manage the role hierarchy and access control for multiple users. A big organization usually has multiple ad accounts, and different roles have different access.
  • Audience Manager: This might not be necessary at the beginning, but it is convenient for frequent ad buyers to define reusable audience groups so they don’t need to apply complex targeting logic every time.
  • Onboarding: This may be the most critical part of the system at an early stage. A good onboarding experience can increase sign-ups significantly.

With a good web interface, the friction of buying an ad in our system is reduced to a minimum. However, keep in mind that the graphical user interface isn’t the only entry point to our system. Next, let’s take a look at the Ads APIs (Application Programming Interfaces).


Ads APIs

First of all, what are the Ads APIs, and why do we need them? As you can see from the last section, our web interface has to handle very sophisticated forms and help our clients (advertisers) manage their orders. To persist all these changes and provide data for the UI, we need a service layer that performs CRUD operations. However, if that were its only function, we wouldn’t call it an API; it would be just like any other backend.

In fact, advertisers usually don’t put all their eggs in one basket. Besides using the web interface we build for them, they also work with advertising agencies and spend part of their budget there. These agencies usually have their own software for tracking marketing campaigns, with direct access to all the major digital advertising platforms. Therefore, the Ads APIs are also meant to be used by agencies. If we build our in-house web interface on top of this external Ads API, we will be able to identify problems before our API customers do.

To work with third-party agencies, the best bet is often a RESTful API, as REST is almost the standard way for two unfamiliar parties to communicate. There are fancier solutions like gRPC and GraphQL, but REST guarantees the most compatibility here. It’s also easier to write public documentation for a RESTful API because everything is grouped by resources and methods. Take a look at Twitter’s API reference:

Now that we have a general idea of how to structure our APIs, I want to briefly talk about four pillars of implementing them.

Campaign Management
In short, campaign management is the CRUD of all the advertising entities listed in the last section. From a business perspective, however, it’s more complicated than persisting some values. First, we need to handle lots of validation for all sorts of business rules, so a proper design pattern like Facade and consistent abstraction through interfaces and inheritance are important. Second, to keep the workflow manageable, it’s common to use a state machine to track the status of campaigns, ad sets, and ads. Third, lots of ads operations are long-running or asynchronous. For example, whenever a new video is uploaded, we need to trigger a build process that prepares different resolutions and encodings of the video for multiple platforms. These asynchronous jobs are usually driven by a PubSub or TaskQueue system and integrated with the state machine mentioned above. Fourth, your database should support transactions in a scalable way, because most Campaign Management operations may affect multiple tables.
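
A status state machine can be as small as a transition table. The sketch below assumes a hypothetical set of statuses (DRAFT, PENDING_REVIEW, ACTIVE, PAUSED, REJECTED, ARCHIVED) and a hypothetical transition helper; a real platform will have its own statuses and rules. An asynchronous job, such as video transcoding, could call the same helper when it finishes, so every status change goes through one gate.

```python
from enum import Enum

class Status(Enum):
    DRAFT = "DRAFT"
    PENDING_REVIEW = "PENDING_REVIEW"
    ACTIVE = "ACTIVE"
    PAUSED = "PAUSED"
    REJECTED = "REJECTED"
    ARCHIVED = "ARCHIVED"

# Allowed transitions (hypothetical); anything not listed here is rejected.
TRANSITIONS = {
    Status.DRAFT: {Status.PENDING_REVIEW, Status.ARCHIVED},
    Status.PENDING_REVIEW: {Status.ACTIVE, Status.REJECTED},
    Status.ACTIVE: {Status.PAUSED, Status.ARCHIVED},
    Status.PAUSED: {Status.ACTIVE, Status.ARCHIVED},
    Status.REJECTED: {Status.DRAFT, Status.ARCHIVED},
    Status.ARCHIVED: set(),
}

class InvalidTransition(Exception):
    pass

def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the business rules forbid the move."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```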

Access Control
Since these APIs will eventually be used by external users, we need to be really careful about AuthN and AuthZ. A typical cookie-session authentication system is acceptable, but a token secret is more often preferred for stateless RESTful APIs. We can either use third-party OAuth 2.0 or build our own. A simple JWT token exchange may not be secure enough, since real money is at stake here. Authorization is also a big part. An ads system could have tens of different roles, such as Account Manager, Creative Manager, and Account Executive. A Creative Manager may upload new creatives but is not allowed to create a new campaign. An Account Manager is allowed to create a new campaign, but only an Account Executive is allowed to activate it. Luckily, most entities in an ads system fall into a hierarchy: accounts belong to an organization, campaigns belong to an account, and so on. Hence, we can adopt the chain of responsibility pattern and resolve a user’s access to an entity by walking up the hierarchy chain.
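
Here is a minimal sketch of that chain-of-responsibility check, assuming each entity knows its parent and stores hypothetical per-user permission grants. If a grant isn’t found at the current level, the check is handed up to the parent until the root of the hierarchy is reached. Permission names like create_campaign are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

@dataclass
class Entity:
    entity_id: str
    parent: Optional["Entity"] = None
    # Hypothetical grants: user id -> permissions granted at this level of the hierarchy.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, user_id: str, permission: str) -> bool:
        """Handle the check at this level, or pass it up the chain to the parent."""
        if permission in self.grants.get(user_id, set()):
            return True
        if self.parent is None:
            return False
        return self.parent.is_allowed(user_id, permission)
```

For example, a grant placed on the organization applies to every campaign under it, while a grant on one account applies only to that account’s campaigns.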

Billing and Payment
In the beginning, you may rely on internal operations and direct sales of your ads, so lines of credit and coupons could be enough to handle payments; after all, the operations team can manage them in an ERP or even a spreadsheet. However, to accept public payment sources like credit cards, you will most likely need a third-party service. Braintree and Stripe are both leading payment solutions for enterprises. Another concern with external payment sources is the risk of abuse and spam, so proper rate limits, anomaly alarms, and regular audits should be enforced to mitigate that risk.
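
The text doesn’t prescribe a rate-limiting algorithm. One common choice is a token bucket, sketched below under that assumption: each payment attempt consumes a token, and tokens refill at a fixed rate up to a burst capacity. The clock is injectable so the behavior can be tested deterministically; the class and parameter names are hypothetical.

```python
import time
from typing import Callable

class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In practice you would keep one bucket per account or payment source, so a single abusive account cannot exhaust the shared capacity.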

Metrics and Reporting
Last but not least, the Ads API also handles reporting for both ad agencies and our web interface. The challenges of reporting vary a lot depending on the data warehouse and the way you collect metrics, so be careful when you design the metrics collection pipeline, especially for aggregated results. For example, some data might not be available when the stats query granularity is WEEKLY. One thing remains the same for every reporting service, though: its QPS is usually higher than that of other endpoints. You can support batch stats queries to reduce the number of requests, but people still check metrics much more often than they place new orders.

There are many more modules in a real production Ads API system, such as budget control, a content review pipeline, and so on. We can also incorporate machine learning models for auto-tagging to expedite the review process. Still, you should now have a good understanding of where to start.

Ad Index Publisher

Advertisers can now create a campaign and upload their own image or video ads to our system. From here on, we are stepping into the user’s side of the system. To determine which ads should be delivered to which user, the first thing we need to know is which ads are active at this moment. The easiest way is to query the database and filter by the status field, but such a query usually takes too long to meet our speed requirement. Database tables are usually structured for easy writes, not easy reads. For example, you could have four tables recording Campaign, Ad Set, Ad, and Creative. This is convenient when we want to update values for a specific entity, but when we serve ads, we have to query all four tables to know whether a Creative and all of its parents are active.
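
To make the read-path problem concrete, here is a sketch with four hypothetical in-memory tables standing in for the database. Checking one Creative at serving time needs a lookup in every table, while an index publisher can run the same check once, offline, and ship only the set of servable creatives. Table shapes and function names are illustrative assumptions.

```python
from typing import Set

# Hypothetical normalized tables: id -> row. Each child row points at its parent.
campaigns = {"c1": {"status": "ACTIVE"}, "c2": {"status": "PAUSED"}}
ad_sets = {"s1": {"status": "ACTIVE", "campaign_id": "c1"},
           "s2": {"status": "ACTIVE", "campaign_id": "c2"}}
ads = {"a1": {"status": "ACTIVE", "ad_set_id": "s1"},
       "a2": {"status": "ACTIVE", "ad_set_id": "s2"}}
creatives = {"cr1": {"status": "ACTIVE", "ad_id": "a1"},
             "cr2": {"status": "ACTIVE", "ad_id": "a2"}}

def is_servable(creative_id: str) -> bool:
    """Serving-time check: one lookup per table for every candidate creative."""
    cr = creatives[creative_id]
    ad = ads[cr["ad_id"]]
    ad_set = ad_sets[ad["ad_set_id"]]
    campaign = campaigns[ad_set["campaign_id"]]
    return all(row["status"] == "ACTIVE" for row in (cr, ad, ad_set, campaign))

def publish_live_set() -> Set[str]:
    """Publisher-side: do the joins once, offline, and ship only the result."""
    return {cid for cid in creatives if is_servable(cid)}
```

Here cr2 is excluded because its grandparent campaign c2 is paused, even though the creative itself is active.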

To solve this issue, we need an index publisher that pre-calculates useful indices to save time during serving. It publishes the indices to a storage service, and the ad server loads them into memory periodically. One of the challenges in generating the live index is the variety of business rules we need to apply, such as spend limits and ad scheduling. These tables also relate to each other and require very complex validation. To manage these dependencies, we could introduce Spark or Dataflow, which often leads to a multi-stage data pipeline like this:

In general, we need to generate three types of indices:

Live Index
This is an index which tells us all the live ads in the system. Also, it contains all the necessary information that the ad server needs to form an ad response, such as the resource location and ad metadata. Besides the primary index from id to metadata, we could also build some secondary indices based on targeting rules. The ad server uses these secondary indices to filter out irrelevant ads and only preserve the best candidates for auctions. I’ll discuss auction and filtering in the ad server section.
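
A minimal sketch of a live index with a primary index (ad id to metadata) and a secondary inverted index keyed by (targeting key, value) pairs. The ad dictionary shape is a hypothetical assumption, and the sketch ignores untargeted ads and exclusion rules for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (targeting key, targeting value), e.g. ("country", "us")

def build_live_index(live_ads: List[dict]) -> Tuple[Dict[str, dict], Dict[Key, Set[str]]]:
    """Build the primary index and a targeting-based inverted (secondary) index."""
    primary: Dict[str, dict] = {}
    secondary: Dict[Key, Set[str]] = defaultdict(set)
    for ad in live_ads:
        primary[ad["ad_id"]] = ad                      # everything needed for the ad response
        for key, values in ad["targeting"].items():
            for value in values:
                secondary[(key, value)].add(ad["ad_id"])
    return primary, dict(secondary)

def candidates(secondary: Dict[Key, Set[str]], key: str, value: str) -> Set[str]:
    """Ads whose targeting includes key=value; everything else is filtered out early."""
    return secondary.get((key, value), set())
```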

Pacing Index
Another index we need to prepare covers pacing status and factors. We intentionally separate it from the live index because pacing usually requires much more calculation, so we want it to be independent. This separation also makes the system more resilient, because we can still serve live ads when there’s a pacing issue.

Feature Index
This index contains ad features that the ranker will use later. We can also replace this index with a low-latency datastore like Cassandra or an in-memory database like Redis.


Pacing

Before we get into the actual delivery of an ad, let’s consider a scenario. An advertiser may want to run an ad for a month to get 100K impressions. We don’t want to exhaust all the impressions in the first few hours of the campaign; instead, we’d like to deliver them throughout the lifetime of the campaign. From the end user’s perspective, we also don’t want to overwhelm them with the same ad at one time (which causes what is called ad fatigue) while showing them nothing at another time. The mechanism that controls this delivery process is called pacing. Pacing is like a north star for our ad system: the pacing index we generate will guide the direction of ad delivery.

The simplest approach is to split the budget into hourly or per-minute chunks. If an ad exhausts its budget within the current minute, it is filtered out during index publishing. However, this approach doesn’t allow fine-grained control, and delivery still bursts within each smaller period.

One of the most traditional ways to control the pace is a PID controller.

For a detailed explanation of PID controllers, you can refer to Wikipedia. In short, by analyzing the difference between the desired state and the current state, the controller tells us how much input we should give the delivery system. If pacing is lagging, the controller tells us to give a bigger input, which translates into a higher pacing factor. A higher pacing factor results in a higher bid, which helps the ad beat the competition and win the ad opportunity.

It’s easy to know the current state (current total impressions, current total clicks, etc.) by connecting to the metrics system. But what about the desired state? The X-axis of this PID controller is time; what about the Y-axis? To start simple, we can project a straight line from 0 to the total number of impressions (or clicks, depending on the configuration) the ad wants to reach. Instead of using the total as the Y-axis of our PID controller, we use the rate (the number of impressions delivered per minute) to represent the desired state. If our current delivery rate is lower than the desired rate, the pacing factor needs to increase.
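
Below is a minimal sketch of such a pacing controller, assuming impressions are the goal metric and the controller runs once per minute. The gains (kp, ki, kd), the base factor of 1.0, and the clamping range are illustrative assumptions to be tuned per platform; since the error is measured in impressions per minute, sensible gains depend on the ad’s scale.

```python
from typing import Optional

class PacingPID:
    """Turn the gap between desired and actual delivery rate into a pacing factor."""

    def __init__(self, kp: float, ki: float, kd: float,
                 base: float = 1.0, lo: float = 0.1, hi: float = 10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.base, self.lo, self.hi = base, lo, hi
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, desired_rate: float, actual_rate: float, dt: float = 1.0) -> float:
        error = desired_rate - actual_rate            # positive when delivery is lagging
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        factor = self.base + self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.lo, min(self.hi, factor))     # clamp so bids stay in a sane range

def desired_rate(total_goal: int, total_minutes: int) -> float:
    """Linear projection: spread the goal evenly over the campaign lifetime."""
    return total_goal / total_minutes
```

For example, an ad aiming for 100K impressions over 30 days has a desired rate of about 2.3 impressions per minute; if it delivered nothing in the last minute, the factor rises above 1.0, and if it is far ahead of schedule, the factor drops below 1.0.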

One thing to note is that a PID controller can be slow to start, and its fluctuations can sometimes be quite significant. Therefore, we could introduce additional multipliers into the formula. With this pacing factor, we can now implement a simple pacing service that ensures smooth ad delivery. There are also other pacing techniques for particular problems. For example, we can compute a local pacing factor for each ad server machine to balance the differences in a distributed system. Moreover, a reach-and-frequency factor could be introduced to make sure frequency requirements are satisfied.


Auction

In this section, I’m going to discuss the heart of the ad engine: the auction. An auction is a process of buying and selling goods or services by offering them up for bid, taking bids, and then selling the item to the highest bidder. In auction-based ads systems, ads from different advertisers participate in an auction, and the ad with the highest bid wins the auction and is shown to the user.

When we combine the estimated value of each ad with the incoming inventory (opportunity) information from an ad request, we can determine which ad to show for this opportunity. The trick to adding pacing to an auction system is to multiply the bid value by the pacing factor: the more urgent an ad is, the bigger its pacing factor, which leads to a higher bid.

To build an auction house, the first thing you need to decide is which auction strategy to use. Let’s assume we use the most common strategy, the second-price auction. In real-time bidding, a second-price auction lets the winner pay a little less than their submitted offer: instead of paying the full price, the winning bidder pays the price offered by the second-highest bidder (or that price plus $0.01), which is called the clearing price. This can lead to an equilibrium in which all bidders are incentivized to bid their true value. Although advertisers give us a maximum bid value, we also need to consider other factors, like the probability of the event occurring and our pacing status. Using a formula like the one below, we can calculate the real bid price for the current auction.

Total Bid = k * EAV + EOV
Where k is the pacing factor, EAV is the estimated advertiser value, and EOV is the estimated organic value

Advertiser value is the actual return an advertiser could gain from displaying this ad. Different advertising goals may calculate EAV differently. If we want to optimize for clicks, then:

EAV_click = b_click * p_click
Where b_click is the max bid price for a click, and p_click is the probability of getting a click

Organic value is the benefit to the platform or the user experience. To calculate the advertiser value, the simplest way is to multiply the probability of an event by the max bid price for that type of event. For example, if the probability of a click is 0.1 when we deliver this ad, and the advertiser is willing to pay $1 per click, the EAV is 0.1 * 1 = 0.1. The formula for organic value differs from platform to platform. For example, skipping an ad may indicate a bad user experience, so the organic value could assign a negative weight to skips.

EOV = p_click * w_click + p_finish * w_finish + ... - p_skip * w_skip
Where p_event is the probability of such event, and w_event is the weight that this event contribute to the final value
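
The formulas above translate directly into code. This sketch implements EAV_click, EOV, and Total Bid as written; the event names and the weights in the usage example are hypothetical, and w_skip is passed as a positive weight that the function subtracts, matching the minus sign in the EOV formula.

```python
from typing import Dict

def eav_click(b_click: float, p_click: float) -> float:
    """EAV_click = b_click * p_click."""
    return b_click * p_click

def eov(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """EOV = p_click * w_click + p_finish * w_finish + ... - p_skip * w_skip."""
    value = 0.0
    for event, p in probs.items():
        w = weights.get(event, 0.0)
        # Skips hurt the user experience, so they are subtracted.
        value += -p * w if event == "skip" else p * w
    return value

def total_bid(k: float, eav: float, eov_value: float) -> float:
    """Total Bid = k * EAV + EOV, where k is the pacing factor."""
    return k * eav + eov_value
```

For instance, eav_click(1.0, 0.1) reproduces the worked example in the text (0.1 * 1 = 0.1), and a pacing factor above 1.0 scales only the advertiser part of the bid, leaving the organic part unchanged.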

In a more complex auction system, additional formulas can be designed to reflect the real bid price based on different business priorities. Bear in mind that the math you use determines the flavor of your ad engine. For guaranteed delivery, for example, we could assign an artificially high bid so the ad wins over all other ads in the auction.

Note that this bid price is only used for the auction. Since we are adopting a second-price auction, the advertiser only pays the price offered by the second-highest bidder. If we would also like to penalize ads that give users a bad experience, we can do:

price = Total Bid(second highest bidder) - EOV(winner)

By doing so, if the winning ad has great value to users, its price drops and its ROI increases, and vice versa.
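
A minimal sketch of the EOV-adjusted second price, assuming each bid arrives as a hypothetical (ad_id, total_bid, eov) tuple. Clamping the price at a floor (0 by default) and charging the floor when there is only one bidder are our own assumptions, added so a high EOV never produces a negative charge.

```python
from typing import List, Optional, Tuple

Bid = Tuple[str, float, float]  # (ad_id, total_bid, eov): a hypothetical shape

def second_price_with_eov(bids: List[Bid], floor: float = 0.0) -> Optional[Tuple[str, float]]:
    """price = Total Bid(second-highest bidder) - EOV(winner), clamped at `floor`."""
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    winner_id, _, winner_eov = ranked[0]
    # With a single bidder there is no second price; fall back to the floor (assumption).
    second_bid = ranked[1][1] if len(ranked) > 1 else floor
    price = max(floor, second_bid - winner_eov)
    return winner_id, price
```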

So far, we have talked about the strategy for comparing bidders and how to calculate the real bid price. The final piece of the puzzle is the auction engine that connects all of these. When a request comes into the auction engine, we first sort all the candidate ads and find the highest bidder for the current opportunity. Sometimes there are business reasons to group candidates into several priority groups; for instance, a first-party promotion could be more important than anything else. The auction engine then becomes a waterfall: the opportunity request falls through each priority group and runs the auction only within that group. Only when no suitable candidate is found does the request go on to the next group. One caveat is that delivering ads has a cost as well (network bandwidth, computation resources, etc.), so we could also set a floor bid price to filter out candidates with a negligible bid.
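
The waterfall can be sketched as a loop over priority groups, highest priority first. This assumes each candidate is a dictionary with a precomputed total_bid (a hypothetical shape) and applies the floor before picking a winner; clearing-price computation is left out here.

```python
from typing import Dict, List, Optional

def waterfall_auction(groups: List[List[Dict]], floor: float) -> Optional[Dict]:
    """Run the auction group by group, highest-priority group first.

    Candidates below the floor are dropped; the first group with any
    eligible candidate produces the winner, and lower groups are never consulted.
    """
    for group in groups:
        eligible = [c for c in group if c["total_bid"] >= floor]
        if eligible:
            return max(eligible, key=lambda c: c["total_bid"])
    return None
```

Note how the floor interacts with priority: a first-party promotion with a negligible bid is skipped, and the request falls through to the paid group.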

However, so far we are only auctioning one item (like one ad slot) at a time. In some cases, we need to auction multiple items together because they are related. Here we could implement the generalized second-price auction (GSP), a non-truthful auction mechanism for multiple items. Each bidder places a bid. The highest bidder gets the first slot, the second-highest gets the second slot, and so on, but the highest bidder pays the price bid by the second-highest bidder, the second-highest pays the price bid by the third-highest, and so on.
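
A minimal GSP sketch, assuming bids arrive as (ad_id, bid) pairs. Slot i goes to the i-th highest bidder, who pays the next-highest bid; charging a reserve price when no lower bid exists is an assumption, not part of the definition above.

```python
from typing import List, Tuple

def gsp(bids: List[Tuple[str, float]], num_slots: int,
        reserve: float = 0.0) -> List[Tuple[str, float]]:
    """Generalized second-price auction over `num_slots` related slots.

    Returns (ad_id, price) per filled slot, in slot order. Each winner pays
    the bid of the bidder ranked just below it, or `reserve` if there is none.
    """
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    result = []
    for i in range(min(num_slots, len(ranked))):
        ad_id, _ = ranked[i]
        price = ranked[i + 1][1] if i + 1 < len(ranked) else reserve
        result.append((ad_id, price))
    return result
```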


Ad Ranking

In general, the problem an ads ranking system tries to solve is this: given an opportunity to pick an ad for a user, deliver the ad that maximizes the advertiser’s ROI, taking into account the user’s past behavior, the user’s interests, and the ad’s historical performance. So how does it work? Remember the formula for calculating the real bid price of each ad? We need not only the ad’s max bid price but also the probability of each event and the weights of the organic values. In the most naive version of the ad server, the ranker could return a hardcoded score, and the auction engine would loop through all candidate ads and calculate EOV and EAV very quickly. However, if we extend the ranker into a separate service, we can incorporate more techniques, like machine learning, to improve performance and profit.

To build such a machine learning pipeline, we need to clean and aggregate the data first. This step is also called feature engineering: it extracts useful information from raw user and ad metrics. Common features include, but are not limited to:

  • Context features: time, location, device, session, etc
  • Demographic features: gender, age, nationality, language, etc
  • Engagement features: view count, click count, etc
  • Ad features: campaign id, brand id, account id, etc
  • Fatigue features: number of times the user has seen this ad, brand, etc
  • Content features: category, tags, object detection, etc

With the help of Dataflow/Spark/Flint, we can batch-transform all these features and put them into a feature store. The next step is training a model on these features. Unlike computer vision or NLP, machine learning techniques for ads are usually more straightforward and mature, such as sparse logistic regression and gradient-boosted decision trees. We can build a generic trainer using TensorFlow, XGBoost, or scikit-learn. To run experiments, we feed different config protobufs to the trainer.
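
The text names TensorFlow, XGBoost, or scikit-learn for training; to keep the example dependency-free, here is a from-scratch sketch of sparse logistic regression trained by plain SGD on log loss. The hashing trick (mapping "name=value" features into a fixed-size index space) is our choice for keeping the sparse feature space bounded, not something the text prescribes; DIM, the learning rate, and the L2 strength are illustrative assumptions.

```python
import math
import zlib
from typing import Dict, List

DIM = 2 ** 18  # size of the hashed feature space (assumption)

def hash_features(raw: Dict[str, str]) -> List[int]:
    """Map categorical 'name=value' features to sparse indices via a stable hash."""
    return [zlib.crc32(f"{k}={v}".encode()) % DIM for k, v in raw.items()]

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class SparseLogReg:
    def __init__(self, lr: float = 0.1, l2: float = 1e-6):
        self.w: Dict[int, float] = {}   # only touched weights are stored
        self.bias = 0.0
        self.lr, self.l2 = lr, l2

    def predict(self, idx: List[int]) -> float:
        """Predicted event probability, e.g. p_click."""
        return _sigmoid(self.bias + sum(self.w.get(i, 0.0) for i in idx))

    def update(self, idx: List[int], label: int) -> None:
        """One SGD step on log loss for a single example (label is 0 or 1)."""
        g = self.predict(idx) - label   # d(log loss)/dz
        self.bias -= self.lr * g
        for i in idx:
            wi = self.w.get(i, 0.0)
            self.w[i] = wi - self.lr * (g + self.l2 * wi)
```

The output of predict is exactly the kind of p_click the auction formulas consume.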

When the trainer finishes training and passes all the validation and evaluation, it publishes a model file to some remote repository and notifies the ranker to pick up this new model. Often a new model is only deployed to a small subset of the production fleet to verify the performance.

Because some feature inputs come from real-time data, we also need to rebuild the feature index frequently. As discussed earlier, this index is prepared by the ad index publisher. At runtime, the ranker looks up things like ad stats and user engagement metrics from this index and performs inference.

In a production machine learning pipeline, a central scheduling system orchestrates all the tasks. It maintains job information, tracks the status of each task, and manages the workflow DAG. It’s not hard to write such a service from the ground up, but using container operators in Airflow or Kubeflow is also a good option.


Inventory Forecasting

Forecasting, or inventory forecasting, is a way to predict the number of impressions available for a particular targeting requirement at a given time. Although it isn’t necessary for a minimum viable ad system, inventory forecasting is useful in many ways. It gives our sales team or web interface an idea of how much future inventory can be booked. It can also improve the pacing system by providing actual and predicted traffic, so that we serve more impressions during periods of plenty and tighten delivery when traffic is scarce. Furthermore, it can be used for new feature testing and debugging.

A naive implementation counts the potential future inventory (e.g., impressions) by looking back at historical data and categorizing it into different buckets. Then we take the new ad (or ad set, depending on where the targeting rules are applied), find the matching buckets, and sum up their inventory. The downside is that we have to maintain these categories separately from our production serving logic, which doesn’t scale well.

A more robust solution is to reuse the logic of the production serving path and simulate it. By simulation, I mean taking the new ad, pretending it’s live, feeding in the historical impressions, and trying to “serve” it. Since this uses the same logic as the production ad server, it reflects future behavior more accurately.

However, if the traffic is enormous, a full simulation could take too long to run. In that case, we need some techniques to improve performance. Generally, there are a few methods:

  • Downsample the historical requests: For example, keep only 1/10th of the historical impressions, sampled by user. We sample by user because frequency capping depends on how many impressions each user sees, so thinning out individual impressions would distort delivery.
  • Turn off some server functionality: Some ad server features aren’t necessary for simulation. If we turn off unimportant ones, for example using a fixed weight instead of querying the ranking service, the run time of each simulated auction drops.
  • Assign the inventory for existing ads separately: When we forecast inventory for a new ad, we can reuse the inventory assignment for existing ads from a recent forecasting job. We could even set up a separate job dedicated to preparing the assignment for existing ads.
  • Parallelize the simulation: Since slots are already assigned to existing ads, we only need to deal with one new ad at a time. There is no interference from other ads, so we can simulate delivery on multiple machines in parallel.
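
The user-level downsampling in the first bullet can be done deterministically by hashing the user id, so every run keeps the same users and all of their requests. The salt, the rate parameter, and the request dictionary shape below are hypothetical; SHA-256 is just a convenient, stable hash from the standard library.

```python
import hashlib
from typing import Dict, Iterable, Iterator

def sample_by_user(requests: Iterable[Dict], rate: int = 10,
                   salt: str = "forecast-v1") -> Iterator[Dict]:
    """Keep roughly 1/rate of users, and all requests from each kept user.

    Sampling by user (not by request) preserves each kept user's full history,
    so frequency caps behave in the simulation as they would in production.
    """
    for req in requests:
        digest = hashlib.sha256(f"{salt}:{req['user_id']}".encode()).hexdigest()
        if int(digest, 16) % rate == 0:
            yield req
```

Inventory counted on the sample would then be scaled back up by the sampling rate (10 in this example) to estimate the full-traffic number.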

This simulation design helps us understand how much inventory we have for a new ad, but simulation has other benefits too. First, the events generated by the simulated run can be used for event estimation: we can predict how many clicks or app installs this ad could get. Second, we can build a reach-and-frequency booking system on top of it. Sometimes an advertiser wants to book inventory that is guaranteed to reach X users Y times. If an ad has booked 100K impressions, we take these 100K impressions out of future simulations to make sure we don’t overbook.


Although machine learning can help us find good potential matches between ads and users, advertisers often have their own opinions about where an ad should be delivered. Sometimes the target audience is very broad, as for brand-awareness advertisers like Coca-Cola. Other times the scope is a small niche market, such as a certain zip code or hobby. Thus, we need a structured definition of these targeting rules. One confusing part is the AND / OR logic and the INCLUDE / EXCLUDE semantics between the targeting rules.

    {
        "demographics": [
            {
                "occupation": ["teacher"]
            }
        ],
        "geolocations": [
            {
                "country": ["us"]
            }
        ]
    }

Although this structure is easy for humans to read and edit, it is less convenient for the ad server, which has to decide whether a request matches a given targeting spec. A simple way to address this is to flatten the nested JSON blob into a list like the one below, so that we can loop through it to find a match during candidate filtering.

    [
        {
            "operation": "equals",
            "value": "us",
            "key": "country",
            "group": "geolocations"
        },
        {
            "operation": "equals",
            "value": "teacher",
            "key": "occupation",
            "group": "demographics"
        }
    ]

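Here is a minimal Python sketch of both steps: flattening the nested spec into rules and checking a user profile against them. The AND / OR semantics are an assumption, since the text leaves them open. Rules that share a `(group, key)` are ORed, and different keys are ANDed. Only the `equals` operation and INCLUDE rules are handled. `flatten` and `matches` are hypothetical helpers.

```python
from collections import defaultdict


def flatten(spec):
    """Turn {"group": [{"key": [values]}]} into a flat list of rules."""
    rules = []
    for group, clauses in spec.items():
        for clause in clauses:
            for key, values in clause.items():
                for value in values:
                    rules.append({"operation": "equals", "value": value,
                                  "key": key, "group": group})
    return rules


def matches(rules, profile):
    """Check a user profile (key -> set of values) against flattened rules.

    Assumed semantics: rules sharing a (group, key) are ORed together,
    and different (group, key) pairs are ANDed.
    """
    by_key = defaultdict(list)
    for rule in rules:
        by_key[(rule["group"], rule["key"])].append(rule)
    for (_group, key), key_rules in by_key.items():
        user_values = profile.get(key, set())
        if not any(r["operation"] == "equals" and r["value"] in user_values
                   for r in key_rules):
            return False
    return True


spec = {
    "demographics": [{"occupation": ["teacher"]}],
    "geolocations": [{"country": ["us"]}],
}
rules = flatten(spec)
print(matches(rules, {"country": {"us"}, "occupation": {"teacher"}}))  # True
print(matches(rules, {"country": {"ca"}, "occupation": {"teacher"}}))  # False
```

The linear scan is simple and easy to debug. However, it costs time proportional to the number of ads times the number of rules for every request.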
Another way to address this is with Boolean expression matching algorithms. The need for fast Boolean expression matching first arose in the design of high-throughput, low-latency publish/subscribe systems. Later, researchers realized that e-commerce and digital advertising face the same problem. Some standard algorithms are K-Index [6], Interval Matching [7], and BE-Tree [8].
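The core trick behind index-based approaches is to avoid checking every ad on every request. The sketch below is a simplified counting-based inverted index, not K-Index itself: it has no size partitioning and no NOT / EXCLUDE support. Each ad's targeting is a conjunction of `key IN {values}` terms. Postings map each `(key, value)` pair to the ad terms it satisfies. An ad matches when all of its terms are hit. `ConjunctionIndex` is a hypothetical name.

```python
from collections import defaultdict


class ConjunctionIndex:
    """Simplified counting-based inverted index for conjunctive targeting."""

    def __init__(self):
        self.postings = defaultdict(list)  # (key, value) -> [(ad_id, term_no)]
        self.term_counts = {}              # ad_id -> number of terms

    def add(self, ad_id, terms):
        """`terms` maps each key to the set of values accepted for it."""
        self.term_counts[ad_id] = len(terms)
        for term_no, (key, values) in enumerate(terms.items()):
            for value in values:
                self.postings[(key, value)].append((ad_id, term_no))

    def match(self, attributes):
        """Return ads whose every term is satisfied by `attributes` (key -> value set)."""
        satisfied = defaultdict(set)  # ad_id -> numbers of satisfied terms
        for key, values in attributes.items():
            for value in values:
                for ad_id, term_no in self.postings.get((key, value), ()):
                    satisfied[ad_id].add(term_no)
        hits = {ad for ad, terms in satisfied.items()
                if len(terms) == self.term_counts[ad]}
        # Ads without any targeting terms match every request.
        hits |= {ad for ad, n in self.term_counts.items() if n == 0}
        return hits


index = ConjunctionIndex()
index.add("ad1", {"country": {"us"}, "occupation": {"teacher"}})
index.add("ad2", {"country": {"us", "ca"}})
index.add("ad3", {})  # untargeted ad
print(sorted(index.match({"country": {"ca"}, "occupation": {"teacher"}})))  # ['ad2', 'ad3']
```

Only ads that share at least one `(key, value)` with the request are touched. This is what makes the approach scale better than the linear scan when there are many ads.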

With the targeting spec above, we know who the target audience of a given ad is. However, we still need to associate it with the end user. To make this connection, we need a user profile when we match an ad with an ad opportunity. This profile is compiled from all sorts of historical user data, such as purchase history and browsing history. (Yes, this is also why Facebook is so infamous for privacy issues: they profit from your data, but they won't tell you.) A targeting pipeline can infer user interests and demographics from this history. If you don't have enough data to describe the end user, you could also integrate with a third-party data provider to get more insights.

Ad Server

So far, I’ve introduced almost all the components needed to move an ad from the advertiser to an end user. Putting them together, we can now build the final ad server. The ad server exposes endpoints for our clients (web or mobile), runs the auctions, calls the ranking service, fetches user profiles, and consumes the indices. To explain this in more detail, let’s look at what happens when a new ad request comes in.

Request Anatomy

  1. When the client starts, it first talks to our ad server to let us know that a new active user is online. This gives us time to load the user profile into an in-memory database or BigTable for faster lookup later. Meanwhile, the initialization response can also tell the client what to do next.

  2. Next, when the client knows it will need to show an ad soon, it sends an ad request to our ad server asking where to load the ad content from. If the ad is just text, we can return the metadata together with the ad response. If the ad is a static file, we return its CDN location. The load balancer can route the request to any healthy node/pod because they all hold the latest ad index.

  3. Given the context information carried in the request and the live ad index, we first filter out ads whose targeting does not match. Then other filters, such as frequency caps and budget caps, are applied before the remaining ad candidates enter the auctions.

  4. When the auctions finish, the ad server records the winner as served (or records an empty result if there is no winner) in the metrics table. Meanwhile, it also sends the winner information back to the client so that the client can start preparing the ad.

  5. Once the ad is displayed, the client sends a tracking event back to our metrics collector.
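Steps 3 and 4 can be sketched as a single function. This illustration rests on several assumptions. Targeting is ANDed across keys. The auction simply picks the highest `bid * predicted CTR` rather than implementing a specific first- or second-price rule. `Ad`, `handle_ad_request`, `predict_ctr`, and the `metrics` list are hypothetical stand-ins for the real data model, the ranking service, and the metrics table.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    targeting: dict          # key -> set of accepted values, ANDed across keys
    bid: float               # assumed bid per click
    remaining_budget: float
    frequency_cap: int       # max impressions per user


def handle_ad_request(profile, ads, impressions_seen, predict_ctr, metrics):
    """Filter candidates, run a simple auction, and record the result."""
    # Step 3: drop ads whose targeting does not match the user profile.
    candidates = [ad for ad in ads
                  if all(profile.get(key, set()) & values
                         for key, values in ad.targeting.items())]
    # Step 3 (cont.): frequency cap and budget cap filters.
    candidates = [ad for ad in candidates
                  if impressions_seen.get(ad.ad_id, 0) < ad.frequency_cap
                  and ad.remaining_budget > 0]
    # Step 4: rank by expected value per impression and pick the winner.
    winner = max(candidates,
                 key=lambda ad: ad.bid * predict_ctr(ad, profile),
                 default=None)
    # Record the winner as served (None when nobody wins).
    metrics.append({"event": "served", "ad_id": winner.ad_id if winner else None})
    return winner


ads = [
    Ad("a", {"country": {"us"}}, bid=2.0, remaining_budget=10.0, frequency_cap=3),
    Ad("b", {"country": {"us"}}, bid=1.0, remaining_budget=10.0, frequency_cap=3),
    Ad("c", {"country": {"ca"}}, bid=5.0, remaining_budget=10.0, frequency_cap=3),
]
ctr = {"a": 0.01, "b": 0.05, "c": 0.5}
metrics = []
winner = handle_ad_request({"country": {"us"}}, ads, {}, lambda ad, p: ctr[ad.ad_id], metrics)
print(winner.ad_id)  # b: 1.0 * 0.05 beats 2.0 * 0.01; "c" fails targeting
```

Running the cheap filters before the auction keeps the number of ranking calls small. This matters because the ranking service is usually the most expensive dependency on the request path.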

Unlike the other modules I discussed above, the ad server is on the very front line of our ad business. A small problem in the ad server can cause an immediate failure of ad delivery, which is essentially lost revenue. Therefore, reliability is its most important attribute. When a non-core module fails, we need a fallback plan. For example, if the ranking system fails to respond, the ad server needs a default model to estimate the ad value. In the case of a total system failure, a quick rollback and redeployment are critical. This is usually not easy for an ad server, because each instance has to load a large amount of data into memory on startup. While we optimize the initialization code path, we can also adopt a canary deployment strategy or keep some warm backup servers.
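A fallback for the ranking call could look like this. The sketch assumes the ranking service is reachable through a blocking Python callable. It also assumes a constant prior is an acceptable default model. `estimate_ctr`, `default_model`, `DEFAULT_CTR`, and the 50 ms timeout are all hypothetical choices.

```python
import concurrent.futures
import time

DEFAULT_CTR = 0.01  # hypothetical prior used when ranking is unavailable
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def default_model(ad_id):
    """Fallback estimate; a real one might use per-campaign historical CTR."""
    return DEFAULT_CTR


def estimate_ctr(ranking_call, ad_id, timeout_s=0.05):
    """Ask the ranking service, falling back to the default model on error or timeout."""
    future = _pool.submit(ranking_call, ad_id)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # covers the timeout as well as errors raised by the call
        return default_model(ad_id)


def slow_ranking_service(ad_id):
    time.sleep(0.2)  # simulates an overloaded ranking service
    return 0.08


print(estimate_ctr(slow_ranking_service, "ad-1"))  # 0.01, the fallback
```

On timeout, the slow call keeps running in its worker thread. A production client would also cancel the underlying RPC so that retries do not pile up.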

To serve ad requests at lightning speed, the trick is to load the index into memory. However, there is usually a limit on memory size. The reasons vary: it could come from the cloud provider or the physical machines we have, or from cost-efficiency requirements on the business side. If the index grows bigger than the largest memory available on a single machine, we need to split the index into groups. One option is to split the index by region: an ad request from Europe only reaches the European cluster, which only holds the index for European ads. This approach looks more natural at the beginning, but it imposes a hard regional limitation on the entire advertising workflow. Another alternative is to have a dispatcher query multiple machines at the same time, where each machine in a group holds a different shard of the index.
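The dispatcher option is a classic scatter-gather pattern. In this sketch, in-process `IndexShard` objects stand in for remote machines. Each shard returns `(score, ad_id)` hits for a request. The dispatcher queries all shards in parallel and merges the top results with `heapq.nlargest`. The class names, the country-only targeting, and the precomputed scores are assumptions for illustration. A real dispatcher would also need per-shard timeouts.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class IndexShard:
    """Stand-in for one machine holding a shard of the ad index."""

    def __init__(self, ads):
        self.ads = ads  # ad_id -> (set of targeted countries, precomputed score)

    def query(self, country):
        return [(score, ad_id) for ad_id, (countries, score) in self.ads.items()
                if country in countries]


class Dispatcher:
    """Fans a request out to every shard in parallel and merges the results."""

    def __init__(self, shards):
        self.shards = shards
        self.pool = ThreadPoolExecutor(max_workers=len(shards))

    def query(self, country, top_k=2):
        futures = [self.pool.submit(shard.query, country) for shard in self.shards]
        hits = (hit for f in futures for hit in f.result())
        return [ad_id for _, ad_id in heapq.nlargest(top_k, hits)]


dispatcher = Dispatcher([
    IndexShard({"a": ({"us"}, 0.9), "b": ({"de"}, 0.8)}),
    IndexShard({"c": ({"us", "de"}, 0.5), "d": ({"us"}, 0.7)}),
])
print(dispatcher.query("us"))  # ['a', 'd']
```

Unlike the regional split, every request sees the whole index. The tradeoff is that overall latency is bounded by the slowest shard.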


Even when we successfully deliver the ad to our end users, the journey is not over. The last big piece of this puzzle is the metrics and stats service. There are usually two types of workload on this service. On the one hand, business analysts need to pull large amounts of historical data to find patterns, compile campaign reports, and generate invoices and bills. On the other hand, advertisers are highly interested in the real-time performance of their currently running ads.

These two goals also require different infrastructure to achieve the best efficiency. For business analytics, we usually store all the raw data in an OLAP database or a cloud analytics product like BigQuery. Tooling and data engineering teams can then build further pipelines from there. For real-time stats, we often store the data in a time series database (TSDB). We could either use existing solutions like OpenTSDB, InfluxDB, or Graphite, or build our own TSDB query engine on top of a scalable database like BigTable. Although different solutions have different focuses, the main concerns here are the granularity and retention period of the data. Fine granularity and low latency are the keys to real-time stats.

Nevertheless, storing metrics in different places doesn’t mean that we need to build two different data pipelines. A typical design uses a generic metrics collector as a frontend to receive all metrics and logs from clients and other services. Then a message queue system like Kafka or PubSub streams these events to a stream processing application, such as Spark or Dataflow. The application transforms and aggregates the raw events into the desired format and then fans them out to different data stores for persistence. In our situation, we route the final data to both the TSDB and the OLAP DB.
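The transform-and-fan-out step might look like the following. This is a plain-Python stand-in for one micro-batch of the streaming job, not Spark or Dataflow code. Raw events are appended unchanged to an OLAP sink. Per-minute counts keyed by ad and event type are added to a TSDB sink. `process_batch`, the event fields, and the sink types (a list and a dict) are hypothetical.

```python
from collections import Counter


def process_batch(events, tsdb_sink, olap_sink, bucket_s=60):
    """Aggregate one micro-batch of raw events and fan it out to two stores."""
    # OLAP path: keep every raw event for analysts, reports, and billing.
    olap_sink.extend(events)
    # TSDB path: count events per (ad, event type, time bucket).
    counts = Counter(
        (e["ad_id"], e["type"], e["ts"] - e["ts"] % bucket_s) for e in events
    )
    for (ad_id, event_type, bucket_start), n in counts.items():
        key = (f"{ad_id}.{event_type}", bucket_start)
        tsdb_sink[key] = tsdb_sink.get(key, 0) + n
    return counts


tsdb, olap = {}, []
process_batch([
    {"ad_id": "a1", "type": "impression", "ts": 5},
    {"ad_id": "a1", "type": "impression", "ts": 50},
    {"ad_id": "a1", "type": "click", "ts": 61},
], tsdb, olap)
print(tsdb)  # {('a1.impression', 0): 2, ('a1.click', 60): 1}
```

The `bucket_s` parameter is where the granularity tradeoff shows up. Smaller buckets give fresher real-time charts but produce more points to store and retain.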

Bear in mind that in this kind of data pipeline, one bad design decision or discrepancy can cause serious issues or limitations for all of its downstream consumers. Although no one can anticipate all future requirements during the initial design, we should still keep the pipeline flexible enough to be extended easily. For instance, some stats aggregation tasks require data spanning several days or weeks, or data from different stages of the transform output.


Congratulations! You are now ready to start building your own advertising platform! Remember, the design I lay out here is just the start of the journey. In a real-world scenario, it’s hard to ignore all the legacy systems and build a brand-new platform from scratch. Often, we need to make tradeoffs and add some redundancy because of the existing tech stack, politics, or business requirements. Moreover, building everything above is a huge project that usually requires dozens or hundreds of engineers. It’s not only about the software itself; it also requires strong infrastructure and tooling to ensure efficient deployment, scaling, and resource planning.

That being said, you can still build one part of the system first if you can’t hire dozens or hundreds of engineers soon. Some start-ups specialize in campaign management, which only needs an excellent web interface and APIs. Others focus on ad ranking and use machine learning to maximize ROI.

If you like my overview of the ads system, feel free to share it with your friends and leave your comments. Ad tech is continually evolving, and I hope we can come up with more creative ideas through discussion.

Disclaimer: This article is only about my personal opinion and has no association with any real companies or products.


  1. How Do First-Price and Second-Price Auctions Work in Online Advertising?
  2. Wikipedia: PID Controller
  3. Wikipedia: Generalized second-price auction
  4. Meet Michelangelo: Uber’s Machine Learning Platform
  5. Time Series Database (TSDB) Explained
  6. Indexing Boolean Expressions
  7. Efficiently Evaluating Complex Boolean Expressions
  8. BE-tree: an index structure to efficiently match boolean expressions over high-dimensional discrete space
  9. An Overview of Large-Scale Advertising System Architecture (大型广告系统架构概述)

Thoughts on “Advertising System Architecture: How to Build a Production Ad Platform”

  1. Hi Ethan, thanks for writing this blog.
    I have several questions:
    1. Where does the bid price of each ad come from? Who provides it? Does the advertiser set it in the campaign? So is it always a static bid price for the same ad?
    2. In the auction, is the ranking used purely to calculate the weight of EAV/EOV? E.g., to estimate the click rate of certain ads?
    3. Targeting and ranking seem to duplicate each other. Why do we need both processes? Why not treat the advertiser's targeting audience as a feature in the ranking system, so that it is reflected in the auction result?
    4. For the auction part, I’m still confused about why it needs several auction groups. Why not put all ads in the same group and give lower-priority ads a lower score? That would also help latency.

    Reply:
      1. Yes, the advertisers can set the bid price, or you could design an auto-bidding system for them. However, the bid price is not necessarily the price paid; that depends on your auction strategy.
      2. It’s up to your business. Different businesses have different emphases, but the common goal is making money.
      3. In targeting, you follow your customer’s order to target a specific group of people, and you need to build engineering capabilities to differentiate people by their profiles. In ranking, you are optimizing for yourself: it helps you extract the most profit by optimizing the delivery. Of course, if your ranking is really good, it can also boost ROI for your customers (advertisers), which is good too.
      4. For an MVP, you could use just one group for the auction. However, specific business requirements sometimes force you to have multiple groups. For example, you might only start auctioning third-party bids after the first-party bids finish.
