Preface
The system architecture of a mature large website (such as Taobao or JD) does not start out complete. It evolves as the business grows, and even the technical team grows from a few people to a department or even a whole product line. A mature architecture is therefore refined alongside business expansion rather than achieved overnight. Systems with different business characteristics also have different priorities: Taobao has to handle search, ordering, and payment across a massive catalog of goods; Tencent has to deliver real-time messages among hundreds of millions of users; Baidu has to serve a massive volume of search requests. Each has its own business characteristics and its own system architecture. Even so, common techniques can be found behind these different websites, and they are widely applicable to the architecture of large-scale website systems. This article walks through the evolution of a large-scale website system to introduce these techniques.
1、 Initial site architecture
In the initial architecture, the application, the database, and the files are all deployed on one server, as shown in the figure:
2、 Application, data and file separation
As the business grows, a single server can no longer meet the performance requirements, so the application, the database, and the files are deployed on separate servers. Each server is given hardware suited to its role to get the best performance.
3、 Using cache to improve website performance
Performance is optimized in software as well as in hardware. Most website systems use caching to improve performance. Caching pays off because of hot data: most website traffic follows the 80/20 rule (that is, 80% of access requests land on 20% of the data), so caching the hot data shortens the access path to it and improves the user experience.
The two common cache implementations are the local cache and the distributed cache. (CDNs and reverse proxies are also forms of caching and are discussed later.) A local cache, as its name implies, stores data on the application server itself, either in memory or in files; OSCache is a commonly used local cache component. A local cache is fast, but the amount of data it can hold is limited by local space. A distributed cache can hold massive amounts of data and is easy to scale out, and it is often used in portal websites, but it is not as fast as a local cache because each access crosses the network. Common distributed caches are Memcached and Redis.
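The local-cache idea can be sketched in a few lines. This is only an illustration under stated assumptions: `LocalLRUCache` and `load_product_from_db` are hypothetical names, the eviction policy (least recently used) is one reasonable choice among several, and the "database" is a stand-in function rather than a real query.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LocalLRUCache:
    """A small in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, or call loader(key) and cache the result."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value


def load_product_from_db(product_id: int) -> dict:
    # Hypothetical stand-in for a slow database query.
    return {"id": product_id, "name": f"product-{product_id}"}


cache = LocalLRUCache(capacity=2)
for product_id in [1, 1, 2, 1, 3, 2]:
    cache.get_or_load(product_id, load_product_from_db)
print("hits:", cache.hits, "misses:", cache.misses)
```

The fixed capacity reflects the limited local space mentioned above: once the cache is full, cold entries are pushed out so that hot data stays in memory.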
4、 Using Clusters to Improve Application Server Performance
As the entry point of the website, the application server bears a large number of requests. The load is usually spread across an application server cluster: a load balancing server is deployed in front of the application servers to schedule user requests and distribute them to multiple application server nodes according to a distribution strategy.
Common load balancing technologies include hardware such as F5, which is expensive, and software such as LVS, Nginx, and HAProxy. LVS is a Layer 4 load balancer: it selects an internal server based on the destination address and port. Nginx is a Layer 7 load balancer, and HAProxy supports both Layer 4 and Layer 7; at Layer 7, the internal server can be selected according to the message content. LVS therefore has a shorter forwarding path than Nginx and HAProxy and higher performance, while Nginx and HAProxy are more configurable. For example, they can separate static and dynamic content by choosing a static resource server or an application server according to the characteristics of the request message.
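As a rough illustration of a distribution strategy combined with static/dynamic separation, the sketch below picks a server pool by inspecting the request path and then rotates through that pool round-robin. The server names, the suffix list, and the `route` function are assumptions made for the example; this is not how LVS, Nginx, or HAProxy are configured.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hands out servers from a fixed pool in rotation."""

    def __init__(self, servers: Iterable[str]) -> None:
        pool = list(servers)
        if not pool:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(pool)

    def pick(self) -> str:
        return next(self._cycle)


# Hypothetical pools and file suffixes, chosen only for the example.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg")
static_pool = RoundRobinBalancer(["static-1", "static-2"])
app_pool = RoundRobinBalancer(["app-1", "app-2", "app-3"])


def route(path: str) -> str:
    """Layer 7 style decision: look at the request path, then pick a node."""
    pool = static_pool if path.endswith(STATIC_SUFFIXES) else app_pool
    return pool.pick()


picked = [route(p) for p in ["/index", "/logo.png", "/cart", "/site.css", "/order", "/pay"]]
print(picked)
```

A Layer 4 balancer could not make the choice in `route`, because it never looks at the request path; it only sees addresses and ports.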
5、 Database read/write separation and database/table separation
As the number of users grows, the database becomes the biggest bottleneck. The common ways to improve database performance are read/write separation and splitting databases and tables (sharding). Read/write separation, as the name implies, divides the database into read databases and write databases and keeps their data in sync through primary/standby replication. Splitting databases and tables comes in two forms, horizontal and vertical. Horizontal splitting divides a single very large table, such as the user table, across several tables or databases. Vertical splitting divides by business, for example placing the tables for the user business and the tables for the commodity business in different databases.
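The routing decisions described above can be sketched as follows. Everything here is an assumption for illustration: the class and function names, the server names, the rule that only statements starting with `SELECT` are reads, and the modulo rule used for horizontal splitting. Real deployments usually rely on a driver, proxy, or middleware for this.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Sends reads to replicas in rotation and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        words = sql.split()
        if words and words[0].upper() == "SELECT":
            return next(self._replicas)
        return self.primary


def user_table_for(user_id: int, table_count: int = 4) -> str:
    """Horizontal split: spread one large user table over table_count tables."""
    return f"user_{user_id % table_count}"


# Vertical split: each business gets its own database (hypothetical names).
DATABASE_BY_BUSINESS = {"user": "user_db", "commodity": "commodity_db"}

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.target_for("SELECT name FROM user WHERE id = 10"))
print(router.target_for("UPDATE user SET name = 'a' WHERE id = 10"))
print(user_table_for(10))
```

One caveat with this kind of routing: if replication is asynchronous, a read sent to a replica immediately after a write may not see that write yet.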
6、 Use CDN and reverse proxy to improve website performance
Suppose all our servers are deployed in a data center in Chengdu. Access is fast for users in Sichuan but slower for users in Beijing. This is because Sichuan lies in a region where China Telecom's network is dominant, while Beijing lies in a region where China Unicom's network is dominant. Requests from Beijing users must travel a longer path through Internet routers to reach the servers in Chengdu, and the responses travel the same path back, so data transmission takes relatively long. A CDN is often used to solve this problem. A CDN caches content in the carriers' data centers, and users first fetch data from the nearest one, which greatly shortens the network path. Professional CDN operators include Lanxun and Netcom.
A reverse proxy is deployed in the website's own data center. A user request first reaches the reverse proxy server, which returns cached data to the user if it has any. If there is no cached data, the reverse proxy forwards the request to the application server to obtain the data, so the cache also reduces the cost of obtaining data. Common reverse proxies include Squid and Nginx.
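The request flow through a caching reverse proxy can be sketched as below. This is a simplified model, not the behavior of Squid or Nginx: `CachingReverseProxy`, the fixed time-to-live, and the `application_server` function are all hypothetical, and real proxies also honor HTTP cache headers. The clock is injectable so the expiry logic can be shown without waiting.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingReverseProxy:
    """Serves cached responses and falls back to the origin on a miss."""

    def __init__(
        self,
        origin: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._origin = origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)

    def handle(self, url: str) -> str:
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the application server is not contacted
        body = self._origin(url)  # cache miss or expired entry: ask the origin
        self._cache[url] = (now + self._ttl, body)
        return body


origin_calls: List[str] = []


def application_server(url: str) -> str:
    # Hypothetical origin that records how often it is actually called.
    origin_calls.append(url)
    return f"<html>page for {url}</html>"


fake_now = [0.0]
proxy = CachingReverseProxy(application_server, ttl_seconds=30.0, clock=lambda: fake_now[0])
proxy.handle("/home")  # miss: fetched from the application server
proxy.handle("/home")  # hit: served from the proxy cache
fake_now[0] = 31.0     # advance the fake clock past the time-to-live
proxy.handle("/home")  # expired: fetched from the application server again
print(len(origin_calls), "origin requests for 3 user requests")
```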
7、 Using the Distributed File System
As users and business volume keep growing, more and more files are generated, and a single file server can no longer meet the demand, so a distributed file system is required. A commonly used one is NFS.
8、 Using NoSQL and Search Engines
For queries over massive data, NoSQL databases and search engines can achieve better performance; not all data has to be stored in a relational database. Common NoSQL databases include MongoDB and Redis, and search engines include Lucene.
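To give a feel for why a search engine answers text queries faster than scanning rows, here is a minimal sketch of an inverted index, the data structure that engines such as Lucene are built around. It is a toy under stated assumptions: terms are split on whitespace only, a query matches documents containing all of its terms, and the function names and sample documents are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical product titles.
documents = {
    1: "red cotton shirt",
    2: "blue cotton jeans",
    3: "red leather shoes",
}
index = build_inverted_index(documents)
print(search(index, "red"))
print(search(index, "cotton red"))
```

Each lookup touches only the entries for the query terms, so the cost depends on how many documents match rather than on the total number of documents.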
9、 Split the application server business
As the business expands further, the application becomes very bloated. At this point the application needs to be split by business; for example, Baidu is divided into news, web search, images, and other businesses. Each business application is responsible for a relatively independent set of business operations. The businesses communicate with each other through messages or by sharing a database.
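Communication through messages can be illustrated with a very small publish/subscribe sketch. It runs synchronously inside one process purely to show the decoupling; a real deployment would use a message broker between separate applications. `MessageBus`, the topic name, and the two handlers are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, object]


class MessageBus:
    """Delivers each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._subscribers.get(topic, []):
            handler(message)


bus = MessageBus()
inventory_log: List[str] = []
notification_log: List[str] = []

# Two independent business applications react to the same event.
bus.subscribe("order.created", lambda m: inventory_log.append(f"reserve stock for order {m['order_id']}"))
bus.subscribe("order.created", lambda m: notification_log.append(f"notify user {m['user_id']}"))

# The order application only publishes; it does not know who is listening.
bus.publish("order.created", {"order_id": 1001, "user_id": 42})
print(inventory_log, notification_log)
```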
10、 Build distributed services
At this point we find that every business application uses some basic business services, such as user services, order services, payment services, and security services. These services are the basic building blocks supporting each business application. We extract them and build distributed services on top of a distributed service framework. Alibaba's Dubbo is a good choice.
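One piece that distributed service frameworks typically provide is service registration and discovery: providers announce where a service runs, and business applications look it up by name. The sketch below shows only that idea. It is not Dubbo's API; `ServiceRegistry`, the service names, and the addresses are assumptions for the example, and random selection stands in for whatever load-balancing policy a real framework applies.

```python
import random
from typing import Dict, List


class ServiceRegistry:
    """Keeps track of which addresses currently provide each named service."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}

    def register(self, service_name: str, address: str) -> None:
        providers = self._providers.setdefault(service_name, [])
        if address not in providers:
            providers.append(address)

    def unregister(self, service_name: str, address: str) -> None:
        providers = self._providers.get(service_name, [])
        if address in providers:
            providers.remove(address)

    def lookup(self, service_name: str) -> str:
        """Return one provider address, chosen at random."""
        providers = self._providers.get(service_name)
        if not providers:
            raise LookupError(f"no provider registered for {service_name!r}")
        return random.choice(providers)


registry = ServiceRegistry()
# Hypothetical provider addresses for a shared user service.
registry.register("user-service", "10.0.0.11:20880")
registry.register("user-service", "10.0.0.12:20880")
registry.register("payment-service", "10.0.0.21:20880")

# A business application asks for a service by name instead of hard-coding an address.
print(registry.lookup("user-service"))
```

Because callers depend only on the service name, providers can be added or removed without changing the business applications that use them.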
Summary
The architecture of a large website is improved continuously according to business needs, and specific design decisions depend on the characteristics of each business. This article only describes some of the techniques commonly involved in a large website.