Large websites evolve step by step: architects need to understand how the overall architecture evolves @ mikechen
Characteristics of large website systems
1. High concurrency and high traffic
The system must handle a large number of concurrent users and a high volume of traffic.
2. High availability
The system must provide uninterrupted service 24 hours a day, 7 days a week.
3. Massive data
Massive amounts of data must be stored and managed, which requires a large number of servers.
4. Wide distribution of users and complex network conditions
Many large Internet websites serve users all over the world. Users are widely distributed, and their network conditions vary greatly. In China, there is the additional problem of poor interconnection between the networks of different carriers.
5. Hostile security environment
Because the Internet is open, websites are highly exposed to attacks. Large websites are attacked by hackers almost every day.
6. Rapidly changing requirements and frequent releases
Traditional software ships new versions infrequently. Internet products, by contrast, are released very often so that they can adapt quickly to the market and meet user needs. Large websites generally release a new version every week. Small and medium-sized websites release even more often, sometimes dozens of times a day.
7. Progressive development
Almost all large Internet websites grew gradually out of small websites. Good Internet products are built up through long-term operation rather than being fully developed at the start. Website architecture evolves in the same gradual way.
1. Website architecture at the initial stage
Large websites all grow out of small websites, and their architecture likewise evolves from a small-website architecture.
At the beginning, a small website has few visitors, and a single server is enough. The website architecture at this stage is shown in the following figure:
All resources such as applications, databases, and files are on the same server.
2. Separation of application services and data services
As the business grows, a single server gradually becomes insufficient. More and more users degrade performance, and more and more data exhausts the storage space.
At this point, the application and the data need to be separated. After separation, the website uses three servers: an application server, a file server, and a database server. These three servers have different hardware requirements:
The application server needs to process a lot of business logic, so it needs a faster and more powerful CPU;
The database server needs fast disk retrieval and data cache, so it needs faster disks and larger memory;
The file server needs to store a large number of files uploaded by users, so it needs a larger hard disk.
The architecture of the website at this stage is shown in the following figure:
After the separation of applications and data, servers with different characteristics assume different service roles. The concurrent processing capacity and data storage space of the website have been greatly improved, supporting the further development of the website business.
However, as the number of users keeps growing, the website faces another challenge. Excessive database load causes access latency, which degrades the performance of the whole website and hurts the user experience. At this point, the architecture needs further optimization.
3. Use cache to improve website performance
Website access follows the same law as wealth distribution in the real world (the 80/20 rule): about 80% of business access is concentrated on 20% of the data.
Because most access is concentrated on a small amount of data, keeping that data in an in-memory cache has three effects. It reduces database load. It speeds up data access across the whole website. It also takes read traffic off the database, which leaves the database more capacity for write operations.
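The effect of the 80/20 rule on database load can be seen in a small simulation. This is a minimal sketch of the widely used cache-aside pattern (the article does not name a specific pattern), assuming a bounded local LRU cache. LRUCache, FakeDatabase, get_item and update_item are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Hypothetical bounded local cache: when full, evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class FakeDatabase:
    """Hypothetical stand-in for the database server; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


def get_item(cache: LRUCache, db: FakeDatabase, key: str) -> Optional[str]:
    value = cache.get(key)
    if value is None:            # cache miss: fall back to the database
        value = db.read(key)
        if value is not None:
            cache.put(key, value)
    return value


def update_item(cache: LRUCache, db: FakeDatabase, key: str, value: str) -> None:
    db.write(key, value)         # writes always go to the database
    cache.delete(key)            # invalidate so the next read reloads fresh data


# 100 items in the database, but traffic concentrates on 20 "hot" items.
db = FakeDatabase({f"item:{i}": f"data {i}" for i in range(100)})
cache = LRUCache(capacity=20)
for _ in range(50):
    for i in range(20):
        get_item(cache, db, f"item:{i}")
print(db.reads)  # 20: of 1,000 requests, only the first read of each hot item reached the database
```

Invalidating the entry on write, rather than updating it in place, is a common way to reduce the risk of serving stale data. It is a design choice, not the only option.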
The caches used by websites can be divided into two types: local caches on application servers and remote caches on dedicated distributed cache servers.
A local cache is faster to access. However, it is bounded by the application server's memory, so it can hold only a limited amount of data, and it may compete with the application for memory.
A remote distributed cache can run as a cluster of dedicated cache servers with large memory. In theory, its capacity is therefore not limited by the memory of any single machine.
Caching effectively relieves the pressure of data access, but a single application server can handle only a limited number of request connections. During peak traffic, the application server becomes the bottleneck of the whole website.
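A cache cluster needs some rule for deciding which server holds which key. The article does not specify one. A common choice is consistent hashing, sketched minimally below under the assumption of 100 virtual nodes per server; ConsistentHashRing, stable_hash and the server names are hypothetical. With a plain hash(key) % N, growing from 3 to 4 servers would remap about 75% of keys. On the ring, only the keys taken over by the new server move, roughly a quarter of them here.

```python
import bisect
import hashlib
from typing import Dict, List


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use MD5 for a stable value.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical key-to-cache-server mapping; each server owns many points on a hash ring."""

    def __init__(self, servers: List[str], virtual_nodes: int = 100) -> None:
        self.virtual_nodes = virtual_nodes  # more points per server => more even spread
        self._points: List[int] = []        # sorted hash positions on the ring
        self._owner: Dict[int, str] = {}    # hash position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            point = stable_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first server point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"user:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {key: ring.server_for(key) for key in keys}

ring.add_server("cache-d")  # scale out the cache cluster
moved = [key for key in keys if ring.server_for(key) != before[key]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```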
4. Use application server cluster to improve the concurrent processing capability of websites
Clustering is a common means for websites to solve the problem of high concurrency and massive data.
When a single server lacks processing capacity or storage space, do not try to replace it with a more powerful server. For a large website, no server is powerful enough to keep up with continuously growing business needs.
In this case, it is more appropriate to add servers that share the access and storage load of the original server.
As far as website architecture is concerned, if adding one server relieves the load, then performance can keep improving by adding more servers in the same way. This is what makes the system scalable.
Clustering the application servers is a relatively simple and mature scalable architecture design for websites, as shown in the following figure:
A load-balancing scheduling server distributes requests from users' browsers to any server in the application server cluster. As users increase, more application servers are added to the cluster, so the application servers no longer become the bottleneck of the whole website.
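The scheduling idea can be illustrated with round robin, one of the simplest policies. The sketch below assumes round robin only; RoundRobinBalancer and the server names are hypothetical. Production load balancers typically also offer weighted and least-connections policies and health checks.

```python
import itertools
from collections import Counter
from typing import List


class RoundRobinBalancer:
    """Hypothetical scheduler: hands each incoming request to the next application server in turn."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def add_server(self, server: str) -> None:
        # Scaling out: the new server starts receiving its share immediately.
        self.servers.append(server)
        self._cycle = itertools.cycle(self.servers)

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print(Counter(balancer.pick() for _ in range(9)))   # each server handles 3 requests

balancer.add_server("app-4")                         # more users: add one more server
print(Counter(balancer.pick() for _ in range(8)))   # each of the 4 servers handles 2
```

Because every server in the cluster is interchangeable in this model, capacity grows by adding entries to the list. This only holds if the application servers keep no per-user state locally.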
5. Database read/write separation
Once the website uses a cache, most read operations can be served without touching the database. However, some reads (cache misses and expired cache entries) and all writes still go to the database. When the user base reaches a certain scale, the heavy load makes the database the bottleneck of the website.
Most mainstream databases today provide master-slave hot standby. By configuring a master-slave relationship between two database servers, data updates on one server are synchronized to the other.
The website uses this feature to separate database reads from writes, which relieves the load on the database, as shown in the following figure:
When the application server writes data, it accesses the master database. The master synchronizes the updates to the slave database through master-slave replication, so when the application server reads data, it reads from the slave database.
To make the database easy for the application to access after read/write separation, a dedicated data access module is usually used on the application server side. This module makes the separation transparent to the application.
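The core routing decision of such a data access module fits in a few lines. This is a minimal sketch assuming one master, a list of slaves, and routing by the SQL statement's leading keyword. ReadWriteRouter and the server names are hypothetical and not any particular product's API.

```python
import itertools
from typing import Iterator, List, Optional


class ReadWriteRouter:
    """Hypothetical data access module: writes go to the master, reads are spread over slaves."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves: Optional[Iterator[str]] = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        is_write = sql.lstrip().upper().startswith(self.WRITE_KEYWORDS)
        # Replication to the slaves may lag, so reads inside a transaction stay on the
        # master in order to see their own writes.
        if is_write or in_transaction or self._slaves is None:
            return self.master
        return next(self._slaves)


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("UPDATE users SET name = 'Ann' WHERE id = 1"))            # db-master
print(router.route("SELECT * FROM users WHERE id = 1"))                       # db-slave-1
print(router.route("select count(*) from orders"))                            # db-slave-2
print(router.route("SELECT * FROM users WHERE id = 1", in_transaction=True))  # db-master
```

Keyword sniffing is crude. A statement such as SELECT ... FOR UPDATE would be misrouted, so real modules often let the caller mark reads and writes explicitly instead.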
6. Use reverse proxy and CDN to accelerate website response
As the business continues to grow, so does the number of users. Because of China's complex network environment, users in different regions experience very different access speeds. Research shows that website access latency is positively correlated with user churn: the slower the website, the more likely users are to lose patience and leave.
To provide a better user experience and retain users, the website needs to speed up access, mainly by using a CDN and a reverse proxy, as shown in the following figure:
The basic principle of CDN and reverse proxy is caching.
A CDN is deployed in the data centers of network providers, so that users requesting website services can fetch data from the provider data center nearest to them.
The reverse proxy is deployed in the website's central data center. When a user request reaches the central data center, the reverse proxy is the first server it hits. If the reverse proxy has cached the requested resource, it returns the resource directly to the user.
The purpose of using CDN and reverse proxy is to return data to users as soon as possible. On the one hand, it can speed up user access, and on the other hand, it can reduce the load pressure on back-end servers.
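Both a CDN and a reverse proxy work on the same principle of caching. The chain can therefore be modeled as caching layers in front of the application servers. This is a minimal sketch assuming a fixed time-to-live per layer. Real CDNs and reverse proxies also honor HTTP caching headers, which this sketch ignores. CachingLayer, application_servers and the edge-node names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingLayer:
    """Hypothetical caching hop in front of an upstream, e.g. a CDN edge node or the reverse proxy."""

    def __init__(self, name: str, upstream: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.upstream = upstream      # next hop toward the application servers
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            self.hits += 1            # served locally; upstream never sees the request
            return cached[1]
        self.misses += 1
        body = self.upstream(path)    # miss or expired: forward the request upstream
        self._store[path] = (now + self.ttl, body)
        return body


origin_requests: List[str] = []


def application_servers(path: str) -> str:
    """Hypothetical stand-in for the back-end servers in the central data center."""
    origin_requests.append(path)
    return f"<content of {path}>"


reverse_proxy = CachingLayer("reverse-proxy", application_servers, ttl_seconds=60)
edge_beijing = CachingLayer("cdn-edge-beijing", reverse_proxy.get, ttl_seconds=300)
edge_guangzhou = CachingLayer("cdn-edge-guangzhou", reverse_proxy.get, ttl_seconds=300)

for _ in range(100):
    edge_beijing.get("/static/logo.png")   # users near the Beijing edge node
edge_guangzhou.get("/static/logo.png")     # first request at a different edge node

print(len(origin_requests))  # 1
```

Of 101 requests, only one reaches the back end. The second edge node's miss is absorbed by the reverse proxy. This illustrates both purposes named above: faster responses and less back-end load.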
7. Use distributed file system and distributed database system
No single server, however powerful, can keep up with the continuously growing business needs of a large website. Read/write separation splits the database from one server into two. As the business keeps growing, even that is not enough, so a distributed database is needed.
The same applies to file systems. Distributed file systems are required, as shown in the following figure:
A distributed database is the last resort for splitting a website's database and is used only when a single table becomes very large. Before that point, the more common way to split a website's database is by business: the data of different businesses is deployed on different physical database servers.
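The two kinds of splitting can be combined in one routing function. This is a minimal sketch assuming a hypothetical deployment in which "user" and "product" data each get one server. The "order" table is assumed to be large enough to be distributed across four servers by a CRC32 hash of its key. BUSINESS_DATABASES, database_for and all server names are illustrative only.

```python
import zlib
from typing import Dict, List

# Hypothetical deployment: each business line's data lives on its own database server(s).
BUSINESS_DATABASES: Dict[str, List[str]] = {
    "user": ["user-db"],
    "product": ["product-db"],
    # Assumed to hold one very large table, so it is distributed across four servers.
    "order": ["order-db-0", "order-db-1", "order-db-2", "order-db-3"],
}


def database_for(business: str, shard_key: str) -> str:
    """Pick the physical database server for a row of the given business."""
    servers = BUSINESS_DATABASES[business]
    if len(servers) == 1:
        return servers[0]  # split by business only: one server per business
    # Distributed table: a stable hash of the shard key chooses the server.
    return servers[zlib.crc32(shard_key.encode("utf-8")) % len(servers)]


print(database_for("user", "alice"))            # user-db
print(database_for("order", "order-20240001"))  # one of order-db-0 .. order-db-3
```

Splitting by business keeps each query on one server. Distributing a single table adds cross-server queries and transactions, which is why the article treats it as the last step.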
8. Use NoSQL and search engines
As the website's business grows more complex, so do its data storage and retrieval needs. The website needs to adopt non-relational database technologies such as NoSQL and non-database query technologies such as search engines.
As shown in the figure below:
NoSQL and search engines both originated in the Internet industry and have better support for scalable, distributed deployment. The application server accesses all kinds of data through a unified data access module, which spares the application the trouble of managing many data sources.
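A search engine suits queries that a relational database handles poorly, such as a LIKE '%cotton%' condition. With a leading wildcard, that condition generally cannot use an ordinary B-tree index and has to scan rows. A search engine instead looks terms up in an inverted index. This is a minimal sketch of that data structure, assuming a simple lowercase tokenizer and AND-only queries. Real search engines add ranking, richer text analysis and distribution. InvertedIndex and tokenize are hypothetical names.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Very simple analyzer: lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical index mapping each term to the set of document ids containing it."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Red cotton T-shirt")
index.add(2, "Blue cotton jeans")
index.add(3, "Red leather jacket")
print(index.search("red cotton"))   # {1}
print(index.search("cotton"))       # {1, 2}
```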
9. Business split
To cope with increasingly complex business scenarios, large websites take a divide-and-conquer approach and split the website's business into different product lines.
For example, large shopping and transaction websites will divide the home page, shops, orders, buyers, sellers, etc. into different product lines and assign them to different business teams.
Technically, the website is divided into many separate applications along product lines, and each application is deployed independently. Applications can be related in several ways. One is hyperlinks: each navigation link on the home page points to a different application's address. Another is message queues that distribute data between applications. Most commonly, the applications access the same data storage system, so that together they form a complete, connected system.
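Distributing data through a message queue decouples the producing application from the consuming one. This is a minimal in-process sketch in which Python's queue.Queue stands in for a real message queue service. Between independently deployed applications, that service would be a separate networked broker. The application names, the event format and the shutdown sentinel are hypothetical.

```python
import queue
import threading
from typing import Any, Dict, List

# Hypothetical stand-in for a message queue service shared by independently deployed apps.
order_events: "queue.Queue[Dict[str, Any]]" = queue.Queue()
shipments: List[int] = []


def order_application(order_ids: List[int]) -> None:
    """Producer: publishes an event per order instead of calling other applications directly."""
    for order_id in order_ids:
        order_events.put({"type": "order_created", "order_id": order_id})
    order_events.put({"type": "shutdown"})  # demo-only sentinel that stops the consumer


def logistics_application() -> None:
    """Consumer: a separately deployed application that reacts to order events."""
    while True:
        event = order_events.get()  # blocks until a message arrives
        if event["type"] == "shutdown":
            break
        shipments.append(event["order_id"])  # e.g. create a shipment record


consumer = threading.Thread(target=logistics_application)
consumer.start()
order_application([101, 102, 103])
consumer.join()
print(shipments)  # [101, 102, 103]
```

The order application does not need to know that the logistics application exists. Each side can be deployed, scaled or restarted independently, provided the queue itself is durable.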
10. Distributed Services
As the business is split into ever smaller pieces and storage systems grow ever larger, the overall complexity of the application system increases sharply, and deployment and maintenance become more and more difficult.
Every application needs to connect to every database system. In a website with tens of thousands of servers, the number of connections therefore grows with the square of the number of servers, which exhausts database connection resources and leads to denial of service.
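Some quick arithmetic shows where the "square of the server count" comes from. It assumes hypothetical sizes: a pool of 10 connections per application-database pair, and a site that doubles both its application servers and its database servers. For comparison, it also assumes a hypothetical tier of 200 shared-service servers as the only clients of the databases.

```python
def total_connections(app_servers: int, db_servers: int, pool_size: int) -> int:
    """Every application server keeps a pool of pool_size connections to every database server."""
    return app_servers * db_servers * pool_size


def connections_per_database(clients: int, pool_size: int) -> int:
    """Connections that land on each individual database server."""
    return clients * pool_size


# Hypothetical sizes: the site doubles both its application and its database servers.
small = total_connections(app_servers=5_000, db_servers=50, pool_size=10)
large = total_connections(app_servers=10_000, db_servers=100, pool_size=10)
print(small, large, large // small)   # 2500000 10000000 4

# Each database server must accept a pool from every application server...
print(connections_per_database(clients=10_000, pool_size=10))   # 100000
# ...versus only from a hypothetical tier of 200 shared-service servers.
print(connections_per_database(clients=200, pool_size=10))      # 2000
```

Doubling the site quadruples the connection count. Shrinking the set of database clients is what relieves the pressure.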
Every application system performs many of the same business operations, such as user management and product management. These shared operations can be extracted and deployed independently as reusable services that connect to the databases and provide common business functions. Each application system then only manages its user interface and completes specific business operations by calling the common services as distributed services.
As shown in the figure below:
By this stage of evolution, most of a large website's technical problems have been solved. The remaining ones can also be solved by combining and improving the existing architecture. These include real-time data synchronization across data centers and problems specific to a particular website's business.