What is a short URL system?
In the Internet era, we access a resource through a URL; the resource may be an HTML page, an image, a video, and so on. The domain name in the URL is resolved to a specific IP address through DNS, and the request is then sent to that address and port.
However, because resources are often nested deep in a site's hierarchy, the URL we want to access may look like this:
http://www.zenworld.com/2022/01/01/page?id=12345
This URL is very long. With a short URL system, we can convert it into:
http://www.short.com/j8ve55y
See, the URL is much shorter. What are the benefits of shortening URLs?
Short URLs are easier for users to remember.
In SMS and similar scenarios, a shorter link takes up less of the limited message length.
A shorter link simplifies the QR code generated from it, so mobile phones can scan it more accurately (the code carries less information).
Therefore, short links are generally exposed to the outside world as a basic platform capability.
Core ideas
Hash function
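First, we need a hash function that maps a given URL to a short alias. There are many ways to implement this function, such as the common CRC32. Note that a hash does not by itself guarantee a unique alias: two different URLs can collide, so uniqueness has to be checked when the mapping is stored.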
F(URL) = new alias
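As a minimal sketch of F, the snippet below computes the CRC32 checksum of the URL and renders it in base62 to keep the alias short. The base62 step and the names BASE62, base62_encode and short_alias are assumptions for illustration; the article only names CRC32, and the alias this produces is not claimed to match the j8ve55y example above.

```python
import zlib

# Assumed alphabet for the alias: digits, lowercase, uppercase (62 characters).
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer with the 62-character alphabet."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def short_alias(url: str) -> str:
    """F(URL): CRC32 checksum of the URL, rendered as base62."""
    checksum = zlib.crc32(url.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return base62_encode(checksum)


alias = short_alias("http://www.zenworld.com/2022/01/01/page?id=12345")
```

A 32-bit checksum fits in at most six base62 characters, but it also means collisions become likely as the number of stored URLs grows, so the storage layer still has to reject or retry duplicate aliases.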
Storage
We can use a relational database, such as MySQL, to store the mapping between each alias and its original URL, for example:
ID, original URL
j8ve55y, http://www.zenworld.com/2022/01/01/page?id=12345
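The table above could be sketched as follows. To keep the example self-contained it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the table name short_url and the helpers save_mapping and find_url are hypothetical.

```python
import sqlite3

# In-memory SQLite database, standing in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE short_url (id TEXT PRIMARY KEY, original_url TEXT NOT NULL)"
)


def save_mapping(alias: str, url: str) -> bool:
    """Insert alias -> url; return False if the alias is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO short_url (id, original_url) VALUES (?, ?)",
                (alias, url),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def find_url(alias: str):
    """Return the original URL for an alias, or None if it is unknown."""
    row = conn.execute(
        "SELECT original_url FROM short_url WHERE id = ?", (alias,)
    ).fetchone()
    return row[0] if row else None


save_mapping("j8ve55y", "http://www.zenworld.com/2022/01/01/page?id=12345")
```

Making the alias the primary key lets the database itself detect a collision: a second insert with the same alias fails instead of silently overwriting the first mapping.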
Redirect
When a request for the short URL arrives, the server looks up the original URL, responds with HTTP status code 302 (redirect), and writes the original URL in the Location field of the response header. The browser then follows that Location to the original address.
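The redirect step might look like the sketch below, built on Python's standard http.server module. The in-memory MAPPINGS dictionary stands in for the database lookup, and the names resolve, RedirectHandler and serve, the 404 response for unknown aliases, and port 8080 are all assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory mapping, standing in for the database lookup.
MAPPINGS = {"j8ve55y": "http://www.zenworld.com/2022/01/01/page?id=12345"}


def resolve(path: str):
    """Return (status, headers) for a request path such as '/j8ve55y'."""
    alias = path.lstrip("/")
    target = MAPPINGS.get(alias)
    if target is None:
        return 404, {}  # unknown alias: nothing to redirect to
    return 302, {"Location": target}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the redirect server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and then requesting http://127.0.0.1:8080/j8ve55y should return a 302 response whose Location header holds the original URL.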
The overall request implementation is shown in the figure:
Optimization measures
The scheme above can be used as is, and it performs well in most scenarios after going online. However, as the request volume and the amount of stored data keep growing, problems may arise that are hard to control.
Request magnitude
Because the short link system is a basic service that sits at the first hop of a request, its traffic can be very high. If all of those requests go to the relational database, availability problems may occur.
The optimization measures are:
Enable master-slave replication on the relational database (assumed to be MySQL) and have the business read from the slave first, which shares part of the load.
Add a cache layer, for example Redis: read the cache first, and fall back to the database only on a cache miss.
If the data volume is not large, use Redis directly as both storage and cache, which significantly reduces latency and keeps the implementation simple.
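The "cache first, database on a miss" read path can be sketched as below. A plain dictionary stands in for Redis and another for the database so the example runs on its own; CachedResolver and its db_reads counter are hypothetical names, and a real deployment would also need expiry and invalidation, which this sketch omits.

```python
from typing import Callable, Dict, Optional


class CachedResolver:
    """Read-through lookup: try the cache, then fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self._cache: Dict[str, str] = {}  # stand-in for Redis
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database was hit

    def resolve(self, alias: str) -> Optional[str]:
        cached = self._cache.get(alias)
        if cached is not None:
            return cached  # cache hit: the database is not touched
        self.db_reads += 1
        url = self._load_from_db(alias)
        if url is not None:
            self._cache[alias] = url  # populate the cache for later requests
        return url


# A dictionary standing in for the relational database.
database = {"j8ve55y": "http://www.zenworld.com/2022/01/01/page?id=12345"}
resolver = CachedResolver(database.get)
first = resolver.resolve("j8ve55y")   # miss: read from the database
second = resolver.resolve("j8ve55y")  # hit: served from the cache
```

In this sketch unknown aliases are not cached, so repeated requests for a missing alias still reach the database each time.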
Storage magnitude
When the amount of data is large, inserting rows with a UUID as the MySQL primary key becomes more and more expensive. Because UUIDs are not sequential, each insert has to locate the page where the new ID belongs, and inserting into the middle of a full page risks a page split. With an auto-increment ID, a new row is simply appended to the last page. Therefore, to optimize insert performance, we can use an auto-increment ID as the ID generation algorithm.
However, this performance gain comes at the expense of flexibility. For example, the business may let users customize their short link addresses; with a UUID-style primary key, the user-configured ID can be stored directly as the primary key. Therefore, whether to use auto-increment IDs in the end should be decided by weighing the business requirements against the storage volume.
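One way to use an auto-increment ID as the alias generator is sketched below: the database assigns the numeric ID, and the short alias is that ID rendered in base62. SQLite's in-memory database again stands in for MySQL, and the base62 encoding and the names encode, decode, shorten and expand are assumptions rather than something the scheme above prescribes.

```python
import sqlite3

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(n: int) -> str:
    """Render a non-negative integer ID as a base62 alias."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def decode(alias: str) -> int:
    """Turn a base62 alias back into the numeric ID."""
    n = 0
    for ch in alias:
        n = n * 62 + BASE62.index(ch)
    return n


conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute(
    "CREATE TABLE short_url ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, original_url TEXT NOT NULL)"
)


def shorten(url: str) -> str:
    """Insert the URL and derive the alias from the generated ID."""
    with conn:
        cur = conn.execute(
            "INSERT INTO short_url (original_url) VALUES (?)", (url,)
        )
    return encode(cur.lastrowid)


def expand(alias: str):
    """Look up the original URL by decoding the alias to its ID."""
    row = conn.execute(
        "SELECT original_url FROM short_url WHERE id = ?", (decode(alias),)
    ).fetchone()
    return row[0] if row else None
```

Because the alias is derived from a unique ID, no collision check is needed, but the aliases are sequential and therefore guessable, and user-defined aliases would need separate handling.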
Summary
Implementing a short URL system is not complicated: understand the HTTP 302 redirection mechanism and store the mapping produced by hashing the URL. Of course, business requirements may introduce many other variations, such as user-defined addresses, timed expiration, and so on.
Finally, for storage I would suggest Redis or MySQL, which can also be used in combination. The specific choice depends on the positioning and scale of the business; system design should not be divorced from the actual business.
I hope it will be helpful to your work!