The previous article shared product thinking on the design of a content management system (CMS) for the content production stage. Today, let's talk through some thinking and lessons on content filtering in a CMS.
1. Content filtering
Content filtering serves two goals: complying with national laws and regulations, protecting the platform from illegal content, and avoiding operational risk; and keeping the content community healthy, shielding users from spam, and giving users the best possible consumption experience.

All of this depends on effective content filtering. The workload is enormous; once user volume grows and the community becomes lively, it is unimaginable. Doing it purely by hand is infeasible, inaccurate, and extremely inefficient, so machine and system detection is a must. Yet machine learning and system building take long cycles and are hard to get right.
A content community may carry several content types at once, such as text, images, video, and audio, and the technical means used for filtering vary by content type.
If you are on a startup team building an early-stage product, it is neither feasible nor wise to build this content filtering system yourself alongside the CMS.
Fortunately, the market now has many mature SaaS platforms that provide high-quality filtering services for every content type and are easy to deploy and integrate.
Some large companies, thanks to mature businesses, data-security needs, and deep resources and technology, have built their own content filtering systems, and many now sell them commercially, such as Tencent, Alibaba, Baidu, and NetEase.
1.1 Reference points for selecting a SaaS service
Through her work and study, Sue has investigated a number of content filtering SaaS platforms. Here is her own (still maturing) summary of how to choose one:
- Choose by your main content types; a big-name platform is not always the better fit.
- Compare and analyze the services' billing models against your product stage and user scale.
- Once a service meets the requirements, weigh its cost-effectiveness fully.
- Don't tie yourself to one vendor. As your stage and scale change, consider not only switching packages but also switching to another company entirely (which may be more cost-effective).
1.2 Integrating a SaaS service
If you use a third-party service and its technical solution to handle content filtering, all you need to do is process the content according to the filtering results the third party returns.
Generally, a third-party filtering system returns the following information:
1) Judgment evidence
The offending text passages, images, or audio/video clips. This evidence supports manual quality inspection of the system's accuracy and backs up responses to publishers who contest a violation ruling.
2) Risk description
Description of the types of violations, such as:
- Text: advertising, pornographic, violent/terrorist, politically sensitive, abusive, and spam ("flooding") text, etc.
- Images: pornographic, politically sensitive, violent/terrorist, illegal, and advertising images, etc.
- Audio (live/on-demand): pornographic, illegal, and promotional audio, etc.
- Video (live/on-demand): pornographic, politically sensitive, violent, illegal, and advertising video, etc.
3) Filter results
The filter's verdict on the content, with a violation-level label, generally falls into three categories: safe, suspicious, and dangerous.
What we need to do is process the content according to the filter result: decide whether the publication takes effect, whether the content is shown on the front end or masked, and so on.
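To make this concrete, here is a minimal Python sketch of how a response like the one described above might be mapped to a display status. The response fields, enum values, and function names are illustrative assumptions, not any particular vendor's API:

```python
from dataclasses import dataclass
from enum import Enum

class FilterResult(Enum):
    SAFE = "safe"
    SUSPICIOUS = "suspicious"
    DANGEROUS = "dangerous"

class DisplayStatus(Enum):
    VISIBLE = "visible"   # shown normally on the front end
    MASKED = "masked"     # hidden from other users

@dataclass
class FilterResponse:
    result: FilterResult     # 3) filter result: safe / suspicious / dangerous
    risk_labels: list[str]   # 2) risk description, e.g. ["advertising", "spam"]
    evidence: list[str]      # 1) offending passages/clips, kept for QC and appeals

def apply_filter_result(resp: FilterResponse) -> DisplayStatus:
    """Map the third party's verdict to a content display status."""
    if resp.result is FilterResult.SAFE:
        return DisplayStatus.VISIBLE
    # 'dangerous' (and, depending on stage, 'suspicious') defaults to masked;
    # see the break-in discussion below.
    return DisplayStatus.MASKED
```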
Published content is the producer's informational asset on the platform, and we owe it full respect. Whenever we delete or block published content, we must notify the producer responsibly, clearly state the reason, provide a channel for two-way communication, and even offer an appeal channel.
To protect the enthusiasm of content producers (especially ordinary users), we need the filtering system to be highly accurate. A content filtering SaaS, however, serves customers across many industries and products, so it may not be well targeted to ours, or its standards may be too strict. It is therefore worth investing some effort after integration in a break-in period, tuning the system to fit our product.
It can be handled in two stages:
During the break-in stage:
Sue's proposal: based on the three results the filtering system feeds back, "safe", "dangerous", and "suspicious", set the content status as follows: safe, display the content; dangerous, mask the content; "suspicious" content is the focus of the break-in stage.
There are two ways to process it:

1. Judged suspicious → mask the content (notify the user) → manual review → if confirmed too strict → restore the content.
2. Judged suspicious → manual review → if confirmed too loose → mask the content (notify the user).
Sue's view is that the second method should be chosen: the final confirmation is completed manually, rather than masking content directly on the filter's verdict. This approach neither lets loosely filtered content slip through nor annoys users with inaccurate judgments, and it speeds up the break-in between the system and the product. The only drawback is that it costs some manpower, but Sue thinks it is worth it during this period.
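As a rough illustration, here is what that review step might look like in code. The helper functions are hypothetical stubs standing in for real CMS operations:

```python
def mask_content(content_id: str) -> None:
    print(f"masking {content_id}")  # stub: hide the post from other users

def notify_publisher(content_id: str, reason: str) -> None:
    print(f"notifying publisher of {content_id}: {reason}")  # stub: in-app notice

def handle_suspicious(content_id: str, reviewer_confirms_violation: bool) -> None:
    """Sue's method 2: 'suspicious' content stays visible until a human
    reviewer confirms the machine's judgment; only then is it masked."""
    if reviewer_confirms_violation:
        mask_content(content_id)
        notify_publisher(content_id, reason="violation confirmed on manual review")
    # Otherwise the machine was too strict and the content simply stays up.
```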
In one of Sue's previous content community projects, the team integrated a content filtering SaaS whose main filtered type was text. In that community, users discussed practically everything under the sun, ancient and modern.

From Sue's backend screenshots, a user discussion of "movies and passwords" was judged "dangerous". Had we defined the handling as "dangerous, mask the content", that user would have been deeply "hurt", feeling that speech was not free and that even talking about movies was restricted. Likewise, users chatting in the literature section about Lu Xun, or Zhou Shuren, were also restricted.
We cannot let our precious users become casualties while we debug the filtering system's accuracy. Even if you really do treat users as crops to harvest, at least do it quietly instead of scaring them away (haha).
Defaulting "dangerous content" to masked during the break-in stage rests on a thorough early evaluation of the SaaS service being integrated; it also lets limited manpower focus better on "suspicious content". But it does not mean we can relax completely (haha, witness the "movies" example above).
So during the break-in stage the filtering standard can lean strict: manually review "suspicious content" and run manual quality inspection on "dangerous content". Throughout this period, keep communicating with the SaaS platform's staff and adjust the filtering standards to suit your own product.
After the break-in stage:
Generally, the SaaS platform has its own criteria for evaluating its filter's accuracy, and we can define a basic evaluation standard of our own by referring to those analyses. For example, suppose that during break-in the share of machine-judged "suspicious" content that manual review confirms as "dangerous" has been optimized to 60% (a hypothetical value), and quality inspection of machine-judged "dangerous" content also meets our expectations; we can then treat the break-in stage as smoothly concluded.
After that, we can change the handling of "suspicious content" to match "dangerous content", masked by default, further freeing the manpower invested, while keeping up manual quality inspection of both "suspicious" and "dangerous" content.
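One way to keep this transition cheap is to express the result-to-action mapping as configuration rather than code. A sketch, with invented action names:

```python
# Break-in-stage policy: humans arbitrate 'suspicious' content (method 2).
BREAK_IN_POLICY = {
    "safe": "show",
    "suspicious": "show_pending_manual_review",
    "dangerous": "mask_and_notify",
}

# Post-break-in policy: 'suspicious' now defaults to masked like 'dangerous',
# with both categories spot-checked by manual quality inspection.
POST_BREAK_IN_POLICY = {
    "safe": "show",
    "suspicious": "mask_and_notify",
    "dangerous": "mask_and_notify",
}
```

Switching stages then means swapping one dictionary, not rewriting the handling logic.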
1.3 An easily overlooked intermediate state
One more thing deserves attention here. Third-party services nominally respond within milliseconds, but once the integration is live you will find that, besides the returned filter results and statuses, content can also sit in a "to be filtered"/"pending" status.

This is an intermediate state, and it is easy to overlook how content in it should be handled; neglecting it leaves front-end users confused and makes for a bad experience.
Sue has summarized three ways the front end and back end can handle this situation:
1) False success
When the filtering system has not yet returned a definitive result (clearly safe or clearly dangerous), create the illusion of a successful publish, so as not to hurt the user's experience or dampen their creativity.
The illusion works like this: after submitting successfully on the front end, the user lands by default on the content feed page (think Moments in WeChat) and sees the new post at the top of the list. In reality, the content may still be pending review, or awaiting secondary confirmation of a "suspicious" verdict; others cannot see it yet, but the publisher is unaware and assumes everyone can see it in the feed (Moments) just as they do.
This minimizes the filtering system's impact on publishers. If the filter finds no problem, publishers may never know the content was filtered and approved at all; if the content is masked, the platform notifies them. Some platforms "harmonize" content away without notifying publishers, which is even harder to perceive.
A product using this scheme: WeChat.
2) Waiting for results
The page flow here resembles the first method: after a successful submission the user is taken to the content feed, but the page also shows publishing progress (as a progress bar, in Soul) or an "under review" label beneath the content (as a text prompt, in Tantan). What is the same: intermediate-state content is still hidden from others. What differs: whether the publisher is aware that a filtering system exists.
This way, publishers clearly know their content is reviewed after submission, and will consciously keep their posts within bounds while editing.
A product using this scheme: Tantan.
3) Post-hoc filtering
The third way treats intermediate-state content as provisionally safe: by default it is shown to everyone right away, and filtering happens after the fact. Once results arrive, the content is handled as it should be.
This maximizes the publisher's experience, but it can affect the experience of users consuming the content and can expose the platform to some operational risk.
Having walked through the publishing flows of many products, Sue finds that most seem to adopt this method. (Perhaps content filtering really does respond fast enough.)
A product using this scheme: Oasis.
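The three strategies differ mainly in who may see pending content and what the publisher is told. A compact sketch of that visibility logic, under the same illustrative assumptions as the earlier snippets:

```python
from enum import Enum

class PendingStrategy(Enum):
    FALSE_SUCCESS = 1    # 1) illusion of success
    WAIT_FOR_RESULT = 2  # 2) publisher sees "under review"
    POST_FILTER = 3      # 3) show to everyone, filter afterwards

def pending_content_visible_to(viewer_is_publisher: bool,
                               strategy: PendingStrategy) -> bool:
    """Who can see content whose filter result has not yet come back."""
    if strategy is PendingStrategy.POST_FILTER:
        return True                  # provisionally safe: everyone sees it
    return viewer_is_publisher       # strategies 1 and 2: publisher only

def show_review_hint(strategy: PendingStrategy) -> bool:
    """Only strategy 2 reveals to the publisher that a review is in progress."""
    return strategy is PendingStrategy.WAIT_FOR_RESULT
```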
1.4 Blacklist and whitelist management
Third-party filtering systems generally support managing the following lists/databases:
- User list
- IP list
- Device list
- URL list
- Contact-information library
Their main value is reducing the accidental blocking of specific objects (users, IPs, devices) or special content (URLs, contact information).
For example:
- An operations account may publish a large amount of content in a short time; without list management, it could be judged as spam flooding (and likewise for its devices/IPs).
- Published content may carry promotional URLs from the operations team, or contact details left so users can reach customer service and staff; without list management, it could be judged as advertising content from an advertising user.
In addition, outside the third-party filtering system, we can build our own blacklist management to flag problematic users, IPs, devices, URLs, and contact information.
If content published by a user matches an entry on that blacklist, it can be handled directly, without being pushed to the third-party filter for judgment.
The main value here is reducing repeated filtering of known objects (users, IPs, devices) and content (URLs, contact information), cutting unnecessary filtering costs.
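Since third-party filtering is typically billed per call, a local pre-check pays for itself quickly. A minimal sketch, with invented list contents and a stub standing in for the vendor's API:

```python
# Our own blacklist, maintained outside the third-party system.
BLACKLISTED_USERS = {"user_123"}
BLACKLISTED_URLS = {"spam.example.com"}

def call_third_party_filter(text: str) -> str:
    return "safe"  # stub standing in for the (paid) vendor API call

def filter_content(user_id: str, text: str, urls: list[str]) -> str:
    """Handle known-bad content locally; only unknown content costs an API call."""
    if user_id in BLACKLISTED_USERS or any(u in BLACKLISTED_URLS for u in urls):
        return "dangerous"  # known bad: mask directly, no third-party call
    return call_third_party_filter(text)
```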
That's Sue's summary and sharing on content filtering in a CMS.
Preview of the next installment: Building a content management system (CMS), part 3: template-based content presentation.
These are some small personal thoughts and ideas, shared to keep up the habit of turning input into digested, summarized output. If anything here is immature or incorrect, pointers from fellow practitioners are warmly welcome; let's discuss and improve together.
This article was originally published by @Su Xiaobai on Everyone Is a Product Manager. Reproduction without permission is prohibited.
The image is from Unsplash, based on the CC0 protocol.