Challenges and solutions
News comes from a set of sources, and each source has its own feeds. A status display shows the status of every feed and of each source as a whole.
Feeds are grouped by topic so that the news presented can be filtered.
The system covers 24 sources with a total of 86 RSS feeds.
Incoming news is downloaded in full (title, short description, full text) and cleaned of noise: stop words and punctuation marks are removed, and words are reduced to their normal form. A statistical evaluation then selects simple trends (a single word) and complex trends (several words).
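The cleaning and trend-selection steps can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the stop-word list, the `normalize` stand-in for reduction to normal form (it only lowercases), the `min_docs` threshold, and the use of adjacent word pairs as complex trends are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real service would use a full list per language.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def normalize(word):
    """Stand-in for reduction to normal form (assumption: lowercase only)."""
    return word.lower()


def clean(text):
    """Drop punctuation, normalize the words, and remove stop words."""
    words = re.findall(r"\w+", text)
    tokens = [normalize(w) for w in words]
    return [t for t in tokens if t not in STOP_WORDS]


def find_trends(documents, min_docs=2):
    """Return (simple, complex) trends seen in at least min_docs documents.

    Simple trends are single words; complex trends are pairs of adjacent
    words in the cleaned text (an assumption made for this sketch).
    """
    word_counts, pair_counts = Counter(), Counter()
    for doc in documents:
        tokens = clean(doc)
        word_counts.update(set(tokens))                   # count each word once per document
        pair_counts.update(set(zip(tokens, tokens[1:])))  # adjacent word pairs
    simple = {w for w, n in word_counts.items() if n >= min_docs}
    complex_trends = {" ".join(p) for p, n in pair_counts.items() if n >= min_docs}
    return simple, complex_trends
```

Counting each word once per document keeps a single long article from creating a trend on its own; a real deployment would likely replace `normalize` with a proper lemmatizer.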
The similarity between news items is recorded as a similarity coefficient based on keywords, abbreviations, trends, complex trends, and trend abbreviations.
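One way such a coefficient could be formed is as a weighted combination of per-feature overlaps. The sketch below is an assumption, not the service's formula: it uses Jaccard overlap for each feature set, and the weight values and the `similarity` function name are hypothetical (the description says only that the weights are configurable).

```python
# Hypothetical weights for each feature type; real values would come from settings.
WEIGHTS = {
    "keywords": 0.4,
    "abbreviations": 0.1,
    "trends": 0.2,
    "complex_trends": 0.2,
    "trend_abbreviations": 0.1,
}


def similarity(item_a, item_b, weights=WEIGHTS):
    """Weighted average of Jaccard overlaps over the features present.

    Each item is a dict mapping a feature name to a set of strings.
    Features that are empty in both items are skipped, so they neither
    raise nor lower the score.
    """
    score = 0.0
    used = 0.0
    for feature, weight in weights.items():
        a = item_a.get(feature, set())
        b = item_b.get(feature, set())
        if not a and not b:
            continue  # feature absent from both items
        score += weight * len(a & b) / len(a | b)
        used += weight
    return score / used if used else 0.0
```

For example, two items that share one of three distinct keywords and have the same single trend score (0.4 · 1/3 + 0.2 · 1) / 0.6 ≈ 0.56 under these weights.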
Clustering is dynamic and allows a more general structure to be viewed. It uses a single-link clusterer modified for the needs of this service.
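The service's modifications are not described here, so the sketch below shows only plain single-link clustering with a similarity threshold, as an illustration of the base technique. The function name, the `sim` callback, and the threshold values are assumptions. Lowering the threshold merges clusters, which is one possible way to obtain a more general view of the structure.

```python
def single_link_clusters(n, sim, threshold):
    """Cluster items 0..n-1 by single linkage.

    Two items end up in the same cluster if a chain of pairs, each with
    similarity >= threshold, connects them. sim(i, j) is called with i < j.
    """
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim(i, j) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Comparing every pair takes quadratic time, which is acceptable for a sketch; a dynamic variant would instead attach each newly arrived item to existing clusters as it comes in.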
The system settings include the weights assigned to the individual coefficients.