How We Use NLP To Create Article Summaries For Our Newsletter

Using a mix of man and machine, we are working on creating an amazing summary experience.

The process starts with the curation of news & articles. We combine our personal preferences with an analysis of our preferred websites and of article activity on Twitter, Reddit, etc. to find articles that meet our criteria.

This first part is not an exact science. Our goal is merely to find correlations between the types of articles we pick and user feedback, so that we can pick better articles in the future.

After an article has been chosen, it goes through several steps, across a number of services, to produce a quality summary.

We first analyze the article, then scrape it into a clean, text-only format. This works most of the time, but every website has its own way of structuring articles, so we keep many different methods for grabbing the text.
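To give a feel for the scraping step, here is a minimal sketch using only Python's standard library. It is not our actual scraper: the `SKIP_TAGS` and `BLOCK_TAGS` lists and the `html_to_text` helper are assumptions made up for this illustration, and real sites need far more special-casing.

```python
from html.parser import HTMLParser

# Tags whose contents are never article text (assumed, non-exhaustive).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}
# Tags that end a block of text, so a line break is added after them (assumed).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "blockquote"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML page with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    page = "<body><nav>Home</nav><p>Hello <b>world</b>.</p></body>"
    print(html_to_text(page))
```

Anything this kind of generic pass gets wrong is where the site-specific methods (or a human) take over.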

If all else fails, I’ll personally copy, edit & paste or retype the article!

We then take that text and perform a sentiment analysis. Personally, we like articles that are relatively neutral on most topics. This not only helps our heuristic analysis of the text, but we also believe that neutral reporting is typically better to share. So the goal is to make sure that the text meets that benchmark; if it doesn't, we find another article.
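As a rough illustration of what a neutrality check can look like, here is a toy lexicon-based sketch. The word lists, the `polarity` and `is_neutral` helpers and the 0.05 threshold are all assumptions invented for this example; a real sentiment service uses far richer models than this.

```python
import re

# Tiny hand-made word lists, purely illustrative (assumed).
POSITIVE = {"great", "amazing", "excellent", "win", "strong"}
NEGATIVE = {"terrible", "awful", "disaster", "fail", "weak"}


def polarity(text):
    """Return (positive - negative) word count divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


def is_neutral(text, threshold=0.05):
    """Treat text as neutral when its polarity stays within the threshold."""
    return abs(polarity(text)) <= threshold


if __name__ == "__main__":
    print(is_neutral("The company reported quarterly revenue of two billion dollars."))
    print(is_neutral("This terrible, awful product is a disaster."))
```

An article that fails a check like this would be swapped for another one.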

Nevertheless, the preferences of our team and our audience have a significant influence on this process.

The Robot Overlords aren’t taking over just yet!

Summary Creation Process:

The summary generation process is interesting and ever-changing.

Using a number of services, we take our text, analyze the importance of every sentence and rank the sentences in order of importance. Each service then produces a summary (or tl;dr) that it believes is adequate.

Automatic summarization is a subset of Artificial Intelligence, and some implementations are better than others, but using a mix of services increases accuracy.
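One simple way to picture mixing services is a vote: each service proposes the sentences it would keep, and the sentences picked most often win. The sketch below assumes every service returns a list of sentence indices; `combine_summaries` is a hypothetical helper for illustration, not a description of how our services are actually combined.

```python
from collections import Counter


def combine_summaries(candidate_summaries, size):
    """Pick the `size` sentence indices chosen by the most summarizers.

    Ties are broken in favor of earlier sentences, and the result is
    returned in document order.
    """
    votes = Counter()
    for summary in candidate_summaries:
        votes.update(set(summary))  # one vote per summarizer per sentence
    ranked = sorted(votes, key=lambda index: (-votes[index], index))
    return sorted(ranked[:size])


if __name__ == "__main__":
    picks = [[0, 2, 5], [0, 3, 5], [1, 5, 0]]
    print(combine_summaries(picks, 2))
```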

The ranking is primarily a complex analysis of simple heuristics, such as:

  • What are the keywords?
  • How long is the sentence?
  • Is there a name?
  • Where does this sentence appear in the text?
  • etc.
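The heuristics above can be sketched in a few lines of Python. This is only an illustration: the stopword list, the length window of 8 to 30 words, the "capitalized word that isn't the first word" test for names, and the equal weighting of each signal are all assumptions, not the values any of our services use.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (assumed for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "was", "for", "on", "that", "this", "with", "as", "at", "by",
             "about", "after"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(text, top_keywords=5):
    """Return (score, index, sentence) tuples, one per sentence."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    keywords = {w for w, _ in Counter(words).most_common(top_keywords)}
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        # What are the keywords? Count how many appear in this sentence.
        keyword_hits = sum(t.lower() in keywords for t in tokens)
        # How long is the sentence? Favor medium-length sentences.
        good_length = 1.0 if 8 <= len(tokens) <= 30 else 0.0
        # Is there a name? Crude guess: a capitalized word after the first.
        has_name = 1.0 if any(t[0].isupper() for t in tokens[1:]) else 0.0
        # Where does it appear? Earlier sentences get a higher score.
        position = 1.0 - index / len(sentences)
        score = keyword_hits + good_length + has_name + position
        scored.append((score, index, sentence))
    return scored


def summarize(text, size=2, top_keywords=5):
    """Keep the `size` best-scoring sentences, in their original order."""
    scored = score_sentences(text, top_keywords)
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:size]
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]
```

Calling `summarize(article_text, size=3)` would return the three highest-scoring sentences in reading order, which is roughly what a tl;dr is.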

Here’s a list of some of the algorithms that we have worked with to create our summaries:

  • Luhn’s Heuristic Method
  • Edmundson’s Method in Automatic Extracting
  • Latent Semantic Analysis
  • LexRank
  • TextRank
  • SumBasic
  • KL Divergence
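To make one of these concrete, here is a simplified sketch of the scoring idea behind Luhn's method: within a sentence, find clusters of "significant" words that sit close together, and score each cluster as the square of its significant-word count divided by its total length. How the significant words are chosen is left out here, and the gap limit of 4 is an assumption for this sketch; `luhn_score` is a hypothetical helper, not code from any of the services we use.

```python
def luhn_score(tokens, significant, max_gap=4):
    """Score one sentence (a list of tokens) in the style of Luhn's method.

    A cluster starts and ends with a significant word and may contain at
    most `max_gap` insignificant words between two significant ones.
    The sentence score is the best cluster score found.
    """
    positions = [i for i, token in enumerate(tokens) if token in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            # Close the current cluster and start a new one.
            best = max(best, count * count / (prev - start + 1))
            start = pos
            count = 1
        prev = pos
    return max(best, count * count / (prev - start + 1))


if __name__ == "__main__":
    sentence = "the bank raised interest rates again".split()
    print(luhn_score(sentence, {"bank", "interest", "rates"}))
```

Sentences with dense runs of significant words score highest, which is why this family of methods tends to favor fact-heavy sentences.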

We have seen a few issues that can affect the quality of the summary, such as:

  • The article was poorly written
  • We didn’t properly filter the text into a readable format
  • The bot is unable to accurately find the most important sentences
  • And much more

By relying on the amazing work of great reporters and news sources, and by constantly iterating on our processes, we haven’t had any serious problems.

Nevertheless, we have humans who oversee everything and are able to alter the summaries to make them better.

If you are a fan of Tech & Finance News, I’d love for you to sign up!

And if you’d like to help us by referring your friends, here are our rewards for you.
