Building a Keyword Extractor from Scratch: Python Flask Tutorial

Welcome to our step-by-step guide on building a keyword extractor with Python and Flask. Keyword extraction is a common natural language processing (NLP) task: identifying the most important words or phrases in a given text. In this tutorial, we will walk you through building a small web application with Flask, a lightweight Python web framework, that extracts keywords from user-submitted text using the RAKE algorithm. So, let's dive in!

Photo by Clément Hélardot on Unsplash



Prerequisites

Before we get started, make sure you have the following prerequisites installed:

  • Python (version 3 or above)


Development of Keyword Extractor


Step 1: Setting Up a Flask Project

First, let's create a new directory for our Flask project and set up a virtual environment. Open your terminal or command prompt and follow these steps:

  • Create a new directory: `mkdir keyword_extractor`
  • Navigate into the directory: `cd keyword_extractor`
  • Create a virtual environment: `python3 -m venv venv` (on Windows, use `python -m venv venv`)
  • Activate the virtual environment:
    • On Windows: `venv\Scripts\activate`
    • On macOS/Linux: `source venv/bin/activate`
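If you are unsure whether activation worked, the following minimal check can help. It is a sketch: the `in_virtualenv` helper is our own, not part of any library. It compares `sys.prefix` with `sys.base_prefix`, which differ when Python runs inside an environment created by the standard `venv` module. Save it as, for example, `check_venv.py` and run `python check_venv.py`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by the venv module."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate venv first.")
```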


Step 2: Installing Dependencies

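Once the virtual environment is activated, let's install the required dependencies. We need Flask for the web application, plus NLTK and rake-nltk (which builds on NLTK and provides the `Rake` class we use for keyword extraction). Run the following commands:

pip install flask
pip install nltk
pip install rake-nltk

rake-nltk relies on NLTK's stopwords and punkt tokenizer data. If you see a `LookupError` about a missing resource when the app runs, download the resource named in the error message, for example with `python -m nltk.downloader stopwords punkt`.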
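To confirm the installation, a small standard-library sketch can look up each package with `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The `missing_modules` helper is hypothetical, written only for this check. Note that the PyPI package `rake-nltk` is imported as `rake_nltk`.

```python
import importlib.util

# Import names of the packages this tutorial uses
# (the rake-nltk package is imported as rake_nltk).
REQUIRED_MODULES = ["flask", "nltk", "rake_nltk"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```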


Step 3: Initializing Flask and Creating Routes

In this step, we will initialize our Flask application and create the necessary routes for our keyword extraction functionality. Create a new file called `app.py` in the project directory and open it in your preferred code editor. Then, add the following code:
    from flask import Flask, render_template, request
    from rake_nltk import Rake
    import json

    app = Flask(__name__)

    @app.route('/')
    def home():
        # Render templates/index.html (the input form)
        return render_template('index.html')

    @app.route('/extract', methods=['POST'])
    def extract_keywords():
        text = request.form['text']
        # Placeholder: the keyword extraction logic is added in a later step.
        # Until then, return an empty keyword list in the same JSON shape.
        response = json.dumps({'Keyword': [], 'keywordLen': 0})
        return {'keywords': response}

    if __name__ == '__main__':
        app.run(debug=True)

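Here, we import `Flask`, `render_template`, and `request` from Flask, the `Rake` class from rake-nltk, and Python's built-in `json` module. We define two routes: the home route (`/`), which renders the input form, and the `/extract` route, which processes the form submission sent via POST. For now, the body of `extract_keywords` is only a placeholder; we implement the keyword extraction logic in a later step.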
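The `/extract` route reads `request.form['text']`, which Flask fills by parsing the request body when it is sent as `application/x-www-form-urlencoded`, the format the page's JavaScript uses. As a rough standard-library illustration of that round trip (not Flask's actual parser; `encode_form` and `decode_form` are hypothetical helpers), `urllib.parse` can encode and decode such a body:

```python
from urllib.parse import parse_qs, urlencode


def encode_form(text: str) -> str:
    """Encode text as an x-www-form-urlencoded body with a single 'text' field."""
    return urlencode({"text": text})


def decode_form(body: str) -> dict:
    """Decode a form body, keeping the first value of each field."""
    # keep_blank_values=True so an empty textarea still yields {'text': ''}
    parsed = parse_qs(body, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


if __name__ == "__main__":
    body = encode_form("Flask & NLTK: keyword extraction")
    print(body)
    print(decode_form(body)["text"])
```

The browser's `encodeURIComponent` writes spaces as `%20` rather than `+`; both forms decode to the same text.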

Step 4: Creating the HTML Template

To let users enter text and see the extracted keywords, we need an HTML template; later, we will also add JavaScript to it to handle the server's response. Create a new folder called `templates` in the project directory (Flask looks for templates there by default), and inside it create a file named `index.html`.

In index.html, add the following code:
    <!DOCTYPE html>
    <html>
      <head>
        <title>Keyword Extractor</title>
      </head>
    
      <body>
        <h1>Keyword Extractor</h1>
        <form id="keyword-form">
          <textarea id="text-input" rows="10" cols="50"></textarea>
          <button type="submit">Extract Keywords</button>
        </form>
    
        <div id="keywords"></div>
      </body>
    </html>


Step 5: Implementing the extract_keywords Function

Now replace the placeholder `extract_keywords` function in `app.py` with the version below. It handles the POST request for the `/extract` route and extracts keywords from the submitted text using the RAKE (Rapid Automatic Keyword Extraction) algorithm, via the `Rake` class from the rake-nltk library.
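    @app.route('/extract', methods=['POST'])
    def extract_keywords():
        # Text submitted by the form (sent as application/x-www-form-urlencoded)
        text = request.form['text']
        # RAKE extractor using NLTK's English stopwords and punctuation by default
        rake_nltk_var = Rake()
        rake_nltk_var.extract_keywords_from_text(text)
        # Candidate phrases, sorted from highest to lowest score
        keyword_extracted = rake_nltk_var.get_ranked_phrases()
        # Encode the keywords and their count as a JSON string
        response = json.dumps({'Keyword': keyword_extracted,
                               'keywordLen': len(keyword_extracted)})
        # Flask turns a returned dict into a JSON response
        return {'keywords': response}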
Here,
  • The input text is obtained from the form submission using `request.form['text']`.
  • An instance of the `Rake` class from rake-nltk is created.
  • The `extract_keywords_from_text` method of the Rake object is called, passing in the input text. This method processes the text and extracts keywords using the RAKE algorithm.
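  • The `get_ranked_phrases` method returns the extracted phrases as a list of strings, ordered from the highest RAKE score to the lowest.
  • The extracted keywords, along with the length of the keyword list, are stored in a dictionary and converted to a JSON-formatted string using `json.dumps`.
  • Finally, this JSON string is wrapped in another dictionary under the `keywords` key and returned. Flask (version 1.1 and later) automatically serializes a dictionary returned from a view into a JSON response, so the client receives a JSON object whose `keywords` field is itself a JSON string.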
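The steps above treat RAKE as a black box. Since the goal is a keyword extractor built from scratch, here is a minimal, dependency-free sketch of the idea behind RAKE: split the text into candidate phrases at punctuation and stopwords, score each word by degree divided by frequency (the ratio the RAKE paper recommends and rake-nltk uses by default), and score each phrase as the sum of its word scores. The tiny stopword list, the regex tokenization, and the helper names `candidate_phrases` and `rake` are assumptions for illustration; this is not rake-nltk's implementation, so its phrases and rankings can differ.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list for illustration only. rake-nltk uses
# NLTK's much longer English stopword list instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases, breaking at punctuation and stopwords."""
    phrases = []
    # Punctuation ends a phrase, so split the text into fragments first.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[\w']+", fragment):
            if word in stopwords:
                # A stopword ends the current phrase and is not kept.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted from highest to lowest RAKE score."""
    phrases = candidate_phrases(text, stopwords)

    # A word's degree grows by the length of every phrase it appears in,
    # so it counts co-occurrences within phrases (including the word itself).
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Word score = degree / frequency.
    word_score = {word: degree[word] / freq[word] for word in freq}

    # Phrase score = sum of its word scores; repeated phrases collapse to one entry.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = "Keyword extraction is a vital process in natural language processing."
    for phrase, score in rake(sample):
        print(f"{score:5.1f}  {phrase}")
```

On the sample sentence, this sketch ranks "natural language processing" (score 9.0) above "keyword extraction" and "vital process" (4.0 each): longer phrases of co-occurring content words accumulate higher degrees, which is why RAKE tends to favor multi-word keyphrases.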


Step 6: Handling the Server Response with JavaScript

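Add the following code snippet inside the `<body>` tag of the `index.html` template, just before the closing `</body>` tag, so that the form already exists when the script runs: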
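    <script>
      document.getElementById('keyword-form').addEventListener('submit', function (event) {
        event.preventDefault()

        var text = document.getElementById('text-input').value

        fetch('/extract', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/x-www-form-urlencoded'
          },
          body: 'text=' + encodeURIComponent(text)
        })
          .then(function (response) {
            return response.json()
          })
          .then(function (data) {
            // data.keywords is itself a JSON string, so it is parsed a second time
            var GotRes = JSON.parse(data.keywords)
            var keywords = GotRes['Keyword']
            var keywordsElement = document.getElementById('keywords')

            // Build the complete markup first and assign it once; appending
            // '<li>' items to innerHTML piece by piece places them outside the <ul>
            var html = '<h2>Extracted Keywords:</h2><ul>'
            for (var i = 0; i < keywords.length; i++) {
              html += '<li>' + escapeHtml(keywords[i]) + '</li>'
            }
            html += '</ul>'
            keywordsElement.innerHTML = html
          })
          .catch(function (error) {
            console.error('Keyword extraction failed:', error)
          })
      })

      // Escape characters that have special meaning in HTML
      function escapeHtml(value) {
        return String(value)
          .replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
      }
    </script>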

Here,

  • The code adds an event listener to the form element with the ID 'keyword-form' and listens for the form submission event.
  • When the form is submitted, the event listener function is triggered. It begins by preventing the default form submission behavior using event.preventDefault().
  • The text is obtained from the textarea with the ID 'text-input' using `document.getElementById('text-input').value`.
  • The fetch function is used to make a POST request to the '/extract' endpoint of the server. It sends the text data in the request body using the 'application/x-www-form-urlencoded' content type.
  • Upon receiving the response, the code chains a series of promises using .then to handle the response asynchronously.
  • In the first `.then` block, the response is converted to JSON format using the `response.json()` method.
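  • In the second `.then` block, the parsed data is used. Because the server sends the keyword data as a JSON string inside the `keywords` field, it is decoded a second time with `JSON.parse(data.keywords)`, and the keyword list is read from its 'Keyword' property.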
  • The extracted keywords are then dynamically added to the HTML page. The code retrieves the element with the ID 'keywords' and sets its inner HTML to include an 'Extracted Keywords' heading and a list of the extracted keywords.
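Because the Flask route wraps an already JSON-encoded string inside another JSON object, the client has to decode twice. The standard-library sketch below reproduces that round trip in Python: the hypothetical `build_payload` helper mirrors what `extract_keywords` sends, and the hypothetical `read_keywords` helper mirrors the two parsing steps in the script.

```python
import json


def build_payload(keywords):
    """Mirror the server: encode the keyword data, then wrap it in another JSON object."""
    inner = json.dumps({"Keyword": keywords, "keywordLen": len(keywords)})
    return json.dumps({"keywords": inner})


def read_keywords(body):
    """Mirror the browser: response.json(), then JSON.parse(data.keywords)."""
    data = json.loads(body)               # first decode: the HTTP response body
    inner = json.loads(data["keywords"])  # second decode: the embedded JSON string
    return inner["Keyword"]


if __name__ == "__main__":
    body = build_payload(["natural language processing", "keyword extraction"])
    print(body)
    print(read_keywords(body))
```

Returning the dictionary `{'Keyword': keyword_extracted, 'keywordLen': len(keyword_extracted)}` directly from the route would remove the second decode, but the script would then need to read `data.Keyword` instead of calling `JSON.parse`.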

Congratulations! You have successfully implemented a keyword extractor using Python Flask. Throughout this tutorial, we walked through setting up a Flask project, installing dependencies, creating routes, implementing the keyword extraction logic, and displaying the results in the browser. Now, you can further enhance this keyword extractor or integrate it into your own projects.
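One way to integrate the extractor into another Python program is to call the running `/extract` endpoint over HTTP. The following standard-library sketch assumes the development server is running at Flask's default address, `http://127.0.0.1:5000`; `build_extract_request` and `extract_remote` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Flask development server is running locally on its default port.
BASE_URL = "http://127.0.0.1:5000"


def build_extract_request(text, base_url=BASE_URL):
    """Build a form-encoded POST request for the /extract route."""
    body = urlencode({"text": text}).encode("utf-8")
    return Request(
        base_url + "/extract",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_remote(text, base_url=BASE_URL, timeout=10):
    """Send text to the running app and return its ranked keyword list."""
    with urlopen(build_extract_request(text, base_url), timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    # The route nests a JSON string under "keywords", so decode it again.
    return json.loads(data["keywords"])["Keyword"]


if __name__ == "__main__":
    print(extract_remote("Keyword extraction is a vital process in natural language processing."))
```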


That's it for our tutorial. We hope you found it helpful. If you have any questions or feedback, please let us know. Thank you for reading!
