Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. It specifies how to inform a robot about which areas of the website should not be processed or scanned.
If you want search engines to ignore certain pages on your website, you list them in your robots.txt file using the following format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

For example:

User-agent: Mediapartners-Google
Disallow:

User-agent: TruliaBot
Disallow: /

User-agent: *
Disallow: /search.html

User-agent: *
Disallow: /comments/*

User-agent: Mediapartners-Google*
Disallow:
Create the directory 'templates' in the root location of your project.
Create another directory with the same name as your project inside the 'templates' directory.
Place a text file robots.txt in it.
Your project structure should look something like this.
myproject
|
|--myapp
|--myproject
|  |--settings.py
|  |--urls.py
|  |--wsgi.py
|--templates
|  |--myproject
|  |  |--robots.txt
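Note that Django will only find templates in this project-level 'templates' directory if it is listed in the DIRS option of the TEMPLATES setting. If your settings.py does not already include it, a minimal sketch (assuming the default BASE_DIR and TEMPLATES block generated by startproject) looks like this:

# settings.py
# Assumes BASE_DIR is already defined near the top of settings.py,
# as in the default startproject layout.
import os

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        # Add the project-level 'templates' directory here.
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]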
Add the user-agent and disallow rules to it:
User-agent: *
Disallow: /admin/
Disallow: /accounts/
In your project's urls.py file, add the below imports.

from django.conf.urls import url
from django.views.generic import TemplateView
Add the below URL pattern to your urlpatterns.
urlpatterns += [
    url(r'^robots\.txt$', TemplateView.as_view(template_name="myproject/robots.txt", content_type='text/plain')),
]
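If you are on Django 4.0 or later, note that url() has been removed; the equivalent pattern uses re_path. A minimal sketch, keeping the same template name, is shown below.

from django.urls import re_path
from django.views.generic import TemplateView

urlpatterns += [
    # Serve templates/myproject/robots.txt as plain text at /robots.txt
    re_path(r'^robots\.txt$', TemplateView.as_view(template_name="myproject/robots.txt", content_type='text/plain')),
]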
Now restart the server and open localhost:8000/robots.txt in your browser; you will be able to see the contents of your robots.txt file.
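If you would rather verify it from a script than from a browser, a quick check using Python's standard library (assuming the development server is running on port 8000) could look like this:

import urllib.request

# Fetch robots.txt from the local development server and print it.
response = urllib.request.urlopen("http://localhost:8000/robots.txt")
print(response.read().decode())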
Below is a sample configuration for Apache.
<Location "/robots.txt"> SetHandler None Require all granted </Location> Alias /robots.txt /var/www/html/project/robots.txt
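With this configuration, Apache serves the file directly from disk and requests for /robots.txt never reach Django, so make sure a copy of robots.txt exists at the path given in the Alias directive (the path shown here is just an example) and reload Apache after changing the configuration.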