Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. It specifies how to inform a robot about which areas of the website should not be processed or scanned.
If you want search engines to ignore certain pages on your website, you list them in your robots.txt file using the following format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

For example:

User-agent: Mediapartners-Google
Disallow:

User-agent: TruliaBot
Disallow: /

User-agent: *
Disallow: /search.html

User-agent: *
Disallow: /comments/*

User-agent: Mediapartners-Google*
Disallow:
Create the directory 'templates' in the root location of your project.
Create another directory with the same name as your project inside the 'templates' directory.
Place a text file robots.txt in it.
Your project structure should look something like this.
myproject
|
|--myapp
|--myproject
|  |--settings.py
|  |--urls.py
|  |--wsgi.py
|--templates
|  |--myproject
|  |  |--robots.txt
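Note that Django will only find templates in this project-level 'templates' directory if it is listed in the DIRS option of the TEMPLATES setting. If your settings.py does not already include it, a minimal sketch (assuming the default BASE_DIR and TEMPLATES block generated by startproject) looks like this:

# settings.py
# Assumes BASE_DIR is already defined near the top of settings.py,
# as in the default startproject layout.
import os

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        # Add the project-level 'templates' directory here.
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]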
Add the user-agent and disallow rules to it:
User-agent: *
Disallow: /admin/
Disallow: /accounts/
In your project's urls.py file, add the below imports.

from django.conf.urls import url
from django.views.generic import TemplateView
Add the below URL pattern to your urlpatterns.
urlpatterns += [
    url(r'^robots\.txt$', TemplateView.as_view(template_name="myproject/robots.txt", content_type='text/plain')),
]
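If you are on Django 4.0 or later, note that url() has been removed; the equivalent pattern uses re_path. A minimal sketch, keeping the same template name, is shown below.

from django.urls import re_path
from django.views.generic import TemplateView

urlpatterns += [
    # Serve templates/myproject/robots.txt as plain text at /robots.txt
    re_path(r'^robots\.txt$', TemplateView.as_view(template_name="myproject/robots.txt", content_type='text/plain')),
]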
Now restart the server and open localhost:8000/robots.txt in your browser; you will be able to see the contents of your robots.txt file.
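If you would rather verify it from a script than from a browser, a quick check using Python's standard library (assuming the development server is running on port 8000) could look like this:

import urllib.request

# Fetch robots.txt from the local development server and print it.
response = urllib.request.urlopen("http://localhost:8000/robots.txt")
print(response.read().decode())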
Below is a sample configuration for Apache.
<Location "/robots.txt"> SetHandler None Require all granted </Location> Alias /robots.txt /var/www/html/project/robots.txt
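With this configuration, Apache serves the file directly from disk and requests for /robots.txt never reach Django, so make sure a copy of robots.txt exists at the path given in the Alias directive (the path shown here is just an example) and reload Apache after changing the configuration.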