Storing valid, sanitised HTML in database to mitigate malicious code injection

In a typical cross-site scripting (XSS) attack, the hacker submits an HTML form that includes malicious code. When another user visits a page on which this data is rendered, the malicious code is executed in that user's browser.
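To make the attack concrete, here is a minimal sketch of a page that interpolates stored user input into HTML without any escaping. The `render_comment` function and the payload are hypothetical, chosen only for illustration.

```python
# Hypothetical "stored" form submission from a hacker.
stored_comment = '<script>alert(document.cookie)</script>'


def render_comment(comment: str) -> str:
    """Naively interpolate stored user input into a page (vulnerable)."""
    # The comment is inserted verbatim, so any markup it contains becomes
    # part of the page that every visitor's browser parses and executes.
    return f"<div class='comment'>{comment}</div>"


page = render_comment(stored_comment)
print(page)
# <div class='comment'><script>alert(document.cookie)</script></div>
```

Every user who loads this page receives the `<script>` element exactly as the hacker wrote it.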

There are at least a couple of ways to mitigate this risk:

  1. When the hacker submits the HTML form, remove any malicious code before storing the data in the database
  2. When retrieving data from the database, remove any malicious code before rendering it on the web page
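For comparison with the second approach, one common render-time defence is to escape the data rather than remove anything from it. The sketch below uses Python's standard `html.escape`; `render_comment` is again a hypothetical helper. It also exposes a trade-off: escaping neutralises harmless formatting such as `<b>` as well, which is why an app that must display rich text needs a sanitiser that keeps safe tags instead.

```python
import html


def render_comment(comment: str) -> str:
    """Escape stored user input at render time so it displays as text."""
    # html.escape converts &, <, > and (with the default quote=True) " and '
    # into entities, so the browser shows the markup instead of running it.
    return f"<div class='comment'>{html.escape(comment)}</div>"


print(render_comment('<script>alert(1)</script>'))
# <div class='comment'>&lt;script&gt;alert(1)&lt;/script&gt;</div>

print(render_comment('<b>bold</b>'))
# <div class='comment'>&lt;b&gt;bold&lt;/b&gt;</div>
```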

In this blog post we’ll briefly discuss the first of these two approaches, using the django-bleach module.

In one of my Django apps, I’m using TinyMCE to let users enter HTML in a form. Because the HTML tags and attributes that TinyMCE produces by default are safe (i.e. not exploitable by hackers), I wanted to allow only these to be stored in my database.

After installing django-bleach, I updated my model to use its BleachField for storing safe HTML data:

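from django.db import models
from django_bleach.models import BleachField

class Entry(models.Model):
    # BleachField cleans the value with bleach before it is saved,
    # so only allowed tags and attributes reach the database.
    my_html_data = BleachField()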

This snippet is sufficient to ensure that the data stored in the my_html_data database column contains only the whitelisted HTML tags and attributes.
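django-bleach delegates the actual cleaning to the bleach library. The following is not bleach's implementation, only a simplified stand-in built on Python's standard `html.parser` to illustrate the allowlist idea: allowed tags and attributes are kept, everything else is stripped, and text is re-escaped. The `AllowlistSanitiser` class, the `sanitise` helper and the tiny allowlists are hypothetical. A real sanitiser must also vet URL schemes in `href`, comments, malformed markup and much more, so a maintained library remains the right choice in production.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, deliberately tiny allowlists for the demo.
ALLOWED_TAGS = {'p', 'b', 'i', 'a'}
ALLOWED_ATTRIBUTES = {'href', 'title'}
# The content of these elements is dropped entirely, not just their tags.
DROP_CONTENT = {'script', 'style'}


class AllowlistSanitiser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Keep only allowlisted attributes, with their values escaped.
            kept = ''.join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED_ATTRIBUTES
            )
            self.out.append(f'<{tag}{kept}>')

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so entities decoded by the parser stay inert.
            self.out.append(escape(data, quote=False))


def sanitise(markup: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.out)


print(sanitise('<p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <b>there</b></p>
print(sanitise('<a href="https://example.com" onmouseover="x()">link</a>'))
# <a href="https://example.com">link</a>
```

Disallowed attributes such as `onclick` disappear, and the `<script>` element is removed together with its content.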

Next, to build the list of allowed tags, I used Chrome DevTools to inspect the HTML generated by TinyMCE and added those tags to my django-bleach whitelist. For reference, these are the settings I ended up with:

BLEACH_ALLOWED_TAGS = ['p', 'b', 'i', 'u', 'em', 'strong', 'a', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'span', 'ul', 'ol', 'li']
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'style']
# CSS properties permitted inside the style attribute
BLEACH_ALLOWED_STYLES = [
    'font-family', 'font-weight', 'text-decoration', 'font-variant', 'text-align', 'background-color']
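As an alternative to inspecting TinyMCE's output by hand in DevTools, the allowlists could be bootstrapped from saved sample output with a short script. This is a hypothetical helper built on Python's standard `html.parser`, not part of django-bleach or TinyMCE, and the `sample` string merely stands in for HTML copied from the editor. Review the result before pasting it into your settings, since anything that appears in the sample would end up allowed.

```python
from html.parser import HTMLParser


class MarkupInventory(HTMLParser):
    """Record every tag, attribute and inline CSS property seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags, self.attributes, self.styles = set(), set(), set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        for name, value in attrs:
            self.attributes.add(name)
            if name == 'style' and value:
                # Split "prop: value; prop2: value2" into property names.
                for declaration in value.split(';'):
                    prop, sep, _ = declaration.partition(':')
                    if sep and prop.strip():
                        self.styles.add(prop.strip().lower())


# Hypothetical sample of editor output.
sample = (
    '<h2 style="text-align: center;">Title</h2>'
    '<p><strong>Bold</strong> and <a href="https://example.com" title="Example">a link</a></p>'
    '<p><span style="font-family: arial; background-color: #ffff00;">Styled</span></p>'
)

inventory = MarkupInventory()
inventory.feed(sample)
inventory.close()

print('tags:', sorted(inventory.tags))
# tags: ['a', 'h2', 'p', 'span', 'strong']
print('attributes:', sorted(inventory.attributes))
# attributes: ['href', 'style', 'title']
print('styles:', sorted(inventory.styles))
# styles: ['background-color', 'font-family', 'text-align']
```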
