Oct 30, 2011

Cleaning BlogEngine.NET spam


I just migrated my blog to the latest version of BlogEngine.NET, 2.5.0.6.

I had a shock when I saw the amount of spam that I had on the blog!

447883 spam comments! Wow. So I started cleaning up with the BlogEngine tools, but it was damn slow, and there is no way to stop a delete-all once it has started.

So I stopped the web site, which was a bad idea because one XML file ended up damaged. As I always make a backup before doing something like that, I was on the safe side and just restored the files.

Then I used 7-Zip to compress the posts folder, located in App_Data, which was 338 MB. Again, wow.

I downloaded the zip file to my local machine, installed BlogEngine and imported the posts.

I thought it would be faster on my machine because it is a recent one, but it was still too slow to process 447833 spam messages.

So, as a developer, I went on and wrote a little application to do it. After the cleanup, which took less than 10 seconds, this is the size of the posts folder:

Quite a difference! And here is BlogEngine showing me the results:

And here is the code. It uses .NET Framework 4 and parallel queries (PLINQ) to process the files:

#region using

using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

#endregion

namespace BlogEngineSpamDelete
{
    internal class Program
    {
        private static void Main(string[] args)
        {
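            var files = Directory.GetFiles(@"C:\Temp\blogengine\posts", "*.xml");

            // A plain foreach over AsParallel() still runs its body on the calling
            // thread; ForAll runs FixPost on the PLINQ worker threads instead.
            // Each post lives in its own file, so the calls are independent.
            files.AsParallel().ForAll(FixPost);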
        }

        private static void FixPost(string file)
        {
            XDocument doc;
            using (var stream = File.OpenRead(file))
            {
                doc = XDocument.Load(stream);
            }

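            // Select every <comment> element, whatever its depth in the post file.
            var comments = from comment in doc.Descendants(XName.Get("comment", String.Empty))
                           select comment;

            // ToArray() takes a snapshot first, so removing elements from the
            // document below does not disturb the enumeration.
            var spamComments = from comment in comments.ToArray()
                               let data = new CommentState(comment.Attribute("spam").Value,
                                                           comment.Attribute("approved").Value,
                                                           comment.Attribute("deleted").Value)
                               where ShouldDeleteSpamAndUnApproved(data)
                               select comment;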

            foreach (var spamComment in spamComments)
            {
                spamComment.Remove();
            }

            using (var writer = XmlWriter.Create(file, new XmlWriterSettings {Indent = true}))
            {
                doc.WriteTo(writer);
            }
        }

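        // More conservative alternative (not used above): removes a comment only
        // when it is unapproved AND flagged as spam or deleted.
        private static bool ShouldDeleteSpam(CommentState commentState)
        {
            return !commentState.Approved &&
                   (commentState.Spam || commentState.Deleted);
        }

        // Policy used by FixPost: removes anything that is unapproved,
        // flagged as spam, or marked as deleted.
        private static bool ShouldDeleteSpamAndUnApproved(CommentState commentState)
        {
            return !commentState.Approved ||
                   commentState.Spam ||
                   commentState.Deleted;
        }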

        private class CommentState
        {
            public CommentState(String spam, String approved, String deleted)
            {
                Approved = bool.Parse(approved);
                Spam = bool.Parse(spam);
                Deleted = bool.Parse(deleted);
            }

            public bool Approved { get; private set; }
            public bool Spam { get; private set; }
            public bool Deleted { get; private set; }
        }
    }
}
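
Before rewriting hundreds of megabytes of XML, it can be reassuring to see how many comments the filter would actually remove. The following is only a sketch of such a dry run, not part of the original tool: the `SpamDryRun` class and its `Flag`, `WouldDelete` and `CountDeletable` helpers are hypothetical names. It applies the same policy as `ShouldDeleteSpamAndUnApproved` in the listing above, with one assumption of its own: a missing or unparsable attribute is treated as `false` instead of throwing.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original tool: it only reads the files.
internal static class SpamDryRun
{
    // Assumption: a missing or unparsable attribute counts as false.
    private static bool Flag(XElement comment, string name)
    {
        bool value;
        var attribute = comment.Attribute(name);
        return attribute != null && bool.TryParse(attribute.Value, out value) && value;
    }

    // Same policy as ShouldDeleteSpamAndUnApproved in the listing above.
    public static bool WouldDelete(XElement comment)
    {
        return !Flag(comment, "approved") || Flag(comment, "spam") || Flag(comment, "deleted");
    }

    // Counts the <comment> elements of one post that the cleanup would remove.
    public static int CountDeletable(XDocument doc)
    {
        return doc.Descendants("comment").Count(WouldDelete);
    }

    private static void Main(string[] args)
    {
        // The default path is the one used in the listing above.
        var folder = args.Length > 0 ? args[0] : @"C:\Temp\blogengine\posts";

        // Sum() aggregates across the PLINQ workers; nothing is written to disk.
        var total = Directory.GetFiles(folder, "*.xml")
                             .AsParallel()
                             .Sum(file => CountDeletable(XDocument.Load(file)));

        Console.WriteLine("Comments that would be removed: {0}", total);
    }
}
```

Because the sketch never writes to the files, it can be run against the live posts folder or against a backup copy. Note that under the stated assumption a comment without an `approved` attribute counts as unapproved and is therefore included in the total.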

Update: I also posted the code on Bitbucket: https://bitbucket.org/lkempe/blogenginespamdelete

About Laurent

Laurent Kempé

Laurent Kempé is the editor, founder, and primary contributor of Tech Head Brothers, a French portal about Microsoft .NET technologies.

He has been employed by Innoveo Solutions since 10/2007 as a Senior Solution Architect and certified Scrum Master.

He is also the founder, owner and Managing Partner of Jobping, which provides a unique and efficient platform for connecting Microsoft skilled job seekers with employers using Microsoft technologies.

Laurent was awarded Most Valuable Professional (MVP) by Microsoft from April 2002 to April 2012.
