Stripping HTML tags from string in JavaScript?


MagSafe
2008-07-20, 12:42
Hi all,

Does anyone know of a way to strip out all HTML from a string, for example:


function stripHTML(myString)
{
    // remove anything within '<>' brackets?

    return myString;
}

var myString = "<span><b>Some text</b></span>";

document.write(stripHTML(myString));


Result:

Some text



Thanks :)

chucker
2008-07-20, 12:55
function stripHTML(s) {
    // lazily match from '<' up to the next '>', newlines included
    return s.replace(/<(.|\n)*?>/g, '');
}

var string = "<span><b>Some text</b></span>";

document.write(stripHTML(string));

MagSafe
2008-07-20, 13:14
Hi Chucker,

Thanks for that, I managed to find a regular expression that seems to be working pretty well :)


// remove html tags from content
function stripHTML(s)
{
    // replace html tags with nothing
    s = s.replace(/<(?:.|\s)*?>/g, "");

    // return content
    return s;
}

Edit: Too slow :p , you must've found the same link I did :)
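
For anyone comparing the patterns above, here is a minimal sketch of how the posted regex behaves on a tag that is split across two lines, next to a negated character class, which is a common alternative and not something either poster used. The function names and the sample string are made up for illustration.

```javascript
// Pattern from the post above: '.' does not match a newline in JavaScript,
// so the alternation with \s is what lets a tag span several lines.
function stripWithAlternation(s) {
    return s.replace(/<(?:.|\s)*?>/g, "");
}

// Alternative sketch (an assumption, not from the thread): match '<',
// then anything that is not '>', then '>'.
function stripWithNegatedClass(s) {
    return s.replace(/<[^>]*>/g, "");
}

// Hypothetical input with a line break inside the opening tag.
var multiLine = "<a\nhref='x'>link</a>";

var viaAlternation = stripWithAlternation(multiLine);
var viaNegatedClass = stripWithNegatedClass(multiLine);

console.log(viaAlternation);  // link
console.log(viaNegatedClass); // link
```

Both give the same result here; the negated class just avoids the lazy alternation.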

chucker
2008-07-20, 13:34
http://imgs.xkcd.com/comics/regular_expressions.png (http://xkcd.com/208/)

:D

MagSafe
2008-07-20, 13:38
lol, very appropriate :p

ast3r3x
2008-07-24, 18:28
Keep in mind, though, that those will match anything in between < and >, which can be quite problematic if you just happen to be using those characters in a sentence and the webpage keeps removing large chunks of your text for no apparent reason. However, if you had a better (http://regexlib.com/UserPatterns.aspx?authorId=a665e1aa-0726-4dfc-8297-c38e0e3777ff) regex :p

I made these a while ago, but honestly I think you want the first one.

chucker
2008-07-24, 23:01
Keep in mind, though, that those will match anything in between < and >, which can be quite problematic if you just happen to be using those characters in a sentence and the webpage keeps removing large chunks of your text for no apparent reason.

Luckily, you can't use those within a sentence. If you actually wanted to use those characters, you'd have to encode them as entities.

ast3r3x
2008-07-25, 05:39
No encoding would go on, since I assume this is client side given that he is using JS. They could be talking about math or just making a silly comment like beer > food. Or I've seen things like <sarcasm> ... </sarcasm>. People would be disappointed if such things were removed unnecessarily. I concede it isn't likely to cause harm, but that doesn't mean it will never happen or that it shouldn't be considered.
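
To make that concrete, here is a small sketch that runs the pattern MagSafe posted over a few made-up comments (the sample strings are hypothetical, not taken from any real page):

```javascript
// The pattern posted earlier in the thread.
function stripHTML(s) {
    return s.replace(/<(?:.|\s)*?>/g, "");
}

// Hypothetical user comments, invented for illustration.
var math = "if a < b and c > d then stop";
var joke = "<sarcasm>great idea</sarcasm>";
var lone = "beer > food";

var strippedMath = stripHTML(math);
var strippedJoke = stripHTML(joke);
var strippedLone = stripHTML(lone);

console.log(strippedMath); // "if a  d then stop" (the middle of the sentence is gone)
console.log(strippedJoke); // "great idea" (the joke markers are gone)
console.log(strippedLone); // "beer > food" (untouched, there is no '<' to start a match)
```

So a lone > survives, but anything between a < and the next > is silently dropped.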

Of course, I am curious as to why he is doing this, since sanitizing user input HAS to be done on the server anyway.

*Really, you probably shouldn't be deleting content arbitrarily. If you don't allow HTML, then just do what chucker said and encode everything they input as HTML entities.
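
A rough sketch of what encoding instead of stripping could look like. The name escapeHTML and the exact set of characters are my own assumptions, and this is only a client-side illustration; it does not replace sanitizing on the server.

```javascript
// Hypothetical helper: turn the characters that are special in HTML
// into entities so the browser shows them as text instead of markup.
function escapeHTML(s) {
    return s
        .replace(/&/g, "&amp;")   // must come first so later entities are not double-encoded
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Hypothetical user input, invented for illustration.
var escaped = escapeHTML("<b>beer > food</b>");

console.log(escaped); // &lt;b&gt;beer &gt; food&lt;/b&gt;
```

Nothing the user typed is lost this way; the tags just show up as literal text.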