Convert weird characters to their HTML entity equivalents

on 22-Nov-2011 | Tags: PHP

Recently I had to import some HTML from another site that used an encoding other than UTF-8. Strangely, I could not find a good article on converting the odd-looking characters to their HTML entity equivalents, and on top of that none of the PHP encoding functions worked for me.

After a long search I found this article that put me on the right track.

Building on that, I wrote my own code to map the weird characters that broke the site:

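function cleanImportedText($body){

    // Raw bytes from the imported text, written as hex escapes
    $replace = array("\x95","\x99");
    // HTML entities to emit in their place (bullet, trademark)
    $replaceWith = array("&bull;","&trade;");

    // str_replace() pairs the two arrays by position
    $body = str_replace($replace, $replaceWith, $body);

    return $body;
}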

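The example above replaces the raw bytes \x95 and \x99 (written as hex escapes) with • and ™; those byte values are the bullet and the trademark sign in Windows-1252. You can use it as a starting point for your own mapping of "strange" characters.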
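
To show how such a mapping might grow, here is a minimal sketch that covers a few more bytes. It assumes the imported text is Windows-1252, which the post does not confirm. The function name cleanWindows1252Text and the UTF-8 guard are illustrative additions, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): maps some common
// Windows-1252 bytes to named HTML entities. Assumes single-byte input.
function cleanWindows1252Text($body)
{
    // Valid UTF-8 uses bytes 0x80-0xBF inside multi-byte sequences, so
    // replacing them would corrupt the text. Heuristic: if the string
    // already validates as UTF-8, return it untouched.
    if (preg_match('//u', $body) === 1) {
        return $body;
    }

    $map = array(
        "\x80" => '&euro;',   // euro sign
        "\x85" => '&hellip;', // horizontal ellipsis
        "\x91" => '&lsquo;',  // left single quote
        "\x92" => '&rsquo;',  // right single quote
        "\x93" => '&ldquo;',  // left double quote
        "\x94" => '&rdquo;',  // right double quote
        "\x95" => '&bull;',   // bullet
        "\x96" => '&ndash;',  // en dash
        "\x97" => '&mdash;',  // em dash
        "\x99" => '&trade;',  // trademark sign
    );

    // strtr() with an array replaces all keys in a single pass.
    return strtr($body, $map);
}

echo cleanWindows1252Text("Acme\x99 \x95 item"), "\n";
// prints: Acme&trade; &bull; item
```

Keeping the bytes and entities in one associative array means each pair sits on one line, which is easier to maintain than two parallel arrays as the list grows.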

© modxRULES! 2009-2014