Source for file mime.php
* This contains the functions necessary to detect and decode MIME
* @copyright © 1999-2006 The SquirrelMail Project Team
* @license http://opensource.org/licenses/gpl-license.php GNU Public License
* @version $Id: mime.php,v 1.372 2006/10/06 22:02:52 stevetruckstuff Exp $
class/mime/Message.class.php
functions/page_header.php
functions/display_messages.php
functions/imap_general.php
functions/attachment_common.php
functions/display_messages.php
translateText => url_parser
/* -------------------------------------------------------------------------- */
/* -------------------------------------------------------------------------- */
* This function gets the structure of a message and stores it in the "message" class.
* It returns this object, fully parsed into the standard "message" object format,
* with all relevant header information.
/* Isolate the body structure and remove beginning and end parenthesis. */
/* removed urldecode because $_GET is auto urldecoded ??? */
$errormessage = _("SquirrelMail could not decode the bodystructure of the message");
$errormessage .= '<br />' . _("The bodystructure provided by your IMAP server:") . '<br /><br />';
foreach ($flags as $flag) {
$msg->is_answered = true;
/* This starts the parsing of a particular structure. It is called recursively,
* so it can be passed different structures. It returns an object of type
* First, it checks to see if it is a multipart message. If it is, then it
* handles it as necessary. If it is just a regular entity,
* then it parses it and adds the necessary header information (by calling out
/* Do a bit of error correction. If we couldn't find the entity id, just guess
* that it is the first one. That is usually the case anyway.
$cmd = "FETCH $id BODY[]";
$cmd = "FETCH $id BODY[$ent_id]";
if ($fetch_size != 0) $cmd .= "<0.$fetch_size>";
} while($topline && ($topline[0] == '*') && !preg_match('/\* [0-9]+ FETCH.*/i', $topline)) ;
$wholemessage = implode('', $data);
if (ereg('\\{([^\\}]*)\\}', $topline, $regs)) {
$ret = substr($wholemessage, 0, $regs[1]);
/* There is some information in the content info header that could be important
* in order to parse html messages. Let's get them here.
// $data = sqimap_run_command ($imap_stream, "FETCH $id BODY[$ent_id.MIME]", true, $response, $message, TRUE);
} else if (ereg('"([^"]*)"', $topline, $regs)) {
global $where, $what, $mailbox, $passed_id, $startMessage;
$par = 'mailbox=' . urlencode($mailbox) . '&passed_id=' . $passed_id;
if (isset($where) && isset($what)) {
$par .= '&startMessage=' . $startMessage . '&show_more=0';
$par .=
'&response=' .
urlencode($response) .
'<table width="80%"><tr>' .
_("Body retrieval error. The reason for this is most probably that the message is malformed.") .
'<tr><td><b>' .
_("Command:") .
"</td><td>$cmd</td></tr>" .
'<tr><td><b>' .
_("Response:") .
"</td><td>$response</td></tr>" .
'<tr><td><b>' .
_("Message:") .
"</td><td>$message</td></tr>" .
'<tr><td><b>' .
_("FETCH line:") .
"</td><td>$topline</td></tr>" .
"</table><br /></tt></font><hr />";
$data = sqimap_run_command ($imap_stream, "FETCH $passed_id BODY[]", true, $response, $message, TRUE);
$wholemessage = implode('', $data);
/* Don't kill the connection if the browser is over a dialup
* and it would take over 30 seconds to download it.
* Don't call set_time_limit in safe mode.
/* in case of base64 encoded attachments, do not buffer them.
Instead, echo the decoded attachment directly to screen */
$query = "FETCH $id BODY[]";
$query = "FETCH $id BODY[$ent_id]";
sqimap_run_command($imap_stream,$query,true,$response,$message,TRUE,'sqimap_base64_decode',$rStream,true);
TODO: use the same method for quoted printable.
However, I assume that quoted printable attachments aren't that large,
so the performance gain / memory usage drop will be minimal.
If we decide to add that, then we need to adapt sqimap_fread because
we need to split the result on \n and fread doesn't stop at \n. That
means we also should provide $results from sqimap_fread (by ref) to
the function and set $no_return to false. The $filter function for
quoted printable should handle unsetting of $results.
TODO 2: find out how we write to the output stream php://stdout. fwrite
doesn't work because php://stdout isn't a stream.
/* -[ END MIME DECODING ]----------------------------------------------------------- */
/* This is here for debugging purposes. It will print out a list
* of all the entity IDs that are in the $message object.
echo "<tt>" . $message->entity_id . ' : ' . $message->type0 . '/' . $message->type1 . ' parent = '. $message->parent->entity_id. '<br />';
for ($i = 0; isset($message->entities[$i]); $i++) {
$priority_level = substr($priority,0,1);
switch($priority_level) {
/* Check for a higher than normal priority. */
$priority_string = _("High");
/* Check for a lower than normal priority. */
$priority_string = _("Low");
/* Check for a normal priority. */
$priority_string = _("Normal");
/* returns a $message object for a particular entity id */
* Extracted from strings.php 23/03/2002
global $where, $what; /* from searching */
global $color; /* color theme */
// require_once(SM_PATH . 'functions/url_parser.php');
for ($i=0; $i < count($body_ary); $i++) {
if (strlen($line) - 2 >= $wrap_at) {
if ($line[$pos] == ' ') {
} else if (strpos($line, '>', $pos) === $pos) {
$line = '<span class="quote1">' . $line . '</span>';
$line = '<span class="quote2">' . $line . '</span>';
$body = '<pre>' . implode("\n", $body_ary) . '</pre>';
* This returns a parsed string called $body. That string can then
* be displayed as the actual message in the HTML. It contains
* everything needed, including HTML Tags, Attachments at the
* Since 1.2.0 function uses message_body hook.
* Till 1.3.0 function included output of formatAttachments().
* @param resource $imap_stream imap connection resource
* @param object $message squirrelmail message object
* @param array $color squirrelmail color theme array
* @param integer $wrap_at number of characters per line
* @param string $ent_num (since 1.3.0) message part id
* @param integer $id (since 1.3.0) message id
* @param string $mailbox (since 1.3.0) imap folder name
* @param boolean $clean (since 1.5.1) Do not output stuff that's irrelevant for the printable version.
* @return string HTML formatted message text
function formatBody($imap_stream, $message, $color, $wrap_at, $ent_num, $id, $mailbox='INBOX', $clean=FALSE) {
/* This if statement checks for the entity to show as the
* primary message. To add more of them, just put them in the
* order that is their priority.
global $startMessage, $languages, $squirrelmail_language,
$show_html_default, $sort, $has_unsafe_images, $passed_ent_id,
$use_iframe, $iframe_height, $download_and_unsafe_link,
$download_href, $unsafe_image_toggle_href, $unsafe_image_toggle_text;
// workaround for not updated config.php
if (! isset($use_iframe)) $use_iframe = false;
$view_unsafe_images = false;
$body_message = getEntity($message, $ent_num);
if (($body_message->header->type0 == 'text') ||
($body_message->header->type0 == 'rfc822')) {
if (isset
($languages[$squirrelmail_language]['XTRA_CODE']) &&
function_exists($languages[$squirrelmail_language]['XTRA_CODE'] .
'_decode')) {
if (mb_detect_encoding($body) !=
'ASCII') {
$body =
call_user_func($languages[$squirrelmail_language]['XTRA_CODE'] .
'_decode',$body);
$hookResults =
do_hook("message_body", $body);
/* If there are other types that shouldn't be formatted, add
if ($body_message->header->type1 ==
'html') {
if ($show_html_default <>
1) {
$entity_conv =
array(' ' =>
' ',
$body =
strtr($body, $entity_conv);
$body_message->header->getParameter('charset'));
} elseif ($use_iframe &&
! $clean) {
// $clean is used to remove iframe in printable view.
* If we don't add html message between iframe tags,
* we must detect unsafe images and modify $has_unsafe_images.
$html_body =
magicHTML($body, $id, $message, $mailbox);
// Convert character set in order to display html mails in different character set
.
'mailbox=' .
$urlmailbox
.
'&passed_id=' .
$id
.
'&ent_id=' .
$ent_num
.
'&view_unsafe_images=' . (int)
$view_unsafe_images;
// adding warning message
$body =
html_tag('div',_("Viewing HTML formatted email"),'center');
* height can't be set to 100%, because it does not work as expected when
* iframe is inside the table. Browsers do not create full height objects
* even when iframe is not nested. Maybe there is some way to get full size
* with CSS. Tested in firefox 1.02 and opera 7.53
* width="100%" does not work as expected, when table width is not set (automatic)
* tokul: I think <iframe> are safer sandbox than <object>. Objects might
* need special handling for IE and IE6SP2.
$body.=
"<div><iframe name=\"message_frame\" width=\"100%\" height=\"$iframe_height\" src=\"$iframeurl\""
.
' frameborder="1" marginwidth="0" marginheight="0" scrolling="auto">' .
"\n";
// Message for browsers without iframe support
//$body.= _("Your browser does not support inline frames.
// You can view HTML formated message by following below link.");
//$body.= "<br /><a href=\"$iframeurl\">"._("View HTML Message")."</a>";
// if browser can't render iframe, it renders html message.
$body.=
"</iframe></div>\n";
// old way of html rendering
$body =
magicHTML($body, $id, $message, $mailbox);
* convert character set. charset_decode does not remove html special chars
* applied by magicHTML functions and does not sanitize them second time if
* fourth argument is true.
$body_message->header->getParameter('charset'));
// if this is the clean display (i.e. printer friendly), stop here.
$download_and_unsafe_link = '';
$link = 'passed_id=' . $id . '&ent_id='. $ent_num. '&mailbox=' . $urlmailbox . '&sort=' . $sort . '&startMessage=' . $startMessage . '&show_more=0';
if (isset($passed_ent_id)) {
$link .= '&passed_ent_id='. $passed_ent_id;
$download_href = SM_PATH . 'src/download.php?absolute_dl=true&' . $link;
$download_and_unsafe_link .=
' | <a href="'.
$download_href .
'">' .
_("Download this as a file") .
'</a>';
if ($view_unsafe_images) {
$text =
_("Hide Unsafe Images");
if (isset
($has_unsafe_images) &&
$has_unsafe_images) {
$link .=
'&view_unsafe_images=1';
$text =
_("View Unsafe Images");
$unsafe_image_toggle_href =
SM_PATH .
'src/read_body.php?'.
$link;
$unsafe_image_toggle_text =
$text;
$download_and_unsafe_link .=
' | <a href="'.
$unsafe_image_toggle_href .
'">' .
$text .
'</a>';
* Displays attachment links and information
* Since 1.3.0 function is not included in formatBody() call.
* Since 1.0.2 uses attachment $type0/$type1 hook.
* Since 1.2.5 uses attachment $type0/* hook.
* Since 1.5.0 uses attachments_bottom hook.
* Since 1.5.2 uses templates and does *not* return a value.
* @param object $message SquirrelMail message object
* @param array $exclude_id message parts that are not attachments.
* @param string $mailbox mailbox name
* @param integer $id message id
global $where, $what, $startMessage, $color, $passed_ent_id, $base_uri,
foreach ($att_ar as $att) {
$links['download link']['text'] =
_("Download");
$links['download link']['href'] =
$base_uri .
"src/download.php?absolute_dl=true&passed_id=$id&mailbox=$urlMailbox&ent_id=$ent";
if ($type0 ==
'message' &&
$type1 ==
'rfc822') {
$default_page =
$base_uri .
'src/read_body.php';
$rfc822_header =
$att->rfc822_header;
$filename =
$rfc822_header->subject;
if (trim( $filename ) ==
'') {
$filename =
'untitled-[' .
$ent .
']' ;
$from_o =
$rfc822_header->from;
// something weird happens when a digest message is opened and you return to the digest
// now the from object is part of an array. Probably the parseHeader call overwrites the info
// retrieved from the bodystructure in a different way. We need to fix this later.
// possible starting point, do not fetch header we already have and inspect how
// the rfc822_header object behaves.
$from_name =
_("Unknown sender");
$description =
_("From").
': '.
$from_name;
$default_page =
$base_uri .
'src/download.php';
$filename =
$att->getFilename();
if ($header->description) {
$display_filename =
$filename;
if (isset
($passed_ent_id)) {
$passed_ent_id_link =
'&passed_ent_id='.
$passed_ent_id;
$passed_ent_id_link =
'';
$defaultlink =
$default_page .
"?startMessage=$startMessage"
.
"&passed_id=$id&mailbox=$urlMailbox"
.
'&ent_id='.
$ent.
$passed_ent_id_link;
/* This executes the attachment hook with a specific MIME-type.
* If that doesn't have results, it tries if there's a rule
* for a more generic type.
$hookresults =
do_hook("attachment $type0/$type1", $links,
$startMessage, $id, $urlMailbox, $ent, $defaultlink,
$display_filename, $where, $what);
if(count($hookresults[1]) <=
1) {
$hookresults =
do_hook("attachment $type0/*", $links,
$startMessage, $id, $urlMailbox, $ent, $defaultlink,
$display_filename, $where, $what);
$links =
$hookresults[1];
$defaultlink =
$hookresults[6];
$a['Description'] =
$description;
$a['DefaultHREF'] =
$defaultlink;
$a['DownloadHREF'] =
$links['download link']['href'];
$a['ViewHREF'] = isset
($links['attachment_common']) ?
$links['attachment_common']['href'] :
'';
$a['Size'] =
$header->size;
$a['OtherLinks'] =
array();
foreach ($links as $val) {
if ($val['text']==
_("Download") ||
$val['text'] ==
_("View"))
if (empty($val['text']) &&
empty($val['extra']))
$t['HREF'] =
$val['href'];
$t['Text'] =
(empty($val['text']) ?
'' :
$val['text']) .
(empty($val['extra']) ?
'' :
$val['extra']);
$oTemplate->assign('attachments', $attach);
$oTemplate->display('read_attachments.tpl');
// Base64 encoded data comes in groups of 4 characters. To decode on the
// fly (to reduce memory usage) you have to check whether the
// data ends with an incomplete group.
// Remove the noise in order to check whether the 4-character groups are complete
$string = str_replace(array("\r\n","\n", "\r", " "),array('','','',''),$string);
$sStringRem = substr($string,-$iMod);
// Check if $sStringRem contains padding characters
if (substr($sStringRem,-1) != '=') {
$string = substr($string,0,-$iMod);
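The on-the-fly decoding described above can be sketched as follows. This is a minimal sketch, not the code from this file: the helper name `decode_base64_chunk_sketch` and its `$carry` parameter are assumptions. Instead of inspecting padding characters, it holds any incomplete group of four characters back until the next chunk arrives.

```php
<?php
/**
 * Hypothetical helper (not part of mime.php): decode one chunk of a base64
 * stream, carrying an incomplete 4-character group over to the next call.
 */
function decode_base64_chunk_sketch($chunk, &$carry)
{
    // Strip line breaks and spaces so only base64 characters remain.
    $data = $carry . str_replace(array("\r", "\n", ' '), '', $chunk);

    // Hold back a trailing partial group until more data arrives.
    $rem   = strlen($data) % 4;
    $carry = $rem ? substr($data, -$rem) : '';
    if ($rem) {
        $data = substr($data, 0, -$rem);
    }
    return base64_decode($data);
}

$carry   = '';
$decoded = '';
// The chunk boundary deliberately falls in the middle of a 4-character group.
foreach (array("SGVsbG8sI", "HdvcmxkIQ==\r\n") as $chunk) {
    $decoded .= decode_base64_chunk_sketch($chunk, $carry);
}
echo $decoded, "\n"; // prints "Hello, world!"
```

Because each call only decodes complete groups, memory use stays proportional to the chunk size rather than to the attachment size.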
* Decodes encoded message body
* This function decodes the body depending on the encoding type.
* Currently quoted-printable and base64 encodings are supported.
* decode_body hook was added to this function in 1.4.2/1.5.0
* @param string $body encoded message body
* @param string $encoding used encoding
* @return string decoded string
// plugins get first shot at decoding the body
$body =
$encoding_handler('decode', $body);
} elseif ($encoding ==
'quoted-printable' ||
$encoding ==
'quoted_printable') {
* quoted_printable_decode() function is broken in older
* php versions. Text with \r\n decoding was fixed only
* in php 4.3.0. Minimal code requirement 4.0.4 +
* str_replace("\r\n", "\n", $body); call.
} elseif ($encoding ==
'base64') {
// All other encodings are returned raw.
* This function decodes strings that are encoded according to
* RFC1522 (MIME Part Two: Message Header Extensions for Non-ASCII Text).
* @param string $string header string that has to be made readable
* @param boolean $utfencode change message in order to be readable on user's charset. defaults to true
* @param boolean $htmlsave preserve spaces and sanitize html special characters. defaults to true
* @param boolean $decide decide if string can be utfencoded. defaults to false
* @return string decoded header string
function decodeHeader ($string, $utfencode=true,$htmlsave=true,$decide=false) {
if (isset
($languages[$squirrelmail_language]['XTRA_CODE']) &&
function_exists($languages[$squirrelmail_language]['XTRA_CODE'] .
'_decodeheader')) {
$string =
call_user_func($languages[$squirrelmail_language]['XTRA_CODE'] .
'_decodeheader', $string);
// Do we need to return at this point?
foreach ($aString as $chunk) {
if ($encoded &&
$chunk ===
'') {
} elseif ($chunk ===
'') {
/* if encoded words are not separated by linear-white-space we still catch them */
while ($match =
preg_match('/^(.*)=\?([^?]*)\?(Q|B)\?([^?]*)\?=(.*)$/Ui',$chunk,$res)) {
/* if the last chunk isn't an encoded string then put back the space, otherwise don't */
if ($iLastMatch !==
$j) {
/* decide about valid decoding */
/* convert string to different charset,
* if functions asks for it (usually in compose)
// convert string to html codes in order to display it
$replace =
preg_replace('/=([0-9a-f]{2})/ie', 'chr(hexdec("\1"))',
/* convert string to different charset,
* if functions asks for it (usually in compose)
// convert string to html codes in order to display it
if (!$encoded &&
$htmlsave) {
/* remove the first added space */
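The core of encoded-word decoding can be sketched as follows. This is a simplified sketch, not the function from this file: the helper name `decode_encoded_words_sketch` is an assumption, and charset conversion, HTML escaping and the XTRA_CODE hook are deliberately left out.

```php
<?php
/**
 * Hypothetical helper (not part of mime.php): decode the encoded-words
 * ("=?charset?Q?...?=" or "=?charset?B?...?=") found in a header value.
 * The decoded bytes are returned as-is, without charset conversion.
 */
function decode_encoded_words_sketch($header)
{
    // Whitespace between two adjacent encoded-words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([QB])\?([^?]*)\?=/i',
        function ($m) {
            if (strtoupper($m[2]) === 'B') {
                return base64_decode($m[3]);
            }
            // Q encoding: "_" stands for a space, "=XX" for a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

echo decode_encoded_words_sketch('=?US-ASCII?Q?Hello_world?= =?UTF-8?B?IQ==?='), "\n"; // prints "Hello world!"
```

A callback is used here instead of the `/e` pattern modifier, so the matched text is never evaluated as PHP code.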
* Function uses XTRA_CODE _encodeheader function, if such function exists.
* Function uses Q encoding by default and encodes a string according to RFC
* 1522 for use in headers if it contains 8-bit characters or anything that
* looks like it should be encoded.
* Function switches to B encoding and encodeHeaderBase64() function, if
* string is 8bit and multibyte character set supported by mbstring extension
* is used. It can cause E_USER_NOTICE errors, if interface is used with
* multibyte character set unsupported by mbstring extension.
* @param string $string header string, that has to be encoded
* @return string quoted-printable encoded string
* @todo make $mb_charsets system wide constant
if (isset
($languages[$squirrelmail_language]['XTRA_CODE']) &&
function_exists($languages[$squirrelmail_language]['XTRA_CODE'] .
'_encodeheader')) {
return call_user_func($languages[$squirrelmail_language]['XTRA_CODE'] .
'_encodeheader', $string);
// Use B encoding for multibyte charsets
$mb_charsets = array('utf-8','big5','gb2312','euc-kr');
if (in_array($default_charset,$mb_charsets) &&
} elseif (in_array($default_charset,$mb_charsets) &&
// Add E_USER_NOTICE error here (can cause 'Cannot add header information' warning in compose.php)
// trigger_error('encodeHeader: Multibyte character set unsupported by mbstring extension.',E_USER_NOTICE);
// Encode only if the string contains 8-bit characters or =?
$max_l =
75 -
strlen($default_charset) -
7;
$iEncStart =
$enc_init =
false;
for($i =
0; $i <
$j; ++
$i) {
if ($iEncStart ===
false) {
if ($cur_l >
($max_l-
2)) {
/* if there is a string part that doesn't need encoding, add it */
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?=";
if ($iEncStart !==
false) {
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?=";
if ($iEncStart !==
false) {
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?=";
if ($iEncStart ===
false) {
// do not start encoding in the middle of a string, also take the rest of the word.
$sLeadString =
substr($string,0,$i);
$aLeadString =
explode(' ',$sLeadString);
$iEncStart =
$i -
strlen($sToBeEncoded);
$cur_l +=
strlen($sToBeEncoded);
/* first we add the encoded string that reached its max size */
if ($cur_l >
($max_l-
2)) {
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?= "; /* the next part is also encoded => separate by space */
if ($iEncStart !==
false) {
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?=";
if ($iEncStart !==
false) {
$aRet[] =
substr($string,$iOffset,$iEncStart-
$iOffset);
$aRet[] =
"=?$default_charset?Q?$ret?=";
$aRet[] =
substr($string,$iOffset);
* Encodes string according to rfc2047 B encoding header formatting rules
* It is the recommended way to encode headers with character sets that store
* symbols in more than one byte.
* Function requires mbstring support. If required mbstring functions are missing,
* function returns false and sets E_USER_WARNING level error message.
* Minimal requirements - php 4.0.6 with mbstring extension. Please note,
* that mbstring functions will generate E_WARNING errors, if unsupported
* character set is used. mb_encode_mimeheader function provided by php
* mbstring extension is not used in order to get better control of header
* Used php code functions - function_exists(), trigger_error(), strlen()
* (is used with charset names and base64 strings). Used php mbstring
* functions - mb_strlen and mb_substr.
* Related documents: rfc 2045 (BASE64 encoding), rfc 2047 (mime header
* encoding), rfc 2822 (header folding)
* @param string $string header string that must be encoded
* @param string $charset character set. Must be supported by mbstring extension.
* Use sq_mb_list_encodings() to detect supported charsets.
* @return string string encoded according to rfc2047 B encoding formatting rules
* @todo First header line can be wrapped to $iMaxLength - $HeaderFieldLength - 1
* @todo Do we want to control max length of header?
* @todo Do we want to control EOL (end-of-line) marker?
* @todo Do we want to translate error message?
* Check mbstring function requirements.
trigger_error('encodeHeaderBase64: Required mbstring functions are missing.',E_USER_WARNING);
* header length = 75 symbols max (same as in encodeHeader)
* remove =? ? ?= (5 chars)
* remove 2 more chars (\r\n ?)
$iMaxLength = 75 - strlen($charset) - 7;
// set first character position
// loop through all characters. count characters and not bytes.
for ($iCharNum=1; $iCharNum<=mb_strlen($string,$charset); $iCharNum++) {
// encode string from starting character to current character.
$encoded_string = base64_encode(mb_substr($string,$iStartCharNum,$iCharNum-$iStartCharNum,$charset));
// Check encoded string length
if(strlen($encoded_string)>$iMaxLength) {
// if string exceeds max length, reduce number of encoded characters and add encoded string part to array
$aRet[] = base64_encode(mb_substr($string,$iStartCharNum,$iCharNum-$iStartCharNum-1,$charset));
// set new starting character
$iStartCharNum = $iCharNum-1;
// encode last char (in case it is last character in string)
$encoded_string = base64_encode(mb_substr($string,$iStartCharNum,$iCharNum-$iStartCharNum,$charset));
} // if string is shorter than max length - add next character
// add last encoded string to array
$aRet[] = $encoded_string;
// set initial return string
// loop through encoded strings
foreach($aRet as $string) {
// TODO: Do we want to control EOL (end-of-line) marker
if ($sRet!='') $sRet.= " ";
// add header tags and encoded string to return string
$sRet.= '=?'.$charset.'?B?'.$string.'?=';
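The character-by-character splitting described above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `encode_b_words_sketch` is hypothetical, and it splits characters with `preg_split('//u', ...)` instead of the mbstring functions, so it only handles valid UTF-8 input.

```php
<?php
/**
 * Hypothetical helper (not part of mime.php): B-encode a UTF-8 string as one
 * or more encoded-words of at most 75 characters each, never cutting a
 * multibyte character in half.
 */
function encode_b_words_sketch($string, $charset = 'utf-8')
{
    // Room left for base64 text once "=?charset?B?" and "?=" are accounted for.
    $max = 75 - strlen($charset) - 7;
    // Split into whole characters (assumes valid UTF-8 input).
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    $words   = array();
    $current = '';
    foreach ($chars as $char) {
        // Start a new encoded-word when adding this character would overflow.
        if ($current !== '' && strlen(base64_encode($current . $char)) > $max) {
            $words[] = base64_encode($current);
            $current = '';
        }
        $current .= $char;
    }
    if ($current !== '') {
        $words[] = base64_encode($current);
    }

    $out = array();
    foreach ($words as $word) {
        $out[] = '=?' . $charset . '?B?' . $word . '?=';
    }
    // Encoded-words are separated by a single space.
    return implode(' ', $out);
}

// Sixty two-byte characters ("\xC3\xA9" is e-acute in UTF-8).
echo encode_b_words_sketch(str_repeat("\xC3\xA9", 60)), "\n";
```

Each encoded-word decodes to whole characters on its own, which is the property a byte-oriented splitter cannot guarantee for multibyte character sets.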
/* This function tries to locate the entity_id of a specific mime element */
for ($i = 0, $ret = ''; $ret == '' && $i < count($message->entities); $i++) {
if ($message->entities[$i]->header->type0 ==
'multipart') {
// if (sq_check_save_extension($message->entities[$i])) {
return $message->entities[$i]->entity_id;
} elseif (!empty($message->entities[$i]->header->parameters['name'])) {
* This is part of a fix for Outlook Express 6.x generating
* cid URLs without creating content-id headers
return $message->entities[$i]->entity_id;
$filename =
$message->getFilename();
$save_extensions =
array('jpg','jpeg','gif','png','bmp');
return in_array($ext, $save_extensions);
* This function checks attribute values for entity-encoded values
* and returns them translated into 8-bit strings so we can run
* @param $attvalue A string to run entity check against.
* @return Nothing, modifies a reference value.
* Skip this if there aren't ampersands or backslashes.
if (strpos($attvalue, '&') ===
false
&&
strpos($attvalue, '\\') ===
false){
$m = $m || sq_deent($attvalue, '/\&#0*(\d+);*/s');
$m = $m || sq_deent($attvalue, '/\&#x0*((\d|[a-f])+);*/si', true);
$m = $m || sq_deent($attvalue, '/\\\\(\d+)/s', true);
* Kill any tabs, newlines, or carriage returns. Our friends the
* makers of the browser with 95% market value decided that it'd
* be funny to make "java[tab]script" be just as good as "javascript".
* @param attvalue The attribute value before extraneous spaces removed.
* @return attvalue Nothing, modifies a reference value.
$attvalue = str_replace(Array("\t", "\r", "\n", "\0", " "),
Array('', '', '', '', ''), $attvalue);
* This function returns the final tag out of the tag name, an array
* of attributes, and the type of the tag. This function is called by
* sq_sanitize internally.
* @param $tagname the name of the tag.
* @param $attary the array of attributes and their values
* @param $tagtype The type of the tag (see in comments).
* @return a string with the final tag representation.
$fulltag =
'</' .
$tagname .
'>';
$fulltag =
'<' .
$tagname;
while (list($attname, $attvalue) = each($attary)){
$fulltag .=
' ' .
join(" ", $atts);
* A small helper function to use with array_walk. Modifies a by-ref
* value and makes it lowercase.
* @param $val a value passed by-ref.
* @return void since it modifies a by-ref value.
* This function skips any whitespace from the current position within
* a string and to the next non-whitespace value.
* @param $body the string
* @param $offset the offset within the string where we should start
* looking for the next non-whitespace character.
* @return the location within the $body where the next
* non-whitespace char is located.
* This function looks for the next character within a string. It's
* really just a glorified "strpos", except it catches if failures
* @param $body The string to look for needle in.
* @param $offset Start looking from this position.
* @param $needle The character/string to look for.
* @return location of the next occurrence of the needle, or
* strlen($body) if needle wasn't found.
$pos = strpos($body, $needle, $offset);
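The documented return contract (position of the needle, or strlen($body) when it is not found) can be sketched as follows. The function name `find_next_str_sketch` is hypothetical and the body is an assumption based only on the comment above.

```php
<?php
/**
 * Hypothetical helper (not part of mime.php): like strpos(), but a miss is
 * reported as strlen($body) instead of false.
 */
function find_next_str_sketch($body, $offset, $needle)
{
    $pos = strpos($body, $needle, $offset);
    // strpos() can legitimately return 0, so compare against false strictly.
    return ($pos === false) ? strlen($body) : $pos;
}

var_dump(find_next_str_sketch('<b>bold</b>', 1, '<')); // int(7)
var_dump(find_next_str_sketch('<b>bold</b>', 1, '#')); // int(11)
```

Returning the string length on a miss lets a caller treat "not found" as "runs to the end of the body" without a separate false check.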
* This function takes a PCRE-style regexp and tries to match it
* @param $body The string to look for needle in.
* @param $offset Start looking from here.
* @param $reg A PCRE-style regex to match.
* @return Returns a false if no matches found, or an array
* with the following members:
* - integer with the location of the match within $body
* - string with whatever content between offset and the match
* - string with whatever it is we matched
if (!isset($matches[0]) || !$matches[0]){
$retarr[0] = $offset + strlen($matches[1]);
$retarr[1] = $matches[1];
$retarr[2] = $matches[2];
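The three-member return value described above can be sketched as follows. The function name `find_next_reg_sketch` is hypothetical, and the choice of "%" as the pattern delimiter is an assumption, so the pattern fragment passed in must not contain "%".

```php
<?php
/**
 * Hypothetical helper (not part of mime.php): find the next match of $reg at
 * or after $offset. $reg is a pattern fragment without delimiters.
 */
function find_next_reg_sketch($body, $offset, $reg)
{
    $matches = array();
    // Lazily capture everything up to the first match, then the match itself.
    if (!preg_match('%^(.*?)(' . $reg . ')%si', substr($body, $offset), $matches)) {
        return false;
    }
    return array(
        $offset + strlen($matches[1]), // location of the match within $body
        $matches[1],                   // content between $offset and the match
        $matches[2],                   // the matched text
    );
}

// Looks for the end of the tag name in '<img src="x">', starting after "<".
print_r(find_next_reg_sketch('<img src="x">', 1, '[\s>]'));
```

Here the result is position 4, the skipped text "img", and the matched single space.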
* This function looks for the next tag.
* @param $body String where to look for the next tag.
* @param $offset Start looking from here.
* @return false if no more tags exist in the body, or
* an array with the following members:
* - string with the name of the tag
* - array with attributes and their values
* - integer with tag type (1, 2, or 3)
* - integer where the tag starts (starting "<")
* - integer where the tag ends (ending ">")
* first three members will be false, if the tag is invalid.
* blah blah <tag attribute="value">
return Array(false, false, false, $lt, strlen($body));
* There are 3 kinds of tags:
* 3. XHTML-style content-less tag, e.g.:
switch (substr($body, $pos, 1)){
* A comment or an SGML declaration.
if (substr($body, $pos+
1, 2) ==
"--"){
$gt =
strpos($body, "-->", $pos);
return Array(false, false, false, $lt, $gt);
return Array(false, false, false, $lt, $gt);
* Assume tagtype 1 for now. If it's type 3, we'll switch values
* Look for next [\W-_], which will indicate the end of the tag name.
return Array(false, false, false, $lt, strlen($body));
list($pos, $tagname, $match) = $regary;
* $match can be either of these:
* '>' indicating the end of the tag entirely.
* '\s' indicating the end of the tag name.
* '/' indicating that this is type-3 xhtml tag.
* Whatever else we find there indicates an invalid tag.
* This is an xhtml-style tag with a closing / at the
* end, like so: <img src="blah" />. Check if it's followed
* by the closing bracket. If not, then this tag is invalid
if (substr($body, $pos, 2) ==
"/>"){
$retary =
Array(false, false, false, $lt, $gt);
return Array($tagname, false, $tagtype, $lt, $pos);
* Check if it's whitespace
* This is an invalid tag! Look for the next closing ">".
return Array(false, false, false, $lt, $gt);
* At this point we're here:
* <tagname attribute='blah'>
* At this point we loop in order to find all attributes.
while ($pos <=
strlen($body)){
return Array(false, false, false, $lt, $pos);
* See if we arrived at a ">" or "/>", which means that we reached
if ($matches[2] == "/>"){
return Array($tagname, $attary, $tagtype, $lt, $pos);
* There are several types of attributes, with optional
* [:space:] between members.
* attrname[:space:]=[:space:]'CDATA'
* attrname[:space:]=[:space:]"CDATA"
* attr[:space:]=[:space:]CDATA
* We leave types 1 and 2 the same, type 3 we check for
* '"' and convert to "&quot;" if needed, then wrap in
* double quotes. Type 4 we convert into:
* Looks like body ended before the end of tag.
return Array(false, false, false, $lt, strlen($body));
list($pos, $attname, $match) = $regary;
* We arrived at the end of attribute name. Several things possible
* '>' means the end of the tag and this is attribute type 4
* '/' if followed by '>' means the same thing as above
* '\s' means a lot of things -- look what it's followed by.
* anything else means the attribute is invalid.
* This is an xhtml-style tag with a closing / at the
* end, like so: <img src="blah" />. Check if it's followed
* by the closing bracket. If not, then this tag is invalid
if (substr($body, $pos, 2) ==
"/>"){
$retary =
Array(false, false, false, $lt, $gt);
$attary[$attname] = '"yes"';
return Array($tagname, $attary, $tagtype, $lt, $pos);
* Skip whitespace and see what we arrive at.
$char =
substr($body, $pos, 1);
* Two things are valid here:
* '=' means this is attribute type 1 2 or 3.
* \w means this was attribute type 4.
* anything else we ignore and re-loop. End of tag and
* invalid stuff will be caught by our checks at the beginning
* Here are 3 possibilities:
* everything else is the content of tag type 3
$quot =
substr($body, $pos, 1);
return Array(false, false, false, $lt, strlen($body));
list($pos, $attval, $match) = $regary;
$attary[$attname] = "'" . $attval . "'";
} else if ($quot == '"'){
return Array(false, false, false, $lt, strlen($body));
list($pos, $attval, $match) = $regary;
$attary[$attname] = '"' . $attval . '"';
* These are hateful. Look for \s, or >.
return Array(false, false, false, $lt, strlen($body));
list($pos, $attval, $match) = $regary;
* If it's ">" it will be caught at the top.
$attary[$attname] = '"' . $attval . '"';
* That was attribute type 4.
$attary[$attname] = '"yes"';
* An illegal character. Find next '>' and return.
return Array(false, false, false, $lt, $gt);
* The fact that we got here indicates that the tag end was never
* found. Return invalid tag indication so it gets stripped.
return Array(false, false, false, $lt, strlen($body));
* Translates entities into literal values so they can be checked.
* @param $attvalue the by-ref value to check.
* @param $regex the regular expression to check against.
* @param $hex whether the entities are hexadecimal.
* @return True or False depending on whether there were matches.
function sq_deent(&$attvalue, $regex, $hex=false){
for ($i = 0; $i < sizeof($matches[0]); $i++){
$numval = $matches[1][$i];
$repl[$matches[0][$i]] = chr($numval);
$attvalue = strtr($attvalue, $repl);
* This function runs various checks against the attributes.
* @param $tagname String with the name of the tag.
* @param $attary Array with all tag attributes.
* @param $rm_attnames See description for sq_sanitize
* @param $bad_attvals See description for sq_sanitize
* @param $add_attr_to_tag See description for sq_sanitize
* @param $message message object
* @return Array with modified attributes.
while (list($attname, $attvalue) = each($attary)){
* See if this attribute should be removed.
foreach ($rm_attnames as $matchtag=>
$matchattrs){
foreach ($matchattrs as $matchattr){
unset($attary[$attname]);
* Remove any backslashes, entities, and extraneous whitespace.
* Now let's run checks on the attvalues.
* I don't expect anyone to comprehend this. If you do,
* get in touch with me so I can drive to where you live and
* shake your hand personally. :)
foreach ($bad_attvals as $matchtag=>$matchattrs){
foreach ($matchattrs as $matchattr=>$valary){
* There are two arrays in valary.
* Second one is replacements
list($valmatch, $valrepl) = $valary;
if ($newvalue != $attvalue){
$attary{$attname} = $newvalue;
* Replace empty src tags with the blank image. src is only used
* for frames, images, and image inputs. Doing a replace should
* not affect them working as should be, however it will stop
* IE from being kicked off when src for img tags are not set
if (($attname == 'src') && ($attvalue == '""')) {
$attary{$attname} = '"' . SM_PATH . 'images/blank.png"';
* Turn cid: urls into http-friendly ones.
$attary{$attname} = sq_cid2http($message, $id, $attvalue, $mailbox);
* "Hack" fix for Outlook using proprietary outbind:// protocol in img tags.
* One day MS might actually make it match something useful, for now, falling
* back to using cid2http, so we can grab the blank.png.
if (preg_match("/^[\'\"]\s*outbind:\/\//si", $attvalue)) {
$attary{$attname} = sq_cid2http($message, $id, $attvalue, $mailbox);
* See if we need to append any attributes to this tag.
foreach ($add_attr_to_tag as $matchtag=>$addattary){
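The nested loops over $bad_attvals are easier to follow with a concrete data shape. The following is a minimal sketch, assuming a structure of tag regex => attribute regex => [patterns, replacements], as the comments above describe. The names example_bad_attvals and check_attvalue, and the "#blocked" replacement, are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical sample data shaped like the $bad_attvals entries:
// tag regex => attribute regex => [list of patterns, list of replacements].
function example_bad_attvals(): array
{
    return [
        '/.*/' => [
            '/^href|src/i' => [
                ['/^([\'"])\s*\S+script\s*:.*([\'"])/si'],
                ['\\1#blocked\\2'],
            ],
        ],
    ];
}

// Hypothetical helper: apply every matching pattern/replacement pair
// to a single (still quoted) attribute value.
function check_attvalue(string $tagname, string $attname, string $attvalue, array $bad_attvals): string
{
    foreach ($bad_attvals as $matchtag => $matchattrs) {
        if (!preg_match($matchtag, $tagname)) {
            continue;
        }
        foreach ($matchattrs as $matchattr => $valary) {
            if (!preg_match($matchattr, $attname)) {
                continue;
            }
            // First array holds the patterns, second one the replacements.
            [$valmatch, $valrepl] = $valary;
            $attvalue = preg_replace($valmatch, $valrepl, $attvalue);
        }
    }
    return $attvalue;
}
```

Because the value keeps its surrounding quotes, the patterns capture the opening and closing quote and put them back around the replacement.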
* This function edits the style definition to make them friendly and
* usable in SquirrelMail.
* @param $message the message object
* @param $id the message id
* @param $content a string with whatever is between <style> and </style>
* @param $mailbox the message mailbox
* @return a string with edited content.
function sq_fixstyle($body, $pos, $message, $id, $mailbox){
global $view_unsafe_images;
return array(FALSE, strlen($body));
$newpos = $ret[0] + strlen($ret[2]);
* First look for general BODY style declaration, which would be
* body {background: blah-blah}
* and change it to .bodyclass so we can just assign it to a <div>
$content = preg_replace("|body(\s*\{.*?\})|si", ".bodyclass\\1", $content);
$secremoveimg = '../images/' . _("sec_remove_eng.png");
* Fix url('blah') declarations.
// $content = preg_replace("|url\s*\(\s*([\'\"])\s*\S+script\s*:.*?([\'\"])\s*\)|si",
// "url(\\1$secremoveimg\\2)", $content);
// translate ur\l and variations (IE parses that)
$content = preg_replace("/(\\\\)?u(\\\\)?r(\\\\)?l(\\\\)?/i", 'url', $content);
// NB I insert NUL characters to avoid an infinite loop. They are removed after the loop.
while (preg_match("/url\s*\(\s*[\'\"]?([^:]+):(.*)?[\'\"]?\s*\)/si", $content, $matches)) {
* Fix url('https*://.*) declarations but only if $view_unsafe_images
if (!$view_unsafe_images){
$sExpr = "/url\s*\(\s*[\'\"]?\s*$sProto*:.*[\'\"]?\s*\)/si";
$content = preg_replace($sExpr, "u\0r\0l(\\1$secremoveimg\\2)", $content);
* Fix urls that refer to cid:
$cidurl = 'cid:'. $matches[2];
$httpurl = sq_cid2http($message, $id, $cidurl, $mailbox);
// escape parentheses that can modify the regular expression
$cidurl = str_replace(array('(',')'),array('\\(','\\)'),$cidurl);
"u\0r\0l($httpurl)", $content);
* replace url with protocol other than the white list
* http,https and cid by an empty string.
$content = preg_replace("/url\s*\(\s*[\'\"]?([^:]+):(.*)?[\'\"]?\s*\)/si",
* Remove any backslashes, entities, and extraneous whitespace.
* Fix stupid css declarations which lead to vulnerabilities
$match = Array('/\/\*.*\*\//',
$replace = Array('','idiocy', 'idiocy', 'idiocy', 'idiocy');
if ($contentNew !== $contentTemp) {
// insecure css declarations are used. From now on we don't care
// anymore if the css is destroyed by sq_deent, sq_unspace or sq_unbackslash
return array($content, $newpos);
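Two of the rewrites performed by sq_fixstyle can be tried in isolation. The sketch below reuses the two regular expressions shown in the listing (the body-to-.bodyclass rewrite and the ur\l normalization) inside a hypothetical wrapper named normalize_style_sketch; it deliberately leaves out the url() whitelist loop and the other checks, so it is not a substitute for the real function.

```php
<?php
// Hypothetical wrapper around two expressions taken from the listing.
function normalize_style_sketch(string $css): string
{
    // Turn "body { ... }" into ".bodyclass { ... }" so that the rule
    // can be attached to a <div class="bodyclass">.
    $css = preg_replace("|body(\s*\{.*?\})|si", ".bodyclass\\1", $css);
    // Collapse backslash-obfuscated spellings such as "u\rl" to "url",
    // so that later url(...) checks see what the browser would see.
    return preg_replace("/(\\\\)?u(\\\\)?r(\\\\)?l(\\\\)?/i", 'url', $css);
}
```

The normalization has to happen before any url(...) pattern is tested; otherwise a declaration written as u\rl(...) would slip past the later checks.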
* This function converts cid: url's into the ones that can be viewed in
* @param $message the message object
* @param $id the message id
* @param $cidurl the cid: url.
* @param $mailbox the message mailbox
* @return a string with a http-friendly url
function sq_cid2http($message, $id, $cidurl, $mailbox){
$quotchar = substr($cidurl, 0, 1);
if ($quotchar == '"' || $quotchar == "'"){
$match_str = '/\{.*?\}\//';
/* in case of non-safe cid links $httpurl should be replaced by a sort of
* This is part of a fix for Outlook Express 6.x generating
* cid URLs without creating content-id headers. These images are
* not part of the multipart/related html mail. The html contains
* <img src="cid:{some_id}/image_filename.ext"> references to
* attached images with as goal to render them inline although
* the attachment disposition property is not inline.
$httpurl = $quotchar . SM_PATH .
'src/download.php?absolute_dl=true&' .
"passed_id=$id&mailbox=" . urlencode($mailbox) .
'&ent_id=' . $linkurl . $quotchar;
* If we couldn't generate a proper img url, drop in a blank image
* instead of sending back empty, otherwise it causes unusual behaviour
$httpurl = $quotchar . SM_PATH . 'images/blank.png' . $quotchar;
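The shape of the download URL assembled above can be sketched as a standalone function. The name build_download_url and its parameters are hypothetical; the base path stands in for SM_PATH, and encoding the entity id with urlencode() is an extra precaution added in this sketch rather than something the listing shows.

```php
<?php
// Hypothetical helper: build a quoted download.php URL for one entity.
// $basePath plays the role of SM_PATH in the listing.
function build_download_url(string $basePath, int $id, string $mailbox, string $entId, string $quote = '"'): string
{
    return $quote . $basePath
        . 'src/download.php?absolute_dl=true&'
        . 'passed_id=' . $id
        . '&mailbox=' . urlencode($mailbox)
        // Encoding the entity id is an assumption of this sketch.
        . '&ent_id=' . urlencode($entId)
        . $quote;
}
```

The mailbox name must be URL-encoded because IMAP folder names can contain slashes, spaces, and other characters that are not safe in a query string.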
* This function changes the <body> tag into a <div> tag since we
* can't really have a body-within-body.
* @param $attary an array of attributes and values of <body>
* @param $mailbox mailbox we're currently reading (for cid2http)
* @param $message current message (for cid2http)
* @param $id current message id (for cid2http)
* @return a modified array of attributes to be set for <div>
function sq_body2div($attary, $mailbox, $message, $id){
$divattary = Array('class' => "'bodyclass'");
$has_bgc_stl = $has_txt_stl = false;
foreach ($attary as $attname=>$attvalue){
$quotchar = substr($attvalue, 0, 1);
$attvalue = sq_cid2http($message, $id, $attvalue, $mailbox);
$styledef .= "background-image: url('$attvalue'); ";
$styledef .= "background-color: $attvalue; ";
$styledef .= "color: $attvalue; ";
// Outlook defines a white bgcolor and no text color. This can lead to
// white text on a white bg with certain themes.
if ($has_bgc_stl && !$has_txt_stl) {
$styledef .= "color: $text; ";
$divattary{"style"} = "\"$styledef\"";
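The attribute-to-style mapping in sq_body2div, including the fallback for a background color without a text color, can be sketched as follows. The function name body_attrs_to_div and the $fallbackText parameter are assumptions; in the listing the fallback comes from a $text variable whose origin is not shown. The sketch also skips the background image handling and does no validation of the color values.

```php
<?php
// Hypothetical helper: map <body> attributes to <div> attributes.
// Attribute values are expected to arrive still quoted, as in the listing.
function body_attrs_to_div(array $attary, string $fallbackText = '#000000'): array
{
    $divattary = ['class' => "'bodyclass'"];
    $styledef = '';
    $hasBg = $hasText = false;
    foreach ($attary as $attname => $attvalue) {
        $attvalue = trim($attvalue, "\"'");
        switch (strtolower($attname)) {
            case 'bgcolor':
                $hasBg = true;
                $styledef .= "background-color: $attvalue; ";
                break;
            case 'text':
                $hasText = true;
                $styledef .= "color: $attvalue; ";
                break;
        }
    }
    // A background without a text color can give unreadable text with
    // some themes, so add an explicit text color (assumed default).
    if ($hasBg && !$hasText) {
        $styledef .= "color: $fallbackText; ";
    }
    if ($styledef !== '') {
        $divattary['style'] = '"' . $styledef . '"';
    }
    return $divattary;
}
```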
* This is the main function and the one you should actually be calling.
* There are several variables you should be aware of and which need
* Since the description is quite lengthy, see it here:
* http://linux.duke.edu/projects/mini/htmlfilter/
* @param $body the string with HTML you wish to filter
* @param $tag_list see description above
* @param $rm_tags_with_content see description above
* @param $self_closing_tags see description above
* @param $force_tag_closing see description above
* @param $rm_attnames see description above
* @param $bad_attvals see description above
* @param $add_attr_to_tag see description above
* @param $message message object
* @return sanitized html safe to show on your pages.
* Normalize rm_tags and rm_tags_with_content.
@array_walk($rm_tags_with_content, 'sq_casenormalize');
@array_walk($self_closing_tags, 'sq_casenormalize');
* See if tag_list is of tags to remove or tags to allow.
* false means remove these tags
* true means allow these tags
$trusted = "\n<!-- begin sanitized html -->\n";
* Take care of netscape's stupid javascript entities like
while (($curtag = sq_getnxtag($body, $curpos)) != FALSE){
list($tagname, $attary, $tagtype, $lt, $gt) = $curtag;
$free_content = substr($body, $curpos, $lt - $curpos);
if ($tagname == "style" && $tagtype == 1){
list($free_content, $curpos) =
if ($free_content != FALSE){
$trusted .= $free_content;
if ($skip_content == false){
$trusted .= $free_content;
if ($skip_content == $tagname){
* Got to the end of tag we needed to remove.
if ($skip_content == false){
if (isset($open_tags{$tagname}) && $open_tags{$tagname} > 0){
if ($skip_content == false){
* See if this is a self-closing type and change
&& in_array($tagname, $self_closing_tags)){
* See if we should skip this tag and any content
in_array($tagname, $rm_tags_with_content)){
$skip_content = $tagname;
if (isset($open_tags{$tagname})){
* This is where we run other checks.
if ($tagname != false && $skip_content == false){
if ($force_tag_closing == true){
foreach ($open_tags as $tagname=>$opentimes){
$trusted .= '</' . $tagname . '>';
$trusted .= "<!-- end sanitized html -->\n";
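The forced tag closing at the end of the sanitizer can be illustrated with a small sketch. The function name close_open_tags is hypothetical. It assumes $open_tags maps a tag name to the number of times that tag is still open, and it closes the most recently recorded tag first; that ordering is a choice made for this sketch, since the listing does not show the order in which the original emits the closing tags.

```php
<?php
// Hypothetical helper: emit closing tags for everything still open.
// $open_tags maps tag name => number of unclosed occurrences.
function close_open_tags(array $open_tags): string
{
    $out = '';
    // Walk the map backwards so later-recorded tags are closed first
    // (an assumption of this sketch).
    foreach (array_reverse($open_tags, true) as $tagname => $opentimes) {
        for ($i = 0; $i < $opentimes; $i++) {
            $out .= '</' . $tagname . '>';
        }
    }
    return $out;
}
```

Closing leftover tags keeps markup from a message from swallowing the surrounding page layout when the sender forgot, or deliberately omitted, a closing tag.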
* This is a wrapper function to call html sanitizing routines.
* @param $body the body of the message
* @param $id the id of the message
* @param boolean $take_mailto_links When TRUE, converts mailto: links
* into internal SM compose links
* (optional; default = TRUE)
* @return a string with html safe to display in the browser.
function magicHTML($body, $id, $message, $mailbox = 'INBOX', $take_mailto_links = true) {
// require_once(SM_PATH . 'functions/url_parser.php'); // for $MailTo_PReg_Match
global $attachment_common_show_images, $view_unsafe_images,
* Don't display attached images in HTML mode.
$attachment_common_show_images = false;
$rm_tags_with_content = Array(
$self_closing_tags = Array(
$force_tag_closing = true;
$secremoveimg = "../images/" . _("sec_remove_eng.png");
"/^([\'\"])\s*\S+script\s*:.*([\'\"])/si",
"/^([\'\"])\s*mocha\s*:*.*([\'\"])/si",
"/^([\'\"])\s*about\s*:.*([\'\"])/si"
"/^([\'\"])\s*\S+script\s*:.*([\'\"])/si",
"/^([\'\"])\s*mocha\s*:*.*([\'\"])/si",
"/^([\'\"])\s*about\s*:.*([\'\"])/si"
"/position\s*:\s*absolute/i",
"/(\\\\)?u(\\\\)?r(\\\\)?l(\\\\)?/i",
"/url\s*\(\s*([\'\"])\s*\S+script\s*:.*([\'\"])\s*\)/si",
"/url\s*\(\s*([\'\"])\s*mocha\s*:.*([\'\"])\s*\)/si",
"/url\s*\(\s*([\'\"])\s*about\s*:.*([\'\"])\s*\)/si",
"/(.*)\s*:\s*url\s*\(\s*([\'\"]*)\s*\S+script\s*:.*([\'\"]*)\s*\)/si"
$view_unsafe_images = false;
if (!$view_unsafe_images){
* Remove any references to http/https if view_unsafe_images set
array_push($bad_attvals{'/.*/'}{'/^src|background/i'}[0],
'/^([\'\"])\s*https*:.*([\'\"])/si');
array_push($bad_attvals{'/.*/'}{'/^src|background/i'}[1],
'/url\([\'\"]?https?:[^\)]*[\'\"]?\)/si');
"url(\\1$secremoveimg\\1)");
$add_attr_to_tag = Array(
Array('target'=> '"_blank"',
'title'=> '"'._("This external link will open in a new window").'"'
$has_unsafe_images = true;
// we want to parse mailto's in HTML output, change to SM compose links
// this is a modified version of code from url_parser.php... but Marc is
// right: we need a better filtering implementation; adding this randomly
// here is not a great solution
if ($take_mailto_links) {
// parseUrl($trusted); // this even parses URLs inside of tags... too aggressive
global $MailTo_PReg_Match;
$MailTo_PReg_Match = '/mailto:' . substr($MailTo_PReg_Match, 1) ;
if ((preg_match_all($MailTo_PReg_Match, $trusted, $regs)) && ($regs[0][0] != '')) {
foreach ($regs[0] as $i => $mailto_before) {
$mailto_params = $regs[10][$i];
// get rid of any trailing quote since we have to add send_to to the end
if (substr($mailto_before, strlen($mailto_before) - 1) == '"')
$mailto_before = substr($mailto_before, 0, strlen($mailto_before) - 1);
if (substr($mailto_params, strlen($mailto_params) - 1) == '"')
$mailto_params = substr($mailto_params, 0, strlen($mailto_params) - 1);
if ($regs[1][$i]) { //if there is an email addr before '?', we need to merge it with the params
$to = 'to=' . $regs[1][$i];
if (strpos($mailto_params, 'to=') > -1) //already a 'to='
$mailto_params = str_replace('to=', $to . '%2C%20', $mailto_params);
if ($mailto_params) //already some params, append to them
$mailto_params .= '&' . $to;
$mailto_params .= '?' . $to;
$url_str = preg_replace(array('/to=/i', '/(?<!b)cc=/i', '/bcc=/i'), array('send_to=', 'send_to_cc=', 'send_to_bcc='), $mailto_params);
// we'll already have target=_blank, no need to allow comp_in_new
// here (which would be a lot more work anyway)
$temp_comp_in_new = $compose_new_win;
$compose_new_win = $temp_comp_in_new;
// remove <a href=" and anything after the next quote (we only
// need the uri, not the link HTML) in compose uri
$comp_uri = substr($comp_uri, 9);
$comp_uri = substr($comp_uri, 0, strpos($comp_uri, '"', 1));
$trusted = str_replace($mailto_before, $comp_uri, $trusted);
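The renaming of mailto: parameters into compose-form parameters relies on one preg_replace() call with three patterns. The sketch below isolates that call in a hypothetical function named mailto_params_to_compose; the patterns and replacements are the ones from the listing, while the wrapper and the sample input are illustrative.

```php
<?php
// Hypothetical wrapper around the preg_replace() call from the listing.
function mailto_params_to_compose(string $mailto_params): string
{
    return preg_replace(
        // The lookbehind (?<!b) keeps the cc= pattern from matching
        // the tail of bcc=, which is handled by the third pattern.
        ['/to=/i', '/(?<!b)cc=/i', '/bcc=/i'],
        ['send_to=', 'send_to_cc=', 'send_to_bcc='],
        $mailto_params
    );
}
```

Without the negative lookbehind, "bcc=" would first be rewritten to "bsend_to_cc=" and the third pattern would never match.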
* function SendDownloadHeaders - send file to the browser
* Original Source: SM core src/download.php
* moved here to make it available to other code, and separate
* front end from back end functionality.
* @param string $type0 first half of mime type
* @param string $type1 second half of mime type
* @param string $filename filename to tell the browser for downloaded file
* @param boolean $force whether to force the download dialog to pop
* @param optional integer $filesize send the Content-Header and length to the browser
$isIE = $isIE6plus = false;
if (strstr($HTTP_USER_AGENT, 'compatible; MSIE ') !== false &&
strstr($HTTP_USER_AGENT, 'Opera') === false) {
if (preg_match('/compatible; MSIE ([0-9]+)/', $HTTP_USER_AGENT, $match) &&
((int)$match[1]) >= 6 &&
strstr($HTTP_USER_AGENT, 'Opera') === false) {
if (isset($languages[$squirrelmail_language]['XTRA_CODE']) &&
function_exists($languages[$squirrelmail_language]['XTRA_CODE'] . '_downloadfilename')) {
call_user_func($languages[$squirrelmail_language]['XTRA_CODE'] . '_downloadfilename', $filename, $HTTP_USER_AGENT);
// A Pox on Microsoft and its Internet Explorer!
// IE has lots of bugs with file downloads.
// It also has problems with SSL. Both of these cause problems
// for us in this function.
// See this article on Cache Control headers and SSL
// http://support.microsoft.com/default.aspx?scid=kb;en-us;323308
// The best thing you can do for IE is to upgrade to the latest
//set all the Cache Control Headers for IE
header ("Cache-Control: no-store, max-age=0, no-cache, must-revalidate"); // HTTP/1.1
header ("Cache-Control: post-check=0, pre-check=0", false);
header ("Cache-Control: private");
//set the inline header for IE, we'll add the attachment header later if we need it
header ("Content-Disposition: inline; filename=$filename");
// Try to show in browser window
header ("Content-Disposition: inline; filename=\"$filename\"");
header ("Content-Type: $type0/$type1; name=\"$filename\"");
// Try to pop up the "save as" box
// IE makes this hard. It pops up 2 save boxes, or none.
// http://support.microsoft.com/support/kb/articles/Q238/5/88.ASP
// http://support.microsoft.com/default.aspx?scid=kb;EN-US;260519
// But, according to Microsoft, it is "RFC compliant but doesn't
// take into account some deviations that allowed within the
// specification." Doesn't that mean RFC non-compliant?
// http://support.microsoft.com/support/kb/articles/Q258/4/52.ASP
// all browsers need the application/octet-stream header for this
header ("Content-Type: application/octet-stream; name=\"$filename\"");
// http://support.microsoft.com/support/kb/articles/Q182/3/15.asp
// Do not have quotes around filename, but that applied to
// "attachment"... does it apply to inline too?
header ("Content-Disposition: attachment; filename=\"$filename\"");
if ($isIE && !$isIE6plus) {
// This combination seems to work mostly. IE 5.5 SP 1 has
// known issues (see the Microsoft Knowledge Base)
// This works for most types, but doesn't work with Word files
header ("Content-Type: application/download; name=\"$filename\"");
// These are spares, just in case. :-)
//header("Content-Type: $type0/$type1; name=\"$filename\"");
//header("Content-Type: application/x-msdownload; name=\"$filename\"");
//header("Content-Type: application/octet-stream; name=\"$filename\"");
// another application/octet-stream forces download for Netscape
header ("Content-Type: application/octet-stream; name=\"$filename\"");
//send the content-length header if the calling function provides it
header("Content-Length: $filesize");
} // end fn SendDownloadHeaders
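The core choice made by SendDownloadHeaders, inline display versus a forced download, can be sketched without the browser-specific branches. The function build_download_headers below is hypothetical: it returns the header lines instead of calling header(), which makes it easy to inspect, and it strips quotes and line breaks from the filename as a precaution that the listing does not show. The Internet Explorer workarounds and cache-control headers are omitted.

```php
<?php
// Hypothetical helper: compute the download headers as strings.
function build_download_headers(string $type0, string $type1, string $filename, bool $force, int $filesize = 0): array
{
    // Assumption of this sketch: drop characters that could break the
    // quoted filename or inject extra header lines.
    $safe = str_replace(['"', "\r", "\n"], '', $filename);
    $headers = [];
    if ($force) {
        // Ask the browser to show the "save as" dialog.
        $headers[] = 'Content-Type: application/octet-stream; name="' . $safe . '"';
        $headers[] = 'Content-Disposition: attachment; filename="' . $safe . '"';
    } else {
        // Try to show the content in the browser window.
        $headers[] = 'Content-Type: ' . $type0 . '/' . $type1 . '; name="' . $safe . '"';
        $headers[] = 'Content-Disposition: inline; filename="' . $safe . '"';
    }
    if ($filesize > 0) {
        $headers[] = 'Content-Length: ' . $filesize;
    }
    return $headers;
}
```

A caller would pass each returned line to header() before sending the body.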
Documentation generated on Sat, 07 Oct 2006 16:12:37 +0300 by phpDocumentor 1.3.0RC6