
Web.config/Machine.config Optimal Settings For ASP.NET website

These are tips we have collected from different sources, together with our own experience as ASP.NET web developers :). More updates will come, so stay in touch with us...

For production websites, it is important to remember to set <compilation debug="false" /> in Web.config. This ensures that no unnecessary debug code is generated for the release version of the website. If you are not using some of the ASP.NET modules, such as Windows Authentication or Passport Authentication, they can be removed from the ASP.NET processing pipeline, where they would otherwise be loaded unnecessarily. Below is an example of some modules that could be removed from the pipeline:

<httpModules>
  <remove name="WindowsAuthentication"/>
  <remove name="PassportAuthentication"/>
  <remove name="AnonymousIdentification"/>
  <remove name="UrlAuthorization"/>
  <remove name="FileAuthorization"/>
  <remove name="OutputCache"/>
</httpModules>
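
For reference, here is a minimal sketch of where the debug setting mentioned above lives in Web.config (our illustration, not from the original source):

<configuration>
  <system.web>
    <compilation debug="false" />
  </system.web>
</configuration>
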
ASP.NET Process Model configuration defines process-level properties such as how many threads ASP.NET uses, how long it blocks a thread before timing out, how many requests to keep waiting for IO work to complete, and so on. On fast servers with a lot of RAM, the process model configuration can be tweaked so that the ASP.NET process consumes more system resources and provides better scalability from each server. The settings below can help performance (a cut-down version from an excellent article here):
<processModel
enable="true"
timeout="Infinite"
idleTimeout="Infinite"
shutdownTimeout="00:00:05"
requestLimit="Infinite"
requestQueueLimit="5000"
restartQueueLimit="10"
memoryLimit="60"
responseDeadlockInterval="00:03:00"
responseRestartDeadlockInterval="00:03:00"
maxWorkerThreads="100"
maxIoThreads="100"
minWorkerThreads="40"
minIoThreads="30"
asyncOption="20"
maxAppDomains="2000"
/>

Collected from: Code Project

Minimize Calls to DataBinder.Eval (Collected from MSDN)

The DataBinder.Eval method uses reflection to evaluate the arguments that are passed in and to return the results. If you have a table that has 100 rows and 10 columns, you call DataBinder.Eval 1,000 times if you use DataBinder.Eval on each column. Your choice to use DataBinder.Eval is multiplied 1,000 times in this scenario. Limiting the use of DataBinder.Eval during data binding operations significantly improves page performance. Consider the following ItemTemplate element within a Repeater control using DataBinder.Eval.

<ItemTemplate>
  <tr>
    <td><%# DataBinder.Eval(Container.DataItem,"field1") %></td>
    <td><%# DataBinder.Eval(Container.DataItem,"field2") %></td>
  </tr>
</ItemTemplate>

There are alternatives to using DataBinder.Eval in this scenario. The alternatives include the following:

Use explicit casting. Using explicit casting offers better performance by avoiding the cost of reflection. Cast the Container.DataItem as a DataRowView.

<%# ((DataRowView)Container.DataItem)["field1"] %>

You can gain even better performance with explicit casting if you use a DataReader to bind your control and use the specialized methods to retrieve your data. Cast the Container.DataItem as a DbDataRecord.

<ItemTemplate>
  <tr>
     <td><%# ((DbDataRecord)Container.DataItem).GetString(0) %></td>
     <td><%# ((DbDataRecord)Container.DataItem).GetInt32(1) %></td>
  </tr>
</ItemTemplate>

The explicit casting depends on the type of data source you are binding to; the preceding code illustrates an example.

Use the ItemDataBound event. If the record that is being data bound contains many fields, it may be more efficient to use the ItemDataBound event. By using this event, you only perform the type conversion once. The following sample uses a DataSet object.

protected void Repeater_ItemDataBound(Object sender, RepeaterItemEventArgs e)
{
  // Header, footer, and separator items carry no DataItem, so skip them.
  if (e.Item.ItemType != ListItemType.Item &&
      e.Item.ItemType != ListItemType.AlternatingItem)
    return;

  // The cast is performed once per row instead of once per field.
  DataRowView drv = (DataRowView)e.Item.DataItem;
  Response.Write(string.Format("<td>{0}</td>", drv["field1"]));
  Response.Write(string.Format("<td>{0}</td>", drv["field2"]));
  Response.Write(string.Format("<td>{0}</td>", drv["field3"]));
  Response.Write(string.Format("<td>{0}</td>", drv["field4"]));
}

Ajax: A New Approach to Web Applications

If anything about current interaction design can be called “glamorous,” it’s creating Web applications. After all, when was the last time you heard someone rave about the interaction design of a product that wasn’t on the Web? (Okay, besides the iPod.) All the cool, innovative new projects are online.

Despite this, Web interaction designers can’t help but feel a little envious of our colleagues who create desktop software. Desktop applications have a richness and responsiveness that has seemed out of reach on the Web. The same simplicity that enabled the Web’s rapid proliferation also creates a gap between the experiences we can provide and the experiences users can get from a desktop application.

That gap is closing. Take a look at Google Suggest. Watch the way the suggested terms update as you type, almost instantly. Now look at Google Maps. Zoom in. Use your cursor to grab the map and scroll around a bit. Again, everything happens almost instantly, with no waiting for pages to reload.

Google Suggest and Google Maps are two examples of a new approach to web applications that we at Adaptive Path have been calling Ajax. The name is shorthand for Asynchronous JavaScript + XML, and it represents a fundamental shift in what’s possible on the Web.

Defining Ajax

Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways. Ajax incorporates:

  • standards-based presentation using XHTML and CSS;
  • dynamic display and interaction using the Document Object Model;
  • data interchange and manipulation using XML and XSLT;
  • and JavaScript binding everything together.

The classic web application model works like this: Most user actions in the interface trigger an HTTP request back to a web server. The server does some processing — retrieving data, crunching numbers, talking to various legacy systems — and then returns an HTML page to the client. It’s a model adapted from the Web’s original use as a hypertext medium, but as fans of The Elements of User Experience know, what makes the Web good for hypertext doesn’t necessarily make it good for software applications.

This approach makes a lot of technical sense, but it doesn’t make for a great user experience. While the server is doing its thing, what’s the user doing? That’s right, waiting. And at every step in a task, the user waits some more.

Obviously, if we were designing the Web from scratch for applications, we wouldn’t make users wait around. Once an interface is loaded, why should the user interaction come to a halt every time the application needs something from the server? In fact, why should the user see the application go to the server at all?

How Ajax is Different

An Ajax application eliminates the start-stop-start-stop nature of interaction on the Web by introducing an intermediary — an Ajax engine — between the user and the server. It seems like adding a layer to the application would make it less responsive, but the opposite is true.

Instead of loading a webpage, at the start of the session, the browser loads an Ajax engine — written in JavaScript and usually tucked away in a hidden frame. This engine is responsible for both rendering the interface the user sees and communicating with the server on the user’s behalf. The Ajax engine allows the user’s interaction with the application to happen asynchronously — independent of communication with the server. So the user is never staring at a blank browser window and an hourglass icon, waiting around for the server to do something.

Every user action that normally would generate an HTTP request takes the form of a JavaScript call to the Ajax engine instead. Any response to a user action that doesn’t require a trip back to the server — such as simple data validation, editing data in memory, and even some navigation — the engine handles on its own. If the engine needs something from the server in order to respond — if it’s submitting data for processing, loading additional interface code, or retrieving new data — the engine makes those requests asynchronously, usually using XML, without stalling a user’s interaction with the application.

Who’s Using Ajax

Google is making a huge investment in developing the Ajax approach. All of the major products Google has introduced over the last year — Orkut, Gmail, the latest beta version of Google Groups, Google Suggest, and Google Maps — are Ajax applications. (For more on the technical nuts and bolts of these Ajax implementations, check out these excellent analyses of Gmail, Google Suggest, and Google Maps.) Others are following suit: many of the features that people love in Flickr depend on Ajax, and Amazon’s A9.com search engine applies similar techniques.

These projects demonstrate that Ajax is not only technically sound, but also practical for real-world applications. This isn’t another technology that only works in a laboratory. And Ajax applications can be any size, from the very simple, single-function Google Suggest to the very complex and sophisticated Google Maps.

At Adaptive Path, we’ve been doing our own work with Ajax over the last several months, and we’re realizing we’ve only scratched the surface of the rich interaction and responsiveness that Ajax applications can provide. Ajax is an important development for Web applications, and its importance is only going to grow. And because there are so many developers out there who already know how to use these technologies, we expect to see many more organizations following Google’s lead in reaping the competitive advantage Ajax provides.

Moving Forward

The biggest challenges in creating Ajax applications are not technical. The core Ajax technologies are mature, stable, and well understood. Instead, the challenges are for the designers of these applications: to forget what we think we know about the limitations of the Web, and begin to imagine a wider, richer range of possibilities.

It’s going to be fun.

Ajax Q&A

March 13, 2005: Since we first published Jesse’s essay, we’ve received an enormous amount of correspondence from readers with questions about Ajax. In this Q&A, Jesse responds to some of the most common queries.

Q. Did Adaptive Path invent Ajax? Did Google? Did Adaptive Path help build Google’s Ajax applications?

A. Neither Adaptive Path nor Google invented Ajax. Google’s recent products are simply the highest-profile examples of Ajax applications. Adaptive Path was not involved in the development of Google’s Ajax applications, but we have been doing Ajax work for some of our other clients.

Q. Is Adaptive Path selling Ajax components or trademarking the name? Where can I download it?

A. Ajax isn’t something you can download. It’s an approach — a way of thinking about the architecture of web applications using certain technologies. Neither the Ajax name nor the approach is proprietary to Adaptive Path.

Q. Is Ajax just another name for XMLHttpRequest?

A. No. XMLHttpRequest is only part of the Ajax equation. XMLHttpRequest is the technical component that makes the asynchronous server communication possible; Ajax is our name for the overall approach described in the article, which relies not only on XMLHttpRequest, but on CSS, DOM, and other technologies.

Q. Why did you feel the need to give this a name?

A. I needed something shorter than “Asynchronous JavaScript+CSS+DOM+XMLHttpRequest” to use when discussing this approach with clients.

Q. Techniques for asynchronous server communication have been around for years. What makes Ajax a “new” approach?

A. What’s new is the prominent use of these techniques in real-world applications to change the fundamental interaction model of the Web. Ajax is taking hold now because these technologies and the industry’s understanding of how to deploy them most effectively have taken time to develop.

Q. Is Ajax a technology platform or is it an architectural style?

A. It’s both. Ajax is a set of technologies being used together in a particular way.

Q. What kinds of applications is Ajax best suited for?

A. We don’t know yet. Because this is a relatively new approach, our understanding of where Ajax can best be applied is still in its infancy. Sometimes the traditional web application model is the most appropriate solution to a problem.

Q. Does this mean Adaptive Path is anti-Flash?

A. Not at all. Macromedia is an Adaptive Path client, and we’ve long been supporters of Flash technology. As Ajax matures, we expect that sometimes Ajax will be the better solution to a particular problem, and sometimes Flash will be the better solution. We’re also interested in exploring ways the technologies can be mixed (as in the case of Flickr, which uses both).

Q. Does Ajax have significant accessibility or browser compatibility limitations? Do Ajax applications break the back button? Is Ajax compatible with REST? Are there security considerations with Ajax development? Can Ajax applications be made to work for users who have JavaScript turned off?

A. The answer to all of these questions is “maybe”. Many developers are already working on ways to address these concerns. We think there’s more work to be done to determine all the limitations of Ajax, and we expect the Ajax development community to uncover more issues like these along the way.

Q. Some of the Google examples you cite don’t use XML at all. Do I have to use XML and/or XSLT in an Ajax application?

A. No. XML is the most fully-developed means of getting data in and out of an Ajax client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data for interchange.

Q. Are Ajax applications easier to develop than traditional web applications?

A. Not necessarily. Ajax applications inevitably involve running complex JavaScript code on the client. Making that complex code efficient and bug-free is not a task to be taken lightly, and better development tools and frameworks will be needed to help us meet that challenge.

Q. Do Ajax applications always deliver a better experience than traditional web applications?

A. Not necessarily. Ajax gives interaction designers more flexibility. However, the more power we have, the more caution we must use in exercising it. We must be careful to use Ajax to enhance the user experience of our applications, not degrade it.


Jesse James Garrett is the Director of User Experience Strategy and a founder of Adaptive Path. He is the author of the widely-referenced book The Elements of User Experience.

Other essays by Jesse James Garrett include The Nine Pillars of Successful Web Teams and Six Design Lessons From the Apple Store.

(Collected from Adaptive Path)


.NET Framework 3.5

.NET Framework (NetFx or Fx) version 3.5 has two elements to it that must be understood: the green bits and the red bits. The original references to these terms are in old blog posts by Soma and Jason. Compared to those two blog entries I have the advantage of 13 months of hindsight :-), so I will provide here the details behind those descriptions in my own words, starting with my own slide:

[Slide: .NET Framework 3.5]

When we say red bits, we mean Framework bits that exist in RTM today, i.e. NetFx v2.0 and NetFx v3.0.

NetFx v3.5 includes updates for those two existing frameworks. However, those updates are not a whole bunch of new features or changes; in reality they are a service pack, consisting predominantly of bug fixes and performance improvements. So, to revisit the terminology: Fx 3.5 includes v2.0 SP1 and v3.0 SP1. As with all service packs, there should be nothing in there that could break your application. Having said that, if a bug is fixed in the SP and your code was taking advantage of that bug, then of course your code will break. To be absolutely clear, this is an in-place upgrade to v2 and v3, not a side-by-side story at the framework/CLR level. I will not focus any more on the Service Pack (red bits) improvements in Fx 3.5. If you are interested in those, you may wish to read my previous blog posts here, here, here and here.

When we say green bits, we mean brand new assemblies with brand new types in them. These simply add to the .NET Framework (they do not change or remove anything), just as Fx 3.0 simply added to v2.0 without changing existing assemblies and without changing the CLR engine. It is here that you find the brand new stuff to talk about. In Beta 1, the list of new assemblies (green bits) is:

1. System.Data.Linq.dll – The implementation for LINQ to SQL.

2. System.Xml.Linq.dll – The implementation for LINQ to XML.

3. System.AddIn.dll, System.AddIn.Contract.dll – New AddIn (plug-in) model.

4. System.Net.dll – Peer to Peer APIs.

5. System.DirectoryServices.AccountManagement.dll – Wrapper for Active Directory APIs.

6. System.Management.Instrumentation.dll – WMI 2.0 managed provider (combined with System.Management namespace in System.Core.dll).

7. System.WorkflowServices.dll and System.ServiceModel.Web.dll – WF and WCF enhancements (for more on WF + WCF in v3.5 follow links from here).

8. System.Web.Extensions.dll – The implementation for ASP.NET AJAX (for more web enhancements, follow links from here) plus also the implementation of Client Application Services and the three ASP.NET 3.5 controls.

9. System.Core.dll – In addition to the LINQ to Objects implementation, this assembly includes the following: HashSet, TimeZoneInfo, Pipes, ReaderWriterLockSlim, System.Security.*, System.Diagnostics.Eventing.* and System.Diagnostics.PerformanceData.
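
To make the green bits concrete, here is a minimal C# sketch of ours (not from the original post) that exercises two of the new assemblies: LINQ to Objects from System.Core.dll and LINQ to XML from System.Xml.Linq.dll.

using System;
using System.Linq;      // LINQ to Objects lives in System.Core.dll
using System.Xml.Linq;  // LINQ to XML lives in System.Xml.Linq.dll

class GreenBitsDemo
{
    static void Main()
    {
        // LINQ to Objects: query an in-memory collection.
        int[] numbers = { 5, 12, 8, 21, 3 };
        var big = numbers.Where(n => n > 7).OrderBy(n => n);

        // LINQ to XML: build an XML fragment functionally from the query.
        var doc = new XElement("numbers",
            big.Select(n => new XElement("n", n)));

        Console.WriteLine(doc);  // prints a <numbers> element containing 8, 12, 21
    }
}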

Collected from Daniel Moth blog


UTF-8 Encoding

While working on a web-based application that needed to support Netscape Communicator 4.x+ and Microsoft Internet Explorer 5.x+, I discovered that the older versions of these browsers had poor support for UTF-8 encoding. I needed to find a way to make form field entries URL-safe and also needed to support multiple languages. The JavaScript escape() function fixes ASCII characters that are not valid for use in URLs, but does not handle Unicode characters well. To make matters worse, there were browser incompatibilities: using escape() in IE would generate a new string that looked like %unnnn, where each n is a hexadecimal digit. The correct encoding should follow RFC 2279 and be a set of hexadecimal digit pairs like %nn%nn. Netscape 4 would just treat the characters as ASCII, which would result in lost accents and umlauts.

The encodeURIComponent() function introduced in IE 5.5, Netscape 6, and Mozilla does exactly what is needed. However, since the function is unavailable in Netscape 4.x and IE 5, a different solution is needed. All JavaScript strings are Unicode, so I expected that it would be possible to encode them properly. Thankfully, someone saw my plea for help and sent me some helpful example code.

I have faced this problem myself. When making an Ajax call with jQuery, even though I had set the contentType to "application/json;charset=utf-8", it still did not work for accented languages (like Vietnamese). After searching, I realized that we have to use encodeURIComponent() to encode Unicode characters before submitting them to the server. So I am noting it here in the hope that it is useful to others who face the same problem.
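
For the server side of the same story, ASP.NET can decode the percent-encoded UTF-8 that encodeURIComponent() produces with HttpUtility.UrlDecode, which assumes UTF-8 by default. A minimal sketch (our addition, with a made-up sample value):

using System;
using System.Web;  // reference System.Web.dll

class DecodeDemo
{
    static void Main()
    {
        // Percent-encoded UTF-8, as produced by JavaScript's encodeURIComponent().
        string encoded = "Ti%E1%BA%BFng%20Vi%E1%BB%87t";
        // UrlDecode interprets the bytes as UTF-8 by default, so accents survive.
        Console.WriteLine(HttpUtility.UrlDecode(encoded));  // "Tiếng Việt"
    }
}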

Collected from http://www.worldtimzone.com/res/encode


Learning to Use Regular Expressions

This tutorial was first published by IBM developerWorks. This version contains a few minor corrections that readers have suggested since the original publication. An expanded and updated version can be found in my book, Text Processing in Python.

Matching Patterns in Text: The Basics


Character literals

/a/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
The very simplest pattern matched by a regular expression is a literal character or a sequence of literal characters. Anything in the target text that consists of exactly those characters in exactly the order listed will match. A lower case character is not identical to its upper case version, and vice versa. A space in a regular expression, by the way, matches a literal space in the target (this is unlike most programming languages or command-line tools, where spaces separate keywords).
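
If you want to try these patterns from .NET, the System.Text.RegularExpressions namespace is all you need. A small sketch of ours (not part of the original tutorial):

using System;
using System.Text.RegularExpressions;

class LiteralDemo
{
    static void Main()
    {
        string target = "Mary had a little lamb.";
        Console.WriteLine(Regex.IsMatch(target, "Mary"));      // True
        Console.WriteLine(Regex.IsMatch(target, "mary"));      // False: case matters
        Console.WriteLine(Regex.IsMatch(target, "a little"));  // True: the space is literal
    }
}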

"Escaped" characters literals

/.*/

Special characters must be escaped.*

/\.\*/
Special characters must be escaped.*
A number of characters have special meanings to regular expressions. A symbol with a special meaning can be matched, but to do so you must prefix it with the backslash character (this includes the backslash character itself: to match one backslash in the target, your regular expression should include "\\").

Positional special characters

/^Mary/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.

/Mary$/

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
Two special characters are used in almost all regular expression tools to mark the beginning and end of a line: caret (^) and dollarsign ($). To match a caret or dollarsign as a literal character, you must escape it (i.e. precede it by a backslash "\").

An interesting thing about the caret and dollarsign is that they match zero-width patterns. That is, the length of the string matched by a caret or dollarsign by itself is zero (but the rest of the regular expression can still depend on the zero-width match). Many regular expression tools provide another zero-width pattern for word-boundary (\b). Words might be divided by whitespace like spaces, tabs, newlines, or other characters like nulls; the word-boundary pattern matches the actual point where a word starts or ends, not the particular whitespace characters.
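
Here is a .NET sketch of ours showing that anchors and word boundaries consume no characters:

using System;
using System.Text.RegularExpressions;

class ZeroWidthDemo
{
    static void Main()
    {
        string target = "Mary had a little lamb.";
        Console.WriteLine(Regex.IsMatch(target, "^Mary"));     // True: anchored at the start
        Console.WriteLine(Regex.IsMatch(target, @"lamb\.$"));  // True: anchored at the end
        // The \b assertions are zero-width: the match is 4 characters long,
        // all of them from "lamb".
        Console.WriteLine(Regex.Match(target, @"\blamb\b").Length);  // 4
    }
}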

The "wildcard" character

/.a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
In regular expressions, a period can stand for any character. Normally, the newline character is not included, but most tools have optional switches to force inclusion of the newline character also. Using a period in a pattern is a way of requiring that "something" occurs here, without having to decide what.

Users who are familiar with DOS command-line wildcards will know the question-mark as filling the role of "some character" in command masks. But in regular expressions, the question-mark has a different meaning, and the period is used as a wildcard.

Grouping regular expressions

/(Mary)( )(had)/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
A regular expression can have literal characters in it, and also zero-width positional patterns. Each literal character or positional pattern is an atom in a regular expression. You may also group several atoms together into a small regular expression that is part of a larger regular expression. One might be inclined to call such a grouping a "molecule," but normally it is also called an atom.

In older Unix-oriented tools like grep, subexpressions must be grouped with escaped parentheses, e.g. /\(Mary\)/. In Perl and most more recent tools (including egrep), grouping is done with bare parentheses, but matching a literal parenthesis requires escaping it in the pattern (the example to the side follows Perl).

Character classes

/[a-z]a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
Rather than name only a single character, you can include a pattern in a regular expression that matches any of a set of characters.

A set of characters can be given as a simple list inside square brackets, e.g. /[aeiou]/ will match any single lowercase vowel. For letter or number ranges you may also use only the first and last letter of a range, with a dash in the middle, e.g. /[A-Ma-m]/ will match any lowercase or uppercase letter in the first half of the alphabet.

Many regular expression tools also provide escape-style shortcuts to the most commonly used character classes, such as \s for a whitespace character and \d for a digit. You could always define these character classes with square brackets, but the shortcuts can make regular expressions more compact and more readable.
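
The same classes and shortcuts work unchanged in .NET; a quick sketch of ours:

using System;
using System.Text.RegularExpressions;

class CharClassDemo
{
    static void Main()
    {
        // [aeiou] matches any single lowercase vowel.
        Console.WriteLine(Regex.Matches("Mary had a little lamb", "[aeiou]").Count);  // 6
        // \d is shorthand for [0-9]; \s matches any whitespace character.
        Console.WriteLine(Regex.IsMatch("room 101", @"\d{3}"));  // True
        Console.WriteLine(Regex.IsMatch("a\tb", @"a\sb"));       // True: tab is whitespace
    }
}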

Complement operator

/[^a-z]a/ 

Mary had a little lamb.
And everywhere that Mary
went, the lamb was sure
to go.
The caret symbol can actually have two different meanings in regular expressions. Most of the time, it means to match the zero-length pattern for line beginnings. But if it is used at the beginning of a character class, it reverses the meaning of the character class. Everything not included in the listed character set is matched.

Alternation of patterns

/cat|dog|bird/

The pet store sold cats, dogs, and birds.

/=first|second=/

=first first= # =second second= # =first= # =second=

/(=)(first)|(second)(=)/

=first first= # =second second= # =first= # =second=

/=(first|second)=/

=first first= # =second second= # =first= # =second=
Using character classes is a way of indicating that either one thing or another thing can occur in a particular spot. But what if you want to specify that either of two whole subexpressions occur in a position in the regular expression? For that, you use the alternation operator, the vertical bar ("|"). This is the symbol that is also used to indicate a pipe in Unix/DOS shells, and is sometimes called the pipe character.

The pipe character in a regular expression indicates an alternation between everything in the group enclosing it. What this means is that even if there are several groups to the left and right of a pipe character, the alternation greedily asks for everything on both sides. To select the scope of the alternation, you must define a group that encompasses the patterns that may match. The example illustrates this.
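
The scoping rule just described is easy to verify in .NET (our sketch):

using System;
using System.Text.RegularExpressions;

class AlternationDemo
{
    static void Main()
    {
        string target = "=first= # =second=";
        // Ungrouped: the alternation splits the entire pattern,
        // so this means "=first" OR "second=".
        Console.WriteLine(Regex.Match(target, "=first|second=").Value);    // "=first"
        // Grouped: the alternation is confined to the parentheses.
        Console.WriteLine(Regex.Match(target, "=(first|second)=").Value);  // "=first="
    }
}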

The basic abstract quantifier

/@(=+=)*@/ 

Match with zero in the middle: @@
Subexpression occurs, but...: @=+=ABC@
Lots of occurrences: @=+==+==+==+==+=@
Must repeat entire pattern: @=+==+=+==+=@
One of the most powerful and common things you can do with regular expressions is to specify how many times an atom occurs in a complete regular expression. Sometimes you want to specify something about the occurrence of a single character, but very often you are interested in specifying the occurrence of a character class or a grouped subexpression.

There is only one quantifier included with "basic" regular expression syntax, the asterisk ("*"); in English this has the meaning "some or none" or "zero or more." If you want to specify that any number of an atom may occur as part of a pattern, follow the atom by an asterisk.

Without quantifiers, grouping expressions doesn't really serve as much purpose, but once we can add a quantifier to a subexpression we can say something about the occurrence of the subexpression as a whole. Take a look at the example.

Matching Patterns in Text: Intermediate


More abstract quantifiers

/A+B*C?D/

AAAD
ABBBBCD
BBBCD
ABCCD
AAABBBC
In a certain way, the lack of any quantifier symbol after an atom quantifies the atom anyway: it says the atom occurs exactly once. Extended regular expressions (which most tools support) add a few other useful numbers to "once exactly" and "zero or more times." The plus-sign ("+") means "one or more times" and the question-mark ("?") means "zero or one times." These quantifiers are by far the most common enumerations you wind up naming.

If you think about it, you can see that the extended regular expressions do not actually let you "say" anything the basic ones do not. They just let you say it in a shorter and more readable way. For example, "(ABC)+" is equivalent to "(ABC)(ABC)*"; and "X(ABC)?Y" is equivalent to "XABCY|XY". If the atoms being quantified are themselves complicated grouped subexpressions, the question-mark and plus-sign can make things a lot shorter.

Numeric quantifiers

/a{5} b{,6} c{4,8}/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a+ b{3,} c?/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc

/a{5} b{6,} c{4,8}/

aaaaa bbbbb ccccc
aaa bbb ccc
aaaaa bbbbbbbbbbbbbb ccccc
Using extended regular expressions, you can specify arbitrary pattern occurrence counts using a more verbose syntax than the question-mark, plus-sign, and asterisk quantifiers. The curly-braces ("{" and "}") can surround a precise count of how many occurrences you are looking for.

The most general form of the curly-brace quantification uses two range arguments (the first must be no larger than the second, and both must be non-negative integers). The occurrence count is specified this way to fall between the minimum and maximum indicated (inclusive). As shorthand, either argument may be left empty: if so the minimum/maximum is specified as zero/infinity, respectively. If only one argument is used (with no comma in there), exactly that number of occurrences are matched.
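
The curly-brace forms work the same way in .NET, with one caveat: an empty minimum such as {,6} is not recognized as a quantifier there, so spell it {0,6}. A sketch of ours:

using System;
using System.Text.RegularExpressions;

class CountDemo
{
    static void Main()
    {
        Console.WriteLine(Regex.IsMatch("aaaaa", "^a{5}$"));   // True: exactly five
        Console.WriteLine(Regex.IsMatch("bbb", "^b{0,6}$"));   // True: zero to six
        Console.WriteLine(Regex.IsMatch("ccc", "^c{4,8}$"));   // False: needs four to eight
        Console.WriteLine(Regex.IsMatch("ddddd", "^d{3,}$"));  // True: three or more
    }
}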

Backreferences

/(abc|xyz) \1/

jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz

/(abc|xyz) (abc|xyz)/

jkl abc xyz
jkl xyz abc
jkl abc abc
jkl xyz xyz
One powerful option in creating search patterns is specifying that a subexpression that was matched earlier in a regular expression is matched again later in the expression. We do this using backreferences. Backreferences are named by the numbers 1 through 9, preceded by the backslash/escape character when used in this manner. These backreferences refer to each successive group in the match pattern, as in /(one)(two)(three) \1\2\3/. Each numbered backreference refers to the group that, in this example, has the word corresponding to the number.

It is important to note something the example illustrates. What gets matched by a backreference is the same literal string matched the first time, even if the pattern that matched the string could have matched other strings. Simply repeating the same grouped subexpression later in the regular expression does not match the same targets as using a backreference (but you have to decide what it is you actually want to match in either case).

Backreferences refer back to whatever occurred in the previous grouped expressions, in the order those grouped expressions occurred. Because of the naming convention (\1-\9), many tools limit you to nine backreferences. Some tools allow actual naming of backreferences and/or saving them to program variables. The more advanced parts of this tutorial touch on these topics.
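
A .NET sketch of ours contrasting a backreference with a merely repeated group:

using System;
using System.Text.RegularExpressions;

class BackrefDemo
{
    static void Main()
    {
        // \1 must repeat the exact string the first group matched...
        Console.WriteLine(Regex.IsMatch("abc abc", @"(abc|xyz) \1"));  // True
        Console.WriteLine(Regex.IsMatch("abc xyz", @"(abc|xyz) \1"));  // False
        // ...whereas repeating the group allows a different alternative.
        Console.WriteLine(Regex.IsMatch("abc xyz", @"(abc|xyz) (abc|xyz)"));  // True
    }
}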

Don't match more than you want to

/th.*s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much
Quantifiers in regular expressions are greedy. That is, they match as much as they possibly can.

Probably the easiest mistake to make in composing regular expressions is to match too much. When you use a quantifier, you want it to match everything (of the right sort) up to the point where you want to finish your match. But when using the "*", "+" or numeric quantifiers, it is easy to forget that the last bit you are looking for might occur later in a line than the one you are interested in.

Tricks for restraining matches

/th[^s]*./

-- I want to match the words that start
-- with 'th' and end with 's'.
this
thus
thistle
this line matches too much
Often if you find that your regular expressions are matching too much, a useful procedure is to reformulate the problem in your mind. Rather than thinking about "what am I trying to match later in the expression?" ask yourself "what do I need to avoid matching in the next part?" Often this leads to more parsimonious pattern matches. Often the way to avoid a pattern is to use the complement operator and a character class. Look at the example, and think about how it works.

The trick here is that there are two different ways of formulating almost the same sequence. You can either think you want to keep matching until you get to XYZ, or you can think you want to keep matching unless you get to XYZ. These are subtly different.

For people who have thought about basic probability, the same pattern occurs. The chance of rolling a 6 on a die in one roll is 1/6. What is the chance of rolling a 6 in six rolls? A naive calculation puts the odds at 1/6+1/6+1/6+1/6+1/6+1/6, or 100%. This is wrong, of course (after all, the chance after twelve rolls isn't 200%). The correct calculation is "how do I avoid rolling a 6 for six rolls?" -- i.e. 5/6*5/6*5/6*5/6*5/6*5/6, or about 33%. The chance of getting a 6 is the same chance as not avoiding it (or about 66%). In fact, if you imagine transcribing a series of dice rolls, you could apply a regular expression to the written record, and similar thinking applies.

Comments on modification tools

Not all tools that use regular expressions allow you to modify target strings. Some simply locate the matched pattern; the most widely used regular expression tool is probably grep, which is a tool for searching only. Text editors, for example, may or may not allow replacement in their regular expression search facility. As always, consult the documentation on your individual tool.

Of the tools that allow you to modify target text, there are a few differences to keep in mind. The way you actually specify replacements will vary between tools: a text editor might have a dialog box; command-line tools will use delimiters between match and replacement; programming languages will typically call functions with arguments for the match and replacement patterns.

Another important difference to keep in mind is what is getting modified. Unix-oriented command-line tools typically utilize pipes and STDOUT for changes to buffers, rather than modifying files in place. Using a sed command, for example, will write the modifications to the console, but will not change the original target file. Text editors or programming languages are more likely to actually modify a file in place.

A note on modification examples

For purposes of this tutorial, examples will continue to use the sed-style slash delimiters. Specifically, the examples will indicate the substitution command and the global modifier, as with "s/this/that/g". This expression means: "Replace the string 'this' with the string 'that' everywhere in the target text."

Examples will consist of the modification command, an input line, and an output line. The output line will have any changes emphasized. Also, each input/output line will be preceded by a less-than or greater-than symbol to help distinguish them (the order will be as described also), which is suggestive of redirection symbols in Unix shells.

A literal-string modification example

s/cat/dog/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild dogs, bobdogs, lions, and other wild dogs.
Let us take a look at a couple of modification examples that build on what we have already covered. This one simply substitutes some literal text for some other literal text. The search-and-replace capability of many tools can do this much, even without using regular expressions.

A pattern-match modification example

s/cat|dog/snake/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had wild snakes, bobsnakes, lions, and other wild snakes.

s/[a-z]+i[a-z]*/nice/g 

< The zoo had wild dogs, bobcats, lions, and other wild cats.
> The zoo had nice dogs, bobcats, nice, and other nice cats.
Most of the time, if you are using regular expressions to modify a target text, you will want to match more general patterns than just literal strings. Whatever is matched is what gets replaced (even if it is several different strings in the target).

Modification using backreferences

s/([A-Z])([0-9]{2,4}) /\2:\1 /g 

< A37 B4 C107 D54112 E1103 XXX
> 37:A B4 107:C D54112 1103:E XXX
It is nice to be able to insert a fixed string everywhere a pattern occurs in a target text. But frankly, doing that is not very context sensitive. A lot of times, we do not want just to insert fixed strings, but rather to insert something that bears much more relation to the matched patterns. Fortunately, backreferences come to our rescue here. You can use backreferences in the pattern-matches themselves, but it is even more useful to be able to use them in replacement patterns. By using replacement backreferences, you can pick and choose from the matched patterns to use just the parts you are interested in.

To aid readability, subexpressions will be grouped with bare parentheses (as with Perl), rather than with escaped parentheses (as with sed).
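
In .NET, Regex.Replace plays the role of s///g, and replacement backreferences are written $1, $2 rather than \1, \2. A sketch of ours reproducing the example above:

using System;
using System.Text.RegularExpressions;

class ReplaceDemo
{
    static void Main()
    {
        string target = "A37 B4 C107 D54112 E1103 XXX";
        // Swap a capital letter with the two-to-four digits that follow it.
        Console.WriteLine(Regex.Replace(target, @"([A-Z])([0-9]{2,4}) ", "$2:$1 "));
        // -> "37:A B4 107:C D54112 1103:E XXX"
    }
}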

Another warning on mismatching

This tutorial has already warned about the danger of matching too much with your regular expression patterns. But the danger is so much more serious when you do modifications, that it is worth repeating. If you replace a pattern that matches a larger string than you thought of when you composed the pattern, you have potentially deleted some important data from your target.

It is always a good idea to try out your regular expressions on diverse target data that is representative of your production usage. Make sure you are matching what you think you are matching. A stray quantifier or wildcard can make a surprisingly wide variety of texts match what you thought was a specific pattern. And sometimes you just have to stare at your pattern for a while, or find another set of eyes, to figure out what is really going on even after you see what matches. Familiarity might breed contempt, but it also instills competence.

Advanced Regular Expression Extensions


About advanced features

Some very useful enhancements are included in some regular expression tools. These enhancements often make the composition and maintenance of regular expressions considerably easier. But check with your own tool to see what is supported.

The programming language Perl is probably the most sophisticated tool for regular expression processing, which explains much of its popularity. The examples illustrated will use Perl-ish code to explain concepts. Other programming languages, especially other scripting languages such as Python, have a similar range of enhancements. But for purposes of illustration, Perl's syntax most closely mirrors the regular expression tools it builds on, such as ed, ex, grep, sed, and awk.

Non-greedy quantifiers

/th.*s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this line matches just right
this # thus # thistle

/th.*?s/

-- I want to match the words that start
-- with 'th' and end with 's'.
this # thus # thistle
this line matches just right

/th.*?s /

-- I want to match the words that start
-- with 'th' and end with 's'. (FINALLY!)
this # thus # thistle
this line matches just right
Earlier in the tutorial, the problems of matching too much were discussed, and some workarounds were suggested. Some regular expression tools are nice enough to make this easier by providing optional non-greedy quantifiers. These quantifiers grab as little as possible while still matching whatever comes next in the pattern (instead of as much as possible).

Non-greedy quantifiers have the same syntax as regular greedy ones, except with the quantifier followed by a question-mark. For example, a non-greedy pattern might look like: "/A[A-Z]*?B/". In English, this means "match an A, followed by only as many capital letters as are needed to find a B."

One little thing to look out for is the fact that the pattern "/[A-Z]*?./" will always match zero capital letters. If you use non-greedy quantifiers, watch out for matching too little, which is a symmetric danger.
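
.NET supports the same non-greedy forms; a sketch of ours showing the difference on the tutorial's example line:

using System;
using System.Text.RegularExpressions;

class NonGreedyDemo
{
    static void Main()
    {
        string target = "this line matches too much";
        Console.WriteLine(Regex.Match(target, "th.*s").Value);   // "this line matches"
        Console.WriteLine(Regex.Match(target, "th.*?s").Value);  // "this"
    }
}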

Pattern-match modifiers

/M.*[ise] /

MAINE # Massachusetts # Colorado #
mississippi # Missouri # Minnesota #

/M.*[ise] /i

MAINE # Massachusetts # Colorado #
mississippi # Missouri # Minnesota #

/M.*[ise] /gis

MAINE # Massachusetts # Colorado #
mississippi # Missouri # Minnesota #
We already saw one pattern-match modifier in the modification examples: the global modifier. In fact, in many regular expression tools, we should have been using the "g" modifier for all our pattern matches. Without the "g", many tools will match only the first occurrence of a pattern on a line in the target. So this is a useful modifier (but not one you necessarily want to use always). Let us look at some others.

As a little mnemonic, it is nice to remember the word "gismo" (it even seems somehow appropriate). The most frequent modifiers are:

  • g - Match globally
  • i - Case-insensitive match
  • s - Treat string as single line
  • m - Treat string as multiple lines
  • o - Only compile pattern once

The o option is an implementation optimization, and not really a regular expression issue (but it helps the mnemonic). The single-line option allows the wildcard to match a newline character (it won't otherwise). The multiple-line option causes "^" and "$" to match the begin and end of each line in the target, not just the begin/end of the target as a whole (with sed or grep this is the default). The insensitive option ignores differences between case of letters.
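
In .NET the trailing-letter modifiers become RegexOptions flags, and matching is effectively "global" whenever you use Regex.Matches or Regex.Replace. A sketch of ours:

using System;
using System.Text.RegularExpressions;

class OptionsDemo
{
    static void Main()
    {
        // i: RegexOptions.IgnoreCase
        Console.WriteLine(Regex.IsMatch("MISSISSIPPI", "miss", RegexOptions.IgnoreCase));  // True
        // s: RegexOptions.Singleline lets the wildcard match newlines too.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline));  // True
        // m: RegexOptions.Multiline makes ^ and $ match at every line.
        Console.WriteLine(Regex.Matches("one\ntwo", @"^\w+$", RegexOptions.Multiline).Count);  // 2
    }
}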

Changing backreference behavior

s/([A-Z])(?:-[a-z]{3}-)([0-9]*)/\1\2/g

< A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93
> A37 # B:abcd:142 # C66 # D93
Backreferencing in replacement patterns is very powerful; but it is also easy to use more than nine groups in a complex regular expression. Quite apart from using up the available backreference names, it is often more legible to refer to the parts of a replacement pattern in sequential order. To handle this issue, some regular expression tools allow "grouping without backreferencing."

A group that should not also be treated as a back reference has a question-mark colon at the beginning of the group, as in "(?:pattern)." In fact, you can use this syntax even when your backreferences are in the search pattern itself.
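
The same (?:...) syntax works in .NET; a short sketch of ours:

using System;
using System.Text.RegularExpressions;

class NonCapturingDemo
{
    static void Main()
    {
        // (?:...) groups for quantification without creating a backreference,
        // so the digits land in group 1 instead of group 2.
        Match m = Regex.Match("abc-123", @"(?:[a-z]+)-([0-9]+)");
        Console.WriteLine(m.Groups[1].Value);  // "123"
    }
}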

Naming backreferences


import re
txt = "A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93"
print re.sub("(?P<prefix>[A-Z])(-[a-z]{3}-)(?P<id>[0-9]*)",
             "\g<prefix>\g<id>", txt) 

A37 # B:abcd:142 # C66 # D93
The language Python offers a particularly handy syntax for really complex pattern backreferences. Rather than just play with the numbering of matched groups, you can give them a name.

The syntax of using regular expressions in Python is a standard programming language function/method style of call, rather than Perl- or sed-style slash delimiters. Check your own tool to see if it supports this facility.
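
.NET has the same facility with a slightly different spelling: named groups are written (?<name>...) and referenced in a replacement as ${name}. A sketch of ours mirroring the Python example:

using System;
using System.Text.RegularExpressions;

class NamedGroupDemo
{
    static void Main()
    {
        string txt = "A-xyz-37 # B:abcd:142 # C-wxy-66 # D-qrs-93";
        Console.WriteLine(Regex.Replace(txt,
            @"(?<prefix>[A-Z])(-[a-z]{3}-)(?<id>[0-9]*)",
            "${prefix}${id}"));
        // -> "A37 # B:abcd:142 # C66 # D93"
    }
}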

Lookahead assertions

s/([A-Z]-)(?=[a-z]{3})([a-z0-9]* )/\2\1/g

< A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
> xyz37A- # B-ab6142 # C-Wxy66 # qrs93D-

s/([A-Z]-)(?![a-z]{3})([a-z0-9]* )/\2\1/g

< A-xyz37 # B-ab6142 # C-Wxy66 # D-qrs93
> A-xyz37 # ab6142B- # Wxy66C- # D-qrs93
Another trick of advanced regular expression tools is "lookahead assertions." These are similar to regular grouped subexpressions, except they do not actually grab what they match. There are two advantages to using lookahead assertions. On the one hand, a lookahead assertion can function in a similar way to a group that is not backreferenced; that is, you can match something without counting it in backreferences. More significantly, however, a lookahead assertion can specify that the next chunk of a pattern has a certain form, but let a different subexpression actually grab it (usually for purposes of backreferencing that other subexpression).

There are two kinds of lookahead assertions: positive and negative. As you would expect, a positive assertion specifies that something does come next, and a negative one specifies that something does not come next. Emphasizing their connection with non-backreferenced groups, the syntax for lookahead assertions is similar: (?=pattern) for positive assertions, and (?!pattern) for negative assertions.
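
Both assertion types use the same syntax in .NET. A sketch of ours, built on simplified data:

using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        string target = "A-xyz37 # B-ab6142";
        // Positive: swap only where three lowercase letters follow the dash.
        Console.WriteLine(Regex.Replace(target,
            @"([A-Z]-)(?=[a-z]{3})([a-z0-9]*)", "$2$1"));  // "xyz37A- # B-ab6142"
        // Negative: swap only where they do not.
        Console.WriteLine(Regex.Replace(target,
            @"([A-Z]-)(?![a-z]{3})([a-z0-9]*)", "$2$1"));  // "A-xyz37 # ab6142B-"
    }
}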

Making regular expressions more readable


/               # identify URLs within a text file
          [^="] # do not match URLs in IMG tags like:
                # <img src="http://mysite.com/mypic.png">
http|ftp|gopher # make sure we find a resource type
          :\/\/ # ...needs to be followed by colon-slash-slash
      [^ \n\r]+ # stuff other than space, newline, CR is in URL
    (?=[\s\.,]) # assert: followed by whitespace/period/comma
/

The URL for my site is: http://mysite.com/mydoc.html.  You
might also enjoy ftp://yoursite.com/index.html for a good
place to download files.
In the later examples we have started to see just how complicated regular expressions can get. These examples are not the half of it. It is possible to do some almost absurdly difficult-to-understand things with regular expressions (but ones that are nonetheless useful).

There are two basic facilities that some of the more advanced regular expression tools use in clarifying expressions. One is allowing regular expressions to continue over multiple lines (by ignoring whitespace like trailing spaces and newlines). The second is allowing comments within regular expressions. Some tools allow you to do one or another of these things alone, but when it gets complicated, do both!

The example given uses Perl's extend modifier to enable commented multi-line regular expressions. Consult the documentation for your own tool for details on how to compose these.
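
In .NET the equivalent of Perl's extend modifier is RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and allows # comments (whitespace inside a character class still counts). A sketch of ours:

using System;
using System.Text.RegularExpressions;

class ReadableDemo
{
    static void Main()
    {
        var url = new Regex(@"
            (http|ftp|gopher)  # make sure we find a resource type
            ://                # followed by colon-slash-slash
            [^ \n\r]+          # then anything up to whitespace
        ", RegexOptions.IgnorePatternWhitespace);

        Console.WriteLine(url.Match("a good place to download files is ftp://yoursite.com/index.html for example").Value);
        // -> "ftp://yoursite.com/index.html"
    }
}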

From http://gnosis.cx/publish/programming/regular_expressions.html


Rendering a control as an Html String in ASP.NET 2.0

It sounded simple enough. I wanted to programmatically take the contents of a form, including TextBoxes, drop-down lists, checkboxes, etc., and email it to my client as a sort of "snapshot" of what the user had filled out on the form. I had done this countless times in ASP.NET 1.1, but 2.0 refused to budge. I kept getting error after error. Here is the basic code I was using. Since I seem to have a mental block on streams, text writers, and any other part of the framework that uses the Decorator pattern, I had to look it up:

StringWriter swriter = new StringWriter();
HtmlTextWriter twrite = new HtmlTextWriter(swriter);
this.MainPanel.RenderControl(twrite);
string text = swriter.ToString();

This gave me the following exception: "Control 'TextBox1' of type 'TextBox' must be placed inside a form tag with runat=server." I remembered this from .NET 1.1, and my solution had been to create a server-side form on the fly in the code-behind, place the controls that I wanted to render into it, and then render the form. The code would then look like this:

StringWriter swriter = new StringWriter();
HtmlTextWriter twrite = new HtmlTextWriter(swriter);
HtmlForm form = new HtmlForm();
form.Controls.Add(this.MainPanel);
form.RenderControl(twrite);
string text = swriter.ToString();

Unfortunately, this code didn't work either. I got another exception: "HtmlForm cannot render without a reference to the Page instance. Make sure your form has been added to the control tree." This left me puzzled. I did NOT remember this from ASP.NET 1.1. Why does the framework care if I create an HTML form without binding it to the page? What changed from ASP.NET 1.1 to 2.0?

Still, I tried to give it what it wanted with this line of code:

form.Page = this.Page;

Still, ASP.NET was not satisfied. I now got another error: "RegisterForEventValidation can only be called during Render();" By now I was cursing Microsoft and spouting obscenities to myself. Fortunately, only one other developer named Tina was around, and she's used to me by now.

I searched Google over and over for a way of successfully rendering a control as an HTML string, which turned up a bunch of code that only worked in 1.1. Finally, after all my research, I found the answer. In ASP.NET 2.0, you must do two things. First, override the VerifyRenderingInServerForm() method. You don't have to actually do anything in this method; just override it. This removes the need to create the HtmlForm, but you do have to override the method in your page. Next, you have to set a page directive: EnableEventValidation="false". This turns off event validation, which basically means ASP.NET no longer checks that the postback data coming from your controls originated from those controls. There is a vague MSDN article here.

I don't know why so much changed since ASP.NET 1.1 that the old code didn't work. But my real question to Microsoft is this: why doesn't EVERY control have a RenderAsString() method that will automatically do this for me? Why do I have to jump through all these hoops to get a string from my controls? This seems like very common functionality. In any case, these two things got it to work, so you can take a control, render it as a string, and email it, save it to disk, or whatever you want. All the research has been done for you, and here is my final example code. This example takes the resulting string and stores it to a file on disk, but you could email it, save it to a database, or do whatever you wanted.

Default.aspx

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" EnableEventValidation="false" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
  <title>Untitled Page</title>
</head>
<body>
  <form id="form1" runat="server">
  <div>
    <asp:Panel runat="server" ID="MainPanel">
      <asp:TextBox ID="TextBox1" runat="server"></asp:TextBox>
      <asp:Button ID="SendButton" runat="server" OnClick="SendButton_Click" Text="Send" />
    </asp:Panel>
  </div>
  </form>
</body>
</html>

Default.aspx.cs

using System;
using System.Data;
using System.Configuration;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Text;
using System.IO;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
    }

    protected void SendButton_Click(object sender, EventArgs e)
    {
        StringWriter swriter = new StringWriter();
        HtmlTextWriter twrite = new HtmlTextWriter(swriter);
        this.MainPanel.RenderControl(twrite);
        string text = swriter.ToString();

        File.WriteAllText(Server.MapPath("~/output.htm"), text);
    }

    public override void VerifyRenderingInServerForm(Control control)
    {
    }
}
