A while ago I needed to use AWS S3 (Amazon's cloud-based file storage) to store some files and then download them or get their listings through C#. Ironically, I found no ready-made .NET file repository or decent documentation for doing this with S3, so I decided to create a file repository in C# which lets .NET developers access S3 programmatically.
Here is a rundown as to how you would work with AWS S3 through C#:
In order to use any Amazon Web Services (AWS) API you will have to add the NuGet package provided by Amazon. Simply bring up the NuGet Package Manager window and search for the keyword AWS. The first item in the search results is most likely AWS SDK for .NET, which must be installed before you can access S3.
Once the SDK is installed we will have to find the properties of our S3 bucket and place them somewhere in the web.config (or app.config) file. Normally these three properties of the S3 bucket are required in order to access it securely:
Access key
Secret key
Region endpoint
These details will be provided to you by your cloud administrator. Here is a list of regions and their endpoints that you can place in your configuration file (e.g. us-west-1):
US Standard: you can use one of the following two endpoints (HTTP and HTTPS):
s3.amazonaws.com (Northern Virginia or Pacific Northwest)
s3-external-1.amazonaws.com (Northern Virginia only)
US West (Oregon) region: us-west-2 (HTTP and HTTPS)
US West (N. California) region: us-west-1 (HTTP and HTTPS)
EU (Ireland) region: EU or eu-west-1 (HTTP and HTTPS)
EU (Frankfurt) region: eu-central-1 (HTTP and HTTPS)
Asia Pacific (Singapore) region: ap-southeast-1 (HTTP and HTTPS)
Asia Pacific (Sydney) region: ap-southeast-2 (HTTP and HTTPS)
Asia Pacific (Tokyo) region: ap-northeast-1 (HTTP and HTTPS)
South America (Sao Paulo) region: sa-east-1 (HTTP and HTTPS)
In order to avoid adding the secret key, access key and region endpoint to the <appSettings> section of your configuration file, and to keep this tool more organised, I have created a configuration class for it. This configuration class lets you access the <configurationSection> element that is related to S3. To configure your app.config (or web.config) file you will have to add these <sectionGroup> and <section> elements to your configuration file:
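The declaration might look like the following. The namespace and assembly name here are placeholders; use whatever your project actually contains:

```xml
<configuration>
  <configSections>
    <sectionGroup name="AspGuy">
      <!-- Assumed namespace/assembly; point this at the assembly containing S3FileRepositoryConfig -->
      <section name="S3Repository"
               type="S3FileRepository.S3FileRepositoryConfig, S3FileRepository" />
    </sectionGroup>
  </configSections>
</configuration>
```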
The S3FileRepositoryConfig class inherits from the ConfigurationSection class and has properties that map to configuration elements of your .config file. A sample configuration for S3 looks like this:
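Something along these lines; the attribute values are obviously placeholders, and the exact attribute names in the original project may differ slightly:

```xml
<AspGuy>
  <S3Repository AccessKey="YOUR_ACCESS_KEY"
                SecretKey="YOUR_SECRET_KEY"
                BucketName="my-bucket"
                RegionEndPoint="us-west-1"
                RootDir="" />
</AspGuy>
```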
Note that <AspGuy> comes from the name attribute of the <sectionGroup name="AspGuy"> element, and the <S3Repository> tag comes from the name of the <section> element. Each property of S3FileRepositoryConfig is mapped to an attribute of the <S3Repository> element.
Apart from SecretKey, AccessKey and BucketName you can specify a root directory name as well. This optional setting is there so you can begin accessing the S3 bucket from a specific folder rather than from its root. For example, imagine a bucket with the given folder structure:
If you set the RootDir property to "" then calling the GetSubDirNames method of the S3 file repository will return "Dir1", because Dir1 is the only top-level folder in the bucket. If you set the RootDir property to "Dir1" and then call GetSubDirNames, you will get two entries: "Dir1_1" and "Dir1_2".
Here is the code of the configuration class mentioned above:
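A minimal sketch of such a class is shown below. It assumes attribute names matching the sample configuration; the real class in the repository may declare more properties or different casing:

```csharp
using System.Configuration;

// Maps the <S3Repository> element's attributes to strongly typed properties.
public class S3FileRepositoryConfig : ConfigurationSection
{
    [ConfigurationProperty("AccessKey", IsRequired = true)]
    public string AccessKey
    {
        get { return (string)this["AccessKey"]; }
        set { this["AccessKey"] = value; }
    }

    [ConfigurationProperty("SecretKey", IsRequired = true)]
    public string SecretKey
    {
        get { return (string)this["SecretKey"]; }
        set { this["SecretKey"] = value; }
    }

    [ConfigurationProperty("BucketName", IsRequired = true)]
    public string BucketName
    {
        get { return (string)this["BucketName"]; }
        set { this["BucketName"] = value; }
    }

    [ConfigurationProperty("RegionEndPoint", IsRequired = true)]
    public string RegionEndPoint
    {
        get { return (string)this["RegionEndPoint"]; }
        set { this["RegionEndPoint"] = value; }
    }

    // Optional: lets the repository start from a sub-folder instead of the bucket root.
    [ConfigurationProperty("RootDir", IsRequired = false, DefaultValue = "")]
    public string RootDir
    {
        get { return (string)this["RootDir"]; }
        set { this["RootDir"] = value; }
    }
}
```

The section is then read with ConfigurationManager.GetSection("AspGuy/S3Repository") cast to S3FileRepositoryConfig.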
For the repository class I have created an interface, to remove clients' dependency on S3 (e.g. a web service that may need to work with various file storages). This lets you add your own implementations for the file system, FTP and other file storage types and use them through dependency injection. Here is the code of this interface:
In this interface:
Download: Downloads a file hosted on S3 to disk.
ChangeDir: Changes the current directory/folder to the given directory. If the new directory (relativePath parameter) starts with / then the path will be representing an absolute path (starting from the RootDir) otherwise it will be a relative path and will start from the current directory/folder.
GetFileNames: Retrieves the file names of the current folder
GetSubDirNames: Retrieves the name of folders in the current folder
AddFile: Uploads a file to S3
FileExists: Checks to see if a file is already on S3
DeleteFile: Deletes the file from S3
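The interface looks roughly like this. The member descriptions follow the list above; the exact signatures in the GitHub repository may differ:

```csharp
using System.Collections.Generic;

// A sketch of the repository abstraction described above.
public interface IFileRepository
{
    void Download(string fileName, string destinationPath); // save an S3 object to disk
    void ChangeDir(string relativePath);   // "/..." = absolute from RootDir, otherwise relative to the current folder
    IEnumerable<string> GetFileNames();    // files in the current folder
    IEnumerable<string> GetSubDirNames();  // sub-folders of the current folder
    void AddFile(string localPath);        // upload a file to the current folder
    bool FileExists(string fileName);      // is the file already on S3?
    void DeleteFile(string fileName);      // remove the file from S3
}
```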
The implementation of these methods is quite simple using the AWS SDK for .NET. The only tricky part is that S3 does not support folders. In fact, in S3 everything is a key-value pair and the structure of entries is totally flat. What we do, however, is use the forward slash character to represent folders and then use this character as a delimiter to emulate a folder structure.
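A self-contained sketch of this delimiter trick (the names here are mine, not the repository's): given a bucket's flat key list, derive the sub-folder names directly under a prefix. The real implementation passes Prefix and Delimiter to the SDK's ListObjects call and lets S3 do this grouping server-side; this just shows the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FolderEmulation
{
    public static IEnumerable<string> GetSubDirNames(IEnumerable<string> keys, string prefix)
    {
        return keys
            .Where(k => k.StartsWith(prefix, StringComparison.Ordinal))
            .Select(k => k.Substring(prefix.Length))
            .Where(rest => rest.Contains("/"))                    // keep only keys inside a sub-folder
            .Select(rest => rest.Substring(0, rest.IndexOf('/'))) // first path segment = folder name
            .Distinct();
    }
}
```

For example, the keys "Dir1/Dir1_1/a.txt", "Dir1/Dir1_2/b.txt" and "Dir1/c.txt" with prefix "Dir1/" yield the folder names "Dir1_1" and "Dir1_2".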
You can clone the repository of this code which is on GitHub to play with the code. Feel free to send a pull request if you want to improve the code. The GitHub repository is located at https://github.com/aussiearef/S3FileRepository
If you are building public-facing web sites, one of the things you want to achieve by the end of the project is good performance under load. That means you have to make sure your product works under a heavy load (e.g. 50 concurrent users, or 200 users per second, etc.), even if you don't expect that much load at the moment. Chances are the web site will attract more and more users over time, and if it doesn't tolerate load it will start flaking, leaving you with an unhappy customer and a ruined reputation.
There are many articles on the Internet about improving the performance of ASP.NET web sites, which all make sense; however, I think there are some more things you can do to save yourself from facing massive dramas. So what steps can be taken to produce a high-performance ASP.NET or ASP.NET MVC application?
Load test your application from early stages
The majority of developers tend to leave load testing (if they ever do it) until the application is developed and has passed the integration and regression tests. Even though load testing at the end of the development process is better than not doing it at all, it might be way too late to fix performance issues once your code has already been written. A very common example of this: when the application does not respond properly under load, scaling out (adding more servers) is considered, but sometimes this is simply not possible because the code is not suitable for it, such as when the objects stored in Session are not serializable, making it impossible to add more web nodes or worker processes. If you find out at the early stages of development that your application may need to be deployed on more than one server, you will do your tests in an environment close to your final one in terms of configuration, number of servers and so on, and your code will adapt a lot more easily.
Use the high-performance libraries
Recently I was diagnosing the performance issues of a web site and came across a hot spot in the code where JSON messages coming from a third-party web service had to be de-serialized several times. Those JSON messages were de-serialized by Newtonsoft.Json, and it turned out that Newtonsoft.Json was not the fastest library when it came to de-serialization. We replaced Json.Net with a faster library (e.g. ServiceStack) and got a much better result.
Again, if the load test had been done at an early stage, when we picked Json.Net as our serialization library, we would have found that performance issue a lot sooner and would not have had to make so many changes in the code and re-test it all over again.
Is your application CPU-intensive or IO-intensive?
Before you start implementing your web site, while the project is being designed, one thing you should think about is whether your site is CPU-intensive or IO-intensive. This is important for choosing your strategy for scaling the product.
For example, if your application is CPU-intensive you may want to use a synchronous pattern, parallel processing and so forth, whereas for a product with many IO-bound operations, such as communicating with external web services or network resources (e.g. a database), the Task-based Asynchronous Pattern might be more helpful for scaling out your product. Plus, you may want a centralized caching system in place, which will let you create Web Gardens and Web Farms in the future, thus spreading the load across multiple worker processes or servers.
Use Task-based Asynchronous Model, but with care!
If your product relies on many IO-bound operations, or includes long-running operations which may make the expensive IIS threads wait for an operation to complete, you had better think about using the Task-based Asynchronous Pattern for your ASP.NET MVC project.
There are many tutorials on the Internet about asynchronous ASP.NET MVC actions (like this one), so in this blog post I will refrain from explaining them. However, I do have to point out that traditional synchronous actions in an ASP.NET (MVC) site keep IIS threads busy until the operation is done and the request is processed. This means that if the site is waiting for an external resource (e.g. a web service) to respond, the thread stays busy. The number of threads in .NET's thread pool that can be used to process requests is limited, therefore it's important to release threads as soon as possible. A task-based asynchronous action or method releases the thread until the operation completes, then grabs a new thread from the thread pool and uses it to return the result of the action. This way, many requests can be processed by a few threads, which leads to better responsiveness for your application.
Although the task-based asynchronous pattern can be very handy for the right applications, it must be used with care. There are a few concerns you must keep in mind when you design or implement a project based on the Task-based Asynchronous Pattern (TAP). You can see many of them in here; however, the biggest challenge developers face when using the async and await keywords is knowing that in this context they have to deal with threads slightly differently. For example, you can create a method that returns a Task (e.g. Task<Product>), and you might be tempted to call task.Wait() or read task.Result to force the task to run and fetch its result. In a method or action built on TAP, either of those calls will block your running thread, make your program sluggish, or even cause deadlocks.
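A minimal illustration of the difference (the names here are made up). Note that in a console app .Result merely blocks the calling thread; under ASP.NET's synchronization context the same call can deadlock, because the awaited continuation needs the very thread that is blocked waiting for it:

```csharp
using System;
using System.Threading.Tasks;

static class AsyncDemo
{
    public static async Task<string> GetNameAsync(int code)
    {
        await Task.Delay(10);          // stand-in for an IO-bound call (web service, database, ...)
        return "name-" + code;
    }

    public static async Task<string> RightWay()
    {
        // await frees the thread while the operation is in flight
        return await GetNameAsync(100);
    }

    public static string WrongWay()
    {
        // blocks the calling thread until the task completes; can deadlock under ASP.NET
        return GetNameAsync(100).Result;
    }
}
```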
Distribute caching and session state
It's very common for developers to build a web application on a single development machine and assume the product will run on a single server too, whereas that's usually not the case for big public-facing web sites. They often get deployed to more than one server behind a load balancer. Even though you can still deploy a web site with in-proc caching on multiple servers using sticky sessions (where the load balancer directs all requests belonging to the same session to a single server), you may have to keep multiple copies of session and cached data. For example, if you deploy your product on a web farm of four servers and keep the session data in-proc, when a request comes through, the chance of hitting a server that already contains the cached data is 1 in 4, or 25%, whereas with a centralized caching mechanism in place the chance of finding a cached item is 100% for every request. This is crucial for web sites that rely heavily on cached data.
Another advantage of having a centralized caching mechanism (using something like AppFabric or Redis) is the ability to implement a proactive caching system around the actual product. A proactive caching mechanism may be used to pre-load the most popular items into the cache before they are even requested by a client. This can massively improve the performance of a big data-driven application, if you manage to keep the cache synchronized with the actual data source.
Create Web Gardens
As mentioned before, in an IO-bound web application that involves quite a few long-running operations (e.g. web service calls) you may want to free up your worker process as much as possible. By default every web site runs under one worker process, and unfortunately when it's too busy, your site becomes unresponsive. One way of adding capacity is to add more worker processes to your site under IIS; if one process is busy, another is available to pick up incoming requests.
Having more than one worker process turns your site into a Web Garden, which requires your Session and Application data to be persisted out-of-proc (e.g. in a state server or SQL Server).
Use caching and lazy loading in a smart way
There is no need to emphasize that if you cache a commonly accessed bit of data in memory, you will reduce the number of database and web service calls. This specifically helps IO-bound applications which, as I said before, can cause a lot of grief when the site is under load.
Another approach to improving the responsiveness of your site is lazy loading. Lazy loading means that the application does not hold a certain piece of data, but it knows where that data is. For example, if there is a drop-down control on your web page meant to display a list of products, you don't have to load all products from the database when the page loads. You can add a jQuery function to the page that populates the drop-down list the first time it's pulled down. You can apply the same technique in many places in your code, such as when you work with LINQ queries and CLR collections.
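On the server side, Lazy<T> gives you the same effect with one line: the expensive load runs only the first time the value is requested, then the result is reused. A small sketch (the class and data are illustrative):

```csharp
using System;

class ProductCatalogue
{
    public int LoadCount;                       // counts how many times the expensive load actually ran
    private readonly Lazy<string[]> _products;

    public ProductCatalogue()
    {
        _products = new Lazy<string[]>(() =>
        {
            LoadCount++;                        // pretend this is a database call
            return new[] { "Bike", "Car" };
        });
    }

    // First access triggers the load; subsequent accesses reuse the cached value.
    public string[] Products { get { return _products.Value; } }
}
```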
Do not put C# code in your MVC views
Your ASP.NET MVC views are compiled at run time, not at compile time. Therefore, if you include too much C# code in them, that code will not be compiled into your DLL files. Not only will this damage the testability of your software, it will also make your site slower, because every view takes longer to display (it must be compiled first). Another downside of adding code to views is that they cannot run asynchronously, so if you decide to build your site on the Task-based Asynchronous Pattern (TAP), you won't be able to take advantage of asynchronous methods and actions in views.
For example if there is a method like this in your code:
public async Task<string> GetName(int code)
{
    var result = …
    return await result;
}
This method can be run asynchronously in the context of an asynchronous ASP.NET MVC action like this:
public async Task<ActionResult> Index(CancellationToken ctx)
{
    var name = await GetName(100);
}
But if you call this method in a view, because the view is not asynchronous you will have to run it in a thread-blocking way like this:
var name = GetName(100).Result;
.Result will block the running thread until GetName() completes, so the execution of the app will halt for a while, whereas when this code is called using the await keyword the thread is not blocked.
Use Fire & Forget when applicable
If two or more operations do not form a single transaction, you probably do not have to run them sequentially. For example, if users can sign up and create an account on your web site, and once they register you save their details in the database and then send them an email, you don't have to wait for the email to be sent to finalize the operation.
In such a case the best approach is probably to start a new thread that sends the email and return to the main flow immediately. This is called a fire-and-forget mechanism, and it can improve the responsiveness of an application.
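A small sketch of the sign-up example (the names are mine). One caveat worth a comment: exceptions thrown inside fire-and-forget work are lost unless you observe them, so log inside the queued task rather than letting it throw:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SignUpService
{
    // Lets the demo below observe that the background work eventually ran.
    public static readonly ManualResetEventSlim EmailSent = new ManualResetEventSlim();

    public static string Register(string user)
    {
        // ... save the user to the database (the part callers must wait for) ...

        // Fire and forget: queue the email and do not await the task.
        Task.Run(() =>
        {
            try
            {
                // ... send the welcome email ...
                EmailSent.Set();
            }
            catch (Exception) { /* log; never let exceptions vanish unobserved */ }
        });

        return user;    // control returns before the email is sent
    }
}
```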
Build for x64 CPU
32-bit applications are limited to a smaller amount of memory and have access to fewer CPU features/instructions. To overcome these limitations, if your server is a 64-bit one, make sure your site runs in 64-bit mode (by making sure the option for running the application pool in 32-bit mode in IIS is not enabled). Then compile and build your code for x64 rather than Any CPU.
One example of x64 being helpful: to improve the responsiveness and performance of a data-driven application, a good caching mechanism is a must. In-proc caching is a memory-consuming option because everything is stored within the memory boundaries of the site's application pool. An x86 process can address at most 4 GB of memory, so if a lot of data is added to the cache this limit will soon be hit. If the same site is built explicitly for x64, this limit is removed, more items can be added to the cache, and less communication with the database leads to better performance.
Use monitoring and diagnostic tools on the server
There may be many performance issues that you never notice, because they never appear in error logs. Identifying performance issues is even more daunting when the application is already on production servers, where you have almost no chance of debugging.
To find slow processes, thread blocks, hangs, errors and so forth, it's highly recommended to install a monitoring and/or diagnostic tool on the server and have it track and monitor your application constantly. I have personally used New Relic (a SaaS offering) to check the health of our online sites. See HERE for more details and for creating your free account.
Profile your running application
Once you finish developing your site, deploy it to IIS, attach a profiler (e.g. the Visual Studio Profiler) and take snapshots of various parts of the application, for example the purchase operation or the user sign-up operation. Then check whether there is any slow or blocking code. Finding those hot spots at an early stage might save you a great amount of time, reputation and money.
There are scenarios in which an API in a Web API application needs to return formatted HTML rather than a JSON message. For example, we worked on a project where most APIs performed searches and returned the results as JSON or XML, while a few of them had to return HTML to be used by an Android app (in a WebView container).
One solution would be to break the controller in two: one inheriting from MVC's Controller class and the other derived from ApiController. However, since those APIs belong to the same functional category, I would rather keep them in the same controller.
Moreover, using ApiController and returning HttpResponseMessage lets us modify the implementation details in the future without changing the return type (e.g. from ActionResult to HttpResponseMessage), and also makes it easier to upgrade to Web API 2 later.
The advent of IHttpActionResult in Web API 2 allows developers to return custom data. If you are not using ASP.NET MVC 5 yet, or you are after an easier way, keep reading!
To parse and return a Razor view in a Web API project, simply add some views to your application, just as you would for a normal ASP.NET MVC project. Then, through NuGet, find and add RazorEngine, which is a handy library for reading and parsing Razor views.
Inside the API, simply create an object to act as a model, load the content of the view as text, pass the view's body and the model to RazorEngine, and get a parsed version of the view. Since the API is meant to return HTML, the content type must be set to text/html.
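A sketch of what that can look like. This uses RazorEngine's older static API (Razor.Parse); newer versions expose Engine.Razor.RunCompile instead, and the controller name, view path and model shape here are all illustrative:

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Web.Http;
using RazorEngine;   // NuGet package: RazorEngine

public class ReportsController : ApiController
{
    [HttpGet]
    public HttpResponseMessage Report()
    {
        // The model can be anonymous because the view binds to "dynamic".
        var model = new { Name = "Bike", Price = 100 };

        // Load the Razor view as plain text and let RazorEngine render it.
        var template = File.ReadAllText(
            System.Web.Hosting.HostingEnvironment.MapPath("~/Views/Report.cshtml"));
        var html = Razor.Parse(template, model);

        // Return it as text/html rather than JSON.
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(html)
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("text/html");
        return response;
    }
}
```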
In this example the view has a markup like the one given below:
As you can see, the model is bound to type "dynamic", which lets the view accept a wide range of types. You can move the code from your API to a helper class (or something similar) and create a function that accepts a view name and a model and returns the rendered HTML.
Have you ever wondered how Google performs fast searches across a wide variety of file types? For example, how is Google able to suggest a list of search expressions while you are still typing in your keywords?
Another example is Google image search: you upload an image and Google finds similar photos for you in no time.
The key to this magic is SimHash. SimHash is a mechanism/algorithm invented by Charikar (see the full patent). The name comes from the combination of Similarity and Hash. Instead of comparing the objects themselves to find their similarity, we convert each one to an N-bit number that represents it (known as a hash) and compare those. In other words, if we maintain a number that represents each object instead of the object itself, we can compare those numbers to find the similarity of two objects.
The basics of SimHash are as below:
Convert each object to a hash value. (From my experience, this is better as an unsigned integer.)
Count the number of bits in which the two hash values differ. For example, is bit 1 different between the two hash values? Is bit 2? And so on.
Depending on the size of the hash value (number of bits), you get a number between 0 and N, where N is the length of the hash value. This is called the Hamming distance, introduced by Richard Hamming in 1950 (see here).
The number you get must be normalized and finally represented as a value such as a percentage. To do so, we can use a simple formula: Similarity = (HashSize - HammingDistance) / HashSize.
Since the hash value can be used to represent any kind of data, such as a text or an image file, it can be used to perform a fast search on almost any file type.
To calculate the hash value, we have to decide on the hash size, which is normally 32 or 64. As I said before, an unsigned value works better. We also need to choose a chunk size. The chunk size is used to break the data down into small pieces, called shingles. For example, if we decide to convert a string such as "Hello World" to a hash value and the chunk size is 3, our chunks would be:
To convert binary data to a hash value, you have to break it down into chunks of bits, e.g. taking every K bits. Google says that N=64 and K=3 are recommended.
To calculate the hash value in a SimHash manner, we have to take the following steps:
Tokenize the data. To do this, break it down into small chunks as mentioned above and store the chunks in an array.
Create an array (called a vector) of size N, where N is the size of the hash (let’s call this array V).
Loop over the array of tokens (assume that i is the index of each token),
Loop over the bits of each token (assume that j is the index of each bit),
If bit j of token i is 1, then add 1 to V[j]; otherwise subtract 1 from V[j].
Assume that the fingerprint is an unsigned value (32 or 64 bit) and is named F.
Once the loops finish, go through the array V; if V[i] is greater than 0, set bit i of F to 1, otherwise to 0.
Return F as the fingerprint.
Here is the code:
private int DoCalculateSimHash(string input)
{
    ITokeniser tokeniser = new Tokeniser();
    var hashedTokens = DoHashTokens(tokeniser.Tokenise(input));

    // V: one counter per bit of the hash
    var vector = new int[HashSize];
    for (var i = 0; i < HashSize; i++)
        vector[i] = 0;

    // Each token votes every bit position up or down
    foreach (var value in hashedTokens)
    {
        for (var j = 0; j < HashSize; j++)
        {
            if (IsBitSet(value, j))
                vector[j] += 1;
            else
                vector[j] -= 1;
        }
    }

    // Positive counters become 1-bits of the fingerprint
    var fingerprint = 0;
    for (var i = 0; i < HashSize; i++)
    {
        if (vector[i] > 0)
            fingerprint += 1 << i;
    }

    return fingerprint;
}
And the code to calculate the hamming distance is as below:
private static int GetHammingDistance(int firstValue, int secondValue)
{
    var hammingBits = firstValue ^ secondValue;   // differing bit positions are set to 1
    var hammingValue = 0;
    for (var i = 0; i < 32; i++)
    {
        if (IsBitSet(hammingBits, i))
            hammingValue += 1;
    }
    return hammingValue;
}
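Putting the two steps together, here is a self-contained version (my own helper names) that XORs the fingerprints, counts the differing bits, and normalises with Similarity = (HashSize - HammingDistance) / HashSize:

```csharp
using System;

static class SimHashMath
{
    public const int HashSize = 32;

    public static int HammingDistance(uint first, uint second)
    {
        var bits = first ^ second;       // 1-bits mark positions where the hashes differ
        var distance = 0;
        while (bits != 0)
        {
            distance += (int)(bits & 1);
            bits >>= 1;
        }
        return distance;
    }

    // 1.0 = identical fingerprints, 0.0 = every bit differs
    public static double Similarity(uint first, uint second)
    {
        return (HashSize - HammingDistance(first, second)) / (double)HashSize;
    }
}
```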
You may tokenize the data in different ways. For example, you may break a string down into words, n-letter chunks, or n-letter overlapping pieces. From my experience, if we assume that N is the size of each chunk and M is the number of overlapping characters, N=4 and M=3 are the best choices.
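The overlapping scheme can be sketched as follows (my own helper, not the code from the download): chunks of N characters where consecutive chunks share M characters, i.e. the window advances by N - M each step.

```csharp
using System.Collections.Generic;

static class Shingler
{
    // N = chunk size, M = number of overlapping characters between consecutive chunks.
    public static List<string> Tokenise(string text, int n = 4, int m = 3)
    {
        var tokens = new List<string>();
        var step = n - m;                       // with n=4, m=3 the window slides one character at a time
        for (var i = 0; i + n <= text.Length; i += step)
            tokens.Add(text.Substring(i, n));
        return tokens;
    }
}
```

For example, "Hello" with N=4 and M=3 yields the tokens "Hell" and "ello".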
You may download the full source code of SimHash from SimHash.CodePlex.com. Bear in mind that SimHash is patented by Google!
I was recently involved in designing enterprise software comprising several ASP.NET web sites and desktop applications. Since the same users would use all of these applications, a single sign-on feature was necessary, so I took advantage of Windows Identity Foundation, which is based on claims-based authentication. Also, as the applications would be used by internal users, I used Active Directory Federation Services to authenticate users against an existing Active Directory.
Later I thought: what would the solution be if claims-based authentication did not fit? So I decided to design and implement a simple single sign-on (SSO) authentication service with WCF. This SSO has the following specifications:
It only performs authentication. It does not contain any role-management capability, because roles and access rights are defined within the scope of each application (or sub-system), so that is left to the applications.
It is based on web services, so it can be consumed by any technology that understands web services (e.g. Java apps).
It can support many kinds of user storage, such as Active Directory, ASP.NET Application Services (ASPNETDB), custom authentication, etc.
It can be used by Web, Desktop and Mobile applications.
Figure 1, Components of SSO
As seen in figure 1, user information can be retrieved from Active Directory, ASP.NET Application Services, a custom database or third-party web services. The components that encapsulate the details of each user store or service are called federations. This term is used by Microsoft in Windows Identity Foundation, so we keep using it!
The single sign-on service relies on federation services. Each federation service is simply a .NET class library containing a class that implements an interface shared with the SSO service. A federation module is plugged into the SSO service via a configuration file. The Visual Studio solution downloadable at the bottom of this post includes a custom federation module that uses SQL Server to store user information.
Client applications can consume the Web Service directly to perform sign-in, sign-out, authentication and other related operations. However, this solution includes a custom ASP.NET membership provider which allows ASP.NET applications consume SSO with no hassle. It also enables the existing ASP.NET applications to use this SSO service with a small configuration change.
Figure 2, Package diagram of SSO
Classes and Interfaces
The key classes and interfaces are as below:
ISSOFederation Interface: is implemented by Federation classes.
AuthenticatedUser: Is used by the federation classes and SSO service to represent a user. This type is also emitted by the service to the clients.
Figure 3, Key types of Common package
CustomFederation: Implements ISSOFederation and encapsulates the details of authentication and user storage.
SSOFederationHelper (SignInServices package): provides a service to load the nominated federation module. Only one instance of the federation object exists (a singleton), so to plug in another federation module the service must be restarted (e.g. restart the IIS web site).
Figure 4, SSOFederationHelper class
A federation object is plugged into the service using reflection. To do so, the fully qualified name of the federation type is first placed in the Web.config file:
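The loading itself can be sketched like this. The configuration key and the example type name below are illustrative, not the exact names used in the solution:

```csharp
using System;

// Loads a federation implementation from its fully qualified type name, e.g. a
// Web.config entry such as:
//   <add key="federationType"
//        value="SignInService.CustomFederation.SqlFederation, SignInService.CustomFederation" />
static class FederationLoader
{
    public static object Load(string fullyQualifiedName)
    {
        // Resolve the type from the "Namespace.Type, Assembly" string...
        var type = Type.GetType(fullyQualifiedName, throwOnError: true);

        // ...and instantiate it; federation types need a public parameterless constructor.
        return Activator.CreateInstance(type);
    }
}
```

The SSO service would then cache this single instance (the singleton mentioned above) and call it through the shared federation interface.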
SSOMembershipProvider (SSOClientServices package) is a custom ASP.NET membership provider. This class enables ASP.NET applications to take advantage of the SSO service without being dependent on it. The key point here is that all ASP.NET applications that use the SSO service must give the same name to the ASP.NET membership cookie. Example:
The custom membership provider has its own app.config file. This configuration file includes the WCF client configuration (e.g. binding configuration). However, the URI of the service is configured in the Web.config file of the ASP.NET client. For example, an ASP.NET client configures the membership provider as below:
In order to deploy the solution take the following actions:
Restore the SQL Server 2008 database on a SQL Server 2008 instance under the name Framework.
Create a Windows login in SQL Server 2008 and grant access to the restored database (e.g. Domain\SSOUser).
Create an application pool that targets .NET 4 and uses "Integrated" mode. Name this application pool SSO.
Set the identity account of the newly created application pool. This account must be equal to the Windows account that you added to SQL Server (e.g. Domain\SSOUser). This is required because the existing custom federation project is using Windows Authentication to access database.
Open the solution file in Visual Studio 2010
Publish SignInServices application to a folder
Go to the folder
Open web.config file and configure the following entries:
Under <appSettings>, set SessionTimeOutMinutes. This value indicates how long a sign-in ticket is valid.
Under <appSettings> set federationType to any other federation type that you may want to use. If you want to use the custom federation type shipped with this sample, leave the current value as is.
Under <connectionStrings>, update the connection strings if you restored the database under a name other than "Framework", or if you want to use SQL Server authentication rather than Windows authentication.
Build SignInService.CustomFederation project and copy the .DLL file to the \BIN folder of SignInService
Build SSOClientServices project and copy the .DLL file to the \BIN folder of SignInService
Go back to IIS
Under IIS create a new Web Application that uses SSO as its Application Pool and points to the folder to which you published the SignInService application.
Publish TestWebSite to IIS or simply run it in VS 2010.
Configuring the custom federation
The custom federation uses SMTP to send a new password to users once a password reset is requested. The SMTP server configuration is in the Web.config file of the WCF application (SignInService). You must configure SMTP in order to reset passwords.
(The admin page is yet to be completed, as I focused on developing the SSO service rather than implementing the admin web site.)
Important notice: None of the applications or the WCF service is protected for simplicity purposes. If you deploy this solution to a production environment, you must protect them using ASP.NET authentication or WCF security practices.
Important notice: Once you launch the TestWebSite, you'll see a login screen. Enter the following default credentials to enter:
After working as a senior designer and software architect on three subcontinents and in four countries, I have come across a phenomenon in Australia! I call it a phenomenon because, first of all, terms such as 'solution architect', 'software architect' and 'enterprise architect' are used interchangeably, and sometimes incorrectly. Secondly, and surprisingly, architecture tends to be done with made-up methods, and the documentation is either non-existent or written in a language that is not standard.
This leaves the project owners with a whole bunch of documents that are not understandable to them, so they have to hand them over to a development team without even knowing whether the design is what they really wanted.
I believe this happens because such a crucial task is given to a senior developer or someone with a purely technical mindset, whilst an architect must be able to look at the problem from various angles.
What is Architecture?
Architecture is the fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution (IEEE 1471).
The definition suggested by IEEE (above) refers to a solution architect and/or software architect. However, as Microsoft suggests there are other kinds of architects such as a Business Strategy Architect.
There are basically six types of Architects:
· Business Strategy Architect
The purpose of this role is to change the business focus and define the enterprise’s to-be status. This role is about the long view and about forecasting.
· Business Architect
The mission of business architects is to improve the functionality of the business. Their job isn’t to architect software but to architect the business itself and the way it is run.
· Solution Architect
Solution architect is a relatively new term, and it should also refer to an equally new concept. Sometimes, however, it doesn’t: it tends to be used as a synonym for application architect.
· Software Architect
Software architecture is about architecting software meant to support, automate, or even totally change the business and the business architecture.
· Infrastructure Architect
The technical infrastructure exists for deployment of the solutions of the solution architect, which means that the solution architect and the technical infrastructure architect should work together to ensure safe and productive deployment and operation of the system.
· Enterprise Architect
Enterprise Architecture is the practice of applying a comprehensive and rigorous method for describing a current and/or future structure and behaviour for an organization’s processes, information systems, personnel and organizational subunits, so that they align with the organization’s core goals and strategic direction. Although often associated strictly with information technology, it relates more broadly to the practice of business optimization in that it addresses business architecture, performance management, and process architecture as well (Wikipedia).
As we are techies, let’s focus on the Solution Architect role:
It tends to be used as a synonym for application architect. In an application-centric world, each application is created to solve a specific business problem, or a specific set of business problems. All parts of the application are tightly knit together, each integral part being owned by the application. An application architect designs the structure of the entire application, a job that’s rather technical in nature. Typically, the application architect doesn’t create, or even help create, the application requirements; other people, often called business analysts, less technically and more business-oriented than the typical application architect, do that.
So if you are asked to come on board and architect a system based on a whole bunch of requirements, you are most likely being asked to do solution architecture.
How to do that?
A while back, a person without a technical background (but with money, so he was the boss) lectured that in an ideal world no team member has to talk to any other team member. At the time I was thinking that in my ideal world, which is very close to the Agile world, everybody can (and should) speak to everybody else. The point is that how you architect a system is strongly tied to your methodology and mindset. Which methodology you follow does not really make a big difference, as long as you stick to the correct concepts.
What the above story tells me is that solution architecture is the art of mapping business concerns to technical matters — or, in other words, speaking about technical things in a language that is understandable to business people.
A very good way to do this is by putting yourself in the stakeholders’ shoes. There are several types of stakeholders in each project, each with their own views and concerns. This is the biggest difference between design and architecture: a designer thinks very technically, while an architect can think broadly and look at a problem from different angles.
Designers often make a huge mistake, which happens a lot in Australia: they put everything in one document. In my current solution architecture job, I was given a 21-megabyte MS Word document which included everything, from requirements to detailed class and database design.
Such a document is very unlikely to be understandable by the stakeholders and very hard for developers to use. I believe this happens firstly because designers don’t consider the separation of the stakeholders’ and the developers’ concerns, and secondly because it’s easier to write everything down in one document. But this is wrong, as the Software Architecture Document (SAD) and the design document (e.g. a TSD) are built for different purposes and for different audiences (and in different phases, if you are following a phase-based methodology such as RUP). Putting everything in one document is as if, while cooking dinner, you put the ingredients along with the utensils in a pot and boiled them all together!
A very good approach for looking at the problem from the stakeholders’ point of view is the 4+1 model. In this model, scenarios (or use cases) are the base, and we look at them from a Logical view (the building blocks of the system), a Process view (processes such as asynchronous operations), a Development (aka Implementation) view and a Physical (aka Deployment) view. There are also optional views, such as a Data view, that you can use if you need to. Some of the views are technical and some are not; however, they must match, and there must be consistency in the architecture so that technical views can cover business views (e.g. demonstrating a business process with a UML activity diagram and/or state diagram).
I believe that each software project is like a spectrum, of which each stakeholder sees a limited part. The role of an architect is to see the entire spectrum. A good approach (which I use a lot) is to include a business vision (this might not be the best term) in your SAD. It can be a bulleted list, a diagram or both, showing what the application looks like from a business perspective. Label each part of the business vision with a letter or a number, then add an architectural overview and map it to the items of the business vision, indicating which part of the architecture is meant to address which part of the business vision.
In a nutshell, architecture is the early design decisions; it is not the design.
What to put in an SAD?
There are plenty of SAD templates on the Internet, such as the one offered by RUP. However, the following items seem necessary for any architecture document:
Introduction: This can include the purpose, glossary, background of the project, assumptions, references, etc. I personally suggest that you explain what kind of methodology you are following; this will avoid lots of debates, I promise!
It is very important to clarify the scope of the document. Without a clear scope, not only will you never know when you are finished, you won’t be able to convince the stakeholders that the architecture is comprehensive enough and addresses all their needs.
Architectural goals and constraints: This can include the goals, as well as your business and architectural visions. Also explain the constraints (e.g. if the business has decided to develop the software system with Microsoft .NET, that is a constraint). I would suggest that you mention the components (or modules) of the system when you present your architectural vision; for example, say that it will include Identity Management, Reporting, etc., and explain what your strategy for addressing them is. As this section is intended to help the business people understand your architecture, try to include clear and well-organised diagrams.
A very important item that you want to mention is the architectural principles that you are following. This is even more important when the client organization maintains a set of architectural principles.
Quality of service requirements: These address the quality attributes of the system, such as performance, scalability, security, etc. These items must not be described in technical language and must not contain implementation details (e.g. the use of Microsoft Enterprise Library 5).
Use Case view: Views basically come from the 4+1 model, so if you follow a different model you might not have this one. However, it is very important that you identify the key scenarios (or use cases) and describe them at a high level. Again, diagrams such as a use case diagram help.
Logical view: The logical view demonstrates the logical decomposition of the system, such as the packages that build it. It will help the business people and the designers understand the system better.
Process view: Use activity diagrams, as well as state diagrams if necessary, to explain the key processes of the system (e.g. the process of approving a leave request).
Deployment view: The deployment view demonstrates how the system will work in a real production environment. I suggest that you include two types of diagrams: a (normal) human-understandable diagram, such as a Visio diagram showing the network, firewall, application server, database, etc., and a UML deployment diagram showing the nodes and dependencies. This again helps the business and technical people have the same understanding of the physical structure of the system.
Implementation view: This is the most interesting section for the techies. I like to include the implementation options (e.g. Java and .NET) and provide a list of pros and cons for each of them. Again, technical pros and cons don’t make much sense to business people; they are mostly interested in cost of ownership, availability of resources and so on. If you suggest a technology, or if one has already been selected, list the products and services that are needed in the production environment (e.g. IIS 7, SQL Server 2008). It is also good to include a very high-level diagram of the system.
I also like to explain the architectural patterns that I’m going to use. If you include them in the Implementation view, explain them well enough that a business person can roughly understand what each pattern is for. For instance, if you are using the Lazy Loading pattern, explain what problem it solves and why you are using it.
Needless to say, you also have to decide which architecture style you are suggesting, such as 3-tier or N-tier, client-server, etc. Once you have declared that, explain the components of the system (layers, tiers and their relationships) with diagrams.
This part must also include your implementation strategy for addressing the quality of service requirements, such as how you will address scaling out.
Data view: If the application is data-centric, explain the overall approach to data management (never put a database design in this part), your backup and restore strategy, as well as your disaster recovery strategy.
It is suggested that the architecture (and, as a result, the Software Architecture Document) be developed through two or more iterations. It is impossible to build a comprehensive architecture document in one iteration: not only does the architecture have an impact on the requirements, but the architecture also begins at an early stage, when many of the scenarios are still likely to change.
How to prove that?
Now that, after a lot of endeavour, you have prepared your SAD, how will you prove it to the stakeholders? I assume that many business people have no idea about the content and structure of an SAD or the amount of information that you must include in it.
A good approach is to prepare a presentation about the mission of the system, its scope, goals, visions and your approach. Invite the stakeholders to a meeting, present the architecture to them, and explain how it covers their business needs. If they are not satisfied, your architecture is very likely incomplete.
Here is an ASP.NET MVC 3 sample web site that lets ASP.NET MVC learners see how a real application is developed and works. This is the first version of the app; more versions are lined up to emerge.
This ASP.NET MVC application is intended to be as simple as possible, so keep in mind that it is not a comprehensive commercial application. The database, user interface and code snippets tend to be concise and clear. The next versions will cover the use of Ajax and ASP.NET MVC UI controls.
What is this application about?
WebAdvert is an online advertisement web site. Basically, users can sign up and post their own adverts, and they will also be able to manage (view/edit/delete) their ads. Anonymous users can browse the existing ads. Finally, administrators can view/create/delete members and assign them to the “Admins” role if necessary.
WebAdvert uses ASP.NET Forms authentication in an MVC fashion, which means the application includes the AspnetDB database. It also has a SQL Server Express database file named WebAdvert. The WebAdvert database contains only one table, named “Adverts”, in which the ads are stored. The structure of this table is as below: