HighTechTalks DotNet Forums  

Potential design for using the ASP.NET Cache object in a web farm

ASP.net Caching microsoft.public.dotnet.framework.aspnet.caching


#1
Martin
Potential design for using ASP.NET Cache object in a web farm - 12-31-2006, 07:46 AM

Hello all,

We know that designing a web application that is both scalable and
high-performance is difficult.

Scalability implies lots of web servers all referring back to a central SQL
server, which in turn implies limited caching which in turn hurts
performance opportunities.

Clearly there is no right answer for all scenarios, but I have been thinking
over a particular design which I would like to get your views on...

This scenario involves a collection of data which is concerned with an
overall user operation. The data is persisted across multiple tables, but
the primary keys are hierarchical in nature. E.g. a house relates to rooms,
which relate to furniture. The house primary key forms part of the room and
furniture primary keys.

I would like to use typed datasets for all the benefits they have, and I
would like to use timestamps to assist in concurrent edit checking. One
dataset would hold the data for one house (plus related tables).
I would like to cache the datasets in the ASP.NET application. If a cached
dataset times out, so be it: I can fetch a fresh version.
I would expect data edits to be applied to the database as part of the web
request operation, so the dataset and database remain in sync.
I would not anticipate using Session state for this application.
I would keep the data key in a client-side cookie.
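
To make the "if it times out, fetch a new version" idea concrete, here is a minimal sketch in Python rather than ASP.NET. It assumes a hypothetical `ExpiringCache` class standing in for the ASP.NET Cache object, a plain dict standing in for the typed dataset, and a caller-supplied `loader` function standing in for the database fetch; none of these names come from the framework.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Tiny in-process cache with absolute expiry (illustrative stand-in only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._items: Dict[int, Tuple[float, dict]] = {}

    def get_or_load(self, house_id: int, loader: Callable[[int], dict]) -> dict:
        """Return the cached dataset for a house, reloading it if missing or expired."""
        now = self._clock()
        entry = self._items.get(house_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        dataset = loader(house_id)  # hypothetical database fetch
        self._items[house_id] = (now + self._ttl, dataset)
        return dataset

    def invalidate(self, house_id: int) -> None:
        """Drop an entry, e.g. after a concurrent edit has been detected."""
        self._items.pop(house_id, None)
```

The cache is keyed by house ID, so one entry corresponds to one house dataset, and an edit that detects a conflict can call `invalidate` and reload.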

I require affinity to the specific cache and therefore web process (across
multiple servers and CPUs). *In my view getting into cache synchronisation
across web servers will hurt the very performance gains we are trying to get
via caching in the first place.*

As a user becomes interested in a specific set of data (house) the datakey
cookie would be set, and this would drive the selection of web process that
is best suited to serve the request. Consequently as the user works with
the site, different requests may be served by different web
processes/servers. If the datakey cookie is not set, then no cache affinity
is required.

I have looked for some extension to Microsoft's Network Load Balancing (NLB)
using a provider pattern that would let me control the selection criteria for
a specific web process, but without success.
I want to take advantage of the NLB heartbeat facility. The scenario I
imagine is, say, a collection of four web processes (spread, say, across two
servers, each with dual processors). I *think* I can give each web process a
distinct URL by using application pool configuration, but I haven't
confirmed this yet.

So I would expect my web process selection algorithm to be driven by the
value of the cookie holding a datakey. The algorithm would distribute the
requests according to the data keys. I was thinking of something simple like
modulo 4 of the house ID in this scenario. When a server goes down, NLB
should know this and expose it to my provider code. My web process
selection algorithm would check that the required web process is alive
(referring to the NLB API) and make an alternate selection if necessary.
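
As a rough sketch of the selection algorithm I have in mind (Python used purely for illustration): the function name `select_process`, the process names, and the `alive` set are all hypothetical. In particular, `alive` stands in for whatever liveness information the load balancer's heartbeat could expose; I am not aware of an NLB API that provides it in this form.

```python
from typing import Sequence, Set


def select_process(house_id: int, processes: Sequence[str], alive: Set[str]) -> str:
    """Pick a web process for a house ID: modulo first, then the next live one."""
    if not processes:
        raise ValueError("no web processes configured")
    count = len(processes)
    start = house_id % count  # e.g. modulo 4 for four web processes
    for offset in range(count):
        candidate = processes[(start + offset) % count]
        if candidate in alive:
            return candidate
    raise RuntimeError("no web process is alive")


# Usage: four web processes spread across two servers.
farm = ["web1-a", "web1-b", "web2-a", "web2-b"]
preferred = select_process(6, farm, alive=set(farm))
fallback = select_process(6, farm, alive={"web1-a", "web1-b", "web2-b"})
```

Note that when a process dies, only the houses mapped to it move to a neighbour; their cached datasets are simply rebuilt there on first use.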

So far as I am aware, the piece of the picture that is missing is a provider
pattern API in NLB to facilitate this. I wonder if this is something that
is on the drawing board at Microsoft (or a third party supplier for that
matter).

Apart from that piece missing, the main disadvantage I can see in this
design is its defence against denial of service attacks. Theoretically,
attackers would need to select just four distinct IDs that each hit a
different web process; however, I believe the DoS risk is sufficiently small
that this design is still widely applicable.

Other issues that I know come into play include:
- security of the datakey
- overhead of establishing potential SSL sessions on each new web process as
  the datakey changes (I think this is relatively infrequent)
- authentication cookies would need to apply to the scope of the entire web
  farm
- authorisation to access data would need to occur in the web application,
  not just the database

NB this design does not preclude distributed web farm clusters on different
continents (each cluster potentially caching the same data), because at the
end of the day if concurrent data edits are detected, the dataset can be
refreshed from the database, and the user can reconfirm their edit
operation.
Also in this scenario, there are likely to be multiple databases
synchronised using replication. Typically the set of editable data would
be configured for each database.

I would welcome a lively discussion on the viability of the design.

Thanks very much for your time.

Martin



#2
Alvin Bruney [MVP]
Re: Potential design for using ASP.NET Cache object in a web farm - 01-07-2007, 10:47 PM

See inline.

Quote:
Scalability implies lots of web servers all referring back to a central
SQL server, which in turn implies limited caching which in turn hurts
performance opportunities.

It most certainly does not. Caching avoids the SQL bottleneck.

Quote:
I would like to use typed datasets for all the benefits they have, and I
would like to use timestamps to assist in concurrent edit checking.

Timestamps won't help you with concurrency because the timestamp isn't
guaranteed to be accurate, since Windows is not a real-time OS.
Even a minor lag will throw off your sync on a heavy traffic day.

Quote:
I would like to cache the datasets in the asp.net application.

Nope, the cache is a poor choice because it is per process. You have a
multi-CPU architecture on a web farm. That leads to cache duplication.

Quote:
I would cache data key in a client side cookie.

What happens if cookies are lost or unreadable, or the client turns them off?

Quote:
multiple servers and CPUs). *In my view getting into cache
synchronisation across web servers will hurt the very performance gains we
are trying to get via caching in the first place.*

Yes.

Quote:
As a user becomes interested in a specific set of data (house) the datakey
cookie would be set, and this would drive the selection of web process
that is best suited to serve the request.

Yes, but this is all driven by the client. Not a particularly good choice,
since the client doesn't have to follow the rules you impose; that is, a
client can easily disable cookies.

Quote:
I *think* I can give each web process a
distinct url by using application pool configuration, but I haven't
confirmed this yet.

That doesn't solve your cache affinity problem.

Quote:
different web process, however I believe the DOS risk is sufficiently
small that this design is still widely applicable.

You are pushing a cookie to the client; the wrong client can regenerate
multiple cookies that in turn drive the caching mechanism in your
architecture, right?
Then it's easy to flood the cache architecture from the client, since every
request is valid.

--
Regards,
Alvin Bruney
------------------------------------------------------
Shameless author plug
Excel Services for .NET is coming...
OWC Black book on Amazon and
www.lulu.com/owc


#3
Martin
Re: Potential design for using ASP.NET Cache object in a web farm - 01-21-2007, 06:40 AM

Hello Alvin,

Are you disagreeing with the whole philosophy of using the cache to help
serve the request as close to the client as possible?

I appreciate this brings challenges in a webfarm environment, and that's
what I'm wanting to address.

If you have good web references to how you would approach the overall goal
of increased performance with caching in webfarms, that would be
interesting.

I've made individual comments inline.

Thanks
Martin

"Alvin Bruney [MVP]" <some guy without an email address> wrote

Quote:
See inline.

Scalability implies lots of web servers all referring back to a central
SQL server, which in turn implies limited caching which in turn hurts
performance opportunities.
It most certainly does not. Caching avoids the SQL bottleneck.

The point I'm making here is that in a web farm environment, the standard
practice is to reference back to a central db server, *not* to use caching.
Using caching introduces new challenges which I'm trying to address.

Quote:
I would like to use typed datasets for all the benefits they have, and I
would like to use timestamps to assist in concurrent edit checking.
timestamps won't help you with concurrency because the timestamp isn't
guaranteed accurate since windows is not a real time OS.
Even a minor lag will throw off your sync on a heavy traffic day.

Here is a quote from http://support.microsoft.com/kb/170380
"TimeStamp is a SQL Server data type that is automatically updated every
time a row is inserted or updated. Values in TimeStamp columns are not
datetime data; they are, by default, defined as binary(8) varbinary(8),
indicating the sequence of Microsoft SQL Server activity on the row. A table
can have only one TimeStamp column. The TimeStamp data type is simply a
monotonically-increasing counter whose values will always be unique within a
database.
"
In other words, it is a row version counter, not a clock reading.
What's wrong with that?
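
To show the kind of concurrent edit check I mean, here is a minimal sketch using Python and an in-memory SQLite database. SQLite has no TimeStamp type, so an integer `row_version` column that the UPDATE itself increments stands in for it; the table, column, and function names are all made up for illustration.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one house row at version 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE house ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, row_version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO house (id, name, row_version) VALUES (1, 'Old name', 1)")
    conn.commit()
    return conn


def read_house(conn: sqlite3.Connection, house_id: int):
    """Return (name, row_version) as it would be held in the cached dataset."""
    return conn.execute(
        "SELECT name, row_version FROM house WHERE id = ?", (house_id,)
    ).fetchone()


def try_update_name(conn: sqlite3.Connection, house_id: int,
                    new_name: str, expected_version: int) -> bool:
    """Apply the edit only if the row still has the version the caller read."""
    cursor = conn.execute(
        "UPDATE house SET name = ?, row_version = row_version + 1 "
        "WHERE id = ? AND row_version = ?",
        (new_name, house_id, expected_version),
    )
    conn.commit()
    return cursor.rowcount == 1  # zero rows updated means someone else edited first
```

When `try_update_name` returns False, the cached dataset is stale: it would be refreshed from the database and the user asked to reconfirm the edit.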

Quote:
I would like to cache the datasets in the asp.net application.
Nope, cache is a poor choice because it is per process. You have a
multi-cpu architecture on a web farm. That leads to cache duplication.

I want to address this with application pool configuration, but I'm not sure
if I can.
What would you do?

Quote:
I would cache data key in a client side cookie.
What happens if cookies are lost, unreadable or client turns them off?

If they're turned off, the data key could be put in the URL (by an HTTP
module).
If they are lost or unreadable, that would interfere with the user's
browsing experience.

Quote:
multiple servers and CPUs). *In my view getting into cache
synchronisation across web servers will hurt the very performance gains
we are trying to get via caching in the first place.*
Yes.

Quote:
As a user becomes interested in a specific set of data (house) the
datakey cookie would be set, and this would drive the selection of web
process that is best suited to serve the request.
Yes, but this is all driven by the client. Not a particularly good choice
since the client doesn't have to follow the rules you impose; that is, a
client can most easily disable cookies.

Quote:
I *think* I can give each web process a
distinct url by using application pool configuration, but I haven't
confirmed this yet.
That doesn't solve your cache affinity problem.

Doesn't it? I've not tried yet.
Got any ideas then?

Quote:
different web process, however I believe the DOS risk is sufficiently
small that this design is still widely applicable.
You are pushing a cookie to the client, the wrong client can regenerate
multiple cookies that in turn drive the caching mechanism in your
architecture right?
Then, it's easy to flood the cache architecture from the client since
every request is valid.

I agree.
What's your DoS answer?

#4
Alvin Bruney [MVP]
Re: Potential design for using ASP.NET Cache object in a web farm - 01-25-2007, 07:56 PM

Quote:
Are you disagreeing with the whole philosophy of using the cache to help
serve the request as close to the client as possible?

In principle, yes, because it causes more problems than it solves, especially
for web farms. It is workable, but at what cost?

Quote:
If you have good web references to how you would approach the overall goal
of increased performance with caching in webfarms, that would be
interesting.

Actually, the patterns & practices group at MS has released the
authoritative work on that. I just happen to subscribe to what it preaches.

Quote:
The point I'm making here is that in a web farm environment, the standard
practice is to reference back to central db server *not* to use caching.

It may be standard practice, but it is dead wrong with respect to
scalability.

Quote:
What's wrong with that?

In even a moderately concurrent environment, by the time you read the data it
may have already changed because of another update, making your published
value stale.

Quote:
What would you do?

For a web farm that requires shared resources, you have to move the dataset
back into the database. The problems that caching introduces in a highly
concurrent environment that requires shared access outweigh the benefits. The
exception to this case is the ASP.NET cache service. It's actually a viable
option because it outperforms SQL.

Quote:
Got any ideas then?

Come to think of it, I think the ASP.NET cache service is a valid choice.
I can't see a reason why it won't address your problems. The only thing I
can see turning sour is the cost of serialization, but then again that is
the same with SQL. There's another issue too, with pages being loaded that
incur a page lock during the serialization process. That can block threads
if a page takes particularly long to execute.

Quote:
What's your DOS answer?

If you go that route, you'd need to somehow flag invalid requests and only
let valid requests into your pipeline architecture. That way, even with
a DoS attack, it won't trigger your cache mechanism.
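
One possible way to flag invalid requests, offered only as a sketch: sign the datakey cookie with a secret shared across the farm, and reject any cookie whose signature does not verify before it reaches the cache. The example below uses Python's standard `hmac` module; the function names, the cookie format (`<house id>.<hex MAC>`), and `SECRET_KEY` are assumptions made for illustration, not part of ASP.NET or NLB.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical farm-wide secret; in practice it would come from protected config.
SECRET_KEY = b"replace-with-a-farm-wide-secret"


def sign_datakey(house_id: int, key: bytes = SECRET_KEY) -> str:
    """Build the cookie value '<house id>.<hex HMAC-SHA256 of the id>'."""
    payload = str(house_id)
    mac = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify_datakey(cookie_value: str, key: bytes = SECRET_KEY) -> Optional[int]:
    """Return the house ID if the signature verifies, otherwise None."""
    payload, separator, mac = cookie_value.partition(".")
    if not separator or not (payload.isascii() and payload.isdigit()):
        return None
    expected = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so odd characters in mac cannot raise.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return None
    return int(payload)
```

This only stops a client from inventing datakeys it was never issued. A client replaying a valid cookie at high volume would still need rate limiting or similar measures.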

In a nutshell, there are no guarantees, and none of this makes your
architecture wrong. There are, however, a few more things that you need to be
wary of if you decide to pursue your option. There may be ways to get your
architecture working in a scalable environment, but you need to consider and
plan for these issues ahead of time.

--
Regards,
Alvin Bruney