HighTechTalks DotNet Forums  

interrupting flow of a function and/or yielding control

Dotnet Framework (CLR) microsoft.public.dotnet.framework.clr


#1  dB. - 12-21-2007, 03:41 PM

I am trying to build a workflow system for database detection that
needs to perform thousands of detections in parallel. Most of the time
the detectors sit waiting on network IO to do something. The actual
detector code is fairly thick, with a third party implementing the
actual detectors.

What I am looking to provide is an engine in which I can create a list
of 1000 detectors, then execute the code in each detector all at the
same time without spawning 1000 threads.

Most of the delay is on the network. If I were writing the detectors
myself, I could do something like this:

BeginDetection()
{
...
send a packet
begin receive a packet (callback on OnReceivePacket)
}

OnReceivePacket(...)
{
... finish detection
}

Rebuilding all detectors this way is cumbersome for our purpose.

The question is: can CLR do something for me in terms of interrupting
the flow of a function, saving the stack and coming back to it? The
detectors could yield control too in the appropriate places
explicitly.

Thx
dB.


#2  Jeroen Mostert - 12-21-2007, 06:35 PM

dB. wrote:
<snip>
Quote:
What I am looking to provide is an engine in which I can create a list
of 1000 detectors, then execute the code in each detector all at the
same time without spawning 1000 threads.

Well, you obviously can't literally do that -- if you want to execute the
code "all at the same time" then multiple threads are inevitable, otherwise
the code can at best be "not quite at the same time". However, I assume that
you just mean that you'd like to limit the number of active threads by not
dedicating them to waiting on I/O.

Quote:
Most of the delay is on the network. If I were writing the detectors
myself, I could do something like this:

BeginDetection()
{
...
send a packet
begin receive a packet (callback on OnReceivePacket)
}

OnReceivePacket(...)
{
... finish detection
}

Indeed, that's the classic implementation of asynchronous requests, which
will use a very efficient mix of thread pooling and completion ports in .NET.
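
For illustration, a minimal C# sketch of that callback style might look like the following. The class name CallbackDetector, the 256-byte buffer, the ASCII decoding and the use of a plain Stream in place of a real network connection are all assumptions made for the example; the "send a packet" step is left out, and real detectors would need error handling.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

// Hypothetical detector written in the Begin/End callback style.
public sealed class CallbackDetector
{
    private readonly Stream _stream;                     // stands in for the network connection
    private readonly byte[] _buffer = new byte[256];     // arbitrary size for the sketch
    private readonly ManualResetEvent _done = new ManualResetEvent(false);

    public string Result { get; private set; }
    public WaitHandle Completed { get { return _done; } }

    public CallbackDetector(Stream stream) { _stream = stream; }

    // Starts the receive and returns immediately; nothing blocks on the reply here.
    public void BeginDetection()
    {
        _stream.BeginRead(_buffer, 0, _buffer.Length, OnReceivePacket, null);
    }

    // Invoked when the read completes; finishes the detection.
    private void OnReceivePacket(IAsyncResult ar)
    {
        int n = _stream.EndRead(ar);
        Result = Encoding.ASCII.GetString(_buffer, 0, n);
        _done.Set();
    }
}
```

The Completed handle exists only so a caller (or a test) can wait for the result; an engine running many detectors would more likely collect results from the callback itself.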

Quote:
Rebuilding all detectors this way is cumbersome for our purpose.

I nevertheless strongly suggest that you consider it. A rewrite to move to
asynchronous I/O, while costly, is something you only do once (well, for
every codebase) and it continues to pay off. Although it may be
"cumbersome", for the most part it's not hard.

Quote:
The question is: can CLR do something for me in terms of interrupting
the flow of a function, saving the stack and coming back to it? The
detectors could yield control too in the appropriate places
explicitly.

What you're asking for is a coroutine, something which is not natively
implemented by the framework. That said, you can implement this using the
unmanaged hosting interfaces (that means leaving the comfortable world of
.NET and entering the harsh environs of C++ and Win32). The CLR associates
managed "tasks" with OS threads, and the CLR host can control this
assignment. Take a look at the IHostTaskManager interface and the ICLRTask
interface, and especially the SwitchIn() and SwitchOut() methods of the
latter. Using these, I suspect you could build a coroutine implementation
fairly straightforwardly, possibly using Win32 fibers to ease some of the
load (though there are many, many "details" to get right).

Even so, what you want is not quite comparable to a pure coroutine scenario.
Even if your detector can yield explicitly, it cannot do so *during* I/O
(because that code is not under your control), so it could at best yield
*between* I/O requests. But if I/O is what you're mostly doing, this is of
little use. Your threads will still be preoccupied with idling on I/O.
Managing threads explicitly will allow you to cut down on the number of
threads, but if those few threads are mostly busy doing nothing you haven't
gained much in terms of scalability. Even an unmanaged host has no way of
detecting when code is "waiting on I/O" to reliably switch out the task.

You can detect when the task is waiting in general, though. Implementing the
IHostSyncManager interface will give you precise control over the
synchronization primitives used by the managed code, and most synchronous
I/O is implemented by eventually using one of these primitives to wait for
completion. However, leveraging this effectively to turn threads into
coroutines without introducing deadlocks is a daunting task, to say the
least. Multithreaded programming is difficult enough without having to worry
about the implementation of the synchronization primitives themselves.

If the above sounds complicated to you, that's because it is. If you really
want to go this route, then pick up Steven Pratschner's "Customizing the
Microsoft .NET Common Language Runtime" (ISBN 9780735619883). This book is
pretty much not optional if you want to sink your teeth into hosting, because
the documentation, while pretty good, does not give you the big picture, let
alone the many pitfalls. It took me the better part of two weeks to implement a
pretty simple host that uses AppDomains as lightweight, reliable,
restartable processes, and that doesn't even touch the more difficult
aspects of hosting. Something as dramatic as what you're asking for sounds
like a multi-month project for the uninitiated, and that's assuming you're
willing/able to muck about with unmanaged code in the first place.

If all you're concerned about is the amount of code you'll have to write to
make the detectors use asynchronous I/O, then you're probably still better
off hacking together some sort of code generator/translator that will
convert the synchronous calls to asynchronous ones for you, or possibly
doing even more dramatic rewrites of the code. While not a pretty solution,
it's still much less involved than implementing coroutines at a low level.

--
J.


#3  Dave Farquharson - 12-21-2007, 07:08 PM

This is a pretty awesome and well thought out reply, and as someone who just
implemented a CLR hosting component myself I also heartily recommend Steven
Pratschner's book if you're thinking about using CLR hosting. It would have
taken me 4 times as long to get working without that book.

-dave

#4  Ben Voigt [C++ MVP] - 12-24-2007, 02:33 PM

"Jeroen Mostert" <jmostert (AT) xs4all (DOT) nl> wrote

Quote:
What you're asking for is a coroutine, something which is not natively
implemented by the framework. That said, you can implement this using the

Actually, the C# yield return statement implements coroutines.

You're correct that it doesn't help with I/O particularly.... unless....
oooh I have an idea.

Make an IEnumerable interface that yield returns some I/O descriptor, which
a main loop will start asynchronously, providing an appropriate callback.
The callback will invoke IEnumerator.MoveNext() on the associated detector,
starting the next I/O asynchronously.




#5  Jeroen Mostert - 12-24-2007, 05:58 PM

Ben Voigt [C++ MVP] wrote:
Quote:
"Jeroen Mostert" <jmostert (AT) xs4all (DOT) nl> wrote in message
news:476c4db7$0$85783$e4fe514c (AT) news (DOT) xs4all.nl...
snip
What you're asking for is a coroutine, something which is not natively
implemented by the framework. That said, you can implement this using the

Actually, the C# yield return statement implements coroutines.

Well, sort of. It's not intended as a general coroutine construct, since the
two ends of the routine aren't equals (the caller of MoveNext() decides which
iterator to resume, but the iterator doesn't decide which routine to yield
control to). "Iterators with closures" is more like it. But yeah, with a
prearranged control type for a return value, you could probably implement
every coroutine scenario. If you don't mind some pretty unintuitive code.
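
To make the "prearranged control type" idea concrete, here is a rough C# sketch in which each iterator yields the name of the routine it wants to run next, and a small dispatcher loop performs the actual transfer. The names CoroutineDemo, Ping and Pong and the use of strings as the control value are invented for the example; it shows the shape of the trick rather than a general coroutine library.

```csharp
using System;
using System.Collections.Generic;

public static class CoroutineDemo
{
    // Each routine yields the name of the routine that should run next.
    static IEnumerator<string> Ping(List<string> log)
    {
        for (int i = 0; i < 2; i++)
        {
            log.Add("ping " + i);
            yield return "pong";   // ask the dispatcher to switch to "pong"
        }
    }

    static IEnumerator<string> Pong(List<string> log)
    {
        for (int i = 0; i < 2; i++)
        {
            log.Add("pong " + i);
            yield return "ping";   // ask the dispatcher to switch to "ping"
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var routines = new Dictionary<string, IEnumerator<string>>
        {
            { "ping", Ping(log) },
            { "pong", Pong(log) }
        };

        // The dispatcher, not the iterator, performs the switch:
        // resume the current routine, then read where it wants to go.
        string next = "ping";
        while (routines[next].MoveNext())
            next = routines[next].Current;

        return log;   // stops when the routine being resumed has finished
    }
}
```

The iterators only express where control should go; the loop in Run() is what makes the asymmetric iterator mechanism behave like symmetric coroutines.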

Quote:
You're correct that it doesn't help with I/O particularly.... unless....
oooh I have an idea.

Make an IEnumerable interface that yield returns some I/O descriptor, which
a main loop will start asynchronously, providing an appropriate callback.
The callback will invoke IEnumerator.MoveNext() on the associated detector,
starting the next I/O asynchronously.

This sounds like much more work than the OP was gunning for. (Of course, my
hosting suggestion is even *more* work, but hey.)

There are still issues with scalability in this approach, only now you've
shifted the threading issues to the thread pool (assuming this is what we'll
use to kick off the asynchronous requests). The main problem here is that,
whichever way you slice it, the I/O will be done synchronously, so it'll tie
up a thread. If you care about the number of threads involved, you just
can't have 1,000 detectors going simultaneously, because it's going to take
1,000 threads. The only real solution is to rewrite the I/O to be asynchronous.

Using the thread pool does have the benefit of not blowing up the system,
since there's a limit to the number of requests it can have in flight. If
you *must* do lots of synchronous things with as much parallelism as
possible, the thread pool is probably the best way to go.

Actually, the detectors *might* just be doing lots of small I/O requests
that could profitably be broken up with a coroutine pattern, but the OP
hasn't really made it clear whether this is the case. Nor is it clear how
much rewriting of the detector code is acceptable, really.

--
J.


#6  Barry Kelly - 12-24-2007, 07:30 PM

Jeroen Mostert wrote:

Quote:
You're correct that it doesn't help with I/O particularly.... unless....
oooh I have an idea.

Make an IEnumerable interface that yield returns some I/O descriptor, which
a main loop will start asynchronously, providing an appropriate callback.
The callback will invoke IEnumerator.MoveNext() on the associated detector,
starting the next I/O asynchronously.

There are still issues with scalability in this approach, only now you've
shifted the threading issues to the thread pool (assuming this is what we'll
use to kick off the asynchronous requests). The main problem here is that,
whichever way you slice it, the I/O will be done synchronously, so it'll tie
up a thread. If you care about the number of threads involved, you just
can't have 1,000 detectors going simultaneously, because it's going to take
1,000 threads. The only real solution is to rewrite the I/O to be asynchronous.

If the I/O descriptor that the iterator yields to the main loop is a
delegate which performs the operation asynchronously (and the main loop
calls the delegate directly, rather than on a threadpool), then it could
work properly (assuming the return value and any exceptions raised by the
End* call are communicated back to the iterator). It all depends on the I/O
descriptor.

Rather better, to my mind, would be to code in a continuation passing
style, passing a delegate as the AsyncCallback. Transformation of normal
C# code to CPS is actually mechanical. Compilers could and probably
should be able to do this automatically (see my blog for details - I
talked about this over a year ago, and something in Volta is vaguely
similar, so the latest post has a link back to it).
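
As a small illustration of what continuation passing style means here, the sketch below rewrites a single blocking read so that "the rest of the method" is passed along as delegates. CpsDetector, onResult and onError are hypothetical names, and a plain Stream stands in for the real connection; this is a hand-written example of the style, not the output of any compiler transformation.

```csharp
using System;
using System.IO;
using System.Text;

public static class CpsDetector
{
    // Sequential version, for comparison:
    //   int n = s.Read(buffer, 0, buffer.Length);
    //   return Encoding.ASCII.GetString(buffer, 0, n);
    //
    // CPS version: the code after the read moves into the AsyncCallback,
    // and the "return" becomes a call to the onResult continuation.
    public static void Detect(Stream s, Action<string> onResult, Action<Exception> onError)
    {
        var buffer = new byte[256];   // arbitrary size for the sketch
        s.BeginRead(buffer, 0, buffer.Length, ar =>
        {
            string text;
            try
            {
                int n = s.EndRead(ar);   // rethrows any I/O failure
                text = Encoding.ASCII.GetString(buffer, 0, n);
            }
            catch (Exception e)
            {
                onError(e);              // the "throw" becomes a second continuation
                return;
            }
            onResult(text);
        }, null);
    }
}
```

Each additional I/O step nests another callback inside the previous one, which is why doing the transformation by hand gets tedious quickly.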

MS seems to have had a penchant for this weird stateful event
subscription style ("On*" event handlers) of async lately, which I don't
understand, because it's harder to use, IMHO - it requires subscribing
and unsubscribing before and after calls to the asynchronous method,
with even more jiggery pokery than async I/O already requires.

-- Barry

--
http://barrkel.blogspot.com/


#7  Ben Voigt [C++ MVP] - 12-26-2007, 03:35 PM

"Barry Kelly" <barry.j.kelly (AT) gmail (DOT) com> wrote

Quote:
Jeroen Mostert wrote:

You're correct that it doesn't help with I/O particularly.... unless....
oooh I have an idea.

Make an IEnumerable interface that yield returns some I/O descriptor, which
a main loop will start asynchronously, providing an appropriate callback.
The callback will invoke IEnumerator.MoveNext() on the associated detector,
starting the next I/O asynchronously.

There are still issues with scalability in this approach, only now you've
shifted the threading issues to the thread pool (assuming this is what we'll
use to kick off the asynchronous requests). The main problem here is that,
whichever way you slice it, the I/O will be done synchronously, so it'll tie

I specifically called for the I/O to be started asynchronously.

Quote:
up a thread. If you care about the number of threads involved, you just
can't have 1,000 detectors going simultaneously, because it's going to take
1,000 threads. The only real solution is to rewrite the I/O to be
asynchronous.

If the I/O descriptor that the iterator yields to the main loop is a
delegate which performs the operation asynchronously (and the main loop
calls the delegate directly, rather than on a threadpool), then it could
work properly (assuming recommunication back of the return value /
exceptions etc. that occur on the End* call). It all depends on the I/O
descriptor.

If only it were possible to assign a completion handler using the
IAsyncResult then that would be an ideal candidate for the iterator
implementation to return. The iterator implementation would yield return
the IAsyncResult from BeginRead, then call EndRead and process the result
and any exception locally. The main loop would need to attach a completion
routine that called IEnumerator<IAsyncResult>.MoveNext() on the object. Of
course, if the asyncState parameter is consistently set to the
IEnumerator<IAsyncResult> object then a single completion routine could be
reused everywhere.

Something like:

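// One shared completion routine: resume the iterator stored in AsyncState.
void theCallback(IAsyncResult r) {
    ((IEnumerator<IAsyncResult>)r.AsyncState).MoveNext();
}

....
// Assumes "this" is itself the IEnumerator<IAsyncResult> being driven; inside
// a compiler-generated iterator, "this" is the enclosing object instead.
IAsyncResult iar = s.BeginRead(..., theCallback, this);
yield return iar;
int bytesRead = s.EndRead(iar);
....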

a very straightforward conversion from synchronous to coroutine-based I/O.
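
A fuller, runnable variation on this idea is sketched below. It differs from the snippet above in one respect, following the suggestion earlier in the thread that the descriptor be a delegate: the iterator yields a small object describing how to start the I/O, and the driver supplies the callback. That way the operation is only started once the iterator is already suspended, so the completion routine cannot resume it too early. The names AsyncOp, IteratorDetector and Driver are hypothetical, a plain Stream stands in for the network, and exception handling is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical I/O descriptor: knows how to begin an operation and
// remembers the IAsyncResult so the iterator can call End* afterwards.
public sealed class AsyncOp
{
    private readonly Func<AsyncCallback, IAsyncResult> _begin;
    public IAsyncResult Result { get; private set; }

    public AsyncOp(Func<AsyncCallback, IAsyncResult> begin) { _begin = begin; }

    public void Start(AsyncCallback resume)
    {
        // Record the result before resuming the iterator.
        _begin(ar => { Result = ar; resume(ar); });
    }
}

public static class IteratorDetector
{
    // Reads like sequential code; each yield is a point where no thread is held.
    public static IEnumerable<AsyncOp> Detect(Stream s, List<string> results)
    {
        var buffer = new byte[256];
        var read = new AsyncOp(cb => s.BeginRead(buffer, 0, buffer.Length, cb, null));
        yield return read;                        // suspended until the read completes
        int n = s.EndRead(read.Result);           // resumed from the completion callback
        lock (results) results.Add(Encoding.ASCII.GetString(buffer, 0, n));
    }
}

public static class Driver
{
    public static void Run(IEnumerable<AsyncOp> detector, Action onDone)
    {
        Step(detector.GetEnumerator(), onDone);
    }

    // Advance the iterator to its next yield and start the I/O it asked for.
    private static void Step(IEnumerator<AsyncOp> it, Action onDone)
    {
        if (it.MoveNext())
        {
            it.Current.Start(ar => Step(it, onDone));
        }
        else
        {
            it.Dispose();
            onDone();
        }
    }
}
```

Calling Driver.Run once per detector starts them all without dedicating a thread to each; every continuation runs on whichever thread delivers the completion callback.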

Quote:
Rather better, to my mind, would be to code in a continuation passing
style, passing a delegate as the AsyncCallback. Transformation of normal
C# code to CPS is actually mechanical. Compilers could and probably
should be able to do this automatically (see my blog for details - I
talked about this over a year ago, and something in Volta is vaguely
similar, so the latest post has a link back to it).

MS seems to have had a penchant for this weird stateful event
subscription style ("On*" event handlers) of async lately, which I don't
understand, because it's harder to use, IMHO - it requires subscribing
and unsubscribing before and after calls to the asynchronous method,
with even more jiggery pokery than async I/O already requires.

It also forces you to think in terms of event-driven finite state machines
instead of sequential tasks, which makes it a lot easier to centralize error
handling and more likely to correctly handle exceptional states (i.e. the
operation never completed, something happened out of order, etc).

Quote:
-- Barry

--
http://barrkel.blogspot.com/


