HighTechTalks DotNet Forums  

Re: Extremely bad performance when serializing objects

Dotnet Distributed Applications microsoft.public.dotnet.distributed_apps


#1 Peter Morris - 06-26-2003, 10:09 AM

http://www.howtodothings.com/showart...sp?article=611



#2 Brian Flood - 07-01-2003, 12:51 PM

These are exactly the results that I saw as well.

If you create a new array of objects for every batch, the data is
serialized correctly but (as you would expect) the code takes a lot
longer to execute. I believe the batch approach does improve upon a
single large array, though.

In the end, to improve DataTable transport, I changed the
BinaryDataTable example from MSDN to serialize all of the rows to a
StringBuilder object and then placed the single string in the stream.
I left the columns as individual objects.

Instead of using verbose XML (as the DataTable object does itself), I
just used commas and pipes to separate the row data. While not as fast
as SqlClient, it was a dramatic improvement over serializing each row
as an array of objects. YMMV
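
For readers who want to try the idea, here is a minimal sketch of packing rows into one delimited string and reading them back. It is not the code from the post: the class name DelimitedRows and the methods Pack and Unpack are made up for illustration, commas are assumed to separate fields and pipes to separate rows, and it assumes no value contains a comma or pipe and no cell is null.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Text;

// Hypothetical helper, not from the original post.
public static class DelimitedRows
{
    // Packs every row into one string: fields joined by ',', rows joined by '|'.
    // Assumption: no value contains ',' or '|' and no cell is DBNull.
    public static string Pack(DataTable table)
    {
        var sb = new StringBuilder();
        for (int r = 0; r < table.Rows.Count; r++)
        {
            if (r > 0) sb.Append('|');
            for (int c = 0; c < table.Columns.Count; c++)
            {
                if (c > 0) sb.Append(',');
                sb.Append(Convert.ToString(table.Rows[r][c], CultureInfo.InvariantCulture));
            }
        }
        return sb.ToString();
    }

    // Rebuilds rows in a table that already has the right columns,
    // using each column's DataType to convert the text back.
    public static void Unpack(string packed, DataTable target)
    {
        if (packed.Length == 0) return; // treated as "no rows" in this sketch
        foreach (string row in packed.Split('|'))
        {
            string[] fields = row.Split(',');
            DataRow dr = target.NewRow();
            for (int c = 0; c < target.Columns.Count; c++)
            {
                dr[c] = Convert.ChangeType(fields[c], target.Columns[c].DataType,
                                           CultureInfo.InvariantCulture);
            }
            target.Rows.Add(dr);
        }
    }
}
```

The single string would then go into the stream as one value, while the column definitions travel separately so the receiver knows how to convert each field.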

brian


"Geert Depickere" <x.x (AT) xnospamx (DOT) x> wrote

Quote:
Hello Trond,

I have read your article (and newsgroup discussion) and have experimented
with your code.

I think I have found a problem in your code: if I use it as you provided
it, I do not get all the data on the other side of the remoting call. I get
only the data for the last complete batch of 500 items plus the data for the
extra incomplete batch.

In the extract below, should not the line marked as "//<<< THIS LINE" be
where "//<<< HERE" is?

...

public void GetObjectData(SerializationInfo info, StreamingContext context){
    int batchSize = 500;
    info.AddValue("Count", (Int32)this.Count);
    info.AddValue("BatchSize", (Int32)batchSize);
    ChapterInfo[] chaptersT = new ChapterInfo[batchSize]; //<<< THIS LINE
    int next = 0;
    for (int i = 0; i < InnerList.Count / batchSize; i++) {
        for (int j = 0; j < batchSize; j++) {
            //<<< HERE
            chaptersT[j] = (ChapterInfo)InnerList[i*batchSize + j];
        }
        info.AddValue(i.ToString(), chaptersT);
        next = i + 1;
    }

...

I think that you are adding (InnerList.Count/batchSize) times the same
chaptersT object to info, and hence you are not pulling the 60000 items over
the link, only 500 + the number of items in the incomplete batch.
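
The effect described above can be reproduced without any remoting. The sketch below is only an illustration: a plain List stands in for the SerializationInfo, ints stand in for ChapterInfo objects, and the names BatchAliasingDemo, SharedBuffer and FreshBuffer are invented. It stores a reference per batch, which is presumably what AddValue does as well. Note that the allocation belongs at the top of the outer loop, once per batch; allocating inside the inner loop would create a new array for every item and leave only the last slot filled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo, not from the original post.
public static class BatchAliasingDemo
{
    // One array allocated up front and reused: every stored entry is the
    // same reference, so all entries end up showing the last batch.
    public static List<int[]> SharedBuffer(int[] items, int batchSize)
    {
        var stored = new List<int[]>();
        int[] batch = new int[batchSize];
        for (int i = 0; i < items.Length / batchSize; i++)
        {
            for (int j = 0; j < batchSize; j++)
                batch[j] = items[i * batchSize + j];
            stored.Add(batch); // stores a reference, not a copy
        }
        return stored;
    }

    // A new array per batch: every stored entry keeps its own data.
    public static List<int[]> FreshBuffer(int[] items, int batchSize)
    {
        var stored = new List<int[]>();
        for (int i = 0; i < items.Length / batchSize; i++)
        {
            int[] batch = new int[batchSize];
            for (int j = 0; j < batchSize; j++)
                batch[j] = items[i * batchSize + j];
            stored.Add(batch);
        }
        return stored;
    }
}
```

With six items and a batch size of two, SharedBuffer returns three entries that all hold the last batch, while FreshBuffer returns three distinct batches.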

Please check this and let me know.

Greetings from Belgium.

Geert Depickere

"Peter Morris" <support (AT) _nospam_droopyeyes (DOT) com> wrote in message
news:u$0iTy#ODHA.2788 (AT) TK2MSFTNGP10 (DOT) phx.gbl...
http://www.howtodothings.com/showart...sp?article=611



#3 Geert Depickere - 07-02-2003, 03:27 AM

Brian,

Thanks for the confirmation!

About the BinaryDataTable article: I experimented with this as well, because I
was having problems serializing very large datasets.

My main problem was the HUGE memory consumption of the remoting server
process (a Windows service), along with the long time it takes to serialize.

I managed to solve both problems (reasonable memory usage + performance) by
making a BinaryDataSet class that takes serialization into its own hands. I
just serialize the dataset to a StringBuilder (through a StringWriter and an
XmlTextWriter, using WriteStartDocument and WriteXml) and then add the
ToString() of the StringBuilder to the SerializationInfo.
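
The BinaryDataSet class itself is not shown in this thread, so the following is only a rough sketch of the string round trip described above. The class name DataSetXmlString and the methods ToXml and FromXml are invented, and writing the schema inline (XmlWriteMode.WriteSchema) is an assumption made here so that the receiving side can rebuild the column types.

```csharp
using System.Data;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not the BinaryDataSet class from the post.
public static class DataSetXmlString
{
    // DataSet -> XmlTextWriter -> StringWriter -> StringBuilder -> string.
    public static string ToXml(DataSet ds)
    {
        var sb = new StringBuilder();
        using (var sw = new StringWriter(sb))
        using (var xw = new XmlTextWriter(sw))
        {
            xw.WriteStartDocument();
            ds.WriteXml(xw, XmlWriteMode.WriteSchema); // assumption: schema travels inline
        }
        return sb.ToString();
    }

    // Rebuilds a DataSet from the string produced by ToXml.
    public static DataSet FromXml(string xml)
    {
        var ds = new DataSet();
        using (var sr = new StringReader(xml))
        using (var xr = new XmlTextReader(sr))
        {
            ds.ReadXml(xr, XmlReadMode.ReadSchema);
        }
        return ds;
    }
}
```

In an ISerializable wrapper, GetObjectData would pass the result of ToXml to info.AddValue under some key, and the deserialization constructor would hand info.GetString for that key to FromXml.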

#4 Brian Flood - 07-02-2003, 10:07 AM

Hi Geert

That's pretty much how I did it as well: a StringBuilder full of row data
goes into a single slot of the serialization stream. It is much quicker than
an object for each row. I didn't use the XML writers, though; I figured
the column objects would be enough info to deserialize the data
correctly. I'd assume the serialization part would be about the same
as XML->StringBuilder, but the transport over the wire may be smaller
(no XML tags). I haven't tested that, though; I'm just happy to get the
serialization part working fairly efficiently.

brian


#5 TEK - 07-05-2003, 06:10 PM

Hello

Yes, that's confirmed.
I've actually been looking at this with real-life data for the last couple of
days, and have found a bug or two.

The code added at the end works as it should. You should, however, read
my "very soon to come" input here about an annoying problem with serialized
collections in combination with objects that implement the ISerializable
interface.

According to my further speed tests, there are also indications that the speed
problem is much less severe when working with real-life data (not
generated), likely because real-life data supplies much better hash values.

regards, TEK

// I have experimented with the IDeserializationCallback interface; just
ignore that code (it is commented out, so it should not be a problem).

using System;
using System.Runtime.Serialization;

[Serializable]
public class BaseCollection : System.Collections.CollectionBase, ICloneable,
    ISerializable /*, IDeserializationCallback */ {

    // Note: Clone() and the indexer used below (this[0]) are not shown in this post.

    //private object[] _deserializationHelper = null;

    public BaseCollection() {
    }

    // Deserialization constructor: reads the batches back in order.
    protected BaseCollection(SerializationInfo info, StreamingContext context) : base() {
        int batchSize = info.GetInt32("BatchSize");
        int count = info.GetInt32("Count");

        // Round up so that a trailing incomplete batch is counted.
        int numBatches = count / batchSize;
        if (count % batchSize != 0)
            numBatches += 1;

        if (count > 0) {
            Type t = (Type)info.GetValue("type", typeof(Type));

            //_deserializationHelper = new object[count];
            for (int i = 0; i < numBatches; i++) {
                object[] obj = (object[])info.GetValue(i.ToString(), typeof(object[]));
                for (int j = 0; j < obj.Length; j++) {
                    //_deserializationHelper[(i * batchSize) + j] = obj[j];
                    InnerList.Add(obj[j]);
                }
            }
        }
    }

    /*
    public virtual void OnDeserialization(Object sender) {
        // After deserialization completes, copy the buffered items
        // into InnerList and release the buffer.
        InnerList.AddRange(_deserializationHelper);
        _deserializationHelper = null;
    }
    */

    public virtual void GetObjectData(SerializationInfo info, StreamingContext context) {
        int batchSize = 500;
        info.AddValue("Count", (Int32)this.Count);
        info.AddValue("BatchSize", (Int32)batchSize);

        if (this.Count > 0) {
            Type t = this[0].GetType();
            info.AddValue("type", t);

            object[] objectT = null;
            int next = 0;
            for (int i = 0; i < InnerList.Count / batchSize; i++) {
                // A new array per batch, so each AddValue call stores a distinct object.
                objectT = new object[batchSize];
                for (int j = 0; j < batchSize; j++) {
                    objectT[j] = InnerList[i*batchSize + j];
                }
                info.AddValue(i.ToString(), objectT);
                next = i + 1;
            }

            objectT = new object[InnerList.Count - ((InnerList.Count / batchSize) * batchSize)];

            // Serialize the incomplete batch.
            if (objectT.Length > 0) {
                for (int i = (InnerList.Count / batchSize) * batchSize, j = 0; i < InnerList.Count; i++, j++) {
                    objectT[j] = InnerList[i];
                }
                info.AddValue(next.ToString(), objectT);
            }
        }
    }
}
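
As a side note, the same batching can be written with a single loop that handles the full batches and the trailing incomplete one together. This is only a sketch with invented names (Batching, BatchCount, Split, Join) and it leaves the SerializationInfo calls out, so it shows the index arithmetic and nothing else.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the original post.
public static class Batching
{
    // Number of batches needed, counting a trailing incomplete batch.
    public static int BatchCount(int count, int batchSize)
    {
        return (count + batchSize - 1) / batchSize;
    }

    // Splits items into freshly allocated arrays of at most batchSize elements.
    public static List<object[]> Split(IList<object> items, int batchSize)
    {
        var batches = new List<object[]>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            int size = Math.Min(batchSize, items.Count - start);
            var batch = new object[size];
            for (int j = 0; j < size; j++)
                batch[j] = items[start + j];
            batches.Add(batch);
        }
        return batches;
    }

    // Concatenates the batches back into one list, preserving order.
    public static List<object> Join(IEnumerable<object[]> batches)
    {
        var all = new List<object>();
        foreach (object[] batch in batches)
            all.AddRange(batch);
        return all;
    }
}
```

Each batch returned by Split could be passed to info.AddValue under its index, and the receiving side could loop from 0 to BatchCount(count, batchSize) - 1 to read them back.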


#6 TEK - 07-05-2003, 06:22 PM

Ehh...

I forgot to comment on the speed issue...
This fix seems to be a setback for the speed findings from the first version
of the code. I have not had the time to verify how much lower the effect
is...

I no longer have the test data available, but my findings for now are:

17,000 objects, transferred 10 times (170k) locally, takes 45 seconds.
Each bulk transfer of 17,000 objects creates a 3,134 KB file to serialize...
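
A quick back-of-the-envelope check of these figures, assuming the 45 seconds covers all ten transfers and that 1 KB means 1024 bytes (both are assumptions; the post does not say). The class and method names are invented.

```csharp
using System;

// Hypothetical helper for the arithmetic only.
public static class TransferStats
{
    // Objects moved per second across all transfers.
    public static double ObjectsPerSecond(int objectsPerTransfer, int transfers, double seconds)
    {
        return objectsPerTransfer * (double)transfers / seconds;
    }

    // Average serialized size of one object, in bytes (1 KB = 1024 bytes assumed).
    public static double BytesPerObject(double kilobytes, int objects)
    {
        return kilobytes * 1024.0 / objects;
    }
}
```

Under those assumptions, ObjectsPerSecond(17000, 10, 45) comes to roughly 3,778 objects per second, and BytesPerObject(3134, 17000) to roughly 189 bytes per object.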

It also seems like this is demanding a LOT of processing power on the
receiving end, so I'm looking for other ways as well.
Will look into your suggestions about the DataTable

Regards, TEK