
Back in 2006 I had some code that used ImageMagick's Magick++ library to read and write images in various formats and do some basic processing. That library works well and is very feature-full, but is not what you would call "Boost like", so I was excited when GIL came along. I looked at it at the time of its review, and I thought at first that I must have missed a big chunk of the documentation where all the algorithms were described! Of course, it has no image processing algorithms; it's just a layer; if you want to scale or rotate or merge images, you have to do that yourself. I don't think I wrote a review in the end; my feeling was - in comparison with ImageMagick - "come back when you've finished" - but I didn't want to say that in public.

So it's great that now, 4 years later, we are finally getting some GIL extensions. Thank you Christian for - I hope! - getting the ball rolling. I hope we'll see more.

Having said that, I suspect that even with lots of extensions GIL might not really be what I need. Here's a typical problem that I will have to address in the next few weeks: I have a 1e12 pixel image - a map - which is supplied as a few thousand 5000x5000 TIFF tiles; I have to join them together, tweak the colours, draw some lines on top, and re-chop it into a few million 256x256 PNG tiles. The issue of course is that that image can't be held in RAM. I will do this with my own wrappers around libtiff and libpng, in rows or chunks.

So, do the image concepts (and concrete classes) that GIL defines accommodate that sort of problem? Even if they do, is a "virtual image" of some sort a good way of doing it? I suspect not. But even if the image concept is not useful, presumably other GIL concepts like pixel formats and algorithms in future extensions (pixel format conversion, blending) could still be useful.

So, along comes Christian's IO extension.
Wrapping the "legacy" image libraries is something that I have done a few times now and it can be painful, not least because of the need to read a lot of documentation; the libraries all differ and it would be great to get a unified set of wrappers that provide a sane C++ interface. But as you can see, I don't really want something that reads and writes complete GIL images, and that is what Christian's extension does. Ideally, this extension would have separated out the wrapping of the libraries from the actual GIL image integration, with the GIL image integration just being a layer on top of the image library wrappers. Since it doesn't do that, it's not much use to me. It is, however, better than what GIL currently has. So for that reason I suggest that it should be accepted.
- What is your evaluation of the design?
In addition to my comments above, I note that it's necessary to jump through hoops (using Boost.IOStreams) just to read an image from memory. Surely this should be a fundamental operation; it would be much better to have a ctor that takes begin and end pointers for the encoded image data.
- What is your evaluation of the implementation?
When I first looked at the library I noticed that the error handling was clearly broken. Although that seems to have been resolved, it has given me a poor impression of the quality (i.e. likely correctness) of the implementation. It seems that this error handling has received almost no testing.

Thorough testing will be difficult for this library because, as well as all the usual permutations of platform features, compilers etc., the variations in the image libraries must also be considered. The library version and compiler settings could both affect the correct operation of this code.

I have some concerns about the likely performance of the code due to excessive copying of the data both in and out of the image library. Although I've not measured the performance impact, I feel it would be very desirable (and is certainly possible) for this library to have zero copying.
- What is your evaluation of the documentation?
Satisfactory. (As a meta-comment - I think libraries being reviewed really should have their docs available online somewhere, and preferably also their source code in a source browser of some sort. In fact, I thought this was already a requirement. Can it be added to the review managers' checklist for future reviews?)
- What is your evaluation of the potential usefulness of the extensions?
It's better than what was there before.
- Did you try to use the extensions? With what compiler? Did you have any problems?
No, I've not compiled anything.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Probably half a day during the review, and a similar amount of time a few months ago.
- Are you knowledgeable about the problem domain?
I've written a few wrappers for libjpeg, libpng and libtiff. I've never used GIL though I've read extensively about it.
And finally, every review should answer this question: - Do you think the extensions should be accepted as a part of Boost.GIL library?
On the grounds that this is better than what is currently there, and on the assumption that Domagoj Saric is not imminently going to post something else that would supersede this, I believe that it should be accepted. (I have not looked at the "toolbox" at all, and I've not seen it mentioned in any other reviews yet. My "yes" is just for the io_new functionality.) Regards, Phil.

Hi Phil, thanks for your review. It's much appreciated. On Tue, Dec 7, 2010 at 2:48 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Back in 2006 I had some code that used ImageMagick's Magick++ library to read and write images in various formats and do some basic processing. That library works well and is very feature-full, but is not what you would call "Boost like", so I was excited when GIL came along. I looked at it at the time of its review, and I thought at first that I must have missed a big chunk of the documentation where all the algorithms were described! Of course, it has no image processing algorithms; it's just a layer; if you want to scale or rotate or merge images, you have to do that yourself. I don't think I wrote a review in the end; my feeling was - in comparison with ImageMagick - "come back when you've finished" - but I didn't want to say that in public.
I guess what you're hoping for is a full-fledged image processing tool chain. That's a huge undertaking if you ask me. Not sure if the boost community could ever agree on such a huge library. Like a boost user interface library or a boost XML lib. It's just very hard to suit all use cases.
Having said that, I suspect that even with lots of extensions GIL might not really be what I need. Here's a typical problem that I will have to address in the next few weeks: I have a 1e12 pixel image - a map - which is supplied as a few thousand 5000x5000 TIFF tiles; I have to join them together, tweak the colours, draw some lines on top, and re-chop it into a few million 256x256 PNG tiles. The issue of course is that that image can't be held in RAM. I will do this with my own wrappers around libtiff and libpng, in rows or chunks. So, do the image concepts (and concrete classes) that GIL defines accommodate that sort of problem? Even if they do, is a "virtual image" of some sort a good way of doing it? I suspect not. But even if the image concept is not useful, presumably other GIL concepts like pixel formats and algorithms in future extensions (pixel format conversion, blending) could still be useful.
Quite a monumental task, I must admit. For use cases like this I'm afraid you always have to resort to some homegrown code. I mean someone has to manage your memory and gil is not a memory manager.
So, along comes Christian's IO extension. Wrapping the "legacy" image libraries is something that I have done a few times now and it can be painful, not least because of the need to read a lot of documentation; the libraries all differ and it would be great to get a unified set of wrappers that provide a sane C++ interface. But as you can see, I don't really want something that reads and writes complete GIL images, and that is what Christian's extension does. Ideally, this extension would have separated out the wrapping of the libraries from the actual GIL image integration, with the GIL image integration just being a layer on top of the image library wrappers.
Mhmm, what would the first wrapper do, apart from error handling and providing a c++ interface for callbacks? What kind of buffer type would you use? I'm intrigued by your idea. But need more information.
- What is your evaluation of the design?
In addition to my comments above, I note that it's necessary to jump through hoops (using Boost.IOStreams) just to read an image from memory. Surely this should be a fundamental operation; it would be much better to have a ctor that takes begin and end pointers for the encoded image data.
Do you think io_new should provide in-memory streams as well? So many other libs do that already. I mean there is std::stringstream, boost::iostreams, Fast Format ( http://www.fastformat.org/ ), and more.
- What is your evaluation of the implementation?
When I first looked at the library I noticed that the error handling was clearly broken. Although that seems to have been resolved, it has given me a poor impression of the quality (i.e. likely correctness) of the implementation.
The fix you mention was added over a year ago. Yes, like every software piece, even io_new has some bugs. ( sorry for the sarcasm ) One reason to bring io_new into boost mainstream is to have some more people to help out.
It seems that this error handling has received almost no testing. Thorough testing will be difficult for this library because as well as all the usual permutations of platform features, compilers etc. the variations in the image libraries must also be considered. The library version and compiler settings could both affect the correct operation of this code.
Yes it's pretty complicated and finicky. I'm actually quite happy the test suite runs on Linux, as well. I don't use Linux or gcc.
I have some concerns about the likely performance of the code due to excessive copying of the data both in and out of the image library. Although I've not measured the performance impact, I feel it would be very desirable (and is certainly possible) for this library to have zero copying.
You're right, there are cases when zero copying is possible, but there might be fewer cases than you think. For instance, reading an rgb8_image_t from a bmp still requires some channel shuffling since bmp stores the image as bgr8_image_t. Regards, Christian

Christian Henning wrote:
Ideally, this extension would have separated out the wrapping of the libraries from the actual GIL image integration, with the GIL image integration just being a layer on top of the image library wrappers.
Mhmm, what would the first wrapper do, apart from error handling and providing a c++ interface for callbacks?
Not much more.
What kind of buffer type would you use?
For the decoded data? Probably just raw memory.
I'm intrigued by your idea. But need more information.
Haven't we discussed this about 3 times before? Anyway, here's an outline:

    class ReadJpegImage {
    public:
        ReadJpegImage(std::string filename);
        ReadJpegImage(const char* filename);
        ReadJpegImage(const char* begin, const char* end);

        size_t width() const;
        size_t height() const;
        pixel_format_t pixel_format() const;

        template <typename pixel_t>
        void read_rows(pixel_t* data, size_t n_rows);

        enum dct_e { dct_float, dct_int, dct_fastint };
        dct_e dct_method() const;
        void dct_method(dct_e d);
    };

    void read_jpeg_image(const char* filename, boost::gil::any_image& image)
    {
        ReadJpegImage i(filename);
        // ...read from i into image...
    }
In addition to my comments above, I note that it's necessary to jump through hoops (using Boost.IOStreams) just to read an image from memory. Surely this should be a fundamental operation; it would be much better to have a ctor that takes begin and end pointers for the encoded image data.
Do you think io_new should provide in-memory streams as well? So many other libs do that already. I mean there is std::stringstream, boost::iostreams, Fast Format ( http://www.fastformat.org/ ), and more.
No no, that's precisely what it shouldn't do. I don't want all the extra layers of code that that involves. I just want a ctor that takes begin and end pointers for the encoded image data.
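[Editor's note: the kind of byte-range ctor discussed here can be sketched without any stream machinery. The following self-contained fragment is purely illustrative - the class name, members, and the idea of parsing only the PNG signature and IHDR dimensions are invented for the example, and are not io_new API.]

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Illustrative sketch of a decoder wrapper whose ctor takes begin/end
// pointers to the already-encoded bytes: no stream layer involved.
// Only the PNG signature check and IHDR dimension parsing are shown.
class ReadPngImage {
public:
    ReadPngImage(const unsigned char* begin, const unsigned char* end)
    {
        static const unsigned char sig[8] =
            { 137, 80, 78, 71, 13, 10, 26, 10 };
        if (end - begin < 24 || std::memcmp(begin, sig, 8) != 0)
            throw std::runtime_error("not a PNG");
        // Width and height are big-endian 32-bit values inside the
        // IHDR chunk, at byte offsets 16 and 20 of the file.
        width_  = read_be32(begin + 16);
        height_ = read_be32(begin + 20);
    }
    std::uint32_t width()  const { return width_; }
    std::uint32_t height() const { return height_; }
private:
    static std::uint32_t read_be32(const unsigned char* p) {
        return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
             | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    }
    std::uint32_t width_, height_;
};
```

No file, no stream, no IOStreams adaptor: the caller hands over whatever buffer it already has.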
I have some concerns about the likely performance of the code due to excessive copying of the data both in and out of the image library. Although I've not measured the performance impact, I feel it would be very desirable (and is certainly possible) for this library to have zero copying.
You're right, there are cases when zero copying is possible, but there might be fewer cases than you think. For instance, reading an rgb8_image_t from a bmp still requires some channel shuffling since bmp stores the image as bgr8_image_t.
Well I've never needed to read an rgb8 image from a BMP file. Channel shuffling is exactly the case when copying is necessary. But I find it's more common to not need copying - and in particular, copying the encoded data should not be necessary. Regards, Phil.
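[Editor's note: the channel shuffle both sides refer to is small but unavoidable when source and destination channel orders differ. A minimal self-contained sketch - the function name and raw-byte row layout are assumptions for illustration, not code from either library.]

```cpp
#include <cstddef>
#include <cstdint>

// Why reading RGB out of a BMP cannot be zero-copy: BMP scanlines
// store channels as B,G,R, so each pixel must be shuffled while it
// is copied out. Plain byte arrays stand in for scanline buffers.
inline void bgr_to_rgb_row(const std::uint8_t* bgr,
                           std::uint8_t* rgb,
                           std::size_t n_pixels)
{
    for (std::size_t i = 0; i != n_pixels; ++i) {
        rgb[3*i + 0] = bgr[3*i + 2]; // R comes from the third byte
        rgb[3*i + 1] = bgr[3*i + 1]; // G stays in place
        rgb[3*i + 2] = bgr[3*i + 0]; // B comes from the first byte
    }
}
```

When source and destination orders already match, the same wrapper could hand out a pointer into the decoded scanline instead, which is the zero-copy case Phil describes.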

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291751283459@dmwebmail.dmwebmail.chezphil.org...
Having said that, I suspect that even with lots of extensions GIL might not really be what I need. Here's a typical problem that I will have to address in the next few weeks: I have a 1e12 pixel image - a map - which is supplied as a few thousand 5000x5000 TIFF tiles; I have to join them together, tweak the colours, draw some lines on top, and re-chop it into a few million 256x256 PNG tiles. The issue of course is that that image can't be held in RAM. I will do this with my own wrappers around libtiff and libpng, in rows or chunks.
Hi Phil, if you are willing to try (and can work/test with MSVC on Windows) I think io2 should already be able to help you here (partial ROI based access to big images) at least with the WIC backend... The LibTIFF backend is unfortunately still not ready for this as I haven't yet implemented full/2D ROI access for it (both because of lack of time and because I first wanted to have an open discussion on how it should be done for such particular backends, the related LibTIFF issue in question was discussed near the end of the "[gil] New IO release" thread)... Example code demonstrating a possible (skeleton) solution: http://codepad.org/WD7CpIJ8 ... ps. unfortunately I do not have access to such a huge image to test whether WIC can actually handle such monster images...
- Do you think the extensions should be accepted as a part of Boost.GIL library?
On the grounds that this is better than what is currently there, and on the assumption that Domagoj Saric is not imminently going to post something else that would supersede this, I believe that it should be accepted.
If we've already waited so long, why rush if some of us agree that there are still things that need 'polishing'... You said you already did some LibXXX wrappers of your own...if this is so...why not join the effort, even if only temporarily, to make sure you get what you want...Christian also seems open for cooperation... -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

"Domagoj Saric" <domagoj.saric@littleendian.com> wrote in message news:idnk54$p0b$1@dough.gmane.org...
Example code demonstrating a possible (skeleton) solution: http://codepad.org/WD7CpIJ8 ...
Which of course needs to be fixed to not read only diagonal tiles :D

Domagoj Saric wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291751283459@dmwebmail.dmwebmail.chezphil.org...
Having said that, I suspect that even with lots of extensions GIL might not really be what I need. Here's a typical problem that I will have to address in the next few weeks: I have a 1e12 pixel image - a map - which is supplied as a few thousand 5000x5000 TIFF tiles; I have to join them together, tweak the colours, draw some lines on top, and re-chop it into a few million 256x256 PNG tiles. The issue of course is that that image can't be held in RAM. I will do this with my own wrappers around libtiff and libpng, in rows or chunks.
Hi Phil, if you are willing to try (and can work/test with MSVC on Windows) I think io2 should already be able to help you here (partial ROI based access to big images) at least with the WIC backend...
I'm doing this on Linux. Wikipedia tells me WIC is "Windows Imaging Component".
Example code demonstrating a possible (skeleton) solution: http://codepad.org/WD7CpIJ8 ...
I don't really follow what that code is doing, and it's not obvious to me what its memory footprint will be. In contrast, I think I can write something that's not much longer and more obviously correct by using the sort of simple wrappers around the libraries that I have been proposing and explicitly managing the tiled input and output:

    class TiledReadImage {
        typedef shared_ptr<ReadTiff> readtiff_ptr;
        readtiff_ptr images[1400];  // One row of input tiles
        int rownum;                 // in pixels

        void open_next_row() {
            for (int c = 0; c < 1400; ++c) {
                images[c].reset(new ReadTiff(input_tile_filename(c, rownum/5000)));
            }
        }

    public:
        TiledReadImage(): rownum(0) {}

        void read_row(pixel_t* data) {
            if (rownum % 5000 == 0) { open_next_row(); }
            for (int c = 0; c < 1400; ++c) {
                images[c]->read_row(data + 5000*c);
            }
            ++rownum;
        }
    };

    // Something similar for TiledWriteImage

    TiledReadImage i;
    TiledWriteImage o;
    pixel_t data[700000];
    for (int row = 0; row < 1300000; ++row) {
        i.read_row(data);
        o.write_row(data);
    }

Anyone can look at that and see that it keeps 1400 input files open (bad) and needs 700 kwords of buffer memory (good). If you wanted to have fewer files open you could code explicitly something else that had one input file open (good) and buffered complete tiles (probably 2 x 5000 x 700,000 = 7 GBytes) (bad).
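[Editor's note: the row-joining loop above can be exercised at toy scale with in-memory tiles standing in for the libtiff wrappers. This is purely an illustrative sketch - MemTile, TiledRowReader, and the shrunken dimensions are all invented for the example.]

```cpp
#include <cstddef>
#include <vector>

// Toy-scale version of the row-joining idea: each "tile" is an
// in-memory block of pixel rows, and TiledRowReader::read_row()
// concatenates one row from every tile across the strip. A tiny
// int stands in for pixel_t; tiles are assumed equal width.
typedef int pixel_t;

struct MemTile {
    std::size_t width, height;
    std::vector<pixel_t> pixels; // row-major storage
    void read_row(std::size_t r, pixel_t* out) const {
        for (std::size_t c = 0; c != width; ++c)
            out[c] = pixels[r * width + c];
    }
};

struct TiledRowReader {
    std::vector<MemTile> tiles;  // one strip of tiles, left to right
    std::size_t rownum = 0;      // current row within the strip
    void read_row(pixel_t* out) {
        for (std::size_t t = 0; t != tiles.size(); ++t)
            tiles[t].read_row(rownum, out + t * tiles[t].width);
        ++rownum;
    }
};
```

The memory footprint is exactly one output row plus the tiles the caller chose to hold, which is the "obviously correct" property Phil is arguing for.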
ps. unfortunately I do not have access to such a huge image to test whether WIC can actually handle such monster images...
If you're interested in experimenting, I suggest making random input tiles or just replicating them.
On the grounds that this is better than what is currently there, and on the assumption that Domagoj Saric is not imminently going to post something else that would supersede this, I believe that it should be accepted.
If we've already waited so long, why rush if some of us agree that there are still things that need 'polishing'...
To get the ball moving on GIL extensions, and because this is better than what GIL currently has. Past experience suggests that the existence of this io extension will not prevent other similar and incompatible things (i.e. yours) from being accepted in the future.
You said you already did some LibXXX wrappers of your own...if this is so...why not join the effort even if only temporary to make sure you get what you want...Christian also seems open for cooperation...
My wrappers have all been written to implement some subset of the functionality that I needed at the time. For example I have never implemented any of the error handling stuff, and in some cases I have only read or write but not both. Regards, Phil.

Hi, I have a problem with the deserialization of objects defined in multiple dlls. I define the following classes in a first dll: polymorphic_base (pure virtual) and polymorphic_derived1 (derived from polymorphic_base). In a second dll, I define the class polymorphic_derived2 (derived from polymorphic_derived1). In an executable, I serialize an object of type polymorphic_derived2 and two objects of type polymorphic_derived1, in the order defined in this function:

    void save_exported(const char *testfile)
    {
        std::ofstream os(testfile);
        xml_oarchive oa(os);
        polymorphic_base * pb_ptr_1 = new polymorphic_derived1;
        oa << BOOST_SERIALIZATION_NVP(pb_ptr_1);
        polymorphic_base * pb_ptr_2 = new polymorphic_derived2;
        oa << BOOST_SERIALIZATION_NVP(pb_ptr_2);
        polymorphic_base * pb_ptr_3 = new polymorphic_derived1;
        oa << BOOST_SERIALIZATION_NVP(pb_ptr_3);
    }

The serialization works, but when I try to deserialize with this function:

    void load_exported(const char *testfile)
    {
        std::ifstream is(testfile);
        xml_iarchive ia(is);
        polymorphic_base * pb_ptr_1 = NULL;
        ia >> BOOST_SERIALIZATION_NVP(pb_ptr_1);
        polymorphic_base * pb_ptr_2 = NULL;
        ia >> BOOST_SERIALIZATION_NVP(pb_ptr_2);
        polymorphic_base * pb_ptr_3 = NULL;
        ia >> BOOST_SERIALIZATION_NVP(pb_ptr_3); // <= exception
    }

an exception is thrown on the deserialization of the second object of type polymorphic_derived1, but it works for the first one. In the file basic_archive.cpp, line 466: "pending_bis = & bpis_ptr->get_basic_serializer();", bpis_ptr is null. It seems that the deserialization of the object of type polymorphic_derived2 (derived from polymorphic_derived1) has broken the part of the cobject_id_vector table concerning polymorphic_derived1. If polymorphic_derived2 is defined in the first dll, there is no problem. Is it possible to make it work with 2 dlls? Regards, Yohan

pipalapop wrote:
Hi, I have a problem with the deserialization of objects defined in multiple dlls. I define the following classes in a first dll: polymorphic_base (pure virtual) and polymorphic_derived1 (derived from polymorphic_base). In a second dll, I define the class polymorphic_derived2 (derived from polymorphic_derived1). In an executable, I serialize an object of type polymorphic_derived2 and two objects of type polymorphic_derived1, in the order defined in this function:
void save_exported (const char *testfile) { std::ofstream os(testfile); xml_oarchive oa(os);
polymorphic_base * pb_ptr_1 = new polymorphic_derived1; oa << BOOST_SERIALIZATION_NVP(pb_ptr_1);
polymorphic_base * pb_ptr_2 = new polymorphic_derived2; oa << BOOST_SERIALIZATION_NVP(pb_ptr_2);
polymorphic_base * pb_ptr_3 = new polymorphic_derived1; oa << BOOST_SERIALIZATION_NVP(pb_ptr_3);
}
The serialization works, but when I try to deserialize with this function:
void load_exported(const char *testfile) { std::ifstream is(testfile); xml_iarchive ia(is);
polymorphic_base * pb_ptr_1 = NULL; ia >> BOOST_SERIALIZATION_NVP(pb_ptr_1);
polymorphic_base * pb_ptr_2 = NULL; ia >> BOOST_SERIALIZATION_NVP(pb_ptr_2);
polymorphic_base * pb_ptr_3 = NULL; ia >> BOOST_SERIALIZATION_NVP(pb_ptr_3); // <= exception
}
an exception is thrown on the deserialization of the second object of type polymorphic_derived1, but it works for the first one.
in the file basic_archive.cpp, line 466 : "pending_bis = & bpis_ptr->get_basic_serializer();" bpis_ptr is null.
It seems that the deserialization of the object of type polymorphic_derived2 (derived from polymorphic_derived1) has broken the part of the cobject_id_vector table concerning polymorphic_derived1.
If polymorphic_derived2 is defined in the first dll, there is no problem.
Is it possible to make it work with 2 dlls?
Probably, but it would require some investigation. Robert Ramey
Regards, Yohan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Thanks for your answer. Do you think there is a problem in the API that prevents supporting this form of implementation? If not, do you have an idea of how to allow it? Regards, Yohan

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291823931996@dmwebmail.dmwebmail.chezphil.org...
I'm doing this on Linux. Wikipedia tells me WIC is "Windows Imaging Component".
Yes, that's why I asked whether you can test this on Windows (as the LibTIFF wrapper is not ready for it, 2D ROI access, yet)...
Example code demonstrating a possible (skeleton) solution: http://codepad.org/WD7CpIJ8 ...
I don't really follow what that code is doing,
Hmm...if I understood your initial example/use case correctly you have a huge TIFF comprised of 5k-x-5k tiles that you need to load, edit and then rechop into 256x256 PNG files... So the example code assumes 24 bit RGB input (for the sake of an example) and:
- allocates the input_tile_holder (5000x5000x24bit ~ 75 MB)
- creates a WIC reader object for the large input TIFF
- allocates the output_tile_holder (256x256x24bit)
- reads individual input tiles in a loop (but incorrectly, as it reads only diagonal tiles)
- writes output files within the above loop...
Perhaps the syntax for specifying ROI based access is confusing (e.g. "offset_view()" marks the source view/creates from it a ROI target, which is not readily obvious from the name of the utility function...I'll have to rename it)...
and it's not obvious to me what its memory footprint will be.
Well, with io2, it all depends on the backend...if the backend is 'smart enough' to read in only the required parts of an image (and WIC theoretically/as per documentation should be) the footprint should be obvious as io2 (when the target is not a virtual image but a raw in memory image as in your case) does not allocate any memory (nor it performs any additional/redundant data copying...in general it strives for zero overhead, both for CPU and RAM)...so the only allocated memory is that allocated by the user (in this case, input_tile_holder and output_tile_holder) and by the backend...
In contrast, I think I can write something that's not much longer and more obviously correct by using the sort of simple wrappers around the libraries that I have been proposing and explicitly managing the tiled input and output:
...snipped code...
Anyone can look at that and see that it keeps 1400 input files open (bad) and needs 700 kwords of buffer memory (good). If you wanted to have fewer files open you could code explicitly something else that had one input file open (good) and buffered complete tiles (probably 2 x 5000 x 700,000 = 7 GBytes) (bad).
This example seems different from the one you gave in the first post (or I misunderstood both)...Now you seem to have an image that is (1400*5000) pixels wide and 1300000 pixels tall and that is not actually a single file but the 5k-x-5k tiles pre-separated into individual files...and it misses the 'editing' logic and saving to 256x256 PNGs...As I don't see what it is actually trying to do with the input data I cannot know whether you actually need to load entire rows of tiles (the 1400 files) but doesn't such an approach defeat the purpose of tiles in the first place? I can only, as a side note, say that the shared_ptrs are overkill...in fact, if the number 1400 is really fixed/known at compile-time the individual ReadTiff heap allocations are unnecessary...
ps. unfortunately I do not have access to such a huge image to test whether WIC can actually handle such monster images...
If you're interested in experimenting, I suggest making random input tiles or just replicating them.
Luckily I found this http://www.unearthedoutdoors.net/global_data/true_marble/download :) If you want, I can take the largest TIFF there and chop it out into 256x256 PNGs and measure the RAM and CPU time usage (or do some other test if you can define it clearly)...
If we've already waited so long, why rush if some of us agree that there are still things that need 'polishing'...
To get the ball moving on GIL extensions, and because this is better than what GIL currently has. Past experience suggests that the existence of this io extension will not prevent other similar and incompatible things (i.e. yours) from being accepted in the future.
Can I ask which librar(y/ies) your past experience refers to? To me the practice seems quite the opposite (e.g. even after years of complaints and different proposals Boost.Function still has not changed, or the fact that we have two signals and two regex libraries)...
You said you already did some LibXXX wrappers of your own...if this is so...why not join the effort even if only temporary to make sure you get what you want...Christian also seems open for cooperation...
My wrappers have all been written to implement some subset of the functionality that I needed at the time. For example I have never implemented any of the error handling stuff, and in some cases I have only read or write but not both.
That's what I meant, you could look at the proposed code and, if it does not suit you, propose a change/patch...

On 09/12/10 12:09, Domagoj Saric wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message
If you're interested in experimenting, I suggest making random input tiles or just replicating them.
Luckily I found this http://www.unearthedoutdoors.net/global_data/true_marble/download :)
If you want, I can take the largest TIFF there and chop it out into 256x256 PNGs and measure the RAM and CPU time usage (or do some other test if you can define it clearly)...
I'd be interested in making similar tests myself. I have access to georeferenced raster datasets as large as 90K x 45K pixels. I also have extensive experience with the GDAL (http://www.gdal.org) library, which I have used to process such datasets with success. However, I'm still missing a specification of the operations to be performed. Can we describe Phil's use case in the form of reproducible steps, so we can program this use case with various toolkits/libraries? As the datasets I have are not public data, I could use the True Marble imagery so the results are comparable. What do you think? Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00CAC1.1010509@loskot.net...
I'd be interested in making similar tests myself. I have access to georeferenced raster datasets as large as 90K x 45K pixels. I also have extensive experience with GDAL (http://www.gdal.org) library which I have used to process such datasets with success
However, I'm still missing a specification of the operations to be performed. Can we describe Phil's use case as reproducible steps, so we can program it with various toolkits/libraries? As the datasets I have are not public data, I could use the True Marble imagery so the results are comparable.
What do you think?
I'm all for it... actually I think we should skip any intermediate 'editing' operations (as these will only obfuscate the test and the results) and just test something simple, like chopping up a huge TIFF into small PNGs...

On 09/12/10 12:43, Domagoj Saric wrote:
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00CAC1.1010509@loskot.net...
I'd be interested in making similar tests myself. I have access to georeferenced raster datasets as large as 90K x 45K pixels. I also have extensive experience with GDAL (http://www.gdal.org) library which I have used to process such datasets with success
However, I'm still missing a specification of the operations to be performed. Can we describe Phil's use case as reproducible steps, so we can program it with various toolkits/libraries? As the datasets I have are not public data, I could use the True Marble imagery so the results are comparable.
What do you think?
I'm all for it...actually I think we should skip any intermediate 'editing' operations (as these will only obfuscate the test and the results) and just test something simple, like chopping up a huge TIFF into small PNGs...
Good idea.
1. Which raster are we taking from the True Marble imagery?
2. To keep things simpler, let's cut the raster into even tiles, for example 200x200 pixels. This will give us a constant number of 11664 tiles for the 21600x21600 raster.
3. No form of parallelism in the cutting procedure is assumed, right?
4. Cutting TIFF to PNG involves compression. If we are interested in raster access and raster I/O (RIO) speed, perhaps we could stick to TIFF as the output format as well. What do you think?
Speaking of GDAL, I will write a chopper in C++ so we can compare the same technology. In the meantime, the library provides Python bindings and a script dedicated to raster tiling: http://gdal.org/gdal2tiles.html Python should not impose a significant overhead and is useful for quick prototyping. If anyone is interested in taking part in this benchmark, GDAL for Windows is available through a Cygwin-like installer: http://trac.osgeo.org/osgeo4w For Unix, it's trivial to build: http://trac.osgeo.org/gdal/wiki/BuildingOnUnix

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00DA5D.3040400@loskot.net...
1. Which raster are we taking from the True Marble Imagery?
The largest one (TrueMarble.250m.21600x21600.E2.tif.gz, 628 MB)?
2. To keep things simpler, let's cut the raster into even tiles, for example 200x200 pixels. This will give us a constant number of 11664 tiles for the 21600x21600 raster.
OK.
3. No form of parallelism of the cutting procedure is assumed, right?
Right.
4. Cutting TIFF to PNG involves compression. If we are interested in raster access and raster I/O (RIO) speed, perhaps we could stick to TIFF as the output format as well. What do you think?
That depends on what exactly we are trying to test here: the C++ wrappers (e.g. io_new vs io2) and/or the backends (e.g. LibTIFF vs WIC) and/or something third... Are you sure the TrueMarble GeoTIFFs are uncompressed?

On 09/12/10 15:16, Domagoj Saric wrote:
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00DA5D.3040400@loskot.net...
1. Which raster are we taking from the True Marble Imagery?
The largest one (TrueMarble.250m.21600x21600.E2.tif.gz, 628 MB)?
Good.
2. To keep things simpler, let's cut the raster into even tiles, for example 200x200 pixels. This will give us a constant number of 11664 tiles for the 21600x21600 raster.
OK.
OK. Though, GDAL itself reports the optimum block size as 512x512; see the last 3 lines:

gdalinfo.exe TrueMarble.250m.21600x21600.E2.tif
Driver: GTiff/GeoTIFF
Files: TrueMarble.250m.21600x21600.E2.tif
Size is 21600, 21600
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (0.000000000000000,45.000000000000000)
Pixel Size = (0.002083333333333,-0.002083333333333)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  0.0000000, 45.0000000) (  0d 0'0.01"E, 45d 0'0.00"N)
Lower Left  (  0.0000000,  0.0000000) (  0d 0'0.01"E,  0d 0'0.01"N)
Upper Right ( 45.0000000, 45.0000000) ( 45d 0'0.00"E, 45d 0'0.00"N)
Lower Right ( 45.0000000,  0.0000000) ( 45d 0'0.00"E,  0d 0'0.01"N)
Center      ( 22.5000000, 22.5000000) ( 22d30'0.00"E, 22d30'0.00"N)
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue

That is, the block size for best efficiency of raster I/O operations. I will test 200x200 and 512x512 if time permits.
4. Cutting TIFF to PNG involves compression. If we are interested in raster access and raster I/O (RIO) speed, perhaps we could stick to TIFF as the output format as well. What do you think?
That depends on what exactly are we trying to test here, the C++ wrappers (e.g. io_new vs io2) and/or the backends (e.g. LibTIFF vs WIC) and/or something third...
Actually, that was part of my initial question.
Are you sure the TrueMarble GeoTIFFs are uncompressed?
Yes, these TIFF (GeoTIFF, in fact) files are uncompressed; that's why they are provided as .gz files. The gdalinfo utility output I've provided above also does not display any compression properties. If the file were compressed, a corresponding key-value pair would be reported, for example: Image Structure Metadata: COMPRESSION=LZW. Here are details about this metadata: http://gdal.org/frmt_gtiff.html Also, Windows Explorer -> file Properties reports it as uncompressed.

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00FF4B.7070901@loskot.net...
4. Cutting TIFF to PNG involves compression. If we are interested in raster access and raster I/O (RIO) speed, perhaps we could stick to TIFF as the output format as well. What do you think?
That depends on what exactly are we trying to test here, the C++ wrappers (e.g. io_new vs io2) and/or the backends (e.g. LibTIFF vs WIC) and/or something third...
Actually, that was part of my initial question.
Hi, sorry for the delay... I went ahead and tested just to see what we are dealing with here... This code http://codepad.org/eGMixUK1 (io2 using the WIC backend; unfortunately large-TIFF support was added only to the latest WIC, available on Windows 7, so the same code will not work on WinXP) opens the input TIFF and hacks it up into separate tile files. On an Intel i5@4.2 GHz with 4 GB RAM I got the following results:

200x200 PNG tiles ~ 65 seconds
512x512 PNG tiles ~ 69 seconds
512x512 TIFF tiles ~ 27 seconds

RAM usage was below 5 MB the whole time (after working around an apparent leak in WIC that otherwise caused the RAM usage to crawl up to ~15 MB) and the binary is about 45 kB... I guess these tasks are no longer so hard for modern hardware, as my work desktop churns them out pretty fast...

On 10/12/10 19:27, Domagoj Saric wrote:
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D00FF4B.7070901@loskot.net...
4. Cutting TIFF to PNG involves compression. If we are interested in raster access and raster I/O (RIO) speed, perhaps we could stick to TIFF as the output format as well. What do you think?
That depends on what exactly are we trying to test here, the C++ wrappers (e.g. io_new vs io2) and/or the backends (e.g. LibTIFF vs WIC) and/or something third...
Actually, that was part of my initial question.
Hi, sorry for the delay...
Sorry for the delay too. Here is the tiling benchmark for GDAL: https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gdal The gdal_image_tiles_test.cpp + Makefile are there for those who would like to run it in their environments. The results.txt file includes timing + tile count + total size of tiles for PNG and JPEG output. Shortly, my results on an Intel P8600 + 4GB RAM with Linux (amd64):

PNG: 11:30 - 11:50 min
JPG: 2:10 - 2:30 min

RAM usage observed for both is less than 5 MB.
This code http://codepad.org/eGMixUK1 (io2 using the WIC backend, unfortunately large TIFF support was added only to the latest WIC available on Windows 7 so the same code will not work on WinXP) opens the input TIFF and hacks it up into separate tile files. With an Intel i5@4.2 GHz with 4 GB RAM I got the following results:
Looks like it was run on a ~1.5x faster machine, considering the clock of a single CPU.
200x200 PNG tiles ~ 65 seconds 512x512 PNG tiles ~ 69 seconds
What compression level did you use? See results.txt for my details.
512x512 TIFF tiles ~ 27 seconds
I assume it's no compression, right? I haven't tried TIFF.
RAM usage was below 5 MB the whole time (after working around an apparent leak in WIC that otherwise caused the RAM usage to crawl up to ~15 MB)
Similar to the GDAL run.
and the binary is about 45kB...
The size of the binaries in the GDAL test: 26K test program + 11M libgdal.so. Note, libgdal.so is built with support for more than 120 dataset formats.
... I guess these tasks are no longer so hard for modern hardware as my work desktop churns them out pretty fast...
Indeed, your results are very impressive. I'm going to run your benchmark and submit my results here: https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gil but I'm missing some details about how to configure and build it. Can I find them anywhere?

On 13/12/10 17:53, Mateusz Loskot wrote:
I'm going to run your benchmark and submit my results here https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gil but I'm missing some details about how to configure and build it. Can I find it anywhere?
Domagoj, As far as I can see, implementation of your extension is Windows/Visual C++-specific, so I'm not really able to do anything to run it on Linux at this moment. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D06B2C7.4030604@loskot.net...
On 13/12/10 17:53, Mateusz Loskot wrote:
I'm going to run your benchmark and submit my results here https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gil but I'm missing some details about how to configure and build it. Can I find it anywhere?
Domagoj,
As far as I can see, implementation of your extension is Windows/Visual C++-specific, so I'm not really able to do anything to run it on Linux at this moment.
Hi Mateusz, I've finally cleaned up io2 to compile on *NIXes (tested with GCC 4.2.1, GCC 4.6, Apple Clang 1.6 and Clang 2.8) and I've added low-level access routines (what Phil was asking for). All backends support scanline/row-based access (even if emulated, as in the case of WIC and GDI+), and the LibTIFF backend additionally supports tile-based access... These low-level access routines expect the caller to know what he/she is doing and perform no checks or conversions... A supplementary set of member functions and metafunctions was added for this purpose (to query image information before you start low-level access), such as can_do_tile_access(), format(), get_native_format<>... I'll explain it in more detail in another post... Anyways, the LibTIFF tile-based access can now be used to rewrite my original benchmark code so that you can run it on a *NIX machine... Here's the code: http://codepad.org/bKHEWPKb

On a weak Mac Mini with OS X 10.6.6 [a Core (1) Duo@1.8GHz, 1GB RAM (~640MB of which is eaten by the OS) and a slow, small and uber-fragmented disk] this finishes in ~34 seconds... On a Windows 7 Core 2 Duo@3.2GHz + 4GB RAM + el-cheapo RAID 5 setup it finishes in ~16 seconds (~33 seconds on the first run, while nothing is in the cache)...

Configuration:
OSX:
- everything built with Clang 2.8 -m32 -O4 (-flto) -DNDEBUG -msse -ffast-math
- static linking, except with libstdc++
- libs: LibTIFF 3.9.2, LibJPEG 8b, zlib 1.2.4
Win7:
- everything built with MSVC++ 10 x86 /Ox /Ob2 /Oi /Ot /Oy /GL /GF /MT /GS- /Gy /arch:SSE /fp:fast /DNDEBUG
- static linking
- libs: LibTIFF 3.9.4, LibJPEG 8a, zlib 1.2.5

"Domagoj Saric" <domagoj.saric@littleendian.com> wrote in message news:igplr7$7al$1@dough.gmane.org...
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D06B2C7.4030604@loskot.net...
As far as I can see, implementation of your extension is Windows/Visual C++-specific, so I'm not really able to do anything to run it on Linux at this moment. Anyways the LibTIFF tile based access can now be used to rewrite my original benchmark code so that you can run it on a *NIX machine...
One thing I forgot to mention: LibTIFF uses memory-mapped files (unless explicitly disabled), so expect high (reported) memory usage with this test...

On 14/01/11 14:16, Domagoj Saric wrote:
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D06B2C7.4030604@loskot.net...
On 13/12/10 17:53, Mateusz Loskot wrote:
I'm going to run your benchmark and submit my results here https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gil but I'm missing some details about how to configure and build it. Can I find it anywhere?
Domagoj,
As far as I can see, implementation of your extension is Windows/Visual C++-specific, so I'm not really able to do anything to run it on Linux at this moment.
Hi Mateusz,
I've finally cleaned up io2 to compile on *NIXes (test with GCC 4.2.1, GCC 4.6, Apple Clang 1.6 and Clang 2.8) and I've added low level access routines (what Phil was asking for). All backends support scanline/row based access (even if emulated, as in the case of WIC and GDI+), and the LibTIFF backend additionally supports tile based access...
Here's the code: http://codepad.org/bKHEWPKb
Hi Domagoj, This is great and thank you for your work. Unfortunately, I'm having a very busy time now and I won't be able to run the tests before next weekend.

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D3360FC.7010704@loskot.net...
Unfortunately, I'm having a very busy time now and I won't be able to run the tests before next weekend.
Hi, have you by any chance managed to run the code I posted?

Hi Domagoj, Domagoj Saric wrote:
Anyways the LibTIFF tile based access can now be used to rewrite my original benchmark code so that you can run it on a *NIX machine...
Here's the code: http://codepad.org/bKHEWPKb
As far as I can tell, that code expects the input TIFF to be in internally-tiled format, and it creates output images whose size is the same as the internal tiles. Is that right? I may have lost track of what you're trying to do here, but that's not what my original example was doing. I don't think I've ever seen an internally-tiled TIFF "in the wild", even in applications where it seems the most appropriate format e.g. large maps. Even when the input does have this form, the tile size will not in general match the required output tile size. I mean to offer you some code to compare against for benchmarking, but this is complicated by the fact that most of my existing code is entangled with other things. Regards, Phil.

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1295368977993@dmwebmail.dmwebmail.chezphil.org...
Domagoj Saric wrote:
Anyways the LibTIFF tile based access can now be used to rewrite my original benchmark code so that you can run it on a *NIX machine...
Here's the code: http://codepad.org/bKHEWPKb
As far as I can tell, that code expects the input TIFF to be in internally-tiled format, and it creates output images whose size is the same as the internal tiles. Is that right?
Yes, because that is the format of the test image that Mateusz and I agreed to use for our tests... As explained in this recent post (addressed to you) http://lists.boost.org/Archives/boost/2011/01/175067.php the link to that code was meant only as a usage example...
I may have lost track of what you're trying to do here, but that's not what my original example was doing. I don't think I've ever seen an internally-tiled TIFF "in the wild", even in applications where it seems the most appropriate format e.g. large maps. Even when the input does have this form, the tile size will not in general match the required output tile size.
The Marble GeoTIFFs seem to be tiled... Anyways, as explained in the post mentioned above, what I was trying to do (and hopefully did do) was to add a low-level interface (that you asked for) that enables things like row and/or tile access...

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D065D94.80505@loskot.net...
The gdal_image_tiles_test.cpp + Makefile for those who would like to run it in their environments. The results.txt file includes timing + tiles number + total size of tiles for PNG and JPEG output.
Shortly, my results on Intel P8600 + 4GB RAM with Linux (amd64)
PNG: 11:30 - 11:50 min JPG: 2:10 - 2:30 min
RAM usage observed for both is less than 5MB.
Interesting, didn't know JPG compression was faster than PNG...
This code http://codepad.org/eGMixUK1 (io2 using the WIC backend, unfortunately large TIFF support was added only to the latest WIC available on Windows 7 so the same code will not work on WinXP) opens the input TIFF and hacks it up into separate tile files. With an Intel i5@4.2 GHz with 4 GB RAM I got the following results:
Looks like run on ~1.5x faster machine, considering clock of single CPU.
Actually, an i5 is a newer architecture than your Core 2 Duo, so it is faster even at the same clock speed... The one I'm using is an overclocked i5 660 (which further 'auto-overclocks' for single-threaded loads) with fast DDR3 memory, so we can assume that it is even 2x faster or more (one cannot really know, as it heavily depends on the circumstances)... However, I was using a 32-bit build (while you used a 64-bit build)... maybe even better results would be achieved with a 64-bit build...
200x200 PNG tiles ~ 65 seconds 512x512 PNG tiles ~ 69 seconds
What compression level did you use? See results.txt for my details.
Don't know exactly, WIC defaults (resulting PNGs were about 450~460 MB)...
512x512 TIFF tiles ~ 27 seconds
I assume it's no compression, right? I haven't tried TIFF.
Yes, if I remember correctly (just WIC default again)...
The size of binaries in GDAL test is: 26K test program + 11M libgdal.so Note, libgdal.so is built with support of more than 120 dataset formats.
Still pretty big...it would be interesting if you could try a Clang 2.8 with -O4 (or -Os -flto -finline-small-functions) rebuild with static linking...
Indeed, your results are very impressive.
Well those aren't 'my' results actually, it's not like I wrote the decoding procedures :) The goal is only to write a wrapper with the least amount of 'fat' around the decoder...
I'm going to run your benchmark and submit my results here https://github.com/mloskot/workshop/tree/master/benchmarking/tiling/gil but I'm missing some details about how to configure and build it. Can I find it anywhere?
No need to build it (and the configuration is already done by the macro on the first line)... but yes, as you figured out later, it works only on Windows (7), because of WIC (until I give 'full ROI' support to the LibTIFF backend) and MSVC (temporarily, until I clean up io2)...

Domagoj Saric wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291823931996@dmwebmail.dmwebmail.chezphil.org...
Example code demonstrating a possible (skeleton) solution: http://codepad.org/WD7CpIJ8 ...
I don't really follow what that code is doing,
Hmm...if I understood your initial example/use case correctly you have a huge TIFF comprised of 5k-x-5k tiles that you need to load, edit and then rechop into 256x256 PNG files...
Not quite; I have a 1e12 pixel image which is supplied as a few thousand 5000x5000 TIFF files. (Presumably the confusion is because TIFFs can in theory be divided into tiles internally; this is not what I'm referring to. I have an image which is supplied as tiles, each of which is a separate file.)
and it's not obvious to me what its memory footprint will be.
Well, with io2 it all depends on the backend... if the backend is 'smart enough' to read in only the required parts of an image (and WIC, theoretically/as per its documentation, should be), the footprint should be obvious.
Right, it's not obvious to me.
This example seems different from the one you gave in the first post (or I misunderstood both)...Now you seem to have an image that is (1400*5000) pixels wide and 1300000 pixels tall and that is not actually a single file but the 5k-x-5k tiles preseparated into individual files
Right.
...and it misses the 'editing' logic
Right. That's largely unimportant for this discussion.
and saving to 256x256 PNGs
Well, I omitted the TiledWriteImage implementation, but it would be similar in organisation to the TiledReadImage. Here you are:

class TiledWriteImage {
    typedef shared_ptr<WritePng> writepng_ptr;
    writepng_ptr images[27345];  // !!!
    int rownum;

    void open_next_row() {
        for (int c=0; c<27345; ++c) {
            images[c].reset(new WritePng(output_tile_filename(c, rownum/256)));
        }
    }

public:
    TiledWriteImage(): rownum(0) {}

    void write_row(const pixel_t* data) {
        if (rownum%256 == 0) {
            open_next_row();
        }
        for (int c=0; c<27345; ++c) {
            images[c]->write_row(data + 256*c);
        }
        ++rownum;
    }
};
...As I don't see what it is actually trying to do with the input data I cannot know whether you actually need to load entire rows of tiles (the 1400 files) but doesn't such an approach defeat the purpose of tiles in the first place?
No. Hmm, I thought this code was fairly obvious, but maybe I'm making assumptions. There is a balance between:
- Buffer memory size
- Number of open files
- Code complexity
Options include:
1. Read the entire input, then write the entire output. This uses an enormous amount of memory, but has only one file open at any time, and is very simple.
2. Read and write a row at a time (as shown). This uses a very modest amount of memory, but requires a very large number of files to be open at the same time. It's still reasonably simple.
3. Read and write 256 rows at a time. This uses an acceptable amount of memory (less than 1 GB), and requires 1400 input files to be open, but only 1 output file. The complexity starts to increase in this case.
4. Read and write 5000 rows at a time. This requires a lot more RAM (15 GB), but I can have only 1 file open at a time. This is getting rather complex, as there is some wrap-around to manage because 5000%256 != 0.
5. Lots of schemes that involve closing, re-opening and seeking within the images. These will all be unacceptably slow.
There is also the issue of concurrency to think about. The code that I've posted can be made to work reasonably efficiently on a shared-memory multiprocessor system by parallelising the for loops in the read/write_row methods. It's also possible to split the work over multiple machines where the tile boundaries align, which is every 160,000 pixels; that lets me divide the work over about 50 machines (I run this on Amazon EC2).
The essential issue is that I believe it's unreasonable or impossible to expect the library code to do any of this automatically; whatever its documentation might say, I don't trust that e.g. "WIC" will do the right thing. I want all this to be explicit in my code. So I just want the library to provide the simple open/read-row/write-row/close functionality for the image files, and to supply types that I can use for pixel_t, etc.
I can only, as a side note, say that the shared_ptrs are overkill... in fact, if the number 1400 is really fixed/known at compile time, the individual ReadTiff heap allocations are unnecessary...
Please, think of this as pseudo-code if you like. Any shared pointer overhead will be swamped by the time taken in the image encoding and decoding.
To get the ball moving on GIL extensions, and because this is better than what GIL currently has. Past experience suggests that the existence of this io extension will not prevent other similar and incompatible things (i.e. yours) from being accepted in the future.
Can I ask to which librar(y/ies) does your past experience refer to, as to me the practice seems quite the opposite (e.g. even after years of complaints and different proposals Boost.Function still has not changed, or the fact that we have two signals and two regex libraries)...
Well having two signals and two regex libraries is a perfect example of a case where the existence of one solution has not prevented another similar and incompatible solution from being accepted later. Regards, Phil.

On 09/12/10 16:29, Phil Endecott wrote:
Domagoj Saric wrote:
I have a 1e12 pixel image which is supplied as a few thousand 5000x5000 TIFF files.
AFAIU, you retile this coverage, cutting it into smaller tiles of 256x256 each. Is that correct? (I'm sorry if you've explained it already, but I've been disconnected for a couple of days.) It could be interesting to see how the GDAL raster I/O engine would manage. There is a thin script which can retile existing tile coverages: http://gdal.org/gdal_retile.html http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/python/scripts/gdal_retil... All the work is done by the GDAL engine written in C/C++. BTW, are these 5000x5000 tiles from an Ordnance Survey dataset?
...As I don't see what it is actually trying to do with the input data I cannot know whether you actually need to load entire rows of tiles (the 1400 files) but doesn't such an approach defeat the purpose of tiles in the first place?
No. Hmm, I thought this code was fairly obvious but maybe I'm making assumptions.
There is a balance between - Buffer memory size - Number of open files
I assume you mean "number of open files at the same time"? Obviously, total number of open files will be huge.
- Code complexity
Options include: 1. Read the entire input, then write the entire output. This uses an enormous amount of memory, but has only one file open at any time, and is very simple. 2. Read and write row at a time (as shown). This uses a very modest amount of memory, but requires a very large number of files to be open at the same time. It's still reasonably simple.
You mean row as a raster scanline, not row of tiles in the grid of tiles, right?
3. Read and write 256 rows at a time. This uses an acceptable amount of memory (less than 1 GB), and requires 1400 input files to be open, but only 1 ouput file. The complexity starts to increase in this case.
It would be 256 x the width of the input raster scanline (5000), as the whole scanline needs to be decoded.
4. Read and write 5000 rows at a time. This requires a lot more RAM (15 GB) but I can have only 1 file open at a time. This is getting rather complex as there is some wrap-around to manage because 5000%256!=0.
It's possible to read in stripes (e.g. 8 scanlines at once): 8 * 5000 * 3 (assuming RGB). Now, if the read block size could be optimised to 32x32, then a single read operation would require decoding four such stripes. The operation is repeated 8 times to generate a single 256x256 output. There are raster backends that can perform some caching, so the same scanlines/blocks are not decoded more than once when decoding subsequent blocks along a strip of 8 (or more) scanlines. GDAL provides such a mechanism: http://www.osgeo.org/pipermail/gdal-dev/2001-September/003165.html http://www.osgeo.org/pipermail/gdal-dev/2001-September/003166.html
5. Lots of schemes that involve closing, re-opening and seeking within the images. These will all be unacceptably slow.
Closing and (re)opening of input and output files seems orthogonal to the raster format. Seeking is as efficient as the I/O of the particular format. Some formats allow scanline-based access, some allow strip-based and some allow tile-based access (regardless of whether a raster file is physically organised in tiles). There is a variation on the concept of efficient access called Region of Interest (ROI), but this requires a) support specified by a format and implemented by a format access library, and b) preprocessing of data to define ROIs. I assume that ROIs are as useless for Phil's use case as the tiled TIFFs are.

A bit of brainstorming: assuming the cutting of a coverage of 5000x5000 tiles into 256x256 tiles, the maximum number of input tiles per single output tile is 4: the output raster is generated by stitching the four input rasters. If the format backend performs access based on scanlines, then for each 256-pixel-wide scanline of written output, 2 * 5000-wide scanlines are accessed. This is a real limitation that needs to be balanced in terms of the number of 256x256 output rasters generated at the same time, etc.

Anyways, my understanding is that Boost.GIL IO takes the access options provided by the format libraries as they are specified and allows some or all of them to be utilised. Hopefully, it should support the most popular access strategies for the popular formats. Thus, I don't think it's possible for Boost.GIL IO to address such specific and advanced problems as Phil's. However, I think Boost.GIL IO backends could be not only format- but problem-specific as well.
Perhaps Phil's problem qualifies to be solved with a new GIL IO backend providing efficient block-based access, implementing some assumptions:
- one file open at a time
- calculation of a memory-efficient block size, based on the scanline size and a maximum number of scanlines, so as not to open too much
- merging read blocks into a single 256x256 tile written to a single file
- a block/scanline caching strategy (see the GDAL case above)
- etc.
I believe it would be feasible with Boost.GIL and the IO extensions, as a specialised IO driver.

Mateusz, I'm going to snip most of your message to focus on the important stuff. I don't have time to answer everything in detail and somehow my central point doesn't seem to have been understood yet.
Anyway, my understanding is that Boost.GIL IO takes all the access options provided by format libraries as specified and allows one to utilise some or all of them.
No, that's precisely what it doesn't do. It supports exactly one access method, which is to read or write a complete file in one go, and nothing else. No doubt reading or writing a complete file in one go will be the most common operation, but it is not the only one, and it's not the one that I need for any of my current applications including the example that I've posted. I think that it's unfortunate that Christian's extension has not tried to support at least sequential row-by-row access to images.
Thus, I don't think it's possible for Boost.GIL IO to address such specific and advanced problems as Phil's. However, I think Boost.GIL IO backends could be not only format-specific but problem-specific as well. Perhaps Phil's problem qualifies to be solved with a new GIL IO backend providing efficient block-based access under some assumptions:
- one file open at a time - calculation of a memory-efficient block size, based on the scanline size and a maximum number of scanlines, so as not to hold too much open at once - merging the read blocks into a single 256x256 raster written to a single file - a block/scanline caching strategy (see the GDAL case above) - etc.
I believe it would be feasible with Boost.GIL and the IO extensions, as a specialised IO driver.
No. I don't want a special IO driver to manage my tiled images. I simply want a wrapper around the image libraries that I can call, one that hides most of the legacy interface. I just wish that the extension we're reviewing had been decomposed into a set of such wrappers plus the whole-image read/write code, such that I could use the wrappers and not the rest. Regards, Phil.
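The decoupled, row-based wrapper Phil is asking for might look roughly like the sketch below. All of the names here (row_image_reader, rgb8_pixel, read_row) are hypothetical, not part of GIL or the reviewed extension; the in-memory source merely stands in for where a libtiff/libpng handle would live.

```cpp
// Sketch: a reader object that keeps its "file" open between calls and
// hands out one scanline at a time. A real version would wrap a TIFF* or
// png_structp instead of a vector-of-rows.
#include <cassert>
#include <cstddef>
#include <vector>

struct rgb8_pixel { unsigned char r, g, b; };

class row_image_reader {
public:
    explicit row_image_reader(const std::vector<std::vector<rgb8_pixel>>& img)
        : img_(img), next_row_(0) {}

    std::size_t width()  const { return img_.empty() ? 0 : img_[0].size(); }
    std::size_t height() const { return img_.size(); }

    // Sequential access: each call yields the next scanline, returning
    // false once the image is exhausted.
    bool read_row(std::vector<rgb8_pixel>& row) {
        if (next_row_ >= img_.size()) return false;
        row = img_[next_row_++];
        return true;
    }

private:
    const std::vector<std::vector<rgb8_pixel>>& img_;
    std::size_t next_row_;
};
```

Whole-image read_image() could then be layered on top of such a reader, rather than the other way around, which is the decomposition Phil advocates.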

On 09/12/10 20:03, Phil Endecott wrote:
Mateusz, I'm going to snip most of your message to focus on the important stuff.
Sure.
I don't have time to answer everything in detail and somehow my central point doesn't seem to have been understood yet.
Yes, I've realised that now.
Anyway, my understanding is that Boost.GIL IO takes all the access options provided by format libraries as specified and allows one to utilise some or all of them.
No, that's precisely what it doesn't do. It supports exactly one access method, which is to read or write a complete file in one go, and nothing else.
Right, understood. I've been looking too far into the future, I think.
No doubt reading or writing a complete file in one go will be the most common operation, but it is not the only one, and it's not the one that I need for any of my current applications including the example that I've posted.
Yes, agreed.
I think that it's unfortunate that Christian's extension has not tried to support at least sequential row-by-row access to images.
It would be useful indeed.
Thus, I don't think it's possible for Boost.GIL IO to address such specific and advanced problems as Phil's. However, I think Boost.GIL IO backends could be not only format-specific but problem-specific as well. Perhaps Phil's problem qualifies to be solved with a new GIL IO backend providing efficient block-based access under some assumptions:
- one file open at a time - calculation of a memory-efficient block size, based on the scanline size and a maximum number of scanlines, so as not to hold too much open at once - merging the read blocks into a single 256x256 raster written to a single file - a block/scanline caching strategy (see the GDAL case above) - etc.
I believe it would be feasible with Boost.GIL and the IO extensions, as a specialised IO driver.
No. I don't want a special IO driver to manage my tiled images. I simply want a wrapper around the image libraries that I can call that hides much of the legacy interface. I just wish that the extension we're reviewing had been decomposed into a set of such wrappers and the whole-image read/write code, such that I could use the wrappers and not the rest.
Yes, this has been clarified now. Thanks for the detailed explanation. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

Hi Phil, only one comment. On Thu, Dec 9, 2010 at 3:03 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Mateusz, I'm going to snip most of your message to focus on the important stuff. I don't have time to answer everything in detail and somehow my central point doesn't seem to have been understood yet.
Anyway, my understanding is that Boost.GIL IO takes all the access options provided by format libraries as specified and allows one to utilise some or all of them.
No, that's precisely what it doesn't do. It supports exactly one access method, which is to read or write a complete file in one go, and nothing else.
No doubt reading or writing a complete file in one go will be the most common operation, but it is not the only one, and it's not the one that I need for any of my current applications including the example that I've posted. I think that it's unfortunate that Christian's extension has not tried to support at least sequential row-by-row access to images.
You can read row-by-row, pixel-by-pixel, or tile-by-tile. It can be done for any rectangular region. Now, is it done most efficiently? No. Can it be done better? Yes. But the interface allows for it. That's my point. Here is one example:

    rgb8_image_t img;
    image_read_settings< jpeg_tag > settings( point_t(  0,  0 )
                                            , point_t( 10, 10 )
                                            , jpeg_dct_method::slow
                                            );
    read_image( jpeg_filename, img, settings );

Regards, Christian

On 09/12/10 20:59, Christian Henning wrote:
Hi Phil, only one comment.
On Thu, Dec 9, 2010 at 3:03 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Mateusz, I'm going to snip most of your message to focus on the important stuff. I don't have time to answer everything in detail and somehow my central point doesn't seem to have been understood yet.
Anyway, my understanding is that Boost.GIL IO takes all the access options provided by format libraries as specified and allows one to utilise some or all of them.
No, that's precisely what it doesn't do. It supports exactly one access method, which is to read or write a complete file in one go, and nothing else.
No doubt reading or writing a complete file in one go will be the most common operation, but it is not the only one, and it's not the one that I need for any of my current applications including the example that I've posted. I think that it's unfortunate that Christian's extension has not tried to support at least sequential row-by-row access to images.
You can read row-by-row, pixel-by-pixel, or tile-by-tile. It can be done for any rectangular region. Now, is it done most efficiently? No. Can it be done better? Yes. But the interface allows for it. That's my point.
Thanks Christian for the clarification. I got confused myself, as some suggestions and patches to the design have been mentioned which I know nothing about. I took it to mean that there is some lack of functionality for the non-whole-image I/O cases. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

Christian Henning wrote:
On Thu, Dec 9, 2010 at 3:03 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
No, that's precisely what it doesn't do. It supports exactly one access method, which is to read or write a complete file in one go, and nothing else.
You can read row-by-row, pixel-by-pixel, tile-by-tile. It can be done in any rectangular shape. Now, is it done most efficiently? No.
You can read row-by-row by repeatedly closing and re-opening the file and skipping over all of the previously read lines. This is O(N^2). Actually doing that in practice would be crazy, but yes you are right, it can be done - for reading. But not for writing.
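Phil's O(N^2) claim can be made concrete with a little arithmetic (illustrative only): if obtaining row r requires re-opening the file and decoding rows 0..r, then reading all R rows decodes 1 + 2 + ... + R = R(R+1)/2 scanlines in total - for a 5000-row tile, over 12 million scanline decodes instead of 5000.

```cpp
#include <cassert>

// Total scanlines decoded when each of `rows` rows is obtained by
// re-opening the file and skipping (i.e. decoding) all earlier rows.
long long scanlines_decoded(long long rows)
{
    return rows * (rows + 1) / 2;  // arithmetic series 1 + 2 + ... + rows
}
```

This is the quadratic blow-up that a persistent reader object, which keeps the file open and its decode position between calls, avoids entirely.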
Can it be done better? Yes. But the interface allows for it.
No, your interface does not allow for a better implementation.
That's my point. Here is one example:
rgb8_image_t img;
image_read_settings< jpeg_tag > settings( point_t( 0, 0 ) , point_t( 10, 10 ) , jpeg_dct_method::slow );
read_image( jpeg_filename, img, settings );
So how does that interface allow for an efficient implementation? It doesn't. To function efficiently, the image file needs to be kept open from one call to the next. That can't be done with a free function like read_image() that takes a filename. You need to decouple the image file object from the reading and writing:

    rgb8_image_t img;
    jpeg_image_file_reader j( jpeg_filename );  // <----- here
    image_read_settings< jpeg_tag > settings( point_t(  0,  0 )
                                            , point_t( 10, 10 )
                                            , jpeg_dct_method::slow
                                            );
    read_image( j, img, settings );  // takes the file object, not the filename

Now the file remains open while j is in scope, so you can read more lines from it etc., with the possibility of doing so efficiently. (If anyone is having difficulty following this, compare it with reading and writing text files; imagine if the operations took filenames and there were no objects representing open files.) Phil.

Hi Phil,
rgb8_image_t img;
jpeg_image_file_reader j(jpeg_filename); // <----- here
image_read_settings< jpeg_tag > settings( point_t( 0, 0 ) , point_t( 10, 10 ) , jpeg_dct_method::slow );
read_image( j, img, settings ); // Takes the file object, not the filename
Now, the file remains open while j is in scope, so you can read more lines from it etc. with the possibility of doing so efficiently.
(If anyone is having difficulty following this, compare it with reading and writing text files; imagine if the operations took filenames and there were no objects representing open files.)
I can follow. Your jpeg_image_file_reader could also serve another purpose: giving the user access to the backend. I have been thinking about this use case, and I think one good idea is to have an input iterator returned by the io extension. The user would set up the strides to read, which could be scanlines or tiles. Once the iterator is retrieved, operator++() would read the next portion. Christian
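Christian's input-iterator idea might be sketched as follows. Everything here is hypothetical (none of these names exist in the reviewed extension), and `fake_backend` stands in for a real format backend such as a libjpeg wrapper; the point is only the shape of the interface: operator++ pulls the next stride from the open file.

```cpp
// Sketch: a single-pass input iterator whose increment reads the next
// scanline from an open backend, becoming equal to the default-constructed
// end iterator when the image is exhausted.
#include <cstddef>
#include <vector>

using scanline = std::vector<unsigned char>;

struct fake_backend {                    // stands in for e.g. a libjpeg reader
    std::vector<scanline> rows;
    std::size_t pos = 0;
    bool read_next(scanline& out) {
        if (pos >= rows.size()) return false;
        out = rows[pos++];
        return true;
    }
};

class scanline_read_iterator {
public:
    explicit scanline_read_iterator(fake_backend* b) : backend_(b) { advance(); }
    scanline_read_iterator() : backend_(nullptr) {}        // end iterator
    const scanline& operator*() const { return current_; }
    scanline_read_iterator& operator++() { advance(); return *this; }
    bool operator!=(const scanline_read_iterator& other) const {
        return backend_ != other.backend_;
    }
private:
    void advance() {
        if (backend_ && !backend_->read_next(current_)) backend_ = nullptr;
    }
    fake_backend* backend_;
    scanline current_;
};
```

A tile-reading variant would differ only in what each increment fetches, which is presumably what "set up the strides" refers to.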

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291995568066@dmwebmail.dmwebmail.chezphil.org...
Christian Henning wrote:
Can it be done better? Yes. But the interface allows for it.
No, your interface does not allow for a better implementation.
Again, to intercede for Christian ;) He did not imply, or at least that is how I understood it, that the interface allows for a better implementation, but simply that it allows for it to be done at all... -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291912190039@dmwebmail.dmwebmail.chezphil.org...
and it's not obvious to me what its memory footprint will be.
Well, with io2, it all depends on the backend...if the backend is 'smart enough' to read in only the required parts of an image (and WIC theoretically/as per documentation should be) the footprint should be obvious
Right, it's not obvious to me.
Why not? If it can be seen that the code reads an image with a sequentially moving ROI, writing each read chunk/ROI to a file and then 'forgetting'/freeing it, then it should be obvious that the maximum amount of RAM used will be the size of the ROI... unless of course the backend used is not quite 'smart' (more on 'trusting the backend' below)...
This example seems different from the one you gave in the first post (or I misunderstood both)... Now you seem to have an image that is (1400*5000) pixels wide and 1300000 pixels tall, and that is not actually a single file but the 5k-x-5k tiles pre-separated into individual files.
Right.
OK, I somehow missed on first reading that you actually read the input image row by row (what I call a '1D ROI' in io2), i.e. you do not use tiles/real 2D ROIs... io2 can already do that with all backends including LibTIFF (only the GDI+ and WIC backends support 2D ROIs currently), so when I find the time I'll finally clean it up/fix it to compile on *NIX+Clang/GCC so you can test-drive it if you want... However, I have to temporarily relax my involvement as I'm in the middle of a major release at my day job + I'm about to become a dad in the coming days, probably even before this review ends, so please bear with me ;-)
...As I don't see what it is actually trying to do with the input data, I cannot know whether you actually need to load entire rows of tiles (the 1400 files), but doesn't such an approach defeat the purpose of tiles in the first place?
No. Hmm, I thought this code was fairly obvious but maybe I'm making assumptions.
There is a balance between - Buffer memory size - Number of open files - Code complexity
Options include: <...snipped...>
Well, I must still say I do not understand why you must read entire rows of input tiles (1400*5000 in length), why can't you just read individual 5k input tiles, chop each of those into the smaller output PNGs, and then move on... yes, the 5000%256!=0 issue might require reading up to 4 input tiles and/or remembering some of the edge data of the cross-border input tiles, which in turn would require (much) more complex code... but I do not really see the issue with that (if performance is your prime concern here)... it's just a real-world problem that you handle as best you can... a std::vector may be much simpler than a std::set, but if (performant) sorted insertion and lookup is what you need, there will be no doubt about which to use, right? Complex problems sometimes simply call for complex (or slow) solutions...
There is also the issue of concurrency to think about. The code that I've posted can be made to work reasonably efficiently on a shared memory multiprocessor system by parallelising the for loops in the read/write_row methods. It's also possible to split over multiple machines where the tile boundaries align, which is every 160,000 pixels; that lets me divide the work over about 50 machines (I run this on Amazon EC2).
After performing the tests with the GeoTIFFs (that Mateusz and I discussed), and without knowing how complex the intermediate 'editing' procedure is, it seems to me that even the 1400*5000 x 1300000 input image could be chopped up in a reasonable time (i.e. counted in hours) even by a single machine (one that can still be considered a 'personal computer' as opposed to a 'server') with a 6-core i7, 8 GB of RAM and a dual RAID-0 setup (say 2+2 WD VelociRaptors)...
The essential issue is that I believe it's unreasonable or impossible to expect the library code to do any of this automatically - whatever its documentation might say, I don't trust that e.g. "WIC" will do the right thing. I want all this to be explicit in my code.
But then, if you refuse to trust documented performance claims even after they are tested, you are, I must say, confined to writing assembly code, because then you simply can't trust anything... And by that rationale your code is also not 'obvious': how can one trust your backend to actually read only what you told it to, and not the entire image or something similar... This is akin to not trusting malloc to allocate only the amount requested (fearing it might actually allocate 16 times more)... As you can see in the other thread, I tested io2+WIC and the expectations proved correct... So yes, I think you can expect (and should demand, which is why I went for io2 in the first place) that a library do 'the right thing' (at least for a major subset of use cases; for others it should give you low-level control)...
So I just want the library to provide the simple open/read-row/write-row/close functionality for the image files, and to supply types that I can use for pixel_t, etc.
Independent of the rest of the discussion/issues, yes I would agree that explicit read/write row would be a valuable part of the interface...
I can only, as a side note, say that the shared_ptrs are overkill... in fact, if the number 1400 is really fixed/known at compile time, the individual ReadTiff heap allocations are unnecessary...
Please, think of this as pseudo-code if you like. Any shared pointer overhead will be swamped by the time taken in the image encoding and decoding.
Pseudo-code or not, I simply have a knee-jerk reaction to such coding practices ;) Especially here, where we hunt for performance... using shared_ptr as opposed to, say, scoped_ptrs or a ptr_container gains you nothing here, so why use it by default... More importantly, as already said, why heap-allocate the readers at all (at least individually)? Especially considering that LibTIFF already does that, so all you get is an additional indirection and worse locality of reference...
Well having two signals and two regex libraries is a perfect example of a case where the existence of one solution has not prevented another similar and incompatible solution from being accepted later.
But not accepted/merged as an improved replacement, rather as a parallel entity, which is a bit 'awkward'/'ugly'/'unfortunate'... -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

On 11/12/10 11:36, Domagoj Saric wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291912190039@dmwebmail.dmwebmail.chezphil.org...
and it's not obvious to me what its memory footprint will be.
Well, with io2, it all depends on the backend...if the backend is 'smart enough' to read in only the required parts of an image (and WIC theoretically/as per documentation should be) the footprint should be obvious
Right, it's not obvious to me.
Why not, if it can be seen that the code reads an image with a sequentially moving ROI and writing each read chunk/ROI to a file
Domagoj, Would you mind explaining what exactly you mean by ROI? The abbreviation stands for Region of Interest, that much is clear, but which definition of ROI do you refer to? I have a feeling there is potential for another confusion here. The general meaning of ROI is slightly different from ROI as defined in JPEG 2000 or PGF (Progressive Graphics File) and similar formats. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D038E6B.5010903@loskot.net...
Would you mind explaining what exactly you mean by ROI? The abbreviation stands for Region of Interest, that much is clear, but which definition of ROI do you refer to? I have a feeling there is potential for another confusion here.
Hmm... well, I mean simply that, a region of interest... in many contexts in the current discussions it is almost synonymous with a subimage_view... Please feel free to correct me if you think I'm using the term in wrong/confusing ways... ps. have you by any chance run the GDAL test(s)? -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

On 11/12/10 15:25, Domagoj Saric wrote:
"Mateusz Loskot" <mateusz@loskot.net> wrote in message news:4D038E6B.5010903@loskot.net...
Would you mind explaining what exactly you mean by ROI? The abbreviation stands for Region of Interest, that much is clear, but which definition of ROI do you refer to? I have a feeling there is potential for another confusion here.
Hmm... well, I mean simply that, a region of interest... in many contexts in the current discussions it is almost synonymous with a subimage_view...
OK, this is clear now. I have mostly met the term ROI used to describe a coding strategy where certain regions are encoded at higher quality (and stored as such in the format) than the rest of the image. This is a common technique in JPEG 2000 or PGF. It also allows faster decoding of, and access to, those subimages, each of which is a ROI.
Please feel free to correct me if you think I'm using the term in wrong/confusing ways...
I just wanted to confirm that we mean the same thing when writing and reading the term ROI.
ps. have you by some chance ran the GDAL test(s)?
Just posted my results as follow-up to yours. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org

Domagoj Saric wrote:
"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1291912190039@dmwebmail.dmwebmail.chezphil.org... Well, I must still say I do not understand why you must read entire rows of input tiles (1400*5000 in length)
Because I can, and it gives me fast and simple code.
why can't you just read individual 5k input tiles, chop each of those into the smaller output PNGs, and then move on... yes, the 5000%256!=0 issue might require reading up to 4 input tiles and/or remembering some of the edge data of the cross-border input tiles, which in turn would require (much) more complex code... but I do not really see the issue with that
I prefer fast and simple code to either fast-and-complex or slow-and-simple.
So I just want the library to provide the simple open/read-row/write-row/close functionality for the image files, and to supply types that I can use for pixel_t, etc.
Independent of the rest of the discussion/issues, yes I would agree that explicit read/write row would be a valuable part of the interface...
Great. Regards, Phil.

"Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote in message news:1292110169238@dmwebmail.dmwebmail.chezphil.org...
why can't you just read individual 5k input tiles, chop each of those into the smaller output PNGs, and then move on... yes, the 5000%256!=0 issue might require reading up to 4 input tiles and/or remembering some of the edge data of the cross-border input tiles, which in turn would require (much) more complex code... but I do not really see the issue with that
I prefer fast and simple code to either fast-and-complex or slow-and-simple.
Khm... I must say I still doubt that your approach, considering the inherent RAM and disk thrashing (and the consequent virtually zero locality of reference), can be even nearly as fast as the 'complex code' solution...
So I just want the library to provide the simple open/read-row/write-row/close functionality for the image files, and to supply types that I can use for pixel_t, etc.
Independent of the rest of the discussion/issues, yes I would agree that explicit read/write row would be a valuable part of the interface...
Great.
As mentioned in a recent reply to Mateusz, I've finally cleaned up the code to compile on GCC and Clang, as well as added a 'low level interface' (to all backends) that might satisfy your needs:
- pixel_size()
- row_size()
- format()
- image_format_id()
- can_do_row_access()
- can_do_tile_access()
- can_do_roi_access()
- can_do_vertical_roi_access()
- begin_sequential_row_access()
- get_native_format<>
- global formatted_image_traits<>::supported_pixel_formats_t
+ libtiff_image specific:
- tile_size()
- tile_row_size()
- tile_dimensions()
- begin_sequential_tile_access()
Most of these are probably obvious (or will be from examples/code), except maybe format() - it returns a 'value' that is a backend's internal representation of the loaded image's pixel format and can be used in two ways: to check whether it matches the format of your target image (whose backend-internal format value you retrieve with the get_native_format<> member metafunction), or to convert it to the index in the backend's MPL typelist of supported formats (e.g. formatted_image_traits<libjpeg_image>::supported_pixel_formats_t) by passing it to the image_format_id() static member function... You can find a link with example code in the recent reply to Mateusz: http://lists.boost.org/Archives/boost/2011/01/174907.php ... hope you find it useful... -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman
participants (7)
- Christian Henning
- Domagoj Saric
- Domagoj Saric
- Mateusz Loskot
- Phil Endecott
- pipalapop
- Robert Ramey