A question about Boost.RegEx
I have a text file in the following format (sparse data format, index:value):

0.0 1:0.269474 2:0.145364 3:0.067149 4:0.112643 5:0.212212 6:0.244601 7:0.181663 8:0.238227 9:0.848362 10:0.266284 11:0.058374 12:0.071349 13:0.192308 14:0.20059 15:0.256923 16:0.385338 17:0.123268 18:0.119405 19:0.350768 20:0.187007 21:0.369464 22:0.056476 23:0.059463 24:0.07298 25:0.158566 26:0.192542 27:0.315876 28:0.503185 29:0.216059 30:0.122681 31:0.228612 32:0.116034 33:0.12488 34:0.171623 35:0.222429 36:0.278741 37:0.170732 38:0.404539 39:0.078273 40:0.201989 41:0.367349 42:0.310658 43:0.176915 44:0.215489 45:0.207045 46:0.267294 47:0.158534 48:0.114389 49:0.085446 50:0.141968 51:0.11669 52:0.804789 53:0.533344 54:0.112373 55:0.173574 56:0.495218 57:0.122419 58:0.091748 59:0.209178 60:0.100954 61:0.168572 62:0.130615 63:0.080905 64:0.552943 65:0.208904 66:0.072037 67:0.166432 68:0.539735 69:0.186302 70:0.161657 71:0.135055 72:0.131747 73:0.434487 74:0.235148 75:0.119409 76:0.137161 77:0.186354 78:0.182466 79:0.105231 80:0.049308 81:0.199764 82:0.275725 83:0.369274 84:0.222261 85:0.1464 86:0.396967 87:0.937 88:0.983 90:0.983 91:1.0 92:1.0

This is just one line; I have 64000 lines like this one. What's the best way to load the data? I use Boost.Regex and boost::lexical_cast to do this, but it takes 2 minutes to read all the data. Is there a better way to do this?
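A minimal regex-free sketch of parsing one such line with std::strtod and std::strtol. It assumes 1-based indices and a feature dimension known in advance; parse_sparse_line and its dim parameter are hypothetical names, not part of the original code:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parse "label idx:val idx:val ..." into a dense vector.
// Assumes 1-based indices; indices outside [1, dim] are ignored.
// Returns false if no label can be read at the start of the line.
bool parse_sparse_line(const std::string& line, std::size_t dim,
                       double& label, std::vector<double>& features) {
    features.assign(dim, 0.0);
    const char* p = line.c_str();
    char* end = nullptr;

    label = std::strtod(p, &end);
    if (end == p) return false;              // no number at the start
    p = end;

    for (;;) {
        long index = std::strtol(p, &end, 10);  // skips leading whitespace
        if (end == p || *end != ':') break;     // no further "idx:" pair
        p = end + 1;                            // step over ':'
        double value = std::strtod(p, &end);
        if (end == p) break;                    // malformed value
        p = end;
        if (index >= 1 && static_cast<std::size_t>(index) <= dim)
            features[index - 1] = value;
    }
    return true;
}
```

Because strtol and strtod skip leading whitespace and report where they stopped, the loop walks the line once and avoids the temporary strings that what[1].str() and lexical_cast create in the regex version.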
```cpp
#include <fstream>
#include <string>
#include <boost/regex.hpp>
#include <boost/lexical_cast.hpp>
using namespace std;

// LibFile, Instance, featureDim, vecInstance and instanceNum are declared
// elsewhere in the program.
bool LibFile::ReadFile(const string &fileName)
{
    ifstream fin(fileName.c_str(), ios::in);
    // Class label at the start of the line, e.g. "0.0".
    boost::regex elabel("^([0-9]+\\.?[0-9]+)", boost::regbase::icase);
    // One "index:value" pair, e.g. "12:0.071349".
    boost::regex eitem("(\\d+):([-+]?[0-9]*\\.?[0-9]+)", boost::regbase::icase);

    string buffer;
    while (getline(fin, buffer))   // stops cleanly at end of file
    {
        if (buffer.length() > 0)
        {
            Instance *pinstance = new Instance();
            pinstance->tag = "notag";
            pinstance->vector = new double[featureDim];
            for (int ii = 0; ii < featureDim; ii++)
                pinstance->vector[ii] = 0.0;

            boost::smatch what;
            string::const_iterator itb = buffer.begin();
            string::const_iterator ite = buffer.end();
            double label = 0.0;    // parsed, but not stored in Instance here
            if (boost::regex_search(itb, ite, what, elabel))
            {
                label = boost::lexical_cast<double>(what[1].str());
                itb = what[0].second;
            }
            while (boost::regex_search(itb, ite, what, eitem))
            {
                int index = boost::lexical_cast<int>(what[1].str());
                double val = boost::lexical_cast<double>(what[2].str());
                // Indices are 1-based; guard both ends of the array.
                if (index >= 1 && index <= featureDim)
                    pinstance->vector[index - 1] = val;
                itb = what[0].second;
            }
            vecInstance.push_back(pinstance);
            instanceNum++;
        }
    }
    fin.close();
    return true;
}
```
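If the overall shape of ReadFile is worth keeping but Boost.Regex and lexical_cast are not, one option is plain stream extraction, reading the ':' separator into a char. This is a hedged sketch: SparseInstance, load_instances and dim are hypothetical names standing in for the poster's Instance and featureDim, and the features live in a std::vector<double> rather than a raw new[] array:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Instance: a label plus a dense feature vector.
struct SparseInstance {
    double label = 0.0;
    std::vector<double> features;
};

// Read every non-empty line of "label idx:val idx:val ..." from `in`.
// Indices are assumed to be 1-based; indices outside [1, dim] are skipped.
std::vector<SparseInstance> load_instances(std::istream& in, std::size_t dim) {
    std::vector<SparseInstance> result;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        SparseInstance inst;
        if (!(fields >> inst.label)) continue;   // empty or malformed line
        inst.features.assign(dim, 0.0);

        long index = 0;
        char colon = 0;
        double value = 0.0;
        // operator>> stops the integer at ':', which is then read into `colon`.
        while (fields >> index >> colon >> value && colon == ':') {
            if (index >= 1 && static_cast<std::size_t>(index) <= dim)
                inst.features[index - 1] = value;
        }
        result.push_back(std::move(inst));
    }
    return result;
}
```

Each line still goes through a std::istringstream, which has its own cost, so timing this against the regex version on the real 64000-line file is the reliable way to see how much it helps.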
On 28/01/2008, Chengyuan Ma wrote:
> What's the best way to load the data? I use Boost.Regex and boost::lexical_cast to do this, but it takes 2 minutes to read all the data. Is there a better way to do this?
How about a Spirit parser?
I don't know Spirit, but it'd be something like this:
bool
parse_numbers(istreambuf_iterator<char> s, map
Participants (2): Chengyuan Ma, Scott McMurray