Conversation
* apply - apply a function to a row/column; overloaded to apply to the entire DataFrame
* Added a helper template to evaluate RowType when a column needs to be dropped
* Added unittests to verify correct behavior
* Drop row/column from DataFrame
* Convert a column of data to an indexing level
* Added unittests for drop on a heterogeneous DataFrame
* Faster than the current parser
* Documentation updated with this week's developments
```d
@params: index - integer or string indexes of rows
+/
void apply(alias Fn, int axis, T)(T index)
    if(is(T == int[]) || is(T == string[][]))
```
You probably want to add a constraint to Fn to make sure that it takes and returns a typeof(data[0][0])
Author (Owner):
For now I have kept the implementation very forgiving.
I'll probably add a `__traits(compiles, ...)` check soon to verify that `Fn(data[i])` is possible.
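One way to express the suggested constraint in D is `__traits(compiles, ...)` combined with a check on the return type. The sketch below is illustrative only: `Element` and the simplified single-row `applyRow` are hypothetical stand-ins for the real `typeof(data[0][0])` and the DataFrame's `apply` overloads.

```d
// Hypothetical stand-in for typeof(data[0][0]) in the real DataFrame.
alias Element = double;

// Simplified one-row apply: accepts Fn only if Fn(Element) compiles
// and yields something convertible back to Element, as suggested above.
void applyRow(alias Fn)(Element[] row)
    if(__traits(compiles, Fn(Element.init))
        && is(typeof(Fn(Element.init)) : Element))
{
    foreach(ref ele; row)
        ele = Fn(ele);
}

unittest
{
    auto row = [1.0, 2.0, 3.0];
    applyRow!(x => x * 2)(row);
    assert(row == [2.0, 4.0, 6.0]);

    // A function with the wrong return type is rejected at compile time:
    static assert(!__traits(compiles, applyRow!(x => "nope")(row)));
}
```

The constraint fails instantiation instead of producing an error deep inside the body, which keeps the diagnostics readable for callers.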
```d
foreach(i, ele; lines[columnDepth .. $])
{
    if(ele.length > 0)
```
For conditionals like this you can reduce the indenting by writing:

```d
if(ele.length == 0)
    continue;
```
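For context, a guard-clause version of that loop might look like the sketch below; `lines` and `columnDepth` are placeholders echoing the names in the diff, not the library's actual implementation.

```d
// Guard-clause form: bail out of the iteration early so the main body
// is not nested one level deeper inside an if-block.
string[] keepNonEmpty(string[] lines, size_t columnDepth)
{
    string[] kept;
    foreach(ele; lines[columnDepth .. $])
    {
        if(ele.length == 0)
            continue;       // early exit instead of wrapping the body
        kept ~= ele;        // main logic stays at one indent level
    }
    return kept;
}

unittest
{
    assert(keepNonEmpty(["header", "a", "", "b"], 1) == ["a", "b"]);
}
```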
Author (Owner):
Done in dec86d1
I'll eventually scrub the entire code base looking for cases like this 👍
It seems you could do with a utility function to compare predicates on the lines of two files; see the unit tests for fastCSV.
Author (Owner):
Thanks for the review
Display used to check the length of the complete data; now at most 50 rows of data are checked.
* Used the right way to build a hash map
* Removed unnecessary rem
* Using ranges for file reading
* Some documentation fixes
apply: Applies a function to a row/column of a DataFrame

columnToIndex: Converts a column of data to a level of row indexes

fastCSV: The old parser was subpar, to put it lightly. I saw a post, which I couldn't find again, taking a scenario of a CSV of 2,000,000 x 5. I wrote a script to generate a mock CSV with the same specification, and from_csv couldn't do it. Then I reduced 2,000,000 to 200,000 and from_csv still couldn't parse it fast enough: it took almost 6 minutes to parse 100,000 rows. The time just increases exponentially with from_csv. Hence I researched a bit and got fastCSV working. The benchmarks can be found here.

I will replace from_csv with fastCSV after I extend its functionality to match that of from_csv. Should be finished by the end of the week.

cc/ @thewilsonator - Ready for review 👾

If you find anything out of place, or anything that can be improved, please leave a review and I'll make the necessary changes at my earliest convenience.
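As a rough illustration of why a hand-rolled parser like fastCSV can beat the old one: splitting each line into slices of the original buffer avoids per-field copying. The sketch below is an assumption about the general technique, not the actual fastCSV code, and it handles only unquoted fields.

```d
import std.algorithm.iteration : splitter;
import std.array : array;

// Each field is a slice into `text`, so no per-field string copy occurs;
// only the row/field arrays themselves are allocated.
string[][] splitUnquotedCSV(string text)
{
    string[][] rows;
    foreach(line; text.splitter('\n'))
    {
        if(line.length == 0)
            continue;               // skip blank/trailing lines
        rows ~= line.splitter(',').array;
    }
    return rows;
}

unittest
{
    auto rows = splitUnquotedCSV("a,b,c\n1,2,3\n");
    assert(rows.length == 2);
    assert(rows[1] == ["1", "2", "3"]);
}
```

A real replacement for from_csv would additionally need quoting, escaping, and type conversion, which is where most of the remaining work lies.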