Fixing Issue#550 by tylerjereddy · Pull Request #580 · MDAnalysis/mdanalysis

tylerjereddy · 2015-12-11T18:03:25Z

Attempted fix for Issue #550. This now lets the GROWriter write usable .gro files with more than 99,999 atoms, but there are some caveats:

This solution produces an atom number of 0 after 99,999 rather than 00000. The latter is what you would get by left-truncating 100,000 to five digits and keeping the zeros. Why did I do this? Well, the new unit-test .gro data file, which has more than 100,000 atoms and was generated by genbox 4.6.3, also does this!
It turns out that not all GROMACS utilities behave in this manner. For example, it appears that trjconv keeps the extra zeros.
Overall, I'd suggest that this PR fixes the major issue (you can't work with MDA-written .gro files for systems with more than 99,999 particles), even if there is some room for debate about the correct approach. Both VMD and MDA can read the newly written .gro files back in. I have to concede that most of my large systems stored in .gro format are actually truncated in the digit-retaining way (keeping the extra zeros), probably because I've run trjconv on them at some point.

This happens because converting the truncated string back to an integer drops the leading zeros:

int('00000')  # '00000' is the left-truncated form of 100000

returns 0
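A minimal sketch of the integer left-truncation described above, assuming the five-character .gro atom-number field; the helper name `ltruncate_int` is hypothetical here and not necessarily what the PR uses. Keeping the last five digits and converting back to `int` is the same as taking the value modulo 100,000, which is why atom 100,000 is written as 0 rather than 00000:

```python
def ltruncate_int(value, ndigits=5):
    """Keep the last `ndigits` digits of a non-negative int (hypothetical helper).

    Converting the sliced string back to int drops leading zeros,
    so 100000 -> '00000' -> 0.
    """
    return int(str(value)[-ndigits:])


for atom_number in (99998, 99999, 100000, 100001):
    truncated = ltruncate_int(atom_number)
    # Same result as the modulo form for non-negative integers
    assert truncated == atom_number % 10**5
    # Right-aligned in a 5-character field, as in a .gro atom line
    print("%5d" % truncated)
```

Because the field is right-aligned, atom 100,000 shows up as four spaces followed by 0, matching the genbox-generated reference lines quoted later in this thread.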

Of course, I'm sure we could find a way to do the truncation with digit-retention, but I ultimately aimed for a fix that matched the new unit test file, which I had generated in the manner Chris Ing suggested.

…re than 99,999 atoms, in a manner consistent with genbox in GROMACS 4.6.3. It appears that some GROMACS utilities (e.g., trjconv) actually write 00000 instead of 0 for the 100,000th atom. Both formats can be read in by MDAnalysis and VMD, and this is (overall) an improvement that allows us to write large .gro files in a workable manner. Fixes #550.

richardjgowers · 2015-12-11T18:24:07Z

bigbox.gro is 5.7M. Is it possible to test the written values just by calling readlines and then checking that line[100000] is a certain string?

tylerjereddy · 2015-12-11T18:32:21Z

@richardjgowers I thought we discussed that. It might be possible to come up with something more elegant by subclassing GROWriter and modifying it so that we can write > 100,000-particle data from much smaller input. However, there are bigger data files already present for other tests, and is the space saved really worth the extra effort of engineering this behaviour?

Maybe it is! Note that subclassing for test purposes does add a layer of complexity if the GROWriter changes substantially in the future, since we'd be bending it to do something it doesn't normally do for good reason.

richardjgowers · 2015-12-11T18:39:09Z

I don't mind creating the large gro file, it's the large reference file that's the problem. The only section of it we're using is this:

33333SOL    HW199998   7.632   5.963   2.548
33333SOL    HW299999   7.702   5.897   2.680
33334SOL     OW    0   8.642   7.198   1.893
33334SOL    HW1    1   8.648   7.105   1.930

So we could change expected_lines to be those 4 lines, then check a slice of mda_output against that.
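A sketch of that slice check, assuming the usual .gro layout (a title line, an atom-count line, then one line per atom), so the atom with 1-based number n sits at 0-based line index n + 1. The function `lines_match` and its `first_atom` parameter are hypothetical, not part of the MDAnalysis test suite; `expected_lines` holds the four reference lines above:

```python
def lines_match(path, expected_lines, first_atom):
    """Compare a slice of a written .gro file with a few reference lines.

    `first_atom` is the (untruncated, 1-based) number of the atom on the
    first expected line; lines 0 and 1 hold the title and the atom count.
    """
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    start = first_atom + 1
    return lines[start:start + len(expected_lines)] == expected_lines


expected_lines = [
    "33333SOL    HW199998   7.632   5.963   2.548",
    "33333SOL    HW299999   7.702   5.897   2.680",
    "33334SOL     OW    0   8.642   7.198   1.893",
    "33334SOL    HW1    1   8.648   7.105   1.930",
]
# Usage (hypothetical output path): assert lines_match(outfile, expected_lines, first_atom=99998)
```

This keeps the large input file but removes the need for a full-size reference file, since only four lines are compared.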

tylerjereddy · 2015-12-11T19:59:02Z

And how are you planning to create a large gro file with those lines if you don't have a large reference file with coordinates to read in? Where is the big universe, that will be used for writing, coming from? This is the same discussion from the issue itself. Either we include a large gro file or we subclass the writer and do some magic so that we can use smaller input or topology manipulation (from a blank universe or whatever) to fake reading in a large coordinate file.
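One alternative to shipping a large input file, sketched here under assumptions rather than taken from this PR: generate the large system on the fly as plain .gro text, then load it from a temporary file. The sketch uses the fixed-width .gro atom format (`%5d%-5s%5s%5d%8.3f%8.3f%8.3f`) and wraps residue and atom numbers to five digits; the function name, the three-site water layout and the dummy grid coordinates are all hypothetical.

```python
def synthetic_water_gro(n_waters, box=10.0):
    """Return .gro text for n_waters three-site waters with dummy coordinates."""
    names = ("OW", "HW1", "HW2")
    lines = ["synthetic water box", "%5d" % (3 * n_waters)]
    atom = 0
    for resid in range(1, n_waters + 1):
        for name in names:
            atom += 1
            # Dummy grid positions (nm); only the numbering matters here
            x = (atom % 100) * 0.1
            y = (atom // 100 % 100) * 0.1
            z = (atom // 10000) * 0.1
            # Residue and atom numbers wrap to five digits (100000 -> 0)
            lines.append("%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (
                resid % 100000, "SOL", name, atom % 100000, x, y, z))
    # Box line: three box vectors in nm
    lines.append("%10.5f%10.5f%10.5f" % (box, box, box))
    return "\n".join(lines) + "\n"
```

With 33,334 waters (100,002 atoms) the line for atom 100,000 starts with `33334SOL     OW    0`, the same numbering as the genbox reference. The trade-off is that the test no longer exercises real GROMACS output.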

orbeckst · 2015-12-11T20:43:18Z

For a start, try bzip2ing the bigbox.gro file. The GRO reader uses openany() so it should be happy to read it. It will probably compress down to 1/10 of the original size. Not elegant but possibly workable.
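A sketch of the compress-then-read round trip using only the standard-library `bz2` module (MDAnalysis's own `openany()` is not used here; the helper names are hypothetical). Opening the archive in text mode lets a test iterate over lines exactly as it would for the uncompressed file:

```python
import bz2
import shutil


def compress_bz2(src, dst):
    """Write a bzip2-compressed copy of src to dst."""
    with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def read_lines(path):
    """Read text lines from a plain or .bz2 file, chosen by file extension."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()
```

Highly repetitive fixed-width text such as a water box compresses well, which is consistent with the size reduction reported below.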

tylerjereddy · 2015-12-12T15:25:47Z

OK, compressing the file has reduced its size to 1.1M, and I've pushed the adjustments needed to account for the compressed file as well. The 'objections' from QuantifiedCode aren't even confined to things I've changed or added, and those that are involve policies that are not unique to my changes (i.e., they are policies / practices used in other parts of the MDA codebase, though that doesn't necessarily make them 'right'). Should PRs that pass all tests but face objections from QuantifiedCode really be labelled with a red 'x'?

Also, would it be of interest to add a flag that lets the user choose digit-retaining truncation (i.e., 00000) vs. what we currently have (i.e., 0) for atom numbers when writing gro files? If yes, then I'd suggest we use another bz2-compressed file with that adjusted numbering for unit testing. However, one drawback is that you might only be able to specify the flag when calling GROWriter directly, rather than when inferring the writer from the filename as in u.atoms.write("filename.gro"). Also, extending the writing functionality in this way would be more of an enhancement than a fix for the botched (unworkable) .gro files. The latter is the main target of this particular issue / PR.
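A minimal sketch of what such a flag could control, assuming a hypothetical `retain_zeros` keyword (GROWriter gains no such option in this PR). With `retain_zeros=False` the truncated digits are converted back to an integer, as genbox 4.6.3 does; with `retain_zeros=True` the last five characters are kept as a string, as trjconv appears to do:

```python
def gro_atom_field(atom_number, retain_zeros=False, width=5):
    """Format a 1-based atom number for the 5-character .gro atom-number field."""
    digits = str(atom_number)[-width:]  # left-truncate to the last `width` digits
    if not retain_zeros:
        digits = str(int(digits))       # e.g. '00000' -> '0'
    return digits.rjust(width)          # right-align in the fixed-width field


for n in (99999, 100000, 100001):
    print(repr(gro_atom_field(n)), repr(gro_atom_field(n, retain_zeros=True)))
```

Both variants give identical output below 100,000; they only differ once the atom number wraps.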

mnmelo · 2015-12-13T01:10:05Z

@orbeckst expressed the same discontent at QC output in #563. I'm opening an issue so we can discuss what the output should be.

jbarnoud · 2015-12-14T13:46:32Z

Do both universe and large_universe need to be created for each test of that class while none of the tests use both? Could large_universe be created only in the relevant test?

tylerjereddy · 2015-12-17T11:07:30Z

@jbarnoud If the tests in a class don't share the same setUp and tearDown requirements, that probably indicates the class should be split into separate test classes whose tests do. Although my contributions in this PR may highlight the increasing heterogeneity of this test class and the need to split it, I'd suggest that if we are going to get more stringent about this (I'm certain that not all MDAnalysis test classes are so clean that all of their tests share the same setUp and tearDown requirements), it should be a separate issue for overhauling test infrastructure / policy.

As for the other comment, I actually thought that using the same temporary output directory structure (and defining it in the setUp) made it nice and clear for deletion / cleanup of this test data, so that the tests can focus on the actual testing code.

As an aside, my PR adds a third statement of the following form to the tearDown method:

try:
    os.unlink(self.outfile3)
except OSError:
    pass

Doing the above operation in three separate try/except blocks for three different temporary files in the same tearDown method just doesn't feel right! But it was already done twice for the first two files, so when I added a third (large) file, I just continued the trend. However, even in this case, I think that is a separate design issue that should be raised elsewhere, rather than turning this into a complete refactoring of the GRO writing test infrastructure.
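As a sketch of how the repeated blocks could collapse into one loop, assuming the temporary paths are stored as attributes such as `self.outfile`, `self.outfile2` and `self.outfile3` (the class and the first two names are hypothetical). `contextlib.suppress` needs Python 3.4+; on Python 2 the same loop works with a try/except inside it:

```python
import contextlib
import os
import tempfile
import unittest


class TestCleanupLoop(unittest.TestCase):
    """Hypothetical test class: one cleanup loop for several output files."""

    def setUp(self):
        # Two files that exist, plus one path a given test may never create
        fd, self.outfile = tempfile.mkstemp(suffix=".gro")
        os.close(fd)
        fd, self.outfile2 = tempfile.mkstemp(suffix=".gro")
        os.close(fd)
        self.outfile3 = self.outfile2 + ".large.gro"

    def tearDown(self):
        for path in (self.outfile, self.outfile2, self.outfile3):
            # Same behaviour as: try: os.unlink(path) / except OSError: pass
            with contextlib.suppress(OSError):
                os.unlink(path)

    def test_outputs_created(self):
        self.assertTrue(os.path.exists(self.outfile))
        self.assertFalse(os.path.exists(self.outfile3))
```

Adding a fourth output file then means extending one tuple rather than copying another try/except block.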

kain88-de · 2015-12-17T11:34:05Z

@tylerjereddy I haven't read the tests yet, but the os.unlink(...) calls shouldn't be necessary anymore if you switch to using tempdir. The tearDown method should only delete the temporary directory created by tempdir. In other tests I used a pattern like this:

def setUp(self):
    self.tempdir = tempdir.TempDir()

def tearDown(self):
    del self.tempdir

def test_1(self):
    outfile = os.path.join(self.tempdir.name, 'name of file')

Personally, this works even better for me:

class test_xyz(object):
    def __init__(self):
        self.tempdir = tempdir.TempDir()

    def test_1(self):
        outfile = os.path.join(self.tempdir.name, 'test-1.txt')

This way all the tests in the class work in one temporary directory, which gets cleaned up once you are done. Just make sure that every test in the class uses a unique filename, and you won't need to care about deleting/unlinking files at all anymore.
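A self-contained variant of the same one-directory-per-class idea, using the standard library's `tempfile.TemporaryDirectory` (Python 3.2+) instead of the MDAnalysisTests `tempdir` helper; the class and file names are hypothetical. The directory is created once per class and removed, with everything written into it, after the last test:

```python
import os
import tempfile
import unittest


class TestWritesInTempDir(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One directory shared by every test in the class
        cls.tmpdir = tempfile.TemporaryDirectory()

    @classmethod
    def tearDownClass(cls):
        # Removes the directory and every file the tests wrote into it
        cls.tmpdir.cleanup()

    def test_1(self):
        outfile = os.path.join(self.tmpdir.name, "test-1.txt")
        with open(outfile, "w") as handle:
            handle.write("written by test_1\n")
        self.assertTrue(os.path.exists(outfile))

    def test_2(self):
        # Unique filename per test, so tests never clobber each other
        outfile = os.path.join(self.tmpdir.name, "test-2.txt")
        with open(outfile, "w") as handle:
            handle.write("written by test_2\n")
        self.assertTrue(os.path.exists(outfile))
```

Cleanup happens explicitly in `tearDownClass` rather than relying on garbage collection, so no per-file unlinking is needed.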

kain88-de · 2015-12-17T12:10:02Z

You can check the second example here

jbarnoud · 2015-12-17T12:21:11Z

I agree these tests would benefit from some cleaning that is out of the scope of this PR. Still, maybe the test introduced here could be moved into a new class so that a large universe does not get created for each test of the TestGROWriter class. The tests already take long enough to run.

Also, a docstring with a reference to #550 would be nice as it would give the test some context. But this is just nitpicking.

tylerjereddy · 2015-12-17T16:51:46Z

I've moved the test for #550 to its own class and made an effort to comply with the suggestions in this PR thread. The original TestGROWriter class should now be unchanged, and any issues with that should be raised elsewhere.

jbarnoud · 2015-12-17T16:56:45Z

👍

tylerjereddy · 2016-01-04T10:34:41Z

Any other suggestions / problems with this PR?

richardjgowers · 2016-01-04T10:39:57Z

Nope, looks good

sseyler and others added 11 commits December 9, 2015 16:26

Using TempDir() for testing PSA

c89734b

Merge branch 'develop' of https://github.com/MDAnalysis/mdanalysis into develop

a11c561

Fixed usage of tempdir

9716b52

Fixed typo

692fdf0

Now using tempfile.mkdtemp() and shutil to delete

bf75c7e

Merge pull request #575 from sseyler/develop

a125286

Tests for PSA now use tempdir Fixes Issue #557

Merge pull request #578 from MDAnalysis/issue-576

e2065be

Fixes Issue #576

Added unit test and associated data for Issue#550.

d3b6b44

Improved the test for Issue#550 to include all lines in the coordinate file. Improved the error message.

2819903

Added CHANGELOG entry for fix of Issue #550.

23c46f1

tylerjereddy added defect Format-Gromacs Component-Writers labels Dec 11, 2015

richardjgowers self-assigned this Dec 11, 2015

mnmelo force-pushed the develop branch from e2065be to e354507 December 12, 2015 00:40

bigbox.gro has been replaced with a compressed version of this file, which is involved in testing resolution of Issue#550.

025322e

mnmelo mentioned this pull request Dec 13, 2015

QuantifiedCode makes PRs have unsuccessful checks #583 (closed)

jbarnoud reviewed Dec 14, 2015
Moved the test for Issue#550 to its own class and improved as discussed in the PR thread.

55bc758

richardjgowers added a commit that referenced this pull request Jan 4, 2016

Merge pull request #580 from MDAnalysis/gro-fix-truncation

0371edc

Fixing Issue#550

richardjgowers merged commit 0371edc into develop Jan 4, 2016

richardjgowers deleted the gro-fix-truncation branch January 4, 2016 10:39

7 participants