Skip to content

Rewrite integer formatting#12382

Merged
bors merged 3 commits intorust-lang:masterfrom
brendanzab:fmt-int
Feb 22, 2014
Merged

Rewrite integer formatting#12382
bors merged 3 commits intorust-lang:masterfrom
brendanzab:fmt-int

Conversation

@brendanzab
Copy link
Contributor

This is PR is the beginning of a complete rewrite and ultimate removal of the std::num::strconv module (see #6220), and the removal of the ToStrRadix trait in favour of using the std::fmt functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation more comprehensible .

The Formatter::{pad_integral, with_padding} methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of run-pass/ifmt.rs in order to provide more immediate feedback when building using make check-stage2-std NO_REBUILD=1.

Arbitrary radixes are now easier to use in format strings. For example:

assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");

The benchmarks have been standardised between std::num::strconv and std::num::fmt to make it easier to compare the performance of the different implementations.

 type | radix | std::num::strconv      | std::num::fmt
======|=======|========================|======================
 int  | bin   | 1748 ns/iter (+/- 150) | 321 ns/iter (+/- 25)
 int  | oct   |  706 ns/iter (+/- 53)  | 179 ns/iter (+/- 22)
 int  | dec   |  640 ns/iter (+/- 59)  | 207 ns/iter (+/- 10)
 int  | hex   |  637 ns/iter (+/- 77)  | 205 ns/iter (+/- 19)
 int  | 36    |  446 ns/iter (+/- 30)  | 309 ns/iter (+/- 20)
------|-------|------------------------|----------------------
 uint | bin   | 1724 ns/iter (+/- 159) | 322 ns/iter (+/- 13)
 uint | oct   |  663 ns/iter (+/- 25)  | 175 ns/iter (+/- 7)
 uint | dec   |  613 ns/iter (+/- 30)  | 186 ns/iter (+/- 6)
 uint | hex   |  519 ns/iter (+/- 44)  | 207 ns/iter (+/- 20)
 uint | 36    |  418 ns/iter (+/- 16)  | 308 ns/iter (+/- 32)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text?

@brendanzab
Copy link
Contributor Author

r? @alexcrichton

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat trick!

However, it doesn't seem to support arbitrary bases. What about these being normal methods fn base(&self) -> u8 (etc.) and having

pub struct Binary;
impl Radix for Binary { fn base(&self) -> u8 { 2 } /* ... */ }
pub struct Octal;
// ...

pub struct GenericRadix { priv base: u8 }
impl Radix for GenerixRadix { fn radix(&self) { self.base } /* ... */ }

which would give the best of both worlds: static dispatch for the common cases but still enough flexibility to handle arbitrary bases. (And also allows for reducing code-bloat via choosing to use the GenericRadix version over monomorphising for the others, similar to a &Radix trait object, but without the extra cost of virtual calls.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey that's a cool idea. Will do.

@brson
Copy link
Contributor

brson commented Feb 19, 2014

Sweet perf wins. 🚀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a bit faster if you accumulate the digits backwards starting at the end of the buffer. That way you only need to make one write call to the underlying Writer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WOAH. Mega improvement. Writing an int with a decimal radix now takes 204 ns, down from 640 ns with strconv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! (Remember to update the PR/commit messages before it lands :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@zkamsler
Copy link
Contributor

Cool stuff. It reminds me that I ought to dust off the floating point formatting code that I had written a few weeks ago.

@brendanzab
Copy link
Contributor Author

Yeah, I'm looking at the Dragon4 (Steele) and Grisu3 (Loitsch) algorithms when it comes to formatting floating point. But you're welcome to look at it too. It's much more complicated than the integer stuff :)

@brendanzab
Copy link
Contributor Author

@alexcrichton Ok, I think I'm done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems copied from below, there seems to be no check against the "maximum width" which I think in the case of integers is precision.

@alexcrichton
Copy link
Member

This is super awesome. Perf wins, cleanups, hurrah!

I'd like to bikeshed the interface for a brief amount of time if that's ok. Some thoughts I have:

  • num::fmt => fmt::num?
  • These Radix structs have a bit of an odd api. It may be nicer to have clearly defined top-level functions inside of the module which use the Radix structures/traits internally. Basically it would be nice to avoid exposing one more trait.

What do you think?

@brendanzab
Copy link
Contributor Author

Ok, updated the API and moved it to std::fmt::num.

You can now do:

assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");

I haven't made the other things private, because I still think its useful to be able to write the numbers via a Formatter directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use may need to be updated. I think it'd be useful to show it in the documentation as well.

Could you also expand the documentation a bit about how the value can be passed to format! to format the specified integer with the specified base?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants