Write on &mut [u8] and Cursor<&mut [u8]> doesn't optimize very well.

Calling `write` on a mutable slice (or one wrapped in a `Cursor`) with one byte, or a small number of bytes, results in a call to memcpy after optimization (opt-level=3), rather than a simple store as one would expect:

```rust
use std::io::Write;

pub fn one_byte(mut buf: &mut [u8], byte: u8) {
    // The io::Result is deliberately ignored to keep the example minimal.
    let _ = buf.write(&[byte]);
}
```

This results in:

```llvm

define void @_ZN6cursor8one_byte17h68c172d435558ab9E(i8* nonnull, i64, i8) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
_ZN4core3ptr13drop_in_place17hc17de44f7e6456c9E.exit:
  %_10.sroa.0 = alloca i8, align 1
  call void @llvm.lifetime.start(i64 1, i8* nonnull %_10.sroa.0)
  store i8 %2, i8* %_10.sroa.0, align 1
  %3 = icmp ne i64 %1, 0
  %_0.0.sroa.speculated.i.i.i = zext i1 %3 to i64
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull %0, i8* nonnull %_10.sroa.0, i64 %_0.0.sroa.speculated.i.i.i, i32 1, i1 false), !noalias !0
  call void @llvm.lifetime.end(i64 1, i8* nonnull %_10.sroa.0)
  ret void
}

```

`copy_from_slice` seems to be part of the issue here. If I change the [write implementation on mutable slices](https://github.com/rust-lang/rust/blob/e7070dd019d70b089a9983571dc40b2f9ee16cf5/src/libstd/io/impls.rs#L228) to use the following loop instead of `copy_from_slice`:
```rust

for (&input, output) in data[..amt].iter().zip(a.iter_mut()) {
    *output = input;
}

```

the LLVM IR looks much nicer:
```llvm

define void @_ZN6cursor8one_byte17h68c172d435558ab9E(i8* nonnull, i64, i8) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %3 = icmp eq i64 %1, 0
  br i1 %3, label %_ZN4core3ptr13drop_in_place17hc17de44f7e6456c9E.exit, label %"_ZN84_$LT$core..iter..Zip$LT$A$C$$u20$B$GT$$u20$as$u20$core..iter..iterator..Iterator$GT$4next17he84ad69753d1c347E.exit.preheader.i"

"_ZN84_$LT$core..iter..Zip$LT$A$C$$u20$B$GT$$u20$as$u20$core..iter..iterator..Iterator$GT$4next17he84ad69753d1c347E.exit.preheader.i": ; preds = %start
  store i8 %2, i8* %0, align 1, !noalias !0
  br label %_ZN4core3ptr13drop_in_place17hc17de44f7e6456c9E.exit

_ZN4core3ptr13drop_in_place17hc17de44f7e6456c9E.exit: ; preds = %start, %"_ZN84_$LT$core..iter..Zip$LT$A$C$$u20$B$GT$$u20$as$u20$core..iter..iterator..Iterator$GT$4next17he84ad69753d1c347E.exit.preheader.i"
  ret void
}

```
The for loop will result in vector operations on longer slices, but I'm still unsure whether this change could cause a slowdown on very long slices, since the memcpy implementation may be better optimized for the specific system. It also doesn't really solve the underlying issue. There seems to be some problem with optimizing `copy_from_slice` calls that follow `split_at_mut`, and probably some other calls that involve slice operations. (I tried altering the write function to use unsafe code and build a temporary slice from raw pointers instead, but that didn't help.)
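To make the comparison easier to reproduce outside of libstd, here is a minimal sketch of the two variants as free functions. It is modelled on the linked `write` implementation (clamp the length, `split_at_mut`, copy, advance the slice), but the function names `write_with_copy` and `write_with_loop` are made up for this example, and the functions return a plain `usize` instead of `io::Result<usize>`. Both should behave identically; only the generated code is expected to differ.

```rust
use std::cmp;
use std::mem;

/// Sketch of the slice `write` logic using `split_at_mut` + `copy_from_slice`.
/// (Hypothetical free function, not the actual libstd code.)
pub fn write_with_copy(buf: &mut &mut [u8], data: &[u8]) -> usize {
    // Never copy more than fits in the destination.
    let amt = cmp::min(data.len(), buf.len());
    // Take the slice out of `buf` so it can be split into two halves.
    let (a, b) = mem::replace(buf, &mut []).split_at_mut(amt);
    a.copy_from_slice(&data[..amt]);
    // Advance `buf` past the bytes that were just written.
    *buf = b;
    amt
}

/// Same logic, but copying element by element with a zipped loop.
/// (Hypothetical free function, not the actual libstd code.)
pub fn write_with_loop(buf: &mut &mut [u8], data: &[u8]) -> usize {
    let amt = cmp::min(data.len(), buf.len());
    let (a, b) = mem::replace(buf, &mut []).split_at_mut(amt);
    for (&input, output) in data[..amt].iter().zip(a.iter_mut()) {
        *output = input;
    }
    *buf = b;
    amt
}
```

Compiling a caller of each function at opt-level=3 and comparing the emitted IR should show whether the memcpy call is tied to `copy_from_slice` in this reduced form as well.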

This happens on both nightly (`rustc 1.21.0-nightly (2aeb5930f 2017-08-25)`) and stable (1.19) on `x86_64-unknown-linux-gnu`. (I'm not sure whether memcpy behaviour differs on other platforms.)
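For completeness, the `Cursor`-wrapped case mentioned at the top can be reproduced with something like the sketch below. The function name `one_byte_cursor` is made up for this example, and returning the byte count (0 on error or when the buffer is full) is just a convenience so the behaviour is easy to check. No IR is shown for it here.

```rust
use std::io::{Cursor, Write};

/// Writes a single byte through a `Cursor` over a mutable slice.
/// Returns the number of bytes written (hypothetical helper).
pub fn one_byte_cursor(cursor: &mut Cursor<&mut [u8]>, byte: u8) -> usize {
    // `write` reports 0 bytes written once the underlying slice is full.
    cursor.write(&[byte]).unwrap_or(0)
}
```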

Write on &mut [u8] and Cursor<&mut [u8]> doesn't optimize very well. #44099
