writing to partitioned table uses the wrong column as partition key

### Describe the bug

writing to partitioned table uses the wrong column as partition key

### To Reproduce


```sql
❯ create external table test(partition varchar, trace_id varchar) stored as parquet partitioned by (partition_id) location '/tmp/test'  options (create_local_path 'true');
Arrow error: Schema error: Unable to get field named "partition_id". Valid fields: ["partition", "trace_id"]
❯ create external table test(partition varchar, trace_id varchar) stored as parquet partitioned by (partition) location '/tmp/test'  options (create_local_path 'true');
0 rows in set. Query took 0.001 seconds.

❯ insert into test select * from 'input.parquet';
Object Store error: Generic LocalFileSystem error: Unable to open file /private/tmp/test/partition=00000000000000003088e9e74cf166bd/QZSOEmePoAsQkvzU.parquet#1: Too many open files (os error 24)
❯
```

It looks like it used the wrong column as the partition key:


```shell
alamb@MacBook-Pro-8:~/Software/arrow-datafusion$ ls /tmp/test | wc -l
   19992

alamb@MacBook-Pro-8:~/Software/arrow-datafusion$ ls /tmp/test | head
partition=0000000000000000000102576ce2faea/
partition=00000000000000000004d8eb49424f0d/
partition=000000000000000000051e89839e2bb0/
partition=00000000000000000005406b87ebcb41/
partition=000000000000000000065226d41eba99/
partition=000000000000000000066e556bea9c68/
partition=0000000000000000000688030531c6ff/
partition=000000000000000000068839bb143b45/
partition=00000000000000000006aaa220390696/
partition=00000000000000000006b0ebabd460a3/
```


Here is the input: [`input.parquet`](https://drive.google.com/file/d/1Hfld5jbKsSfp7vVTSBNIcV_KuvWIxc5D/view?usp=sharing)

Here is how I made it:

```sql

❯ copy (select 'a' as "partition", trace_id from traces UNION ALL select 'b' as "partition", trace_id from traces UNION ALL select 'c' as "partition", trace_id from traces) to 'input.parquet';
+----------+
| count    |
+----------+
| 15557151 |
+----------+
1 row in set. Query took 3.639 seconds.

❯ select * from 'input.parquet' limit 1;
+-----------+----------------------------------+
| partition | trace_id                         |
+-----------+----------------------------------+
| b         | 000000000000000028bf4438cad62275 |
+-----------+----------------------------------+
1 row in set. Query took 0.009 seconds.

❯ describe 'input.parquet';
+-------------+-------------------------+-------------+
| column_name | data_type               | is_nullable |
+-------------+-------------------------+-------------+
| partition   | Utf8                    | NO          |
| trace_id    | Dictionary(Int32, Utf8) | YES         |
+-------------+-------------------------+-------------+
2 rows in set. Query took 0.001 seconds.
```


### Expected behavior

I expect a three output directories to be created, for each of the distinct values of `partition_id`, `a`, `b` and `c`

```
/tmp/test/partition_id=a/<uuid>.parquet
/tmp/test/partition_id=b/<uuid>.parquet
/tmp/test/partition_id=c/<uuid>.parquet
```

### Additional context

Found as part of https://github.com/apache/arrow-datafusion/pull/7801#pullrequestreview-1687734166

@devinjdangelo  did some initial investigation: https://github.com/apache/arrow-datafusion/pull/7801#issuecomment-1771867480

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

writing to partitioned table uses the wrong column as partition key #7892

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

writing to partitioned table uses the wrong column as partition key #7892

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions