Commit 6d0747e

Add new data processing terms for zero intensity trimming (#460)
* add new data processing terms for zero intensity trimming
* revise description
* restore lost terms post merge
* version bump
* restore another lost term
* fix term ordering, add automation to detect and flag
* add python config
1 parent 4fb90df commit 6d0747e

3 files changed, +63 -13 lines changed

.github/workflows/validate-obo.yml

Lines changed: 7 additions & 0 deletions
@@ -27,3 +27,10 @@ jobs:
       # Runs a single command using the runners shell
       - name: Validate OBO file
         run: fastobo-validator --duplicates psi-ms.obo
+      - uses: actions/setup-python@v6
+        with:
+          python-version: '3.10'
+      - name: Check sorting
+        run: |
+          python -m pip install fastobo
+          python scripts/check_sorted.py psi-ms.obo

psi-ms.obo

Lines changed: 25 additions & 13 deletions
@@ -1,5 +1,5 @@
 format-version: 1.2
-data-version: 4.1.227
+data-version: 4.1.228
 date: 02:12:2025 11:00
 saved-by: Joshua Klein
 default-namespace: MS
@@ -24933,18 +24933,6 @@ name: secondary electrospray ionization
 def: "Secondary electrospray ionization (SESI) is an atmospheric pressure ionization (API) technique that uses a primary nano-electrospray plume of solvent ions to ionize neutral gaseous molecules in the gas phase via efficient proton transfer reactions. Operating at atmospheric pressure, SESI allows for the sensitive and real-time detection of volatile organic compounds (VOCs) and vapors with minimal sample preparation, making it ideal for applications like breath analysis and environmental monitoring." [PSI:MS]
 is_a: MS:1000240 ! atmospheric pressure ionization
 
-[Term]
-id: MS:1003800
-name: TSQ Certis
-def: "Thermo Scientific TSQ Certis Triple Quadrupole MS." [PSI:PI]
-is_a: MS:1000494 ! Thermo Scientific instrument model
-
-[Term]
-id: MS:1003801
-name: AccurateMassSearch
-def: "OpenMS TOPP tool to assemble metabolite features from singleton mass traces." [PSI:MS]
-is_a: MS:1000752 ! TOPP software
-
 [Term]
 id: MS:1003780
 name: zstd compression
@@ -24981,6 +24969,30 @@ name: MS-Numpress short logged float compression followed by zstd compression
 def: "Compression using MS-Numpress short logged float compression and Zstandard." [PSI:MS, https://github.com/ms-numpress/ms-numpress]
 is_a: MS:1000572 ! binary data compression type
 
+[Term]
+id: MS:1003800
+name: TSQ Certis
+def: "Thermo Scientific TSQ Certis Triple Quadrupole MS." [PSI:PI]
+is_a: MS:1000494 ! Thermo Scientific instrument model
+
+[Term]
+id: MS:1003801
+name: AccurateMassSearch
+def: "OpenMS TOPP tool to assemble metabolite features from singleton mass traces." [PSI:MS]
+is_a: MS:1000752 ! TOPP software
+
+[Term]
+id: MS:1003901
+name: zero intensity point trimming
+def: "Apply an algorithm to remove excess zero intensity value data points from a spectrum. Data may be retained for interpretability, such as retaining only zeros that flank non-zero intensity value data points from a profile spectrum." [PSI:MS]
+is_a: MS:1000543 ! data processing action
+
+[Term]
+id: MS:1003902
+name: zero intensity point trimming interpolation
+def: "A zero intensity point trimming algorithm that interpolates the m/z coordinate values from the local data or an estimated model." [PSI:MS]
+is_a: MS:1003901 ! zero intensity point trimming
+
 [Term]
 id: MS:4000000
 name: PSI-MS CV Quality Control Vocabulary
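
The definition of MS:1003901 describes dropping surplus zero intensity points while optionally keeping the zeros that flank real signal. The following is a minimal sketch of one such rule, not an implementation taken from this commit or from any particular tool. The function name `trim_zero_intensity` and the sample arrays are hypothetical, and the flanking rule is assumed to mean "a zero is kept only if an adjacent point is non-zero".

```python
from typing import List, Sequence, Tuple


def trim_zero_intensity(
    mz: Sequence[float], intensity: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Drop zero-intensity points unless they flank a non-zero point.

    Hypothetical helper illustrating one reading of MS:1003901.
    """
    n = len(intensity)
    if len(mz) != n:
        raise ValueError("mz and intensity must have the same length")

    kept_mz: List[float] = []
    kept_intensity: List[float] = []
    for i in range(n):
        is_signal = intensity[i] != 0
        # A zero survives only when its left or right neighbour carries signal.
        flanks_signal = (i > 0 and intensity[i - 1] != 0) or (
            i + 1 < n and intensity[i + 1] != 0
        )
        if is_signal or flanks_signal:
            kept_mz.append(mz[i])
            kept_intensity.append(intensity[i])
    return kept_mz, kept_intensity


# Made-up profile data: one two-point peak surrounded by runs of zeros.
mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06, 100.07]
intensity = [0.0, 0.0, 0.0, 5.0, 9.0, 0.0, 0.0, 0.0]
trimmed_mz, trimmed_intensity = trim_zero_intensity(mz, intensity)
```

In this example only the peak and its two flanking zeros remain, so the peak boundaries stay visible in the profile while the long zero runs are removed.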

scripts/check_sorted.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+import sys
+
+import fastobo
+
+cv = fastobo.load(sys.argv[1])
+
+terms = list(cv)
+in_order = True
+last = None
+conflicts = []
+for i, t in enumerate(terms):
+    if isinstance(t, fastobo.term.TermFrame):
+        if t.id.prefix == 'NCIT':
+            continue
+        if last is None:
+            last = t.id
+        elif t.id.prefix != last.prefix:
+            last = t.id
+        else:
+            if int(t.id.local) < int(last.local):
+                conflicts.append((t.id, last, i))
+                print(f"{t.id} is lower than {last}")
+                in_order = False
+            last = t.id
+
+if in_order:
+    print("All MS terms in order")
+    sys.exit(0)
+else:
+    print(conflicts)
+    sys.exit(1)
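
The check above compares each term's numeric ID with the one immediately before it, skipping NCIT terms and restarting the comparison whenever the ID prefix changes. The same adjacent-pair logic can be sketched without `fastobo` by working on plain `PREFIX:LOCAL` strings. The helper name `find_out_of_order` and the abbreviated ID lists are hypothetical and only loosely mirror the term order before and after this commit.

```python
from typing import List, Optional, Tuple


def find_out_of_order(
    ids: List[str], skip_prefixes: Tuple[str, ...] = ("NCIT",)
) -> List[Tuple[str, str, int]]:
    """Return (current_id, previous_id, position) for each descending adjacent pair."""
    conflicts: List[Tuple[str, str, int]] = []
    last: Optional[str] = None
    for position, current in enumerate(ids):
        prefix, local = current.split(":", 1)
        if prefix in skip_prefixes:
            continue
        if last is not None:
            last_prefix, last_local = last.split(":", 1)
            # Only compare numbers within the same ID prefix.
            if prefix == last_prefix and int(local) < int(last_local):
                conflicts.append((current, last, position))
        last = current
    return conflicts


# Abbreviated, illustrative orderings (most terms omitted).
before = ["MS:1003800", "MS:1003801", "MS:1003780", "MS:4000000"]
after = [
    "MS:1003780",
    "MS:1003800",
    "MS:1003801",
    "MS:1003901",
    "MS:1003902",
    "MS:4000000",
]
conflicts_before = find_out_of_order(before)
conflicts_after = find_out_of_order(after)
```

Because only neighbouring terms are compared, a single misplaced block is reported once, at the point where the numbering drops.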
