Fundamentals of Music Processing, Chapter 1: Music Representations

Study notes on Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications by Meinard Müller.

Sheet music representation

Symbolic representation

MIDI representation

MIDI models the act of pressing and releasing the keys of an electronic keyboard: each note event is described by a note number, a key velocity, a channel and a timestamp.

MIDI note number: an integer from 0 to 127 (128 pitches in total), covering the pitches from C-1 to G9.
Key velocity: an integer from 0 to 127 that controls the intensity of the sound (the volume when a key is pressed, the speed of the decay when it is released).
Channel: an integer from 0 to 15 that identifies the sound channel.
Timestamp: an integer giving the number of clock pulses (ticks) to wait before the event is executed.

MIDI subdivides the quarter note into clock pulses or ticks. The resolution is called PPQN (pulses per quarter note; also written TPQN, ticks per quarter note, or PPQ, TPQ). The PPQN is set in the header of each MIDI file and serves as the reference for all timestamps in the MIDI sequence that follows. A common value is 120, that is, a quarter note lasts 120 ticks.
MIDI can also specify the absolute duration of a quarter note. For example, if a quarter note is set to 0.6 seconds and the PPQN is 120, each tick lasts 5 milliseconds. Tempo can also be measured in BPM (beats per minute): a quarter note of 0.6 seconds corresponds to 100 BPM (100 quarter notes per minute).
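The tick arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in these notes; the function names are made up here and do not belong to any MIDI library.

```python
def seconds_per_tick(quarter_note_seconds, ppqn):
    """Duration of one MIDI tick in seconds."""
    return quarter_note_seconds / ppqn


def bpm(quarter_note_seconds):
    """Tempo in quarter notes per minute."""
    return 60.0 / quarter_note_seconds


def ticks_to_seconds(ticks, quarter_note_seconds, ppqn):
    """Absolute duration of a number of ticks."""
    return ticks * seconds_per_tick(quarter_note_seconds, ppqn)


# Values from the text: a quarter note of 0.6 s at PPQN = 120.
print(seconds_per_tick(0.6, 120))       # about 0.005 s, i.e. 5 ms per tick
print(bpm(0.6))                         # about 100 BPM
print(ticks_to_seconds(240, 0.6, 120))  # two quarter notes, about 1.2 s
```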

Score representation

In MusicXML, each attribute of a note is represented by a tag. For example, the note Eb4:

<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>
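As a rough sketch of how such a note element relates to the MIDI note numbers described earlier, the snippet below parses the example with Python's standard xml.etree.ElementTree module. The helper name musicxml_note_to_midi and the lookup table are assumptions made for illustration, and the sketch assumes a whole-number alter value and the convention A4 = 69.

```python
import xml.etree.ElementTree as ET

# Semitone offset of each natural step within an octave (C = 0).
STEP_TO_SEMITONE = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

NOTE_XML = """
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>
"""


def musicxml_note_to_midi(xml_text):
    """Hypothetical helper: MIDI note number of a single <note> element."""
    pitch = ET.fromstring(xml_text.strip()).find('pitch')
    step = pitch.findtext('step')
    alter = int(pitch.findtext('alter', default='0'))  # missing alter means natural
    octave = int(pitch.findtext('octave'))
    # Octave -1 starts at MIDI number 0, so octave 4 starts at 60 (C4).
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter


print(musicxml_note_to_midi(NOTE_XML))  # 63, the MIDI note number of Eb4
```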

Optical music recognition

Optical music recognition (OMR) scans sheet music and recognizes the musical symbols in the resulting digital image.

Audio representation

Wave and waveform

Sound is, in essence, a vibration of air pressure. The waveform shows how the air pressure deviates from the average air pressure as the sound propagates. A peak is the point of highest air pressure and a trough is the point of lowest air pressure. Air pressure corresponds to the density of the air molecules: the denser the molecules, the higher the pressure.

Frequency and pitch

Period
A wave oscillates periodically. In the waveform, the time from one peak to the next is the period T.
Frequency
- The frequency is f = 1/T, measured in Hz
- The human ear can perceive frequencies from about 20 Hz to 20 kHz
- The higher the frequency, the higher the perceived pitch
Amplitude
The difference between the peak and the mean value (not the difference between peak and trough).
Phase
Determines where in its cycle the waveform is at time 0.

The sine wave is regarded as the most basic sound wave. The sound produced by a sine wave is called a harmonic sound or pure tone. By international standard, the pitch A4 corresponds to a frequency of 440 Hz.
From the perspective of auditory perception, two tones whose frequencies differ by a factor of two (or a power of two) sound similar. For example, A3 (220 Hz), A4 (440 Hz) and A5 (880 Hz) sound very similar. In addition, humans perceive the distance from A3 to A4 as the same as the distance from A4 to A5, so human pitch perception is essentially logarithmic in frequency.
Combining the MIDI note number with twelve-tone equal temperament, the frequency of each pitch can be calculated (the MIDI note number of A4 is 69):
$$F_{\mathrm{pitch}}(p) = 2^{(p-69)/12} \cdot 440\,\mathrm{Hz}$$
The frequency ratio between two adjacent semitones is a constant:
$$\frac{F_{\mathrm{pitch}}(p+1)}{F_{\mathrm{pitch}}(p)} = \sqrt[12]{2}$$
More generally, the cent can be used as the basic unit for measuring intervals: an octave is divided into 1200 cents, that is, 100 cents per semitone. A difference of one cent is too small to be heard. Experience shows that adults can reliably recognize a pitch difference of 25 cents, and trained listeners can even recognize a difference of 10 cents.
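The pitch formula and the cent measure can be checked numerically. The following is a minimal sketch; the names f_pitch and cents are chosen here for illustration, and the reference A4 = 440 Hz at MIDI note number 69 is taken from the text.

```python
import math


def f_pitch(p, ref_pitch=69, ref_freq=440.0):
    """Frequency in Hz of MIDI note number p in twelve-tone equal temperament."""
    return 2 ** ((p - ref_pitch) / 12) * ref_freq


def cents(f1, f2):
    """Interval from frequency f2 up to frequency f1, in cents."""
    return 1200 * math.log2(f1 / f2)


print(f_pitch(69))                      # A4, 440 Hz
print(f_pitch(57))                      # A3, about 220 Hz
print(f_pitch(70) / f_pitch(69))        # semitone ratio, about 1.0595
print(cents(f_pitch(70), f_pitch(69)))  # one semitone, about 100 cents
print(cents(880.0, 440.0))              # one octave, 1200 cents
```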
In the real world, a musical tone consists of partials and harmonics.

Partial
The vibration of the whole string or air column produces the fundamental, which is the first partial. The string or air column also vibrates in integer divisions of its length: half the length gives the second partial, one third gives the third partial, and so on.
Harmonic
A partial whose frequency is an integer multiple of the fundamental frequency.
Overtone
Any partial other than the fundamental.
Inharmonicity
The deviation of an instrument's partial frequencies from the ideal integer multiples of its fundamental frequency.

For example, if a fundamental $\omega$ has a frequency of 65.4 Hz (C2), its harmonic series consists of the frequencies $\omega, 2\omega, 3\omega, 4\omega, \ldots$ Harmonics at powers of two are octaves above the fundamental: $\omega$ is C2, $2\omega$ is C3 and $4\omega$ is C4. $3\omega$ is close to G3 (a perfect fifth above C3), as shown below:

Harmonic	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
Nearest pitch	C2	C3	G3	C4	E4	G4	Bb4	C5	D5	E5	F#5	G5	Ab5	Bb5	B5	C6
Deviation from nearest pitch (cents)	0	0	+2	0	-14	+2	-31	0	+4	-14	-49	+2	+41	-31	-12	0

Note: the frequency of the harmonic $3\omega$ is 2 cents higher than that of G3.

Dynamics, intensity and loudness

Loudness is the perceptual counterpart of sound intensity, and the range of loudness levels in music is referred to as dynamics.
Sound power is the energy that a sound source emits into the air per unit time, while sound intensity is the sound power per unit area, measured in $\mathrm{W/m^2}$. The minimum sound intensity that humans can perceive is called the threshold of hearing (TOH):
$$I_{\mathrm{TOH}} = 10^{-12}\,\mathrm{W/m^2}$$
The maximum sound intensity that humans can tolerate is called the threshold of pain (TOP):
$$I_{\mathrm{TOP}} = 10\,\mathrm{W/m^2}$$
In practice, sound intensity is measured in decibels:
$$\mathrm{dB}(I) = 10 \cdot \log_{10}\left(\frac{I}{I_{\mathrm{TOH}}}\right)$$
$$\mathrm{dB}(I_{\mathrm{TOH}}) = 0\,\mathrm{dB}$$
$$\mathrm{dB}(I_{\mathrm{TOP}}) = 130\,\mathrm{dB}$$
$$\mathrm{dB}(2I) - \mathrm{dB}(I) \approx 3\,\mathrm{dB}$$

According to the decibel formula, doubling the sound intensity raises the level by about 3 dB, because $10 \cdot \log_{10} 2 \approx 3.01$.
In addition, perceived loudness depends on the frequency of the sound, so the TOH and TOP also vary with frequency. In the lower part of the audible range they generally decrease as the frequency increases; the ear is most sensitive at around 2 to 5 kHz.
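The decibel values quoted above can be reproduced with a short calculation. This is a sketch using the TOH and TOP intensities from these notes; the function name intensity_db is an assumption for illustration.

```python
import math

I_TOH = 1e-12  # threshold of hearing in W/m^2 (value from the text)
I_TOP = 10.0   # threshold of pain in W/m^2 (value from the text)


def intensity_db(intensity, reference=I_TOH):
    """Sound intensity level in decibels relative to the reference intensity."""
    return 10 * math.log10(intensity / reference)


print(intensity_db(I_TOH))                       # 0 dB
print(intensity_db(I_TOP))                       # about 130 dB
print(intensity_db(2e-6) - intensity_db(1e-6))   # doubling adds about 3.01 dB
```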

Timbre

ADSR model
Connecting the peaks (or troughs) of the waveform over the duration of a sound gives a curve, the envelope. According to its shape, the envelope can be divided into four phases: A (attack), D (decay), S (sustain) and R (release), similar to what happens when a key is played on the piano. Different timbres have different ADSR envelopes for the same note, even though the pitch and the amplitude reached are the same.
Tremolo / vibrato
Both are periodic modulations of a tone. Tremolo corresponds to amplitude modulation (AM) and vibrato corresponds to frequency modulation (FM). A modulation has two essential parameters: the modulation rate and the modulation amplitude.
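To make the difference between the two modulations concrete, here is a minimal sketch that computes single samples of a sine tone with tremolo and with vibrato. The function names and the default rate and depth values are illustrative assumptions, not values from the book.

```python
import math


def tremolo_sample(t, freq=440.0, rate=5.0, depth=0.3):
    """Amplitude modulation: the amplitude oscillates around 1 by +/- depth."""
    amplitude = 1.0 + depth * math.sin(2 * math.pi * rate * t)
    return amplitude * math.sin(2 * math.pi * freq * t)


def vibrato_sample(t, freq=440.0, rate=5.0, depth_hz=6.0):
    """Frequency modulation: the instantaneous frequency is
    freq + depth_hz * cos(2 * pi * rate * t)."""
    # The phase is the integral of the instantaneous frequency over time.
    phase = (2 * math.pi * freq * t
             + (depth_hz / rate) * math.sin(2 * math.pi * rate * t))
    return math.sin(phase)


# One second of each signal at a sampling rate of 8000 Hz.
sample_rate = 8000
tremolo = [tremolo_sample(n / sample_rate) for n in range(sample_rate)]
vibrato = [vibrato_sample(n / sample_rate) for n in range(sample_rate)]
```

In both functions, rate plays the role of the modulation rate, while depth and depth_hz play the role of the modulation amplitude.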

Summary

Real-world sheet music is translated into a symbolic representation, and the computer then renders the symbols as audio for playback.

Some exercises

The frequency ratio for any number of semitones k, and the distance in semitones between any two frequencies:
$$\frac{F_{\mathrm{pitch}}(p+k)}{F_{\mathrm{pitch}}(p)} = 2^{k/12}$$
$$\mathrm{distance}(\omega_1, \omega_2) = 12 \cdot \log_2 \frac{\omega_1}{\omega_2}$$
Suppose an octave is divided into 17 equal steps, and the pitch number p = 100 is assigned the frequency 1000 Hz. There are 256 pitch numbers, labelled 0 to 255. In this model, what frequencies correspond to the pitch numbers p = 83, p = 66 and p = 49, and what is the difference in cents between two adjacent pitches?
In this model, the frequency ratio for any number of steps k is:
$$\frac{F_{\mathrm{pitch}}(p+k)}{F_{\mathrm{pitch}}(p)} = 2^{k/17}$$
Substituting $F_{\mathrm{pitch}}(100) = 1000\,\mathrm{Hz}$ gives:
$$F_{\mathrm{pitch}}(p) = 2^{(p-100)/17} \cdot 1000\,\mathrm{Hz}$$
so:
$$F_{\mathrm{pitch}}(83) = 500\,\mathrm{Hz}, \quad F_{\mathrm{pitch}}(66) = 250\,\mathrm{Hz}, \quad F_{\mathrm{pitch}}(49) = 125\,\mathrm{Hz}$$
An octave has 1200 cents, so two adjacent pitches differ by $1200 / 17 \approx 71$ cents.
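The result of this exercise can be verified with a short calculation. The function name f_pitch_17 is made up for this sketch; the reference values (p = 100 at 1000 Hz, 17 steps per octave) come from the exercise.

```python
def f_pitch_17(p, ref_pitch=100, ref_freq=1000.0, steps_per_octave=17):
    """Frequency in Hz of pitch number p when the octave has 17 equal steps."""
    return 2 ** ((p - ref_pitch) / steps_per_octave) * ref_freq


# Each of these pitch numbers lies a whole number of octaves below p = 100.
for p in (83, 66, 49):
    print(p, f_pitch_17(p))  # about 500, 250 and 125 Hz

step_cents = 1200 / 17
print(step_cents)  # about 70.6 cents between adjacent pitches
```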
Write a simple program to convert between pitch names and MIDI note numbers.
Write a simple program to calculate the frequencies of the first 16 harmonics of C2 and find the pitch nearest to each of them. Do the same for Bb4.

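import math


def pitch_sharp():
    # Pitch-class names spelled with sharps, indexed by semitones above C.
    return 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'


def pitch_flat():
    # The same pitch classes spelled with flats.
    return 'C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B'


def to_pitch(num):
    # MIDI note number -> set of pitch names (both enharmonic spellings).
    index, bias = num % 12, str(num // 12 - 1)
    result_s, result_f = pitch_sharp()[index] + bias, pitch_flat()[index] + bias
    return {result_s, result_f}


def check_pitch(pitch):
    # Split a pitch name such as 'Bb4' into (pitch-class index, octave string).
    if pitch[1] == '#':
        return pitch_sharp().index(pitch[:2]), pitch[2:]
    elif pitch[1] == 'b':
        return pitch_flat().index(pitch[:2]), pitch[2:]
    else:
        return pitch_sharp().index(pitch[:1]), pitch[1:]


def to_num(pitch):
    # Pitch name -> MIDI note number (C4 = 60, A4 = 69).
    index, bias = check_pitch(pitch)
    return (int(bias) + 1) * 12 + index


def cent_round(cent):
    # Deviation in cents from the nearest semitone, rounded to an integer.
    return round(cent % 100 if cent % 100 < 50 else cent % 100 - 100)


def gen_harmonic(pitch, n):
    # Yield (cent deviation, nearest pitch) for the first n harmonics of pitch.
    i_ = 1
    while i_ <= n:
        diff_cent_ = math.log2(i_) * 1200  # distance of harmonic i_ from the fundamental
        yield cent_round(diff_cent_), to_pitch(round(diff_cent_ / 100) + to_num(pitch))
        i_ = i_ + 1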
# gen_harmonic('Bb4', 16) output:
# diff: 0 , pitch: {'A#4', 'Bb4'}
# diff: 0 , pitch: {'Bb5', 'A#5'}
# diff: 2 , pitch: {'F6'}
# diff: 0 , pitch: {'A#6', 'Bb6'}
# diff: -14 , pitch: {'D7'}
# diff: 2 , pitch: {'F7'}
# diff: -31 , pitch: {'G#7', 'Ab7'}
# diff: 0 , pitch: {'A#7', 'Bb7'}
# diff: 4 , pitch: {'C8'}
# diff: -14 , pitch: {'D8'}
# diff: -49 , pitch: {'E8'}
# diff: 2 , pitch: {'F8'}
# diff: 41 , pitch: {'Gb8', 'F#8'}
# diff: -31 , pitch: {'G#8', 'Ab8'}
# diff: -12 , pitch: {'A8'}
# diff: 0 , pitch: {'Bb8', 'A#8'}

Pythagorean tuning, attributed to Pythagoras, generates pitch frequencies using only the ratio 3:2. The Pythagorean scale is built only from perfect fifths (3:2) and octaves (2:1). Starting from C2, multiply the frequency repeatedly by 3/2; whenever the result is higher than the frequency of C3, divide it by 2. In this way, 13 frequency values (including the initial C2) are generated. The last value is very close to C2, and the difference between it and C2 is called the Pythagorean comma. Write a program that simulates this process and computes, for each generated frequency, the nearest twelve-tone equal-tempered pitch and the deviation from it.

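def to_freq(pitch):
    # Pitch name -> frequency in Hz, using F_pitch(p) = 2^((p - 69) / 12) * 440.
    return 2 ** ((to_num(pitch) - 69) / 12) * 440


def diff_cent(w1, w2):
    # Interval from frequency w2 up to frequency w1, in cents.
    return 1200 * math.log2(w1 / w2)


def pythagorean(new_freq, freq, idx):
    # Go up a perfect fifth (x 3/2); if the result would reach the octave
    # above freq, go down a perfect fourth (x 3/4) instead. idx is unused.
    return 1.5 * new_freq if 1.5 * new_freq / freq < 2 else 0.75 * new_freq


def gen_tuning(pitch, func):
    # Apply func 12 times, starting from pitch. Each step yields: ratio to the
    # start frequency, nearest equal-tempered pitch, its ratio, cent deviation.
    i_, n, freq_ = 1, 12, to_freq(pitch)
    new_freq_ = freq_
    while i_ <= n:
        new_freq_ = func(new_freq_, freq_, i_)
        diff_cent_ = diff_cent(new_freq_, freq_)
        new_pitch_ = to_pitch(round(diff_cent_ / 100) + to_num(pitch))
        yield new_freq_ / freq_, new_pitch_, to_freq(tuple(new_pitch_)[0]) / freq_, cent_round(diff_cent_)
        i_ = i_ + 1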
# gen_tuning('C2', pythagorean) output:
# pythagorean ratio: 1.5  pitch: {'G2'}  frequency ratio: 1.4983070768766817  diff cent: 2
# pythagorean ratio: 1.125  pitch: {'D2'}  frequency ratio: 1.1224620483093728  diff cent: 4
# pythagorean ratio: 1.6875000000000002  pitch: {'A2'}  frequency ratio: 1.681792830507429  diff cent: 6
# pythagorean ratio: 1.265625  pitch: {'E2'}  frequency ratio: 1.2599210498948734  diff cent: 8
# pythagorean ratio: 1.8984375  pitch: {'B2'}  frequency ratio: 1.887748625363387  diff cent: 10
# pythagorean ratio: 1.423828125  pitch: {'Gb2', 'F#2'}  frequency ratio: 1.414213562373095  diff cent: 12
# pythagorean ratio: 1.06787109375  pitch: {'C#2', 'Db2'}  frequency ratio: 1.0594630943592953  diff cent: 14
# pythagorean ratio: 1.6018066406250002  pitch: {'G#2', 'Ab2'}  frequency ratio: 1.5874010519681994  diff cent: 16
# pythagorean ratio: 1.2013549804687502  pitch: {'Eb2', 'D#2'}  frequency ratio: 1.189207115002721  diff cent: 18
# pythagorean ratio: 1.8020324707031254  pitch: {'Bb2', 'A#2'}  frequency ratio: 1.7817974362806788  diff cent: 20
# pythagorean ratio: 1.3515243530273442  pitch: {'F2'}  frequency ratio: 1.3348398541700344  diff cent: 22
# pythagorean ratio: 1.013643264770508  pitch: {'C2'}  frequency ratio: 1.0  diff cent: 23

The Pythagorean comma is therefore about 23 cents.
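The comma can also be computed exactly, without the step-by-step simulation. In the run above, twelve fifths are stacked and the result is halved seven times, so the final ratio is (3/2)^12 / 2^7. The sketch below uses Python's fractions module; the variable names are chosen here for illustration.

```python
import math
from fractions import Fraction

# Twelve perfect fifths (3:2) overshoot seven octaves (2:1) by the comma.
comma_ratio = Fraction(3, 2) ** 12 / Fraction(2) ** 7
comma_cents = 1200 * math.log2(float(comma_ratio))

print(comma_ratio)         # 531441/524288
print(float(comma_ratio))  # about 1.01364, the last ratio in the output above
print(comma_cents)         # about 23.46 cents
```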

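Three-part subtraction and addition method (sanfen sunyi): alternately remove one third of the string length (frequency × 3/2) and add one third of it (frequency × 3/4), starting with removal; after the sixth step the alternation is reversed, as implemented below.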

def chinese_harmonic(new_freq, freq, idx):
    # Up to step 6: odd steps x 3/2, even steps x 3/4; from step 7 on the roles swap.
    return 1.5 * new_freq if (idx % 2 != 0) ^ (idx > 6) else 0.75 * new_freq
# gen_tuning('C2', chinese_harmonic) output:
# chinese ratio: 1.5  pitch: {'G2'}  frequency ratio: 1.4983070768766817  diff cent: 2
# chinese ratio: 1.125  pitch: {'D2'}  frequency ratio: 1.1224620483093728  diff cent: 4
# chinese ratio: 1.6875000000000002  pitch: {'A2'}  frequency ratio: 1.681792830507429  diff cent: 6
# chinese ratio: 1.265625  pitch: {'E2'}  frequency ratio: 1.2599210498948734  diff cent: 8
# chinese ratio: 1.8984375  pitch: {'B2'}  frequency ratio: 1.887748625363387  diff cent: 10
# chinese ratio: 1.423828125  pitch: {'F#2', 'Gb2'}  frequency ratio: 1.414213562373095  diff cent: 12
# chinese ratio: 1.06787109375  pitch: {'Db2', 'C#2'}  frequency ratio: 1.0594630943592953  diff cent: 14
# chinese ratio: 1.6018066406250002  pitch: {'Ab2', 'G#2'}  frequency ratio: 1.5874010519681994  diff cent: 16
# chinese ratio: 1.2013549804687502  pitch: {'Eb2', 'D#2'}  frequency ratio: 1.189207115002721  diff cent: 18
# chinese ratio: 1.8020324707031254  pitch: {'A#2', 'Bb2'}  frequency ratio: 1.7817974362806788  diff cent: 20
# chinese ratio: 1.3515243530273442  pitch: {'F2'}  frequency ratio: 1.3348398541700344  diff cent: 22
# chinese ratio: 2.027286529541016  pitch: {'C3'}  frequency ratio: 2.0  diff cent: 23

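The results show that Pythagorean tuning (generation by stacked fifths) and the three-part subtraction and addition method produce the same pitches; only the last value differs by an octave (C3 instead of C2).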

Added by delphi on Sun, 02 Jan 2022 20:31:19 +0200
