Alright, my husband and I are having a 'discussion' about this one.
I know the technical number-crunch of how it works: you sub your performance roll for the appropriate skill roll. The question is, from the perspective of the bard, HOW does the bard make the substitution? Oratory, Acting, yeah, those are easy. It's the intimidate from percussion that we're fighting over, likewise diplomacy from singing. He says I'd have to actually be drumming to use the switch (and how the hell are you going to use that for handle animal? That'd scare most animals.) or sing my argument if I want to sub singing for my diplomacy. Most of them are easily done, like dance for acrobatics and fly, but... string? Wind? How can you make a diplomacy check while blowing into a reed?
My argument (for the percussion, at least) is that I'm using the rhythm of my words, sort of a verbal drumming like rap.

What you are really asking about is the in-game rationalization or descriptive rationale behind how an ability works. Mostly, it is a game and it is made up. RAW is geared to be somewhat believable and fun for a game. IT IS NOT as precise or accurate as old Newtonian physics; it's not even close. Thus any physics 'explanation' by PF1 is automatically ludicrous.

How you execute it in your game and rationalize how and why it works customizes RAW to be more believable in your home game. That's the fun of customizing the game. Make up something for your character and say, 'it works this way'. Your husband's character can use a different methodology. Both are right so long as the mechanics operate according to RAW.
The descriptive elements of RAW are called 'fluff'. It doesn't affect the mechanics.

I play Dark Souls. A lot of the skills for Dark Souls don't exactly translate to Little Big Planet, but when I play Little Big Planet I have a much easier time of it than my wife does.

Even if the game rules are different, there are certain aspects that make learning new mechanics easier - knowing what the buttons on the PlayStation controller are called, knowing without looking where the buttons are located, knowing that there's a button under each joystick (L3 and R3).

More than that, though, it's knowing what the game expects of me. I know that passing a checkpoint means I'll respawn there if I die. I can see that there are a limited number of respawns by the white circles around the respawn points. I understand how games try to give us information implicitly - such as lighting up an area as a way to show us the way forward.

Versatile Performance is like that - you're taking skills learned for one area and applying them to another similar area. Frankly it's more weird that other classes don't get it.

From my understanding, and I have definitely been wrong before, there is absolutely no roleplay involved in the substitution.

Bards are skill-monkey/jack of all trades... it is simply a representation of your expansive depth of utility. Your "versatility", one could say.

It has absolutely nothing to do with banging kettles together in an attempt to Handle Animals... or singing your Diplomacy... or whatever... it is simply allowing you to share an area of expertise with other skills.

As a Bard, you will "likely" have max ranks in your chosen Perform skill(s)... but possibly not Handle Animal... however, Bards are just so freaking Versatile they have virtual skill ranks everywhere.

Or, at least, that is what I believe is the intent of Versatile Performance.

Personally, I would have paired some of the skills differently. As an example, tuning keyboards and stringed instruments requires some mechanical ability, so I would have linked Disable Device to those kinds of Performances. But, as the others have said, it is all about applying what you learn from the Perform skills (with all the ancillary knowledge and abilities learned) to a different skill.

Drums, bagpipes, and other instruments have been used in war to impress the opponents and embolden your troops, sometimes even when the two sides don't share cultural values. Knowing how some sounds resonate within the body of those you want to impress can make a difference. How you do that can vary: it can be your voice, clapping your hands, or other stuff (depending on the original Perform skill too), but it doesn't require the use of the actual instrument or to actually sing or dance.
You must remember that Versatile Performance is an (Ex) ability. So, while non-magical, it is in the field of the "barely possible" for real-life humans. Essentially the kind of stuff that only a few geniuses can do in our world.

you sub your performance roll for the appropriate skill roll.

Actually, this is wrong, and I think this misunderstanding is the source of the entire conflict.

When a character is using Versatile Performance (percussion) to intimidate, they do not make a perform (percussion) check, they make an intimidate check. They're just adding a different bonus to the roll. You substitute the perform skill bonus for the target skill bonus. In-universe, there is zero discernible difference between using Versatile Performance and making a check via the skill's regular bonus.
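
If it helps to see that spelled out, here is a rough sketch of the substitution as described above. It is only an illustration: the bonus numbers, the `skill_bonus` and `versatile_performance` tables, and the `skill_check` function are all made up for the example, and the only pairings included are the percussion ones mentioned in this thread.

```python
import random

# Hypothetical character sheet: total skill bonuses (numbers are invented).
skill_bonus = {
    "perform (percussion)": 12,
    "intimidate": 3,
    "handle animal": 1,
}

# Skills this bard may cover with a Perform bonus (percussion pairings
# as mentioned in the thread; a real sheet would list more).
versatile_performance = {
    "intimidate": "perform (percussion)",
    "handle animal": "perform (percussion)",
}

def skill_check(skill, d20=None, use_versatile=True):
    """Make a check with the named skill and return (skill_rolled, total).

    The skill being rolled never changes; only the bonus added to the
    d20 is swapped when Versatile Performance applies.
    """
    roll = d20 if d20 is not None else random.randint(1, 20)
    perform = versatile_performance.get(skill)
    if use_versatile and perform is not None:
        bonus = skill_bonus[perform]   # Perform bonus in place of the skill's own
    else:
        bonus = skill_bonus[skill]     # the skill's regular bonus
    return skill, roll + bonus
```

Either way the function reports an intimidate check, not a perform check. Only the number added to the die differs, which is the whole point.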

So no, you don't "have to actually be drumming" to intimidate someone. Maybe the physical training gave you broader shoulders, and you learned that deeper sounds have more of an effect on people and you use a deeper voice, and that's how the performance training helps you be intimidating, or whatever you want to come up with. But it's not a perform check.

