I suspect that the results would not sound good (I have no doubt that starting with a “c’mon man, you just concatenate some phonemes, how hard can it be?” assumption will trigger either profoundly weary sighs or attack-frenzy behavior in all linguists in range); but, as a gratuitously electromechanical exercise in doing it the hard way just because solid-state lacks a certain charm, an Edison-tech-level text-to-speech apparatus would probably be really, really cool. (And it is true that even some fairly recent text-to-speech systems aren’t shy about just bolting recorded snippets together. Telephone AVRs dealing with numbers and times are typically the most overt, since “reading” a number as a relentlessly monospaced litany of its individual digits in order, absolutely without contextual modification, is generally treated as less grating than treating a word as a series of letters to be announced one at a time; though ‘idiomatic’ readings of number sequences usually do exhibit some grouping according to unstated-and-probably-really-hard rules of what sounds right and taxes short-term memory least.)
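For the digit-litany case, the "bolt recorded snippets together" approach is small enough to sketch. This is only a rough illustration, not how any particular phone system works: the clip file names, the `digits_to_clips` helper, and the fixed-size grouping are all made up for the example, and the grouping is a crude stand-in for the much harder "what sounds right" rules.

```python
# Hypothetical clip names: one prerecorded snippet per digit.
DIGIT_CLIPS = {str(d): f"digit_{d}.wav" for d in range(10)}
PAUSE = "pause.wav"  # hypothetical short silence played between groups


def digits_to_clips(number, group_size=0):
    """Return the ordered list of clips to play for a string of digits.

    group_size=0 reads every digit in one flat run. A positive value
    inserts a pause before each new group of `group_size` digits, which
    is an assumed, simplistic substitute for idiomatic grouping.
    """
    # Keep only the characters we have a clip for; dashes, spaces, etc. are dropped.
    digits = [ch for ch in number if ch in DIGIT_CLIPS]
    clips = []
    for i, digit in enumerate(digits):
        if group_size and i and i % group_size == 0:
            clips.append(PAUSE)
        clips.append(DIGIT_CLIPS[digit])
    return clips
```

Calling `digits_to_clips("555-1234", group_size=3)` gives three digit clips, a pause, three more, a pause, and a final lone digit, which is exactly the sort of grouping a human reader would not choose, and a hint at why the real rules are hard.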
(Edit for implementation proposal when I’m not on a damn phone touchscreen)