Why Macs have millions of tiny files


1 Like

Not recommended for deletion.

13 Likes

Odd, but the System V and Solaris servers I have managed never seemed to have a file-bloat problem, and their file system checks and backups never took longer than those on a typical Windows server.
But yeah, let’s blame Unix.

3 Likes

This. I read the article, and the author never explains how lots of little files are the fault of “Unix.” He then goes on to mention the symlinks created by Time Machine, and Spotlight. My Linux system has just shy of 1 million files on it, a large part of which are git object files, but I’m not blaming “Unix” for that.

2 Likes

Millions of tiny flies? Ugh. I knew there was a reason I didn’t like Macs.

5 Likes

That’s how I read it first too, and I was thinking “I don’t recall seeing any swarms when I had a MacBook.”

Well, they’re, like, super tiny.

2 Likes

Not in detail, no. He says “There is something about Unix that loves a multiplicity of tiny files rather than monolithic larger ones, hence these huge counts.” Which is true. Unix believes in aggregating small programs to do operations. Do one thing, do it well. Chain those things together. This good behavior is, to use your term, the fault of Unix.
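
To make that concrete, here’s a rough Python sketch of the same idea (the /usr/share path is just an arbitrary example) - two small standard tools, find and wc, each doing one job, glued together by a pipe to count files under a directory:

```python
import subprocess

# Two small tools, each doing one job, chained with a pipe:
#   find <dir> -type f   -> list regular files, one per line
#   wc -l                -> count the lines
# The directory below is an arbitrary example.
find_proc = subprocess.Popen(
    ["find", "/usr/share", "-type", "f"],
    stdout=subprocess.PIPE,
)
count = subprocess.run(
    ["wc", "-l"],
    stdin=find_proc.stdout,
    capture_output=True,
    text=True,
)
find_proc.stdout.close()
find_proc.wait()
print("files found:", count.stdout.strip())
```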

He also goes on to point out that the number of files is not a big deal these days: “With HFS+, however, a drive can have nearly 4.3 billion files without each consuming unreasonable amounts of space.” Goes into some detail on Time Machine, then quotes somebody: “I wouldn’t worry about it too much — let the system handle its files, and don’t be too concerned about the count. As long as your drive isn’t mysteriously filling up, you’re good.”

Blame for what? For not having a giant-monolithic-slow-file approach like Windows, as the article implies?

What’s the fuss, Gus? Did you actually read the article?

4 Likes

With HFS+, a room can have nearly 4.3 billion flies without each consuming unreasonable amounts of space.

The noise of their wings and mouthparts as they consume the moisture from your eyeballs is another story.

3 Likes

Remember that one time, in Wisconsin, when mayflies showed up on the radar?

Do you know where Woz was on that day?

1 Like

I wondered this myself. The best I could find was Unix philosophy - Wikipedia, which suggests this grew out of a tradition of building ‘lots of little things which only do one task’ vs ‘larger things which do multiple tasks’. I believe at this point it is a matter of tradition rather than engineering intent.

In the case of the Mac, file counts are also high because of the way Macintosh apps and files are ‘packaged’ – though Mac applications, for example, appear as a single file to the user, they are in fact folders holding all of the text, foreign language translations, icons, graphics, etc. that are seen in that program. To my knowledge no other operating system does that; either these ‘support’ materials are built into the application or some other type of container is used. I think this originated during the Mac OS 9 to Mac OS X transition. Few people here probably remember this, but old-school Mac files used to have the idea of a separate ‘fork’, a storage area where these support data were kept. Mac OS X does not, so “packages as folders of little files” had to be created to hold the material that used to be stored there.
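
To see this for yourself, a few lines of Python will walk an application bundle and count what’s inside - Safari.app is just an example here, any .app on your disk will do:

```python
import os

# A Mac .app "file" is really a folder full of files.
# Safari.app is only an example; point this at any application bundle.
bundle = "/Applications/Safari.app"

total = 0
for root, dirs, files in os.walk(bundle):
    total += len(files)

print(f"{bundle} contains {total} files")
```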

2 Likes

#Macs have millions of tiny files?

4 Likes

I think I see the problem.

/former Windows user due to work requirements

Will this help?

4 Likes

I’ll let you know! I ordered it the first time Mark posted it.

1 Like

Yes, and now that Mac app packages are bundles of files, they’re more likely to survive being copied through file servers that don’t support file forks. There are fewer issues with less computer-literate users sending files through channels that don’t grok file forks.

Some people may remember that versions of Mac OS X before 10.5 supported booting off of UFS, which did not support file forks. There was probably considerable internal debate at Apple about whether HFS+ would remain viable.

So the proliferation of files is probably more about supporting the Mac OS X experience when it interacts with non-Macs/non-HFS rather than anything attributable to Unix.

1 Like

Talk about a completely information-free post.

I’ve noticed that archives created on Macs seem to be full of ._ files, usually in a __MACOSX (or something like that) subdirectory. They look like they might be thumbnail files, but I’m not even sure of that - they seem a bit too small.

Blaming it on Unix is the real WTF. I’ve run everything from TRS-XENIX (go ahead and laugh) to IBM’s AIX. I’ve never seen that number of files, or those hidden subdirectories and files either. Some hidden files and directories, sure. But it really doesn’t explain the Mac behavior; this is a “just cuz!” article.

Deleting those files has never caused me a problem, at least on Windows.
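
If you’re curious exactly what’s in there before deleting, a quick Python sketch will list the ._ and __MACOSX entries in a zip (the archive name below is just a placeholder):

```python
import zipfile

# List the extra entries a Mac-created zip tends to carry: anything under
# the __MACOSX/ folder, plus the "._" companion files. The archive name
# below is just a placeholder.
archive = "example-made-on-a-mac.zip"

with zipfile.ZipFile(archive) as zf:
    for info in zf.infolist():
        name = info.filename
        base = name.rsplit("/", 1)[-1]
        if name.startswith("__MACOSX/") or base.startswith("._"):
            print(f"{name}  ({info.file_size} bytes)")
```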

Trust me, I’m no fan of the old resource fork (though when I was a kid I found manipulating those resources a great way to learn about computers and cheat at computer games). Still, there are other ways they could have handled this - for example, using a compressed or uncompressed zip file as a container (as Java apps, epub books, and some Microsoft Office files do), which would have protected Mac files even more effectively when resident on, or accessed by, non-Mac systems.

All that said… this is all a non-explanation for a non-problem. Shouldn’t matter at all how many files are around on a modern system. Focussing on that metric encourages micromanagement and the use of dubious third-party utilities to ‘optimize’ things that don’t really need optimizing. (That said, all systems seem to have issues with ‘left behind’ data after product uninstalls; no one seems to have solved that problem in any systematic way.)

1 Like

Firstly… why are millions of tiny files a problem? A decent file system can handle this. Some may say that a monolithic single file accessed as a database is better, but that very much depends on the use case. Each method has its pros and cons.

Secondly, there is no real link to UNIX. Even the UNIX philosophy, as mentioned by others here, doesn’t mean you’ll have millions of files. It means that the tools you use are small, focused and capable of being used in pipes that feed data from one tool to the next, to get the desired result.
That does mean you might have some more configuration files. But many of the tools don’t have - or need - any configuration files. There may be a few scripts to accomplish common tasks, but they won’t be “millions of files”.
The two tools that are blamed for the bloat - Spotlight and Time Machine - are Apple’s own software, and are (as I understand it) monolithic products. Not built in the UNIX philosophy.

The rest of the files in these “millions of files” could be due to application packaging, as someone else suspects. And a small number of them might be hidden folders making up for the lack of resource fork support in OS X - but again, probably not millions. A more likely source would be caches for web browsers and media players (web browsers in particular can often have tens or hundreds of thousands of files in their caches, in my experience).

All the candidates I can think of (iTunes, Safari) and the two culprits in the article (Spotlight, Time Machine) are monolithic software written by Apple. And have very little to do with UNIX except that they are running on a variant of UNIX.

And I’ll go back to my original opening - at no point does the article really cover why millions of files are a problem.

So I start thinking about the times that I’ve dealt with large numbers of files in my career, and what downsides it brought with it. I’ve moved terabytes of data from one disk to another in enterprise environments - scheduled changes that required hours or even days of downtime to accomplish successfully. I’m well aware of the overhead that the findfirst/findnext filesystem behaviour adds to a copy of many files, be it local or across the network. It’s there, but in my experience it’s negligible - it might delay the copy by a few minutes per hour of data to be copied, if you’re using an efficient program (like rsync/robocopy/richcopy).
Another downside that occurs to me is that if you need to do a disk integrity check on the filesystem, it will take longer. But filesystems like ext4 show that even that can be sped up by sane design - I’ve seen fsck times drop to a tenth of the ext3 times on the same data. So if that’s an issue, then the fault is still with Apple as they have control over the HFS+ filesystem…
The only other downside I can think of is increased disk usage due to allocation policies - the fact that a 1-byte file will probably take up an entire allocation unit, which on a >1 GB HFS+ volume will be 4 KB. This is a problem that has been solved elsewhere - it’s called block suballocation, and it allows a block to be divided amongst smaller files. Another possible solution is to store small files in the filesystem structure itself, alongside other metadata, to avoid using an allocation block for the file content at all. Again, as with integrity checking, HFS+ is Apple’s technology and not a UNIX one, so this is an Apple problem.
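
To put a rough number on that last point, a quick back-of-the-envelope in Python - the file count and average size below are invented purely for illustration:

```python
# How much space do tiny files waste when each one is rounded up to a
# 4 KB allocation block? All numbers here are made up for illustration.
BLOCK = 4096                 # allocation block size in bytes
file_count = 1_000_000       # pretend we have a million tiny files
avg_file_size = 500          # averaging 500 bytes each (< one block)

actual = file_count * avg_file_size    # bytes of real data
allocated = file_count * BLOCK         # each file takes at least one block
slack = allocated - actual             # space lost to rounding

print(f"data: {actual / 1e6:.0f} MB, on disk: {allocated / 1e6:.0f} MB, "
      f"wasted: {slack / 1e6:.0f} MB")
```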
I can’t think of any other downsides off the top of my head…

The upsides are clear - multiple small files can be more resilient and survivable in the event of a software failure (due to crash or just a bug). If you stored all your files in one big DB, a bug in any application you use could potentially wipe out all your data. Also, if the container file on the disk for that DB gets corrupted you’re more likely to lose everything. One big DB is one big point of failure, and one big risk.

None of which is anything to do with UNIX per se. The linked article is one of the worst I’ve read in recent memory - foundationally unsound, astonishingly misleading, and unable to bear any informed rational scrutiny.

tl;dr - 0/10, would not read again.

1 Like

I read it. I also read the part where the author claims that the number of files causes slowdowns for file system checks and backups. My point is that Unix systems do not have this problem.

I began on the AS/400, moved to System V and MUMPS on VAX and Alpha servers, and now maintain both Debian and Windows servers. I compared the file system check and backup times to Windows servers because most people are familiar with Windows, which makes for a good common point of comparison.