Easier said than done.
- You can use robots.txt, but only well-behaved robots will respect that
- You can block IP ranges, but that only works against crawlers whose ranges you already know, and only until they change them.
- You can block domains. Same problem.
What I’ve done is set a pretty strict throttle on traffic: if you hit more than 60 pages in a minute, you’re kicked out for 6 hours.
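
A minimal sketch of how a throttle like that could look, tracking hits per client IP in memory. The 60-pages-per-minute window and 6-hour ban come from above; everything else (names, the in-process dictionaries, the IP used in the demo) is just illustrative, not how my site actually implements it:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # look at the last minute of traffic
MAX_PAGES_PER_WINDOW = 60  # more than 60 pages in that minute trips the throttle
BAN_SECONDS = 6 * 60 * 60  # kicked out for 6 hours

hits = defaultdict(deque)  # client ip -> timestamps of recent page loads
banned_until = {}          # client ip -> unix time the ban expires

def allow_request(ip, now=None):
    """Return True if this request may proceed, False if the client is throttled."""
    now = time.time() if now is None else now

    # Still serving out an earlier ban?
    if banned_until.get(ip, 0) > now:
        return False

    # Drop timestamps that have aged out of the one-minute window.
    recent = hits[ip]
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()

    recent.append(now)
    if len(recent) > MAX_PAGES_PER_WINDOW:
        banned_until[ip] = now + BAN_SECONDS
        recent.clear()
        return False
    return True

if __name__ == "__main__":
    # 61 hits half a second apart: the 61st one gets the client banned.
    for i in range(61):
        ok = allow_request("203.0.113.7", now=1000.0 + i * 0.5)
    print(ok)  # False -> blocked for the next 6 hours
```

In practice you’d hang this off whatever your web server or framework gives you for middleware, and persist the counters somewhere shared (Redis, the reverse proxy, etc.) rather than a per-process dict, but the logic is the same.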