I used to do this, but imho the used language is hardly a useful index. When does it happen that you want to see everything written python? For me that’s never.
Also where do you put multi-language projects? Like, go backend with typescript frontend or whatever.
git LFS might be for you. If the data takes so long to reprocess I think it is fine to check it in (possibly using LFS).