CS50 Part 2

09 May 2020

Did I give up on all this? Not yet! But it got hairy for a while here.

What’s New?

As the title suggests, I’ve been plugging away at CS50. Last time I posted nearly two weeks ago I was struggling through Problem Set 3. Today I managed to finish up Problem Set 6. It’s definitely been a struggle as CS50 has moved into unfamiliar territory. These are actually the topics I started this course to cover: recursion, algorithms, pointers, linked lists, hash tables, and tries. Like a lot of university-level lecture courses, CS50 flies through these topics at a very high level. I will note that it’s really very high quality teaching.

But it’s also extremely introductory- which as Malan notes is the goal, comfort comes later with practice. But it’s been gratifying to mix in recursion more and more, and I’m certainly looking forward to tackling hashing, searching, trees, and tries.

Specifics

So Submit50 recently added a date tracker to view most recent submission per problem. I can see for example, that I typically spent:

So what were the major themes?

  1. The problem sets are deceptive: typically they have a fairly simple objective, but reaching the objective is complicated in terms of both how it’s done, and how it’s implemented. So there’s a lot of time spent on learning the problem parameters in and of itself. Basically, you get one problem that does something like categorization or counting, but requires a bunch of intermediate steps.
  2. The problem sets quickly started breaking down what I call the “plan-implementation relationship”. Roadblocks at a basic level are either: a. “Plan”: I have no idea how I should do something. b. “Implementation”: where I have no idea how to program my plan.

In the former, I’ve run through all the techniques and things I’ve done before to think if I know how to program a solution and fallen short. In the latter, I’ve typically come up with an idea and then realized I don’t know how to do it in programming. But fundamentally there’s a higher level of roadblock where I have a concept I know would work, but I’m not familiar enough with the concept to know what I need to implement, or how to implement it.

A Fundamental Roadblock

Speller had a great example of this (nothing but examples of this!), but let’s talk about hash tables in specific. The basic goal is to create a super simple spell check, by loading a dictionary into a hash table, and then checking words in a document against it. A hash table isn’t strictly necessary, a trie could be used. But I’m taking baby steps here.

One of the larger struggles I had was understanding the structure of the hash table itself. Since I was using a hash function that I couldn’t guarantee would avoid hash collisions, the hash table had to hold linked lists of words. Structurally in C, would’ve expected an array of pointers, which in the end was exactly what it was, but when I was reading the code provided for setup I just couldn’t recognize it. I ended up spending a fair amount of time watching videos or reading around reddit and stackexchange, scratching away at some paper, thinking I understood what was happening, coding until I ran into bugs that didn’t make sense, then going back to my paper pad and starting the cycle over again.

The problem is that if I fundamentally misunderstand how a hash table of linked lists would look like, then I’m going to mess up how to load words into it, how to search the hash table, and how to free the hash table from memory. This was a more fundamental level problem, where I could write high level pseudocode (e.g. “insert a word into the hash table”), but not lower level pseudocode describing the exact text(e.g. “hash table at hash index is, in fact, a pointer that needs to be pointed at the initialized node”). If I can’t write the pseudocode, how can I write the code? Moreover, I have no experience with code to inform the “plan” that the lower level pseudocode represents.

This is typically the most frustrating and most informative work- where the groundwork has to be laid when first putting a concept into practice and there’s no experience to draw on.

Concluding

I did end up finishing speller. Did pretty well! Out of 914 submissions, I’m currently in 47th place with a total runtime of 7.138s(bernardkung. For the record, that’s ahead of the “official” staff solution in 307th place at 7.903s. I did do some light tweaking: I did see some suggestions like playing around with the size of the hash table which I played around with to drop nearly 200 places. I also saw other suggestions like moving previously checked words to the fronts of linked lists on the idea that it’s a “high frequency” word. I think if I were going forward with this though, I would try to incorporate word length when inserting words into the linked lists/hash table, just based on the idea that shorter words (“of”, “the”, “a”, “and”, etc) are used much more frequently and putting them up front would help search times. I also think that the next step would be to process the significant corpi of texts to start counting word frequencies and see if I could glean any insight from that on word frequency by traits like first letter or word length. All of this would be tied up in the hashing function I suspect.

Anyways, after finishing up speller/pset 5 this morning, I shot through Pset6 on Python, finishing up a couple hours ago. Week 7 is SQL, which I expect I can finish up tomorrow, and then I’m into the final project tracks. I’m not 100% sure on how much I’ll go forward with that. At this point, my meta goal is to develop good foundation, then build on it. Depending on this final project complexity, it might be better to move onto more in-depth web development course, rather than kludge something together just to “finish” this course. Especially since I never plan on shelling out for a certificate of completion. Take what I need, not what I’m given.